Skip to main navigation Skip to search Skip to main content

MPS: improving exact string matching through pattern character frequency

    Research output: Contribution to journalArticlepeer-review

    Abstract

    One of the initial hurdles in taking advantage of big data is the ability to quickly analyze and establish its relevance. Intrinsic to this big data analysis problem is the need for tools to scale well in proportion to the growth of data and have the flexibility to operate across disparate datasets. Exact string matching is a fundamental tool used to search through data, though many existing algorithms do not scale well since their computational cost occurs and grows during reading of data. A proof of concept is presented in this paper which transfers all the required calculations to a pre-processing stage, removing all calculations during the reading stage; creating a search trie which incorporates the statistical distribution of characters in the search pattern to reduce overall calculation and lookups in a search. The resulting algorithm produces a total calculation costs which is independent of data (source) size. Preliminary results shows the new algorithm out performing existing general algorithms, as the search pattern becomes large for natural English text and when searching a small alphabet source (DNA).
    Original languageEnglish
    Pages (from-to)127-137
    JournalJournal of Data Processing
    Volume3
    Issue number3
    Publication statusPublished - Sept 2013

    Fingerprint

    Dive into the research topics of 'MPS: improving exact string matching through pattern character frequency'. Together they form a unique fingerprint.

    Cite this