Quasilinear-Time Search and Comparison for Sequential Data

Ilhan Karić; Zanin Vejzović

AUTHOR(S):

Ilhan Karić, Zanin Vejzović

TITLE

Quasilinear-Time Search and Comparison for Sequential Data

PDF

ABSTRACT

This paper proposes a new algorithm for the evaluation of similarity between two sequences in quasilinear time. It describes the theoretical, practical and implementational aspects of the algorithm. The proposed method is a new approach dedicated to the computation of sequential similarity in contrast to other methods like the Jaccard Index which although designed for the computation of similarity of sets have been frequently used on sequences. The method is generalizable and applicable to any form of sequential data of a finite alphabet (binary files, DNA sequences, natural language etc.)

KEYWORDS

Sequence Similarity, Comparison, Contextual Similarity, Quasilinear-Time Complexity

REFERENCES

[1] X1. Konrad Rieck, Pavel Laskov, “Linear-Time Computation of Similarity Measures for Sequential Data”, Journal of Machine Learning Research 9 (2008) 23-48 pp. 1

[2] S. T. Piantadosi, “Zipf’s word frequency law in natural language: A critical review and future directions”, Psychonomic Bulletin & Review, vol. 21, 2014, pp. 1112-1130

Cite this paper

Ilhan Karić, Zanin Vejzović. (2017) Quasilinear-Time Search and Comparison for Sequential Data. International Journal of Computers, 2, 161-165