Semi-Lossless Text Compression: a Case Study

Bruno Carpentieri

ABSTRACT

Text compression is usually treated as strictly lossless. Kaufman and Klein [1] introduced the idea of semi-lossless text compression, in which the decompressed text is not identical to the original but remains usable and understandable because the reader's brain compensates for the differences. The situation is analogous to a good-quality decompressed JPEG image, which differs from the original yet can replace it in many applications. In this paper we experiment with semi-lossless compression in a case study on small text files in Italian.
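
As a rough illustration of the general idea only, the sketch below applies a lossy normalization to a short Italian sentence and then feeds both versions to two standard lossless compressors. The normalization rules (lowercasing, dropping diacritics, collapsing whitespace) and the names `lossy_normalize` and `compressed_sizes` are assumptions made for this example; they are not the transformations used in [1] or in this paper. Python's `zlib` module is used as a stand-in for Gzip because both rely on DEFLATE.

```python
import bz2
import unicodedata
import zlib


def lossy_normalize(text: str) -> str:
    """Hypothetical lossy step: lowercase, strip diacritics, collapse whitespace.

    A reader can usually restore the lost details from context,
    but the original text cannot be recovered exactly.
    """
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop the combining marks, keeping only the base letters.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of whitespace into a single space.
    return " ".join(no_marks.split())


def compressed_sizes(text: str) -> dict:
    """Return the UTF-8 size of the text and its size after zlib and bz2."""
    data = text.encode("utf-8")
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
    }


sample = "La città è bella.  Perché non   andiamo più spesso in Città?"
normalized = lossy_normalize(sample)

print(normalized)
print("original:  ", compressed_sizes(sample))
print("normalized:", compressed_sizes(normalized))
```

On a sentence this short the fixed overhead of either compressor dominates, so the compressed sizes say little; a meaningful comparison would need to be run on the actual files of a case study.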

KEYWORDS

Text Compression, Lossless Compression, Semi-Lossless Compression

REFERENCES

[1] Y. Kaufman and S. T. Klein, “Semi-lossless text compression”, International Journal of Foundations of Computer Science, Vol. 16, No. 6, 2005, pp. 1167-1178.

[2] I. H. Witten, A. Moffat, and T. Bell, Managing Gigabytes. NY: Van Nostrand Reinhold, 1994.

[3] The Gzip home page, www.gzip.org.

[4] Bzip2: home, www.bzip.org.

Cite this paper

Bruno Carpentieri. (2016) Semi-Lossless Text Compression: a Case Study. International Journal of Computers, 1, 130-134