Bruno Carpentieri



Semi-Lossless Text Compression: a Case Study



Text compression is usually assumed to be lossless. Kaufman and Klein [1] introduced the idea of semi-lossless text compression, in which the decompressed text is not identical to the original but remains usable: just as a good-quality decompressed JPEG image can stand in for the original in many applications, the reader's brain adjusts the data to make it understandable. In this paper we experiment with semi-lossless compression in a case study on small text files in Italian.
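As a rough illustration of the idea, the sketch below applies a lossy but still readable transformation to a text and then compares compressed sizes before and after. The transformation used here (sorting the interior letters of each word while keeping the first and last letters) is an assumption chosen only for illustration; it is not claimed to be the method of [1] or of this paper. The function names are hypothetical, and Python's zlib and bz2 modules stand in for the Gzip [3] and Bzip2 [4] tools.

```python
import bz2
import re
import zlib


def sort_inner_letters(word: str) -> str:
    """Keep the first and last letters and sort the letters in between.

    Illustrative assumption only: the original spelling cannot be recovered,
    but the word usually remains recognizable to a human reader.
    """
    if len(word) <= 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]


def semi_lossless_transform(text: str) -> str:
    """Apply the illustrative transform to every run of letters in the text."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented Italian
    # letters such as "è" are treated as part of a word.
    return re.sub(r"[^\W\d_]+", lambda m: sort_inner_letters(m.group(0)), text)


def compressed_sizes(text: str) -> dict:
    """Return the UTF-8 size of the text and its zlib and bz2 compressed sizes."""
    data = text.encode("utf-8")
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),  # DEFLATE, as used by gzip
        "bz2": len(bz2.compress(data, 9)),
    }


if __name__ == "__main__":
    sample = "La compressione del testo è generalmente considerata senza perdita."
    altered = semi_lossless_transform(sample)
    print(altered)
    print("original:", compressed_sizes(sample))
    print("altered: ", compressed_sizes(altered))
```

On very short inputs the fixed overhead of each compressor dominates, so any difference between the two measurements should be read with caution; the sketch only shows how such a comparison could be set up.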


Text Compression, Lossless Compression, Semi-Lossless Compression


[1] Y. Kaufman and S. T. Klein, “Semi-lossless text compression”, International Journal of Foundations of Computer Science, Vol. 16, No. 6, 2005, pp. 1167-1178.

[2] I. H. Witten, A. Moffat, and T. Bell, Managing Gigabytes. NY: Van Nostrand Reinhold, 1994.

[3] The Gzip home page, www.gzip.org.

[4] Bzip2: home, www.bzip.org.

Cite this paper

Bruno Carpentieri (2016). Semi-Lossless Text Compression: a Case Study. International Journal of Computers, 1, 130-134.


Copyright © 2017 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0