Lempel-Ziv Parsing for Sequences of Blocks

Kosolobov, D.; Valenzuela, D.

doi:10.3390/a14120359

Please use this identifier to cite or link to this item: http://elar.urfu.ru/handle/10995/111669

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kosolobov, D.	en
dc.contributor.author	Valenzuela, D.	en
dc.date.accessioned	2022-05-12T08:20:18Z	-
dc.date.available	2022-05-12T08:20:18Z	-
dc.date.issued	2021	-
dc.identifier.citation	Kosolobov D. Lempel-Ziv Parsing for Sequences of Blocks / D. Kosolobov, D. Valenzuela // Algorithms. — 2021. — Vol. 14. — Iss. 12. — 359.	en
dc.identifier.issn	1999-4893	-
dc.identifier.other	All Open Access, Gold, Green	3
dc.identifier.uri	http://elar.urfu.ru/handle/10995/111669	-
dc.description.abstract	The Lempel-Ziv parsing (LZ77) is a widely popular construction lying at the heart of many compression algorithms. These algorithms usually treat the data as a sequence of bytes, i.e., blocks of fixed length 8. Another common option is to view the data as a sequence of bits. We investigate the following natural question: what is the relationship between the LZ77 parsings of the same data interpreted as a sequence of fixed-length blocks and as a sequence of bits (or other “elementary” letters)? In this paper, we prove that, for any integer b > 1, the number z of phrases in the LZ77 parsing of a string of length n and the number zb of phrases in the LZ77 parsing of the same string in which blocks of length b are interpreted as separate letters (e.g., b = 8 in case of bytes) are related as zb = O(bz lognz ). The bound holds for both “overlapping” and “non-overlapping” versions of LZ77. Further, we establish a tight bound zb = O(bz) for the special case when each phrase in the LZ77 parsing of the string has a “phrase-aligned” earlier occurrence (an occurrence equal to the concatenation of consecutive phrases). The latter is an important particular case of parsing produced, for instance, by grammar-based compression methods. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.	en
dc.description.sponsorship	Funding: This research was funded by the Ministry of Science and Higher Education of the Russian Federation (Ural Mathematical Center project No. 075-02-2021-1387).	en
dc.format.mimetype	application/pdf	en
dc.language.iso	en	en
dc.publisher	MDPI	en1
dc.publisher	MDPI AG	en
dc.rights	info:eu-repo/semantics/openAccess	en
dc.source	Algorithms	2
dc.source	Algorithms	en
dc.subject	BLOCKS	en
dc.subject	GRAMMAR	en
dc.subject	LEMPEL-ZIV	en
dc.subject	LZ77	en
dc.subject	SLP	en
dc.subject	BLOCK	en
dc.subject	COMPRESSION ALGORITHMS	en
dc.subject	COMPRESSION METHODS	en
dc.subject	GRAMMAR	en
dc.subject	GRAMMAR-BASED COMPRESSION	en
dc.subject	LEMPEL-ZIV	en
dc.subject	LZ77	en
dc.subject	SLP	en
dc.subject	TIGHT BOUND	en
dc.title	Lempel-Ziv Parsing for Sequences of Blocks	en
dc.type	Article	en
dc.type	info:eu-repo/semantics/article	en
dc.type	info:eu-repo/semantics/publishedVersion	en
dc.identifier.doi	10.3390/a14120359	-
dc.identifier.scopus	85121426311	-
local.contributor.employee	Kosolobov, D., Department of Physical and Mathematical Sciences, Ural Federal University, Ekaterinburg, 620000, Russian Federation; Valenzuela, D., Department of Computer Science, University of Helsinki, Helsinki, FI-00014, Finland	en
local.issue	12	-
local.volume	14	-
dc.identifier.wos	000913789500001	-
local.contributor.department	Department of Physical and Mathematical Sciences, Ural Federal University, Ekaterinburg, 620000, Russian Federation; Department of Computer Science, University of Helsinki, Helsinki, FI-00014, Finland	en
local.identifier.pure	29146409	-
local.description.order	359	-
local.identifier.eid	2-s2.0-85121426311	-
local.identifier.wos	WOS:000913789500001	-
Appears in Collections:	Научные публикации ученых УрФУ, проиндексированные в SCOPUS и WoS CC

Files in This Item:

File	Description	Size	Format
2-s2.0-85121426311.pdf		330,09 kB	Adobe PDF	View/Open

Show simple item record