SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors
Researchers have introduced SemanticZip, a framework for lossy text compression that uses Large Language Models (LLMs) as decompressors. Rather than reconstructing the original byte for byte, the approach aims to recover the task-relevant semantic meaning. The pilot study evaluated six representation methods: structured prose offered the highest recoverability, while a SemanticZip ASCII representation achieved the highest compression with acceptable semantic recovery.
AI IMPACT: Introduces a lossy text compression method that uses LLMs as decompressors, potentially reducing storage and transmission costs.