MimeLens: Position-Agnostic Content-Type Detection for Binary Fragments
Researchers have developed MimeLens, a system that identifies the content type of binary data fragments even when they lack headers or are sampled from arbitrary positions within a file. Unlike previous methods that require whole-file access, MimeLens uses BERT-style encoders pretrained on randomly sampled binary chunks. The approach significantly outperforms existing tools such as Magika and libmagic on challenging datasets, including mid-stream network packets and random disk blocks, though at the cost of higher latency on CPUs.
IMPACT: Enhances data analysis in security and forensics by enabling content-type detection on fragmented binary data.
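
To make "pretrained on randomly sampled binary chunks" concrete, the following is a minimal sketch of how training examples for a BERT-style byte encoder could be prepared. It is not the MimeLens implementation: the chunk size, the 15% masking rate (BERT's conventional value), the token-id scheme, and the function names `sample_chunk` and `mask_tokens` are all assumptions made for illustration.

```python
import random

CHUNK_SIZE = 512   # assumed fragment length; not stated in the summary
MASK_ID = 256      # hypothetical id for the mask token; raw bytes use 0..255
MASK_RATE = 0.15   # BERT's conventional masking rate, assumed here
IGNORE = -100      # label for unmasked positions (PyTorch's default ignore_index)


def sample_chunk(data: bytes, rng: random.Random, size: int = CHUNK_SIZE) -> bytes:
    """Return a fragment that starts at a uniformly random offset in `data`."""
    if len(data) <= size:
        return data
    start = rng.randrange(len(data) - size + 1)
    return data[start:start + size]


def mask_tokens(chunk: bytes, rng: random.Random, rate: float = MASK_RATE):
    """Map bytes to token ids and hide a random subset for the model to predict."""
    inputs, labels = [], []
    for byte in chunk:
        if rng.random() < rate:
            inputs.append(MASK_ID)   # the model sees the mask token
            labels.append(byte)      # and must recover the original byte
        else:
            inputs.append(byte)
            labels.append(IGNORE)    # position does not contribute to the loss
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    blob = bytes(range(256)) * 16            # stand-in for a file's contents
    chunk = sample_chunk(blob, rng)
    inputs, labels = mask_tokens(chunk, rng)
    print(len(chunk), sum(1 for t in inputs if t == MASK_ID))
```

Because the start offset is drawn uniformly, most sampled chunks never contain the file header, so a model trained this way cannot rely on the magic bytes that header-based tools depend on.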