PulseAugur

New statistical features improve string similarity computation

Researchers have proposed and studied two new statistical features for computing string similarity: the co-occurrence matrix (COM) and the run-length matrix (RLM). Adapted from visual computing, these features are language-agnostic and perform well across contexts including words, phrases, and code. In experiments, COM and RLM features outperformed existing state-of-the-art statistical measures, including edit distances and longest common subsequence, on both synthetic datasets and a real text plagiarism dataset.

Summary written by gemini-2.5-flash-lite from 1 source.
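
To make the two features concrete, here is a minimal Python sketch of how a co-occurrence matrix and a run-length matrix might be computed over the characters of a string. It is an illustration only: the source does not give the paper's definitions, so the function names, the `offset` parameter, the sparse `Counter` representation, and the use of cosine similarity to compare feature vectors are all assumptions, not the authors' method.

```python
from collections import Counter
from itertools import groupby
from math import sqrt


def cooccurrence_features(s, offset=1):
    """Count ordered character pairs (s[i], s[i + offset]).

    Assumed analogue of an image co-occurrence matrix, stored sparsely:
    keys are (first_char, second_char), values are pair counts.
    """
    return Counter(zip(s, s[offset:]))


def run_length_features(s):
    """Count (character, run_length) pairs over maximal runs of equal characters.

    Assumed analogue of an image run-length matrix, stored sparsely.
    """
    return Counter((ch, len(list(run))) for ch, run in groupby(s))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (an assumed choice)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty feature vector carries no evidence of similarity
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    left, right = "similarity", "similarly"
    com_score = cosine_similarity(
        cooccurrence_features(left), cooccurrence_features(right)
    )
    rlm_score = cosine_similarity(
        run_length_features(left), run_length_features(right)
    )
    print(f"COM similarity: {com_score:.3f}")
    print(f"RLM similarity: {rlm_score:.3f}")
```

Because both features are plain counts over symbols, the same functions apply unchanged to words, phrases, or source code, which is one way to read the language-agnostic claim above.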

IMPACT Introduces novel statistical features that could enhance natural language processing tasks requiring string comparison.

RANK_REASON Academic paper proposing new statistical features for string similarity.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Panos Liatsis ·

    Proposal and study of statistical features for string similarity computation and classification

    Adaptations of features commonly applied in the field of visual computing, co-occurrence matrix (COM) and run-length matrix (RLM), are proposed for the similarity computation of strings in general (words, phrases, codes and texts). The proposed features are not sensitive to langu…