tf–idf

PulseAugur coverage of tf–idf: every cluster mentioning tf–idf across labs, papers, and developer communities, ranked by signal.

Total · 30d: 22 (22 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 20 (20 over 90d)

TIER MIX · 90D

research 13
tool 8
commentary 1

RELATIONSHIPS

used by Martin Clinton Tosima Manullang 70%

SENTIMENT · 30D: 8 days with sentiment data

RECENT · PAGE 1/2 · 22 TOTAL

TOOL · CL_106487 · Jun 21 · 21:05

Recall tool offers local, private project memory for Claude Code

Recall is a new, locally run project memory tool for Claude Code users. It addresses the problem of Claude Code starting every session without context by creating a condensed summary of past interactions. This sum…
RESEARCH · CL_106008 · Jun 19 · 16:43

New ASR techniques tackle phonetic errors and judge reliability

Researchers are developing methods to improve Automatic Speech Recognition (ASR) systems, particularly for low-resource languages and for specific types of errors. One approach, Error-Aware TF-IDF, uses …
TOOL · CL_97957 · Jun 18 · 04:14

Recall tool adds local, private memory to Claude Code sessions

A new tool called Recall has been developed to provide Claude Code with persistent, local project memory. Unlike other tools that send data to external models, Recall uses a classical Python summarizer (TF-IDF and TextR…
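Recall's own code is not shown, and its summary is cut off before the full method is named. As a rough illustration of the TF-IDF side only, the Python sketch below treats each sentence as a document, scores it by its summed TF-IDF weight, and keeps the top-scoring sentences in their original order. The function names and the naive sentence splitter are assumptions for illustration, not Recall's implementation.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: a generic TF-IDF sentence scorer, not Recall's actual code.


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the max_sentences sentences with the largest summed TF-IDF weight.

    Each sentence acts as a "document" for IDF, so a term found in every
    sentence scores zero. TF is divided by sentence length, which keeps long
    sentences from winning on length alone.
    """
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))

    def score(toks: list[str]) -> float:
        if not toks:
            return 0.0
        counts = Counter(toks)
        return sum((c / len(toks)) * math.log(n / df[t]) for t, c in counts.items())

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

A graph-based ranker could replace the scoring function without changing the surrounding split/select logic.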
COMMENTARY · CL_90216 · Jun 14 · 14:04

LLMs: From Text Processing to Semiotics and Linguistic Layers

This cluster explores the linguistic and computational underpinnings of Large Language Models (LLMs). It delves into how computers process text, moving from basic tokenization and statistical methods like TF-IDF and Mar…
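The statistical step in that progression can be made concrete with a minimal Python sketch of classic TF-IDF weighting, where term frequency is a term's count divided by document length and inverse document frequency is log(N / df). This is one common textbook variant (libraries often add smoothing or normalization), and the tokenizer and function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of letters/digits (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

For ["the cat sat", "the dog sat", "the cat ran"], "the" appears in every document, so its IDF is log(1) = 0 and its weight is 0 everywhere, while the rarer "dog" receives the highest weight in the second document.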
RESEARCH · CL_93522 · Jun 13 · 19:46

AI models improve healthcare data binding for prior authorization

A new research paper explores methods for binding Fast Healthcare Interoperability Resources (FHIR) Questionnaire items with Logical Observation Identifiers Names and Codes (LOINC) to improve electronic prior authorizat…
TOOL · CL_82390 · Jun 10 · 05:21

Embedding drift degrades dense retrieval performance by 14%

A recent experiment explored how embedding drift impacts retrieval system performance, particularly when new terminology emerges in a domain. The study simulated a scenario where a retrieval system trained on older mach…
RESEARCH · CL_79125 · Jun 7 · 01:41

New LLM steganography methods bypass text, activation defenses

Researchers have identified novel methods for embedding hidden messages within Large Language Models (LLMs) that bypass traditional text-based security measures. One technique involves transporting payloads as structure…
RESEARCH · CL_76818 · Jun 5 · 03:46

Expanded dataset boosts transformer models in smishing detection

Researchers have developed COVA-X, an expanded dataset of 10,985 synthetic conversations for detecting multi-turn smishing attacks, particularly those targeting the elderly. This new dataset, an improvement…
RESEARCH · CL_56161 · May 26 · 20:16

New AI System Enhances Job Recommendations with Semantic Retrieval

Researchers have developed a new job recommendation system that leverages both keyword-based and semantic retrieval techniques to improve accuracy. The system utilizes structured metadata such as job title, company, and…
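The summary does not say how the keyword and semantic results are combined. One widely used option is reciprocal rank fusion, sketched below in Python as an assumption rather than the paper's method: each retriever returns a ranked list of IDs, and a document earns 1 / (k + rank) from every list it appears in. The k = 60 default and the job IDs are illustrative.

```python
from collections import defaultdict

# Illustrative sketch: reciprocal rank fusion of keyword and semantic rankings.
# Not taken from the paper; k=60 is a commonly used default, assumed here.


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are then sorted by total score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With a keyword ranking of ["job_17", "job_03", "job_42"] (for example from a TF-IDF index) and a semantic ranking of ["job_03", "job_88", "job_17"], the fused order is ["job_03", "job_17", "job_88", "job_42"]: job_03 sits near the top of both lists, so it overtakes job_17. Because fusion uses only ranks, the two retrievers' raw scores never need to be put on a common scale.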
TOOL · CL_46096 · May 23 · 16:29

Small TF-IDF classifier beats large fine-tuned model on tweet classification

A 1.9 MB classifier built on TF-IDF features and logistic regression outperformed a 269 MB fine-tuned model at classifying customer support tweets. The smaller model achieved this by focusing on efficie…
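The article's exact setup is truncated, so the following is only a generic sketch of the TF-IDF plus logistic regression pattern it describes, in dependency-free Python: L2-normalized TF-IDF vectors with smoothed IDF, and a binary logistic regression trained by stochastic gradient descent. The class name, hyperparameters, and toy tweets are assumptions; a production model would more likely use a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: hypothetical class and hyperparameters, not the article's model.


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfLogReg:
    """Binary text classifier: TF-IDF features fed into logistic regression
    trained with plain stochastic gradient descent (no regularization)."""

    def __init__(self, epochs: int = 200, lr: float = 0.5):
        self.epochs = epochs
        self.lr = lr
        self.idf: dict[str, float] = {}
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def _vectorize(self, text: str) -> dict[str, float]:
        # Sparse TF-IDF vector; terms unseen during fit are dropped.
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}  # L2-normalize

    def _prob(self, vec: dict[str, float]) -> float:
        z = self.bias + sum(self.weights.get(t, 0.0) * x for t, x in vec.items())
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def fit(self, texts: list[str], labels: list[int]) -> "TfidfLogReg":
        n = len(texts)
        df = Counter(t for text in texts for t in set(tokenize(text)))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        vectors = [self._vectorize(text) for text in texts]
        for _ in range(self.epochs):
            for vec, y in zip(vectors, labels):
                err = self._prob(vec) - y  # gradient of log loss w.r.t. the logit
                for t, x in vec.items():
                    self.weights[t] = self.weights.get(t, 0.0) - self.lr * err * x
                self.bias -= self.lr * err
        return self

    def predict_proba(self, text: str) -> float:
        return self._prob(self._vectorize(text))

    def predict(self, text: str) -> int:
        return int(self.predict_proba(text) >= 0.5)
```

The fitted model stores only an IDF table and one weight per vocabulary term plus a bias, which is one reason classifiers of this kind stay small on disk. For instance, after fitting on three complaint tweets labeled 1 and three thank-you tweets labeled 0, predict("my refund never arrived") scores the new text with the learned term weights.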
RESEARCH · CL_44804 · May 18 · 17:59

AI struggles with nuanced tasks like peer review and expert identification

Two new research papers explore the limitations of current AI models in specialized academic tasks. One study, Sem-Detect, proposes a method to distinguish AI-generated peer reviews from human-written ones by analyzing …
TOOL · CL_36553 · May 15 · 07:02

LLMs show promise for patient inquiry triage, but not autonomous deployment

Researchers have explored the use of few-shot large language models for categorizing online patient inquiries, aiming to improve clinical triage. They compared prompted LLMs against traditional methods like TF-IDF and B…
TOOL · CL_25581 · May 8 · 14:31

Hybrid model achieves strong Indonesian sentiment analysis results

Researchers have developed a hybrid approach for Indonesian sentiment analysis, combining TF-IDF text features with logistic regression and a neural network baseline. The study focused on classifying social media text i…
TOOL · CL_25609 · May 8 · 05:34

New defense framework tackles multilingual prompt injection attacks

Researchers have developed MIPIAD, a defense framework to combat indirect prompt injection attacks in multilingual large language model systems. The framework combines a Qwen2.5-1.5B model fine-tuned with LoRA, TF-IDF l…
RESEARCH · CL_20612 · May 6 · 13:20

XGBoost algorithm predicts e-commerce customer satisfaction from YouTube comments

This research paper introduces a predictive model for customer satisfaction using the XGBoost algorithm and TF-IDF vectorization on YouTube comments from Indonesian e-commerce review videos. The study found that the PyC…
RESEARCH · CL_20610 · May 6 · 13:20

CNN-BiLSTM outperforms AutoML for Indonesian Twitter hate speech detection

This paper compares PyCaret AutoML and a CNN-BiLSTM model for detecting hate speech on Indonesian Twitter. The CNN-BiLSTM model achieved superior performance, with an accuracy of 83.8% and an F1-score of 81.2%, outperfo…
TOOL · CL_15855 · May 5 · 04:00

Researchers use BiLSTM with attention to improve game review sentiment analysis

Researchers have developed an attention-based Bidirectional Long Short-Term Memory (BiLSTM) model to improve sentiment classification of Steam game reviews. This deep learning approach, implemented in PyTorch, was train…
RESEARCH · CL_15895 · May 4 · 09:44

Hungarian student essays automatically classified for reflection levels using ML

Researchers have developed a system for automatically classifying reflection levels in Hungarian student essays, addressing a gap in automated analysis for the language. The study utilized a dataset of 1,954 essays, exp…
RESEARCH · CL_11454 · Apr 30 · 05:25

Indonesian students show positive sentiment towards AI in higher education

A new study analyzed Indonesian student sentiment regarding AI adoption in higher education, comparing traditional machine learning with Transformer-based deep learning models. The research utilized a dataset of 2,295 l…
RESEARCH · CL_09831 · Apr 29 · 02:14

Study compares AutoML and BiLSTM for Indonesian Instagram cyberbullying detection

This research paper compares automated machine learning (AutoML) and Bidirectional Long Short-Term Memory (BiLSTM) models for detecting cyberbullying in Indonesian Instagram comments. The study utilized a dataset of 650…