PulseAugur

New research probes LLM context understanding and confidence calibration

Researchers are developing new methods to evaluate and enhance Large Language Models (LLMs). Apple's research proposes a benchmark to test LLMs' understanding of context, finding that quantized models and pre-trained dense models struggle with nuanced contextual features. Meanwhile, a new technique called Retrieval-Augmented Linguistic Calibration (RALC) improves how LLMs express confidence in their answers, enhancing faithfulness and calibration. Other research evaluates LLMs for clinical action extraction, reporting performance comparable to supervised models while highlighting limitations in clinical reasoning. A separate paper introduces Listwise Policy Optimization, aimed at more stable and diverse LLM training.

IMPACT New benchmarks and calibration techniques aim to improve LLM reliability and reasoning, potentially impacting their application in critical domains like healthcare and scientific discovery.

RANK_REASON The cluster contains multiple academic papers and research initiatives focused on evaluating and improving LLM capabilities.

AI-generated summary · Google Gemini · from 113 sources.

COVERAGE [113]

  1. Apple Machine Learning Research TIER_1 English(EN) ·

    Can Large Language Models Understand Context?

    Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various domains within the realm of Natural Language …

  2. Hugging Face Blog TIER_1 English(EN) ·

    PRX Part 3 — Training a Text-to-Image Model in 24h!

  3. Hugging Face Blog TIER_1 English(EN) ·

    Training Design for Text-to-Image Models: Lessons from Ablations

  4. Hugging Face Blog TIER_1 English(EN) ·

    Introducing HELMET: Holistically Evaluating Long-context Language Models

  5. Hugging Face Blog TIER_1 English(EN) ·

    A Dive into Text-to-Video Models

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    Retrieval-Augmented Linguistic Calibration

    Linguistic cues such as "I believe" and "probably" offer an intuitive interface for communicating confidence, yet a generalisable, principled calibration framework for linguistic confidence expressions remains underexplored. In particular, co-occurring linguistic cues, contextual…

  7. arXiv cs.CL TIER_1 English(EN) · Chang Xu ·

    Retrieval-Augmented Linguistic Calibration

    Linguistic cues such as "I believe" and "probably" offer an intuitive interface for communicating confidence, yet a generalisable, principled calibration framework for linguistic confidence expressions remains underexplored. In particular, co-occurring linguistic cues, contextual…

  8. arXiv cs.LG TIER_1 English(EN) · Yun Qu, Qi Wang, Yixiu Mao, Heming Zou, Yuhang Jiang, Yingyue Li, Wutong Xu, Lizhou Cai, Weijie Liu, Clive Bai, Kai Yang, Yangkun Chen, Saiyong Yang, Xiangyang Ji ·

    Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

    arXiv:2605.06139v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes, group-based policy gradient is prevalent,…

  9. arXiv cs.AI TIER_1 English(EN) · Shivali Dalmia, Ananya Mantravadi, Prasanna Desikan ·

    Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction

    arXiv:2605.06191v1 Announce Type: new Abstract: The work in this paper evaluates zero-shot and few-shot large language models (LLMs) for safety-critical clinical action extraction using the CLIP discharge-note dataset, with particular emphasis on transitions of care and post-disc…

  10. arXiv cs.AI TIER_1 English(EN) · Aritra Roy, Kevin Shen, Andrew MacBride, Awwal Oladipupo, Mudassra Taskeen, Wojtek Treyde, Ruaa A. E. A. Abakar, Ahmad D. Abbas, Elsayed Abdelfatah, Abbas A. Abdullahi, Seham S. Abyah, Chahd Rahyl Adjmi, Fariha Agbere, Savyasanchi Aggarwal, Muhammad Ahmed ·

    From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

    arXiv:2605.03205v1 Announce Type: cross Abstract: Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in a…

  11. arXiv cs.CL TIER_1 English(EN) · Sasha Boguraev, Qing Yao, Kyle Mahowald ·

    France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions

    arXiv:2602.23547v2 Announce Type: replace Abstract: Sentences like "She will go to France or Spain, or perhaps to Germany or France." appear formally redundant, yet become acceptable in contexts such as "Mary will go to a philosophy program in France or Spain, or a mathematics pr…

  12. arXiv cs.CL TIER_1 English(EN) · Arnault Chatelain, Étienne Ollion, Qianwen Guan, Diandra Fabre, Lorraine Goeuriot, Emile Chapuis, Abdelkrim Beloued, Marie Candito, Nicolas Hervé, Didier Schwab ·

    BenCSSmark: Making the Social Sciences Count in LLM Research

    arXiv:2605.04886v1 Announce Type: new Abstract: This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing com…

  13. arXiv cs.CL TIER_1 English(EN) · Didier Schwab ·

    BenCSSmark: Making the Social Sciences Count in LLM Research

    This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing computational systems -- are pivotal in the develop…

  14. arXiv cs.CL TIER_1 English(EN) · Mengchu Li, Jin Zhu, Jinglai Li, Chengchun Shi ·

    Segmenting Human-LLM Co-authored Text via Change Point Detection

    arXiv:2605.03723v1 Announce Type: new Abstract: The rise of large language models (LLMs) has created an urgent need to distinguish between human-written and LLM-generated text to ensure authenticity and societal trust. Existing detectors typically provide a binary classification …

  15. arXiv cs.AI TIER_1 (CA) · Önder Gürcan, Moharram Challenger ·

    LLM-enabled Social Agents

    arXiv:2605.02335v1 Announce Type: cross Abstract: Large Language Models (LLMs) have transformed agent-agent and human-agent interaction by enabling software, physical, and simulation agents to communicate and deliberate through natural language. Yet fluent language use does not b…

  16. arXiv cs.AI TIER_1 English(EN) · Jay Bhan, Nicole Nobili, Srinivasan Raghuraman, Patrick Langer ·

    New Bounds for Zarankiewicz Numbers via Reinforced LLM Evolutionary Search

    arXiv:2605.01120v1 Announce Type: new Abstract: The Zarankiewicz number $\textbf{Z}(m, n, s, t)$ is the maximum number of edges in a bipartite graph $G_{m, n}$ such that there is no complete $K_{s, t}$ bipartite subgraph. We determine for the first time the exact values of three …

  17. arXiv cs.CL TIER_1 English(EN) · Chengchun Shi ·

    Segmenting Human-LLM Co-authored Text via Change Point Detection

    The rise of large language models (LLMs) has created an urgent need to distinguish between human-written and LLM-generated text to ensure authenticity and societal trust. Existing detectors typically provide a binary classification for an entire passage; however, this is insuffic…

  18. arXiv cs.AI TIER_1 English(EN) · Xingyu Hu, Kai Zhang, Jiancan Wu, Shuli Wang, Chi Wang, Wenshuai Chen, Yinhua Zhu, Haitao Wang, Xingxing Wang, Xiang Wang ·

    DynamicPO: Dynamic Preference Optimization for Recommendation

    arXiv:2605.00327v1 Announce Type: cross Abstract: In large language model (LLM)-based recommendation systems, direct preference optimization (DPO) effectively aligns recommendations with user preferences, requiring multi-negative objective functions to leverage abundant implicit-…

  19. arXiv cs.CL TIER_1 English(EN) · Ewelina Gajewska, Michal Wawer, Katarzyna Budzynska, Jaroslaw A. Chudziak ·

    Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework

    arXiv:2605.01416v1 Announce Type: cross Abstract: The increasing scale and complexity of online platforms raises critical policy questions around harmful content, digital well-being, and user autonomy. Traditional content moderation systems rely on centralised, top-down rules, of…

  20. arXiv cs.CL TIER_1 English(EN) · Luo Ji, Qi Qin, Ningyuan Xi, Teng Chen, Qingqing Gu, Hongyan Li ·

    Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

    arXiv:2605.01973v1 Announce Type: new Abstract: Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While finetuning can create the catastrophe forgetting issue, application of meta-learning on LLMs is also limited due to its complexity and scalab…

  21. arXiv cs.CL TIER_1 English(EN) · Xuemei Tang, Xufeng Duan, Zhenguang G. Cai ·

    Do Large Language Models Plan Answer Positions? Position Bias in Multiple-Choice Question Generation

    arXiv:2605.01846v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate multiple-choice questions (MCQs), where correct answers should ideally be uniformly distributed across options. However, we observe that LLMs exhibit systematic position…

  22. Hugging Face Daily Papers TIER_1 (CA) ·

    LLM-enabled Social Agents

    Large Language Models (LLMs) have transformed agent-agent and human-agent interaction by enabling software, physical, and simulation agents to communicate and deliberate through natural language. Yet fluent language use does not by itself yield socially intelligible behaviour. Mo…

  23. arXiv cs.CL TIER_1 English(EN) · Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada ·

    ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

    arXiv:2508.04086v2 Announce Type: replace Abstract: Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data gener…

  24. arXiv cs.CL TIER_1 English(EN) · Hongyan Li ·

    Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

    Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While finetuning can create the catastrophe forgetting issue, application of meta-learning on LLMs is also limited due to its complexity and scalability. In this paper, we activate the meta-signa…

  25. arXiv cs.CL TIER_1 English(EN) · Zhenguang G. Cai ·

    Do Large Language Models Plan Answer Positions? Position Bias in Multiple-Choice Question Generation

    Large language models (LLMs) are increasingly used to generate multiple-choice questions (MCQs), where correct answers should ideally be uniformly distributed across options. However, we observe that LLMs exhibit systematic position biases during generation. Through extensive exp…

  26. arXiv cs.AI TIER_1 English(EN) · Shuxing Yang, Fujia Chen, Rui Zhao, Junyao Wu, Yize Wang, Haiyao Luo, Ning Han, Qiaolu Chen, Yuze Hu, Wenhao Li, Mingzhu Li, Hongsheng Chen, Yihao Yang ·

    End-to-end autonomous scientific discovery on a real optical platform

    arXiv:2604.27092v1 Announce Type: new Abstract: Scientific research has long been human-led, driving new knowledge and transformative technologies through the continual revision of questions, methods and claims as evidence accumulates. Although large language model (LLM)-based ag…

  27. arXiv cs.CL TIER_1 English(EN) · Yilun Zhu, Nikhita Vedula, Shervin Malmasi ·

    From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking

    arXiv:2604.27410v1 Announce Type: cross Abstract: Entity search, i.e., finding the most similar entities to a query entity, faces unique challenges in e-commerce, where product similarity varies across categories and contexts. Traditional embedding-based approaches often struggle…

  28. arXiv cs.CL TIER_1 English(EN) · Yuxi Ma, Jieming Cui, Muyang Li, Ye Zhao, Yu Li, Yixuan Wang, Chi Zhang, Yinyin Zang, Yixin Zhu ·

    Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health

    arXiv:2604.27846v1 Announce Type: new Abstract: How people narrate their experiences offers a window into how the mind organizes them. Computational approaches to therapeutic writing have evolved from lexical counting to neural methods, yet remain fragmented: dictionary tools mis…

  29. arXiv cs.AI TIER_1 English(EN) · Ahmed Abdelkawy, Ahmed Elsayed, Asem Ali, Aly Farag, Thomas Tretter, Michael McIntyre ·

    Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification

    arXiv:2601.06394v4 Announce Type: replace-cross Abstract: Understanding student behavior in the classroom is essential to improve both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require substantial annotated data to mo…

  30. arXiv cs.AI TIER_1 English(EN) · Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li ·

    Test Before You Deploy: Governing Updates in the LLM Supply Chain

    arXiv:2604.27789v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, the hosted LLM services evolve continuously through provider-side updates without explicit version changes. These silent updates…

  31. arXiv cs.AI TIER_1 English(EN) · Jackson Vonderhorst, Kuangshi Ai, Haichao Miao, Shusen Liu, Chaoli Wang ·

    Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

    arXiv:2604.27996v1 Announce Type: new Abstract: This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three prima…

  32. arXiv cs.AI TIER_1 English(EN) · Xiang Wang ·

    DynamicPO: Dynamic Preference Optimization for Recommendation

    In large language model (LLM)-based recommendation systems, direct preference optimization (DPO) effectively aligns recommendations with user preferences, requiring multi-negative objective functions to leverage abundant implicit-feedback negatives and sharpen preference boundari…

  33. arXiv cs.AI TIER_1 English(EN) · Chaoli Wang ·

    Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

    This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms, including domain-speci…

  34. arXiv cs.CL TIER_1 English(EN) · Yixin Zhu ·

    Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health

    How people narrate their experiences offers a window into how the mind organizes them. Computational approaches to therapeutic writing have evolved from lexical counting to neural methods, yet remain fragmented: dictionary tools miss discourse structure, while embeddings conflate…

  35. arXiv cs.AI TIER_1 English(EN) · Jingyue Li ·

    Test Before You Deploy: Governing Updates in the LLM Supply Chain

    Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, the hosted LLM services evolve continuously through provider-side updates without explicit version changes. These silent updates can introduce behavioral drift, causing regressio…

  36. arXiv cs.CL TIER_1 English(EN) · Shervin Malmasi ·

    From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking

    Entity search, i.e., finding the most similar entities to a query entity, faces unique challenges in e-commerce, where product similarity varies across categories and contexts. Traditional embedding-based approaches often struggle to capture nuanced context-specific attribute rel…

  37. arXiv cs.AI TIER_1 English(EN) · Mahiro Nakao, Kazuhiro Takemoto ·

    Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control

    arXiv:2604.26577v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly considered for deployment as the control component of robotic health attendants, yet their safety in this context remains poorly characterized. We introduce a dataset of 270 harmful inst…

  38. arXiv cs.AI TIER_1 English(EN) · Nayoung Choi, Haeyu Jeong, Changbong Kim, Hongjun Lim, Jinho D. Choi ·

    Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

    arXiv:2604.26120v1 Announce Type: new Abstract: Behavioral logs provide rich signals for user modeling, but are noisy and interleaved across diverse intents. Recent work uses LLMs to generate interpretable natural-language personas from user logs, yet evaluation often emphasizes …

  39. arXiv cs.CL TIER_1 English(EN) · Jingjie Ning, João Coelho, Yibo Kong, Yunfan Long, Bruno Martins, João Magalhães, Jamie Callan, Chenyan Xiong ·

    Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search Requests

    arXiv:2601.17617v3 Announce Type: replace-cross Abstract: LLM-powered search agents are increasingly being used for multi-step information seeking tasks, yet the IR community lacks empirical understanding of how agentic search sessions unfold and how retrieved evidence is reflect…

  40. arXiv cs.AI TIER_1 English(EN) · Matteo Leonesi, Francesco Belardinelli, Flavio Corradini, Marco Piangerelli ·

    Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

    arXiv:2604.26511v1 Announce Type: cross Abstract: Alignment faking (AF) occurs when an LLM strategically complies with training objectives to avoid value modification, reverting to prior preferences once monitoring is lifted. Current detection methods focus on conversational sett…

  41. arXiv cs.CL TIER_1 English(EN) · Iago Alves Brito, Walcy Santos Rezende Rios, Julia Soares Dollis, Diogo Fernandes Costa Silva, Arlindo Rodrigues Galvão Filho ·

    Safety Is Not Universal: The Selective Safety Trap in LLM Alignment

    arXiv:2601.04389v2 Announce Type: replace Abstract: Current safety evaluations of large language models (LLMs) create a dangerous illusion of universal protection by aggregating harms under generic categories such as "Identity Hate", obscuring vulnerabilities toward specific popu…

  42. arXiv cs.CL TIER_1 English(EN) · Deergh Singh Budhauria, Sanyam Jain, Rishav Agarwal, Tracy King ·

    Text Style Transfer with Machine Translation for Graphic Designs

    arXiv:2604.26361v1 Announce Type: new Abstract: Globalization of graphic designs such as those used in marketing materials and magazines is increasingly important for communication to broad audiences. To accomplish this, the textual content in the graphic designs needs to be accu…

  43. arXiv cs.AI TIER_1 English(EN) · Kazuhiro Takemoto ·

    Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control

    Large language models (LLMs) are increasingly considered for deployment as the control component of robotic health attendants, yet their safety in this context remains poorly characterized. We introduce a dataset of 270 harmful instructions spanning nine prohibited behavior categ…

  44. arXiv cs.AI TIER_1 English(EN) · Marco Piangerelli ·

    Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

    Alignment faking (AF) occurs when an LLM strategically complies with training objectives to avoid value modification, reverting to prior preferences once monitoring is lifted. Current detection methods focus on conversational settings and rely primarily on Chain-of-Thought (CoT) …

  45. Hugging Face Daily Papers TIER_1 English(EN) ·

    Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

    Alignment faking (AF) occurs when an LLM strategically complies with training objectives to avoid value modification, reverting to prior preferences once monitoring is lifted. Current detection methods focus on conversational settings and rely primarily on Chain-of-Thought (CoT) …

  46. arXiv cs.CL TIER_1 English(EN) · Tracy King ·

    Text Style Transfer with Machine Translation for Graphic Designs

    Globalization of graphic designs such as those used in marketing materials and magazines is increasingly important for communication to broad audiences. To accomplish this, the textual content in the graphic designs needs to be accurately translated and have the text styling pres…

  47. arXiv cs.CL TIER_1 English(EN) · Sreehari Sankar, Aliakbar Nafar, Mona Barman, Hannah K. Heitz, Ashwin Kumar, Pouria Tohidi, Dailun Li, Danish Hussain, Russell DuBois, Hamed Hasheminia, Farshad Majzoubi ·

    Analyzing LLM Reasoning to Uncover Mental Health Stigma

    arXiv:2604.25053v1 Announce Type: new Abstract: While large language models (LLMs) are increasingly being explored for mental health applications, recent studies reveal that they can exhibit stigma toward individuals with psychological conditions. Existing evaluations of this sti…

  48. arXiv cs.CL TIER_1 English(EN) · Huyen Nguyen, Haoxuan Zhang, Yang Zhang, Haihua Chen, Junhua Ding ·

    LongSumEval: Question-Answering Based Evaluation and Feedback-Driven Refinement for Long Document Summarization

    arXiv:2604.25130v1 Announce Type: new Abstract: Evaluating long document summaries remains the primary bottleneck in summarization research. Existing metrics correlate weakly with human judgments and produce aggregate scores without explaining deficiencies or guiding improvement,…

  49. arXiv cs.CL TIER_1 English(EN) · Youngjoon Jang, Chanhee Park, Hyeonseok Moon, Young-kyoung Ham, Jiwon Moon, Jinhyeon Kim, JuKyung Jung, Heuiseok Lim ·

    LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model

    arXiv:2604.25297v1 Announce Type: new Abstract: In recent years, the rapid proliferation of open-source large language models (LLMs) has spurred efforts to turn general-purpose models into domain specialists. However, many domain-specialized LLMs are developed using datasets and …

  50. arXiv cs.CL TIER_1 English(EN) · Peng Liao, Peijia Zheng, Lingbo Li, Shangsong Liang, Lin Chen ·

    Intrinsic Mutual Information as a Modulator for Preference Optimization

    arXiv:2604.24804v1 Announce Type: cross Abstract: Offline preference optimization methods, such as Direct Preference Optimization (DPO), offer significant advantages in aligning Large Language Models (LLMs) with human values. However, achieving optimal performance with these meth…

  51. arXiv cs.LG TIER_1 English(EN) · Zhezheng Hao, Hong Wang, Haoyang Liu, Jian Luo, Jiarui Yu, Hande Dong, Qiang Lin, Can Wang, Jiawei Chen ·

    Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

    arXiv:2510.10150v3 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) serves as a cornerstone technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, its training is often plagued by entropy collapse,…

  52. Hugging Face Daily Papers TIER_1 English(EN) ·

    LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model

    In recent years, the rapid proliferation of open-source large language models (LLMs) has spurred efforts to turn general-purpose models into domain specialists. However, many domain-specialized LLMs are developed using datasets and training protocols that are not aligned with the…

  53. arXiv cs.CL TIER_1 English(EN) · Heuiseok Lim ·

    LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model

    In recent years, the rapid proliferation of open-source large language models (LLMs) has spurred efforts to turn general-purpose models into domain specialists. However, many domain-specialized LLMs are developed using datasets and training protocols that are not aligned with the…

  54. arXiv cs.AI TIER_1 English(EN) · Luca Cotti, Anisa Rula, Devis Bianchini, Federico Cerutti ·

    Enabling Transparent Cyber Threat Intelligence Combining Large Language Models and Domain Ontologies

    arXiv:2509.00081v2 Announce Type: replace-cross Abstract: Effective Cyber Threat Intelligence (CTI) relies upon accurately structured and semantically enriched information extracted from cybersecurity system logs. However, current methodologies often struggle to identify and inte…

  55. arXiv cs.AI TIER_1 English(EN) · Jiajun Chen, Hua Shen ·

    Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment

    arXiv:2602.12134v2 Announce Type: replace Abstract: Existing work on value alignment typically characterizes value relations statically, ignoring how alignment interventions, such as prompting, fine-tuning, or preference optimization, reshape the broader value system. In practice…

  56. arXiv cs.CL TIER_1 English(EN) · Bingfeng Chen, Chenjie Qiu, Yifeng Xie, Boyan Xu, Ruichu Cai, Zhifeng Hao ·

    $\mathcal{S}^2$IT: Stepwise Syntax Integration Tuning for Large Language Models in Aspect Sentiment Quad Prediction

    arXiv:2604.23296v1 Announce Type: new Abstract: Aspect Sentiment Quad Prediction (ASQP) has seen significant advancements, largely driven by the powerful semantic understanding and generative capabilities of large language models (LLMs). However, while syntactic structure informa…

  57. arXiv cs.AI TIER_1 English(EN) · Hao Wang, Sathwik Karnik, Bea Lim, Somil Bansal ·

    Using Language Models as Closed-Loop High-Level Planners for Robotics Applications: A Brief Overview and Benchmarks

    arXiv:2511.07410v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) and Vision Language Models (VLMs) have become popular tools for embodied high-level planning. However, their deployment in black-box settings often leads to unpredictable or costly errors. To h…

  58. arXiv cs.LG TIER_1 English(EN) · Wenzhe Xu, Biao Liu, Yiyang Sun, Xin Geng, Ning Xu ·

    Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment

    arXiv:2604.24178v1 Announce Type: new Abstract: Multi-Objective Alignment aims to align Large Language Models (LLMs) with diverse and often conflicting human values by optimizing multiple objectives simultaneously. Existing methods predominantly rely on static preference weight c…

  59. arXiv cs.CL TIER_1 (AF) · Abbas Zeitoun, Lucas Torroba-Hennigen, Yoon Kim ·

    Hyperloop Transformers

    arXiv:2604.21254v2 Announce Type: replace-cross Abstract: LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest such as edge and on-device deployment are further constrained by the model…

  60. arXiv cs.CL TIER_1 English(EN) · Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram ·

    One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

    arXiv:2604.13006v2 Announce Type: replace Abstract: Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness under trivial constraints? We show that simple lexical constraints (banning a single punctuation character or comm…

  61. arXiv cs.CL TIER_1 English(EN) · Nabelanita Utami, Ryohei Sasano ·

    Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era

    arXiv:2604.08568v2 Announce Type: replace Abstract: The evolution of writing assistance tools from machine translation to large language models (LLMs) has changed how researchers write. This study investigates whether this shift is homogenizing research papers by analyzing native…

  62. arXiv cs.CL TIER_1 English(EN) · Miriam Winkler, Verena Blaschke, Barbara Plank ·

    Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

    arXiv:2603.15130v2 Announce Type: replace Abstract: Indirectness is a common feature of daily communication, yet is underexplored in NLP research for both low-resource as well as high-resource languages. Indirect Question Answering (IQA) aims at classifying the polarity of indire…

  63. arXiv cs.CL TIER_1 English(EN) · Johannes Wirth ·

    OLaPh: Optimal Language Phonemizer

    arXiv:2509.20086v2 Announce Type: replace Abstract: Phonemization is a critical component in text-to-speech synthesis. Traditional approaches rely on deterministic transformations and lexica, while neural methods offer potential for higher generalization on out-of-vocabulary (OOV…

  64. arXiv cs.CL TIER_1 English(EN) · Jisoo Yang (Chung-Ang University), Jongwon Ryu (Chung-Ang University), Minuk Ma (University of British Columbia), Trung X. Pham (Van Lang University), Junyeong Kim (Chung-Ang University) ·

    The Pragmatic Persona: Discovering LLM Persona through Bridging Inference

    arXiv:2604.24079v1 Announce Type: new Abstract: Large Language Models (LLMs) reveal inherent and distinctive personas through dialogue. However, most existing persona discovery approaches rely on surface-level lexical or stylistic cues, treating dialogue as a flat sequence of tok…

  65. arXiv cs.CL TIER_1 English(EN) · Nikita Borovkov, Elisei Rykov, Olga Tsymboi, Sergei Filimonov, Nikita Surnachev, Dmitry Bitman, Anatolii Potapov ·

    Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows

    arXiv:2604.23855v1 Announce Type: new Abstract: We present a deployed system that automates end-to-end customer support workflows inside an enterprise Business Process Management (BPM) platform. The approach is scalable in production and reaches selective automation within two we…

  66. arXiv cs.CL TIER_1 English(EN) · Imranul Ashrafi, Inigo Jauregi Unanue, Massimo Piccardi ·

    Pref-CTRL: Preference Driven LLM Alignment using Representation Editing

    arXiv:2604.23543v1 Announce Type: new Abstract: Test-time alignment methods offer a promising alternative to fine-tuning by steering the outputs of large language models (LLMs) at inference time with lightweight interventions on their internal representations. Recently, a promine…

  67. arXiv cs.CL TIER_1 English(EN) · Yurui Xiang, Xingyi Mao, Rui Sheng, Zixin Chen, Zelin Zang, Yuyang Wu, Haipeng Zeng, Huamin Qu, Yushi Sun, Yanna Lin ·

    VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs

    arXiv:2604.23356v1 Announce Type: new Abstract: Large language models (LLMs) show promise in medical diagnosis, but real-world deployment remains challenging due to high-stakes clinical decisions and imperfect reasoning reliability. As a result, careful inspection of model behavi…

  68. arXiv cs.AI TIER_1 English(EN) · Kiyoshige Garces, Gloria Milena Fernandez-Nieto, Linxuan Zhao, Sachini Samaraweera, Dragan Gasevic, Roberto Martinez-Maldonado, Vanessa Echeverria ·

    Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact

    arXiv:2604.23255v1 Announce Type: cross Abstract: Research shows that dialogue, the interactive process through which participants articulate their thinking, plays a central role in constructing shared understanding, coordinating action, and shaping learning outcomes in teams. An…

  69. arXiv cs.AI TIER_1 English(EN) · Wenjie Xiao, Xuehai Tang, Biyu Zhou, Songlin Hu, Jizhong Han ·

    RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

    arXiv:2604.22888v1 Announce Type: cross Abstract: Agent skills introduce a new and more severe form of indirect injection for LLM agents: unlike traditional indirect prompt injection, attackers can hide malicious instructions inside a dense, action-oriented skill that already fun…

  70. arXiv cs.AI TIER_1 English(EN) · Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo ·

    Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop

    arXiv:2604.24158v1 Announce Type: new Abstract: Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable city-trip lists across four dim…

  71. arXiv cs.AI TIER_1 English(EN) · Abid Talukder, Maruf Ahmed Mridul, Oshani Seneviratne ·

    Towards Automated Ontology Generation from Unstructured Text: A Multi-Agent LLM Approach

    arXiv:2604.23090v1 Announce Type: new Abstract: Automatically generating formal ontologies from unstructured natural language remains a central challenge in knowledge engineering. While large language models (LLMs) show promise, it remains unclear which architectural design choic…

  72. arXiv cs.LG TIER_1 English(EN) · Lawrence Phillips, Marc Boubnovski Martell, Aditya Misra, Josefa Lia Stoisser, Cesar A. Prada-Medina, Rory Donovan-Maiye, Kaspar Märtens ·

    SynthPert: Enhancing LLM Biological Reasoning via Synthetic Reasoning Traces for Cellular Perturbation Prediction

    arXiv:2509.25346v2 Announce Type: replace-cross Abstract: Predicting cellular responses to genetic perturbations represents a fundamental challenge in systems biology, critical for advancing therapeutic discovery and virtual cell modeling. While large language models (LLMs) show …

  73. arXiv cs.LG TIER_1 English(EN) · Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang ·

    Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

    arXiv:2604.18239v2 Announce Type: replace Abstract: Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based objectives suppress the chosen response along with the rejected one, a phenomenon known as likelihoo…

  74. arXiv cs.CL TIER_1 English(EN) · Junhua Ding ·

    LongSumEval: Question-Answering Based Evaluation and Feedback-Driven Refinement for Long Document Summarization

    Evaluating long document summaries remains the primary bottleneck in summarization research. Existing metrics correlate weakly with human judgments and produce aggregate scores without explaining deficiencies or guiding improvement, preventing effective refinement in applications…

  75. Hugging Face Daily Papers TIER_1 English(EN) ·

    LongSumEval: Question-Answering Based Evaluation and Feedback-Driven Refinement for Long Document Summarization

    Evaluating long document summaries remains the primary bottleneck in summarization research. Existing metrics correlate weakly with human judgments and produce aggregate scores without explaining deficiencies or guiding improvement, preventing effective refinement in applications…

  76. arXiv cs.CL TIER_1 English(EN) · Farshad Majzoubi ·

    Analyzing LLM Reasoning to Uncover Mental Health Stigma

    While large language models (LLMs) are increasingly being explored for mental health applications, recent studies reveal that they can exhibit stigma toward individuals with psychological conditions. Existing evaluations of this stigma primarily rely on multiple-choice questions …

  77. arXiv cs.AI TIER_1 English(EN) · Ning Xu ·

    Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment

    Multi-Objective Alignment aims to align Large Language Models (LLMs) with diverse and often conflicting human values by optimizing multiple objectives simultaneously. Existing methods predominantly rely on static preference weight construction strategies. However, rigidly alignin…

  78. arXiv cs.CL TIER_1 English(EN) · Junyeong Kim ·

    The Pragmatic Persona: Discovering LLM Persona through Bridging Inference

    Large Language Models (LLMs) reveal inherent and distinctive personas through dialogue. However, most existing persona discovery approaches rely on surface-level lexical or stylistic cues, treating dialogue as a flat sequence of tokens and failing to capture the deeper discourse-…

  79. arXiv cs.AI TIER_1 English(EN) · Hong Su ·

    An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

    arXiv:2604.22199v1 Announce Type: cross Abstract: Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) intera…

  80. arXiv cs.CL TIER_1 English(EN) · Youmi Ma, Naoaki Okazaki ·

    From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models

    arXiv:2601.11020v3 Announce Type: replace Abstract: Advances in mechanistic interpretability have identified special attention heads, known as retrieval heads, that are responsible for retrieving information from the context. However, the role of these retrieval heads in improvin…

  81. arXiv cs.CL TIER_1 English(EN) · Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam ·

    Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

    arXiv:2604.22294v1 Announce Type: new Abstract: Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections g…

  82. arXiv cs.CL TIER_1 English(EN) · Anatolii Potapov ·

    Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows

    We present a deployed system that automates end-to-end customer support workflows inside an enterprise Business Process Management (BPM) platform. The approach is scalable in production and reaches selective automation within two weeks for a new process, leveraging supervision al…

  83. arXiv cs.CL TIER_1 English(EN) · Monica S. Lam ·

    Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

    Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documen…

  84. arXiv cs.AI TIER_1 English(EN) · Hong Su ·

    An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

    Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated large-language-model (LLM) interaction for uncovered tasks, and even successful exe…

  85. Hugging Face Daily Papers TIER_1 English(EN) ·

    GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

    Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these…

  86. arXiv cs.CL TIER_1 English(EN) · Tieke He ·

    GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

    Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these…

  87. arXiv cs.CL TIER_1 English(EN) · Yoon Kim ·

    Hyperloop Transformers

    LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest such as edge and on-device deployment are further constrained by the model's memory footprint, thus motivating parameter-efficient a…

  88. Hugging Face Daily Papers TIER_1 English(EN) ·

    Hyperloop Transformers

    LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest such as edge and on-device deployment are further constrained by the model's memory footprint, thus motivating parameter-efficient a…

  89. arXiv cs.CL TIER_1 English(EN) · Yong Ge ·

    Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

    Online reviews have played a pivotal role in consumers' decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains …

  90. Hugging Face Daily Papers TIER_1 English(EN) ·

    Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

    Online reviews have played a pivotal role in consumers' decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains …

  91. Hugging Face Daily Papers TIER_1 English(EN) ·

    On Reasoning Behind Next Occupation Recommendation

    In this work, we develop a novel reasoning approach to enhance the performance of large language models (LLMs) in future occupation prediction. In this approach, a reason generator first derives a "reason" for a user using his/her past education and career history. The reason s…

  92. arXiv cs.CL TIER_1 English(EN) · Ee-Peng Lim ·

    On Reasoning Behind Next Occupation Recommendation

    In this work, we develop a novel reasoning approach to enhance the performance of large language models (LLMs) in future occupation prediction. In this approach, a reason generator first derives a "reason" for a user using his/her past education and career history. The reason s…

  93. arXiv cs.CV TIER_1 English(EN) · Md Adnan Arefeen, Biplob Debnath, Ravi K. Rajendran, Murugan Sankaradas, Srimat T. Chakradhar ·

    Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery

    arXiv:2605.05344v1 Announce Type: new Abstract: In satellite applications, user queries often take the form of open-ended natural language, extending beyond a fixed set of predefined categories. This open-vocabulary nature poses significant challenges for retrieving relevant imag…

  94. arXiv cs.CV TIER_1 English(EN) · Bumjun Kim, Albert No ·

    Memorization In Stable Diffusion Is Unexpectedly Driven by CLIP Embeddings

    arXiv:2605.02908v1 Announce Type: new Abstract: Understanding how textual embeddings contribute to memorization in text-to-image diffusion models is crucial for both interpretability and safety. This paper investigates an unexpected behavior of CLIP embeddings in Stable Diffusion…

  95. arXiv cs.CV TIER_1 English(EN) · Ruichi Zhang, Chikai Shang, Jiacheng Yang, Mengke Li, Yang Zhou, Junlong Gao, Yang Lu ·

    CUE: Concept-Aware Multi-Label Expansion to Mitigate Concept Confusion in Long-Tailed Learning

    arXiv:2605.01309v1 Announce Type: new Abstract: Long-tailed distributions are common in real-world recognition tasks, where a few head classes have many samples while most tail classes have very few. Recently, fine-tuning foundation models for long-tailed learning has gained atte…

  96. arXiv cs.CV TIER_1 English(EN) · Dahua Gao, Yubo Dong, Anqi Li, Zhenyuan Lin, Ang Gao, Danhua Liu, Guangming Shi ·

    FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging

    arXiv:2604.27653v1 Announce Type: new Abstract: Conventional push-broom hyperspectral imaging suffers from slow acquisition speeds, precluding real-time object detection; in contrast, snapshot spectral imaging enables instantaneous hyperspectral images (HSIs) capture, making real…

  97. arXiv cs.CV TIER_1 English(EN) · Shuokun Cheng, Jinghao Shi, Kun Sun ·

    UHR-Net: An Uncertainty-Aware Hypergraph Refinement Network for Medical Image Segmentation

    arXiv:2604.28095v1 Announce Type: new Abstract: Accurate lesion segmentation is crucial for clinical diagnosis and treatment planning. However, lesions often resemble surrounding tissues and exhibit ill-defined boundaries, leading to unstable predictions in boundary/transition re…

  98. arXiv stat.ML TIER_1 English(EN) · Mehryar Mohri, Yutao Zhong ·

    Mind the Gap: Structure-Aware Consistency in Preference Learning

    arXiv:2604.27733v1 Announce Type: cross Abstract: Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pair…

  99. arXiv cs.CV TIER_1 English(EN) · Kun Sun ·

    UHR-Net: An Uncertainty-Aware Hypergraph Refinement Network for Medical Image Segmentation

    Accurate lesion segmentation is crucial for clinical diagnosis and treatment planning. However, lesions often resemble surrounding tissues and exhibit ill-defined boundaries, leading to unstable predictions in boundary/transition regions. Moreover, small-lesion cues can be dilute…

  100. arXiv stat.ML TIER_1 English(EN) · Yutao Zhong ·

    Mind the Gap: Structure-Aware Consistency in Preference Learning

    Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that fo…

  101. arXiv cs.CV TIER_1 English(EN) · Guangming Shi ·

    FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging

    Conventional push-broom hyperspectral imaging suffers from slow acquisition speeds, precluding real-time object detection; in contrast, snapshot spectral imaging enables instantaneous hyperspectral images (HSIs) capture, making real-time object detection feasible, yet its potenti…

  102. arXiv cs.CV TIER_1 English(EN) · Yuqing Cao, Shuo Zhu, Rongzhou Chen, Jingyan Chen, Ni Chen, Edmund Y. Lam ·

    Rapid tracking through strongly scattering media with physics-informed neuromorphic speckle analysis

    arXiv:2604.25310v1 Announce Type: new Abstract: This work addresses the critical problem of tracking fast-moving objects through strongly scattering media in a low-light environment. Different from existing approaches that use frame-based cameras with fixed exposure times, which …

  103. arXiv cs.CV TIER_1 English(EN) · Xinxin Liu, Ming Li, Zonglin Lyu, Yuzhang Shang, Chen Chen ·

    Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

    arXiv:2604.24952v1 Announce Type: new Abstract: Human visual preferences are inherently multi-dimensional, encompassing aesthetics, detail fidelity, and semantic alignment. However, existing datasets provide only single, holistic annotations, resulting in severe label noise: imag…

  104. arXiv cs.CV TIER_1 English(EN) · Edmund Y. Lam ·

    Rapid tracking through strongly scattering media with physics-informed neuromorphic speckle analysis

    This work addresses the critical problem of tracking fast-moving objects through strongly scattering media in a low-light environment. Different from existing approaches that use frame-based cameras with fixed exposure times, which trade off signal-to-noise ratio for temporal res…

  105. arXiv cs.CV TIER_1 English(EN) · Jui-Cheng Chiu, Yu-Chao Wang, Shengyang Luo, Tongyan Wang, Qi Yang, Nabin Khanal, Yingjie Victor Chen ·

    MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks

    arXiv:2604.23788v1 Announce Type: new Abstract: Appreciating multi-figure paintings requires understanding how characters relate through subtle cues like gaze alignment, gesture, and spatial arrangement. We present MIRAGE, an evidence-centric framework designed to scaffold the ex…

  106. arXiv cs.CV TIER_1 English(EN) · Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee ·

    ANCHOR: LLM-driven Subject Conditioning for Text-to-Image Synthesis

    arXiv:2404.10141v2 Announce Type: replace Abstract: Text-to-image (T2I) models have achieved remarkable progress in high-quality image synthesis, yet most benchmarks rely on simple, self-contained prompts, failing to capture the complexity of real-world captions. Human-written ca…

  107. arXiv cs.CV TIER_1 English(EN) · Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen ·

    Towards Long-horizon Agentic Multimodal Search

    arXiv:2604.12890v2 Announce Type: replace Abstract: Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multim…

  108. arXiv cs.CV TIER_1 English(EN) · Ruiqing Sun, Xingshan Yao, Zhijing Wu, Tian Lan, Chenhao Cui, Huiyang Zhao, Jialing Shi, Chen Yang, Xianling Mao ·

    Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations?

    arXiv:2604.23688v1 Announce Type: new Abstract: Proactive defense methods protect portrait images from unauthorized editing or talking face generation (TFG) by introducing pixel-level protective perturbations, and have already attracted increasing attention for privacy protection…

  109. arXiv cs.CV TIER_1 English(EN) · Chen Chen ·

    Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

    Human visual preferences are inherently multi-dimensional, encompassing aesthetics, detail fidelity, and semantic alignment. However, existing datasets provide only single, holistic annotations, resulting in severe label noise: images that excel in some dimensions but are deficie…

  110. arXiv cs.CV TIER_1 English(EN) · Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng ·

    When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

    arXiv:2602.21977v4 Announce Type: replace Abstract: Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and c…

  111. Smol AINews TIER_1 English(EN) ·

    Context Engineering: Much More than Prompts

    **Context Engineering** emerges as a significant trend in AI, highlighted by experts like **Andrej Karpathy**, **Walden Yan** from **Cognition**, and **Tobi Lutke**. It involves managing an LLM's context window with the right mix of prompts, retrieval, tools, and state to optimiz…

  112. Eugene Yan TIER_1 English(EN) ·

    Evaluating Long-Context Question & Answer Systems

    Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.

  113. Eugene Yan TIER_1 English(EN) ·

    Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space

    The fundamentals of text-to-image generation, relevant papers, and experimenting with DDPM.