LLMs show bias in legal arguments, generate kitsch, and struggle with cultural alignment
By PulseAugur Editorial
Summary by gemini-2.5-flash-lite
from 13 sources
A new paper argues that Large Language Models (LLMs) inherently produce "kitsch" as a consequence of how they are trained, yielding outputs that are often rated highly yet read as generic. Another study introduces StarDrinks, a new English and Korean dataset for evaluating speech-to-slot systems on realistic drink-ordering scenarios. Further research examines how persuadable LLMs are in legal decision-making contexts, and a framework called HiLight is proposed to improve how frozen LLMs pick out evidence in long contexts.
AI
IMPACT
New research explores LLM limitations in creativity and legal reasoning, alongside tools for better speech and evidence processing.
RANK_REASON
The cluster contains multiple academic papers discussing LLM capabilities and limitations, including kitsch generation, legal decision-making, and evidence highlighting.
arXiv:2604.26233v1 Announce Type: new Abstract: As Large Language Models (LLMs) are proposed as legal decision assistants, and even first-instance decision-makers, across a range of judicial and administrative contexts, it becomes essential to explore how they answer legal questions, and in particular the factors that lead the…
arXiv cs.CL
TIER_1·Marcely Zanon Boito, Caroline Brun, Inyoung Kim, Denys Proux, Salah Ait-Mokhtar, Nikolaos Lagos, Jean-Luc Meunier, Ioan Calapodescu·
arXiv:2604.26500v1 Announce Type: new Abstract: LLMs and speech assistants are increasingly used for task-oriented interactions, yet their evaluation often relies on controlled scenarios that fail to capture the variability and complexity of real user requests. Drink ordering, for example, involves diverse named entities, drin…
arXiv cs.CL
TIER_1·German (DE)·Xenia Klinge, Stefan Ortlieb, Alexander Koller·
arXiv:2604.25929v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used to generate pictures, texts, music, videos, and other works that have traditionally required human creativity. LLM-generated artifacts are often rated better than human-generated wo…
arXiv cs.CL
TIER_1·António Branco, João Silva, Nuno Marques, Luis Gomes, Ricardo Campos, Raquel Sequeira, Sara Nerea, Rodrigo Silva, Miguel Marques, Rodrigo Duarte, Artur Putyato, Diogo Folques, Tiago Valente·
arXiv:2604.25654v1 Announce Type: new Abstract: Although the cultural (mis)alignment of Large Language Models (LLMs) has attracted increasing attention -- often framed in terms of cultural bias -- until recently there has been limited work on the design and development of datasets for cultural assessment. Here, we review exist…
arXiv cs.CL
TIER_1·Georg Ahnert, Anna-Carolina Haensch, Barbara Plank, Markus Strohmaier·
arXiv:2510.11586v2 Announce Type: replace Abstract: Many in-silico simulations of human survey responses with large language models (LLMs) focus on generating closed-ended survey responses, whereas LLMs are typically trained to generate open-ended text instead. Previous research …
arXiv cs.CL
TIER_1·Shaoang Li, Yanhang Shi, Yufei Li, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Luke Simon, Sandeep Pandey, Xi Liu, Jian Li·
arXiv:2604.22565v1 Announce Type: new Abstract: Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or re…
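The abstract describes decoupling evidence selection from reasoning for a frozen solver, without compressing the context. HiLight's actual selector and emphasis mechanism are not detailed in the excerpt; as a loose sketch of the general idea only (the overlap-count selector and `<< >>` markers are invented stand-ins), a lightweight selector could score context sentences against the question and mark the top hits before handing the otherwise untouched context to a frozen model:

```python
# Toy illustration of decoupled evidence selection (NOT HiLight's method):
# a cheap selector scores sentences against the question, and the
# top-scoring ones are wrapped in emphasis markers before the full,
# uncompressed context is passed to a frozen LLM.

def score(sentence: str, question: str) -> int:
    """Stand-in selector: count question words that appear in the sentence."""
    q_words = {w.lower().strip("?.,") for w in question.split()}
    return sum(1 for w in sentence.split() if w.lower().strip("?.,") in q_words)

def highlight_evidence(context: str, question: str, top_k: int = 1) -> str:
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    ranked = sorted(sentences, key=lambda s: score(s, question), reverse=True)
    evidence = set(ranked[:top_k])
    # Keep every sentence (no compression or rewriting);
    # only the selected evidence is marked for emphasis.
    return ". ".join(f"<<{s}>>" if s in evidence else s for s in sentences) + "."

context = ("The meeting was rescheduled. The invoice total was 240 euros. "
           "Weather delayed several flights")
prompt = highlight_evidence(context, "What was the invoice total?")
print(prompt)
```

The point of the decoupling is that the frozen solver never changes; only the prompt is annotated, so the selector can be trained or swapped independently of the reasoning model.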
arXiv:2604.21479v2 Announce Type: replace Abstract: Large language models (LLMs) have recently demonstrated strong reasoning capabilities and attracted increasing research attention in the field of autonomous driving (AD). However, safe application of LLMs on AD perception and prediction still requires a thorough understanding of …