PulseAugur

New research explores the nuances of large language models in translation, bias, and multilingual tasks

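Several new research papers examine how large language models (LLMs) handle linguistic and cultural nuance across languages. One paper introduces LLMBridge, an LLM-based pipeline for end-to-end referential bridging resolution in English, which is reported to outperform previous state-of-the-art models. Another presents a human evaluation benchmark for cultural localisation in machine translation and highlights that idioms and puns are especially challenging for LLMs. GRUFF, a study of LLM pronoun fidelity, reasoning, and biases in German, reveals problems with pronoun fidelity and bias, particularly for neopronouns. Further work on multilingual LLMs examines the role each language plays in task execution, cultural biases in Asian languages, and methods for mitigating cross-lingual cultural inconsistencies.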

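Impact: These studies highlight persistent challenges in LLM development, particularly in capturing cultural nuance, achieving robust multilingual capability, and reasoning without bias, and they point to directions for future research and model improvement.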

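Ranking rationale: This cluster consists mostly of academic papers posted on arXiv that focus on the research and evaluation of large language models.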


AI-generated summary · Google Gemini · from 15 sources.


Sources [15]

  1. arXiv cs.CL TIER_1 English(EN) · Lauren Levine, Amir Zeldes ·

    LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English

    arXiv:2605.29048v1 Announce Type: new Abstract: In this paper, we introduce LLMBridge, a new LLM based system for the task of end-to-end referential bridging resolution in English. Our bridging resolution pipeline combines heuristic pre/post-processing with the natural language i…

  2. arXiv cs.CL TIER_1 English(EN) · Madison Van Doren, Casey Ford, Jennifer Barajas, Riley VanMeter, Cory Holland ·

    "Be My Cheese?": Cultural Nuance Benchmarking for Machine Translation in Multilingual LLMs

    arXiv:2602.04729v2 Announce Type: replace Abstract: We present a large-scale human evaluation benchmark for assessing cultural localisation in machine translation produced by state-of-the-art multilingual large language models (LLMs). Existing MT benchmarks emphasise token-level …

  3. arXiv cs.CL TIER_1 English(EN) · Fabian Mewes, Anne Lauscher, Vagrant Gautam ·

    GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

    arXiv:2605.30214v1 Announce Type: new Abstract: Third-person singular pronouns have long been used to study stereotypical biases in language models and to test their abilities to reason about reference. More recently, the interplay between reasoning and bias has been investigated…

  4. arXiv cs.CL TIER_1 English(EN) · Vagrant Gautam ·

    GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

    Third-person singular pronouns have long been used to study stereotypical biases in language models and to test their abilities to reason about reference. More recently, the interplay between reasoning and bias has been investigated with the task of pronoun fidelity, which assess…

  5. arXiv cs.CL TIER_1 English(EN) · Qishi Zhan, Minxuan Hu, Seoyeon Jang, Lei Zhao, Ziheng Chen, Man Liang, Xinyue Xiang, Jiaxin Liu, Guansu Wang, Liang He ·

    Disentangling Language Roles in Multilingual LLM Task Execution

    arXiv:2605.27649v1 Announce Type: new Abstract: Multilingual LLMs are increasingly used when instruction, source content, and required response languages do not coincide. Existing benchmarks have expanded multilingual instruction-following evaluation, but they rarely isolate thes…

  6. arXiv cs.CL TIER_1 English(EN) · Tarek Naous, Anagha Savit, Carlos Rafael Catalan, Geyang Guo, Jaehyeok Lee, Kyungdon Lee, Lheane Marie Dizon, Mengyu Ye, Neel Kothari, Sahajpreet Singh, Sarah Masud, Tanish Patwa, Trung Thanh Tran, Zohaib Khan, Alan Ritter, Tanmoy Chakraborty, Yuki Arase… ·

    Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages

    arXiv:2510.05291v2 Announce Type: replace Abstract: As Large Language Models (LLMs) develop stronger multilingual capabilities, their sensitivity to culturally diverse entities becomes increasingly important. Prior work by Naous et al. (2024) has shown that LLMs often favor Weste…

  7. arXiv cs.AI TIER_1 English(EN) · Santiago Acevedo, Alessandro Laio, Marco Baroni ·

    Differential syntactic and semantic encoding in LLMs

    arXiv:2601.04765v4 Announce Type: replace-cross Abstract: We study how syntactic and semantic information is encoded in inner layer representations of Large Language Models (LLMs), focusing on the very large DeepSeek-V3. We find that, by averaging hidden-representation vectors of…

  8. arXiv cs.AI TIER_1 English(EN) · Manan Uppadhyay, Prashant Kodali, Pranjal Chitale, Reshma Ramaprasad, Himanshu Beniwal, Sunayana Sitaram ·

    DEPART: DEcomposing PARiTy across Multilingual LLMs

    arXiv:2605.28163v1 Announce Type: cross Abstract: Multilingual Large Language Models (mLLMs) leaderboards report per-language accuracy but rarely explain why disparities emerge, leaving systemic biases unattributed and offering practitioners no actionable levers. We first establi…

  9. arXiv cs.AI TIER_1 English(EN) · Irune Zubiaga, Aitor Soroa, Rodrigo Agerri ·

    Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study

    arXiv:2605.28710v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used for the automatic evaluation of generated text, yet most prior work focuses on English. Despite the growing demand for multilingual evaluation, extending LLM-based evaluators to m…

  10. arXiv cs.CL TIER_1 English(EN) · Lucas Resck, Isabelle Augenstein, Anna Korhonen ·

    Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation

    arXiv:2605.12515v2 Announce Type: replace Abstract: Despite their impressive capabilities, multilingual large language models (MLLMs) frequently exhibit inconsistent behaviour when the prompt's language changes. While such adaptation is generally desirable, it becomes a critical …

  11. arXiv cs.AI TIER_1 English(EN) · Rodrigo Agerri ·

    Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study

    Large language models (LLMs) are increasingly used for the automatic evaluation of generated text, yet most prior work focuses on English. Despite the growing demand for multilingual evaluation, extending LLM-based evaluators to multilingual settings remains challenging, particul…

  12. arXiv cs.CL TIER_1 English(EN) · Sunayana Sitaram ·

    DEPART: DEcomposing PARiTy across Multilingual LLMs

    Multilingual Large Language Models (mLLMs) leaderboards report per-language accuracy but rarely explain why disparities emerge, leaving systemic biases unattributed and offering practitioners no actionable levers. We first establish that these gaps are systematic rather than arti…

  13. arXiv cs.CL TIER_1 English(EN) · Yoonwon Jung, Aaron S. Cohen, Benjamin K. Bergen ·

    Discovering Lexical Gaps Using Embeddings from Multilingual LLMs

    arXiv:2605.24310v1 Announce Type: new Abstract: Lexical gaps are words that do not exist in certain languages. They pose challenges for building multilingual lexical resources, for machine translation, and for cross-lingual transfer. Existing lexical gap detection relies on human…

  14. dev.to — LLM tag TIER_1 English(EN) · Ai developer ·

    One Ruler to Measure Them All: How Language Affects LLM Quality

    Most discussions about LLM performance focus on the model architecture and prompting. But there's a hidden factor: the tokenizer. It determines how much of your text fits in the context window. …

  15. dev.to — LLM tag TIER_1 English(EN) · Ai developer ·

    One Ruler to Measure Them All: How Language Affects LLM Quality

    Most discussions about LLM performance focus on the model architecture and prompting. But there's a hidden factor: the tokenizer. It determines how much of your text fits in the context window. …