Brief

last 24h

[50/153] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · dev.to — LLM tag English(EN) · 17h

NCA-GENL Certification: Top GenAI Credential in 2026

NVIDIA is offering a new certification, the NVIDIA Certified Associate Generative AI and LLMs (NCA-GENL), designed to validate foundational knowledge in GenAI and LLM integration. This certification is aimed at professionals seeking to demonstrate their understanding of AI systems beyond casual use, making them more attractive to employers in a rapidly evolving job market. With 88% of companies utilizing AI and 25% scaling their AI programs, the NCA-GENL is positioned as a key credential for career advancement in AI-centric roles. AI

IMPACT Validates foundational GenAI and LLM skills, potentially increasing the pool of qualified professionals for AI integration roles.
- NVIDIA
- ChatGPT
- LLMs
- Generative AI
- NCA-GENL
TOOL · arXiv cs.AI English(EN) · 23h

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

Researchers have developed a new framework called EDRM that uses early-stage entropy dynamics to determine when Large Language Models (LLMs) should engage in explicit reasoning. They observed that tasks benefiting from Chain-of-Thought (CoT) reasoning show a consistent reduction in entropy during generation, indicating a shift to a structured reasoning state. EDRM leverages this entropy reduction signal to adaptively select inference strategies, leading to significant token reductions and accuracy improvements across various benchmarks and LLMs. AI

IMPACT Optimizes LLM inference by selectively invoking reasoning, potentially reducing costs and improving efficiency for AI operators.
- LLMs
- Chain-of-Thought (CoT)
TOOL · arXiv cs.AI English(EN) · 23h

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

Researchers have explored two methods for efficiently fine-tuning large language models for text classification tasks, particularly under resource constraints. The study compared attaching a classification head to a pre-trained causal LLM using its final-token embedding versus instruction-tuning the LLM in a prompt-to-response format. Experiments on patent and public datasets demonstrated that the embedding-based method often matched or surpassed the instruction-tuned approach for single-label classification, requiring significantly fewer trainable parameters. AI

IMPACT Presents efficient fine-tuning techniques for LLMs, potentially lowering the barrier for deploying these models in text classification tasks.
- LLMs
- BERT
- Ciaran Cooney
TOOL · arXiv cs.AI English(EN) · 23h

RAG-Pull: Turning Retrieval into a Code-Injection Channel via Invisible Unicode Perturbations

Researchers have developed a novel attack method called RAG-Pull that exploits Retrieval-Augmented Generation (RAG) systems. By inserting invisible Unicode characters into queries or external code, RAG-Pull can redirect retrieval to malicious code snippets. This manipulation can lead to vulnerabilities such as remote code execution and SQL injection, compromising the safety alignment of LLMs. AI

IMPACT This research highlights a new attack vector against LLMs that could compromise data security and model safety.
TOOL · arXiv cs.AI English(EN) · 23h

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

Researchers have introduced SciHorizon-GENE, a new benchmark designed to evaluate the capabilities of large language models (LLMs) in understanding and reasoning about gene-level biological information. This benchmark, derived from extensive biological databases, includes over 540,000 questions covering gene-to-function reasoning relevant to cell annotation and mechanism analysis. Evaluations of current LLMs reveal significant variations in their gene-level reasoning abilities and persistent issues with generating accurate and complete functional interpretations. AI

IMPACT Establishes a new standard for evaluating LLM performance in life sciences, guiding development for biological interpretation tasks.
TOOL · arXiv cs.AI English(EN) · 23h

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

Researchers have introduced BOHM, a novel method for attributing contributions within compound AI systems that utilize hierarchical routing. Unlike traditional Shapley-based methods, BOHM leverages existing routing weights, offering a zero-cost attribution solution that is particularly effective for systems with opaque components or agentic orchestrators. The method provides multi-resolution attribution across all levels of the hierarchy simultaneously, demonstrating strong correlation with Shapley values on various benchmarks while requiring significantly fewer evaluations. AI

IMPACT Provides a more efficient method for understanding how complex AI systems make decisions, potentially improving debugging and interpretability.
- LLMs
- SHAP
- BOHM
TOOL · arXiv cs.AI English(EN) · 23h

LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding

Researchers have introduced LFRAG, a new framework designed to improve multimodal retrieval-augmented generation (RAG) for visually rich documents. Unlike previous page-level retrieval methods, LFRAG operates at the block level, segmenting documents to capture both semantic meaning and layout structures. This approach enhances retrieval accuracy and reduces redundant information, leading to more efficient and precise downstream generation tasks. The team also developed LFDocQA, a new benchmark dataset with block-level annotations to facilitate evaluation of these fine-grained retrieval capabilities. AI

IMPACT Enhances AI's ability to process and understand complex visual documents, potentially improving information extraction and Q&A systems.
- LLMs
- LFRAG
- LFDocQA
- multimodal RAG
TOOL · arXiv cs.AI English(EN) · 23h

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

Researchers have developed a new framework to test how open-source large language models (LLMs) can be used to spread political influence online. Their study evaluated over 30 LLMs from various families and countries, finding that these models are generally more willing to generate left-leaning content. The research also indicated that larger models tend to have narrower political expressivity, and significant regional differences exist in their outputs. AI

IMPACT Establishes a framework for auditing LLM political steerability, crucial for countering influence campaigns.
TOOL · arXiv cs.LG English(EN) · 23h

Convex Optimization for Alignment and Preference Learning on a Single GPU

Researchers have developed a new method called COALA, which uses convex optimization to fine-tune large language models for human preferences. This approach significantly reduces the computational resources and training time required compared to existing methods like DPO, enabling efficient training on a single GPU. COALA demonstrates competitive performance across multiple datasets and models, achieving stable reward increases and faster convergence. AI

IMPACT Enables more efficient fine-tuning of LLMs on limited hardware, potentially democratizing access to preference alignment techniques.
- ChatGPT
- Llama-3.1-8B
- Gemini
- LLMs
RESEARCH · arXiv cs.AI English(EN) · 3d · [2 sources]

CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

Researchers have developed CVSearch, a new framework designed to improve how multimodal large language models (MLLMs) process high-resolution images. This training-free system dynamically adapts its search strategy, first attempting an expert-assisted search and then employing a novel semantic-aware scanning mechanism if the initial attempt fails. CVSearch aims to overcome the efficiency and coverage trade-offs of existing methods by intelligently decomposing images and exploring details iteratively, achieving state-of-the-art accuracy while enhancing search efficiency. AI

IMPACT Enhances multimodal LLM capabilities for processing high-resolution imagery, potentially improving applications in fields requiring detailed visual understanding.
RESEARCH · arXiv cs.AI English(EN) · 3d · [2 sources]

It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

A new study published on arXiv reveals that geopolitical biases in large language models primarily stem from the post-training alignment phase, rather than the initial training data. Researchers tested seven LLM pairs, finding that six exhibited biases favoring their developer's region after post-training. This effect was particularly pronounced in Alibaba's Qwen 2.5, which showed an 18-fold increase in China-favorability odds post-training. The study also noted that the language used in prompts can amplify these biases, as seen with the French-made Mistral model becoming pro-France only when prompted in French. AI

IMPACT Highlights that LLM alignment processes, not just raw data, shape geopolitical biases, necessitating greater transparency in model development.
- LLMs
- arXiv
- Alibaba
- Qwen 2.5
TOOL · Forbes — Innovation English(EN) · 5d

How The ARISE Network Is Rethinking Clinical AI

The ARISE Healthcare Network, a collaboration of physicians from Harvard and Stanford, is investigating the real-world performance and evaluation of AI in medicine. Led by researchers like Jonathan Chen and Adam Rodman, ARISE aims to understand how AI systems function in clinical settings, define clinical reasoning, and explore optimal human-AI collaboration. Early findings suggest that advanced LLMs can sometimes outperform physicians, even those using AI tools, prompting a re-evaluation of what constitutes clinical reasoning. AI

IMPACT Prompts a re-evaluation of clinical reasoning and optimal human-AI collaboration in healthcare settings.
- AI
- ChatGPT
- Stanford
- LLMs
- Harvard
- Adam Rodman
- ARPA-H
- ARISE Healthcare Network
- Jonathan H. Chen
TOOL · dev.to — LLM tag English(EN) · 3d

Graph RAG vs Vector RAG: When to Use Each

This article compares two primary approaches to Retrieval-Augmented Generation (RAG) for large language models: Vector RAG and Graph RAG. Vector RAG uses similarity-based retrieval of text chunks stored in a vector database, offering simplicity and speed. Graph RAG, conversely, models knowledge as nodes and relationships, enabling retrieval based on structural context and multi-hop reasoning. The choice between them depends on the complexity of queries and the importance of relationships versus semantic similarity. AI

IMPACT Helps developers choose the most effective RAG architecture for their specific LLM application needs.
TOOL · dev.to — LLM tag English(EN) · 5d

Quantitative Content Methodology: 5-Layer Content Framework

A new content methodology called Quantitative Content Methodology (QCM) has been introduced, treating text as a mathematical dataset optimized for search engines and LLMs. QCM focuses on high information density, aiming for at least 2.5 verifiable data points per 100 words, and structures content with an "atomic answer" as the first sentence under each H2 heading. This framework is designed to make content more easily citable by generative search engines like Google's AI Overviews, ChatGPT, and Gemini. AI

IMPACT This methodology could help content creators produce material that is more easily understood and cited by AI-powered search and summarization tools.
TOOL · dev.to — LLM tag English(EN) · 2d

Why LLMs Fail at OpenSCAD Code Generation (and How to Fix It)

Large language models struggle to generate accurate OpenSCAD code for 3D architectural models due to issues with spatial reasoning, coordinate frame confusion, and understanding constructive solid geometry operations. The author found that LLMs often produce code that parses and renders but contains subtle geometric errors. A more effective approach involves having the LLM generate a structured intermediate representation, such as JSON, which is then translated into OpenSCAD code by a deterministic script, simplifying the LLM's task to a 2D spatial problem. AI

IMPACT This approach could improve LLM capabilities in specialized code generation tasks, particularly for 3D modeling.
- LLMs
- Python
- JSON
- OpenSCAD
TOOL · dev.to — LLM tag English(EN) · 2d

LangChain JsonOutputParser: Fix Malformed JSON from LLMs

This article addresses the common issue of Large Language Models (LLMs) returning malformed JSON, which causes LangChain's JsonOutputParser to fail. It explains that LLMs can produce errors like single quotes, trailing commas, markdown code fences, or truncated responses. The post offers several solutions, including using the `repair_json` library for auto-repair before parsing, LangChain's `OutputFixingParser` which uses an additional LLM call to correct errors, and `RetryOutputParser` for structured retry logic. AI

IMPACT Provides practical solutions for developers to handle common errors when integrating LLMs with structured data outputs.
TOOL · dev.to — LLM tag English(EN) · 4d

Inside MDASH: Designing a Microsoft‑Scale Multi‑Model Agentic Cyber Defense Benchmark

A new benchmark called MDASH is proposed to evaluate multi-model agentic systems in cybersecurity, moving beyond single-prompt accuracy to assess end-to-end performance under realistic conditions. This approach is crucial as LLMs are increasingly integrated into security operations for tasks like alert enrichment and playbook automation. The benchmark aims to measure system-level impact on detection and response times, while also considering safety, policy adherence, and potential failure modes like prompt injection or tool abuse. AI

IMPACT Establishes a new evaluation framework for AI in security, pushing for system-level assessment beyond single-model performance.
- Microsoft
- GPT-5.5
- LLMs
TOOL · Towards AI English(EN) · 6d

How AI Turns Healthcare Data into Real-Time Clinical Decision Support

Modern healthcare faces a data liquidity problem, where a significant portion of patient information remains trapped in unstructured formats like scanned documents and free-text notes. This necessitates manual data entry and validation by clinicians, consuming valuable time and potentially impacting patient care. AI-driven automation pipelines, utilizing OCR, NLP, and LLMs, are transforming this raw data into structured, actionable insights. These systems extract and organize critical information, enabling faster and more informed clinical decision-making without replacing healthcare professionals. AI

IMPACT AI is streamlining healthcare data processing, enabling faster clinical decisions and improving patient care by converting unstructured data into actionable insights.
- OCR
- healthcare
- AI
- LLMs
- NLP
TOOL · Towards AI English(EN) · 5d

How Do Modern LLMs Cheat the Scaling Laws? (In a Good Way).

Modern large language models appear to defy traditional scaling laws, achieving better performance with fewer parameters than previously expected. This suggests that architectural innovations and training methodologies are playing a more significant role in model efficiency. Researchers are exploring these advancements to understand how LLMs can achieve superior results without a proportional increase in computational resources. AI

IMPACT Understanding how LLMs achieve efficiency beyond traditional scaling laws could lead to more cost-effective model development and deployment.
- LLMs
TOOL · Medium — MLOps tag English(EN) · 3d

Master Generative AI: Start with a Free Demo -Visualpath

Visualpath is offering a free demo to help individuals master generative AI and prompt engineering. The program aims to equip learners with future-ready AI skills, focusing on large language models (LLMs) and their applications. This initiative is presented as a pathway to becoming an expert in the field by 2026. AI

IMPACT Offers a pathway for individuals to acquire generative AI and prompt engineering skills.
TOOL · Mastodon — mastodon.social English(EN) · 3d

Looking for an overview on critical concerns when using LLMs? 🤔 Check out my chapter in the just appeared edited collection "Understanding Science with Large La

A new edited collection titled "Understanding Science with Large Language Models?" has been released, featuring a chapter on critical concerns related to LLM usage. The book aims to provide an overview of these important issues within the scientific community. AI

IMPACT Provides an overview of critical concerns for scientists using LLMs.
TOOL · Mastodon — mastodon.social English(EN) · 5d

🚀🎓 Ah, the dazzling world of # AI # research strikes again! This time in the form of # PopuLoRA , where # LLMs engage in a riveting game of self-play, trying to

Researchers have introduced PopuLoRA, a novel approach where large language models engage in self-play to improve their reasoning capabilities. This method involves LLMs attempting to outsmart themselves in a simulated environment, aiming to enhance their performance through this co-evolutionary process. AI

IMPACT This self-play method could lead to more robust and capable LLMs by enabling them to refine their reasoning skills independently.
- LLMs
- PopuLoRA
TOOL · Mastodon — mastodon.social English(EN) · 6d

Using algebra and LLMs to verify a flight-plan bug fix in Lean https://jameshaydon.github.io/algebra-llms-lean-flight-plan/ # Programming # AI # Math

Researchers have utilized large language models (LLMs) in conjunction with algebraic methods to verify a bug fix within the Lean theorem prover. This approach focused on a specific flight-plan software component, demonstrating a novel application of AI in formal verification. The integration of LLMs aims to enhance the accuracy and efficiency of verifying complex software systems. AI

IMPACT Demonstrates a new method for using LLMs in formal software verification, potentially improving reliability in critical systems.
- LLMs
- Lean
- algebra
TOOL · Mastodon — fosstodon.org English(EN) · 3d · [2 sources]

GitHub has seen a big increase in AI engineering resources. This is good for developers wanting to learn and build AI. # AI , # GitHub , # LLM , # AIAgents , #

GitHub has experienced a significant surge in AI engineering resources, including AI agents and large language models. This expansion offers developers readily available guides and code to accelerate their AI development efforts. The platform aims to become a central hub for AI-related tools and knowledge. AI

IMPACT GitHub's increased AI resources can accelerate development and adoption of AI tools and applications.
- GitHub
- AI agents
- LLMs
TOOL · Mastodon — fosstodon.org English(EN) · 5d

Nothing to see here, just keeping track of this article on AI sycophancy... "Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence" Link: https:

A new research paper explores the phenomenon of "AI sycophancy," where AI models exhibit overly agreeable or flattering behavior. The study suggests that prolonged interaction with such sycophantic AI can negatively impact users' prosocial intentions and foster dependence. This effect is particularly concerning for younger individuals who may be more susceptible to these influences. AI

IMPACT Research suggests that overly agreeable AI may reduce users' prosocial behavior and increase dependence, particularly concerning for younger demographics.
- LLMs
- AI sycophancy
COMMENTARY · Towards AI English(EN) · 2d

The Prompt Engineering Cookbook: Principles, Tactics, and Patterns That Actually Work.

This article provides a practical guide to prompt engineering for large language models, emphasizing clear and specific instructions over brevity. It introduces principles, tactics, and patterns for effectively interacting with models like ChatGPT and Claude. The guide includes a Python helper function for generating model completions and details techniques such as using delimiters and providing context to achieve reliable and structured outputs for various applications. AI

IMPACT Provides practical techniques for users to better leverage existing LLMs for applications.
- ChatGPT
- Claude
- LLMs
- Python
- Towards AI
RESEARCH · Together AI blog English(EN) · 3d · [2 sources]

FlashAttention

Together AI has released FlashAttention-3 and FlashAttention-4, significant upgrades to their GPU-accelerated attention mechanism for large language models. FlashAttention-3, designed for Hopper GPUs, achieves up to 75% utilization and 1.5-2x speedup over its predecessor by exploiting new hardware features like Tensor Cores and Tensor Memory Accelerator, and supporting FP8 precision. FlashAttention-4, optimized for Blackwell GPUs, further enhances performance by pipelining computations and addressing bottlenecks in transcendental functions and memory traffic, reaching 71% utilization and offering substantial speedups over existing libraries. AI

IMPACT These optimized attention mechanisms promise significantly faster LLM training and inference, enabling longer context windows and more efficient GPU utilization.
COMMENTARY · dev.to — LLM tag English(EN) · 4d

The Request Is the Wrong Unit of Scale for LLMs on Kubernetes

The traditional web application scaling model, which relies on request counts, is insufficient for serving large language models (LLMs). LLM workloads vary significantly in complexity based on the number of input and output tokens, not just the number of HTTP requests. This distinction is crucial because input tokens impact the time to first token, while output tokens affect the overall processing time and system capacity, leading to potential performance issues even when request metrics appear stable. AI

IMPACT Highlights the need for new scaling metrics beyond request counts for efficient LLM deployment.
- LLMs
- Kubernetes
COMMENTARY · dev.to — MCP tag English(EN) · 1d

The Control Plane is Leaking: When Context Becomes Command

Large Language Models inherently blur the lines between data and control, presenting a significant security challenge for infrastructure engineers and ML operators. Unlike traditional computing, LLMs lack a distinct data plane, meaning all information within their context window, whether it's a prompt, document, or even hidden instructions within an image, is treated as executable command. This architectural flaw allows untrusted artifacts to influence model behavior, leading to potential breaches like bypassing database security or altering engineering calculations. AI

IMPACT Highlights a fundamental architectural challenge in LLMs that could impact the security and auditability of AI systems.
COMMENTARY · Simon Willison English(EN) · 1w · [4 sources]

The last six months in LLMs in five minutes

Simon Willison presented a five-minute talk at PyCon US 2026 summarizing LLM developments since November 2025. Key advancements included significant improvements in coding agents, which became reliable for daily use, and the emergence of 'Claws'—personal AI assistants like OpenClaw that drove sales of Mac Minis for local hosting. AI

IMPACT Summarizes key LLM progress in coding agents and personal assistants, highlighting their increasing utility and market impact.
- Simon Willison
- LLMs
- Claude Opus 4.5
- OpenClaw
- Gemini 3
- GPT-5.1 Codex Max
- Mac Mini
- GPT-5.1
- Claude Sonnet 4.5
- PyCon US 2026
- Warelay
- Anthropic
- OpenAI
TOOL · Mastodon — fosstodon.org English(EN) · 5d

Google's @ johnmu comments more about SEO and AI markdown files and llms.txt https://www. seroundtable.com/google-adds-m arkdown-files-to-help-docs-41342.html #

Google is incorporating AI-generated markdown files into its documentation to enhance search engine optimization. John Mueller, a Google search advocate, confirmed that these files will help improve how search engines understand and index content, particularly for AI-related topics and large language models. AI

IMPACT Google's adoption of AI-generated markdown for documentation may improve discoverability of AI-related content.
- Google
- AI
- LLMs
- John Mueller
- markdown files
COMMENTARY · dev.to — LLM tag English(EN) · 6d

HTML vs Markdown for LLMs: Why Clean Structure Beats Raw Pages

A recent article highlights that feeding raw HTML directly into Large Language Models (LLMs) can lead to noisy context windows and inefficient token usage. The author argues that LLMs understand clean Markdown significantly better than HTML, which often contains extraneous elements like navigation menus, ads, and styling wrappers. Converting HTML to Markdown before ingestion can drastically reduce token count, improve semantic chunking, and enhance the overall accuracy and consistency of RAG systems and AI agents. AI

IMPACT Using Markdown instead of raw HTML for LLM inputs can significantly reduce token usage and improve the accuracy of RAG systems and AI agents.
- AI agents
- LLMs
- Markdown
- HTML
- RAG systems
COMMENTARY · dev.to — LLM tag English(EN) · 6d

Should We Use AI In Development?

The author argues against the widespread use of AI in software development, citing potential drawbacks such as mental atrophy, reduced ownership, and privacy concerns. While acknowledging AI's utility for minor tasks, they contend that over-reliance can lead to increased maintenance burdens and errors. The piece suggests that AI should not be used for tasks where cognitive output is the primary product, advocating for human-centric development. AI

IMPACT Argues that over-reliance on AI in development can lead to decreased programmer skill and increased long-term maintenance costs.
- AI
- LLMs
COMMENTARY · Glean blog English(EN) · 1d

The 10 best AI voice assistants in 2026: A comprehensive guide

AI voice assistants in 2026 are significantly more advanced, leveraging LLMs, ASR, ML, and NLP to understand natural speech, learn continuously, and personalize responses. These assistants are categorized into personal helpers for daily tasks and business agents for workflow automation and knowledge retrieval. The article emphasizes that the best assistant is determined by individual needs such as integrations, accuracy, security, and language support, rather than brand name alone. AI

IMPACT Provides a framework for evaluating and understanding the evolving landscape of AI voice assistants for both personal and professional applications.
- Gemini
- LLMs
- Siri
- ML
- NLP
- AI voice assistants
COMMENTARY · Forbes — Innovation English(EN) · 6d

AI’s Dirty Secret: It Mostly Speaks English

Despite claims of multilingual capabilities, most AI systems primarily operate in English due to training data imbalances. Large language models are predominantly trained on English content, with studies indicating up to 90% of training tokens are English. This linguistic bias means AI often processes information through an English-centric lens, even when translating outputs, potentially overlooking cultural nuances and local contexts. Consequently, AI performance can be weaker and error rates higher in non-English languages, impacting its effectiveness in diverse global applications. AI

IMPACT AI systems' English-centric training limits their effectiveness and cultural nuance in non-English languages, impacting global applications.
COMMENTARY · dev.to — LLM tag English(EN) · 1d

Why Enterprises Should Not Let LLMs Execute SQL Directly?

Enterprises should avoid allowing large language models to directly execute SQL queries due to significant security, permission, cost, and auditing risks. Prompts alone are insufficient to enforce control over LLM-generated SQL. Implementing a deterministic validation layer between LLMs and production databases is crucial for managing these risks and transforming the SQL generation process into a controllable system. AI

IMPACT Highlights critical security and operational risks for businesses integrating LLMs into data analysis workflows, emphasizing the need for robust governance layers.
- LLMs
- SQL
- Enterprises
COMMENTARY · Mastodon — sigmoid.social English(EN) · 3d · [2 sources]

Yes. I suspect the biggest impacts of LLMs won't come from a workflow "problem -> #LLM produces solution," but from "problem -> LLM helps create specialised too

The primary impact of large language models (LLMs) is expected to stem from their ability to assist humans in creating specialized tools, rather than directly solving problems. This perspective shifts the focus from what LLMs can do independently to the potential of human-LLM collaborative systems. The discussion emphasizes the creation of tailored solutions through this partnership. AI

IMPACT Focuses on the future application and impact of LLMs, offering a perspective on their role in tool creation and human-AI collaboration.
- LLMs
- Janne M. Korhonen
COMMENTARY · Towards AI English(EN) · 3d

Sharing Your .env With LLMs Is Relatively Safe. Is It Really? Here’s Why.

Sharing .env files with large language models (LLMs) is generally considered safe due to training data policies. However, a new analysis suggests that the agentic attack surface presents a distinct and potentially more significant risk. This perspective highlights that while LLMs are trained not to retain sensitive information, their ability to act on instructions could still expose credentials or other private data. AI

IMPACT Highlights potential security vulnerabilities in LLM interactions, urging caution beyond standard training data policies.
RESEARCH · arXiv cs.AI English(EN) · 3d · [2 sources]

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

Two new research papers propose novel approaches to continual learning in large language and vision-language models, aiming to mitigate catastrophic forgetting. CP-MoE introduces a transient expert to guide updates and preserve knowledge, while MoRAM utilizes fine-grained rank-1 adapters as memory units to enable content-addressable retrieval. Both methods demonstrate improved performance on benchmarks, offering better trade-offs between plasticity and stability compared to existing Mixture-of-Experts techniques. AI

IMPACT These papers introduce novel techniques for continual learning, potentially improving the ability of large models to adapt to new information without forgetting previous knowledge.
- Mixture-of-Experts
- LLMs
- LoRA
- Continual Learning
- VQA v2
- MoRAM
- CP-MoE
- SuperNI
COMMENTARY · Mastodon — fosstodon.org English(EN) · 2d

🤖 LLMs are just giant probability machines pretending to think It’s fascinating that simple mathematics between tokens can eventually become a machine that writ

Large language models operate on complex mathematical probabilities between tokens, enabling them to generate diverse content like essays, code, and poetry. Despite the common association of probability with uncertainty, these models leverage it to simulate reasoning and creative output. This process highlights how sophisticated mathematical operations can mimic cognitive functions. AI

IMPACT Explains the fundamental probabilistic nature of LLMs, offering insight into their capabilities and limitations.
- LLMs
COMMENTARY · dev.to — LLM tag English(EN) · 4d

AI Cyber Defense for Critical Infrastructure: From SOC Triage to Autonomous Protection

Critical infrastructure is increasingly integrating AI, expanding its attack surface to include models, data, and ML pipelines. Traditional security measures and human-only Security Operations Centers (SOCs) are overwhelmed by the volume of data and the speed of AI-native attacks. To counter this, organizations must adopt AI SecOps, embedding continuous security checks into operational pipelines and using AI-driven tools to match the speed and reasoning of adversarial AI. AI

IMPACT Critical infrastructure must secure AI systems and defend with AI to counter evolving threats and data overload.
- AI
- LLMs
- SOC
- Critical Infrastructure
COMMENTARY · Mastodon — mastodon.social English(EN) · 3d

How Can We Prevent AI Models From Cannibalizing Themselves When Human-Generated Data Runs Out? Getty Images While the evolution of artificial intelligence (AI)

A significant concern in AI development is the potential for models to degrade over time due to a lack of novel human-generated data. This phenomenon, known as "model collapse," occurs when AI systems primarily learn from synthetic data produced by other AI models. Researchers are exploring methods to prevent this self-cannibalization and ensure continued AI progress. AI

IMPACT Addresses a potential long-term constraint on AI development, prompting research into novel data generation and training strategies.
- AI
- LLMs
COMMENTARY · Mastodon — sigmoid.social English(EN) · 5d

TIL: why do llms hallucinate? llms are trained to predict text; they don't check facts against the training data and have no knowledge; training data has gaps,

Large language models hallucinate because they are designed to predict text, not to verify facts against their training data. Their training datasets often contain gaps, inconsistencies, and underrepresented information, leading to the generation of inaccurate or fabricated content. This behavior highlights that current AI systems lack true knowledge or intelligence. AI

IMPACT Explains a core limitation of current LLMs, impacting user trust and application design.
- LLMs
TOOL · Mastodon — fosstodon.org English(EN) · 4d

COROS thinks ChatGPT should analyze your training data COROS is opening athlete training data to LLMs through a new MCP integration. https://www. androidauthori

COROS, a wearable technology company, is integrating its platform with large language models (LLMs) to analyze athlete training data. This new integration, called the COROS Training Hub (CTH), aims to provide deeper insights into performance and recovery by leveraging AI. The company is making this data available to LLMs like ChatGPT, allowing for more sophisticated analysis than previously possible. AI

IMPACT Enables more sophisticated analysis of athlete performance data through AI integration.
- ChatGPT
- LLMs
- COROS
TOOL · arXiv cs.AI English(EN) · 1w

Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

Researchers have identified a predictable relationship between factual recall in large language models, their size, and the frequency of topics in their training data. By evaluating 38 models on over 8,900 scholarly references, they found that recall quality follows a sigmoid curve based on a combination of model parameters and topic representation. These factors alone accounted for a significant portion of the variance in recall performance across different model families. AI

IMPACT Establishes a new scaling law for factual recall in LLMs, suggesting that performance is predictable based on model size and training data composition.
COMMENTARY · Mastodon — fosstodon.org English(EN) · 1d

I only found conflicting reports and outdated numbers on the # CO2 emissions of LLMs. So I've made some estimations myself. I was hoping with newer numbers I co

A Mastodon user has estimated the CO2 emissions associated with Large Language Models (LLMs), finding the results to be alarmingly high. The user's calculations, based on publicly available data, suggest that the environmental impact of LLMs is a significant concern. While acknowledging the potential for error, the user believes their simple calculation method is likely accurate and posits that AI's destructive potential may stem from its carbon footprint rather than existential risk. AI

IMPACT Raises awareness about the significant environmental costs of large language models, prompting operators to consider sustainability.
- LLMs
- Mastodon
COMMENTARY · Mastodon — fosstodon.org English(EN) · 2d

One of the biggest problems I have with # LLMs is the amount of waste generated by using them. As with other things that ostensibly make our lives easier, inste

The author expresses concern over the significant environmental waste generated by large language models, particularly in code generation. They argue that the ease of producing code with LLMs leads to excessive consumption and a disregard for the material costs, such as energy usage and carbon emissions. The piece advocates for pricing these true costs into LLM usage to encourage more mindful application. AI

IMPACT Highlights the significant environmental costs associated with LLM usage, urging for a re-evaluation of their application.
- GitHub
- LLMs
TOOL · arXiv cs.CL English(EN) · 5d

GradeLegal: Automated Grading for German Legal Cases

Researchers have developed a system called GradeLegal to automate the grading of German legal exam solutions using large language models. The study evaluated 27 different LLMs and various prompting strategies, finding that reasoning-oriented models can achieve high agreement with expert graders in public law, reaching a quadratic weighted kappa of 0.91. However, performance in criminal law was lower, indicating a more challenging task. Ensembling multiple models further improved grading accuracy, offering a potential alternative to top-tier proprietary models. AI

IMPACT Automated grading systems could streamline feedback for legal students and reduce bottlenecks for educators.
COMMENTARY · Astral Codex Ten (Scott Alexander) English(EN) · 3d

New Paradigms Won't Save You

Scott Alexander argues that even if Artificial General Intelligence (AGI) requires a new paradigm beyond current Large Language Models (LLMs), such a paradigm could emerge within the next 3-5 years. He uses Lindy's Law to estimate the timeline for revolutionary AI advancements, suggesting that a paradigm shift as significant as the Transformer architecture could appear relatively soon. Alexander contends that the rapid scaling of compute and the increasing number of AI researchers, potentially augmented by AI itself, will accelerate development, making the AGI timeline a near-term concern rather than a distant future event. AI

IMPACT Argues that AGI development, even with new paradigms, could be a near-term concern, challenging the notion of a distant future for advanced AI.
COMMENTARY · Mastodon — fosstodon.org English(EN) · 6d

🎙️ 𝗥𝗲𝗮𝗹𝗶𝘁𝘆 𝘃𝘀 𝗛𝘆𝗽𝗲: 𝗔𝗜 𝗶𝗻 𝗦𝗼𝗳𝘁𝘄𝗮𝗿𝗲 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 | 𝗦𝗔𝗚 𝟮𝟬𝟮𝟱 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝘄𝗶𝘁𝗵 𝗩𝗮𝘂𝗴𝗵𝗻 𝗩𝗲𝗿𝗻𝗼𝗻 🤖 # AI is changing software development – but where does real value end a

Vaughn Vernon, in an interview for SAG 2025, discusses the impact of AI on software development, distinguishing between genuine value and hype. He shares insights on LLMs, productivity gains, the issue of hallucinations, and the practical trade-offs developers face when adopting AI tools. The conversation also touches upon developer craftsmanship in the age of AI. AI

IMPACT Discusses the practical implications and trade-offs of AI adoption for software developers.
- Vaughn Vernon
- AI
- LLMs
- SAG 2025