PulseAugur / Brief

last 24h
[50/9096] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. RAG Is A Hack - with Jerry Liu from LlamaIndex

    LlamaIndex, a framework for connecting large datasets to language models, has grown rapidly since its launch in October 2022. Originally built as "GPT Tree Index" to work around the limited context window of GPT-3, it has become a leading framework for Retrieval Augmented Generation (RAG). Its open-source community expanded quickly after the release of LlamaHub, which offers more than 200 data connectors for different sources.

  2. Non-engineers guide: Train a LLaMA 2 chatbot

    Hugging Face has published a step-by-step guide that shows people without an engineering background how to train a LLaMA 2 chatbot, using no-code tools such as AutoTrain and ChatUI. By covering the whole process from fine-tuning to a working chat app, it opens chatbot customization to a much broader audience.

  3. Responsible Scaling Policies (RSPs)

    METR (Model Evaluation & Threat Research) has proposed Responsible Scaling Policies (RSPs) as a framework for AI developers to manage the risks associated with increasing AI capabilities. RSPs define specific thresholds for dangerous AI capabilities and outline the protective measures required to safely continue development. While voluntary adoption is encouraged, METR acknowledges that these policies alone may not be sufficient to prevent AI-related catastrophes and suggests they could serve as a precursor to future regulation.

  4. GPT-4V(ision) system card

    OpenAI has released a system card detailing the safety properties of GPT-4V, its model that can analyze image inputs. Adding image understanding to large language models is widely seen as a significant step in AI research that expands their potential applications. The system card describes the evaluations, preparation work, and mitigations OpenAI put in place to handle image inputs safely.

  5. Contributor Spotlight: Mohammad Aflah Khan

    Mohammad Aflah Khan, an undergraduate student at IIIT Delhi, has made significant contributions to EleutherAI's Pythia model suite, focusing on bias evaluation. His work highlights how important open access to training data, methodologies, and computational resources is for AI research. Khan is particularly interested in AI-assisted tools and in the challenges of maintaining data quality and keeping models up to date as the world changes.

  6. The third New England RLHF Hackers Hackathon

    The New England RLHF Hackers (NERH) group, made up mostly of EleutherAI collaborators, held its third hackathon focused on Reinforcement Learning from Human Feedback (RLHF). Projects explored training models with Inverse Learning from Q-learning, aligning LLMs with idealized reward models instead of human preferences, and visualizing reward model behavior with techniques such as Quality-Diversity through AI Feedback (QDAIF). Another project used Sparse Autoencoders to identify features inside reward models that influence their scores, revealing potential biases against certain topics such as politics or pregnancy. The group also discussed ways to evaluate reward models directly, independent of the full RLHF training process.

  7. Advancing red teaming with people and AI

    OpenAI has announced new initiatives to enhance AI safety through red teaming, a process of using people and AI to identify potential risks in new systems. The company is sharing two papers detailing their approach to external red teaming and introducing a new method for automated red teaming. Additionally, OpenAI is launching a Red Teaming Network to formally recruit domain experts from diverse backgrounds to collaborate on evaluating and improving the safety of their AI models throughout the development lifecycle.

  8. 3D Skew-Normal Splatting

    Researchers are advancing 3D Gaussian Splatting (3DGS) with new methods for improved scene representation, editing, and compression. Innovations include Skew-Normal Splatting for better modeling of asymmetric structures, and PanoWorld for generating consistent multi-room VR tours. Other developments focus on physics-driven scene editing for autonomous driving, aesthetic assessment of 3DGS content, and efficient compression techniques like GETA-3DGS.

    IMPACT Advances in 3DGS offer improved realism and efficiency for applications in VR, autonomous driving, and content creation.
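
    As a rough illustration of what "skew-normal" adds over a Gaussian, the sketch below evaluates a one-dimensional skew-normal density, in which a shape parameter alpha tilts probability mass to one side of the center. It is a minimal sketch under stated assumptions: it is 1D only, the helper names are hypothetical, and it does not reproduce the 3D parameterization or rendering used in the Skew-Normal Splatting work.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x: float, loc: float = 0.0, scale: float = 1.0, alpha: float = 0.0) -> float:
    """1D skew-normal density: (2 / scale) * phi(z) * Phi(alpha * z), z = (x - loc) / scale.

    alpha = 0 recovers the ordinary Gaussian; alpha > 0 shifts mass to the right
    of loc and alpha < 0 shifts it to the left.
    """
    z = (x - loc) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(alpha * z)


def mass_right_of(loc: float, scale: float, alpha: float,
                  step: float = 1e-3, span: float = 12.0) -> float:
    """Integrate the density over [loc, loc + span * scale] with the midpoint rule."""
    n = round(span * scale / step)
    return step * sum(skew_normal_pdf(loc + (i + 0.5) * step, loc, scale, alpha)
                      for i in range(n))


if __name__ == "__main__":
    for a in (0.0, 4.0, -4.0):
        print(f"alpha={a:+.0f}: mass right of loc = {mass_right_of(0.0, 1.0, a):.3f}")
```

    With alpha = 0 the density is an ordinary Gaussian and half the mass lies on each side of the center; for the standard skew-normal the mass above the center is 1/2 + arctan(alpha)/pi, so alpha = 4 puts about 92% of it on one side. A symmetric Gaussian primitive cannot express that kind of asymmetry on its own.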

  9. Object Detection Leaderboard

    Hugging Face has launched a new leaderboard specifically for object detection models. This platform aims to track and rank the performance of various models in this computer vision task. It provides a centralized resource for researchers and developers to compare different object detection approaches and identify state-of-the-art solutions.

  10. Fine-tuning Llama 2 70B using PyTorch FSDP

    Hugging Face has published a guide to fine-tuning Meta's Llama 2 70B model with PyTorch's Fully Sharded Data Parallel (FSDP). FSDP shards model parameters, gradients, and optimizer states across GPUs, so no single device has to hold the full model, which sharply reduces per-GPU memory requirements and makes fine-tuning a 70B model feasible on a multi-GPU cluster. The guide emphasizes efficient training techniques that make customizing large language models practical for a wider range of users and researchers.
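
    The benefit of sharding can be estimated with back-of-the-envelope arithmetic. The sketch below assumes full fine-tuning with mixed-precision Adam at roughly 16 bytes per parameter and a hypothetical cluster of 16 GPUs; it counts only weights, gradients, and optimizer state and ignores activations and buffers, so real memory use is higher.

```python
# Assumed accounting for full fine-tuning with mixed-precision Adam:
# 2 bytes (bf16 weights) + 2 bytes (bf16 gradients)
# + 12 bytes (fp32 master weights, Adam first and second moments) per parameter.
BYTES_PER_PARAM = 2 + 2 + 12


def training_state_gb(num_params: float, num_gpus: int = 1, sharded: bool = True) -> float:
    """Per-GPU memory (GB) for weights, gradients, and optimizer state.

    With full sharding (as in FSDP's FULL_SHARD strategy or ZeRO stage 3) the
    state is split evenly across GPUs; without sharding every GPU holds a full
    copy. Activations, temporary buffers, and fragmentation are ignored.
    """
    total_bytes = num_params * BYTES_PER_PARAM
    per_gpu_bytes = total_bytes / num_gpus if sharded else total_bytes
    return per_gpu_bytes / 1e9


if __name__ == "__main__":
    params = 70e9  # Llama 2 70B
    num_gpus = 16  # hypothetical cluster size
    print(f"unsharded: {training_state_gb(params, num_gpus, sharded=False):,.0f} GB per GPU")
    print(f"sharded over {num_gpus} GPUs: {training_state_gb(params, num_gpus):,.0f} GB per GPU")
```

    For 70 billion parameters this gives about 1,120 GB per GPU without sharding versus about 70 GB per GPU when the state is split 16 ways, which illustrates why sharding matters at this scale.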

  11. Diffusion Models for Video Generation

    Researchers are exploring advanced diffusion models for video generation, addressing challenges like temporal consistency and data scarcity. New methods focus on improving parameterization, such as the v-prediction technique, and incorporating conditional sampling for tasks like extending video length or filling missing frames. Efforts are also underway to enhance efficiency and controllability through post-training frameworks, hybrid attention mechanisms, and semantic-visual adaptation, aiming for real-time generation and higher quality outputs.

    IMPACT Advances in diffusion models are improving video generation quality, efficiency, and controllability, potentially enabling new applications in content creation and analysis.
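
    To make "v-prediction" concrete, the sketch below shows one common form of the parameterization on plain Python lists. It assumes a variance-preserving schedule with alpha = cos(phi) and sigma = sin(phi); the network is trained to predict v = alpha * eps - sigma * x instead of the noise eps, and both the clean sample and the noise can be recovered from v. The function names are hypothetical and no model is included.

```python
import math


def diffuse(x, eps, phi):
    """Noisy sample z = alpha * x + sigma * eps with alpha = cos(phi), sigma = sin(phi)."""
    alpha, sigma = math.cos(phi), math.sin(phi)
    return [alpha * xi + sigma * ei for xi, ei in zip(x, eps)]


def v_target(x, eps, phi):
    """Training target for v-prediction: v = alpha * eps - sigma * x."""
    alpha, sigma = math.cos(phi), math.sin(phi)
    return [alpha * ei - sigma * xi for xi, ei in zip(x, eps)]


def x_from_v(z, v, phi):
    """Recover the clean sample from a predicted v: x = alpha * z - sigma * v."""
    alpha, sigma = math.cos(phi), math.sin(phi)
    return [alpha * zi - sigma * vi for zi, vi in zip(z, v)]


def eps_from_v(z, v, phi):
    """Recover the noise from a predicted v: eps = sigma * z + alpha * v."""
    alpha, sigma = math.cos(phi), math.sin(phi)
    return [sigma * zi + alpha * vi for zi, vi in zip(z, v)]


if __name__ == "__main__":
    x = [0.5, -1.2, 2.0]    # toy "clean" data, e.g. a few pixel values
    eps = [0.3, 0.8, -0.4]  # a fixed noise sample, for reproducibility
    phi = 0.9               # schedule position: 0 = clean, pi/2 = pure noise
    z = diffuse(x, eps, phi)
    v = v_target(x, eps, phi)  # what the network would learn to predict
    print(x_from_v(z, v, phi))
    print(eps_from_v(z, v, phi))
```

    Note that x_from_v never divides by alpha. Recovering x from a noise prediction instead requires x = (z - sigma * eps) / alpha, which becomes unstable near pure noise where alpha approaches 0; this is one commonly cited reason for preferring the v-parameterization.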

  12. Spread Your Wings: Falcon 180B is here

    The Technology Innovation Institute (TII) has released Falcon 180B, a 180-billion-parameter large language model, on Hugging Face. The model is available for both research and commercial use and performs strongly on a range of benchmarks, positioning it as a significant openly available alternative in the LLM landscape.

  13. Evaluation & Hallucination Detection for Abstractive Summaries

    Evaluating abstractive summarization, which rephrases source material rather than copying sentences from it, is challenging, particularly when it comes to relevance and factual consistency. Modern language models largely handle fluency and coherence, but measuring relevance remains subjective. Detecting factual inconsistencies, or hallucinations, is a key focus, with studies reporting factual errors in up to 30% of generated summaries on the CNN/DailyMail dataset. Common evaluation methods include n-gram-based metrics such as ROUGE and embedding-based metrics, while hallucination detection often relies on natural language inference (NLI) and question answering (QA).
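
    The n-gram overlap behind ROUGE can be sketched in a few lines. This minimal version assumes lowercased whitespace tokenization and omits the stemming and other preprocessing that standard ROUGE packages apply, so its numbers will not match official implementations exactly.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams after lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count of each shared n-gram
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "The cat sat on the mat"
    for candidate in ("The cat lay on the mat", "The dog sat on the mat"):
        print(candidate, rouge_n(candidate, reference, 1)["f1"], rouge_n(candidate, reference, 2)["f1"])
```

    For the reference "The cat sat on the mat", the candidate "The cat lay on the mat" scores a ROUGE-1 F1 of 5/6 and a ROUGE-2 F1 of 0.6. The candidate "The dog sat on the mat" also reaches a ROUGE-1 F1 of 5/6 despite naming the wrong animal, which illustrates why overlap metrics are paired with NLI- or QA-based checks for hallucination detection.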

  14. Code Llama: Llama 2 learns to code

    Meta AI has released Code Llama, a family of large language models designed for coding tasks. Built on Llama 2, the models come in 7B, 13B, and 34B parameter sizes. The family also includes a Python-specialized variant and an instruction-following variant, both aimed at improving code generation and understanding.

  15. The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI

    Quentin Anthony of EleutherAI discussed the mathematics of training large language models in a recent podcast. He stressed the importance of understanding compute requirements and introduced a core equation, C ≈ 6PD, that relates total compute (C) to the number of model parameters (P) and the dataset size in tokens (D). The discussion also covered practical aspects such as GPU tradeoffs, model precision, memory optimization techniques like activation recomputation, and distributed training strategies like ZeRO.

  16. Fine-tune Llama 2 with DPO

    Hugging Face has published a guide to fine-tuning Llama 2 with Direct Preference Optimization (DPO) using the DPOTrainer in its TRL library. DPO trains directly on pairs of preferred and rejected responses, without a separate reward model or a reinforcement learning loop, which makes training simpler and more stable than traditional RLHF. The guide shows how developers can add DPO to their existing fine-tuning workflows for models like Llama 2.
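
    The DPO objective itself is compact. The sketch below computes the loss for a single preference pair from the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model. It illustrates the published formula and is not TRL's implementation; beta = 0.1 is an assumed value.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under either
    the policy being trained or the frozen reference model:
    loss = -log sigmoid(beta * [log-ratio(chosen) - log-ratio(rejected)])
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    return softplus(-logits)  # -log(sigmoid(z)) == softplus(-z)


if __name__ == "__main__":
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference
    print(dpo_loss(-8.0, -12.0, -10.0, -12.0))   # chosen response became more likely
```

    When the policy still equals the reference model, every pair scores log 2 (about 0.693). The loss falls as the policy raises the chosen response's likelihood, relative to the reference, more than the rejected one's, which is how DPO learns from preferences without training a separate reward model.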

  17. Towards Encrypted Large Language Models with FHE

    Researchers have shown how large language model computations can be run under fully homomorphic encryption (FHE), which allows computation on encrypted data without decrypting it. This could enable privacy-preserving AI applications in which sensitive user data is processed while it stays encrypted. The approach integrates FHE techniques with existing LLM architectures, pointing the way toward confidential AI services.

  18. Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny

    The code and weights for SD-Small and SD-Tiny, two smaller versions of Stable Diffusion, have been open-sourced on Hugging Face. The models were created with knowledge distillation, a technique that trains a smaller model to mimic the behavior of a larger one. The goal is to make powerful image generation models more accessible and efficient for researchers and developers.

  19. Building Secure AI Gateways with MLflow AI Gateway

    Google Research has introduced ReasoningBank, a novel framework designed to enhance AI agents' ability to learn from their experiences, both successes and failures, after deployment. This system distills generalizable reasoning strategies from past interactions, allowing agents to continuously improve and avoid repeating mistakes. Separately, new research explores optimizing multi-agent communication through latent representations and introduces Agent Evolving Learning (AEL) for agents operating in open-ended environments, focusing on how to effectively use remembered information. Additionally, DeepSeek has released preview models of its V4 series, offering large context windows and advanced capabilities at a significantly lower cost than comparable frontier models.

    IMPACT New frameworks for agent learning and memory, alongside cost-effective frontier models, could accelerate AI adoption in complex tasks and personalized applications.

  20. FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

    Tri Dao, a recent Stanford PhD graduate and lead author of the FlashAttention paper, discussed advances in Transformer attention on the Latent Space podcast. FlashAttention, first released in May 2022, speeds up Transformers without approximating attention by computing it in tiles that fit in fast on-chip SRAM, which cuts reads and writes to the slower GPU high-bandwidth memory (HBM). The newly released FlashAttention-2 improves on this further, and FlashAttention has become a standard component of many open-source large language models.
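
    The trick that lets FlashAttention avoid materializing the full attention matrix is an "online" softmax: keys and values are processed block by block, and a running maximum and running denominator are rescaled as each new block arrives. The pure-Python sketch below shows only that arithmetic for a single query; it is not the CUDA kernel and models neither SRAM tiling nor the backward pass, and the function names are hypothetical.

```python
import math


def attention_reference(q, keys, values):
    """Standard softmax attention for one query: computes every score at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def attention_tiled(q, keys, values, block_size=2):
    """Same result, visiting keys/values one block at a time (online softmax).

    Only a running max, a running softmax denominator, and an output
    accumulator survive between blocks, so the full row of scores is never stored.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [0.2, -0.5, 1.0]
    keys = [[0.1, 0.3, -0.2], [1.0, 0.0, 0.5], [-0.7, 0.2, 0.9], [0.4, -1.1, 0.3]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0]]
    print(attention_reference(q, keys, values))
    print(attention_tiled(q, keys, values, block_size=2))
```

    The tiled version returns the same output as the reference for any block size, up to floating-point rounding, while holding only one block of scores at a time.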

  21. There's a new Llama in town

    Meta AI has released Llama 2, a new large language model expected to have a significant impact on the LLM landscape. The hosts also discussed Zip-NeRF, a separate NeRF model from Google that reconstructs 3D scenes from 2D images, and compared new OpenAI features with Anthropic's Claude 2.

  22. Welcoming Llama Guard 4 on Hugging Face Hub

    Meta's Llama Guard 4, a safety classifier for moderating the inputs and outputs of models in the Llama 4 generation, is now available on the Hugging Face Hub. It joins the Llama 4 models Scout and Maverick, which are also available on the Hub. Hosting these models on the Hub makes them easier for the AI community to access and experiment with.

  23. Minetester: A fully open RL environment built on Minetest

    EleutherAI has developed Minetester, an open-source reinforcement learning environment built on the Minetest voxel engine. This new environment aims to facilitate AI alignment research by offering greater transparency and customizability compared to existing Minecraft-based frameworks like MineRL. Minetester provides features such as programmatic reward relay, client-server synchronization for deterministic gameplay, headless operation, and a Python client wrapper for integration with modern ML frameworks.

  24. Ethics and Society Newsletter #4: Bias in Text-to-Image Models

    A recent Hugging Face newsletter highlights significant biases present in text-to-image AI models. These biases can lead to the perpetuation of harmful stereotypes and underrepresentation of certain demographic groups. The newsletter emphasizes the need for ongoing research and development to mitigate these ethical concerns and promote fairer AI systems.

  25. What's going on with the Open LLM Leaderboard?

    Hugging Face investigated why MMLU scores on its Open LLM Leaderboard, such as LLaMA 65B's, differed from those reported in papers. MMLU is a multiple-choice benchmark covering 57 subjects. The post compares three implementations of the benchmark (the original code, Stanford's HELM, and EleutherAI's LM Evaluation Harness) and shows that differences in prompt format and in how the model's answer is scored produce noticeably different results, which can even change how open-source models rank against each other.

  26. LLM Powered Autonomous Agents

    Lilian Weng's blog post details the architecture of LLM-powered autonomous agents, highlighting key components like planning, memory, and tool use. The post explains how agents can break down complex tasks, reflect on past actions for improvement, and utilize external tools or vector stores for information retrieval. Techniques such as Chain of Thought and Tree of Thoughts are discussed for task decomposition, while ReAct is presented as a method for integrating reasoning and action.

  27. Fine-Tune MMS Adapter Models for low-resource ASR

    Hugging Face has published a guide to fine-tuning adapter weights for Meta's MMS (Massively Multilingual Speech) speech recognition models. Because only small adapter layers are trained for each language, the approach can improve recognition for low-resource languages with limited training data, making ASR technology more accessible and effective for a wider range of linguistic communities.

  28. Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)

    Responding to claims that simple linear models outperform Transformers at time series forecasting, researchers have shown that Transformer models can be effective for the task. The Autoformer architecture, designed specifically for forecasting, performs strongly by decomposing a time series into seasonal and trend components, which helps it capture complex temporal dependencies and make more accurate predictions.
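
    The seasonal/trend split can be sketched as a moving average: the smoothed series is taken as the trend, and the remainder is the seasonal component. The sketch below assumes a centered window with the ends padded by repeating the first and last values; the default kernel_size of 25 is an assumption, and Autoformer's auto-correlation mechanism and learned layers are not shown.

```python
def moving_average(series, kernel_size=25):
    """Centered moving average; the ends are padded by repeating the first/last value."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the window is centered")
    pad = kernel_size // 2
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(series))]


def series_decomp(series, kernel_size=25):
    """Split a series into (seasonal, trend), with trend = moving average."""
    trend = moving_average(series, kernel_size)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


if __name__ == "__main__":
    pattern = [2.0, -1.0, 0.0, 1.0, -2.0]  # repeating pattern with period 5, sums to 0
    series = [0.5 * t + pattern[t % 5] for t in range(20)]
    seasonal, trend = series_decomp(series, kernel_size=5)
    print([round(v, 2) for v in trend])
    print([round(v, 2) for v in seasonal])
```

    Applied to a linear trend plus a repeating pattern whose period matches the window, the moving average recovers the linear trend away from the edges and leaves the pattern in the seasonal component.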

  29. Can foundation models label data like humans?

    As part of its work around the Open LLM Leaderboard, Hugging Face has examined whether large language models can label data as accurately as human annotators. Using LLMs this way could make data annotation for training AI models much faster and cheaper. The blog post weighs the potential and the challenges of the approach, comparing LLM labels with those from human annotators.

  30. The Falcon has landed in the Hugging Face ecosystem

    The Falcon large language model has been integrated into the Hugging Face ecosystem. This integration makes the model more accessible to developers and researchers. Falcon is known for its strong performance on various benchmarks and its open-source nature.

  31. Improving mathematical reasoning with process supervision

    OpenAI has developed a method called process supervision to improve the mathematical reasoning of AI models. Instead of rewarding only the final answer, it rewards each correct step of the problem-solving process, which improves performance and reduces hallucinations. The company found that process supervision not only improves accuracy but also has alignment benefits, because it directly trains models to produce reasoning chains that humans endorse. OpenAI has released its dataset of step-level labels, PRM800K, to encourage further research into the approach.
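
    A process reward model outputs a probability that each step is correct, and those step scores have to be combined to rank whole solutions. The sketch below assumes a product rule, scoring a solution as the probability that every step is correct, and uses hypothetical reward-model outputs; it illustrates best-of-N selection and is not OpenAI's training code.

```python
import math


def solution_score(step_probs):
    """Probability that every step is correct, treating per-step scores as independent."""
    return math.prod(step_probs)


def pick_best(candidates):
    """Best-of-N selection: keep the candidate with the highest solution score."""
    return max(candidates, key=lambda c: solution_score(c["step_probs"]))


if __name__ == "__main__":
    # Hypothetical per-step probabilities from a process reward model.
    candidates = [
        {"answer": "42", "step_probs": [0.98, 0.95, 0.97]},
        {"answer": "41", "step_probs": [0.99, 0.40, 0.99]},  # one weak step
    ]
    for c in candidates:
        print(c["answer"], round(solution_score(c["step_probs"]), 3))
    print("selected:", pick_best(candidates)["answer"])
```

    Here one weak step (0.40) drops the second candidate's score to about 0.39, so the first candidate (about 0.90) is selected even though the second one's other steps look confident. An outcome-only reward would not distinguish where the second solution went wrong.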

  32. Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

    Researchers are developing advanced quantization techniques to make large language models (LLMs) more efficient. New methods such as Intel's AutoRound, LATMiX, and GSQ aim to reduce model size and computational requirements, enabling deployment on less powerful hardware. These approaches optimize how model weights and activations are represented at lower bit-widths, with some achieving accuracy comparable to higher-precision models. Innovations include new calibration strategies for post-training quantization and learnable affine transformations that improve robustness.

    IMPACT Enables more efficient deployment of LLMs on resource-constrained devices, potentially lowering inference costs and increasing accessibility.
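
    For context, the simplest baseline these methods improve on is round-to-nearest (RTN) quantization. The sketch below shows symmetric per-tensor RTN at 4 bits with hypothetical function names; it is not AutoRound, which instead tunes rounding decisions and clipping ranges with signed gradient descent to reduce the error of each block's output.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric per-tensor round-to-nearest quantization.

    Returns integer codes in [-(2**(bits-1)), 2**(bits-1) - 1] plus the scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; the clamp guards against float overshoot.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.7, 0.33, 0.05, -0.21]
    codes, scale = quantize_rtn(weights, bits=4)
    restored = dequantize(codes, scale)
    print(codes, round(scale, 4))
    print([round(w - r, 4) for w, r in zip(weights, restored)])
```

    Every RTN code is within half a quantization step of its weight, but RTN decides each weight independently; methods such as AutoRound aim for better accuracy by choosing roundings that jointly minimize output error.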

  33. MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML

    MosaicML has released MPT-7B, an open-source, commercially licensed transformer trained on one trillion tokens that matches the quality of LLaMA-7B. Alongside it come three fine-tuned variants, including MPT-7B-StoryWriter-65k+ for long-form storytelling, which was trained with a 65k-token context and can handle inputs of up to about 84,000 tokens, far beyond the context limits of models like GPT-3. MosaicML also open-sourced LLM Foundry, the codebase it used for training and evaluation.

  34. Creating instruction tuned models

    Erin Mikail Staples discussed the creation of instruction-tuned Large Language Models at ODSC East. The conversation focused on the critical role of human feedback in this process. Staples also highlighted the significance of open data and practical tools for data annotation and fine-tuning custom generative AI models.

  35. Large-scale Near-deduplication Behind BigCode

    Hugging Face has detailed the large-scale near-deduplication pipeline behind the BigCode project's training data, a crucial step in preparing massive datasets for large language models. The method uses MinHash signatures and locality-sensitive hashing (LSH) to find and remove near-duplicate files, which improves training efficiency and model performance. The blog post outlines the technical challenges of processing data at this scale and how they were solved.
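
    The idea behind MinHash deduplication can be sketched briefly: each document becomes a set of shingles, each set is compressed into a signature of minimum hash values, and LSH over bands of the signature proposes candidate near-duplicate pairs. The sketch below is illustrative only; the word 3-gram shingles, the 128 hash functions, the 32 bands, and the use of seeded BLAKE2 hashes in place of permutations are assumptions, not BigCode's settings.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text: str, n: int = 3) -> set:
    """Word n-grams of a document (n = 3 is an arbitrary choice here)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}


def _hash(seed: int, item: str) -> int:
    """64-bit hash of an item; each seed acts as a different hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(items: set, num_perm: int = 128) -> list:
    """MinHash signature: the minimum hash value of the set under each hash function."""
    return [min(_hash(seed, s) for s in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of matching positions estimates the Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures: dict, bands: int = 32) -> set:
    """Split signatures into bands; documents sharing any whole band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog near the river bank today",
        "b": "the quick brown fox jumps over the lazy dog near the river bank tonight",
        "c": "completely different words appear in this other unrelated sentence here",
    }
    sigs = {doc_id: minhash(shingles(text)) for doc_id, text in docs.items()}
    print("estimated J(a, b):", estimate_jaccard(sigs["a"], sigs["b"]))
    print("candidate pairs:", lsh_candidate_pairs(sigs))
```

    Documents "a" and "b" differ only in the last word, so they share 11 of 13 shingles (Jaccard about 0.85); their signatures agree in roughly that fraction of positions, and they almost always collide in at least one band, while the unrelated document essentially never does. In a real pipeline, candidate pairs would then be verified or clustered before removal.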

  36. RWKV: Reinventing RNNs for the Transformer Era — with Eugene Cheah of UIlicious

    The RWKV (Receptance Weighted Key Value) project introduces a novel architecture that revives Recurrent Neural Networks (RNNs) while incorporating advantages typically found in Transformers. This approach aims to overcome the scaling limitations of traditional Transformers, particularly in training and inference, while maintaining competitive performance on reasoning benchmarks. The RWKV project is characterized by its distributed, international, and largely volunteer-driven community, drawing parallels to early EleutherAI efforts.

  37. Why language models hallucinate

    OpenAI has published research on two key challenges in large language models: hallucinations and interpretability. Its paper on hallucinations argues that current evaluation methods reward models for guessing rather than admitting uncertainty, which leads to confident but false statements. To counter this, the authors propose penalizing confident errors more heavily than expressions of uncertainty. Separately, OpenAI has developed a method that uses GPT-4 to automatically generate and score natural language explanations of what individual neurons in a language model do, and has released a dataset of such explanations for GPT-2 to support interpretability research.
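
    The incentive argument can be made concrete with a scoring rule of the kind the hallucination paper discusses: a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs t/(1 - t) points for a stated confidence threshold t. The sketch below treats that rule as an assumption and compares it with accuracy-only grading; the function names are hypothetical.

```python
def expected_score(p_correct: float, threshold: float) -> float:
    """Expected score of answering when correct = +1, wrong = -t/(1-t), abstain = 0."""
    penalty = threshold / (1.0 - threshold)
    return p_correct - (1.0 - p_correct) * penalty


def should_answer(p_correct: float, threshold: float) -> bool:
    """Answering beats abstaining (score 0) exactly when p_correct > threshold."""
    return expected_score(p_correct, threshold) > 0.0


def binary_expected_score(p_correct: float) -> float:
    """Accuracy-only grading: wrong answers and abstentions both score 0."""
    return p_correct


if __name__ == "__main__":
    t = 0.75  # a wrong answer costs 3 points
    for p in (0.9, 0.75, 0.5, 0.1):
        print(f"p={p}: penalized={expected_score(p, t):+.2f}, "
              f"binary={binary_expected_score(p):.2f}, answer={should_answer(p, t)}")
```

    Under accuracy-only grading, guessing never has a negative expected score, so a model is never rewarded for abstaining. With t = 0.75, answering at 90% confidence is worth about +0.6, answering at 50% confidence is worth -1.0, and abstaining becomes the better choice whenever confidence is below t.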

  38. Creating a Coding Assistant with StarCoder

    Hugging Face has shown how to turn StarCoder, the BigCode code-generation model trained on a massive dataset of permissively licensed GitHub code, into a conversational coding assistant. The resulting model, StarChat, is StarCoder fine-tuned on dialogue data, and it aims to give developers a capable, accessible tool for a range of coding tasks.

  39. Automating code optimization with LLMs

    Researchers are exploring various methods to enhance Large Language Models (LLMs) for code-related tasks. One study evaluates locally deployed LLMs like LLaMA 3.2 and Mistral for Python bug detection, finding they can identify bugs but struggle with precise localization. Another paper introduces TreeCoder, a framework to optimize LLM code generation by treating decoding strategies and constraints as optimizable components, improving accuracy on benchmarks like MBPP and SQL-Spider. Additionally, a case study at BMW demonstrates how fine-tuning LLMs like Qwen2.5-Coder and DeepSeek-Coder can generate and modify enterprise domain-specific languages across multiple files. Finally, a new approach called CAT uses call-chain awareness to improve LLM-based unit test generation for Java projects, significantly boosting code coverage.

    IMPACT Advances in LLM code generation and analysis techniques could lead to more robust and efficient software development tools.

  40. Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit

    Replit has open-sourced its new code-focused large language model, replit-code-v1-3b. This model, which is significantly smaller than OpenAI's Codex, reportedly outperforms it on the HumanEval benchmark when fine-tuned on Replit's data. The release was discussed in an interview with Replit's Head of AI, Reza Shabani, who detailed the journey of training the model and its potential applications for developers.

  41. Transformer Math 101

    EleutherAI has published a blog post on the basic equations that govern the cost of training transformer language models. The post explains that compute requirements are determined mainly by the number of parameters and the dataset size, summarized in the key formula C ≈ τT = 6PD, where τ is the aggregate throughput of the hardware and T is the training time. It also discusses "compute-optimal" models, citing the Chinchilla scaling laws, under which the dataset size in tokens is about 20 times the number of parameters, and offers practical engineering takeaways for estimating and optimizing these costs.
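
    The formulas can be turned into a quick estimate. The sketch below applies D ≈ 20P and C ≈ 6PD to a 7B-parameter model and converts compute into wall-clock time with T = C / τ; the 64 GPUs and the achieved throughput of 150 TFLOP/s per GPU are hypothetical numbers chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def compute_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Chinchilla-style rule of thumb: D ≈ 20 P."""
    return tokens_per_param * params


def training_flops(params: float, tokens: float) -> float:
    """C ≈ 6PD: roughly 2PD FLOPs for the forward pass plus 4PD for the backward pass."""
    return 6.0 * params * tokens


def training_days(flops: float, num_gpus: int, achieved_flops_per_gpu: float) -> float:
    """T = C / tau, where tau = (number of GPUs) x (achieved FLOP/s per GPU)."""
    tau = num_gpus * achieved_flops_per_gpu
    return flops / tau / SECONDS_PER_DAY


if __name__ == "__main__":
    params = 7e9                           # 7B-parameter model
    tokens = compute_optimal_tokens(params)
    flops = training_flops(params, tokens)
    days = training_days(flops, num_gpus=64, achieved_flops_per_gpu=150e12)  # hypothetical cluster
    print(f"tokens: {tokens:.2e}, FLOPs: {flops:.2e}, days: {days:.1f}")
```

    This gives about 1.4 × 10^11 training tokens, 5.88 × 10^21 FLOPs, and roughly 7.1 days of training on the assumed cluster. Real runs also depend on how much of a GPU's peak throughput is actually achieved, which is why τ uses achieved rather than advertised FLOP/s.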

  42. Graph Classification with Transformers

    Hugging Face has published a blog post on performing graph classification with Transformer models, using the Graphormer model available in its Transformers library. The post is a practical guide to applying Transformers to graph-structured data, an approach that could open new avenues for deep learning in domains where graph data is common.

  43. Segment Anything Model and the Hard Problems of Computer Vision — with Joseph Nelson of Roboflow

    Meta AI has released its Segment Anything Model (SAM), a significant advancement in computer vision, which includes the model, weights, data, and a demo website. This open-source release is notable for its extensive dataset, containing significantly more images and masks than previous datasets. The podcast features Joseph Nelson of Roboflow discussing SAM's capabilities, including its zero-shot transfer and promptability, and demonstrating its integration into Roboflow's platform. The discussion also touches upon the broader landscape of multimodal AI and the remaining challenges in computer vision.

  44. Making LLMs more accurate by using all of their layers

    Google Research has introduced SLED, a decoding strategy that improves LLM factuality by drawing on all of a model's layers rather than only the final one, without requiring external data or fine-tuning. Separately, Google Research has developed a framework for evaluating how well the behavior of large language models aligns with human social inclinations. It adapts established psychological questionnaires into large-scale situational judgment tests that quantify model tendencies in realistic scenarios. The research identifies cases where model behavior deviates from human consensus or fails to capture the range of human opinion, with the aim of helping LLMs navigate social dynamics.

    IMPACT New methods for evaluating LLM alignment and improving factuality could lead to more trustworthy and socially adept AI systems.

  45. StackLLaMA: A hands-on guide to train LLaMA with RLHF

    Hugging Face has released StackLLaMA, a hands-on guide and accompanying model that show how to train LLaMA with RLHF using its TRL library. A 7B-parameter LLaMA model is trained to answer Stack Exchange questions in three stages: supervised fine-tuning, training a reward model on preferences derived from Stack Exchange answers, and reinforcement learning with PPO. The release aims to give the AI community an accessible, end-to-end recipe for RLHF.

  46. Complex Systems are Hard to Control

    Deep learning systems are complex adaptive systems, similar to ecosystems or financial markets, making them difficult to control through traditional engineering approaches. These systems exhibit emergent behaviors and feedback loops, leading to unintended consequences when straightforward attempts are made to guide their actions. The author suggests that safety measures must account for this complex adaptive nature, moving beyond simple reliability and redundancy.

  47. Exploratory Analysis of TRLX RLHF Transformers with TransformerLens

    Researchers have demonstrated a method for training and analyzing language models using Reinforcement Learning from Human Feedback (RLHF). The process involves using the TRLX library for RLHF fine-tuning and TransformerLens for mechanistic interpretability. This approach was used to fine-tune a GPT-2 model to generate negatively biased movie reviews and then analyze the model to identify specific network regions responsible for this behavior.

  48. ARC Evals is now METR

    The Alignment Research Center's (ARC) evaluation team has officially spun off to form a new independent nonprofit organization named METR (Model Evaluation & Threat Research). METR will continue its work on evaluating frontier AI systems, focusing on their autonomous capabilities and potential threats, including AI self-improvement and evasion of oversight. The organization, led by Beth Barnes, has previously partnered with leading AI labs like OpenAI and Anthropic for evaluations and aims to develop rigorous testing methodologies to ensure AI safety before widespread deployment.

  49. GPTs are GPTs: An early look at the labor market impact potential of large language models

    OpenAI has released a report analyzing the potential impact of large language models (LLMs) on the U.S. labor market. The study suggests that around 80% of the workforce could have at least 10% of their work tasks affected by LLMs, and that about 19% of workers could see at least half of their tasks affected. This exposure is not confined to particular income brackets or industries, indicating broad economic and social implications.

  50. Prompt Engineering

    Prompt engineering, also known as in-context prompting, means steering a Large Language Model (LLM) toward a desired outcome without changing its weights. It is a largely empirical field, focused on autoregressive language models, whose goals are better alignment and steerability. Basic techniques include zero-shot learning, where the model is given the task directly, and few-shot learning, where the prompt includes a few worked examples that guide the model's understanding and output.
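
    The difference between zero-shot and few-shot prompting comes down to what goes into the text the model sees. The sketch below builds both kinds of prompt; the Input/Output layout, the function name, and the sentiment example are assumptions for illustration, not a format any particular model requires.

```python
def build_prompt(instruction: str, query: str, examples=()) -> str:
    """Assemble a zero-shot (no examples) or few-shot prompt as plain text."""
    lines = [instruction.strip(), ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    lines += [f"Input: {query}", "Output:"]  # the model continues from here
    return "\n".join(lines)


if __name__ == "__main__":
    instruction = "Classify the sentiment of the movie review as positive or negative."
    query = "A tedious, overlong mess."
    print(build_prompt(instruction, query))  # zero-shot
    print("---")
    print(build_prompt(instruction, query, examples=[
        ("I loved every minute of it.", "positive"),
        ("The plot made no sense.", "negative"),
    ]))  # few-shot
```

    Calling build_prompt without examples yields a zero-shot prompt; passing labeled pairs prepends them as demonstrations, and the trailing "Output:" leaves the model to complete the answer.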
