Brief

last 24h

[8/8] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

COMMENTARY · Gary Marcus English(EN) · 5d

Could generative AI turn out to be the tech industry’s Vietnam? And could public backlash lead AI to a better place?

Cognitive scientist Gary Marcus draws a parallel between the current generative AI boom and the Vietnam War, suggesting that massive, costly investments fueled by arrogance may yield little return. He notes a growing public backlash against AI, evidenced by commencement speakers being booed for mentioning the technology. Marcus also points to a potential political shift, with Donald Trump reportedly considering AI pre-flight checks, a move Marcus had previously predicted and advocated for.

IMPACT Public backlash and potential policy shifts could slow AI development or steer it toward safer, more regulated paths.
COMMENTARY · Astral Codex Ten (Scott Alexander) English(EN) · 3d

New Paradigms Won't Save You

Scott Alexander argues that even if Artificial General Intelligence (AGI) requires a new paradigm beyond current Large Language Models (LLMs), such a paradigm could emerge within the next 3-5 years. He uses Lindy's Law to estimate the timeline for revolutionary AI advancements, suggesting that a paradigm shift as significant as the Transformer architecture could appear relatively soon. Alexander contends that the rapid scaling of compute and the increasing number of AI researchers, potentially augmented by AI itself, will accelerate development, making the AGI timeline a near-term concern rather than a distant future event.

IMPACT Argues that AGI development, even with new paradigms, could be a near-term concern, challenging the notion of a distant future for advanced AI.
COMMENTARY · Mastodon — fosstodon.org English(EN) · 5d · [2 sources]

“Could the multitrillion dollar #investment in #AI , burning money at unprecedented rates, and still struggling with #hallucinations , #unreliability and #misal

Trillions of dollars are being invested in AI and spent at an unprecedented pace. Despite this outlay, the technology continues to grapple with fundamental issues such as hallucinations, unreliability, and misalignment. This raises questions about whether generative AI, even with immense funding, could ultimately prove to be a costly error driven by overconfidence.

IMPACT Raises questions about the long-term viability and return on investment for current AI development strategies.
COMMENTARY · r/MachineLearning English(EN) · 10h

The famous METR AI time horizons graph contains numerous severe errors [D]

A recent analysis by Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, has identified numerous severe errors in the widely cited METR AI time horizons graph. These flaws include fabricated human baseline data, hourly pay that incentivized benchmarkers to take longer, a biased sample of human testers, and potential test-training data contamination. Witkin argues that the graph's significant inaccuracies render it unreliable for drawing meaningful conclusions about AI capabilities and their impact on tasks like software development.

IMPACT Critiques of widely cited AI capability graphs highlight the need for rigorous scientific standards and can influence how AI progress is perceived.
RESEARCH · arXiv cs.AI English(EN) · 1w · [2 sources]

Position: Weight Space Should Be a First-Class Generative AI Modality

A new position paper proposes treating neural network checkpoints as a primary data modality for generative AI. The authors argue that synthesizing models in weight space can match fine-tuning performance at a fraction of the cost, leveraging structured regions of weight space. This approach could accelerate the development of AI systems that create or improve other AI systems.

IMPACT Proposes a new paradigm for AI development, potentially reducing costs and accelerating the creation of AI systems by AI.
RESEARCH · Gary Marcus English(EN) · 4d · [2 sources]

Checking the math behind OpenAI and Anthropic’s latest headlines

OpenAI announced that a new, unreleased reasoning model helped solve an 80-year-old mathematical conjecture. The model, which utilizes chain-of-thought reasoning, systematically explored paths that human mathematicians had overlooked. While the result is impressive, some experts caution that it may be more of a marketing demonstration for OpenAI's new model than a significant advancement in AI-assisted mathematics, and suggest that smaller, more specialized models combined with existing tools might be the future.

IMPACT Demonstrates AI's potential in complex problem-solving, though its practical impact on advancing mathematics is debated.
- Cal Newport
- Thomas Bloom
- Anthropic
- OpenAI
- ChatGPT
- Gary Marcus
- Noam Brown
- Paul Erdos
RESEARCH · arXiv cs.AI English(EN) · 5d · [3 sources]

How to Build Marcus's Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields

Researchers have developed a new hyperdimensional computing architecture called PyVaCoAl/VaCoAl, which is built upon an algebraic primitive of XOR-and-shift over GF(2). This architecture aims to fulfill Gary Marcus's three core components for cognitive architectures: operations over variables, recursively structured representations, and a distinction between individuals and kinds. The system supports reversible variable binding, non-commutative compositional bundling, and address-space separation, offering a functional neural substrate that more closely aligns with Marcus's specifications than previous methods.

IMPACT Proposes a novel architecture that may offer a more robust foundation for cognitive computing systems.
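
For readers unfamiliar with XOR-and-shift binding, here is a minimal illustrative sketch of the general idea. It is not the PyVaCoAl/VaCoAl code: the 64-bit width and the names `DIM`, `rotl`, `rotr`, `bind`, and `unbind` are assumptions made for this example. XOR is addition over GF(2), so XOR-ing with the same role twice cancels it, which makes binding reversible; shifting the filler before XOR-ing makes the result depend on argument order.

```python
import random

DIM = 64                 # hypothetical hypervector width in bits
MASK = (1 << DIM) - 1    # keeps values inside DIM bits


def rotl(v, k=1):
    """Cyclically shift a DIM-bit vector left by k positions."""
    k %= DIM
    return ((v << k) | (v >> (DIM - k))) & MASK


def rotr(v, k=1):
    """Cyclically shift a DIM-bit vector right by k positions (inverse of rotl)."""
    k %= DIM
    return ((v >> k) | (v << (DIM - k))) & MASK


def bind(role, filler):
    """Bind a filler to a role: XOR the role with the shifted filler."""
    return role ^ rotl(filler)


def unbind(bound, role):
    """Recover the filler: XOR the role away, then undo the shift."""
    return rotr(bound ^ role)


rng = random.Random(0)   # seeded so the run is repeatable
role = rng.getrandbits(DIM)
filler = rng.getrandbits(DIM)

bound = bind(role, filler)
print(unbind(bound, role) == filler)   # binding is reversible
print(bind(1, 2), bind(2, 1))          # 5 and 0: argument order matters
```

Plain XOR binding on its own would be commutative; the shift is what breaks the symmetry in this sketch. How the paper combines these primitives into bundling and address-space separation is not shown here.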
RESEARCH · arXiv cs.CL English(EN) · 1mo · [16 sources]

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

A new study published on arXiv investigated the hallucination tendencies of four popular LLMs (ChatGPT, Grok, Gemini, and Copilot) when used for academic writing. The research introduced a "Hallucination Index" (HI) and found that Grok and Copilot performed better in reference generation but struggled with abstract prompts, while Gemini and ChatGPT showed better tone control but higher factual hallucination risks. The study concluded that hallucination behavior is influenced by task type and prompting conditions, not solely by model architecture. Separately, Gary Marcus highlighted multiple studies indicating that current LLMs are unreliable for medical advice, often providing inaccurate or fabricated information with high confidence, and should not be used for unsupervised clinical decision-making.

IMPACT LLM hallucinations in academic and medical contexts pose risks of misinformation and unreliable decision-making, highlighting the need for caution and further research.
- DeepSeek
- Copilot
- Cursor
- Grok
- ChatGPT
- Ollama
- llama3.1:8b
- Glia
- Eshaan Nair
- SQLite
- Claude
- Gemini
- CoreWeave
- Nvidia
- Palantir
- Gary Marcus
- arXiv
- Large Language Models
- Nature Medicine
- JAMA Network Open