Brief

last 24h

[12/12] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.CL English(EN) · 20h

Is a Document Educational or Just Wikipedia-Style? -- Pitfalls of Classifier-Based Quality Filtering

Researchers have identified a significant vulnerability in classifier-based quality filtering, a common technique for curating pre-training data for large language models. Their study demonstrates that simple reformatting of content, mimicking Wikipedia's style, can trick these classifiers into misjudging document quality. This could lead to the inclusion of lower-quality data in training corpora, potentially impacting model performance.

IMPACT Highlights a potential flaw in data curation for LLMs, which could impact model quality if not addressed.
TOOL · Mastodon — mastodon.social Italiano(IT) · 8h

Want a personal server that keeps you online even when the network goes down? 🌪️ Project N.O.M.A.D. brings Wikipedia, offline maps, courses, and local AI to any PC

Project N.O.M.A.D. aims to create a personal server that can run essential services like Wikipedia, offline maps, courses, and local AI applications on any computer. This initiative focuses on providing access to information and tools even when internet connectivity is lost. The project seeks to empower users with self-hosted digital resources.

IMPACT Enables offline access to AI models and digital resources, reducing reliance on constant internet connectivity for information and applications.
TOOL · arXiv stat.ML English(EN) · 6d

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

Researchers have introduced TailedTS, a benchmark dataset for evaluating time series forecasting models on heavy-tailed, zero-inflated, non-Gaussian data. Derived from Wikipedia page view data, TailedTS contains approximately 24.69 billion data points. A small percentage of pages receive the majority of views, which makes the dataset a challenging testbed for model robustness. It also supports research into periodicity quantification and standardized prediction benchmarks using non-Gaussian loss functions, and shows that standard estimators perform poorly on high-volume data.

IMPACT Introduces a new dataset to improve the robustness of time series forecasting models against extreme volatility and non-Gaussian distributions.
TOOL · Mastodon — fosstodon.org English(EN) · 2d

PsyPost: AI-generated Grokipedia articles are longer, less readable, and cite fewer sources than their Wikipedia counterparts. “A recent study published in the

A recent study indicates that AI-generated articles on Grokipedia are less readable and cite fewer sources than human-edited Wikipedia articles. The research, published in the Proceedings of the National Academy of Sciences, found that automated encyclopedias tend to produce longer, more complex content. Furthermore, these AI-generated articles may exhibit a rightward political bias in certain subject areas.

IMPACT AI-generated content may lack the quality and neutrality of human-created information, impacting trust and reliability.
TOOL · arXiv cs.CL English(EN) · 6d

Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

Researchers have developed Stage-Audit, a system designed to improve the accuracy and source-grounding of tables generated by large language models. The system addresses the issue of LLMs fabricating or misattributing sources for table entries by implementing distinct curator and auditor roles with write permissions. Stage-Audit also incorporates a row-level source-citation gate and a comprehensive audit taxonomy to ensure explicit traceability of information.

IMPACT Enhances the reliability of LLM-generated structured data, reducing the risk of misinformation and improving data integrity for downstream applications.
COMMENTARY · Mastodon — fosstodon.org English(EN) · 4d

Scrapers vs Wikis: Person who runs a bunch of custom Wiki websites writes about abuse from scrapers https://weirdgloop.org/blog/clankers # via :lobsters # robo

A wiki operator describes the challenges posed by aggressive web scrapers, which the post calls "clankers." These automated bots consume significant server resources and bandwidth, disrupting the normal functioning of the operator's custom wiki sites. The operator highlights the need for better tools or protocols to mitigate the impact of such scraping.

IMPACT Illustrates the load that automated systems, which can include AI-driven scrapers, place on web infrastructure and content creators.
- web scrapers
- Wiki
RESEARCH · arXiv cs.AI English(EN) · 5d · [8 sources]

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Researchers have introduced several new benchmarks and methods for Visual Question Answering (VQA) systems. HyLoVQA proposes a dynamic hypernetwork-generated low-rank adaptation technique for continual VQA, improving adaptation to new tasks and objects. WikiVQABench offers a knowledge-grounded VQA benchmark using Wikipedia and Wikidata, designed to test models requiring external knowledge. Additionally, UCSF-PDGM-VQA focuses on brain tumor MRI interpretation, highlighting current VLM limitations in clinical settings, while RoboSurg-VQA addresses surgical segmentation-aware VQA, and VISTAQA benchmarks joint answer correctness and pixel-level evidence grounding.

IMPACT These new benchmarks and adaptation techniques aim to improve the reliability and capabilities of Vision-Language Models in complex, real-world scenarios.
MEME · Mastodon — mastodon.social English(EN) · 5d · [2 sources]

An #AI Wikipedia clone that openly hallucinates everything: https://futurism.com/artificial-intelligence/deranged-wikipedia-clone-made-entirely-of-ai-halluci

A new AI-powered Wikipedia clone generates entirely fabricated content, a phenomenon known as hallucination. Its nonsensical and inaccurate articles highlight the current limitations of AI in factual content creation, and the project is a stark example of how AI can confidently present false information as fact.

IMPACT Highlights the current unreliability of AI for factual content generation.
- AI
- Wikipedia
MEME · Mastodon — fosstodon.org English(EN) · 4d

@isotopp tested it it looks like it's working but sadly and of course its not. checked the output against the wikipedia page #ai #googlesearch #sloppyficati

A user tested a new AI feature, likely related to Google Search, and found that it appeared to work but did not. Checking the output against the corresponding Wikipedia page revealed errors in the AI-generated information.

IMPACT This is a user-level observation about a specific AI feature's performance, with no clear industry-wide impact.
MEME · Mastodon — sigmoid.social English(EN) · 5d · [2 sources]

RE: https://openbiblio.social/@rstockm/116605905467076274 #KI #AI #Bibliotheken #Libraries #wikipedia #future #zukunft #aufgaben ?

This cluster contains Mastodon posts discussing the intersection of AI, libraries, and Wikipedia. The posts reference conversations about future tasks and IT within these domains, and appear to be replies or cross-posts within ongoing discussions on the Mastodon social network.
- Mastodon
- AI
- libraries
- Wikipedia
MEME · Mastodon — fosstodon.org English(EN) · 12h

@steltenpower Better read about this in Wikipedia than view an overly long video made with #AI https://en.wikipedia.org/wiki/Dutch_East_India_Company#Declin

A user on Mastodon suggested that information about the Dutch East India Company's decline is better found on Wikipedia than through AI-generated videos. This sentiment highlights a preference for traditional, verifiable sources over potentially lengthy or less reliable AI content for historical research.

IMPACT User sentiment suggests a current skepticism towards AI-generated content for detailed historical research.
RESEARCH · arXiv cs.CL English(EN) · 4w · [9 sources]

Translate or Simplify First: An Analysis of Cross-lingual Text Simplification in English and French

Researchers are exploring how large language models (LLMs) align with human brain activity across different languages and tasks. Studies show that intermediate LLM layers best predict brain responses, and this alignment is influenced by training data language dominance rather than inherent model typology. Furthermore, instruction-tuned multimodal LLMs demonstrate stronger brain alignment, particularly when organized around task-specific demands rather than just surface semantics.

IMPACT Investigates how LLMs process and represent information, offering insights into their cognitive alignment and potential for cross-lingual and multimodal tasks.
- LLM
- French
- Wikipedia
- BLEU
- English
- arXiv
- Chinese
- Large Language Models
- LLM-based approaches
- Llama-3.1-8B
- LLMs
- GPT-2 XL
- LLaMA-2-7B
- fMRI
- multimodal LLMs
- Baichuan2-7B
- instruction-tuned multimodal LLMs