Wikipedia
PulseAugur coverage of Wikipedia — every cluster mentioning Wikipedia across labs, papers, and developer communities, ranked by signal.
- founded by Jimmy Wales 100%
- developed by Jimmy Wales 100%
- authored by Jimmy Wales 100%
- subsidiary of Jimmy Wales 100%
- subsidiary of Wikimedia Foundation 100%
- founded Jimmy Wales 95%
- used by LLM 70%
- affiliated with Wikimedia Foundation 70%
- used by large-language models 70%
- instance of Mastodon 60%
- other LLM 50%
- instance of large-language models 50%
Sentiment data available for 10 days
-
Project N.O.M.A.D. enables local AI and offline services on personal servers
Project N.O.M.A.D. aims to create a personal server that can run essential services like Wikipedia, offline maps, courses, and local AI applications on any computer. This initiative focuses on providing access to inform…
-
AI-generated content criticized for historical research
A user on Mastodon suggested that information about the Dutch East India Company's decline is better found on Wikipedia than through AI-generated videos. This sentiment highlights a preference for traditional, verifiabl…
-
Classifier quality filtering vulnerable to Wikipedia-style reformatting
Researchers have identified a significant vulnerability in classifier-based quality filtering, a common technique for curating pre-training data for large language models. Their study demonstrates that simple reformatti…
-
AI-generated Grokipedia articles are longer, less readable than Wikipedia
A recent study indicates that AI-generated articles on Grokipedia are less readable and cite fewer sources than human-edited Wikipedia articles. The research, published in the Proceedings of the National Academy of Scie…
-
AI Search Feature Shows Promise but Lacks Accuracy
A user tested a new AI feature, likely related to Google Search, and found it functional but flawed. Comparing its output against a Wikipedia page revealed a discrepancy or error in the AI-generated information.
-
Wiki operator decries abuse from aggressive web scrapers
A wiki operator describes the challenges posed by aggressive web scrapers, which he likens to "clankers." These automated bots consume significant server resources and bandwidth, disrupting the normal functioning of his…
-
AI-generated Wikipedia clone hallucinates all content
A new AI-powered Wikipedia clone is generating entirely fabricated content, a phenomenon known as hallucination. This AI-generated encyclopedia is producing nonsensical and inaccurate information, highlighting the curre…
-
AI, Libraries, and Wikipedia Discussed on Mastodon
This cluster contains Mastodon posts discussing the intersection of AI, libraries, and Wikipedia. The posts reference conversations about future tasks and IT within these domains. The content appears to be a series of r…
-
New VQA benchmarks and methods tackle knowledge, adaptation, and grounding
Researchers have introduced several new benchmarks and methods for Visual Question Answering (VQA) systems. HyLoVQA proposes a dynamic hypernetwork-generated low-rank adaptation technique for continual VQA, improving ad…
-
Stage-Audit system improves LLM table accuracy and source traceability
Researchers have developed Stage-Audit, a system designed to improve the accuracy and source-grounding of tables generated by large language models. The system addresses the issue of LLMs fabricating or misattributing s…
-
New TailedTS dataset challenges time series models with heavy-tailed data
Researchers have introduced TailedTS, a new benchmark dataset designed to evaluate time series forecasting models on data exhibiting heavy-tailed, zero-inflated, and non-Gaussian distributions. Derived from Wikipedia pa…
-
GraphRAG cuts LLM token use by retrieving connected knowledge
Two projects developed using TigerGraph's GraphRAG approach demonstrate its effectiveness in reducing token usage and improving answer quality for large language models. These systems, one focused on cybersecurity and t…
-
New Wikipedia AI dataset Halupedia may degrade training data quality
A new Wikipedia-based AI training dataset called Halupedia is reportedly degrading the quality of Wikipedia's training data. This issue arises because Halupedia, which is designed to be a hallucination-free dataset, is …
-
AI data poisoning concerns grow with large language models
The concept of "data poisoning" in AI models is being discussed, particularly in relation to large language models trained on vast datasets like Wikipedia. This issue highlights concerns about the integrity and reliabil…
-
AI hallucination clone of Wikipedia raises misinformation fears
A new project has launched, creating a Wikipedia-like reference site entirely from AI-generated content, which has raised concerns about the spread of misinformation. This initiative underscores the potential for AI hal…
-
AI Wikipedia Clone Suffers Hallucinations, Threatens Internet Reliability
A new AI-powered platform aims to replicate Wikipedia's functionality but is reportedly plagued by hallucinations and factual inaccuracies. This project, described as a "Wikipedia clone built on AI hallucinations," rais…
-
Wikipedia founder Jimmy Wales warns of trust crisis and AI's impact
Jimmy Wales, the founder of Wikipedia, recently discussed the growing crisis of trust and its potential impact on society. In an interview, he expressed concerns that if this trend continues, it could lead to a new dark…
-
LLMs learn to actively seek external info for better task adaptation
Researchers have developed a new method for adapting large language models (LLMs) by enabling them to actively seek information from external sources like Wikipedia and web browsers. This approach, termed "active inform…
-
LLM popularity bias driven by pretraining data exposure, study finds
Researchers have analyzed how large language models (LLMs) develop preferences for well-known entities, a phenomenon often linked to popularity bias. Using the open OLMo models and their complete Dolma pretraining corpu…
-
New MedHopQA benchmark tests LLM multi-hop reasoning in biomedicine
Researchers have introduced MedHopQA, a new benchmark designed to evaluate the multi-hop reasoning capabilities of large language models in the biomedical domain. This benchmark consists of 1,000 expert-curated question…