Common Crawl

PulseAugur coverage of Common Crawl: every cluster mentioning Common Crawl across labs, papers, and developer communities, ranked by signal.

Total · 30d: 6 (6 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 3 (3 over 90d)

TIER MIX · 90D

significant 1
research 2
tool 3

SENTIMENT · 30D

1 day with sentiment data

RECENT · 6 TOTAL

SIGNIFICANT · CL_29627 · May 11 · 22:37

Elsevier sues Meta over AI training data, citing copyright infringement

Academic publishing giant Elsevier, along with other publishers and authors, has filed a lawsuit against Meta, accusing the company of illegally scraping and using copyrighted research papers to train its Llama large la…

RESEARCH · CL_14409 · May 4 · 04:00

LLM-generated content is rapidly growing on the web, study finds

A new research paper introduces DeGenTWeb, a system designed to systematically identify websites dominated by content generated by large language models (LLMs) with minimal human oversight. The study found that LLM-domi…

SIGNIFICANT · CL_13263 · May 2 · 20:29

News publishers demand Common Crawl block AI training on their content

News publishers are demanding that Common Crawl cease its unauthorized scraping of web content and prevent AI companies from using this data for model training. The News/Media Alliance has formally communicated this dem…

RESEARCH · CL_04516 · Apr 26 · 23:52

Google warns of a rise in unsophisticated AI prompt injection attacks

Google Threat Intelligence researchers have identified an increase in indirect prompt injection attacks targeting AI systems that browse the web. While many of these attacks are currently low in sophistication and harml…

TOOL · CL_17378 · Apr 24 · 06:48

Interactive guide explains how large language models like ChatGPT are built

A new interactive visual guide, based on Andrej Karpathy's lecture, explains the intricate process of building large language models. It details the journey from collecting vast amounts of internet text to the final sta…

RESEARCH · CL_05000 · Apr 23 · 23:32

Researchers unveil PermaFrost-Attack for latent LLM poisoning during pretraining

Researchers have introduced PermaFrost-Attack, a novel method for embedding hidden vulnerabilities, termed 'logic landmines,' into large language models during their pretraining phase. This attack, known as Stealth Pret…