HellaSwag

PulseAugur coverage of HellaSwag — every cluster mentioning HellaSwag across labs, papers, and developer communities, ranked by signal.

Total · 30d (6 over 90d)

Releases · 30d (0 over 90d)

Papers · 30d (5 over 90d)

RECENT · PAGE 1/1 · 6 TOTAL

TOOL · CL_102459 · Jun 21 · 09:02

General LLMs now outperform specialized clinical AI on benchmarks, but safety concerns persist

General-purpose large language models now match or exceed specialized clinical AI systems on various benchmarks, including those for structured knowledge and reasoning. For in…

TOOL · CL_53675 · May 27 · 04:00

New QAT Method Achieves Near-Lossless LLM Performance

Researchers have developed Max-Window Scale Estimation, a new method for quantization-aware training (QAT) of large language models (LLMs). This technique addresses two failure modes: amax saturation, where delaye…

RESEARCH · CL_50617 · May 25 · 15:29

New QUIET benchmark objectively measures LLM creative writing

Researchers have introduced QUIET, a new benchmark designed to evaluate the creative generation capabilities of large language models. Unlike existing benchmarks that rely on multiple-choice formats or subjective human …

TOOL · CL_32060 · May 14 · 18:16

LLM benchmark costs analyzed: $0.12 for 3 tasks

Benchmarking three large language model tasks (GSM8K, HellaSwag, and TruthfulQA) on a single T4 GPU costs approximately $0.12. The analysis reveals that generative tasks are the primary cost driver, while log-likelihood…

TOOL · CL_31715 · May 14 · 13:39

Evaluate LLMs for under $1 using Qwen2.5-0.5B

This post details a cost-effective method for evaluating large language models, demonstrating that comprehensive benchmarks can be run for under a dollar. The author used a free Google Colab T4 instance to test the Qwen…

RESEARCH · CL_24593 · May 10 · 01:24

Aurora optimizer boosts neural network training efficiency

Researchers have introduced Aurora, a new optimizer designed to improve the training of large neural networks, particularly those with rectangular matrices. Aurora addresses issues like neuron death in MLP layers that c…
