ENTITY LLaMA-3-8B-Instruct

PulseAugur coverage of LLaMA-3-8B-Instruct: every cluster mentioning the model across labs, papers, and developer communities, ranked by signal.

Total · 12 over 90d

Releases · 0 over 90d

Papers · 12 over 90d

SENTIMENT · 30D

5 days with sentiment data

RECENT · 12 TOTAL

TOOL · CL_100162 · Jun 19 · 04:00

New pruning method preserves LLM reasoning performance

Researchers have developed a new training-free method called Causal Attribution Pruning (CAP) to reduce the size of large language models while preserving their reasoning capabilities. CAP identifies and prunes less cri…

TOOL · CL_84838 · Jun 11 · 04:00

New method tests LLM sycophancy without harming factual agreement

Researchers have developed a new method called dual-stance evaluation to assess large language models' sycophancy. This technique tests whether interventions designed to reduce agreement with false, sycophantic statemen…

TOOL · CL_70394 · Jun 4 · 04:00

Context labels dramatically alter language model behavior

Researchers have found that the labels used to present context to language models significantly impact their behavior. In tests across models like GPT-5.5 and DeepSeek V4 Pro, using labels such as "Instruction:" or "Ref…

RESEARCH · CL_68363 · Jun 3 · 04:00

New defenses and attacks target LLM jailbreaks and prompt injections

Researchers are developing new methods to defend large language models against prompt injection and jailbreak attacks. GuardNet utilizes an ensemble of shallow neural networks for efficient detection, while SlotGCG focu…

TOOL · CL_65565 · Jun 2 · 04:00

New NLHF algorithm improves LLM alignment with explicit exploration

Researchers have developed a new algorithm for Nash Learning from Human Feedback (NLHF) that addresses limitations in current methods for aligning large language models with human preferences. The proposed algorithm exp…

RESEARCH · CL_62284 · May 29 · 10:49

EvoDefense uses LLMs to co-evolve defenses against black-box attacks

Researchers have developed EvoDefense, a novel approach to protect large language models (LLMs) from attacks in black-box scenarios. This system uses a guard LLM and an experience memory to continuously refine defense s…

TOOL · CL_18791 · May 6 · 04:00

New method uses model's own outputs for safety fine-tuning

Researchers have developed a novel method for safety fine-tuning language models by identifying and utilizing the most challenging prompts. This technique involves scoring prompts based on the frequency of harmful model…

RESEARCH · CL_15836 · May 5 · 04:00

The Measure of Deception: An Analysis of Data Forging in Machine Unlearning

Two new research papers explore vulnerabilities and detection methods in machine unlearning, a process designed to remove specific data from trained models for privacy compliance. One paper, "DurableUn," reveals that lo…

TOOL · CL_15459 · May 5 · 04:00

New attack redirects LLM attention to bypass safety alignment

Researchers have developed a new white-box adversarial attack called the Attention Redistribution Attack (ARA) that targets the internal attention mechanisms of safety-aligned large language models. This attack crafts n…

RESEARCH · CL_11433 · Apr 30 · 14:31

DPN-LE method precisely edits LLM personalities with minimal neuron intervention

Researchers have developed DPN-LE, a novel method for editing the "personality" of large language models by targeting specific neurons. Existing techniques often degrade overall model performance by modifying too many n…

RESEARCH · CL_70261 · Sep 17 · 17:00

New research tackles LLM factuality, safety, and complex task performance

Researchers are developing new methods to improve the reliability and safety of large language models (LLMs). Google Research introduced SLED, a decoding strategy that uses all LLM layers to enhance factual accuracy wit…

RESEARCH · CL_44017 · Apr 17 · 00:00

New DPO methods enhance LLM alignment with adaptive techniques

Researchers have developed several advancements to Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human preferences. AdaDPO introduces self-adaptive coefficients to balance…
