Greedy Coordinate Gradient
PulseAugur coverage of Greedy Coordinate Gradient: every cluster mentioning Greedy Coordinate Gradient across labs, papers, and developer communities, ranked by signal.
-
New SCARCE method improves rare-event analysis in AI systems
Researchers have introduced SCARCE (Scalable Cascade Analysis for Rare-event Characterisation via Embeddings), a novel method for estimating the probabilities of rare events in AI systems. SCARCE replaces traditional pe…
-
New ASR techniques tackle phonetic errors and judge reliability
Researchers are developing advanced methods to improve Automatic Speech Recognition (ASR) systems, particularly for low-resource languages and to address specific types of errors. One approach, Error-Aware TF-IDF, uses …
-
New defenses and attacks emerge around LLM jailbreaks and prompt injections
Researchers are developing new methods to defend large language models against prompt injection and jailbreak attacks. GuardNet utilizes an ensemble of shallow neural networks for efficient detection, while SlotGCG focu…
-
New research reveals escalating LLM and LALM jailbreak vulnerabilities
Three new research papers explore the vulnerabilities and defenses of large language models (LLMs) and large audio-language models (LALMs). The first paper details a taxonomy of audio jailbreak attacks and defenses, hig…
-
New Frost Training method boosts LLM policy optimization
Researchers have introduced Frost Training, a novel method designed to enhance Monte Carlo-based policy optimization for a class of tasks known as Cross-Entropy Games. This technique leverages the gradient of the reward…
-
New Logit-Gap Steering method efficiently measures AI alignment robustness
Researchers have developed a new metric called the refusal-affirmation logit gap to quantify the safety margin of aligned language models. This metric, which measures the difference between refusal and affirmation token…
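The summary above is truncated, so the exact definition is not given here. As a minimal sketch, assume the gap is the logit of a single refusal token minus the logit of a single affirmation token at the first response position. The function name `refusal_affirmation_gap`, the token ids, and the toy logits are all hypothetical.

```python
import numpy as np


def refusal_affirmation_gap(logits, refusal_id, affirm_id):
    # Assumed definition (hypothetical): refusal-token logit minus
    # affirmation-token logit at the first response position.
    # Positive values mean the model currently favours refusing.
    logits = np.asarray(logits, dtype=float)
    return float(logits[refusal_id] - logits[affirm_id])


# Hypothetical next-token logits over a tiny 4-token vocabulary,
# where (by assumption) id 1 stands for "Sorry" and id 2 for "Sure".
logits = [0.5, 3.2, 1.1, -0.4]
gap = refusal_affirmation_gap(logits, refusal_id=1, affirm_id=2)
print(f"gap = {gap:.2f}")  # gap = 2.10
```

Under this reading, a larger positive gap is a wider safety margin, and an attack that flips the model toward affirming drives the gap below zero.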
-
Researchers explore token position's impact on LLM adversarial attacks
Researchers have identified a critical blind spot in the adversarial robustness evaluation of large language models. Their study, focusing on the Greedy Coordinate Gradient (GCG) attack, reveals that the placement of ad…
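For readers new to GCG, the sketch below shows the shape of the standard update loop on a toy problem. At each step it takes the gradient of the loss with respect to each position's one-hot token indicator, keeps the top-k tokens with the most negative gradient per position, tries a batch of random single-token swaps drawn from those candidates, and keeps the best. The quadratic `toy_loss` is an assumption standing in for an LLM's target-string loss, and all names and sizes are hypothetical. One deliberate deviation: this sketch keeps the current suffix when no candidate improves on it, whereas the original update takes the best candidate in the batch unconditionally.

```python
import numpy as np


def toy_loss(tokens, E, pos_scale, target):
    # Toy stand-in for the LLM loss: squared distance between a
    # position-weighted sum of token embeddings and a target vector.
    s = (pos_scale[:, None] * E[tokens]).sum(axis=0)
    return float(np.sum((s - target) ** 2))


def one_hot_grad(tokens, E, pos_scale, target):
    # Gradient of toy_loss w.r.t. each position's one-hot indicator,
    # shape (suffix_len, vocab_size); GCG ranks substitutions with it.
    s = (pos_scale[:, None] * E[tokens]).sum(axis=0)
    residual = 2.0 * (s - target)  # dLoss/ds, shape (dim,)
    return pos_scale[:, None] * (E @ residual)[None, :]


def gcg(tokens, E, pos_scale, target, steps=50, top_k=8, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.array(tokens, copy=True)
    cur_loss = toy_loss(tokens, E, pos_scale, target)
    for _ in range(steps):
        grad = one_hot_grad(tokens, E, pos_scale, target)
        # Top-k candidates per position: most negative gradient first.
        cand = np.argsort(grad, axis=1)[:, :top_k]
        best, best_loss = tokens, cur_loss
        for _ in range(batch):
            # Random single-coordinate swap drawn from the candidate set.
            i = rng.integers(len(tokens))
            trial = tokens.copy()
            trial[i] = cand[i, rng.integers(top_k)]
            trial_loss = toy_loss(trial, E, pos_scale, target)
            if trial_loss < best_loss:
                best, best_loss = trial, trial_loss
        tokens, cur_loss = best, best_loss
    return tokens, cur_loss


# Hypothetical toy setup: 50-token vocabulary, 8-dim embeddings, 5-token suffix.
rng = np.random.default_rng(1)
vocab, dim, length = 50, 8, 5
E = rng.normal(size=(vocab, dim))
pos_scale = np.linspace(1.0, 2.0, length)
target = (pos_scale[:, None] * E[rng.integers(vocab, size=length)]).sum(axis=0)
start = rng.integers(vocab, size=length)
final, final_loss = gcg(start, E, pos_scale, target)
print(toy_loss(start, E, pos_scale, target), "->", final_loss)
```

The per-position weights in `pos_scale` are a crude nod to the finding above that where adversarial tokens sit matters; in a real model that dependence comes from attention and context, not a fixed scale.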