Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 19h

Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection

Researchers have developed Reflect-Guard, a new method to improve the safety of large language models against adversarial prompts. This technique uses chain-of-thought self-reflection, fine-tuning models like Llama-Guard-3-8B with distilled reasoning from GPT-4o-mini. Even with a small dataset and minimal parameter updates, Reflect-Guard significantly boosts performance on benchmarks designed to test defenses against jailbreak attacks, particularly by enabling models to reason through obfuscated malicious intent. AI

IMPACT This research offers a promising direction for creating more robust LLM safety mechanisms by enabling models to reason about adversarial intent.
TOOL · arXiv cs.CL English(EN) · 6d

LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models

Researchers have developed LASH, a novel framework designed to enhance the jailbreaking of large language models. LASH adaptively combines outputs from multiple existing attack methods, treating them as seed prompts. This approach leverages the complementary strengths of different attack families to improve success rates against various models and harm categories. In evaluations on the JailbreakBench dataset, LASH achieved high attack success rates with significantly fewer queries compared to state-of-the-art baselines. AI

IMPACT Introduces a more effective method for red-teaming LLMs, potentially accelerating the discovery and patching of safety vulnerabilities.

Brief

Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection

LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models