Brief

last 24h

[2/2] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 12h

Efficient On-Device Diffusion LLM Inference with Mobile NPU

Researchers have developed a new framework called "this http URL" that optimizes inference of diffusion large language models (dLLMs) on mobile devices. The framework targets challenges specific to mobile NPU architectures, such as shrinking workloads and complex data management. It uses techniques including multi-block speculative decoding and dual-path progressive revision to reduce generation latency while maintaining output quality.

IMPACT This framework could enable more powerful LLM applications to run directly on smartphones, improving user experience and privacy.
- this http URL
- LLaDA-8B

TOOL · arXiv cs.AI English(EN) · 2w

CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit

Researchers have developed a new method called CreditDecoding to accelerate text generation in diffusion large language models (dLLMs). The technique addresses an inefficiency in parallel decoding: a model often predicts the correct token several steps before its confidence score is high enough for that token to be decoded, which leads to redundant iterations. CreditDecoding quantifies a token's decoding potential as "Trace Credit" and fuses it with the current model outputs to boost confidence in correct but underconfident tokens. The approach is training-free and is reported to deliver speedups of up to 5.48× with improved accuracy across various benchmarks and dLLM architectures.

IMPACT Accelerates LLM inference, potentially enabling faster and more efficient text generation for a wide range of applications.
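
The CreditDecoding idea summarized above can be illustrated with a toy sketch. This is not the paper's formulation: the decay factor, the `log1p` fusion rule, the `alpha` weight, the threshold, and all function names are assumptions chosen for illustration only. It tracks a single masked position whose top-1 token stays the same while its raw confidence rises slowly, and compares how many steps pass before the position clears a confidence threshold with and without accumulated credit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update_credit(credit, probs, decay=0.9):
    """Decay old credit, then credit only the current top-1 token (assumed rule)."""
    top = max(range(len(probs)), key=probs.__getitem__)
    new_credit = [decay * c for c in credit]
    new_credit[top] += probs[top]
    return new_credit


def fuse(logits, credit, alpha=1.0):
    """Add a log-scaled credit bonus to the raw logits (assumed fusion rule)."""
    return [l + alpha * math.log1p(c) for l, c in zip(logits, credit)]


def steps_to_commit(logit_trace, threshold=0.9, use_credit=True):
    """Return the 1-based step at which the position is decoded, or None."""
    credit = [0.0] * len(logit_trace[0])
    for step, logits in enumerate(logit_trace, start=1):
        fused = fuse(logits, credit) if use_credit else logits
        if max(softmax(fused)) >= threshold:
            return step
        # Credit is accumulated from the raw (unfused) predictions.
        credit = update_credit(credit, softmax(logits))
    return None


# Hypothetical trace: token 0 is always the top prediction, confidence grows slowly.
trace = [[1.0 + 0.5 * i, 0.0, 0.0] for i in range(9)]
baseline = steps_to_commit(trace, use_credit=False)
boosted = steps_to_commit(trace, use_credit=True)
print(baseline, boosted)
```

Because the credit only rewards a token that has repeatedly been the top prediction, a position whose prediction keeps changing accumulates little bonus for any single token, which is the intuition behind boosting "correct but underconfident" tokens rather than all tokens.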
