Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

New Frameworks Improve Cross-Device Transformer Inference Efficiency

By the PulseAugur editorial team · [5 sources] · 2026-05-25 10:39

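Researchers have developed new methods to make Transformer inference across multiple devices more efficient. One method, ASTRA, combines sequence parallelism with mixed-precision attention to reduce inter-device bandwidth requirements, achieving significant speedups even on low-bandwidth networks. Another framework, Meta-Attention, uses a Bayesian meta-controller to dynamically route each token to the most appropriate attention strategy, offering a better compute-performance trade-off. In addition, a study on embedded edge devices shows that profiling-driven adaptation is essential for practical distributed Transformer inference: it outperforms static distributed setups by lowering latency and energy consumption.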

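Impact: These advances could significantly reduce the compute cost and latency of deploying large AI models, enabling more efficient real-time applications across a range of hardware.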

Ranking rationale: Multiple research papers detail novel methods for efficient Transformer inference.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

arXiv cs.AI TIER_1 English(EN) · Xiao Liu, Lijun Zhang, Deepak Ganesan, Hui Guan · 2026-05-28 04:00

ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference

arXiv:2505.19342v2 Announce Type: replace-cross Abstract: Multi-device inference can reduce Transformer latency by parallelizing computation. However, existing methods require high inter-device bandwidth, making them impractical for bandwidth-constrained environments. We present …
arXiv cs.LG TIER_1 English(EN) · Alan Ferrari · 2026-05-28 04:00

Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference

arXiv:2605.28384v1 Announce Type: new Abstract: Standard transformer architectures apply a single attention mechanism uniformly across all tokens and sequence positions, irrespective of local context or computational budget. We propose Meta-Attention, a framework that dynamically…
arXiv cs.LG TIER_1 English(EN) · Alan Ferrari · 2026-05-27 12:21

Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference

Standard transformer architectures apply a single attention mechanism uniformly across all tokens and sequence positions, irrespective of local context or computational budget. We propose Meta-Attention, a framework that dynamically routes each token to the most appropriate atten…
arXiv cs.AI TIER_1 English(EN) · Muhammad Azlan Qazi, Alexandros Iosifidis, Qi Zhang · 2026-05-26 04:00

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

arXiv:2605.25682v1 Announce Type: cross Abstract: Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical benefits on real hardware remain unclear: prior work relies largely on simulations that overloo…
arXiv cs.AI TIER_1 English(EN) · Qi Zhang · 2026-05-25 10:39

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical benefits on real hardware remain unclear: prior work relies largely on simulations that overlook hardware-specific communication overheads. We pr…
