PulseAugur

rectifier

PulseAugur coverage of rectifier — every cluster mentioning rectifier across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 24 (90 days: 24)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 24 (90 days: 24)
Tier distribution · 90 days
Relations
Sentiment · 30 days: 3 days with sentiment data

Recent · Page 1 of 2 · 24 items
  1. TOOL · CL_45331 ·

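    Residual connections enable training of deeper LLMs by bypassing layers

    This article explains residual connections, a key component of the Transformer architecture that is essential for training deep neural networks such as large language models (LLMs). By giving gradients an alternative path, residual connections help overcome the vanishing gradient problem and let models learn more complex patterns. The technique has been essential to progress on natural language processing (NLP) tasks such as translation, summarization, and text generation.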
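The "alternative path for gradients" can be illustrated numerically. The sketch below is not taken from the article: it assumes a hypothetical stack of 30 small tanh layers with random weights (all names and sizes are illustrative) and compares the norm of the input-to-output Jacobian with and without the skip path `x + f(x)`.

```python
import numpy as np

def layer(x, w):
    """One hypothetical sublayer: a linear map followed by tanh."""
    return np.tanh(w @ x)

def layer_jacobian(x, w):
    """Jacobian of layer() with respect to x: diag(1 - tanh^2(w x)) @ w."""
    pre = w @ x
    return (1.0 - np.tanh(pre) ** 2)[:, None] * w

def input_gradient_norm(weights, x, residual):
    """Frobenius norm of d(output)/d(input) for the whole stack (chain rule)."""
    jac = np.eye(x.size)
    for w in weights:
        j = layer_jacobian(x, w)          # evaluated at the current input
        if residual:
            j = np.eye(x.size) + j        # derivative of x + f(x) is I + f'(x)
            x = x + layer(x, w)
        else:
            x = layer(x, w)
        jac = j @ jac
    return np.linalg.norm(jac)

# Illustrative sizes and weight scale; these are assumptions, not from the article.
rng = np.random.default_rng(0)
dim, depth = 8, 30
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]
x0 = rng.standard_normal(dim)

plain = input_gradient_norm(weights, x0, residual=False)
skip = input_gradient_norm(weights, x0, residual=True)
print(f"plain stack: {plain:.3e}, residual stack: {skip:.3e}")
```

In the plain stack each layer multiplies the gradient by a small Jacobian, so the product shrinks with depth. With the skip path every factor contains an identity term, which keeps the product from collapsing.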

  2. TOOL · CL_45000 ·

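    Neural network weight drift identified as a training-dynamics problem

    Researchers have identified a phenomenon in neural networks called "weight drift," in which the optimization process inadvertently pushes weights toward negative values. The drift is independent of the training data and appears with standard loss functions and common activation functions such as ReLU and GELU. The study shows that it leads to significant activation sparsity, may affect model accuracy, and also amplifies activation spikes in Transformer layers.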

  3. TOOL · CL_44953 ·

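    New quadratic ReLU replacement speeds up FHE neural network inference

    Researchers have developed a new method for replacing ReLU activation functions in neural networks with quadratic polynomials, aimed specifically at fully homomorphic encryption (FHE). By using low-degree polynomials, the method seeks to reduce the computational cost of FHE-only inference while preserving classification accuracy on a calibration dataset. It formulates the replacement as a linear separation problem, uses a convex hull relaxation to extend it to cases with misclassified samples, and achieves faster inference than existing methods.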
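To give a feel for what a quadratic stand-in for ReLU looks like, the sketch below fits a degree-2 polynomial to ReLU by ordinary least squares. This is a swapped-in baseline technique, not the linear-separation method the summary describes, and the range `[-1, 1]` is an assumed calibration range that the source does not specify.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical range of pre-activations; the source gives no such range.
xs = np.linspace(-1.0, 1.0, 201)

# Least-squares quadratic fit; np.polyfit returns the highest degree first.
a, b, c = np.polyfit(xs, relu(xs), deg=2)

def quad_act(x):
    """Quadratic surrogate a*x^2 + b*x + c: one multiplication depth under FHE."""
    return a * x * x + b * x + c

max_err = np.max(np.abs(quad_act(xs) - relu(xs)))
print(f"a={a:.4f}, b={b:.4f}, c={c:.4f}, max error on range={max_err:.4f}")
```

A pointwise fit like this minimizes approximation error on the chosen range only. The approach in the summary targets classification accuracy on a calibration set instead, which is a different objective.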

  4. TOOL · CL_43959 ·

    New method secures embedded neural networks against timing attacks

    Researchers have developed a new methodology for implementing activation functions in embedded neural networks that prevents information leakage through timing side channels. This approach ensures consistent execution t…

  5. TOOL · CL_41870 ·

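    Vision models drop activation functions in favor of polynomial alternatives

    Researchers have developed new activation-free backbone architectures for vision models that use polynomial functions in place of conventional pointwise nonlinearities such as ReLU or GELU. Integrated into the MetaFormer framework, these modules match or exceed the performance of activation-based models on tasks such as ImageNet classification and semantic segmentation. The study also shows that the polynomial variants outperform earlier specialized polynomial networks while requiring less computation.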

  6. TOOL · CL_22136 ·

    AI research links data geometry to neural network generalization

    This paper theoretically investigates how data geometry influences generalization in overparameterized neural networks trained below the edge of stability. It derives generalization bounds for two-layer ReLU networks, s…

  7. TOOL · CL_22101 ·

    Researchers explore how gradient descent adapts neural network capacity to tasks

    Researchers have developed a theoretical framework to explain how neural networks adapt their capacity to specific tasks during gradient descent training. The study identifies three key dynamical principles—mutual align…

  8. TOOL · CL_22028 ·

    New lattice-based framework for piecewise GLMs inspired by renormalization group theory

    Researchers have introduced a novel framework for generalized linear models inspired by renormalization group theory. This approach utilizes additive hierarchical expansions to create models that are locally linear, sim…

  9. TOOL · CL_21916 ·

    New research explores active learning for conditional generative compressed sensing

    Researchers have developed a new framework for conditional generative compressed sensing, specifically for image recovery from subsampled Fourier measurements using prompt-conditioned generative models. This approach di…

  10. TOOL · CL_20734 ·

    Photonic ROM architecture enables high-speed, reconfigurable lookup tables for accelerators

    Researchers have developed OptiLookUp, a novel photonic architecture that utilizes integrated microring resonators to create a reconfigurable optical read-only memory (ROM). This system encodes input-output mappings dir…

  11. RESEARCH · CL_20254 ·

    New mechanistic estimation method outperforms sampling for wide random MLPs

    Researchers have developed a new method for estimating the expected output of wide, randomly initialized multilayer perceptrons (MLPs) without needing to run samples through the model. This "mechanistic estimation" appr…

  12. RESEARCH · CL_18833 ·

    Neural networks achieve super-fast convergence and represent complex functions with floating-point arithmetic

    Two new arXiv papers explore theoretical aspects of neural network convergence and representation capabilities. The first paper demonstrates that neural network classifiers can achieve super-fast convergence rates under…

  13. RESEARCH · CL_18326 ·

    Researchers develop exact ReLU realization for tensor-product refinement iterates

    Two new arXiv papers explore advanced mathematical techniques for realizing ReLU (Rectified Linear Unit) functions in neural networks. The first paper, "Exact ReLU realization of tensor-product refinement iterates," ext…

  14. RESEARCH · CL_10262 ·

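    Deep neural networks provably overcome the curse of dimensionality for partial differential equations

    Researchers have proved that deep neural networks (DNNs) can overcome the curse of dimensionality when approximating solutions of Kolmogorov partial differential equations. The proof extends earlier findings, showing that networks with ReLU, Leaky ReLU, and Softplus activations can reach a given approximation accuracy without computational cost growing exponentially in the problem dimension. The work establishes this in the $L^p$ sense for a wide range of $p$ values.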

  15. RESEARCH · CL_08687 ·

    Researchers evolve activation functions to handle missing data in neural networks

    Researchers have developed a novel approach called Three-Channel Evolved Activations (3C-EA) to address challenges in machine learning when dealing with missing data. Unlike traditional activation functions, 3C-EA incor…

  16. RESEARCH · CL_06861 ·

    Ternary neural networks offer theoretical expressivity comparable to standard NNs

    Researchers have theoretically analyzed the expressivity of ternary neural networks, which use parameters restricted to {-1, 0, +1}. The study focuses on regression networks with ReLU activation functions, proving that …

  17. RESEARCH · CL_06782 ·

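    MLP skip connections cannot be absorbed into residual-free models

    Researchers investigated whether a skip connection around a single-hidden-layer MLP can be absorbed into a residual-free MLP of the same width. They found that for some activation functions, such as ReLU^2 and ReGLU, absorption is impossible because of a degree argument. For gated activations such as SwiGLU and GeGLU, a linearization argument gives the same conclusion. Absorption is possible for ungated ReLU and GELU under specific, non-generic weight conditions, but skip-connected and residual-free MLPs generally represent different function classes.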

  18. RESEARCH · CL_06377 ·

    New research explores activation functions beyond ReLU in neural networks

    A new paper explores the theoretical underpinnings of neural network kernels, specifically focusing on activation functions beyond the standard ReLU. Researchers characterized the Reproducing Kernel Hilbert Spaces (RKHS…

  19. RESEARCH · CL_06176 ·

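    Self-supervised networks produce fewer linear regions at comparable accuracy

    A new study posted to arXiv investigates the complexity of linear regions in self-supervised deep ReLU networks. The researchers found that self-supervised learning methods produce fewer linear regions than supervised methods at similar accuracy. They also observed that contrastive learning methods expand these regions over time while self-distillation methods merge them, and that these geometric properties can indicate representation quality and flag early signs of model collapse.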
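One common proxy for counting linear regions is to walk along a line segment in input space and count how often the on/off pattern of the ReLU units changes. The sketch below does this for a small random MLP. It is an illustrative assumption, not necessarily the measurement used in the study, and the function names and network sizes are hypothetical.

```python
import numpy as np

def activation_pattern(x, layers):
    """Return the on/off state of every ReLU unit for input x as a tuple of bits."""
    bits = []
    h = x
    for w, b in layers:
        pre = w @ h + b
        bits.extend((pre > 0).astype(int).tolist())
        h = np.maximum(pre, 0.0)
    return tuple(bits)

def count_regions_on_segment(layers, start, end, samples=2000):
    """Count activation-pattern changes along the segment from start to end.

    Sampling gives a lower bound: pieces narrower than one step can be missed.
    """
    ts = np.linspace(0.0, 1.0, samples)
    regions = 1
    prev = activation_pattern(start, layers)
    for t in ts[1:]:
        cur = activation_pattern((1 - t) * start + t * end, layers)
        if cur != prev:
            regions += 1
            prev = cur
    return regions

# Hand-built check: 1-D input, two units with kinks at x = 0 and x = 0.5.
toy_layers = [(np.array([[1.0], [1.0]]), np.array([0.0, -0.5]))]
toy_count = count_regions_on_segment(toy_layers, np.array([-1.0]), np.array([1.0]))

# Hypothetical random two-layer network on 2-D inputs.
rng = np.random.default_rng(0)
net = [
    (rng.standard_normal((16, 2)), rng.standard_normal(16)),
    (rng.standard_normal((16, 16)), rng.standard_normal(16)),
]
net_count = count_regions_on_segment(net, np.array([-2.0, -2.0]), np.array([2.0, 2.0]))
print(f"toy network: {toy_count} regions, random network: {net_count} regions")
```

Distinct activation patterns can occasionally yield the same affine map, so this count tracks activation regions, which bound the number of truly distinct linear pieces from above along the sampled line.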

  20. RESEARCH · CL_05188 ·

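    Beyond linearity in attention projections: the case for nonlinear queries

    Researchers are exploring the fundamental principles behind the Transformer attention mechanism, with new papers analyzing its gradient-flow structure and dynamics. One study interprets attention as a gradient flow on the unit sphere and identifies factors that affect token clustering and stability in the multi-head setting. Another paper studies the critical training window for complexity control, determining when Transformers prioritize reasoning over memorization. Further work traces the origin of geometric continuity in deep neural networks to residual connections and symmetry-breaking nonlinearities, and examines the structural causes of the "attention sink" phenomenon.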