vision transformer
PulseAugur coverage of vision transformer — every cluster mentioning vision transformer across labs, papers, and developer communities, ranked by signal.
7 days with sentiment data
-
New Vision Transformer baseline sets SOTA on material segmentation
Researchers have revived the Apple Dense Material Segmentation (DMS) benchmark by establishing a new Vision Transformer baseline. They identified that standard training methods struggle with amorphous textures due to hi…
-
New depthwise convolution speeds up vision foundation models
Researchers have developed a new method to speed up vision foundation models by replacing certain attention heads in Vision Transformer (ViT) backbones with efficient depthwise convolution layers. This drop-in replaceme…
-
Fully Ternary Vision Transformer Achieves High Compression for Microcontrollers
Researchers have developed FTerViT, a fully ternary Vision Transformer that compresses all weight matrices and normalization parameters. This approach significantly reduces the model's memory footprint, making it more f…
-
New anomaly detection uses vision transformers for autonomous driving
Researchers have developed a new anomaly detection method for autonomous driving that uses pre-trained vision transformer embeddings. This approach models normality from a single reference image, avoiding the need for e…
-
CutMix training protocol induces spatial locality in Vision Transformers
Researchers have found that specific training techniques can encourage spatial locality in Vision Transformers. By using a 'Modern' protocol involving data augmentation like CutMix and ColorJitter, along with label smoo…
-
LESSViT architecture improves hyperspectral model generalization across sensors
Researchers have developed LESSViT, a novel architecture for hyperspectral imagery that addresses the challenge of generalizing models across different sensors. This Low-rank Efficient Spatial-Spectral ViT uses a struct…
-
TokenMask improves vision transformer segmentation efficiency
Researchers have developed TokenMask, a novel approach for vision transformer segmentation that bypasses the need for explicit image-space reconstruction. This method computes mask logits directly from query-token affin…
-
New GLIA framework enhances Vision Transformer use in image quality assessment
Researchers have developed a new framework called the Global-Local Interaction Adapter (GLIA) to improve Blind Image Quality Assessment (BIQA). This method leverages pre-trained Vision Transformers by using a dual-strea…
-
VoxCor method enables training-free volumetric features for medical imaging
Researchers have developed VoxCor, a novel method for creating reusable volumetric feature representations from pre-trained 2D Vision Transformer models. This training-free approach combines triplanar inference with a w…
-
What-Where Transformer separates object appearance from location
Researchers have introduced the What-Where Transformer (WWT), a novel visual backbone designed to better separate object appearance from spatial location. This new architecture uses a slot-based design where tokens repr…
-
Diffusion augmentation boosts Bangla character recognition accuracy
Researchers have developed a confidence-guided diffusion augmentation method to improve the recognition of handwritten Bangla compound characters. This approach uses diffusion models to generate high-quality synthetic c…
-
Foundation model learns from Dutch satellite data for global benchmarks
Researchers have developed a new foundation model for high-resolution remote sensing data, specifically trained on satellite images of the Netherlands. This model combines Convolutional Neural Networks and Vision Transf…
-
LC4-DViT uses generative AI and transformers for accurate land-cover mapping
Researchers have developed LC4-DViT, a novel framework for land-cover classification using a deformable Vision Transformer. This approach combines generative data creation with a deformation-aware backbone to improve ac…
-
New framework fuses facial and physiological signals for better emotion recognition
Researchers have developed a new framework for video-based emotion recognition that combines facial expressions with physiological signals from remote photoplethysmography (rPPG). Their method uses prompt tuning to inte…
-
Researchers develop robust foundation model for conservation laws using recurrent Vision Transformers
Researchers have developed a new architecture that enhances Flux Neural Operators (Flux NO) by incorporating context through Recurrent Vision Transformers. This hypernetwork model extracts solution dynamics over time, e…
-
DART vision-language model offers comprehensive rope condition monitoring
Researchers have developed DART, a vision-language foundation model designed for comprehensive rope condition monitoring. This model integrates a Vision Transformer with Llama-3.2-3B-Instruct to handle the entire inspec…
-
Hebbian Fast Weights enhance Vision Transformers for few-shot character recognition
Researchers have developed a new approach to few-shot character recognition by integrating Hebbian Fast-Weight (HFW) modules into Vision Transformer architectures. This method aims to mimic biological neural systems' ab…
-
RD-ViT cuts data needs for segmentation, outperforming standard ViT with fewer parameters
Researchers have developed RD-ViT, a novel Recurrent-Depth Vision Transformer designed for semantic segmentation tasks. This architecture significantly reduces data dependence by using a single, shared transformer block…
-
OneTrackerV2 unifies multimodal visual tracking with Dual Mixture-of-Experts
Researchers have developed a new event-based visual object tracking framework that addresses limitations of existing methods by explicitly modeling event density variations across multiple temporal scales. This approach…
-
Researchers develop AI framework for fluid-structure interaction prediction
Researchers have developed a new machine learning framework for predicting fluid-structure interactions (FSI) over long periods on deforming meshes. The system integrates a graph neural operator with a Vision Transformer…