Researchers have introduced Sapiens2, a new family of high-resolution transformer models designed for human-centric vision tasks. These models, ranging from 0.4 to 5 billion parameters, support native 1K resolution, with hierarchical variants scaling up to 4K. Sapiens2 achieves improved performance through a unified pretraining objective combining masked image reconstruction with self-distilled contrastive learning, pretraining on a dataset of 1 billion human images, and architectural enhancements such as windowed attention for longer spatial context.
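To illustrate the windowed-attention idea mentioned above: tokens from a patch grid are partitioned into small non-overlapping windows, and attention is computed only within each window, so the cost scales with window size rather than the full (e.g. 1K–4K) resolution. This is a minimal, hypothetical sketch of the partitioning step, not the Sapiens2 implementation; the function name and grid layout are assumptions for illustration.

```python
def partition_windows(tokens, grid, window):
    """Split a flat list of grid*grid patch tokens into non-overlapping
    window*window groups. Self-attention would then run independently
    inside each group instead of over the whole grid.

    Hypothetical helper for illustration only (not the Sapiens2 code).
    """
    assert grid % window == 0, "grid size must be divisible by window size"
    windows = []
    for wy in range(0, grid, window):          # window row offset
        for wx in range(0, grid, window):      # window column offset
            win = [tokens[(wy + y) * grid + (wx + x)]
                   for y in range(window)
                   for x in range(window)]
            windows.append(win)
    return windows


# A 4x4 grid of token ids split into 2x2 windows -> 4 windows of 4 tokens
tokens = list(range(16))
wins = partition_windows(tokens, grid=4, window=2)
```

Here `wins[0]` is `[0, 1, 4, 5]`: the four tokens in the top-left 2x2 window. Attention within each window costs O(window⁴) per window instead of O(grid⁴) over the full grid, which is what makes high native resolutions tractable.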
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new model architecture and pretraining strategy for human-centric vision tasks, potentially improving performance on downstream applications like pose estimation and segmentation.
RANK_REASON This is a research paper describing a new model family.