TerraMind: Large-Scale Generative Multimodality for Earth Observation
Researchers have introduced TerraMind, a multimodal foundation model for Earth observation. The model combines token-level and pixel-level data representations, which lets it capture both high-level contextual information and fine-grained spatial detail. TerraMind shows strong zero-shot and few-shot learning capabilities, introduces "Thinking-in-Modalities" (TiM), a technique for data augmentation during fine-tuning and inference, and achieves state-of-the-art performance on benchmarks such as PANGAEA. The model, its pretraining dataset, and the associated code are publicly available under a permissive license.
AI IMPACT: Introduces a new multimodal foundation model for Earth observation that could advance geospatial data analysis and its applications.