Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems
Researchers have introduced Planktonzilla-17M, a new dataset containing 17.4 million images of plankton, making it the largest of its kind. This dataset aims to improve plankton classification by consolidating images from thirteen different imaging systems and standardizing taxonomy and metadata. Experiments using Planktonzilla-17M showed that supervised classification with taxonomic lineage as text performed comparably to or better than CLIP-style image-text training, and highlighted limitations in current biological foundation models for marine imaging.
AI IMPACT Establishes a new benchmark for marine imaging AI, potentially improving ecological monitoring and climate modeling.