Expansion of Global and Dense Open Embeddings Dataset of Earth 🌍
We have updated our previous release of the Major TOM embeddings dataset with three new models: MMEarth, DeCUR-S2, and DeCUR-S1. The dataset is developed in collaboration with CloudFerro S.A., asterisk labs, and Φ-lab at the European Space Agency (ESA). Together with @mikonvergence and Jędrzej S. Bojanowski, we extend the open-access collection of Copernicus embeddings built at global scale, providing dense coverage across the entire acquisition area of the Sentinel-1 and Sentinel-2 sensors.
Total embedding resources after the update:
- 51 TB of AI embeddings generated from processed Sentinel data,
- over 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.
This project delivers open and free vectorized expansions of Major TOM datasets available on CREODIAS and Hugging Face, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.
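To illustrate what lightweight use of embeddings can look like, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity. It does not load the actual dataset: the three-dimensional vectors, the identifiers (`cell_a`, `cell_b`, `cell_c`), and the helper names `cosine_similarity` and `top_k` are all hypothetical, chosen only to show the idea on toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=2):
    """Return the k (identifier, score) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Toy vectors with made-up identifiers; real embeddings have far more dimensions.
toy_embeddings = {
    "cell_a": [1.0, 0.0, 0.0],
    "cell_b": [0.9, 0.1, 0.0],
    "cell_c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_embeddings, k=2)
for key, score in results:
    print(key, round(score, 4))
```

At the scale of billions of vectors, a brute-force scan like this would normally be replaced by a vectorized or approximate nearest-neighbour index, but the comparison itself stays the same.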
Hey everyone 🤗! Check out this new Virtual Try Off model (based on SD1.5): 1aurent/TryOffAnyone This model isn't as accurate as others (e.g. xiaozaa/cat-try-off-flux based on FLUX.1) but it sure is fast!
First Global and Dense Open Embedding Dataset of Earth! 🌍 🤗
Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. 🔶 and Φ-lab at the European Space Agency (ESA) 🛰️. Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of Sentinel-1 and Sentinel-2 sensors.
💡 Highlights:
📊 Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
🧠 Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
📦 Scale: 62 TB of raw satellite data processed into 170M+ embeddings.
This project delivers open and free vectorized expansions of the Major TOM datasets (Major-TOM/README), setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.