blumenstiel committed on
Commit
72e38e2
1 Parent(s): f4c1974

Update ReadMe

.gitattributes CHANGED
@@ -33,4 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
- *.tif filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.tif filter=lfs diff=lfs merge=lfs -text
37
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,5 +1,11 @@
1
  ---
2
  license: apache-2.0
 
 
 
 
 
 
3
  ---
4
 
5
  # Prithvi-EO-2.0
@@ -8,11 +14,13 @@ Prithvi-EO-2.0 is the second generation EO foundation model jointly developed by
8
 
9
  ## Architecture Overview
10
 
11
- Prithvi-EO-2.0 is based on the ViT architecture, pre-trained using a masked autoencoder (MAE) approach, with two major modifications as shown in the figure below. First, we introduce a random dropout mechanism that completely removes different bands before the patch embeddings, with the aim of improving the ability of the model to deal with missingness of data. Second, we make modifications to support inputs with temporal and multi-spectral characteristics.
12
 
13
- ![model_architecture](assets/modal_architecture.jpg)
14
 
15
- Our main modifications to the ViT architecture are the 3D positional embedding and the 3D patch embedding, which are required to deal with spatiotemporal data. We have also included metadata and process metadata about the actual geolocation (e.g. latitude and longitude) and date (i.e. year and day-of-year ranging 1-365). This is done by adding biases that are calculated via 2D sine-cosine positional encoding and added to the 3D positional embeddings and 3D patch embeddings via a learned weighted sum (i.e. the weight given is a parameter learned during pretraining). Since this metadata is often not available, we pretrained Prithvi-EO-2.0 allowing for this to be absent via a dropout.
 
 
16
 
17
  ## Pre-trained Models
18
 
@@ -23,12 +31,12 @@ Our main modifications to the ViT architecture are the 3D positional embedding a
23
  |Prithvi-EO-2.0-600M | Pretrained 600M parameter model | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M) | |
24
  |Prithvi-EO-2.0-600M-TL | Pretrained 600M parameter model with temporal and location embeddings | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL) |
25
 
26
- The models were pre-trained at the Julich Supercomputing Center with NASA's HLS V2 product (30m granularity) using 4.2M samples with six bands in the following order: Blue, Green, Red, Narrow NIR, SWIR, SWIR 2.
27
 
28
  ## Benchmarking
29
- The model was benchmarked on GEO-Bench across 12 different earth observation classification and segmentation tasks at different resolutions against some of the most popular geospatial foundation models. Below the average score across all GEO-Bench tasks is shown.
30
 
31
- ![geobench_overall_600M_TL.png](assets/geobench_overall_600M_TL.png)
32
 
33
  ## Demo and inference
34
  We provide a **demo** running Prithvi-EO-2.0-300M-TL [here](https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-EO-2.0-Demo).
@@ -41,7 +49,7 @@ python inference.py --data_files t1.tif t2.tif t3.tif t4.tif --input_indices <op
41
 
42
  ## Finetuning
43
 
44
- You can finetune the model using [TerraTorch](https://github.com/IBM/terratorch).
45
 
46
  ### Feedback
47
 
@@ -52,9 +60,9 @@ Your feedback is invaluable to us. If you have any feedback about the model, ple
52
  If this model helped your research, please cite `Prithvi-EO-2.0` in your publications. Here are two BibTeX entries as examples:
53
 
54
  ```
55
- @article{Prithvi-EO-2-preprint,
56
- author = {},
57
- title = {{Title}},
58
  journal = {arxiv},
59
  year = {2024}
60
  }
 
1
  ---
2
  license: apache-2.0
3
+ tags:
4
+ - Pytorch
5
+ - Earth Observation
6
+ - Foundation Model
7
+ - NASA
8
+ - IBM
9
  ---
10
 
11
  # Prithvi-EO-2.0
 
14
 
15
  ## Architecture Overview
16
 
17
+ Prithvi-EO-2.0 is based on the ViT architecture, pretrained using a masked autoencoder (MAE) approach, with two major modifications as shown in the figure below.
18
 
19
+ ![model_architecture](assets/model_architecture.png)
20
 
21
+ First, we replaced the 2D patch embeddings and 2D positional embeddings with 3D versions to support inputs with spatiotemporal characteristics, i.e., a sequence of T images of size (H, W). Our 3D patch embeddings consist of a 3D convolutional layer that divides the 3D input into non-overlapping cubes of size (t, h, w) for the time, height, and width dimensions, respectively. For the 3D positional encodings, we first generate 1D sin/cos encodings for each dimension individually and then combine them into a single 3D positional encoding.
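
Editor's sketch (not part of the commit): the 3D positional encoding described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the model's implementation. The README only says the per-axis encodings are combined; concatenation, the per-axis split of the embedding size `dims`, and the input and cube sizes are all assumptions chosen for illustration.

```python
import math

def sincos_1d(dim, positions):
    """Return one sin/cos vector of length `dim` for each position."""
    assert dim % 2 == 0, "dim must be even"
    half = dim // 2
    # Geometric frequency schedule, as in the standard Transformer encoding.
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    return [
        [math.sin(p * f) for f in freqs] + [math.cos(p * f) for f in freqs]
        for p in positions
    ]

def sincos_3d(grid, dims):
    """Concatenate per-axis 1D encodings for every (t, h, w) grid cell.

    Concatenation is an assumption; the README does not specify how the
    three 1D encodings are combined.
    """
    (gt, gh, gw), (dt, dh, dw) = grid, dims
    enc_t = sincos_1d(dt, range(gt))
    enc_h = sincos_1d(dh, range(gh))
    enc_w = sincos_1d(dw, range(gw))
    return [
        enc_t[t] + enc_h[h] + enc_w[w]
        for t in range(gt) for h in range(gh) for w in range(gw)
    ]

# Hypothetical sizes: T=4 frames of 224x224 pixels, cubes of (1, 16, 16).
T, H, W = 4, 224, 224
t, h, w = 1, 16, 16
grid = (T // t, H // h, W // w)           # token grid: (4, 14, 14)
pos = sincos_3d(grid, dims=(16, 24, 24))  # 64-dim embedding split per axis
```

Each non-overlapping cube becomes one token, so the token count is the product of the grid sizes, and every token gets one encoding vector whose length is the sum of the per-axis sizes.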
22
+
23
+ Second, we considered geolocation (center latitude and longitude) and date of acquisition (year and day-of-year ranging 1-365) in the pretraining of the TL model versions. Both the encoder and the decoder receive time and location information for each sample and encode them independently using 2D sin/cos encoding. The encodings are added to the embedded tokens via a weighted sum with learned weights: one weight for time and one for location, with separate weights for the encoder and the decoder. Since this metadata is often not available, we added a drop mechanism during pretraining that randomly drops the geolocation and/or the temporal data to help the model learn how to handle the absence of this information.
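
Editor's sketch (not part of the commit): the weighted sum and drop mechanism from the paragraph above might look roughly like this. It is a simplified illustration under assumptions: `add_metadata`, `drop_prob`, and the scalar weights `w_time` and `w_loc` are hypothetical names, the weights are plain floats here rather than learned parameters, and the time and location encodings are assumed to be precomputed vectors of the same length as a token.

```python
import random

def add_metadata(tokens, time_enc, loc_enc, w_time, w_loc,
                 drop_prob=0.1, rng=random):
    """Add time/location encodings to every token via a weighted sum.

    Each encoding is dropped independently with probability `drop_prob`
    (a hypothetical value), mimicking missing metadata during pretraining.
    Passing None for an encoding models metadata that is unavailable.
    """
    use_time = time_enc is not None and rng.random() >= drop_prob
    use_loc = loc_enc is not None and rng.random() >= drop_prob
    out = []
    for tok in tokens:
        new_tok = list(tok)
        for i in range(len(new_tok)):
            if use_time:
                new_tok[i] += w_time * time_enc[i]
            if use_loc:
                new_tok[i] += w_loc * loc_enc[i]
        out.append(new_tok)
    return out

# Toy example with 2-dim tokens; drop_prob=0.0 keeps both encodings.
tokens = [[0.0, 0.0], [1.0, 1.0]]
fused = add_metadata(tokens, time_enc=[1.0, 2.0], loc_enc=[10.0, 20.0],
                     w_time=0.5, w_loc=0.1, drop_prob=0.0)
```

Since the README states that the encoder and decoder use separate weights, a fuller version would call such a function once per stage with that stage's own `w_time` and `w_loc`.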
24
 
25
  ## Pre-trained Models
26
 
 
31
  |Prithvi-EO-2.0-600M | Pretrained 600M parameter model | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M) | |
32
  |Prithvi-EO-2.0-600M-TL | Pretrained 600M parameter model with temporal and location embeddings | [https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL) |
33
 
34
+ The models were pre-trained at the Jülich Supercomputing Centre with NASA's HLS V2 product (30m granularity) using 4.2M samples with six bands in the following order: Blue, Green, Red, Narrow NIR, SWIR, SWIR 2.
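
Editor's sketch (not part of the commit): because the models expect the six bands in a fixed order, input files with a different layout need their bands reordered first. The helper below is a hypothetical illustration of computing those indices; `band_indices` and the example file layout are assumptions, and only the band order itself comes from the README.

```python
# Band order stated in the README.
MODEL_BANDS = ["Blue", "Green", "Red", "Narrow NIR", "SWIR", "SWIR 2"]

def band_indices(file_bands, model_bands=MODEL_BANDS):
    """Return indices into `file_bands` that yield the model's band order."""
    missing = [b for b in model_bands if b not in file_bands]
    if missing:
        raise ValueError(f"missing bands: {missing}")
    return [file_bands.index(b) for b in model_bands]

# Hypothetical file layout: different order plus one extra band.
file_bands = ["Red", "Green", "Blue", "Narrow NIR", "SWIR", "SWIR 2", "Cirrus"]
indices = band_indices(file_bands)
print(indices)
```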
35
 
36
  ## Benchmarking
37
+ We validated the Prithvi-EO-2.0 models through extensive experiments using [GEO-Bench](https://github.com/ServiceNow/geo-bench). Prithvi-EO-2.0-600M-TL outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (i.e. from 0.1m to 15m).
38
 
39
+ ![overall_v2_600_tl.png](assets%2Foverall_v2_600_tl.png)
40
 
41
  ## Demo and inference
42
  We provide a **demo** running Prithvi-EO-2.0-300M-TL [here](https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-EO-2.0-Demo).
 
49
 
50
  ## Finetuning
51
 
52
+ You can finetune the model using [TerraTorch](https://github.com/IBM/terratorch). Examples of configs and notebooks are provided in the project repository: [github.com/NASA-IMPACT/Prithvi-EO-2.0](https://github.com/NASA-IMPACT/Prithvi-EO-2.0#fine-tuning).
53
 
54
  ### Feedback
55
 
 
60
  If this model helped your research, please cite `Prithvi-EO-2.0` in your publications. Here are two BibTeX entries as examples:
61
 
62
  ```
63
+ @article{Prithvi-EO-2-preprint,
64
+ author = {Szwarcman, Daniela and Roy, Sujit and Fraccaro, Paolo and Gíslason, Þorsteinn Elí and Blumenstiel, Benedikt and Ghosal, Rinki and de Oliveira, Pedro Henrique and de Sousa Almeida, João Lucas and Sedona, Rocco and Kang, Yanghui and Chakraborty, Srija and Wang, Sizhe and Kumar, Ankur and Truong, Myscon and Godwin, Denys and Lee, Hyunho and Hsu, Chia-Yu and Akbari Asanjan, Ata and Mujeci, Besart and Keenan, Trevor and Arévolo, Paulo and Li, Wenwen and Alemohammad, Hamed and Olofsson, Pontus and Hain, Christopher and Kennedy, Robert and Zadrozny, Bianca and Cavallaro, Gabriele and Watson, Campbell and Maskey, Manil and Ramachandran, Rahul and Bernabe Moreno, Juan},
65
+ title = {{Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications}},
66
  journal = {arxiv},
67
  year = {2024}
68
  }
assets/geobench_overall_600M_TL.png DELETED
Binary file (440 kB)
 
assets/logos.png DELETED
Binary file (486 kB)
 
assets/overall_v2_600_tl.png ADDED