How does pre-training on contiguous United States data affect model performance in other regions?

#17
by hyzhak - opened

The README states:

The model was pre-trained with NASA's HLS V2 L30 product (30m granularity) from the contiguous United States

Have you tested your model on other continents? Did you notice any differences in performance there? I would expect the spatial data to be very different there.

IBM-NASA Prithvi Models Family org

We also fine-tuned the model on data from other regions (e.g., sen1floods11 includes data from 11 regions around the world). This led to state-of-the-art performance on that task, so the model was able to generalize. If the fine-tuning data were completely different from the data seen during pretraining, the model would take longer to converge on the new data.

This led to state of the art performance on this task, so the model was able to generalise.

For the flood task specifically, I find that persuasive. Have you tested generalization on the other three datasets?

Crop segmentation is particularly interesting. Besides forest and vegetation, your dataset's largest classes are corn and soybeans. Those crops are huge in the U.S. but much less common in many other parts of the world. I'd be interested in how well the model detects rice in Asia, or cassava and sugarcane in Africa. Even within the U.S., how does it do on coffee in Hawaii? I ask because some of the use cases you're testing overlap with the UN's Sustainable Development Goals, so there is potential for large impact. Unfortunately, work in this area often focuses on data from Western countries, which makes it hard to judge how applicable these types of models are to the Global South.
