---
license: other
license_name: custom-apple-license
license_link: https://github.com/apple/ml-tic-clip/blob/main/LICENSE
tags:
- vision
- zero-shot-image-classification
datasets:
- apple/TiC-DataComp
---
# Model Card for TiC-CLIP-basic-cumulative

<!-- Provide a quick summary of what the model is/does. -->

This repository contains TiC-CLIP models trained on TiC-DataComp-Yearly with data from 2014 to 2022 using our modified OpenCLIP code.
For additional information refer to our [GitHub repo](https://github.com/apple/ml-tic-clip).

## Model Details

### Model Description

Keeping large foundation models up to date on the latest data is inherently expensive.
To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models.
This problem is exacerbated by the lack of any large-scale continual learning benchmarks or baselines.
We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models:
TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset,
contains over 12.7B timestamped image-text pairs spanning 9 years (2014-2022).
We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models.
We show that OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared with more recently trained models in the OpenCLIP repository.
We then study how to efficiently train models on time-continuous data.
We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch.
Code is available in the [ml-tic-clip GitHub repo](https://github.com/apple/ml-tic-clip).



- **Developed by:** Apple
- **License:** See [LICENSE](https://github.com/apple/ml-tic-clip/blob/main/LICENSE)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [ml-tic-clip GitHub repo](https://github.com/apple/ml-tic-clip)
- **Paper:** [TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.](https://arxiv.org/abs/2310.16226)

## Uses

Researchers can use the pretrained TiC-CLIP models to design continual learning methods faster: start from a pretrained checkpoint and continue training on the next year's or next month's data.
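
As a rough illustration of that workflow, the dry-run sketch below pairs each yearly checkpoint with the following year's data. The `plan_continual_steps` helper is hypothetical and not part of the repository. It assumes checkpoints are named `checkpoints/<YEAR>.pt` and cover 2016 through 2022, as the snippet in the next section and the dataset description suggest. It only prints the plan and does not download or train anything.

```bash
# Assumed range: the first checkpoint is 2016 and the data ends in 2022.
FIRST_YEAR=2016
LAST_YEAR=2022

# plan_continual_steps START_YEAR  (hypothetical helper, dry run only)
# Prints one line per step: the checkpoint to initialize from and the
# year of data to continue training on.
plan_continual_steps() {
    start_year="$1"
    if [ "$start_year" -lt "$FIRST_YEAR" ] || [ "$start_year" -ge "$LAST_YEAR" ]; then
        echo "start year must be between $FIRST_YEAR and $((LAST_YEAR - 1))" >&2
        return 1
    fi
    year="$start_year"
    while [ "$year" -lt "$LAST_YEAR" ]; do
        echo "init=checkpoints/${year}.pt train_on=$((year + 1))"
        year=$((year + 1))
    done
}

plan_continual_steps 2020
```

Each printed line names one experiment: the checkpoint to load and the year to train on next. In a real continual run you would likely replace the later `init=` entries with the checkpoints your own method produces.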

## How to Get Started with the Model

The models are compatible with the DataComp evaluation suite and with our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet.
The models can also be used to resume training or to initialize a new training run using OpenCLIP code.
Please follow the instructions in our [GitHub repo](https://github.com/apple/ml-tic-clip) to create the evaluation sets, or follow [DataComp](https://github.com/mlfoundations/datacomp) for the standard evaluations on 38 datasets.

The following snippet assumes the TiC-DataComp data has been prepared following the instructions in the GitHub repo.
```bash
YEAR=2016 # There are no models before 2016 since data from 2014-2016 were combined into one year
REPO="apple/TiC-CLIP-basic-cumulative"
huggingface-cli download $REPO checkpoints/$YEAR.pt

## Train Cumulative
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val  $IMAGENET_VAL_PATH  \
    --save_frequency 1 \
    --resume
popd

## Evaluate Model
# Evaluate a ViT-B/16 model on TiC/Retrieval/Yearly/$YEAR and
# TiC/DataCompNet/Yearly/$YEAR
pushd datacomp
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
popd
```
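
The snippet above reads `TIC_DATACOMP_Y_PATH`, `IMAGENET_VAL_PATH`, and `YEAR` from the environment without defining the first two. A minimal pre-flight check, sketched below, can catch a missing variable before `torchrun` starts. The `require_vars` helper is hypothetical and not part of the repository; it only tests that each named variable is set and non-empty.

```bash
# require_vars NAME...  (hypothetical helper)
# Reports every named variable that is unset or empty and returns
# non-zero if at least one is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_vars TIC_DATACOMP_Y_PATH IMAGENET_VAL_PATH YEAR; then
    echo "training data dir: $TIC_DATACOMP_Y_PATH/train/$YEAR/"
else
    echo "set the variables listed above before running the training snippet" >&2
fi
```

The check does not verify that the paths exist or contain prepared data; it only guards against running with an empty `--data_dir` or `--imagenet_val` argument.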

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

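The models are trained on TiC-DataComp-Yearly with data from 2014 to 2022. See the [apple/TiC-DataComp](https://huggingface.co/datasets/apple/TiC-DataComp) dataset card.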

### Training Procedure

Please refer to Sections 2-3 of our [TiC-CLIP paper](https://arxiv.org/abs/2310.16226).

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]