rom1504 committed commit 1bc5d11 (parent d490305): Update README.md

Files changed (1): README.md (+118 / -0)

---
license: mit
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
  candidate_labels: playing music, playing sports
  example_title: Cat & Dog
---

# Model Card for CLIP ViT-B/32 xlm roberta base - LAION-5B

# Table of Contents

1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Acknowledgements](#acknowledgements)
6. [Citation](#citation)
7. [How To Get Started With the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

A CLIP model with a ViT-B/32 image encoder and an xlm-roberta-base text encoder, trained on LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).

Model training was done by Romain Beaumont on the [stability.ai](https://stability.ai/) cluster.

# Uses

## Direct Use

Zero-shot image classification, image and text retrieval, among others.

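As an illustration of text-to-image retrieval, images can be ranked by cosine similarity against a text query, including non-English queries thanks to the xlm-roberta-base text encoder. This is a minimal sketch, assuming the checkpoint is exposed in OpenCLIP as `xlm-roberta-base-ViT-B-32` with pretrained tag `laion5b_s13b_b90k` (verify with `open_clip.list_pretrained()`); the image paths are placeholders.

```
# Text-to-image retrieval sketch: rank local images against a (possibly non-English) query.
import torch
from PIL import Image
import open_clip

# Model name / pretrained tag are assumptions; verify with open_clip.list_pretrained().
model, _, preprocess = open_clip.create_model_and_transforms(
    "xlm-roberta-base-ViT-B-32", pretrained="laion5b_s13b_b90k"
)
tokenizer = open_clip.get_tokenizer("xlm-roberta-base-ViT-B-32")
model.eval()

image_paths = ["dog.jpg", "cat.jpg", "guitar.jpg"]  # placeholder local images
images = torch.stack([preprocess(Image.open(p)) for p in image_paths])
query = tokenizer(["un chien qui joue"])  # French: "a dog playing"

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(query)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(1)

# Print images from best to worst match for the query.
for score, path in sorted(zip(scores.tolist(), image_paths), reverse=True):
    print(f"{path}: {score:.3f}")
```
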
## Downstream Use

Image classification and other image task fine-tuning, linear probe image classification, image generation guiding and conditioning, among others.

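As one example of downstream use, a linear probe can be trained on frozen image features. The sketch below is a minimal illustration under the same assumed OpenCLIP model name and pretrained tag as above; CIFAR-10 stands in for any labeled image dataset.

```
# Linear-probe sketch: frozen CLIP image features + logistic regression.
import numpy as np
import torch
import open_clip
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

# Model name / pretrained tag are assumptions; verify with open_clip.list_pretrained().
model, _, preprocess = open_clip.create_model_and_transforms(
    "xlm-roberta-base-ViT-B-32", pretrained="laion5b_s13b_b90k"
)
model.eval()

def extract_features(loader):
    """Encode images with the frozen vision tower and L2-normalize the features."""
    feats, labels = [], []
    with torch.no_grad():
        for images, targets in loader:
            f = model.encode_image(images)
            f = f / f.norm(dim=-1, keepdim=True)
            feats.append(f.cpu().numpy())
            labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

train_loader = DataLoader(CIFAR10("data", train=True, download=True, transform=preprocess), batch_size=64)
test_loader = DataLoader(CIFAR10("data", train=False, download=True, transform=preprocess), batch_size=64)

X_train, y_train = extract_features(train_loader)
X_test, y_test = extract_features(test_loader)

clf = LogisticRegression(max_iter=1000)  # the linear probe
clf.fit(X_train, y_train)
print("linear-probe accuracy:", clf.score(X_test, y_test))
```
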
# Training Details

## Training Data

This model was trained on the full LAION-5B dataset (https://laion.ai/blog/laion-5b/).

## Training Procedure

Training was done with a batch size of 90k for 13B samples of LAION-5B; see https://wandb.ai/rom1504/open-clip/reports/xlm-roberta-base-B-32--VmlldzoyOTQ5OTE2

The model is a ViT-B/32 on the visual side and an xlm-roberta-base initialized with pretrained weights on the text side.

# Evaluation

Evaluation was done with the code in the [LAION CLIP Benchmark suite](https://github.com/LAION-AI/CLIP_benchmark).

## Testing Data, Factors & Metrics

### Testing Data

The testing is performed with VTAB+ (a combination of VTAB (https://arxiv.org/abs/1910.04867) with additional robustness datasets) for classification, and with COCO and Flickr for retrieval.

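For reference, a single zero-shot evaluation can be launched through the `clip_benchmark` command-line tool from the suite linked above. The sketch below calls it from Python; the flag names follow the CLIP_benchmark README, and the model/pretrained tags are assumptions to be checked against the current documentation.

```
# Sketch: invoke the clip_benchmark CLI for one zero-shot classification run.
# Flag names and model/pretrained tags are assumptions; check the CLIP_benchmark docs.
import subprocess

subprocess.run(
    [
        "clip_benchmark", "eval",
        "--model", "xlm-roberta-base-ViT-B-32",
        "--pretrained", "laion5b_s13b_b90k",
        "--dataset", "cifar10",                  # one of the VTAB+ datasets
        "--task", "zeroshot_classification",
        "--output", "result.json",
    ],
    check=True,
)
```
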
## Results

The model achieves:
* ImageNet-1k: 62.33% (vs 62.9% for the baseline)
* MS-COCO: 63.4% (vs 60.8% for the baseline)
* Flickr30k: 86.2% (vs 85.4% for the baseline)

This model also has multilingual performance.

![metrics](unknown.png)

# Acknowledgements

Acknowledging [stability.ai](https://stability.ai/) for the compute used to train this model.

# Citation

**BibTeX:**

In addition to the forthcoming LAION-5B paper (https://laion.ai/blog/laion-5b/), please cite:

OpenAI CLIP paper
```
@inproceedings{Radford2021LearningTV,
  title={Learning Transferable Visual Models From Natural Language Supervision},
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and A. Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  booktitle={ICML},
  year={2021}
}
```

OpenCLIP software
```
@software{ilharco_gabriel_2021_5143773,
  author       = {Ilharco, Gabriel and
                  Wortsman, Mitchell and
                  Wightman, Ross and
                  Gordon, Cade and
                  Carlini, Nicholas and
                  Taori, Rohan and
                  Dave, Achal and
                  Shankar, Vaishaal and
                  Namkoong, Hongseok and
                  Miller, John and
                  Hajishirzi, Hannaneh and
                  Farhadi, Ali and
                  Schmidt, Ludwig},
  title        = {OpenCLIP},
  month        = jul,
  year         = 2021,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5143773},
  url          = {https://doi.org/10.5281/zenodo.5143773}
}
```

# How To Get Started With the Model

See https://github.com/mlfoundations/open_clip for installation and usage.
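
A minimal zero-shot classification sketch using OpenCLIP is shown below. The model name `xlm-roberta-base-ViT-B-32` and pretrained tag `laion5b_s13b_b90k` are assumptions for this checkpoint; confirm them with `open_clip.list_pretrained()`, and replace the image path and candidate labels with your own.

```
# Zero-shot classification with OpenCLIP; model/pretrained tags are assumptions.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "xlm-roberta-base-ViT-B-32", pretrained="laion5b_s13b_b90k"
)
tokenizer = open_clip.get_tokenizer("xlm-roberta-base-ViT-B-32")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any local image
texts = tokenizer(["playing music", "playing sports"])      # candidate labels

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("label probabilities:", probs)
```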