chs20 committed on
Commit b15c868 · verified · 1 Parent(s): 893f30a
README.md ADDED
@@ -0,0 +1,42 @@
+
+ ---
+ license: mit
+ library_name: open_clip
+ pipeline_tag: zero-shot-image-classification
+ ---
+ [[Paper]](https://openreview.net/forum?id=e3scLKNiNg&noteId=e3scLKNiNg) [[GitHub]](https://github.com/fra31/perceptual-metrics)
+
+ Robust perceptual metric, based on the CLIP model `laion/CLIP-ViT-B-16-laion2B-s34B-b88K`.
+
+ Adversarially fine-tuned with TeCoA ([Mao et al. (2023)](https://arxiv.org/abs/2212.07016)) on ImageNet, under the L-infinity threat model with radius 4/255.
+
+ Performance on the perceptual similarity task [NIGHTS](https://dreamsim-nights.github.io):
+
+ | Clean | L-inf, eps=4/255 | L2, eps=3 |
+ |------:|-----------------:|----------:|
+ |  91.9 |             79.4 |      77.1 |
+
+ ## Usage
+ ```python
+ import open_clip
+
+ model, _, image_processor = open_clip.create_model_and_transforms('hf-hub:chs20/TeCoA4-ViT-B-16-laion2B-s34B-b88K')
+ ```
+
+ ## Citation
+ If you find this model useful, please consider citing our papers:
+ ```bibtex
+ @inproceedings{croce2024adversarially,
+     title={Adversarially Robust CLIP Models Induce Better (Robust) Perceptual Metrics},
+     author={Croce, Francesco and Schlarmann, Christian and Singh, Naman Deep and Hein, Matthias},
+     year={2024},
+     booktitle={{ICML Workshop on Foundation Models in the Wild}}
+ }
+ ```
+
+ ```bibtex
+ @inproceedings{schlarmann2024robustclip,
+     title={Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models},
+     author={Schlarmann, Christian and Singh, Naman Deep and Croce, Francesco and Hein, Matthias},
+     year={2024},
+     booktitle={{ICML}}
+ }
+ ```
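The README loads the model but leaves the scoring step implicit. A minimal sketch of the usual recipe for CLIP-based perceptual metrics (an assumption here, not spelled out in this repo): embed both images with `model.encode_image` and take one minus the cosine similarity of the embeddings, so a lower distance means the images are more perceptually similar.

```python
import math

# Hypothetical helper, not part of the open_clip API: score an image
# pair as 1 - cosine similarity of their embedding vectors.
def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# With the model loaded as in Usage, each embedding would come from
# model.encode_image(image_processor(img).unsqueeze(0)).
```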
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
open_clip_config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "model_cfg": {
+     "embed_dim": 512,
+     "vision_cfg": {
+       "image_size": 224,
+       "layers": 12,
+       "width": 768,
+       "patch_size": 16
+     },
+     "text_cfg": {
+       "context_length": 77,
+       "vocab_size": 49408,
+       "width": 512,
+       "heads": 8,
+       "layers": 12
+     }
+   },
+   "preprocess_cfg": {
+     "mean": [
+       0.48145466,
+       0.4578275,
+       0.40821073
+     ],
+     "std": [
+       0.26862954,
+       0.26130258,
+       0.27577711
+     ],
+     "interpolation": "bicubic",
+     "resize_mode": "shortest"
+   }
+ }
open_clip_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e687adcdbcc9694947b6bf590d768cf667eddf04a5800c511fa4ffded4c48510
+ size 598516980
open_clip_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b94945ab57ac75cbac140ecbe33fbb150c568560dd275f04c101c0c050500511
+ size 598599478
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 77,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
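The fields above determine how text is packed for the 77-token CLIP context. A purely illustrative sketch of that layout (the real work is done by `CLIPTokenizer`; the ids and lengths below come from `added_tokens_decoder` and `model_max_length`):

```python
# Illustrative only: ids 49406 (<|startoftext|>) and 49407
# (<|endoftext|>) come from the config above; since pad_token is
# <|endoftext|>, sequences are padded to model_max_length=77 with it.
BOS, EOS, CONTEXT = 49406, 49407, 77

def pack(token_ids):
    """Wrap token ids in BOS/EOS, truncate to fit, pad to CONTEXT."""
    seq = [BOS] + list(token_ids)[: CONTEXT - 2] + [EOS]
    return seq + [EOS] * (CONTEXT - len(seq))
```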
vocab.json ADDED
The diff for this file is too large to render. See raw diff