nielsr HF staff commited on
Commit
5fed80c
1 Parent(s): d5812f5

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -13,7 +13,7 @@ Disclaimer: The team releasing DINOv2 did not write a model card for this model
13
 
14
  ## Model description
15
 
16
- The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion at a resolution of 224x224 pixels.
17
 
18
  Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder.
19
 
 
13
 
14
  ## Model description
15
 
16
+ The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fashion.
17
 
18
  Images are presented to the model as a sequence of fixed-size patches, which are linearly embedded. One also adds a [CLS] token to the beginning of a sequence to use it for classification tasks. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder.
19