brunorosilva committed on
Commit 3fcf4ec · 1 Parent(s): 3343b23

docs: update readme

Files changed (2)
  1. MIT-LICENSE.txt +20 -0
  2. README.md +39 -16
MIT-LICENSE.txt ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2024 Bruno Chicelli
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
README.md CHANGED
@@ -1,4 +1,4 @@
- # (WIP) MakeItSports Bot Image-to-Art Search
 
  This project fine-tunes a Vision Transformer (ViT) model, pre-trained with "google/vit-base-patch32-224-in21k" weights, on the style of [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/) to perform image-to-art search across 81k artworks made available by [WikiArt](https://wikiart.org/).
 
@@ -6,7 +6,7 @@ This project fine-tunes a Vision Transformer (ViT) model, pre-trained with "goog
 
  - [Overview](#overview)
  - [Installation](#installation)
- - [Usage](#usage)
  - [Dataset](#dataset)
  - [Training](#training)
  - [Inference](#inference)
@@ -44,30 +44,35 @@ This project leverages the Vision Transformer (ViT) model architecture for the t
 
  ### Training
 
- 1. Fine-tune the ViT model:
- ```sh
- poetry run python main.py train --epochs 50 --batch_size 32
- ```
 
  ### Inference via Gradio
 
- 1. Perform image-to-art search using the fine-tuned model:
- ```sh
- poetry run python main.py interface
- ```
 
  ### Create new gallery
 
- 1. If you want to index new images to search, use:
- ```sh
- poetry run python main.py gallery --gallery_path <your_path>
- ```
 
  ## Dataset
 
  The dataset is built from 1k images posted by the Instagram account [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/). Images are downloaded and split into training, validation and test sets. Each image is paired with its corresponding artwork for training purposes. If you want this dataset, just ask me and state your intended usage.
 
- WikiArt is indexed using the same process, except that there's no expected result. So each artwork is mapped to itself and the embeddings are saved as a numpy file (will be changed to chromadb in the future).
 
  ## Training
 
@@ -75,4 +80,22 @@ The training script fine-tunes the ViT model on the prepared dataset. Key steps
 
  1. Loading the pre-trained "google/vit-base-patch32-224-in21k" weights.
  2. Preparing the dataset and data loaders.
- 3. Fine-tuning the model using a custom training loop.
+ # Image-to-Art Search
 
  This project fine-tunes a Vision Transformer (ViT) model, pre-trained with "google/vit-base-patch32-224-in21k" weights, on the style of [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/) to perform image-to-art search across 81k artworks made available by [WikiArt](https://wikiart.org/).
 
 
  - [Overview](#overview)
  - [Installation](#installation)
+ - [How it works](#how-it-works)
  - [Dataset](#dataset)
  - [Training](#training)
  - [Inference](#inference)
 
  ### Training
 
+ Fine-tune the ViT model:
+ ```sh
+ make train
+ ```
 
  ### Inference via Gradio
 
+ Perform image-to-art search using the fine-tuned model:
+ ```sh
+ make viz
+ ```
+
+ ### Recreate the WikiArt gallery
+ ```sh
+ make wikiart
+ ```
 
  ### Create new gallery
 
+ If you want to index new images to search, use:
+ ```sh
+ poetry run python main.py gallery --gallery_path <your_path>
+ ```
 
  ## Dataset
 
  The dataset is built from 1k images posted by the Instagram account [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/). Images are downloaded and split into training, validation and test sets. Each image is paired with its corresponding artwork for training purposes. If you want this dataset, just ask me and state your intended usage.
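The following is a minimal sketch of how the split step could look. It assumes each training example is an (Instagram image, source artwork) path pair. The `split_pairs` helper, the 80/10/10 ratios and the fixed seed are illustrative assumptions, not the project's actual values. Whole pairs are shuffled, so every image stays attached to its expected artwork.

```python
import random
from typing import List, Tuple

# (sports-style image path, matching artwork path); hypothetical layout
Pair = Tuple[str, str]


def split_pairs(
    pairs: List[Pair],
    train_frac: float = 0.8,  # assumed ratio, not taken from the project
    val_frac: float = 0.1,    # assumed ratio, not taken from the project
    seed: int = 42,           # assumed seed, for a reproducible split
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle whole pairs and cut them into train/validation/test lists.

    Splitting pairs (rather than images) keeps each image next to its
    expected artwork, so no pair is divided across two splits.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test split")
    shuffled = list(pairs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    demo = [(f"sports/{i}.jpg", f"art/{i}.jpg") for i in range(1000)]
    train, val, test = split_pairs(demo)
    print(len(train), len(val), len(test))  # 800 100 100
```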
 
+ WikiArt is indexed using the same process, except that there is no expected result: each artwork is mapped to itself, the model is used as a feature extractor, and the gallery embeddings are saved as a NumPy file (to be replaced by ChromaDB in the future).
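One way the WikiArt indexing step could be sketched with NumPy is shown below. `build_gallery` and the `gallery_embeddings.npy` / `gallery_paths.txt` file names are hypothetical. `embed` is a placeholder for the fine-tuned ViT used as a feature extractor. The rows are L2-normalised so that a later dot product equals cosine similarity; this is an assumption about the search step, not confirmed project behaviour.

```python
from pathlib import Path
from typing import Callable, Sequence

import numpy as np


def build_gallery(
    paths: Sequence[str],
    embed: Callable[[str], np.ndarray],
    out_dir: str,
) -> np.ndarray:
    """Embed every gallery artwork and save the matrix as a .npy file.

    `embed` is a placeholder for the fine-tuned ViT feature extractor;
    the output file names are hypothetical.
    """
    vectors = np.stack([np.asarray(embed(p), dtype=np.float32) for p in paths])
    # L2-normalise each row so that a dot product later equals cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.clip(norms, 1e-12, None)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    np.save(out / "gallery_embeddings.npy", vectors)
    # Row i of the matrix belongs to the i-th path in this sidecar file
    (out / "gallery_paths.txt").write_text("\n".join(paths), encoding="utf-8")
    return vectors
```

For a dry run without the model, you could pass a random stand-in extractor that returns 768-dimensional vectors (768 is the hidden size of ViT-base). The function then writes one normalised row per artwork.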
 
  ## Training
 
 
  1. Loading the pre-trained "google/vit-base-patch32-224-in21k" weights.
  2. Preparing the dataset and data loaders.
+ 3. Fine-tuning the model using a custom training loop.
+ 4. Saving the model to the results folder.
+
+ ## Interface
+
+ The recommended way to get results is to use [Gradio](https://www.gradio.app/) as an interface by running `make viz`. This starts a server where you can upload any image you want to search with, or even use your webcam, to get the top 4 search results.
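Because the gallery is currently stored as a NumPy array, the "top 4 search results" can be illustrated with a brute-force cosine-similarity search. This sketch assumes the gallery rows are already L2-normalised. `top_k_search` is a hypothetical helper, not the project's code.

```python
from typing import Tuple

import numpy as np


def top_k_search(
    query: np.ndarray, gallery: np.ndarray, k: int = 4
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (indices, scores) of the k gallery rows closest to `query`.

    Assumes every row of `gallery` is already L2-normalised; the query is
    normalised here, so each score is a cosine similarity in [-1, 1].
    """
    q = np.asarray(query, dtype=np.float32)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    scores = gallery @ q  # one similarity per gallery item
    k = min(k, scores.shape[0])
    # Sort by descending score; "stable" keeps gallery order for ties
    idx = np.argsort(-scores, kind="stable")[:k]
    return idx, scores[idx]


if __name__ == "__main__":
    gallery = np.eye(5, dtype=np.float32)  # five toy unit vectors
    idx, scores = top_k_search(np.array([0.0, 0.0, 3.0, 1.0, 0.0]), gallery)
    print(idx.tolist())  # [2, 3, 0, 1]
```

A linear scan like this costs one matrix-vector product per query. That is likely workable at 81k artworks. The planned move to ChromaDB would hand this step to the database instead.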
+
+ ### Examples
+
+ ## Contributing
+ There are three topics I'd appreciate help with:
+ 1. Growing the gallery by embedding new painting datasets: the current one has 81k artworks, and I'd like to raise that to at least 500k;
+ 2. Porting the encoding and search to a vector database, preferably ChromaDB;
+ 3. Opening issues about how this could be improved. I'm not perfect, and the code is very spaghetti right now.
+
+ ## License
+ The source code for this project is licensed under the MIT license, which you can find in the MIT-LICENSE.txt file.
+
+ All graphical assets are licensed under the Creative Commons Attribution 3.0 Unported License.