brunorosilva committed on
Commit 3fcf4ec · 1 Parent(s): 3343b23
docs: update readme
- MIT-LICENSE.txt +20 -0
- README.md +39 -16
MIT-LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
|
1 |
+
Copyright (c) 2024 Bruno Chicelli
|
2 |
+
|
3 |
+
Permission is hereby granted, free of charge, to any person obtaining
|
4 |
+
a copy of this software and associated documentation files (the
|
5 |
+
"Software"), to deal in the Software without restriction, including
|
6 |
+
without limitation the rights to use, copy, modify, merge, publish,
|
7 |
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8 |
+
permit persons to whom the Software is furnished to do so, subject to
|
9 |
+
the following conditions:
|
10 |
+
|
11 |
+
The above copyright notice and this permission notice shall be
|
12 |
+
included in all copies or substantial portions of the Software.
|
13 |
+
|
14 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15 |
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16 |
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17 |
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18 |
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19 |
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20 |
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
README.md
CHANGED
@@ -1,4 +1,4 @@
|
|
1 |
-
#
|
2 |
|
3 |
This project fine-tunes a Vision Transformer (ViT) model, initialized from the pre-trained "google/vit-base-patch32-224-in21k" weights, on images in the style of [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/) to perform image-to-art search across 81k artworks made available by [WikiArt](https://wikiart.org/).
|
4 |
|
@@ -6,7 +6,7 @@ This project fine-tunes a Vision Transformer (ViT) model, pre-trained with "goog
|
|
6 |
|
7 |
- [Overview](#overview)
|
8 |
- [Installation](#installation)
|
9 |
-
- [
|
10 |
- [Dataset](#dataset)
|
11 |
- [Training](#training)
|
12 |
- [Inference](#inference)
|
@@ -44,30 +44,35 @@ This project leverages the Vision Transformer (ViT) model architecture for the t
|
|
44 |
|
45 |
### Training
|
46 |
|
47 |
-
|
48 |
-
|
49 |
-
|
50 |
-
|
51 |
|
52 |
### Inference via Gradio
|
53 |
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
|
58 |
|
59 |
### Create new gallery
|
60 |
|
61 |
-
|
62 |
-
|
63 |
-
|
64 |
-
|
65 |
|
66 |
## Dataset
|
67 |
|
68 |
The dataset derives from 1k images from the Instagram account [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/). Images are downloaded and split into training, validation, and test sets. Each image is paired with its corresponding artwork for training. If you want this dataset, just ask me and state your intended usage.
|
69 |
|
70 |
-
WikiArt is indexed using the same process, except that there's no expected result. So each artwork is mapped to itself and the embeddings are saved as a numpy file (will be changed to chromadb in the future).
|
71 |
|
72 |
## Training
|
73 |
|
@@ -75,4 +80,22 @@ The training script fine-tunes the ViT model on the prepared dataset. Key steps
|
|
75 |
|
76 |
1. Loading the pre-trained "google/vit-base-patch32-224-in21k" weights.
|
77 |
2. Preparing the dataset and data loaders.
|
78 |
-
3. Fine-tuning the model using a custom training loop.
|
1 |
+
# Image-to-Art Search
|
2 |
|
3 |
This project fine-tunes a Vision Transformer (ViT) model, initialized from the pre-trained "google/vit-base-patch32-224-in21k" weights, on images in the style of [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/) to perform image-to-art search across 81k artworks made available by [WikiArt](https://wikiart.org/).
|
4 |
|
|
|
6 |
|
7 |
- [Overview](#overview)
|
8 |
- [Installation](#installation)
|
9 |
+
- [How it works](#how-it-works)
|
10 |
- [Dataset](#dataset)
|
11 |
- [Training](#training)
|
12 |
- [Inference](#inference)
|
|
|
44 |
|
45 |
### Training
|
46 |
|
47 |
+
Fine-tune the ViT model:
|
48 |
+
```sh
|
49 |
+
make train
|
50 |
+
```
|
51 |
|
52 |
### Inference via Gradio
|
53 |
|
54 |
+
Perform image-to-art search using the fine-tuned model:
|
55 |
+
```sh
|
56 |
+
make viz
|
57 |
+
```
|
58 |
+
|
59 |
+
### Recreate the WikiArt gallery
|
60 |
+
```sh
|
61 |
+
make wikiart
|
62 |
+
```
|
63 |
|
64 |
### Create new gallery
|
65 |
|
66 |
+
If you want to index new images to search, use:
|
67 |
+
```sh
|
68 |
+
poetry run python main.py gallery --gallery_path <your_path>
|
69 |
+
```
|
70 |
|
71 |
## Dataset
|
72 |
|
73 |
The dataset derives from 1k images from the Instagram account [ArtButMakeItSports](https://www.instagram.com/artbutmakeitsports/). Images are downloaded and split into training, validation, and test sets. Each image is paired with its corresponding artwork for training. If you want this dataset, just ask me and state your intended usage.
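
The split described above could look roughly like the sketch below. It is only an illustration: the function name `split_dataset`, the 80/10/10 proportions, the fixed seed, and the file names are all assumptions, not the project's actual code.

```python
import random


def split_dataset(pairs, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (photo, artwork) pairs and split them into train/val/test lists.

    The fractions and seed are illustrative defaults, not the project's values.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Hypothetical file names standing in for the ~1k (photo, artwork) pairs.
pairs = [(f"photo_{i}.jpg", f"artwork_{i}.jpg") for i in range(1000)]
train, val, test = split_dataset(pairs)
print(len(train), len(val), len(test))
```

Splitting at the pair level keeps each photo together with its matched artwork, so no pair is shared between the training and evaluation sets.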
|
74 |
|
75 |
+
WikiArt is indexed using the same process, except that there's no expected result. Each artwork is mapped to itself, the model is used as a feature extractor, and the gallery embeddings are saved as a NumPy file (to be replaced by chromadb in the future).
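
As a rough sketch of how a gallery stored as a NumPy file can be searched, the example below saves an embedding matrix, reloads it, and returns the most similar entries for a query. Everything here is an assumption for illustration: the random vectors stand in for real ViT features, 768 is the hidden size of a ViT-base model, and the file name, the helper names, and the use of cosine similarity are not taken from the project.

```python
import os
import tempfile

import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k(query, gallery, k=4):
    """Return indices and cosine similarities of the k closest gallery rows."""
    scores = l2_normalize(gallery) @ l2_normalize(query)
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


rng = np.random.default_rng(0)
# Stand-in for extracted features: 1000 artworks, 768 dimensions each.
gallery = rng.normal(size=(1000, 768)).astype(np.float32)

# Persist and reload the gallery as a .npy file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "gallery_embeddings.npy")
np.save(path, gallery)
loaded = np.load(path)

# A query that is a slightly perturbed copy of gallery item 42.
query = loaded[42] + 0.01 * rng.normal(size=768).astype(np.float32)
indices, scores = top_k(query, loaded, k=4)
print(indices, scores)
```

A brute-force matrix product like this is workable at the scale of tens of thousands of artworks; a vector database would mainly help with larger galleries and metadata handling.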
|
76 |
|
77 |
## Training
|
78 |
|
|
|
80 |
|
81 |
1. Loading the pre-trained "google/vit-base-patch32-224-in21k" weights.
|
82 |
2. Preparing the dataset and data loaders.
|
83 |
+
3. Fine-tuning the model using a custom training loop.
|
84 |
+
4. Saving the model to the results folder.
|
85 |
+
|
86 |
+
## Interface
|
87 |
+
|
88 |
+
The recommended way to get results is to use [gradio](https://www.gradio.app/) as an interface by running `make viz`. This starts a server where you can upload an image to search with, or use your webcam, and get the top 4 search results.
|
89 |
+
|
90 |
+
### Examples
|
91 |
+
|
92 |
+
## Contributing
|
93 |
+
There are three topics I'd appreciate help with:
|
94 |
+
1. Increasing the gallery by embedding new painting datasets. The current one has 81k artworks and I really want to raise this number to at least 500k;
|
95 |
+
2. Porting the encoding and search to a vector db, preferably chromadb;
|
96 |
+
3. Opening issues about how this could be improved. I'm not perfect and the code is very spaghetti right now.
|
97 |
+
|
98 |
+
## License
|
99 |
+
The source code for the site is licensed under the MIT license, which you can find in the MIT-LICENSE.txt file.
|
100 |
+
|
101 |
+
All graphical assets are licensed under the Creative Commons Attribution 3.0 Unported License.
|