Update README.md
<!-- Provide a quick summary of what the model is/does. -->

ViPE: Visualize Pretty-much Everything is the first automated model for translating any arbitrary piece of text into a visualizable prompt. It helps any text-to-image model with figurative or non-lexical language visualization, and it has been shown to be more robust than GPT-3.5 Turbo (ChatGPT) in generating depictable and semantically meaningful prompts.

### Model Description
However, a semicolon draws a stronger boundary between the keywords and encourages the model to transfer the last keyword in a given context (previous keywords).
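The comma/semicolon convention can be sketched as a small helper. This is a hypothetical function (not part of ViPE's API); the model simply consumes the joined string as a plain-text prefix:

```python
# Hypothetical helper: build a ViPE text prefix from keywords.
# A semicolon draws a stronger boundary than a comma, encouraging the
# model to ground the last keyword in the context of the previous ones.
def build_prefix(context_keywords, target_keyword, strong_boundary=False):
    sep = "; " if strong_boundary else ", "
    return sep.join(list(context_keywords) + [target_keyword])

# Comma: keywords are blended more loosely.
print(build_prefix(["dark night", "cold wind"], "freedom"))
# Semicolon: the last keyword is emphasized within the earlier context.
print(build_prefix(["dark night", "cold wind"], "freedom", strong_boundary=True))
```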
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
- LyricCanvas dataset: a synthetically generated dataset (to be published soon)
### Training Procedure
ViPE has been trained with the standard auto-regressive procedure: given a line (or lines) of lyrics as a prefix, the objective is to generate a plausible prompt that is both depictable and semantically related to the given lyric(s). The loss function does not include the tokens corresponding to the lyrics, so ViPE never generates any original lyrics and only learns to generate visually related prompts.
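The lyric-token masking described above can be sketched as follows. `make_labels` is a hypothetical helper; it relies on the PyTorch/transformers convention that label `-100` is ignored by the cross-entropy loss:

```python
# Sketch of prefix loss masking for auto-regressive training:
# lyric (prefix) tokens get label -100 so the loss only covers the
# generated-prompt tokens, never the lyrics themselves.
def make_labels(input_ids, prefix_len, ignore_index=-100):
    """input_ids: token ids for [lyrics + prompt]; prefix_len: #lyric tokens."""
    return [ignore_index] * prefix_len + list(input_ids[prefix_len:])

# Example: 2 lyric tokens followed by 3 prompt tokens.
print(make_labels([11, 12, 301, 302, 303], prefix_len=2))
```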
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
## Evaluation
In all of the following evaluations, ViPE consistently demonstrates its robustness compared to ChatGPT and achieves performance that is competitive with that of human experts.
- ***Intrinsic evaluations***
  - General understanding of figurative language using the [Fig-QA dataset](https://huggingface.co/datasets/nightingal3/fig-qa)
- ***Extrinsic evaluations***
  - Image-text retrieval on the [HAIVMet dataset](https://aclanthology.org/2023.findings-acl.465.pdf)
  - Emotion visualizations: how well ViPE translates emotionally charged tweets into a depictable description of a scene, in comparison with ChatGPT, using the [Emotion dataset](https://huggingface.co/datasets/dair-ai/emotion)
- ***Human evaluations***
  - We conducted a user study involving 30 native English-speaking participants aged between 20 and 40. Each participant was presented with three images and a metaphor from the HAIVMet dataset and asked to select the image that best matches the metaphor. The images were generated using prompts from ViPE, ChatGPT, and human experts (HAIVMet).
<!-- This section describes the evaluation protocols and provides the results. -->
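As one illustration, the image-text retrieval protocol can be sketched as a recall@1 scorer over precomputed embeddings. This is a hypothetical helper, not the exact evaluation code; a real setup would obtain the embeddings from a joint image-text encoder such as CLIP:

```python
import numpy as np

def retrieval_at_1(text_embs, image_embs):
    """Fraction of prompts whose most-similar image is the matching one.

    text_embs, image_embs: (n, d) arrays; row i of each side is a pair.
    """
    # Cosine-normalize both sides so the dot product is cosine similarity.
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    v = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = t @ v.T                    # (n, n) similarity matrix
    best = sims.argmax(axis=1)        # best-matching image per prompt
    return float((best == np.arange(len(t))).mean())

# Toy check with perfectly aligned embeddings.
print(retrieval_at_1(np.eye(3), np.eye(3)))
```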