fittar committed
Commit 36051b1 · 1 Parent(s): 5459b37

Update README.md

Files changed (1):
  1. README.md +19 -5
README.md CHANGED
@@ -9,7 +9,9 @@ license: mit
 
<!-- Provide a quick summary of what the model is/does. -->
 
- ViPE: Visualize Pretty-much Everything, is the first automated model for translating any arbitrary piece of text into a visualizable prompt. It helps any text-to-image model in figurative or non-lexical language visualizations.
+ ViPE: Visualize Pretty-much Everything, is the first automated model for translating any arbitrary piece of text into a visualizable prompt.
+ It helps any text-to-image model with figurative or non-lexical language visualizations. It has been shown to be more robust than GPT-3.5 Turbo (ChatGPT)
+ in generating depictable and semantically meaningful prompts.
 
### Model Description
 
@@ -97,21 +99,33 @@ You can use either a comma or a semicolon to combine multiple keywords. For example,
However, a semicolon draws a stronger boundary between the keywords and encourages the model to transfer the last keyword in a given context (previous keywords).
 
 
- ## Training Details
-
### Training Data
 
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+ - LyricCanvas dataset: a synthetically generated dataset (to be published soon)
 
- [More Information Needed]
 
### Training Procedure
 
+ ViPE has been trained with the standard auto-regressive objective: given a line (or lines) of lyrics as a prefix, the goal is to generate a plausible
+ prompt that is both depictable and semantically related to the given lyric(s). The loss function does not include the tokens corresponding to the lyrics, so ViPE
+ never generates any original lyrics and only learns to generate visually related prompts.
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 
## Evaluation
-
+ In all of the following evaluations, ViPE consistently demonstrates its robustness compared to ChatGPT and achieves performance that is competitive with that of human experts.
+
+ - ***Intrinsic evaluations***
+   - General understanding of figurative language, using the [Fig-QA dataset](https://huggingface.co/datasets/nightingal3/fig-qa)
+ - ***Extrinsic evaluations***
+   - Image-text retrieval on the [HAIVMet dataset](https://aclanthology.org/2023.findings-acl.465.pdf)
+   - Emotion visualizations: how well ViPE translates emotionally charged tweets into a depictable description of a scene, in comparison with
+     ChatGPT. The [Emotion dataset](https://huggingface.co/datasets/dair-ai/emotion) is utilized.
+ - ***Human evaluations***:
+   - We conducted a user study involving 30 native English-speaking participants aged between 20 and 40. Participants were
+     presented with 3 images and a metaphor from the HAIVMet dataset and asked to select the image that best matches the metaphor.
+     The images were generated using prompts from ViPE, ChatGPT, and human experts (HAIVMet).
<!-- This section describes the evaluation protocols and provides the results. -->
 
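The loss-masking scheme described in the Training Procedure hunk (the model sees the lyrics as context, but the loss covers only the prompt tokens) can be sketched as below. This is an illustrative sketch, not ViPE's released training code: the token ids are toy values, `build_labels` is a hypothetical helper, and `-100` is the ignore-index convention used by common causal-LM training loops (e.g. Hugging Face Transformers).

```python
IGNORE_INDEX = -100  # labels with this value are excluded from the cross-entropy loss


def build_labels(lyric_ids, prompt_ids):
    """Concatenate the lyric prefix and the target prompt, masking lyric tokens.

    The model conditions on the lyrics via input_ids, but because their
    labels are IGNORE_INDEX, gradients flow only through the prompt tokens,
    so the model learns to generate prompts rather than reproduce lyrics.
    """
    input_ids = list(lyric_ids) + list(prompt_ids)
    labels = [IGNORE_INDEX] * len(lyric_ids) + list(prompt_ids)
    return input_ids, labels


# Toy ids standing in for one tokenized lyric line and its target prompt.
lyric_ids = [101, 102, 103]
prompt_ids = [201, 202, 203, 204]
input_ids, labels = build_labels(lyric_ids, prompt_ids)
print(input_ids)  # [101, 102, 103, 201, 202, 203, 204]
print(labels)     # [-100, -100, -100, 201, 202, 203, 204]
```

With this layout, a standard causal-LM loss over `(input_ids, labels)` matches the description in the diff: the lyric prefix contributes context but no loss.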