Update README.md
<!-- Provide a quick summary of what the model is/does. -->

ViPE: Visualize Pretty-much Everything is the first automated model for translating any arbitrary piece of text into a visualizable prompt. It helps any text-to-image model with figurative or non-lexical language visualization, and it has been shown to be more robust than GPT-3.5 Turbo (ChatGPT) in generating depictable and semantically meaningful prompts.

### Model Description
However, a semicolon draws a stronger boundary between the keywords and encourages the model to transfer the last keyword in a given context (previous keywords).
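The comma/semicolon convention can be sketched as a small helper. This is a hypothetical function (not part of ViPE's API); the model simply consumes the joined string as a plain-text prefix:

```python
# Hypothetical helper: build a ViPE text prefix from keywords.
# A semicolon draws a stronger boundary than a comma, encouraging the
# model to ground the last keyword in the context of the previous ones.
def build_prefix(context_keywords, target_keyword, strong_boundary=False):
    sep = "; " if strong_boundary else ", "
    return sep.join(list(context_keywords) + [target_keyword])

# Comma: keywords are blended more loosely.
print(build_prefix(["dark night", "cold wind"], "freedom"))
# Semicolon: the last keyword is emphasized within the earlier context.
print(build_prefix(["dark night", "cold wind"], "freedom", strong_boundary=True))
```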
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
- LyricCanvas dataset: a synthetically generated dataset (to be published soon)
### Training Procedure
ViPE has been trained with the standard auto-regressive procedure: given a line (or lines) of lyrics as a prefix, the objective is to generate a plausible prompt that is both depictable and semantically related to the given lyric(s). The loss function does not include the tokens corresponding to the lyrics, so ViPE never generates any original lyrics and only learns to generate visually related prompts.
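The lyric-token masking described above can be sketched as follows. `make_labels` is a hypothetical helper; it relies on the PyTorch/transformers convention that label `-100` is ignored by the cross-entropy loss:

```python
# Sketch of prefix loss masking for auto-regressive training:
# lyric (prefix) tokens get label -100 so the loss only covers the
# generated-prompt tokens, never the lyrics themselves.
def make_labels(input_ids, prefix_len, ignore_index=-100):
    """input_ids: token ids for [lyrics + prompt]; prefix_len: #lyric tokens."""
    return [ignore_index] * prefix_len + list(input_ids[prefix_len:])

# Example: 2 lyric tokens followed by 3 prompt tokens.
print(make_labels([11, 12, 301, 302, 303], prefix_len=2))
```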
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
## Evaluation
In all of the following evaluations, ViPE consistently demonstrates its robustness compared to ChatGPT and achieves performance that is competitive with that of human experts.
- ***Intrinsic evaluations***
  - General understanding of figurative language using the [Fig-QA dataset](https://huggingface.co/datasets/nightingal3/fig-qa)
- ***Extrinsic evaluations***
  - Image-text retrieval on the [HAIVMet dataset](https://aclanthology.org/2023.findings-acl.465.pdf)
  - Emotion visualizations: how well ViPE translates emotionally charged tweets into a depictable description of a scene, in comparison with ChatGPT, using the [Emotion dataset](https://huggingface.co/datasets/dair-ai/emotion)
- ***Human evaluations***
  - We conducted a user study involving 30 native English-speaking participants aged between 20 and 40. Each participant was presented with three images and a metaphor from the HAIVMet dataset and asked to select the image that best matches the metaphor. The images were generated using prompts from ViPE, ChatGPT, and human experts (HAIVMet).
<!-- This section describes the evaluation protocols and provides the results. -->
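As one illustration, the image-text retrieval protocol can be sketched as a recall@1 scorer over precomputed embeddings. This is a hypothetical helper, not the exact evaluation code; a real setup would obtain the embeddings from a joint image-text encoder such as CLIP:

```python
import numpy as np

def retrieval_at_1(text_embs, image_embs):
    """Fraction of prompts whose most-similar image is the matching one.

    text_embs, image_embs: (n, d) arrays; row i of each side is a pair.
    """
    # Cosine-normalize both sides so the dot product is cosine similarity.
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    v = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = t @ v.T                    # (n, n) similarity matrix
    best = sims.argmax(axis=1)        # best-matching image per prompt
    return float((best == np.arange(len(t))).mean())

# Toy check with perfectly aligned embeddings.
print(retrieval_at_1(np.eye(3), np.eye(3)))
```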