# v2 Model Card

## Overview
This repository contains multiple models trained using the GPT-2 architecture for generating creative stories, superhero names, and abilities. The models are designed to assist in generating narrative content based on user prompts.
## Model Variants
- Story Model: Generates stories based on prompts.
- Name Model: Generates superhero names based on story context.
- Abilities Model: Generates superhero abilities based on story context.
- Midjourney Model: Generates Midjourney image prompts for illustrating stories.
## Training Data

The models were trained on a custom dataset stored in `batch_ds_v2.txt`, which includes story prompts, superhero names, and abilities. The dataset was preprocessed to extract the relevant parts for training.
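As a minimal sketch, a file like this could be loaded with the `datasets` library; the `text` loader and split layout are assumptions, since the actual preprocessing pipeline is not included in this repository.

```python
from datasets import load_dataset

# Load the raw text file; each line becomes one example.
# The real preprocessing used for training is not part of this repo.
dataset = load_dataset("text", data_files={"train": "batch_ds_v2.txt"})
print(dataset["train"][0]["text"])
```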
## Training Procedure

- Framework: PyTorch with Hugging Face Transformers
- Model: GPT-2
- Training Arguments (see the sketch below):
  - Learning Rate: 1e-4
  - Number of Epochs: 15
  - Max Steps: 5000
  - Batch Size: Auto-detected
  - Gradient Clipping: 1.0
  - Logging Steps: 1
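These hyperparameters map roughly onto Hugging Face `TrainingArguments` as sketched below. This is a reconstruction, not the actual training script; the output directory is a placeholder, and `auto_find_batch_size` is an assumed reading of "Batch Size: Auto-detected".

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="v2/story/small",  # placeholder; one directory per model variant
    learning_rate=1e-4,
    num_train_epochs=15,
    max_steps=5000,               # when set, max_steps takes precedence over num_train_epochs
    auto_find_batch_size=True,    # assumed mapping of "Batch Size: Auto-detected"
    max_grad_norm=1.0,            # gradient clipping
    logging_steps=1,
)
```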
## Evaluation

The models were evaluated qualitatively on their ability to generate coherent and contextually relevant text. No quantitative metrics were recorded; assessments were made by inspection during development.
## Inference

To use the models for inference, send a POST request to the `/generate/<model_path>` endpoint of the Flask application. The request body should be a JSON object containing the `input_text` key.
### Example Request

```json
{
  "input_text": "[Ivan Ivanov, Lead Software Engineer, Superhero for Justice, Writing code, fixing issues, solving problems, Masculine, Long Hair, Adult]<|endoftext|>"
}
```
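For example, using Python's `requests` library (the host, port, and model path below are assumptions; adjust them to match your deployment):

```python
import requests

# Hypothetical local address and model path; adjust to your deployment.
url = "http://localhost:5000/generate/v2/story/small"
payload = {
    "input_text": "[Ivan Ivanov, Lead Software Engineer, Superhero for Justice, "
                  "Writing code, fixing issues, solving problems, Masculine, Long Hair, Adult]<|endoftext|>"
}

response = requests.post(url, json=payload)
print(response.text)  # the exact response format is not documented here
```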
## Usage

### Loading a Model

You can load a model and its tokenizer as follows:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "v2/story/small"  # change to your desired model path
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
```
### Generating Text

To generate text using the loaded model, use the following code:

```python
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
# pad_token_id is set to the EOS token because GPT-2 has no pad token.
output = model.generate(input_ids, max_length=50, do_sample=True, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
## Limitations
- The models may generate biased or nonsensical outputs based on the training data.
- They may not always understand complex prompts or context, leading to irrelevant or inaccurate responses.
- The models are sensitive to input phrasing; slight changes in the prompt can yield different results.