license: mit
language:
  - en
pipeline_tag: text2text-generation

T5-Base Job Description to Resume JSON

This model is google/t5-base fine-tuned to convert job descriptions into structured resume JSON.

Model description

The model uses the T5-base architecture and was fine-tuned on a dataset of 10,000 job description and resume pairs. Given a job description as input, it generates a JSON representation of a resume tailored to that job.

Base model: google/t5-base

Fine-tuning task: Text-to-JSON conversion

Training data: 10,000 job description and resume pairs

Intended uses & limitations

Intended uses:

Generating structured resume data from job descriptions
Assisting job seekers in tailoring resumes to specific job postings
Automating parts of the resume creation process

Limitations:

Output quality depends on how detailed and clear the input job description is
Generated resumes may require human review and editing
The model may not capture nuanced or industry-specific requirements
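The model cannot output "{" or "}" because the T5 tokenizer has no tokens for them. Instead it emits "LB>" in place of "{" and "RB>" in place of "}", so these placeholders must be replaced after decoding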
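
Here is a minimal sketch of that post-processing step. The placeholders are mapped back to braces, as in the example code below, and the result is then parsed with Python's standard json module. The helper names restore_braces and parse_resume_json are hypothetical and not part of this model's release. Generated output is not guaranteed to be valid JSON, so the parser returns None instead of raising, which lets the caller flag that output for human review.

```python
import json


def restore_braces(text: str) -> str:
    """Map the model's placeholder tokens back to JSON braces (hypothetical helper)."""
    # "LB>" stands for "{" and "RB>" for "}", matching the card's example code.
    return text.replace("LB>", "{").replace("RB>", "}")


def parse_resume_json(generated_text: str):
    """Return the parsed resume as a Python object, or None if it is not valid JSON."""
    try:
        return json.loads(restore_braces(generated_text))
    except json.JSONDecodeError:
        # The model's output is not guaranteed to be well-formed JSON.
        return None
```

For example, parse_resume_json('LB>"title": "Data Analyst"RB>') returns {"title": "Data Analyst"}. The field name is made up for illustration, because this card does not document the output schema.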

Training data

The model was trained on 10,000 pairs of job descriptions and corresponding resume JSON data. The data distribution and any potential biases in the training set are not specified.

Training procedure

The model was fine-tuned using the standard T5 text-to-text framework. Specific hyperparameters and training details are not provided.
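
The training setup is not documented, so the following is only a sketch of how one job description and resume pair could be framed in T5's text-to-text format. It rests on two assumptions that this card does not confirm. First, training inputs used the same task prefix as the inference example. Second, targets were serialized JSON with "{" and "}" replaced by "LB>" and "RB>". The names TASK_PREFIX, encode_target, and make_example are hypothetical.

```python
import json

# Assumption: the same task prefix as the inference example in this card.
TASK_PREFIX = "generate resume JSON for the following job: "


def encode_target(resume: dict) -> str:
    """Serialize a resume dict to JSON, then swap braces for the placeholder tokens."""
    text = json.dumps(resume, ensure_ascii=False)
    return text.replace("{", "LB>").replace("}", "RB>")


def make_example(job_description: str, resume: dict) -> dict:
    """Build one text-to-text training example (hypothetical format)."""
    return {
        "input_text": TASK_PREFIX + job_description,
        "target_text": encode_target(resume),
    }
```

Reversing the two replacements in encode_target recovers the original JSON string. Under these assumptions, the input and target strings would then be tokenized as the encoder input and the labels.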

How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration


def load_model_and_tokenizer(model_path):
    """
    Load the tokenizer and model.

    The tokenizer comes from the base google-t5/t5-base checkpoint;
    the fine-tuned weights come from model_path.
    """
    tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base")
    model = T5ForConditionalGeneration.from_pretrained(model_path)
    return tokenizer, model


def generate_text(prompt, tokenizer, model):
    """
    Generate text using the model based on the given prompt.
    """
    # Encode the prompt; keep the attention mask alongside the input IDs
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate the output using the model
    outputs = model.generate(**inputs, max_length=512, num_return_sequences=1)

    # Decode the output tensor to human-readable text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text


def main():
    model_path = "nakamoto-yama/t5-resume-generation"
    print(f"Loading model and tokenizer from {model_path}")
    tokenizer, model = load_model_and_tokenizer(model_path)

    # Interactive loop: enter a job description, or "exit" to quit
    while True:
        prompt = input("Enter a job description or title: ")
        if prompt.strip().lower() == "exit":
            break
        response = generate_text(
            f"generate resume JSON for the following job: {prompt}", tokenizer, model
        )
        # The model emits "LB>" for "{" and "RB>" for "}"; restore the braces
        response = response.replace("LB>", "{").replace("RB>", "}")
        print(f"Generated Response: {response}")


if __name__ == "__main__":
    main()
```

See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more examples.

Ethical considerations

This model automates part of the resume creation process, which could have implications for job seeking and hiring practices. Users should be aware of potential biases in the training data that may affect the generated resumes.

Additional information

For more details on the base T5 model, refer to the T5 paper and the google/t5-base model card.