manojpreveen/gpt-neoxt-20b-v9

Text Generation

text-generation-inference

Inference Endpoints

GPT-NeoXT-20B model, instruction-tuned with Colossal-AI on the instruction-tuning datasets listed below (~5.2M data points).

Base Model: togethercomputer/GPT-NeoXT-Chat-Base-20B (GPT-NeoXT-Chat-Base-20B-v0.16 - fine-tuned on feedback data)

Training Details:

Epochs: 2
Batch Size: 5 per device x 1 gradient accumulation step x 8 GPUs = 40
Block Size: 2020
Weight Decay: 0
Learning Rate: 1e-6
Learning Rate Scheduler Type: Cosine
Number of Warmup Steps: 600
Machine: 8 x A100 80GB
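
As a rough illustration of how these settings combine, the sketch below computes the effective batch size and a cosine learning-rate curve. It is not the training code used for this model: the linear shape of the warmup, the decay to zero, and the `total_steps` argument are assumptions, since the card only gives the scheduler type and the number of warmup steps.

```python
import math

# Values taken from the training details above.
PER_DEVICE_BATCH = 5
GRAD_ACCUM_STEPS = 1
NUM_GPUS = 8
PEAK_LR = 1e-6
WARMUP_STEPS = 600


def effective_batch_size(per_device=PER_DEVICE_BATCH,
                         accum=GRAD_ACCUM_STEPS,
                         gpus=NUM_GPUS):
    """Number of sequences consumed per optimizer step."""
    return per_device * accum * gpus


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then cosine decay to 0.
    total_steps is hypothetical; the card does not state it.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `effective_batch_size()` reproduces the 40 sequences per step stated above, and `lr_at(600, total_steps)` returns the peak rate of 1e-6 for any `total_steps` larger than 600.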

Training Data Specifics:

Labels are the same as the input IDs, except that "human" responses and pad tokens are masked so that they do not contribute to the loss.
Block size is 2020; multiple instructions are packed together into each training example.
"###" is the EOS token used in the data.
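
The masking and packing described above can be sketched roughly as follows. This is a minimal illustration, not the actual preprocessing script: the role name `"bot"`, the placement of the EOS tokens after every turn, and the token IDs in the usage example are assumptions. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, which is the usual way to exclude positions from the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(turns, eos_ids, pad_id, block_size=2020):
    """Pack several turns into one fixed-length block and build masked labels.

    turns:   list of (role, token_ids); role is "human" or "bot" (hypothetical names)
    eos_ids: token IDs of the "###" EOS marker (assumed to follow every turn)
    pad_id:  token ID used for padding
    """
    input_ids, labels = [], []
    for role, ids in turns:
        ids = list(ids) + list(eos_ids)
        input_ids.extend(ids)
        if role == "human":
            # Human text is visible as context but excluded from the loss.
            labels.extend([IGNORE_INDEX] * len(ids))
        else:
            labels.extend(ids)

    # Truncate to the block size, then pad; pad positions are masked too.
    input_ids = input_ids[:block_size]
    labels = labels[:block_size]
    n_pad = block_size - len(input_ids)
    input_ids += [pad_id] * n_pad
    labels += [IGNORE_INDEX] * n_pad
    return input_ids, labels


# Usage with made-up token IDs and a small block size for readability.
ids, labels = build_example(
    turns=[("human", [11, 12]), ("bot", [21, 22, 23])],
    eos_ids=[99],
    pad_id=0,
    block_size=10,
)
```

With this layout, only the model's own turns (and their EOS tokens) carry label values; every human and pad position holds `IGNORE_INDEX`.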

Datasets used to train manojpreveen/gpt-neoxt-20b-v9