Update README.md
README.md CHANGED
@@ -19,43 +19,12 @@ widget:
 example_title: "Question Answering"
 ---

-<h1>TOGETHER RESEARCH</h1>
-
-***!!! Be careful, this repo is still under construction. The content may change frequently. !!!***
-
-# Model Summary
-
-We present Together-GPT-J-6B-ProxAdam-50x, a model capable of following human instructions and conducting zero/few-shot inference.
-The model was trained in a decentralized fashion with the ProxAdam optimizer, requiring only 2% of the cross-machine communication of vanilla data-parallel training.
-
 # Quick Start

 ```python
 from transformers import pipeline

-pipe = pipeline(model='togethercomputer/
+pipe = pipeline(model='togethercomputer/GPT-JT-6B-v0')

 pipe("Where is Zurich? Ans:")
-```
-
-# Training Data
-
-We fine-tune [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on NI, P3, COT, and the Pile.
-- [Natural-Instructions](https://github.com/allenai/natural-instructions)
-- [P3](https://huggingface.co/datasets/Muennighoff/P3)
-- [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
-- [the pile](https://huggingface.co/datasets/the_pile)
-
-The Pile is used to retain the general abilities of GPT-J.
-The others are instruction-tuning datasets.
-
-# Hyperparameters
-
-We used AdamW with a learning rate of 1e-5 and a global batch size of 64, and trained for 5k steps.
-We used mixed-precision training, where activations are in FP16 while the optimizer states are kept in FP32.
-We truncate input sequences to 2048 tokens; for input sequences shorter than 2048 tokens, we concatenate multiple sequences into one long sequence to improve data efficiency.
-
-# Infrastructure
-
-We used [the Together Research Computer](https://together.xyz/) to conduct training.
-Specifically, we used 4 data parallel workers, each containing 2 \* A100 80GB GPUs.
+```
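For readers trying the new Quick Start, here is a slightly expanded version of the same snippet. The generation arguments and the output format below come from the standard `transformers` text-generation pipeline, not from this README, so treat it as a sketch rather than the card's official example:

```python
from transformers import pipeline

# Load the model from the Hub; the text-generation task is inferred from the model config.
pipe = pipeline(model='togethercomputer/GPT-JT-6B-v0')

# Greedy decoding with a small token budget keeps the answer short and deterministic.
result = pipe("Where is Zurich? Ans:", max_new_tokens=16, do_sample=False)

# Text-generation pipelines return a list with one dict per prompt.
print(result[0]["generated_text"])
```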
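The removed Hyperparameters section mentions concatenating inputs shorter than 2048 tokens into one long sequence. Below is a minimal sketch of that packing step, assuming a greedy, EOS-separated packer over raw text examples; the tokenizer choice and separator handling are assumptions, not details given in the card:

```python
from transformers import AutoTokenizer

# The card only names the base model, so reusing the GPT-J-6B tokenizer is an assumption.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

MAX_LEN = 2048  # context length stated in the removed Hyperparameters section


def pack_sequences(texts, max_len=MAX_LEN):
    """Greedily concatenate tokenized examples into chunks of at most max_len tokens."""
    packed, current = [], []
    for text in texts:
        ids = tokenizer(text)["input_ids"] + [tokenizer.eos_token_id]
        if len(ids) >= max_len:
            # Overlong examples are truncated, as the card describes.
            packed.append(ids[:max_len])
            continue
        if len(current) + len(ids) > max_len:
            packed.append(current)
            current = []
        current.extend(ids)
    if current:
        packed.append(current)
    return packed


# Example: pack a few short examples; real training data would fill each 2048-token chunk.
chunks = pack_sequences(["Where is Zurich? Ans:", "Translate to French: hello"])
print([len(c) for c in chunks])
```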