osanseviero's picture
Review blog post
c26d8af
|
raw
history blame
1.17 kB

The CodeGen architecture follows a standard transformer decoder with left-to-right causal masking. With rotary position embedding for the positional encoding (Su et al., 2021), and a context length of 2048. CodeGen models are trained in various sizes.

You can load the model and tokenizer directly from 🤗 transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Salesforce/codegen-16B-mono')
model = AutoModelForCausalLM.from_pretrained('Salesforce/codegen-16B-mono')

inputs = tokenizer("def hello_world():", return_tensors="pt")
outputs = model(**inputs)