[InCoder](https://huggingface.co/facebook/incoder-6B) uses a decoder-only Transformer trained with a Causal Masking objective: a left-to-right language model that learns to fill in masked token segments, with a context length of 2048.

|Model | # parameters |
| - | - |
| Decoder |1.3B |
| Decoder |6.7B |

The [Causal Masking objective](https://arxiv.org/abs/2201.07520) is a hybrid of causal and masked language modeling: "it combines the benefit of per-token generation with optional bi-directionality specifically tailored to prompting". During the training of InCoder, spans of code were randomly masked and moved to the end of each file, so that when the model generates a masked span it has already seen the code on both sides of it. Figure 1 from the InCoder [paper](https://arxiv.org/pdf/2204.05999.pdf) illustrates the training process.

So in addition to program synthesis (via left-to-right generation), InCoder can also perform editing (via infilling). The model gives promising results in some zero-shot code infilling tasks such as type prediction, variable renaming and comment generation.

In the code generation demo at the end of the blog, we use InCoder 1.3B.

You can load the model and tokenizer directly from [`transformers`](https://huggingface.co/docs/transformers/index):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-6B")
model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")

# Skip token_type_ids, which a decoder-only model does not need.
inputs = tokenizer("def hello_world():", return_tensors="pt", return_token_type_ids=False)
# Single forward pass; outputs.logits holds the next-token scores.
outputs = model(**inputs)
```

Or you can use a `pipeline`:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="facebook/incoder-6B")
outputs = pipe("def hello_world():")
```
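
The span-moving step described above for Causal Masking can be made concrete with a small sketch. This is an illustration under stated assumptions, not InCoder's training code: it masks a single span, works on characters rather than tokens, and the helper names `causal_mask_example` and `build_infill_prompt` are hypothetical. The sentinel strings `<|mask:0|>` and `<|endofmask|>` are assumed to match the ones on the InCoder model card; check the tokenizer before relying on them.

```python
MASK = "<|mask:0|>"        # assumed sentinel marking where a span was removed
END_MASK = "<|endofmask|>"  # assumed sentinel closing the moved span


def causal_mask_example(document: str, start: int, end: int) -> str:
    """Build one training-style string: replace document[start:end] with a
    sentinel and append the removed span at the end of the document."""
    if not 0 <= start < end <= len(document):
        raise ValueError("span must lie inside the document")
    span = document[start:end]
    masked = document[:start] + MASK + document[end:]
    # The span comes last, so a left-to-right model sees both the prefix
    # and the suffix before it has to predict the span.
    return masked + MASK + span + END_MASK


def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Build an inference-style prompt: the model would be asked to continue
    after the final sentinel until it emits END_MASK."""
    return prefix + MASK + suffix + MASK


source = "def add(a, b):\n    return a + b\n"
start = source.index("return")
example = causal_mask_example(source, start, start + len("return a + b"))
prompt = build_infill_prompt("def add(a, b):\n    ", "\n")
print(example)
```

The infilling prompt is simply the training string with the span and closing sentinel left off, which is why one left-to-right model can handle both generation and editing.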