Rotary Scaling Factor of 4 for 8k context (Do not merge)
#23 opened by nbroad (HF staff)
This is a revision that updates "rotary_scaling_factor" to 4.0, which corresponds to a sequence length of 8192 tokens.
This PR should not be merged; it is intended only for use with TEI (text-embeddings-inference) by specifying the revision argument.
Here is how you can use this model:
model=nomic-ai/nomic-embed-text-v1.5
revision=refs/pr/23
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.2 \
    --model-id $model --revision $revision
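Once the container is running, embeddings can be requested over HTTP. Below is a minimal sketch from Python, assuming TEI's /embed route and the port mapping used above; the "search_document: " prefix follows the nomic-embed usage convention and the example text is made up.

import requests

# One embedding vector is returned per input string.
response = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": "search_document: a long passage to embed with the 8k-context revision"},
    headers={"Content-Type": "application/json"},
    timeout=60,
)
response.raise_for_status()
embedding = response.json()[0]
print(len(embedding))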
This PR indicates the scaling factor is 4 for an 8k context, while the model card documentation indicates that the scaling factor is 2. For a full 8k context, which rotary_scaling_factor is recommended?
The model natively supports scaling of the sequence length past 2048 tokens. To do so,
- tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

- model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
+ model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True, rotary_scaling_factor=2)
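Putting those two changed lines into context, a minimal end-to-end sketch could look like the following. It assumes the mean-pooling-plus-normalization recipe from the model card's Transformers example, uses rotary_scaling_factor=2 as in the quoted snippet (the revision in this PR bakes in 4.0 instead), and the input sentence is only a placeholder.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

# Raise the tokenizer limit and enable rotary scaling for long inputs.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
model = AutoModel.from_pretrained(
    'nomic-ai/nomic-embed-text-v1',
    trust_remote_code=True,
    rotary_scaling_factor=2,
)
model.eval()

# nomic-embed expects a task prefix such as 'search_document: '.
sentences = ['search_document: a very long document that exceeds 2048 tokens ...']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)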