Update instructions for usage with infinity
#13 opened by michaelfeil
Ready for review!
docker run --gpus all -v $PWD/data:/app/.cache -e HF_TOKEN=$HF_TOKEN -p "7995":"7997" \
  michaelf34/infinity:0.0.68 v2 \
  --model-id BAAI/bge-multilingual-gemma2 --revision "main" --dtype bfloat16 --batch-size 4 \
  --device cuda --engine torch --port 7997 --no-bettertransformer
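The command above publishes container port 7997 on host port 7995, so a client on the same machine would talk to port 7995. The following is a minimal sketch of such a client, assuming the server exposes infinity's OpenAI-style `POST /embeddings` route at the root path and returns a `data` list of objects with `index` and `embedding` fields. The helper names (`build_embedding_request`, `parse_embeddings`, `embed`) and the `localhost` base URL are illustrative and not part of this PR.

```python
import json
import urllib.request

# Assumption: the container was started with -p "7995":"7997" on this machine.
BASE_URL = "http://localhost:7995"
MODEL_ID = "BAAI/bge-multilingual-gemma2"


def build_embedding_request(texts, model=MODEL_ID, base_url=BASE_URL):
    """Build a POST request for the (assumed) OpenAI-style /embeddings route."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body):
    """Return the embedding vectors from a decoded response, ordered by index."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, timeout=30.0):
    """Send the texts to the running server and return one vector per text."""
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

With the container running, `embed(["What is the capital of France?"])` should return a list containing one vector. The call needs a reachable server, so it is left out of the block itself.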
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO 2024-11-13 00:08:17,113 infinity_emb INFO: Creating 1engines: engines=['BAAI/bge-multilingual-gemma2'] infinity_server.py:89
INFO 2024-11-13 00:08:17,117 infinity_emb INFO: Anonymized telemetry can be disabled via environment variable `DO_NOT_TRACK=1`. telemetry.py:30
INFO 2024-11-13 00:08:17,124 infinity_emb INFO: model=`BAAI/bge-multilingual-gemma2` selected, using engine=`torch` and device=`cuda` select_model.py:64
INFO 2024-11-13 00:08:17,241 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: BAAI/bge-multilingual-gemma2 SentenceTransformer.py:216
INFO 2024-11-13 00:08:26,938 sentence_transformers.SentenceTransformer INFO: 1 prompts are loaded, with the keys: ['web_search_query'] SentenceTransformer.py:355
INFO 2024-11-13 00:08:29,054 infinity_emb INFO: Getting timings for batch_size=4 and avg tokens per sentence=2 select_model.py:97
0.49 ms tokenization
36.43 ms inference
0.09 ms post-processing
37.01 ms total
embeddings/sec: 108.08
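The timing lines above are internally consistent: the three stages sum to the reported total, and dividing the batch size by that total reproduces the reported rate. The short check below is a sketch under the assumption that the figure is computed as batch size over total latency. The log does not state the formula, and the function name `throughput` is illustrative.

```python
# Values copied from the startup log above.
BATCH_SIZE = 4
STAGE_MS = {
    "tokenization": 0.49,
    "inference": 36.43,
    "post-processing": 0.09,
}


def throughput(batch_size, stage_ms):
    """Return (total latency in ms, embeddings per second) for one batch.

    Assumes throughput = batch_size / total_latency, which matches the
    logged numbers but is not confirmed by the log itself.
    """
    total_ms = sum(stage_ms.values())
    return total_ms, batch_size / (total_ms / 1000.0)


total_ms, per_sec = throughput(BATCH_SIZE, STAGE_MS)
print(f"{total_ms:.2f} ms total")        # 37.01 ms total
print(f"embeddings/sec: {per_sec:.2f}")  # embeddings/sec: 108.08
```

The log reports an average of 2 tokens per sentence for this measurement, so the rate describes very short inputs. Longer texts would likely yield a lower figure.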