Update instructions for usage with infinity

#13
by michaelfeil - opened

Ready for review!

docker run --gpus all -v $PWD/data:/app/.cache -e HF_TOKEN=$HF_TOKEN -p "7995":"7997" \
  michaelf34/infinity:0.0.68 v2 --model-id BAAI/bge-multilingual-gemma2 --revision "main" \
  --dtype bfloat16 --batch-size 4 --device cuda --engine torch --port 7997 --no-bettertransformer
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-11-13 00:08:17,113 infinity_emb INFO:        infinity_server.py:89
         Creating 1engines:                                                     
         engines=['BAAI/bge-multilingual-gemma2']                               
INFO     2024-11-13 00:08:17,117 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-13 00:08:17,124 infinity_emb INFO:           select_model.py:64
         model=`BAAI/bge-multilingual-gemma2` selected, using                   
         engine=`torch` and device=`cuda`                                       
INFO     2024-11-13 00:08:17,241                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         BAAI/bge-multilingual-gemma2                                           
INFO     2024-11-13 00:08:26,938                      SentenceTransformer.py:355
         sentence_transformers.SentenceTransformer                              
         INFO: 1 prompts are loaded, with the keys:                             
         ['web_search_query']                                                   
INFO     2024-11-13 00:08:29,054 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=4 and avg tokens per                            
         sentence=2                                                             
                 0.49     ms tokenization                                       
                 36.43    ms inference                                          
                 0.09     ms post-processing                                    
                 37.01    ms total                                              
         embeddings/sec: 108.08                                                 
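The timing block in the log is internally consistent. The three stages (0.49 + 36.43 + 0.09 ms) sum to the reported 37.01 ms total. The embeddings/sec figure matches batch_size divided by that total in seconds (4 / 0.03701 s ≈ 108.08). The sketch below reproduces that arithmetic, assuming infinity derives throughput this way. `throughput` is a hypothetical helper for illustration, not part of infinity.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of infinity).
# Usage: throughput BATCH_SIZE TOKENIZE_MS INFER_MS POSTPROC_MS
# Assumes total latency per batch is the sum of the three stages, and
# throughput is sentences per batch divided by that latency in seconds.
throughput() {
  awk -v b="$1" -v t="$2" -v i="$3" -v p="$4" 'BEGIN {
    total = t + i + p                      # ms per batch
    printf "total_ms=%.2f embeddings_per_sec=%.2f\n", total, b / (total / 1000)
  }'
}

# Numbers taken from the startup log above (batch_size=4).
throughput 4 0.49 36.43 0.09
```

With the logged values this prints `total_ms=37.01 embeddings_per_sec=108.08`, matching the server's own report. Note that the benchmark uses very short inputs (avg 2 tokens per sentence), so real-world throughput with longer texts will likely be lower.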