Readme, add infinity deployment documentation

#21
by michaelfeil - opened

This PR adds a short example of how to deploy the model via https://github.com/michaelfeil/infinity.

 docker run --gpus "0" -p "7997":"7997" michaelf34/infinity:0.0.68-trt-onnx v2 \
   --model-id Alibaba-NLP/gte-Qwen2-1.5B-instruct --revision "refs/pr/20" \
   --dtype float16 --batch-size 16 --device cuda --engine torch --port 7997
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-11-12 05:34:57,116 infinity_emb INFO:        infinity_server.py:89
         Creating 1engines:                                                     
         engines=['Alibaba-NLP/gte-Qwen2-1.5B-instruct']                        
INFO     2024-11-12 05:34:57,120 infinity_emb INFO: Anonymized   telemetry.py:30
         telemetry can be disabled via environment variable                     
         `DO_NOT_TRACK=1`.                                                      
INFO     2024-11-12 05:34:57,127 infinity_emb INFO:           select_model.py:64
         model=`Alibaba-NLP/gte-Qwen2-1.5B-instruct`                            
         selected, using engine=`torch` and device=`cuda`                       
INFO     2024-11-12 05:34:57,322                      SentenceTransformer.py:216
         sentence_transformers.SentenceTransformer                              
         INFO: Load pretrained SentenceTransformer:                             
         Alibaba-NLP/gte-Qwen2-1.5B-instruct                                    


INFO     2024-11-12 05:38:59,420                      SentenceTransformer.py:355
         sentence_transformers.SentenceTransformer                              
         INFO: 1 prompts are loaded, with the keys:                             
         ['query']                                                              
INFO     2024-11-12 05:38:59,790 infinity_emb INFO: Adding    acceleration.py:56
         optimizations via Huggingface optimum.                                 
The class `optimum.bettertransformers.transformation.BetterTransformer` is deprecated and will be removed in a future release.
WARNING  2024-11-12 05:38:59,792 infinity_emb WARNING:        acceleration.py:67
         BetterTransformer is not available for model: <class                   
         'transformers_modules.Alibaba-NLP.gte-Qwen2-1.5B-ins                   
         truct.2e8a2b8d43dcd68042d6f2bf7670086f90055a67.model                   
         ing_qwen.Qwen2Model'> Continue without                                 
         bettertransformer modeling code.                                       
INFO     2024-11-12 05:39:00,890 infinity_emb INFO: Getting   select_model.py:97
         timings for batch_size=16 and avg tokens per                           
         sentence=2                                                             
                 2.11     ms tokenization                                       
                 18.11    ms inference                                          
                 0.09     ms post-processing                                    
                 20.30    ms total                                              
         embeddings/sec: 788.19                                                 
INFO     2024-11-12 05:39:01,364 infinity_emb INFO: Getting  select_model.py:103
         timings for batch_size=16 and avg tokens per                           
         sentence=513                                                           
                 9.03     ms tokenization                                       
                 215.76   ms inference                                          
                 0.24     ms post-processing                                    
                 225.03   ms total                                              
         embeddings/sec: 71.10                                                  
INFO     2024-11-12 05:39:01,367 infinity_emb INFO: model    select_model.py:104
         warmed up, between 71.10-788.19 embeddings/sec at                      
         batch_size=16                                                          
INFO     2024-11-12 05:39:01,368 infinity_emb INFO:         batch_handler.py:386
         creating batching engine                                               
INFO     2024-11-12 05:39:01,370 infinity_emb INFO: ready   batch_handler.py:453
         to batch requests.                                                     
INFO     2024-11-12 05:39:01,373 infinity_emb INFO:       infinity_server.py:104
                                                                                
         ♾️  Infinity - Embedding Inference Server                               
         MIT License; Copyright (c) 2023-now Michael Feil                       
         Version 0.0.68                                                         
                                                                                
         Open the Docs via Swagger UI:                                          
         http://0.0.0.0:7997/docs                                               
                                                                                
         Access all deployed models via 'GET':                                  
         curl http://0.0.0.0:7997/models                                        
                                                                                
         Visit the docs for more information:                                   
         https://michaelfeil.github.io/infinity                                 
                                                                                
                                                                                
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
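Once the server logs `Application startup complete.`, it accepts requests on the OpenAI-compatible `/embeddings` route. A minimal stdlib-only Python sketch of a client follows; it assumes the container above is reachable at `http://localhost:7997`, and the helper names `build_payload` and `embed` are illustrative, not part of the Infinity API:

```python
import json
import urllib.request

MODEL = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"

def build_payload(texts, model=MODEL):
    """Build the JSON body for Infinity's OpenAI-compatible /embeddings route."""
    return {"model": model, "input": list(texts)}

def embed(texts, base_url="http://localhost:7997"):
    """POST the texts to the running Infinity server and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the docker container from this PR to be running.
    result = embed(["What is the capital of France?"])
    print(len(result["data"][0]["embedding"]))
```

The response follows the OpenAI embeddings schema, so the vectors live under `data[i].embedding`; the interactive Swagger UI at `http://0.0.0.0:7997/docs` (shown in the log above) documents the full request shape.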
michaelfeil changed pull request title from Update README.md to Readme, add infinity deployment documentation
thenlper changed pull request status to merged