Spaces: Intel/powered_by_intel_llm_leaderboard

eduardo-alvarez committed on Mar 15

Commit

8d9ad4b

•

1 Parent(s): 46f3e87

updating deployment tips

Files changed (3)

app.py +3 -3
info/deployment.py +48 -108
info/programs.py +0 -6

app.py CHANGED

@@ -27,9 +27,9 @@ from info.about import(
     ABOUT)
 from src.processing import filter_benchmarks_table
-inference_endpoint_url = os.environ['inference_endpoint_url']
-submission_form_endpoint_url = os.environ['submission_form_endpoint_url']
-inference_concurrency_limit = os.environ['inference_concurrency_limit']
+#inference_endpoint_url = os.environ['inference_endpoint_url']
+#submission_form_endpoint_url = os.environ['submission_form_endpoint_url']
+#inference_concurrency_limit = os.environ['inference_concurrency_limit']
 demo = gr.Blocks()

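Indexing `os.environ` raises `KeyError` when a variable is not set, so the three `os.environ[...]` reads that this commit comments out in `app.py` would fail at import time in any environment lacking those variables. If the settings are reintroduced, one way to read them without that failure is sketched below. The helper name `load_settings`, returning `None` for unset values, and converting `inference_concurrency_limit` to `int` are illustrative assumptions, not code from the Space.

```python
import os
from typing import Any, Dict


def load_settings() -> Dict[str, Any]:
    """Read the endpoint settings without raising KeyError when they are unset.

    Hypothetical helper: the keys mirror the variables in app.py, but the
    None-for-missing behaviour is an assumption for illustration.
    """
    # os.environ.get returns None for a missing variable instead of raising.
    limit = os.environ.get("inference_concurrency_limit")
    return {
        "inference_endpoint_url": os.environ.get("inference_endpoint_url"),
        "submission_form_endpoint_url": os.environ.get("submission_form_endpoint_url"),
        # Environment values are always strings; convert only if present.
        "inference_concurrency_limit": int(limit) if limit is not None else None,
    }


if __name__ == "__main__":
    print(load_settings())
```

Callers can then test for `None` explicitly instead of relying on the process failing at startup.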
info/deployment.py CHANGED

@@ -19,31 +19,15 @@ helps you choose the best option for your specific use case. Happy building!
     <th>Arc GPU</th>
     <th>Core Ultra</th>
   </tr>
-  <tr>
-    <td>Optimum Habana</td>
-    <td>🚀</td>
-    <td></td>
-    <td></td>
-    <td></td>
-    <td></td>
   </tr>
-  <tr>
-    <td>Intel Extension for PyTorch</td>
-    <td></td>
-    <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
-    <td></td>
-  </tr>
-  <tr>
-    <td>Intel Extension for Transformers</td>
-    <td></td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
-    <td></td>
   </tr>
-  <tr>
     <td>OpenVINO</td>
     <td></td>
     <td>🚀</td>
@@ -52,53 +36,20 @@ helps you choose the best option for your specific use case. Happy building!
     <td>🚀</td>
   </tr>
   <tr>
-    <td>BigDL</td>
-    <td></td>
-    <td>🚀</td>
-    <td>🚀</td>
-    <td>🚀</td>
     <td>🚀</td>
-  </tr>
-    <tr>
-    <td>NPU Acceleration Library</td>
-    <td></td>
-    <td></td>
-    <td></td>
-    <td></td>
     <td>🚀</td>
-  </tr>
-</tr>
-    <tr>
-    <td>PyTorch</td>
     <td>🚀</td>
     <td>🚀</td>
-    <td></td>
-    <td></td>
     <td>🚀</td>
   </tr>
-</tr>
-    <tr>
-    <td>Tensorflow</td>
-    <td>🚀</td>
-    <td>🚀</td>
-    <td></td>
-    <td></td>
-    <td>🚀</td>
-</tr>
 </table>
 </div>
 <hr>
 # Intel® Gaudi® Accelerators
-The Intel Gaudi 2 accelerator is Intel's most capable deep learning chip. You can learn about Gaudi 2 [here](https://habana.ai/products/gaudi2/).
-Intel Gaudi Software supports PyTorch and DeepSpeed for accelerating LLM training and inference.
-The Intel Gaudi Software graph compiler will optimize the execution of the operations accumulated in the graph
-(e.g. operator fusion, data layout management, parallelization, pipelining and memory management,
-and graph-level optimizations).
-Optimum Habana provides covenient functionality for various tasks. Below is a command line snippet to run inference on Gaudi with meta-llama/Llama-2-7b-hf.
 👍[Optimum Habana GitHub](https://github.com/huggingface/optimum-habana)
@@ -118,40 +69,7 @@ python run_generation.py \
 <hr>
-# Intel® Max Series GPU
-The Intel® Data Center GPU Max Series is Intel's highest performing, highest density, general-purpose discrete GPU, which packs over 100 billion transistors into one package and contains up to 128 Xe Cores--Intel's foundational GPU compute building block. You can learn more about this GPU [here](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html).
-### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for Python
-Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU.
-👍 [Intel Extension for Transformers GitHub](https://github.com/intel/intel-extension-for-transformers)
-Intel® Extension for PyTorch* extends PyTorch* with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
-👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
-```python
-import intel_extension_for_pytorch as ipex
-from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
-from transformers import AutoTokenizer
-device_map = "xpu"
-model_name ="Qwen/Qwen-7B"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-prompt = "When winter becomes spring, the flowers..."
-inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device_map)
-model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
-                                              device_map=device_map, load_in_4bit=True)
-model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device_map)
-output = model.generate(inputs)
-```
-<hr>
 # Intel® Xeon® CPUs
-The Intel® Xeon® CPUs have the most built-in accelerators of any CPU on the market, including Advanced Matrix Extensions (AMX) to accelerate matrix multiplication in deep learning training and inference. Learn more about the Xeon CPUs [here](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html).
 ### Optimum Intel and Intel Extension for PyTorch (no quantization)
 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
@@ -205,12 +123,53 @@ outputs = model.generate(inputs)
 <hr>
 # Intel® Core Ultra (NPUs and iGPUs)
-Intel® Core™ Ultra Processors are optimized for premium thin and powerful laptops, featuring 3D performance hybrid architecture, advanced AI capabilities, and available with built-in Intel® Arc™ GPU. Learn more about Intel Core Ultra [here](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html). For now, there is support for smaller models like [TinyLama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).
-### Intel® NPU Acceleration Library
-The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.
 👍 [Intel NPU Acceleration Library GitHub](https://github.com/intel/intel-npu-acceleration-library)
 ```python
@@ -244,25 +203,6 @@ print("Run inference")
 _ = model.generate(**generation_kwargs)
 ```
-### OpenVINO Tooling with Optimum Intel
-OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
-👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
-```python
-from optimum.intel import OVModelForCausalLM
-from transformers import AutoTokenizer, pipeline
-model_id = "helenai/gpt2-ov"
-model = OVModelForCausalLM.from_pretrained(model_id)
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
-pipe("In the spring, beautiful flowers bloom...")
-```
 <hr>
 # Intel® Arc GPUs

     <th>Arc GPU</th>
     <th>Core Ultra</th>
   </tr>
   </tr>
+    <td>PyTorch</td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
   </tr>
+    <tr>
     <td>OpenVINO</td>
     <td></td>
     <td>🚀</td>
     <td>🚀</td>
   </tr>
   <tr>
+    <td>Hugging Face</td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
     <td>🚀</td>
   </tr>
 </table>
 </div>
 <hr>
 # Intel® Gaudi® Accelerators
+Gaudi is Intel's most capable deep learning chip. You can learn about Gaudi [here](https://habana.ai/products/gaudi2/).
 👍[Optimum Habana GitHub](https://github.com/huggingface/optimum-habana)
 <hr>
 # Intel® Xeon® CPUs
 ### Optimum Intel and Intel Extension for PyTorch (no quantization)
 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
 <hr>
+# Intel® Max Series GPU
+### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for PyTorch
+👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
+```python
+import intel_extension_for_pytorch as ipex
+from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
+from transformers import AutoTokenizer
+device_map = "xpu"
+model_name ="Qwen/Qwen-7B"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+prompt = "When winter becomes spring, the flowers..."
+inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device_map)
+model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
+                                              device_map=device_map, load_in_4bit=True)
+model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device_map)
+output = model.generate(inputs)
+```
+<hr>
 # Intel® Core Ultra (NPUs and iGPUs)
+### OpenVINO Tooling with Optimum Intel
+👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
+```python
+from optimum.intel import OVModelForCausalLM
+from transformers import AutoTokenizer, pipeline
+model_id = "helenai/gpt2-ov"
+model = OVModelForCausalLM.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+pipe("In the spring, beautiful flowers bloom...")
+```
+### Intel® NPU Acceleration Library
 👍 [Intel NPU Acceleration Library GitHub](https://github.com/intel/intel-npu-acceleration-library)
 ```python
 _ = model.generate(**generation_kwargs)
 ```
 <hr>
 # Intel® Arc GPUs

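The Max Series GPU snippet loads Qwen-7B with `load_in_4bit=True` and calls `ipex.optimize_transformers(..., woq=True)`, which, going by the flag names, requests INT4 weight-only quantization: weights are stored as 4-bit integers plus a scale, while activations stay in floating point. The pure-Python sketch below illustrates that general idea with symmetric per-group quantization. It is a conceptual illustration under assumed parameters (group size 4, round-to-nearest, one float scale per group), not the algorithm or kernel that Intel Extension for Transformers or Intel Extension for PyTorch actually uses. Separately, the committed snippet passes `torch.float16` without ever importing `torch`, so running it verbatim would also need `import torch`.

```python
from typing import List, Tuple

# Signed 4-bit range. This symmetric scheme maps values into [-7, 7],
# so -8 is never produced but is kept as the clamp bound.
INT4_MIN, INT4_MAX = -8, 7


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one group of weights to INT4 codes with a shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude in the group onto INT4_MAX (7);
    # an all-zero group gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from INT4 codes."""
    return [c * scale for c in codes]


def quantize_weights(
    weights: List[float], group_size: int = 4
) -> List[Tuple[List[int], float]]:
    """Split a flat weight vector into groups, each with its own scale."""
    return [
        quantize_group(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    w = [0.12, -0.50, 0.33, 0.07, 1.20, -0.90, 0.05, 0.60]
    groups = quantize_weights(w, group_size=4)
    restored = [x for codes, s in groups for x in dequantize_group(codes, s)]
    for original, approx in zip(w, restored):
        print(f"{original:+.3f} -> {approx:+.3f}")
```

Each group costs four 4-bit codes plus one float scale, and the reconstruction error per weight is at most half that group's scale. Larger groups amortize the scale over more weights but let a single outlier coarsen the step size for the whole group.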
info/programs.py CHANGED

@@ -41,10 +41,4 @@ others in the community and within Intel
 Learn more and apply through the program at https://www.intel.com/content/www/us/en/developer/community/innovators/oneapi-innovator.html
-<hr>
-## Intel DevHub Discord
-Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and talk about everything from GenAI, HPC, to Quantum Computing.
 """

