Commit 45cd238 (1 parent: 4c1f95a)
Author: Benjamin Consolvo

distilbert model instead

Files changed:
- app.py +6 -4
- requirements.txt +2 -1
app.py (changed)
@@ -2,9 +2,11 @@ import gradio as gr
 from transformers import pipeline
 import time
 
-sparse_qa_pipeline = pipeline(task="question-answering",model="Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa")
+# sparse_qa_pipeline = pipeline(task="question-answering",model="Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa")
+sparse_qa_pipeline = pipeline(task="question-answering",model="Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa-int8")
 
-dense_qa_pipeline = pipeline(task="question-answering",model="csarron/bert-base-uncased-squad-v1")
+# dense_qa_pipeline = pipeline(task="question-answering",model="csarron/bert-base-uncased-squad-v1")
+dense_qa_pipeline = pipeline(task="question-answering",model="distilbert-base-uncased-distilled-squad")
 
 def greet(name):
     return "Hello " + name + "!!"
@@ -34,11 +36,11 @@ def predict(context,question):
     return sparse_answer,sparse_duration,dense_answer,dense_duration
 
 md = """
-If you came looking for chatGPT, sorry to disappoint, but this is different. This prediction model is designed to answer a question about a text. It is designed to do reading comprehension. The model does not just answer questions in general -- it only works from the text that you provide. However,
+If you came looking for chatGPT, sorry to disappoint, but this is different. This prediction model is designed to answer a question about a given input text. It is designed to do reading comprehension. The model does not just answer questions in general -- it only works from the text that you provide. However, automated reading comprehension can be a valuable task.
 
 The model is based on the Zafrir et al. (2021) paper: [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754). The model can be found here: https://huggingface.co/Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa. The main idea of this BERT-Base model is that it is much more fast and efficient in deployment than its dense counterpart: (https://huggingface.co/csarron/bert-base-uncased-squad-v1). It has had weight pruning and model distillation applied to create a sparse weight pattern that is maintained even after fine-tuning has been applied. According to Zafrir et al. (2016), their "results show the best compression-to-accuracy ratio for BERT-Base". This model is still in FP32, but can be quantized to INT8 with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) for further compression.
 
-The training dataset used is the English Wikipedia dataset (2500M words), and then fine-tuned on the SQuADv1.1 dataset containing 89K training examples by Rajpurkar et al. (2016): [100, 000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250).
+The training dataset used is the English Wikipedia dataset (2500M words), and then fine-tuned on the SQuADv1.1 dataset containing 89K training examples, compiled by Rajpurkar et al. (2016): [100, 000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250).
 
 Author of Hugging Face Space: Benjamin Consolvo, AI Solutions Engineer Manager at Intel | Date last updated: 01/05/2023
 """
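The body of `predict` falls outside the hunks shown, so only its return tuple is visible here. As a rough sketch of how two question-answering pipelines might be timed side by side, the helper names below (`timed_answer`, `compare`, `load_dense_pipeline`) are hypothetical and not taken from the commit; the only assumption about the pipelines is the standard `qa(question=..., context=...)` call that returns a dict with an `"answer"` key.

```python
import time
from typing import Callable, Dict, Tuple

# Any callable that follows the question-answering pipeline convention:
# qa(question=..., context=...) -> {"answer": str, ...}
QAPipeline = Callable[..., Dict]


def timed_answer(qa: QAPipeline, context: str, question: str) -> Tuple[str, float]:
    """Run one QA call and return (answer, elapsed seconds)."""
    start = time.perf_counter()
    result = qa(question=question, context=context)
    duration = time.perf_counter() - start
    return result["answer"], duration


def compare(sparse_qa: QAPipeline, dense_qa: QAPipeline,
            context: str, question: str) -> Tuple[str, float, str, float]:
    """Return answers and durations in the same order as predict() in the diff."""
    sparse_answer, sparse_duration = timed_answer(sparse_qa, context, question)
    dense_answer, dense_duration = timed_answer(dense_qa, context, question)
    return sparse_answer, sparse_duration, dense_answer, dense_duration


def load_dense_pipeline():
    """Load the dense baseline named in the diff (downloads weights on first use)."""
    # Imported lazily so the timing helpers above need no third-party packages.
    from transformers import pipeline
    return pipeline(task="question-answering",
                    model="distilbert-base-uncased-distilled-squad")
```

Passing the pipelines in as arguments keeps the timing logic testable with a stub callable, without downloading any model weights.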
requirements.txt (changed)
@@ -1,3 +1,4 @@
 transformers
 torch
-tensorflow
+tensorflow
+neural_compressor
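Since this commit adds `neural_compressor` to the requirements, a quick pre-launch check that each listed package is importable may help when debugging the Space. The sketch below is not part of the commit; `missing_modules` and `REQUIRED` are hypothetical names, and it assumes each requirement's import name matches its entry in requirements.txt.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Assumption: import names equal the entries listed in requirements.txt.
REQUIRED = ["transformers", "torch", "tensorflow", "neural_compressor"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch` or `tensorflow`.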