update description in README.md

README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license:
+license: gpl-3.0
 tags:
 - audio
 - automatic-speech-recognition
@@ -9,69 +9,14 @@ inference: false
 duplicated_from: philschmid/openai-whisper-endpoint
 ---
 
-#
-
-This repository implements a custom `handler` task for `automatic-speech-recognition` for 🤗 Inference Endpoints using OpenAI's new Whisper model. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/openai-whisper-endpoint/blob/main/handler.py).
-
-There is also a [notebook](https://huggingface.co/philschmid/openai-whisper-endpoint/blob/main/create_handler.ipynb) included that shows how to create the `handler.py`.
-
-### Request
-
-The endpoint expects a binary audio file. Below is a cURL example and a Python example using the `requests` library.
-
-**curl**
-
-```bash
-# download a sample audio file
-wget https://cdn-media.huggingface.co/speech_samples/sample1.flac
-
-# run request
-curl --request POST \
-  --url https://{ENDPOINT}/ \
-  --header 'Content-Type: audio/x-flac' \
-  --header 'Authorization: Bearer {HF_TOKEN}' \
-  --data-binary '@sample1.flac'
-```
-
-**Python**
-
-```python
-import mimetypes
-
-import requests as r
-
-ENDPOINT_URL = ""
-HF_TOKEN = ""
-
-def predict(path_to_audio: str):
-    # read the audio file as raw bytes
-    with open(path_to_audio, "rb") as f:
-        audio_bytes = f.read()
-    # guess the mimetype from the file extension, e.g. audio/x-flac
-    content_type = mimetypes.guess_type(path_to_audio)[0]
-
-    headers = {
-        "Authorization": f"Bearer {HF_TOKEN}",
-        "Content-Type": content_type,
-    }
-    response = r.post(ENDPOINT_URL, headers=headers, data=audio_bytes)
-    return response.json()
-
-prediction = predict(path_to_audio="sample1.flac")
-print(prediction)
-```
-
-Expected output:
-
-```json
-{"text": " going along slushy country roads and speaking to damp audiences in draughty school rooms day after day for a fortnight. He'll have to put in an appearance at some place of worship on Sunday morning, and he can come to us immediately afterwards."}
-```
+# Video Search
+
+This project contains 3 different models that can be used for searching videos (a sketch of how they chain together follows the diff):
+
+1. Whisper to convert mp3 audio files to text
+2. BART Sentence Transformer to generate vector embeddings from text
+3. BART LFQA to generate long form answers given a context
+
+For more context, see [Atlas: Find Anything on Youtube](https://atila.ca/blog/tomiwa/atlas)
+
+Inspired by [philschmid/openai-whisper-endpoint](https://huggingface.co/philschmid/openai-whisper-endpoint)
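
The three models form one pipeline: Whisper turns a video's audio track into a transcript, the sentence transformer embeds transcript chunks so a query can be matched against them, and BART LFQA writes a long form answer from the best-matching chunks. Below is a minimal sketch of that flow, assuming `transformers` and `sentence-transformers` are installed; the checkpoint names (`openai/whisper-base`, `all-MiniLM-L6-v2`, `vblagoje/bart_lfqa`) and the `search` helper are illustrative stand-ins, not this repo's actual configuration.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# 1. Whisper: convert the video's audio track to text
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

# 2. sentence transformer: embed transcript chunks for semantic search
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 3. LFQA: generate a long form answer from the retrieved chunks
lfqa = pipeline("text2text-generation", model="vblagoje/bart_lfqa")

def search(audio_path: str, query: str, chunk_size: int = 50, top_k: int = 3) -> str:
    # transcribe, then split the transcript into fixed-size word chunks
    transcript = asr(audio_path)["text"]
    words = transcript.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

    # rank chunks against the query by cosine similarity of their embeddings
    chunk_emb = embedder.encode(chunks, convert_to_tensor=True)
    query_emb = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, chunk_emb)[0]
    best = [chunks[i] for i in scores.argsort(descending=True)[:top_k]]

    # hand the best-matching chunks to BART LFQA as context for the answer
    prompt = f"question: {query} context: {' '.join(best)}"
    return lfqa(prompt, max_length=256)[0]["generated_text"]

print(search("talk.mp3", "What does the speaker do on Sunday morning?"))
```

Chunking by a fixed word count is the simplest possible segmentation; Whisper's timestamped segments would let the search return a position in the video as well.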
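For reference, the section removed above documented a custom Inference Endpoints `handler.py` for the Whisper endpoint. A minimal sketch of the general shape such a handler takes for binary audio follows; the class name and `__call__` signature follow the Inference Endpoints custom handler convention, but the body is an assumption, not philschmid's actual implementation.

```python
from typing import Any, Dict

from transformers import pipeline

class EndpointHandler:
    def __init__(self, path: str = ""):
        # load the model once when the endpoint starts; the whisper
        # checkpoint used as a fallback here is a placeholder
        self.asr = pipeline(
            "automatic-speech-recognition",
            model=path or "openai/whisper-base",
        )

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # for binary requests the toolkit passes the raw body under "inputs"
        audio_bytes = data["inputs"]
        result = self.asr(audio_bytes)
        return {"text": result["text"]}
```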