alexmourachko committed · a0f0e17
Parent(s): 2209e43
update readme to match github

README.md CHANGED
@@ -3,26 +3,30 @@ license: cc-by-nc-4.0
 ---
 
 # SONAR
-[[Paper]]()
+[[Paper]](https://fb.workplace.com/groups/831302610278251/permalink/9713798772028546) (TODO: change for external link once published)
 [[Demo]](#usage)
 
-We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space
+We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. It substantially outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks.
 
-Speech segments can be embedded in the same
-We also provide a **text decoder for 200 languages**, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations.
+Speech segments can be embedded in the same SONAR embedding space using language-specific speech encoders trained in a teacher-student setting on speech transcription data. We also provide a single text decoder, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations.
 
-
+*SONAR* stands for **S**entence-level multim**O**dal and la**N**guage-**A**gnostic **R**epresentations.
 
-
-Model inference support thanks [Fairseq2](https://github.com/facebookresearch/fairseq2)
+The full list of supported languages (along with download links) can be found [below](#supported-languages-and-download-links).
 
 
 ## Installing
-
-
+SONAR depends mainly on [Fairseq2](https://github.com/fairinternal/fairseq2) and can be installed using (tested with `python=3.8`):
+```bash
+pip install --upgrade pip
+pip config set global.extra-index-url https://test.pypi.org/simple/
+pip install -e .
+```
 
 ## Usage
-
+fairseq2 will automatically download models into your `$TORCH_HOME/hub` directory upon using the commands below.
+
+### Compute text sentence embeddings with SONAR:
 ```python
 from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
 t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
@@ -32,7 +36,7 @@ t2vec_model.predict(sentences, source_lang="eng_Latn").shape
 # torch.Size([2, 1024])
 ```
 
-Translate with SONAR
+### Translate text with SONAR
 ```python
 from sonar.inference_pipelines.text import TextToTextModelPipeline
 t2t_model = TextToTextModelPipeline(encoder="text_sonar_basic_encoder",
@@ -44,50 +48,47 @@ t2t_model.predict(sentences, source_lang="eng_Latn", target_lang="fra_Latn")
 # ['Mon nom est SONAR.', "Je peux intégrer les phrases dans l'espace vectoriel."]
 ```
 
-Compute speech sentence embeddings
+### Compute speech sentence embeddings with SONAR
 ```python
-import
-
-
-speech_embedding_dp_builder = SpeechToEmbeddingPipeline.load_from_name("sonar_speech_encoder_eng")
+from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
+s2vec_model = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_eng")
 
-
-
-
-
-
-
+s2vec_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
+                     "./tests/integration_tests/data/audio_files/audio_2.wav"]).shape
+# torch.Size([2, 1024])
+import torchaudio
+inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
+assert sr == 16000, "Sample rate should be 16kHz"
 
-
-
-speech_emb = next(iter(speech_embedding_dp))
-speech_emb["audio"]["data"].sentence_embeddings
+s2vec_model.predict([inp]).shape
+# torch.Size([1, 1024])
 ```
 
-
-Speech-to-text with SONAR
+### Speech-to-text translation with SONAR
 ```python
-import
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+from sonar.inference_pipelines.speech import SpeechToTextModelPipeline
+
+s2t_model = SpeechToTextModelPipeline(encoder="sonar_speech_encoder_eng",
+                                      decoder="text_sonar_basic_decoder",
+                                      tokenizer="text_sonar_basic_decoder")
+
+import torchaudio
+inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
+assert sr == 16000, "Sample rate should be 16kHz"
+
+# passing loaded audio files
+s2t_model.predict([inp], target_lang="eng_Latn")
+# ['Television reports show white smoke coming from the plant.']
+
+# passing multiple wav files
+s2t_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
+                   "./tests/integration_tests/data/audio_files/audio_2.wav"], target_lang="eng_Latn")
+# ['Television reports show white smoke coming from the plant.',
+#  'These couples may choose to make an adoption plan for their baby.']
 ```
 
-
-with BLASER
+
+### Predicting [cross-lingual semantic similarity](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/human_XSTS_eval) with BLASER 2 models
 ```Python
 import torch
 from sonar.models.blaser.loader import load_blaser_model
@@ -102,6 +103,7 @@ print(blaser_qe(src=emb, mt=emb).item()) # 4.9819
 ```
 
 See more complete demo notebooks:
+
 * [sonar text2text similarity and translation](examples/sonar_text_demo.ipynb)
 * [sonar speech2text and other data pipeline examples](examples/inference_pipelines.ipynb)
 
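A quick way to sanity-check the shared embedding space the README describes is to compare embeddings of parallel sentences across languages. The sketch below is illustrative and not part of the commit; it reuses the `TextToEmbeddingModelPipeline` from the usage section and assumes the tokenizer argument matches the encoder name, as in the full example. The printed score is indicative only.

```python
# Illustrative sketch: cross-lingual similarity in the shared SONAR space,
# reusing the text embedding pipeline shown in the README above.
import torch.nn.functional as F
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
                                           tokenizer="text_sonar_basic_encoder")

# Embed an English sentence and its French translation separately,
# each with its own language code.
eng_emb = t2vec_model.predict(["My name is SONAR."], source_lang="eng_Latn")
fra_emb = t2vec_model.predict(["Mon nom est SONAR."], source_lang="fra_Latn")

# Parallel sentences should land close together in the shared space.
print(F.cosine_similarity(eng_emb, fra_emb).item())
```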
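The speech examples assert a 16 kHz sample rate. For audio recorded at other rates, a resampling step is needed first; here is a minimal sketch using torchaudio, where `my_audio.wav` is a placeholder path:

```python
# Minimal sketch: bring arbitrary-rate audio to the 16 kHz that the
# SONAR speech pipelines expect. "my_audio.wav" is a placeholder path.
import torchaudio
import torchaudio.functional as AF

inp, sr = torchaudio.load("my_audio.wav")
if sr != 16000:
    inp = AF.resample(inp, orig_freq=sr, new_freq=16000)
    sr = 16000

# `inp` can now be passed to s2vec_model.predict([inp]) or
# s2t_model.predict([inp], target_lang=...) as in the examples above.
```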
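The README notes that fairseq2 downloads models into `$TORCH_HOME/hub`. To see where that resolves on a given machine, torch's hub helpers can be used; whether fairseq2 honors an override via `set_dir()` is an assumption to verify against its documentation:

```python
# Sketch: inspect (and optionally relocate) the torch hub cache that the
# README says SONAR checkpoints are downloaded into. Whether fairseq2
# honors set_dir() is an assumption to verify.
import torch.hub

print(torch.hub.get_dir())               # e.g. ~/.cache/torch/hub by default
# torch.hub.set_dir("/data/models/hub")  # optional override, before first load
```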