Commit · 4cf0f1e
Parent(s): 392d071
Add model

Files changed:
- README.md +28 -0
- config.json +31 -0
- model.safetensors +3 -0
- vocab.json +0 -0
README.md
ADDED
@@ -0,0 +1,28 @@
+---
+tags:
+- text-classification
+- language-identification
+inference: false
+license: cc-by-sa-3.0
+language: multilingual
+library_name: staticvectors
+base_model:
+- NeuML/language-id
+---
+
+# Language Detection with StaticVectors
+
+This model is an export of this [FastText Language Identification model](https://fasttext.cc/docs/en/language-identification.html) for [`staticvectors`](https://github.com/neuml/staticvectors). `staticvectors` enables running inference in Python with NumPy, helping it maintain solid runtime performance.
+
+Language detection is an important task, and identification with n-gram models is an efficient and highly accurate way to do it.
+
+_This model is a quantized version of the [base language id model](https://hf.co/neuml/language-id). It uses 2x256 Product Quantization like the original quantized model from FastText. This shrinks the model down to 4MB with only a minor hit to accuracy._
+
+## Usage with StaticVectors
+
+```python
+from staticvectors import StaticVectors
+
+model = StaticVectors("NeuML/language-id-quantized")
+model.predict(["What language is this text?"])
+```
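The n-gram approach the README refers to can be sketched with plain NumPy. This is a toy illustration of FastText-style supervised inference, not the actual `staticvectors` internals: character n-grams (lengths `minn=2` to `maxn=4`, per the config) are hashed into buckets, their embeddings are averaged, and a linear layer scores the labels. All sizes and the `hash` function here are stand-ins.

```python
import numpy as np

# Toy sizes; the real model uses 2M buckets, dim=16 and 176 language labels
BUCKETS, DIM, LABELS = 1000, 16, 3

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(BUCKETS, DIM)).astype(np.float32)
classifier = rng.normal(size=(LABELS, DIM)).astype(np.float32)

def ngrams(token, minn=2, maxn=4):
    """Character n-grams of a token, with FastText-style boundary markers."""
    token = f"<{token}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]

def predict(text):
    """Score language labels by averaging hashed n-gram embeddings."""
    grams = [g for tok in text.lower().split() for g in ngrams(tok)]
    ids = [hash(g) % BUCKETS for g in grams]  # stand-in for FastText's hash
    hidden = embeddings[ids].mean(axis=0)     # average n-gram embeddings
    scores = classifier @ hidden
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                    # probabilities over labels

probs = predict("What language is this text?")
```

Because lookups and the classifier are just array indexing and a matrix-vector product, NumPy alone is enough at inference time, which is the point of the `staticvectors` export.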
config.json
ADDED
@@ -0,0 +1,31 @@
+{
+  "model_type": "staticvectors",
+  "format": "fasttext",
+  "source": "lid.176.bin",
+  "lr": 0.05,
+  "dim": 16,
+  "ws": 5,
+  "epoch": 10,
+  "min_count": 1000,
+  "min_count_label": 0,
+  "neg": 5,
+  "word_ngrams": 1,
+  "loss": "hs",
+  "model": "supervised",
+  "bucket": 2000000,
+  "minn": 2,
+  "maxn": 4,
+  "thread": 12,
+  "lr_update_rate": 100,
+  "t": 0.0001,
+  "label": "__label__",
+  "verbose": 2,
+  "pretrained_vectors": "",
+  "save_output": false,
+  "seed": 0,
+  "qout": false,
+  "retrain": false,
+  "qnorm": false,
+  "cutoff": 0,
+  "dsub": 2
+}
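The `"dim": 16` and `"dsub": 2` settings above correspond to the 2x256 Product Quantization the README mentions: each 16-dimensional vector is split into 8 sub-vectors of length 2, and each sub-vector is replaced by a one-byte index into a 256-entry codebook. A minimal NumPy sketch of that encode/decode step, with made-up codebooks and helper names (not the `staticvectors` API):

```python
import numpy as np

dim, dsub, ncodes = 16, 2, 256
nsub = dim // dsub  # 8 sub-quantizers

# Toy codebooks: one (256, 2) centroid table per sub-quantizer
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(nsub, ncodes, dsub)).astype(np.float32)

def pq_encode(vec):
    """Quantize a (dim,) vector to nsub uint8 codes."""
    subs = vec.reshape(nsub, 1, dsub)
    # Nearest centroid per sub-vector by squared L2 distance
    dists = ((subs - codebooks) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1).astype(np.uint8)

def pq_decode(codes):
    """Reconstruct an approximate (dim,) vector from its codes."""
    return codebooks[np.arange(nsub), codes].reshape(dim)

vec = rng.normal(size=dim).astype(np.float32)
codes = pq_encode(vec)    # 8 bytes instead of 64 (16 float32 values)
approx = pq_decode(codes)
```

Storing 8 bytes per vector instead of 64 is what brings the model down to roughly 4MB, at the cost of a small reconstruction error per vector.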
model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7a96c90618fcb1e2e6f5364f4a620bf2cd87a3f0d437d685c8c49eada1dc151
+size 4107972
vocab.json
ADDED
The diff for this file is too large to render. See the raw diff.