davidmezzetti committed on
Commit 4cf0f1e · 1 Parent(s): 392d071
Files changed (4)
  1. README.md +28 -0
  2. config.json +31 -0
  3. model.safetensors +3 -0
  4. vocab.json +0 -0
README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ tags:
+ - text-classification
+ - language-identification
+ inference: false
+ license: cc-by-sa-3.0
+ language: multilingual
+ library_name: staticvectors
+ base_model:
+ - NeuML/language-id
+ ---
+
+ # Language Detection with StaticVectors
+
+ This model is an export of this [FastText Language Identification model](https://fasttext.cc/docs/en/language-identification.html) for [`staticvectors`](https://github.com/neuml/staticvectors). `staticvectors` enables running inference in Python with NumPy, helping it maintain solid runtime performance.
+
+ Language detection is an important task, and identification with n-gram models is an efficient and highly accurate way to do it.
+
+ _This model is a quantized version of the [base language id model](https://hf.co/neuml/language-id). It uses 2x256 Product Quantization, like the original quantized model from FastText. This shrinks the model down to 4MB with only a minor hit to accuracy._
+
+ ## Usage with StaticVectors
+
+ ```python
+ from staticvectors import StaticVectors
+
+ model = StaticVectors("NeuML/language-id-quantized")
+ model.predict(["What language is this text?"])
+ ```
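The README describes the quantization as 2x256 Product Quantization, and config.json in this commit records `dim: 16` and `dsub: 2`. As a rough illustration of what that means, the sketch below splits each 16-dimensional vector into 2-wide slices and replaces every slice with the index of its nearest of 256 centroids, so a row is stored as 8 one-byte codes. It is a minimal NumPy sketch using plain k-means. The function names (`train_pq`, `encode`, `decode`) are hypothetical, and this is not the FastText or `staticvectors` implementation.

```python
import numpy as np


def train_pq(vectors, dsub=2, k=256, iters=5, seed=0):
    """Learn one codebook of k centroids per dsub-wide slice with plain k-means.

    Assumes vectors has at least k rows and that dim is divisible by dsub.
    """
    rng = np.random.default_rng(seed)
    n, dim = vectors.shape
    if dim % dsub:
        raise ValueError("dim must be divisible by dsub")

    codebooks = []
    for start in range(0, dim, dsub):
        sub = vectors[:, start:start + dsub]
        # Initialise centroids from k distinct rows of this slice
        centroids = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):
            # Squared distance from every row to every centroid: shape (n, k)
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):
                    centroids[c] = members.mean(axis=0)
        codebooks.append(centroids)

    # Shape: (dim // dsub, k, dsub)
    return np.stack(codebooks)


def encode(vectors, codebooks):
    """Replace each slice with the index of its nearest centroid (one byte when k <= 256)."""
    m, _, dsub = codebooks.shape
    codes = np.empty((vectors.shape[0], m), dtype=np.uint8)
    for i in range(m):
        sub = vectors[:, i * dsub:(i + 1) * dsub]
        dists = ((sub[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(axis=2)
        codes[:, i] = dists.argmin(axis=1)
    return codes


def decode(codes, codebooks):
    """Rebuild approximate vectors by looking up each code in its codebook."""
    m = codebooks.shape[0]
    return np.concatenate([codebooks[i][codes[:, i]] for i in range(m)], axis=1)
```

With `dim=16` and `dsub=2`, `encode` returns 8 bytes per row instead of 64 bytes of float32, at the cost of the approximation error introduced by `decode`.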
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "model_type": "staticvectors",
+   "format": "fasttext",
+   "source": "lid.176.bin",
+   "lr": 0.05,
+   "dim": 16,
+   "ws": 5,
+   "epoch": 10,
+   "min_count": 1000,
+   "min_count_label": 0,
+   "neg": 5,
+   "word_ngrams": 1,
+   "loss": "hs",
+   "model": "supervised",
+   "bucket": 2000000,
+   "minn": 2,
+   "maxn": 4,
+   "thread": 12,
+   "lr_update_rate": 100,
+   "t": 0.0001,
+   "label": "__label__",
+   "verbose": 2,
+   "pretrained_vectors": "",
+   "save_output": false,
+   "seed": 0,
+   "qout": false,
+   "retrain": false,
+   "qnorm": false,
+   "cutoff": 0,
+   "dsub": 2
+ }
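The `minn`, `maxn` and `bucket` values in this config control the character n-gram features that the README credits for accurate identification. The sketch below shows the general idea under stated assumptions: a word is wrapped in `<` and `>`, every substring of 2 to 4 characters is extracted, and each substring is hashed into one of `bucket` slots. The helper names (`char_ngrams`, `fnv1a_32`, `bucket_ids`) are hypothetical. The hash is standard 32-bit FNV-1a, and FastText's own hashing may differ in details such as byte signedness, so the resulting indices are illustrative only.

```python
def char_ngrams(word, minn=2, maxn=4):
    """Return all character n-grams of length minn..maxn of the wrapped word."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for i in range(len(wrapped))
        for n in range(minn, maxn + 1)
        if i + n <= len(wrapped)
    ]


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a hash over raw bytes."""
    h = 2166136261
    for b in data:
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def bucket_ids(word, minn=2, maxn=4, bucket=2000000):
    """Map each n-gram of a word to a slot in a fixed-size hash table."""
    return [fnv1a_32(g.encode("utf-8")) % bucket for g in char_ngrams(word, minn, maxn)]
```

For example, `char_ngrams("the")` yields `<t`, `<th`, `<the`, `th`, `the`, `the>`, `he`, `he>` and `e>`. Hashing into a fixed number of buckets keeps the embedding table bounded regardless of how many distinct n-grams appear in the training text.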
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7a96c90618fcb1e2e6f5364f4a620bf2cd87a3f0d437d685c8c49eada1dc151
+ size 4107972
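The three lines added for model.safetensors are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes (4107972, consistent with the roughly 4MB stated in the README). A minimal sketch of checking a downloaded file against such a pointer could look like the following. The helper names (`parse_lfs_pointer`, `matches_pointer`) are hypothetical, and the local file path is whatever you downloaded the weights to.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a7a96c90618fcb1e2e6f5364f4a620bf2cd87a3f0d437d685c8c49eada1dc151
size 4107972
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest named in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```

Calling `matches_pointer("model.safetensors", parse_lfs_pointer(POINTER))` on a local copy would then confirm that the download is complete and unmodified.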
vocab.json ADDED
The diff for this file is too large to render. See raw diff