Commit · 4cf0f1e
Parent(s): 392d071
Add model

Files changed:
- README.md +28 -0
- config.json +31 -0
- model.safetensors +3 -0
- vocab.json +0 -0
README.md
ADDED
@@ -0,0 +1,28 @@
+---
+tags:
+- text-classification
+- language-identification
+inference: false
+license: cc-by-sa-3.0
+language: multilingual
+library_name: staticvectors
+base_model:
+- NeuML/language-id
+---
+
+# Language Detection with StaticVectors
+
+This model is an export of this [FastText Language Identification model](https://fasttext.cc/docs/en/language-identification.html) for [`staticvectors`](https://github.com/neuml/staticvectors). `staticvectors` enables running inference in Python with NumPy, helping it maintain solid runtime performance.
+
+Language detection is an important task, and identification with n-gram models is an efficient and highly accurate way to do it.
+
+_This model is a quantized version of the [base language id model](https://hf.co/neuml/language-id). It uses 2x256 Product Quantization like the original quantized model from FastText. This shrinks the model down to 4MB with only a minor hit to accuracy._
+
+## Usage with StaticVectors
+
+```python
+from staticvectors import StaticVectors
+
+model = StaticVectors("NeuML/language-id-quantized")
+model.predict(["What language is this text?"])
+```
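The n-gram approach the README refers to can be sketched with plain NumPy. This is a toy illustration of FastText-style supervised inference, not the actual `staticvectors` internals: character n-grams (lengths `minn=2` to `maxn=4`, per the config) are hashed into buckets, their embeddings are averaged, and a linear layer scores the labels. All sizes and the `hash` function here are stand-ins.

```python
import numpy as np

# Toy sizes; the real model uses 2M buckets, dim=16 and 176 language labels
BUCKETS, DIM, LABELS = 1000, 16, 3

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(BUCKETS, DIM)).astype(np.float32)
classifier = rng.normal(size=(LABELS, DIM)).astype(np.float32)

def ngrams(token, minn=2, maxn=4):
    """Character n-grams of a token, with FastText-style boundary markers."""
    token = f"<{token}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]

def predict(text):
    """Score language labels by averaging hashed n-gram embeddings."""
    grams = [g for tok in text.lower().split() for g in ngrams(tok)]
    ids = [hash(g) % BUCKETS for g in grams]  # stand-in for FastText's hash
    hidden = embeddings[ids].mean(axis=0)     # average n-gram embeddings
    scores = classifier @ hidden
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                    # probabilities over labels

probs = predict("What language is this text?")
```

Because lookups and the classifier are just array indexing and a matrix-vector product, NumPy alone is enough at inference time, which is the point of the `staticvectors` export.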
config.json
ADDED
@@ -0,0 +1,31 @@
+{
+  "model_type": "staticvectors",
+  "format": "fasttext",
+  "source": "lid.176.bin",
+  "lr": 0.05,
+  "dim": 16,
+  "ws": 5,
+  "epoch": 10,
+  "min_count": 1000,
+  "min_count_label": 0,
+  "neg": 5,
+  "word_ngrams": 1,
+  "loss": "hs",
+  "model": "supervised",
+  "bucket": 2000000,
+  "minn": 2,
+  "maxn": 4,
+  "thread": 12,
+  "lr_update_rate": 100,
+  "t": 0.0001,
+  "label": "__label__",
+  "verbose": 2,
+  "pretrained_vectors": "",
+  "save_output": false,
+  "seed": 0,
+  "qout": false,
+  "retrain": false,
+  "qnorm": false,
+  "cutoff": 0,
+  "dsub": 2
+}
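The `"dim": 16` and `"dsub": 2` settings above correspond to the 2x256 Product Quantization the README mentions: each 16-dimensional vector is split into 8 sub-vectors of length 2, and each sub-vector is replaced by a one-byte index into a 256-entry codebook. A minimal NumPy sketch of that encode/decode step, with made-up codebooks and helper names (not the `staticvectors` API):

```python
import numpy as np

dim, dsub, ncodes = 16, 2, 256
nsub = dim // dsub  # 8 sub-quantizers

# Toy codebooks: one (256, 2) centroid table per sub-quantizer
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(nsub, ncodes, dsub)).astype(np.float32)

def pq_encode(vec):
    """Quantize a (dim,) vector to nsub uint8 codes."""
    subs = vec.reshape(nsub, 1, dsub)
    # Nearest centroid per sub-vector by squared L2 distance
    dists = ((subs - codebooks) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1).astype(np.uint8)

def pq_decode(codes):
    """Reconstruct an approximate (dim,) vector from its codes."""
    return codebooks[np.arange(nsub), codes].reshape(dim)

vec = rng.normal(size=dim).astype(np.float32)
codes = pq_encode(vec)    # 8 bytes instead of 64 (16 float32 values)
approx = pq_decode(codes)
```

Storing 8 bytes per vector instead of 64 is what brings the model down to roughly 4MB, at the cost of a small reconstruction error per vector.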
model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7a96c90618fcb1e2e6f5364f4a620bf2cd87a3f0d437d685c8c49eada1dc151
+size 4107972
vocab.json
ADDED
The diff for this file is too large to render. See the raw diff.