Update README.md
Commit 8a502c1 (parent: f25200c), committed by 1-800-BAD-CODE
README.md CHANGED
@@ -42,7 +42,7 @@ Therefore, language tags do not need to be used and a single batch can contain m
 ## Architecture
 This is a data-driven approach to SBD. The model uses a `SentencePiece` tokenizer, a BERT-style encoder, and a linear classifier.
 
-Given that this is a relatively-easy NLP task, the model contains
+Given that this is a relatively-easy NLP task, the model contains \~5M parameters (\~4M of which are embeddings).
 This makes the model very fast and cheap at inference time, as SBD should be.
 
 The BERT encoder is based on the following configuration:
@@ -119,7 +119,8 @@ For each input subword `t`, this model predicts the probability that `t` is the
 
 This model has been exported to `ONNX` (opset 17) alongside the associated `SentencePiece` tokenizer.
 
-
+This model runs with a script after checking out this repo; if there is any interest in it running in the HF API, let me know.
+For now, I assume no one cares.
 
 This model can be run directly with a couple of dependencies which most developers may already have installed.
 
@@ -129,7 +130,7 @@ The following snippet will install the dependencies, clone this repo, and run an
 $ pip install sentencepiece onnxruntime
 $ git clone https://huggingface.co/1-800-BAD-CODE/sentence_boundary_detection_multilang
 $ cd sentence_boundary_detection_multilang
-#
+# Inspect the content before running an arbitrary file
 # $ python run_example.py
 ```
 
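The second hunk header notes that the model predicts a probability for each input subword `t`. The header is cut off, so the sketch below assumes that probability means "`t` is the last subword of a sentence". It shows one way such per-subword scores could be turned back into sentences. The function names, the 0.5 threshold, and the sample pieces and scores are all hypothetical and are not taken from `run_example.py`; the only library fact relied on is that SentencePiece marks word starts with `▁` (U+2581).

```python
from typing import List

SPACE_MARKER = "\u2581"  # SentencePiece's whitespace marker ("▁")


def detokenize(pieces: List[str]) -> str:
    """Join subword pieces and turn SentencePiece markers back into spaces."""
    return "".join(pieces).replace(SPACE_MARKER, " ").strip()


def split_sentences(
    pieces: List[str],
    probs: List[float],
    threshold: float = 0.5,  # hypothetical cutoff, not from the README
) -> List[str]:
    """Close a sentence after every piece whose boundary score meets the threshold."""
    if len(pieces) != len(probs):
        raise ValueError("pieces and probs must have the same length")
    sentences: List[str] = []
    current: List[str] = []
    for piece, prob in zip(pieces, probs):
        current.append(piece)
        if prob >= threshold:
            sentences.append(detokenize(current))
            current = []
    if current:
        # Trailing pieces with no predicted boundary still form a sentence.
        sentences.append(detokenize(current))
    return sentences


# Made-up pieces and scores, for illustration only.
pieces = ["\u2581Hello", "\u2581world", ".", "\u2581How", "\u2581are", "\u2581you", "?"]
probs = [0.01, 0.02, 0.97, 0.01, 0.03, 0.02, 0.99]
print(split_sentences(pieces, probs))  # ['Hello world.', 'How are you?']
```

In a real pipeline the pieces would come from the repo's `SentencePiece` tokenizer and the scores from the `ONNX` model; this sketch stubs both out, so it makes no claim about the actual file names or tensor layout.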