1-800-BAD-CODE committed
Commit
8a502c1
1 Parent(s): f25200c

Update README.md

Files changed (1):
  1. README.md +4 -3
README.md CHANGED
@@ -42,7 +42,7 @@ Therefore, language tags do not need to be used and a single batch can contain m
 ## Architecture
 This is a data-driven approach to SBD. The model uses a `SentencePiece` tokenizer, a BERT-style encoder, and a linear classifier.
 
-Given that this is a relatively-easy NLP task, the model contains ~5M parameters (~4M of which are embeddings).
+Given that this is a relatively-easy NLP task, the model contains \~5M parameters (\~4M of which are embeddings).
 This makes the model very fast and cheap at inference time, as SBD should be.
 
 The BERT encoder is based on the following configuration:
@@ -119,7 +119,8 @@ For each input subword `t`, this model predicts the probability that `t` is the
 
 This model has been exported to `ONNX` (opset 17) alongside the associated `SentencePiece` tokenizer.
 
-The predictions are applied to the input by separating the token sequence where the predicted value exceeds a threshold for sentence boundary classification.
+This model runs with a script after checking out this repo; if there is any interest in it running in the HF API, let me know.
+For now, I assume no one cares.
 
 This model can be run directly with a couple of dependencies which most developers may already have installed.
 
@@ -129,7 +130,7 @@ The following snippet will install the dependencies, clone this repo, and run an
 $ pip install sentencepiece onnxruntime
 $ git clone https://huggingface.co/1-800-BAD-CODE/sentence_boundary_detection_multilang
 $ cd sentence_boundary_detection_multilang
-# Verify the content before running file
+# Inspect the content before running an arbitrary file
 # $ python run_example.py
 ```
 
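
For context on the line removed in the second hunk: it described applying the model's predictions by splitting the token sequence wherever the predicted boundary probability exceeds a threshold. A minimal sketch of that splitting step, assuming per-subword probabilities are already in hand (the function name and the 0.5 threshold are illustrative, not taken from `run_example.py`):

```python
def split_on_boundaries(tokens, boundary_probs, threshold=0.5):
    """Split a subword sequence into sentences at predicted boundaries.

    A token whose boundary probability exceeds `threshold` is treated as
    the last subword of a sentence.
    """
    sentences, current = [], []
    for token, prob in zip(tokens, boundary_probs):
        current.append(token)
        if prob > threshold:  # predicted sentence boundary after this token
            sentences.append(current)
            current = []
    if current:  # trailing subwords with no final boundary prediction
        sentences.append(current)
    return sentences


tokens = ["▁Hello", "▁world", ".", "▁How", "▁are", "▁you", "?"]
probs = [0.01, 0.02, 0.97, 0.01, 0.01, 0.02, 0.99]
print(split_on_boundaries(tokens, probs))
# → [['▁Hello', '▁world', '.'], ['▁How', '▁are', '▁you', '?']]
```

In practice the grouped subwords would then be detokenized with the bundled `SentencePiece` model to recover the sentence strings.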