Update README.md
1-800-BAD-CODE committed
Commit 8dcef59 • 1 Parent(s): b21595d
README.md CHANGED
```diff
@@ -70,12 +70,6 @@ Therefore, language tags do not need to be used and a single batch can contain m
 # Model Inputs and Outputs
 The model inputs should be **punctuated** texts.
 
-The inputs should be packed into a batch with shape `[B, T]`, with padding being the SPE model's `<pad>` token ID.
-The `<pad>` ID is required to generate a proper attention mask.
-
-The model was trained on a maximum sequence length of 256 (subwords), and may crash or perform poorly if a longer batch is processed.
-Optimal handling of longer sequences would require some inference-time logic (wrapping/overlapping inputs and re-combining outputs).
-
 For each input subword `t`, this model predicts the probability that `t` is the final token of a sentence (i.e., a sentence boundary).
 
 # Example Usage
```
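The lines removed here described a concrete batching contract: pack inputs into a `[B, T]` batch padded with the SPE model's `<pad>` token ID, derive the attention mask from those pad positions, and stay within the trained maximum of 256 subwords. A minimal sketch of that packing, assuming the public `sentencepiece` Python API (`spe.model` is a hypothetical path, and the downstream model call is omitted):

```python
# Sketch of the [B, T] batch packing described in the removed lines.
# Assumes a SentencePiece model that defines <pad>; "spe.model" is a
# hypothetical path, not a file from this repo.
import numpy as np
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spe.model")  # hypothetical path
pad_id = sp.pad_id()  # <pad> ID; the attention mask is derived from it

texts = [
    "Hello world. How are you?",
    "A shorter input.",
]

ids = [sp.encode(t) for t in texts]
max_len = min(max(len(seq) for seq in ids), 256)  # trained max: 256 subwords

# Pack into a [B, T] batch, filling short rows with <pad>.
batch = np.full((len(ids), max_len), pad_id, dtype=np.int64)
for row, seq in enumerate(ids):
    # Naive truncation; per the removed note, longer inputs would need
    # wrapping/overlapping plus re-combining of outputs instead.
    batch[row, : len(seq)] = seq[:max_len]

attention_mask = (batch != pad_id).astype(np.int64)  # 1 where tokens are real
```

Truncation here is only a placeholder: the removed text is explicit that sequences longer than 256 subwords call for inference-time wrapping/overlapping and re-combination of outputs rather than simple cutting.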