oliverguhr
/

fullstop-punctuation-multilingual-base

@@ -5,9 +5,11 @@ language:
 - fr
 - it
 - nl
 tags:
 - punctuation prediction
 - punctuation
 datasets: wmt/europarl
 license: mit
 widget:
@@ -18,14 +20,103 @@ widget:
 - text: "Ist das eine Frage Frau Müller"
   example_title: "German"
 - text: "My name is Clara and I live in Berkeley California"
-  example_title: "English"
 metrics:
 - f1
 ---
-# Work in progress
-## Classification report over all languages
 ```
              precision    recall  f1-score   support
@@ -39,4 +130,94 @@ metrics:
     accuracy                           0.98  54504270
    macro avg       0.83      0.75      0.78  54504270
 weighted avg       0.98      0.98      0.98  54504270
-```

 - fr
 - it
 - nl
 tags:
 - punctuation prediction
 - punctuation
 datasets: wmt/europarl
 license: mit
 widget:
 - text: "Ist das eine Frage Frau Müller"
   example_title: "German"
 - text: "My name is Clara and I live in Berkeley California"
+  example_title: "English"
 metrics:
 - f1
 ---
+# Model Card for fullstop-punctuation-multilingual-base
+# Model Details
+## Model Description
+This model is trained to predict end-of-sentence (EOS) positions and punctuation marks in automatically generated or transcribed text.
+- **Developed by:** Oliver Guhr
+- **Shared by [Optional]:** Oliver Guhr
+- **Model type:** Token Classification
+- **Language(s) (NLP):** English, German, French, Italian, Dutch
+- **License:** MIT
+- **Parent Model:** xlm-roberta-base
+- **Resources for more information:**
+    - [GitHub Repo](https://github.com/oliverguhr/fullstop-deep-punctuation-prediction)
+    - [Associated Paper](https://www.researchgate.net/profile/Oliver-Guhr/publication/355038679_FullStop_Multilingual_Deep_Models_for_Punctuation_Prediction/links/615a0ce3a6fae644fbd08724/FullStop-Multilingual-Deep-Models-for-Punctuation-Prediction.pdf)
+# Uses
+## Direct Use
+This model can be used for token classification: it assigns each token a label indicating which punctuation mark, if any, should follow it.
+## Downstream Use [Optional]
+More information needed.
+## Out-of-Scope Use
+The model should not be used to intentionally create hostile or alienating environments for people.
+# Bias, Risks, and Limitations
+Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
+## Recommendations
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+# Training Details
+## Training Data
+The model authors note in the [associated paper](https://www.researchgate.net/profile/Oliver-Guhr/publication/355038679_FullStop_Multilingual_Deep_Models_for_Punctuation_Prediction/links/615a0ce3a6fae644fbd08724/FullStop-Multilingual-Deep-Models-for-Punctuation-Prediction.pdf):
+> The task consists in predicting EOS and punctuation marks on unpunctuated lowercased text. The organizers of the SeppNLG shared task provided 470 MB of English, German, French, and Italian text. This data set consists of a training and a development set.
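To illustrate the task format described in the quote, the sketch below turns a punctuated sentence into lowercased, unpunctuated words, each paired with a label for the mark that followed it. The label set (`"0"` for no mark, otherwise the mark itself), the set of marks in `MARKS`, and the function name `make_training_pairs` are assumptions made for illustration; the shared task's actual preprocessing may differ.

```python
from typing import List, Tuple

MARKS = ".,?-:"  # assumed set of marks to predict; not taken from this card


def make_training_pairs(text: str, marks: str = MARKS) -> List[Tuple[str, str]]:
    """Return (lowercased word, label) pairs; label "0" means no mark follows."""
    pairs: List[Tuple[str, str]] = []
    for raw in text.split():
        word = raw.rstrip(marks)
        # The first stripped character is the label; "0" if nothing was stripped.
        label = raw[len(word)] if len(word) < len(raw) else "0"
        if word:
            pairs.append((word.lower(), label))
        elif pairs:
            # A free-standing mark (e.g. " - ") is attached to the previous word.
            pairs[-1] = (pairs[-1][0], label)
    return pairs


print(make_training_pairs("Ist das eine Frage, Frau Müller?"))
```

A token classifier is then trained to recover the second element of each pair from the sequence of first elements.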
+## Training Procedure
+### Preprocessing
+More information needed
+### Speeds, Sizes, Times
+More information needed
+# Evaluation
+## Testing Data, Factors & Metrics
+### Testing Data
+More information needed
+### Factors
+More information needed
+### Metrics
+More information needed
+## Results
+### Classification report over all languages
 ```
              precision    recall  f1-score   support
     accuracy                           0.98  54504270
    macro avg       0.83      0.75      0.78  54504270
 weighted avg       0.98      0.98      0.98  54504270
+```
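The gap between the macro average and the weighted average in the report is what one would expect when class sizes are very uneven: the macro average counts every class equally, while the weighted average weights each class by its support. The sketch below shows both computations on made-up per-class values. The numbers in `toy` and the function name `macro_and_weighted_f1` are purely illustrative and are not the per-class results of this model.

```python
from typing import Dict, Tuple


def macro_and_weighted_f1(per_class: Dict[str, Tuple[float, int]]) -> Tuple[float, float]:
    """per_class maps label -> (f1, support); returns (macro F1, weighted F1)."""
    scores = [f1 for f1, _ in per_class.values()]
    total_support = sum(n for _, n in per_class.values())
    macro = sum(scores) / len(scores)
    weighted = sum(f1 * n for f1, n in per_class.values()) / total_support
    return macro, weighted


# Hypothetical values: one dominant, easy class and two rare, harder ones.
toy = {"0": (0.99, 9000), ".": (0.90, 600), ":": (0.50, 400)}
macro, weighted = macro_and_weighted_f1(toy)
print(round(macro, 3), round(weighted, 3))
```

With a dominant "no punctuation" class, the weighted score stays close to that class's score, while the macro score is pulled down by rare marks.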
+# Model Examination
+More information needed
+# Environmental Impact
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** More information needed
+- **Hours used:** More information needed
+- **Cloud Provider:** More information needed
+- **Compute Region:** More information needed
+- **Carbon Emitted:** More information needed
+# Technical Specifications [optional]
+## Model Architecture and Objective
+More information needed
+## Compute Infrastructure
+More information needed
+### Hardware
+More information needed
+### Software
+More information needed
+# Citation
+**BibTeX:**
+```bibtex
+@inproceedings{guhr-EtAl:2021:fullstop,
+  title     = {FullStop: Multilingual Deep Models for Punctuation Prediction},
+  author    = {Guhr, Oliver and Schumann, Anne-Kathrin and Bahrmann, Frank and Böhme, Hans Joachim},
+  booktitle = {Proceedings of the Swiss Text Analytics Conference 2021},
+  month     = {June},
+  year      = {2021},
+  address   = {Winterthur, Switzerland},
+  publisher = {CEUR Workshop Proceedings},
+  url       = {http://ceur-ws.org/Vol-2957/sepp_paper4.pdf}
+}
+```
+# Glossary [optional]
+More information needed
+# More Information [optional]
+More information needed
+# Model Card Authors [optional]
+Oliver Guhr, in collaboration with Ezi Ozoani and the Hugging Face team
+# Model Card Contact
+More information needed
+# How to Get Started with the Model
+Use the code below to get started with the model.
+<details>
+<summary> Click to expand </summary>
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+# Load the tokenizer and the token-classification model from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("oliverguhr/fullstop-punctuation-multilingual-base")
+model = AutoModelForTokenClassification.from_pretrained("oliverguhr/fullstop-punctuation-multilingual-base")
+```
+</details>
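The snippet above only loads the tokenizer and the model. A minimal sketch of how the per-token predictions might be turned back into punctuated text follows. It assumes that the label `"0"` means "no punctuation" and that any other label is the mark to append after the word, and it labels each word by its last sub-token. Both choices, as well as the helper names `restore_punctuation` and `punctuate`, are assumptions for illustration rather than something this card specifies, so inspect `model.config.id2label` before relying on them.

```python
from typing import List


def restore_punctuation(words: List[str], labels: List[str], no_punct: str = "0") -> str:
    """Append each predicted mark to its word and join the words with spaces."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return " ".join(w if lab == no_punct else w + lab for w, lab in zip(words, labels))


def punctuate(text: str, model_name: str = "oliverguhr/fullstop-punctuation-multilingual-base") -> str:
    """Run the model on whitespace-split `text` (downloads the weights on first use)."""
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    words = text.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)
    with torch.no_grad():
        pred_ids = model(**enc).logits[0].argmax(dim=-1).tolist()
    # Assumed label for "no punctuation"; words lost to truncation keep it.
    labels = ["0"] * len(words)
    # word_ids() maps each sub-token to its word index (None for special tokens);
    # later sub-tokens overwrite earlier ones, so the last sub-token decides.
    for token_idx, word_idx in enumerate(enc.word_ids(0)):
        if word_idx is not None:
            labels[word_idx] = model.config.id2label[pred_ids[token_idx]]
    return restore_punctuation(words, labels)


# Hand-written labels (not model output) to show the post-processing step alone.
print(restore_punctuation(["ist", "das", "eine", "frage", "frau", "müller"],
                          ["0", "0", "0", ",", "0", "?"]))
```

Calling `punctuate("my name is clara and i live in berkeley california")` would run the full path. Inputs longer than the model's maximum sequence length are truncated in this sketch, so longer texts would need to be processed in chunks.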