Pablo committed
Commit • ae788a5
Parent(s): e951a81
Further format improvements
app.py
CHANGED
@@ -52,12 +52,13 @@ st.sidebar.image(LOGO)
 # Body
 st.markdown(
 """
-BERTIN is a series of BERT-based models for Spanish.
+BERTIN is a series of BERT-based models for Spanish.
+
 The models are trained with Flax and using TPUs sponsored by Google since this is part of the
 [Flax/Jax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104)
 organised by HuggingFace.
 
-All models are variations of RoBERTa-base trained from scratch in Spanish using the mc4 dataset
+All models are variations of **RoBERTa-base** trained from scratch in **Spanish** using the **mc4 dataset**.
 We reduced the dataset size to 50 million documents to keep training times shorter, and also to be able to bias training examples based on their perplexity.
 
 The idea is to favour examples with perplexities that are neither too small (short, repetitive texts) or too long (potentially poor quality).
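The perplexity-biased sampling the diff text describes can be sketched as follows. This is a minimal illustration only: the function names, perplexity thresholds, and keep probabilities are hypothetical and are not taken from BERTIN's actual implementation.

```python
import random

def keep_probability(perplexity, low=50.0, high=500.0):
    """Favour documents whose perplexity falls inside a middle band.

    Very low perplexity suggests short, repetitive text; very high
    perplexity suggests potentially poor-quality text. The band edges
    here are illustrative placeholders.
    """
    if low <= perplexity <= high:
        return 1.0  # mid-band documents are always kept
    return 0.1      # extremes are heavily down-sampled, not dropped outright

def sample_documents(docs, rng=random.random):
    """docs: iterable of (text, perplexity) pairs.

    Keeps each document with probability keep_probability(perplexity),
    biasing the training sample toward mid-perplexity examples.
    """
    return [text for text, ppl in docs if rng() < keep_probability(ppl)]
```

With a fixed random draw of 0.5, only the mid-band document survives; with a draw of 0.05, even the extremes pass their 0.1 keep probability.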