---
language:
- en
- es
- eu
datasets:
- squad
widget:
- text: "When was Florence Nightingale born?"
  context: "Florence Nightingale, known for being the founder of modern nursing, was born in Florence, Italy, in 1820."
  example_title: "English"
- text: "¿Por qué provincias pasa el Tajo?"
  context: "El Tajo es el río más largo de la península ibérica, a la que atraviesa en su parte central, siguiendo un rumbo este-oeste, con una leve inclinación hacia el suroeste, que se acentúa cuando llega a Portugal, donde recibe el nombre de Tejo. Nace en los montes Universales, en la sierra de Albarracín, sobre la rama occidental del sistema Ibérico y, después de recorrer 1007 km, llega al océano Atlántico en la ciudad de Lisboa. En su desembocadura forma el estuario del mar de la Paja, en el que vierte un caudal medio de 456 m³/s. En sus primeros 816 km atraviesa España, donde discurre por cuatro comunidades autónomas (Aragón, Castilla-La Mancha, Madrid y Extremadura) y un total de seis provincias (Teruel, Guadalajara, Cuenca, Madrid, Toledo y Cáceres)."
  example_title: "Español"
- text: "Zer beste izenak ditu Tartalo?"
  context: "Tartalo euskal mitologiako izaki begibakar artzain erraldoia da. Tartalo izena zenbait euskal hizkeratan herskari-bustidurarekin ahoskatu ohi denez, horrelaxe ere idazten da batzuetan: Ttarttalo. Euskal Herriko zenbait tokitan, Torto edo Anxo ere esaten diote."
  example_title: "Euskara"
---

# ixambert-base-cased fine-tuned for QA

This is a basic implementation of the multilingual model ["ixambert-base-cased"](https://huggingface.co/ixa-ehu/ixambert-base-cased), fine-tuned on SQuAD v1.1 and an experimental Basque version of SQuAD v1.1 (about one third the size of the original), that can answer basic factual questions in English, Spanish, and Basque.
25
+
26
+ ## Overview
27
+
28
+ * **Language model:** ixambert-base-cased
29
+ * **Languages:** English, Spanish and Basque
30
+ * **Downstream task:** Extractive QA
31
+ * **Training data:** SQuAD v1.1 + experimental SQuAD1.1 in Basque
32
+ * **Eval data:** SQuAD v1.1 + experimental SQuAD1.1 in Basque
33
+ * **Infrastructure:** 1x GeForce RTX 2080
34
+

## Outputs

The model returns the answer to the question, the start and end character positions of the answer in the original context, and a confidence score for that span being the correct answer. For example:

```python
{'score': 0.9667195081710815, 'start': 101, 'end': 105, 'answer': '1820'}
```
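
Since `start` and `end` are character offsets into the context, the answer can be recovered by slicing. A quick check using the example output above (plain Python, no model required):

```python
context = (
    "Florence Nightingale, known for being the founder of modern nursing, "
    "was born in Florence, Italy, in 1820."
)
pred = {'score': 0.9667195081710815, 'start': 101, 'end': 105, 'answer': '1820'}

# Slicing the context with the returned offsets recovers the answer string
answer_span = context[pred["start"]:pred["end"]]
print(answer_span)  # 1820
```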

## How to use

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "MarcBrun/ixambert-finetuned-squad-eu-en"

# To get predictions
context = "Florence Nightingale, known for being the founder of modern nursing, was born in Florence, Italy, in 1820"
question = "When was Florence Nightingale born?"
qa = pipeline("question-answering", model=model_name, tokenizer=model_name)
pred = qa(question=question, context=context)

# To load the model and tokenizer directly
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
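
When the model and tokenizer are loaded directly rather than through the pipeline, the model produces a start logit and an end logit per token, and the pipeline's span selection can be approximated by maximizing their combined score. A minimal sketch of that selection step (the toy logits below are made up for illustration, not real model outputs):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token indices with the highest combined
    logit, subject to start <= end and a maximum answer length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy example: token 1 is the likeliest start, token 2 the likeliest end
print(best_span([0.1, 5.0, 0.2, 0.1], [0.0, 0.3, 4.0, 0.2]))  # (1, 2)
```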

## Hyperparameters

```
batch_size = 8
n_epochs = 3
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = linear
max_seq_len = 384
doc_stride = 128
```
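
`max_seq_len` and `doc_stride` control how contexts longer than the model's input size are split into overlapping windows during preprocessing. A simplified sketch of that windowing (it ignores the question and special tokens that share each window in the real SQuAD pipeline):

```python
def sliding_windows(tokens, max_len=384, stride=128):
    """Split a long token sequence into overlapping windows: consecutive
    windows share `stride` tokens, so the step between starts is
    max_len - stride."""
    step = max_len - stride
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
    return windows
```

Overlapping windows ensure that an answer falling near a window boundary is still fully contained in at least one window.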