aehrm committed on
Commit 17a0df6
Parent: 24f7972

Update README

Files changed (1)
  1. README.md +31 -1
README.md CHANGED
@@ -2,6 +2,23 @@
  datasets:
  - aehrm/dtaec-lexica
  language: de
  ---

  # DTAEC Type Normalizer
@@ -31,7 +48,20 @@ model = AutoModelForSeq2SeqLM.from_pretrained('aehrm/dtaec-type-normalizer')
  model_in = tokenizer(['Freyheit', 'seyn', 'selbstthätig'], return_tensors='pt', padding=True)
  model_out = model.generate(**model_in)

- print(tokenizer.batch_decode(model_out))
  ```
 
  datasets:
  - aehrm/dtaec-lexica
  language: de
+ pipeline_tag: translation
+ model-index:
+ - name: aehrm/dtaec-type-normalizer
+   results:
+   - task:
+       name: Historic Text Normalization (type-level)
+       type: translation
+     dataset:
+       name: DTA-EC Lexicon
+       type: aehrm/dtaec-lexica
+     metrics:
+     - name: Word Accuracy
+       type: accuracy
+       value: 0.9546
+     - name: Word Accuracy OOV
+       type: accuracy
+       value: 0.9096
  ---

  # DTAEC Type Normalizer

  model_in = tokenizer(['Freyheit', 'seyn', 'selbstthätig'], return_tensors='pt', padding=True)
  model_out = model.generate(**model_in)

+ print(tokenizer.batch_decode(model_out, skip_special_tokens=True))
+ # >>> ['Freiheit', 'sein', 'selbsttätig']
+ ```
+
+ Or, more compact using the huggingface `pipeline`:
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline(model="aehrm/dtaec-type-normalizer")
+ out = pipe(['Freyheit', 'seyn', 'selbstthätig'])
+
+ print(out)
+ # >>> [{'generated_text': 'Freiheit'}, {'generated_text': 'sein'}, {'generated_text': 'selbsttätig'}]
  ```
 
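Because the model normalizes word types rather than running text, one plausible way to apply it to a tokenized document is to normalize each distinct type once and map the results back onto the token sequence. The sketch below illustrates that idea only. `normalize_tokens`, `normalize_batch`, and `demo_normalize_batch` are hypothetical names, not part of the README or of `transformers`. In practice, `normalize_batch` could wrap the `tokenizer` / `model.generate` / `batch_decode` calls shown in the diff.

```python
from typing import Callable, Dict, Iterable, List


def normalize_tokens(
    tokens: Iterable[str],
    normalize_batch: Callable[[List[str]], List[str]],
    batch_size: int = 64,
) -> List[str]:
    """Normalize a token sequence by normalizing each distinct type once.

    `normalize_batch` is a hypothetical callable that maps a list of
    historical spellings to an equally long list of normalized spellings.
    """
    tokens = list(tokens)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    types = list(dict.fromkeys(tokens))

    mapping: Dict[str, str] = {}
    for start in range(0, len(types), batch_size):
        batch = types[start:start + batch_size]
        normalized = normalize_batch(batch)
        if len(normalized) != len(batch):
            raise ValueError("normalize_batch must return one output per input")
        mapping.update(zip(batch, normalized))

    # Map every token (including repeats) back to its normalized form.
    return [mapping[token] for token in tokens]


# Stand-in normalizer for illustration only; it is NOT the model.
_DEMO_LEXICON = {
    "Freyheit": "Freiheit",
    "seyn": "sein",
    "selbstthätig": "selbsttätig",
}


def demo_normalize_batch(batch: List[str]) -> List[str]:
    return [_DEMO_LEXICON.get(word, word) for word in batch]


print(normalize_tokens(["seyn", "Freyheit", "seyn"], demo_normalize_batch))
# ['sein', 'Freiheit', 'sein']
```

Deduplicating first means a frequent type such as `seyn` costs one generation call regardless of how often it occurs in the text.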
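
The commit does not say how the `Word Accuracy` and `Word Accuracy OOV` values in the `model-index` block were computed. The following is a sketch under two assumptions: accuracy is the fraction of predictions that exactly match the gold normalization, and an out-of-vocabulary (OOV) type is one that does not occur in the training lexicon. The function name `word_accuracy` and the toy data are hypothetical and not taken from the DTA-EC lexicon.

```python
from typing import Optional, Sequence, Set


def word_accuracy(
    sources: Sequence[str],
    predictions: Sequence[str],
    references: Sequence[str],
    train_vocab: Optional[Set[str]] = None,
) -> float:
    """Exact-match accuracy over word types.

    If `train_vocab` is given, only source types missing from it are
    scored (assumed definition of the OOV subset).
    """
    if not (len(sources) == len(predictions) == len(references)):
        raise ValueError("sources, predictions and references must align")

    pairs = [
        (pred, ref)
        for src, pred, ref in zip(sources, predictions, references)
        if train_vocab is None or src not in train_vocab
    ]
    if not pairs:
        raise ValueError("no items to evaluate")
    return sum(pred == ref for pred, ref in pairs) / len(pairs)


# Toy data, invented for illustration.
sources = ["Freyheit", "seyn", "selbstthätig", "Thür"]
predictions = ["Freiheit", "sein", "selbsttätig", "Thür"]
references = ["Freiheit", "sein", "selbsttätig", "Tür"]
train_vocab = {"Freyheit", "seyn"}

print(word_accuracy(sources, predictions, references))               # 0.75
print(word_accuracy(sources, predictions, references, train_vocab))  # 0.5
```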