Danil committed
Commit 2d049c9 · 1 Parent(s): 5dea49b
Update README.md
README.md CHANGED
@@ -14,18 +14,28 @@ widget:
 example_title: "Технологии"
 ---
 ## keyT5. Large version

-[Large version](https://huggingface.co/0x7194633/keyt5-large)

-[

 Example usage (the code returns a list with keywords. duplicates are possible):
 ```python
 from itertools import groupby
 import torch
 from transformers import T5ForConditionalGeneration, T5Tokenizer
-
-model_name = "0x7194633/keyt5-large"
 tokenizer = T5Tokenizer.from_pretrained(model_name)
 model = T5ForConditionalGeneration.from_pretrained(model_name)

@@ -34,21 +44,29 @@ def generate(text, **kwargs):
     with torch.no_grad():
         hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
     s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
-    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')
     s = [el for el, _ in groupby(s)]
     return s


-article =
-
-
-
-
-Причиной этого является несбалансированное питание, акцент в котором сделан на
-углеводистую и жирную пищу, а также массовая приверженность фастфудом."""


-

-
-```
 example_title: "Технологии"
 ---
 ## keyT5. Large version
+Supported languages: ru
+GitHub - [text2keywords](https://github.com/0x7o/text2keywords/edit/main/README.md)


+[Pretraining Large version](https://huggingface.co/0x7194633/keyt5-large)
+
+[Pretraining Base version](https://huggingface.co/0x7194633/keyt5-base)

+# Usage
 Example usage (the code returns a list with keywords. duplicates are possible):
+
+[![Try Model Training In Colab!](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/0x7o/text2keywords/blob/main/example/keyT5_use.ipynb)
+
+```
+pip install transformers sentencepiece
+```
+
 ```python
 from itertools import groupby
 import torch
 from transformers import T5ForConditionalGeneration, T5Tokenizer
+model_name = "0x7194633/keyt5-large" # or 0x7194633/keyt5-base
 tokenizer = T5Tokenizer.from_pretrained(model_name)
 model = T5ForConditionalGeneration.from_pretrained(model_name)

@@ -34,21 +44,29 @@ def generate(text, **kwargs):
     with torch.no_grad():
         hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
     s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
+    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
     s = [el for el, _ in groupby(s)]
     return s

+article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды
+Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании
+SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди
+отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400
+рейсов были задержаны."""

+print(generate(article, top_p=1.0, max_length=64))
+# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
+```
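
The sample output in the README still contains repeats because `itertools.groupby` only merges adjacent equal items. Below is a minimal sketch of order-preserving de-duplication, assuming the decoded model output is a `;`-separated string, as the post-processing in `generate` implies. The helper name `parse_keywords` and the hand-written sample string `raw` are hypothetical and not part of the repository.

```python
def parse_keywords(raw):
    """Split a ';'-separated string into a list of unique keywords."""
    # Same normalisation as in generate(): trim spaces around ';',
    # lowercase, split, and drop the piece after the last ';'.
    parts = raw.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    # dict.fromkeys keeps the first occurrence of each keyword, in order,
    # so non-adjacent repeats are removed as well.
    return list(dict.fromkeys(p.strip() for p in parts if p.strip()))


# Hand-written stand-in for a decoded model output (an assumption).
# English glosses: "air transportation; flight cancellations; trip cancellations".
raw = "Авиаперевозки; отмена авиарейсов; отмена рейсов; отмена авиарейсов; "
print(parse_keywords(raw))
```

Note that the `[:-1]` slice discards whatever follows the last `;`, so a string that ends without a trailing separator loses its final keyword.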
+# Training
+To train the keyT5-base and keyT5-large models, you will need a table in CSV format, like this:

+KeyT5 models were trained on ~7000 compressed habr.com articles. [data.csv](https://github.com/0x7o/text2keywords/blob/main/dataset/train.csv) [collect.py](https://github.com/0x7o/text2keywords/blob/main/dataset/collect.py)
+Exclusively supports the Russian language!
+| X | Y |
+|:--:|:--:|
+| Some text that is fed to the input | The text that should come out |
+| Some text that is fed to the input | The text that should come out |
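
The two-column layout above can be produced with Python's standard `csv` module. This is a sketch under stated assumptions: the header names `X` and `Y` are taken from the table, while the file name, the helper names `write_pairs` and `read_pairs`, the sample rows, and the `;`-separated format of the `Y` column are hypothetical. The repository's actual `train.csv` may be laid out differently.

```python
import csv
import os
import tempfile


def write_pairs(path, pairs):
    """Write (input text, target text) pairs under the header X, Y."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["X", "Y"])
        writer.writerows(pairs)


def read_pairs(path):
    """Read the pairs back as a list of (X, Y) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["X"], row["Y"]) for row in csv.DictReader(f)]


# Placeholder rows; real training data would hold Russian article text.
pairs = [
    ("Some text that is fed to the input", "first keyword; second keyword"),
    ("Another input text, with a comma", "third keyword"),
]

path = os.path.join(tempfile.mkdtemp(), "train.csv")
write_pairs(path, pairs)
loaded = read_pairs(path)
print(loaded)
```

The `csv` writer quotes fields that contain commas, so article text does not need manual escaping.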

+Go to the training notebook and learn more about it:

+[![Try Model Training In Colab!](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/0x7o/text2keywords/blob/main/example/keyT5_train.ipynb)