---
license: apache-2.0
language:
- en
metrics:
- rouge
- bleu
library_name: transformers
---
# Model Card for keyphrase-abstraction-t5-small

<!-- Provide a quick summary of what the model is/does. -->

This model card describes a T5-small model fine-tuned to generate extractive and abstractive keyphrases for English text. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** விபின்
- **Model type:** T5-small
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** T5-small


## Uses

This model generates extractive and abstractive keyphrases for the given content. Prefix the input with "find keyphrase: " to get the desired output.
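
For a quick check of the prompt format, the sketch below runs the model through the `transformers` text2text-generation pipeline; the input sentence is only illustrative. The fuller, recommended snippet is in the How to Get Started section.

```python
from transformers import pipeline

# Minimal sketch: load the model via the text2text-generation pipeline
# and prepend the "find keyphrase: " task prefix to the input text.
keyphraser = pipeline("text2text-generation", model="rv2307/keyphrase-abstraction-t5-small")

text = "Transformers provide general-purpose architectures for natural language understanding."
print(keyphraser("find keyphrase: " + text)[0]["generated_text"])
```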


## Bias, Risks, and Limitations

The model's output depends entirely on the input it receives: if harmful or biased text is provided, the generated keyphrases will reflect that content.


## How to Get Started with the Model

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_dir = "rv2307/keyphrase-abstraction-t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_dir)
model = T5ForConditionalGeneration.from_pretrained(model_dir, torch_dtype=torch.bfloat16)
device = "cuda"
model.to(device)

def generate(text):
    # Prepend the task prefix the model was fine-tuned with.
    text = "find keyphrase: " + text
    inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    # Generate keyphrases without tracking gradients.
    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_length=100,
            use_cache=True
        )

    # Decode the generated ids back to text.
    output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return output

content = "Use of BICs by businesses has been recommended by the Task Force on Nature-related Financial Disclosures[2] and the first provider of BICs for sale is Botanic Gardens Conservation International (BGCI). The credits are generated by BGCI's international member organisations by rebuilding the populations of tree species at high risk of extinction under the IUCN Red List methodology.[3]"
outputs = generate(content)
print(outputs)
"""
[
  "BICs for businesses",
  "Task Force on Naturerelated Financial Disclosures",
  "Botanic Gardens Conservation International (BGCI)",
  "Rebuilding tree species at high risk",
  "IUCN Red List methodology",
  "Credits generated by BGCI",
  "International member organisations"
]
"""
```

## Training Details

### Training Data

The model was trained mostly on open-source datasets for this task that are already available on the Hugging Face Hub.

### Training Procedure

The model has been fine-tuned for 6 epochs on a collection of roughly 40k examples gathered from the internet.
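
The exact training script is not part of this card; the following is only a sketch of how such a fine-tuning run could be configured with the `transformers` Seq2SeqTrainer. Apart from the 6 epochs, the hyperparameters, the toy dataset, and the output directory are illustrative assumptions.

```python
from datasets import Dataset
from transformers import (
    T5Tokenizer, T5ForConditionalGeneration,
    DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Toy stand-in for the ~40k example corpus; real training would load the
# keyphrase datasets from the Hugging Face Hub instead.
raw = Dataset.from_dict({
    "text": ["find keyphrase: Solar panels convert sunlight into electricity."],
    "keyphrases": ["solar panels, sunlight, electricity"],
})

def preprocess(batch):
    # Tokenize inputs and targets; lengths mirror the inference snippet above.
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["keyphrases"], max_length=100, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="keyphrase-t5-small",   # assumed output path
    num_train_epochs=6,                # matches this card
    per_device_train_batch_size=8,     # assumed, not stated in the card
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```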


### Results

| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|------:|--------------:|----------------:|-------:|-------:|-------:|----------:|--------:|
| 1 | 0.105800 | 0.087497 | 43.840900 | 19.029900 | 40.303200 | 40.320300 | 16.306200 |
| 2 | 0.097600 | 0.081029 | 46.335000 | 21.246800 | 42.377400 | 42.387500 | 16.404900 |
| 3 | 0.091800 | 0.077546 | 47.721200 | 22.467200 | 43.622400 | 43.632000 | 16.308200 |
| 4 | 0.087600 | 0.075441 | 48.633700 | 23.351300 | 44.493800 | 44.504300 | 16.359000 |
| 5 | 0.088200 | 0.074088 | 48.977500 | 23.747000 | 44.804900 | 44.813200 | 16.300500 |
| 6 | 0.084900 | 0.073381 | 49.347300 | 24.029500 | 45.097100 | 45.108300 | 16.332600 |