system (HF staff) committed on
Commit: e8449bc
Parent: b44be3a

Update README.md

Files changed (1):
  1. README.md (+13, -14)
README.md CHANGED
@@ -1,16 +1,16 @@
 ---
-language: "en"
+language: "[en]"
 datasets:
 - Spotify Podcasts Dataset
 tags:
 - t5
-- summarization
+- summarisation
 - pytorch
 - lm-head
 metrics:
 - ROUGE
 pipeline:
-- summarization
+-summarisation
 ---
 
 # T5 for Automatic Podcast Summarisation
@@ -26,24 +26,23 @@ Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
 
 ## Intended uses & limitations
 This model is intended to be used for automatic podcast summarisation. As creator provided descriptions
-were used for training, the model also learned to generate promotional material in its summaries, as such
+were used for training, the model also learned to generate promotional material (links, hashtags, etc) in its summaries, as such
 some post processing may be required on the model's outputs.
 
+If using on Colab, the instance will crash if the number of tokens in the transcript exceeds 7000. I discovered that the model
+generated reasonable summaries even when the podcast transcript was truncated to reduce the number of tokens.
+
 #### How to use
-A 'summarize:' tag must be pre-pended to the source text before it is passed to the T5 model.
 
-```python
-from transformers import T5Tokenizer, T5ForConditionalGeneration
+The model can be used with the summarisation as follows:
 
-tokenizer = T5Tokenizer.from_pretrained('paulowoicho/t5-podcast-summarisation')
-model = T5ForConditionalGeneration.from_pretrained('paulowoicho/t5-podcast-summarisation')
+```python
+from transformers import pipeline
 
-podcast_transcript = 'summarize: ' + podcast_transcript
-tokens = tokenizer.encode(podcast_transcript, return_tensors="pt")
-summary_ids = model.generate(tokens, max_length=150, num_beams=2, repetition_penalty=2.5, length_penalty=1.0, early_stopping=True)
-output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+summarizer = pipeline("summarization", model="paulowoicho/t5-podcast-summarisation", tokenizer="paulowoicho/t5-podcast-summarisation")
+summary = summarizer(podcast_transcript, min_length=5, max_length=20)
 
-print(output)
+print(summary[0]['summary_text'])
 ```
 
 ## Training data
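
The new README's Colab note (crashes past roughly 7000 tokens, but truncated transcripts still summarise reasonably) implies trimming long transcripts before they reach the model. A minimal sketch of that workaround, using whitespace-delimited words as a stand-in for the model's real subword tokenizer (an assumption for illustration; actual token counts would come from the T5 tokenizer):

```python
# Cap the transcript length before summarisation, per the commit's Colab note.
# Whitespace splitting approximates tokenisation for illustration only.

MAX_TOKENS = 7000  # rough Colab limit mentioned in the updated README


def truncate_transcript(transcript: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep at most `max_tokens` whitespace-delimited tokens of the transcript."""
    tokens = transcript.split()
    return " ".join(tokens[:max_tokens])


long_transcript = "word " * 10_000          # a transcript well over the limit
short_transcript = truncate_transcript(long_transcript)
print(len(short_transcript.split()))        # 7000
```

The truncated text could then be passed to the summarisation pipeline shown in the diff in place of the full transcript.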