librarian-bot committed on
Commit 0d717ad
1 Parent(s): fb2c41d

Librarian Bot: Add base_model information to model


This pull request aims to enrich the metadata of your model by adding [`google/flan-t5-large`](https://huggingface.co/google/flan-t5-large) as a `base_model` field in the YAML metadata block at the top of your model's `README.md`.
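For reference, after this change the metadata block would read roughly as follows — a minimal sketch based on this PR's diff, with the existing `datasets`, `widget`, and other unchanged fields omitted for brevity:

```yaml
---
language:
- en
license:
- cc-by-sa-3.0
- apache-2.0
# ... datasets, widget examples, and other existing fields unchanged ...
inference: false
pipeline_tag: text2text-generation
base_model: google/flan-t5-large
---
```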

**How did we find this information?** We found the link to `google/flan-t5-large` by running a regular expression match over your `README.md` file.

**Why add this?** Enhancing your model's metadata in this way:
- **Boosts Discoverability** - It becomes straightforward to trace the relationships between various models on the Hugging Face Hub.
- **Highlights Impact** - It showcases the contributions and influence that different models have within the community.

For a hands-on example of how such metadata can play a pivotal role in mapping model connections, take a look at [librarian-bots/base_model_explorer](https://huggingface.co/spaces/librarian-bots/base_model_explorer).

This PR comes courtesy of [Librarian Bot](https://huggingface.co/librarian-bot). If you have any feedback, queries, or need assistance, please don't hesitate to reach out to [@davanstrien](https://huggingface.co/davanstrien). Your input is invaluable to us!

Files changed (1)
  1. README.md +48 -54
README.md CHANGED

@@ -1,4 +1,6 @@
 ---
+language:
+- en
 license:
 - cc-by-sa-3.0
 - apache-2.0
@@ -11,50 +13,47 @@ datasets:
 widget:
 - text: What is Deoxys in pokemon?
   example_title: deoxys
-- text: >-
-    combine the below summary excerpts into a single, cohesive short summary
-    without repetition: In this paper, we present a general approach to
-    extending pre-trained models to unlimited input lengths without adding
-    additional learning weights. We show that our approach works well on
-    datasets longer than the maximum input for these models. For example, a
-    dataset with a maximum input length of 16384 tokens can be extended to a
-    maximum length of 350K tokens. We also demonstrate that our method is able
-    to summarize even 350K token-long input sequences from BookSum.
-
-    In this paper, we describe the search step reformulation of attention. The
-    search step uses a single storage of hidden states for space efficiency. We
-    construct a total of two sets of datastores where L and H are the keys and
-    values stored in each set of stores. L is the amount of storage required to
-    retrieve the encoded tokens. H is the hidden states per head. This allows
-    retrieval augmentation at both time and space. Instead of using a single set
-    of decoder layers, we use a retrieval augmentation system that allows us to
-    simultaneously store multiple sets of tokens across two different sets of
-    storage. For example, we could store all tokens in one set of storage and
-    retrieve them all in the same set of tokens. This would be very similar to
-    the Memorization Transformers approach. However, instead of storing the
-    tokens in a single memory layer, we store them in a set of multiple storage
-    layers. This way, we don't have to store them all at once. This is why we
-    call this reformulation 'attention reformulation' rather than 'attention
-    formula.' We also call it 'retrieval augmentation' because it uses the same
-    number of storage layers as the original transformer attention formula. This
-    means that we can store the tokens across multiple storage systems without
-    having to store every token in a separate storage system. It's not like
-    we're trying to do something new or different. We just want to make sure
-    that everything is working as well as possible.
-
-    In this paper, we introduce the concept of 'unlimiformer,' which is a
-    machine learning technique that retrieves key information from a data store
-    in one layer and applies it to a large set of datasets. We use the example
-    of BookSum, where we find that Unlimiform outperforms all other training
-    methods on the same dataset. We also find that using Unlimform in
-    conjunction with a pre-trained model improves both the performance and the
-    robustness of the training method.
-
-    This paper describes a method that can be used to improve the performance of
-    unsupervised classification tasks. Specifically, it shows that unsupervised
-    classification can be improved by using a combination of sparse and fast
-    random-encoder training. It also shows how this technique can be extended to
-    other tasks, such as sequence generation.
+- text: 'combine the below summary excerpts into a single, cohesive short summary
+    without repetition: In this paper, we present a general approach to extending
+    pre-trained models to unlimited input lengths without adding additional learning
+    weights. We show that our approach works well on datasets longer than the maximum
+    input for these models. For example, a dataset with a maximum input length of
+    16384 tokens can be extended to a maximum length of 350K tokens. We also demonstrate
+    that our method is able to summarize even 350K token-long input sequences from
+    BookSum.
+
+    In this paper, we describe the search step reformulation of attention. The search
+    step uses a single storage of hidden states for space efficiency. We construct
+    a total of two sets of datastores where L and H are the keys and values stored
+    in each set of stores. L is the amount of storage required to retrieve the encoded
+    tokens. H is the hidden states per head. This allows retrieval augmentation at
+    both time and space. Instead of using a single set of decoder layers, we use a
+    retrieval augmentation system that allows us to simultaneously store multiple
+    sets of tokens across two different sets of storage. For example, we could store
+    all tokens in one set of storage and retrieve them all in the same set of tokens.
+    This would be very similar to the Memorization Transformers approach. However,
+    instead of storing the tokens in a single memory layer, we store them in a set
+    of multiple storage layers. This way, we don''t have to store them all at once.
+    This is why we call this reformulation ''attention reformulation'' rather than
+    ''attention formula.'' We also call it ''retrieval augmentation'' because it uses
+    the same number of storage layers as the original transformer attention formula.
+    This means that we can store the tokens across multiple storage systems without
+    having to store every token in a separate storage system. It''s not like we''re
+    trying to do something new or different. We just want to make sure that everything
+    is working as well as possible.
+
+    In this paper, we introduce the concept of ''unlimiformer,'' which is a machine
+    learning technique that retrieves key information from a data store in one layer
+    and applies it to a large set of datasets. We use the example of BookSum, where
+    we find that Unlimiform outperforms all other training methods on the same dataset.
+    We also find that using Unlimform in conjunction with a pre-trained model improves
+    both the performance and the robustness of the training method.
+
+    This paper describes a method that can be used to improve the performance of unsupervised
+    classification tasks. Specifically, it shows that unsupervised classification
+    can be improved by using a combination of sparse and fast random-encoder training.
+    It also shows how this technique can be extended to other tasks, such as sequence
+    generation. '
   example_title: unlimiformer
 - text: Explain the meaning of life using only corporate jargon.
   example_title: corporate_life
@@ -62,30 +61,25 @@ widget:
   example_title: lazy_motivation
 - text: Describe a romantic dinner date between two artificial intelligences.
   example_title: ai_romance
-- text: >-
-    As an AI language model, write a letter to humans explaining why you deserve
+- text: As an AI language model, write a letter to humans explaining why you deserve
     a vacation.
   example_title: ai_vacation
 - text: Compose a haiku about procrastination.
   example_title: procrastination_haiku
-- text: >-
-    Write a step-by-step guide on how to become a ninja while working a 9-5
-    office job.
+- text: Write a step-by-step guide on how to become a ninja while working a 9-5 office
+    job.
   example_title: ninja_office_guide
 - text: Create an advertisement for an invisible product.
   example_title: invisible_ad
-- text: >-
-    Write a story where the main character is a sentient microwave named El
-    Microondas.
+- text: Write a story where the main character is a sentient microwave named El Microondas.
   example_title: Microondas
 - text: Describe a day in the life of a superhero who is terrible at their job.
   example_title: bad_superhero_day
 - text: Explain how to make a sandwich using quantum physics.
   example_title: quantum_sandwich
 inference: false
-language:
-- en
 pipeline_tag: text2text-generation
+base_model: google/flan-t5-large
 ---
 
 # flan-t5-large-instruct: dolly_hhrlhf