Librarian Bot: Add base_model information to model

This pull request enriches your model's metadata by adding [`google/flan-t5-large`](https://huggingface.co/google/flan-t5-large) as the `base_model` field in the YAML block of your model's `README.md`.

**How did we find this information?** We performed a regular expression match on your `README.md` file to determine the connection.
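As a rough illustration of what such a regular expression match could look like, here is a minimal Python sketch. The bot's actual pattern is not shown in this PR, so the pattern, the `find_base_model` helper, and the sample sentence are all assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the bot's real expression):
# look for "fine-tuned version of [name](https://huggingface.co/<repo_id>)".
BASE_MODEL_PATTERN = re.compile(
    r"fine-tuned version of \[[^\]]+\]"
    r"\(https://huggingface\.co/(?P<repo_id>[^)\s]+)\)",
    re.IGNORECASE,
)


def find_base_model(readme_text: str) -> Optional[str]:
    """Return the linked repo id if the README names a base model, else None."""
    match = BASE_MODEL_PATTERN.search(readme_text)
    return match.group("repo_id") if match else None


# Illustrative sentence of the kind a model card might contain.
sample = (
    "This model is a fine-tuned version of "
    "[google/flan-t5-large](https://huggingface.co/google/flan-t5-large)."
)
base_model = find_base_model(sample)
```

A match like this only suggests a relationship, which is why the change arrives as a pull request for the model owner to review rather than as a direct commit.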

**Why add this?** Enhancing your model's metadata in this way:
- **Boosts Discoverability** - It becomes straightforward to trace the relationships between various models on the Hugging Face Hub.
- **Highlights Impact** - It showcases the contributions and influences different models have within the community.

For a hands-on example of how such metadata can play a pivotal role in mapping model connections, take a look at [librarian-bots/base_model_explorer](https://huggingface.co/spaces/librarian-bots/base_model_explorer).
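To make the idea of mapping model connections concrete, the sketch below groups models by their `base_model` value. It is not how the linked Space is implemented; the `group_by_base_model` helper and the `example-user/...` repo ids are hypothetical and stand in for metadata read from model cards.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_base_model(cards: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each base model to the sorted list of models derived from it."""
    children = defaultdict(list)
    for repo_id, metadata in cards.items():
        base = metadata.get("base_model")
        if base:  # models without the field cannot be linked
            children[base].append(repo_id)
    return {base: sorted(ids) for base, ids in children.items()}


# Hypothetical card metadata; only the base model id comes from this PR.
cards = {
    "example-user/model-a": {"base_model": "google/flan-t5-large"},
    "example-user/model-b": {"base_model": "google/flan-t5-large"},
    "example-user/model-c": {},  # no base_model field, so it stays unlinked
}
relationships = group_by_base_model(cards)
```

Models that lack the field, like the third entry here, drop out of the mapping entirely, which is the gap this PR is meant to close.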

This PR comes courtesy of [Librarian Bot](https://huggingface.co/librarian-bot). If you have any feedback, queries, or need assistance, please don't hesitate to reach out to [@davanstrien](https://huggingface.co/davanstrien). Your input is invaluable to us!

Files changed (1)

README.md +48 -54

@@ -1,4 +1,6 @@
 ---
 license:
 - cc-by-sa-3.0
 - apache-2.0
@@ -11,50 +13,47 @@ datasets:
 widget:
 - text: What is Deoxys in pokemon?
   example_title: deoxys
-- text: >-
-    combine the below summary excerpts into a single, cohesive  short summary
-    without repetition: In this paper, we present a general approach to
-    extending pre-trained models to unlimited input lengths without adding
-    additional learning weights. We show that our approach works well on
-    datasets longer than the maximum input for these models. For example, a
-    dataset with a maximum input length of 16384 tokens can be extended to a
-    maximum length of 350K tokens. We also demonstrate that our method is able
-    to summarize even 350K token-long input sequences from BookSum.
-    In this paper, we describe the search step reformulation of attention. The
-    search step uses a single storage of hidden states for space efficiency. We
-    construct a total of two sets of datastores where L and H are the keys and
-    values stored in each set of stores. L is the amount of storage required to
-    retrieve the encoded tokens. H is the hidden states per head. This allows
-    retrieval augmentation at both time and space. Instead of using a single set
-    of decoder layers, we use a retrieval augmentation system that allows us to
-    simultaneously store multiple sets of tokens across two different sets of
-    storage. For example, we could store all tokens in one set of storage and
-    retrieve them all in the same set of tokens. This would be very similar to
-    the Memorization Transformers approach. However, instead of storing the
-    tokens in a single memory layer, we store them in a set of multiple storage
-    layers. This way, we don't have to store them all at once. This is why we
-    call this reformulation 'attention reformulation' rather than 'attention
-    formula.' We also call it 'retrieval augmentation' because it uses the same
-    number of storage layers as the original transformer attention formula. This
-    means that we can store the tokens across multiple storage systems without
-    having to store every token in a separate storage system. It's not like
-    we're trying to do something new or different. We just want to make sure
-    that everything is working as well as possible.
-    In this paper, we introduce the concept of 'unlimiformer,' which is a
-    machine learning technique that retrieves key information from a data store
-    in one layer and applies it to a large set of datasets. We use the example
-    of BookSum, where we find that Unlimiform outperforms all other training
-    methods on the same dataset. We also find that using Unlimform in
-    conjunction with a pre-trained model improves both the performance and the
-    robustness of the training method.
-    This paper describes a method that can be used to improve the performance of
-    unsupervised classification tasks. Specifically, it shows that unsupervised
-    classification can be improved by using a combination of sparse and fast
-    random-encoder training. It also shows how this technique can be extended to
-    other tasks, such as sequence generation.
   example_title: unlimiformer
 - text: Explain the meaning of life using only corporate jargon.
   example_title: corporate_life
@@ -62,30 +61,25 @@ widget:
   example_title: lazy_motivation
 - text: Describe a romantic dinner date between two artificial intelligences.
   example_title: ai_romance
-- text: >-
-    As an AI language model, write a letter to humans explaining why you deserve
     a vacation.
   example_title: ai_vacation
 - text: Compose a haiku about procrastination.
   example_title: procrastination_haiku
-- text: >-
-    Write a step-by-step guide on how to become a ninja while working a 9-5
-    office job.
   example_title: ninja_office_guide
 - text: Create an advertisement for an invisible product.
   example_title: invisible_ad
-- text: >-
-    Write a story where the main character is a sentient microwave named El
-    Microondas.
   example_title: Microondas
 - text: Describe a day in the life of a superhero who is terrible at their job.
   example_title: bad_superhero_day
 - text: Explain how to make a sandwich using quantum physics.
   example_title: quantum_sandwich
 inference: false
-language:
-- en
 pipeline_tag: text2text-generation
 ---
 # flan-t5-large-instruct: dolly_hhrlhf

 ---
+language:
+- en
 license:
 - cc-by-sa-3.0
 - apache-2.0
 widget:
 - text: What is Deoxys in pokemon?
   example_title: deoxys
+- text: 'combine the below summary excerpts into a single, cohesive  short summary
+    without repetition: In this paper, we present a general approach to extending
+    pre-trained models to unlimited input lengths without adding additional learning
+    weights. We show that our approach works well on datasets longer than the maximum
+    input for these models. For example, a dataset with a maximum input length of
+    16384 tokens can be extended to a maximum length of 350K tokens. We also demonstrate
+    that our method is able to summarize even 350K token-long input sequences from
+    BookSum.
+    In this paper, we describe the search step reformulation of attention. The search
+    step uses a single storage of hidden states for space efficiency. We construct
+    a total of two sets of datastores where L and H are the keys and values stored
+    in each set of stores. L is the amount of storage required to retrieve the encoded
+    tokens. H is the hidden states per head. This allows retrieval augmentation at
+    both time and space. Instead of using a single set of decoder layers, we use a
+    retrieval augmentation system that allows us to simultaneously store multiple
+    sets of tokens across two different sets of storage. For example, we could store
+    all tokens in one set of storage and retrieve them all in the same set of tokens.
+    This would be very similar to the Memorization Transformers approach. However,
+    instead of storing the tokens in a single memory layer, we store them in a set
+    of multiple storage layers. This way, we don''t have to store them all at once.
+    This is why we call this reformulation ''attention reformulation'' rather than
+    ''attention formula.'' We also call it ''retrieval augmentation'' because it uses
+    the same number of storage layers as the original transformer attention formula.
+    This means that we can store the tokens across multiple storage systems without
+    having to store every token in a separate storage system. It''s not like we''re
+    trying to do something new or different. We just want to make sure that everything
+    is working as well as possible.
+    In this paper, we introduce the concept of ''unlimiformer,'' which is a machine
+    learning technique that retrieves key information from a data store in one layer
+    and applies it to a large set of datasets. We use the example of BookSum, where
+    we find that Unlimiform outperforms all other training methods on the same dataset.
+    We also find that using Unlimform in conjunction with a pre-trained model improves
+    both the performance and the robustness of the training method.
+    This paper describes a method that can be used to improve the performance of unsupervised
+    classification tasks. Specifically, it shows that unsupervised classification
+    can be improved by using a combination of sparse and fast random-encoder training.
+    It also shows how this technique can be extended to other tasks, such as sequence
+    generation. '
   example_title: unlimiformer
 - text: Explain the meaning of life using only corporate jargon.
   example_title: corporate_life
   example_title: lazy_motivation
 - text: Describe a romantic dinner date between two artificial intelligences.
   example_title: ai_romance
+- text: As an AI language model, write a letter to humans explaining why you deserve
     a vacation.
   example_title: ai_vacation
 - text: Compose a haiku about procrastination.
   example_title: procrastination_haiku
+- text: Write a step-by-step guide on how to become a ninja while working a 9-5 office
+    job.
   example_title: ninja_office_guide
 - text: Create an advertisement for an invisible product.
   example_title: invisible_ad
+- text: Write a story where the main character is a sentient microwave named El Microondas.
   example_title: Microondas
 - text: Describe a day in the life of a superhero who is terrible at their job.
   example_title: bad_superhero_day
 - text: Explain how to make a sandwich using quantum physics.
   example_title: quantum_sandwich
 inference: false
 pipeline_tag: text2text-generation
+base_model: google/flan-t5-large
 ---
 # flan-t5-large-instruct: dolly_hhrlhf