Iker/ClickbaitFighter-7B

@@ -1,199 +1,267 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 ---
+license: cc-by-nc-sa-4.0
+datasets:
+  - Iker/NoticIA
+language:
+  - es
+metrics:
+  - rouge
+library_name: transformers
+pipeline_tag: text-generation
+base_model: openchat/openchat-3.5-0106
+tags:
+  - clickbait
+  - noticia
+  - spanish
+  - summary
+  - summarization
+widget:
+  - example_title: Summary Example
+    messages:
+      - role: user
+        content: "Ahora eres una Inteligencia Artificial experta en desmontar titulares
+          sensacionalistas o clickbait. Tu tarea consiste en analizar noticias
+          con titulares sensacionalistas y generar un resumen de una sola frase
+          que revele la verdad detrás del titular.\\nEste es el titular de la
+          noticia: Le compra un abrigo a su abuela de 97 años y la reacción de
+          esta es una fantasía\\nEl titular plantea una pregunta o proporciona
+          información incompleta. Debes buscar en el cuerpo de la noticia una
+          frase que responda lo que se sugiere en el título. Siempre que puedas
+          cita el texto original, especialmente si se trata de una frase que
+          alguien ha dicho. Si citas una frase que alguien ha dicho, usa
+          comillas para indicar que es una cita. Usa siempre las mínimas
+          palabras posibles. No es necesario que la respuesta sea una oración
+          completa. Puede ser sólo el foco de la pregunta. Recuerda responder
+          siempre en Español.\\nEste es el cuerpo de la noticia:\\nLa usuaria de
+          X @Kokreta1 ha relatado la conversación que ha tenido con su abuela de
+          97 años cuando le ha dado el abrigo que le ha comprado para su
+          cumpleaños.\\nTeniendo en cuenta la avanzada edad de la señora, la
+          tuitera le ha regalado una prenda acorde a sus años, algo con lo que
+          su yaya no ha estado de acuerdo.\\nEl abrigo es de vieja, ha opinado
+          la mujer cuando lo ha visto. Os juro que soy muy fan. Mañana vamos las
+          dos (a por otro). Eso sí, la voy a llevar al Bershka, ha asegurado
+          entre risas la joven.\\nSegún la propia cadena de ropa, la cual
+          pertenece a Inditex, su público se caracteriza por ser jóvenes
+          atrevidos, conocedores de las últimas tendencias e interesados en la
+          música, las redes sociales y las nuevas tecnologías, por lo que la
+          gente mayor no suele llevar este estilo.\\nLa inusual personalidad de
+          la señora ha encantado a los usuarios de la red. Es por eso que el
+          relato ha acumulado más de 1.000 me gusta y cerca de 100 retuits,
+          además de una multitud de comentarios.\\n"
+---
+<table>
+<tr>
+<td style="width:100%"><img src="https://github.com/ikergarcia1996/NoticIA/blob/main/assets/head.png?raw=true" align="right" width="100%"> </td>
+</tr>
+</table>
+A model fine-tuned on the [NoticIA Dataset](https://huggingface.co/datasets/Iker/NoticIA). Given a news article with a clickbait headline, it generates a one-sentence summary that reveals the information the headline withholds.
+- 📖 Paper: [Coming soon]()
+- 📓 NoticIA Dataset: [https://huggingface.co/datasets/Iker/NoticIA](https://huggingface.co/datasets/Iker/NoticIA)
+- 💻 Baseline Code: [https://github.com/ikergarcia1996/NoticIA](https://github.com/ikergarcia1996/NoticIA)
+- 🤖 Pre-trained Models: [https://huggingface.co/collections/Iker/noticia-and-clickbaitfighter-65fdb2f80c34d7c063d3e48e](https://huggingface.co/collections/Iker/noticia-and-clickbaitfighter-65fdb2f80c34d7c063d3e48e)
+- 🔌 Online Demo: [https://iker-clickbaitfighter.hf.space/](https://iker-clickbaitfighter.hf.space/)
+# Open Source Models
+<table border="1" cellspacing="0" cellpadding="5">
+    <thead>
+        <tr>
+            <th></th>
+            <th><a href="https://huggingface.co/Iker/ClickbaitFighter-2B">Iker/ClickbaitFighter-2B</a></th>
+            <th><a href="https://huggingface.co/Iker/ClickbaitFighter-7B">Iker/ClickbaitFighter-7B</a></th>
+            <th><a href="https://huggingface.co/Iker/ClickbaitFighter-10B">Iker/ClickbaitFighter-10B</a></th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Param. no.</td>
+            <td>2B</td>
+            <td>7B</td>
+            <td>10B</td>
+        </tr>
+        <tr>
+            <td>ROUGE</td>
+            <td>36.26</td>
+            <td>49.81</td>
+            <td>52.01</td>
+        </tr>
+    </tbody>
+</table>
+# Evaluation Results
+<table>
+<tr>
+<td style="width:100%"><img src="https://github.com/ikergarcia1996/NoticIA/raw/main/results/Results.png" align="right" width="100%"> </td>
+</tr>
+</table>
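The scores above are ROUGE values, but the card does not say which ROUGE variant or tokenization was used. As a rough illustration of what the metric measures, here is a minimal sketch of a simplified ROUGE-1 F1 (unigram overlap on lowercased, whitespace-split tokens). The function name `rouge1_f1` and the example sentences are assumptions made for illustration; this is not the official scorer and will not reproduce the reported numbers.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased, whitespace-split tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up prediction and reference, for illustration only.
print(rouge1_f1("el abrigo es de vieja", "dice que el abrigo es de vieja"))
```

For real evaluation, an established ROUGE implementation with the same preprocessing as the reported results would be the safer choice.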
+# Usage example:
+## Summarize a web article
+```python
+import torch  # pip install torch
+from newspaper import Article  # pip install newspaper3k
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig  # pip install transformers
+# Download and parse the article to obtain its headline and body text.
+article_url = "https://www.huffingtonpost.es/virales/le-compra-abrigo-abuela-97nos-reaccion-fantasia.html"
+article = Article(article_url)
+article.download()
+article.parse()
+headline = article.title
+body = article.text
+def prompt(
+    headline: str,
+    body: str,
+) -> str:
+    """
+    Generate the prompt for the model.
+    Args:
+        headline (`str`):
+            The headline of the article.
+        body (`str`):
+            The body of the article.
+    Returns:
+        `str`: The formatted prompt.
+    """
+    return (
+        f"Ahora eres una Inteligencia Artificial experta en desmontar titulares sensacionalistas o clickbait. "
+        f"Tu tarea consiste en analizar noticias con titulares sensacionalistas y "
+        f"generar un resumen de una sola frase que revele la verdad detrás del titular.\n"
+        f"Este es el titular de la noticia: {headline}\n"
+        f"El titular plantea una pregunta o proporciona información incompleta. "
+        f"Debes buscar en el cuerpo de la noticia una frase que responda lo que se sugiere en el título. "
+        f"Siempre que puedas cita el texto original, especialmente si se trata de una frase que alguien ha dicho. "
+        f"Si citas una frase que alguien ha dicho, usa comillas para indicar que es una cita. "
+        f"Usa siempre las mínimas palabras posibles. No es necesario que la respuesta sea una oración completa. "
+        f"Puede ser sólo el foco de la pregunta. "
+        f"Recuerda responder siempre en Español.\n"
+        f"Este es el cuerpo de la noticia:\n"
+        f"{body}\n"
+    )
+# Build the prompt text; a separate name avoids shadowing the prompt() function.
+prompt_text = prompt(headline=headline, body=body)
+# Load the tokenizer and the model from the same checkpoint.
+tokenizer = AutoTokenizer.from_pretrained("Iker/ClickbaitFighter-7B")
+model = AutoModelForCausalLM.from_pretrained(
+    "Iker/ClickbaitFighter-7B", torch_dtype=torch.bfloat16, device_map="auto"
+)
+formatted_prompt = tokenizer.apply_chat_template(
+    [{"role": "user", "content": prompt_text}],
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer(
+    [formatted_prompt], return_tensors="pt", add_special_tokens=False
+)
+model_output = model.generate(
+    **model_inputs.to(model.device),
+    generation_config=GenerationConfig(
+        max_new_tokens=32,
+        min_new_tokens=1,
+        do_sample=False,
+        num_beams=1,
+        use_cache=True,
+    ),
+)
+summary = tokenizer.batch_decode(model_output, skip_special_tokens=True)[0]
+print(summary.strip().split("\n")[-1]) # Get only the summary, without the prompt.
+```
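The last line of the example above keeps only the text after the final newline of the decoded string, which depends on how the chat template lays out the assistant turn. A possible alternative, sketched here under assumptions, is to drop the prompt by length: for a decoder-only model, `generate` returns the prompt token ids followed by the newly generated ids. The helper name `new_token_ids` and the toy ids are hypothetical and only illustrate the slicing step.

```python
from typing import List, Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the ids generated after the prompt (hypothetical helper)."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids, made up for illustration: 4 prompt tokens followed by 3 generated tokens.
toy_output = [1, 15, 27, 8, 301, 302, 2]
print(new_token_ids(toy_output, prompt_length=4))  # [301, 302, 2]
```

With the objects from the example above, this would correspond to taking `prompt_length = model_inputs["input_ids"].shape[1]` and calling `tokenizer.decode(new_token_ids(model_output[0].tolist(), prompt_length), skip_special_tokens=True)`. This has not been checked against this model's chat template.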
+## Run inference in the NoticIA dataset
+```python
+import torch  # pip install torch
+from datasets import load_dataset  # pip install datasets
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig  # pip install transformers
+# Load the NoticIA dataset and take the first test example.
+dataset = load_dataset("Iker/NoticIA")
+example = dataset["test"][0]
+headline = example["web_headline"]
+body = example["web_text"]
+def prompt(
+    headline: str,
+    body: str,
+) -> str:
+    """
+    Generate the prompt for the model.
+    Args:
+        headline (`str`):
+            The headline of the article.
+        body (`str`):
+            The body of the article.
+    Returns:
+        `str`: The formatted prompt.
+    """
+    return (
+        f"Ahora eres una Inteligencia Artificial experta en desmontar titulares sensacionalistas o clickbait. "
+        f"Tu tarea consiste en analizar noticias con titulares sensacionalistas y "
+        f"generar un resumen de una sola frase que revele la verdad detrás del titular.\n"
+        f"Este es el titular de la noticia: {headline}\n"
+        f"El titular plantea una pregunta o proporciona información incompleta. "
+        f"Debes buscar en el cuerpo de la noticia una frase que responda lo que se sugiere en el título. "
+        f"Siempre que puedas cita el texto original, especialmente si se trata de una frase que alguien ha dicho. "
+        f"Si citas una frase que alguien ha dicho, usa comillas para indicar que es una cita. "
+        f"Usa siempre las mínimas palabras posibles. No es necesario que la respuesta sea una oración completa. "
+        f"Puede ser sólo el foco de la pregunta. "
+        f"Recuerda responder siempre en Español.\n"
+        f"Este es el cuerpo de la noticia:\n"
+        f"{body}\n"
+    )
+# Build the prompt text; a separate name avoids shadowing the prompt() function.
+prompt_text = prompt(headline=headline, body=body)
+# Load the tokenizer and the model from the same checkpoint.
+tokenizer = AutoTokenizer.from_pretrained("Iker/ClickbaitFighter-7B")
+model = AutoModelForCausalLM.from_pretrained(
+    "Iker/ClickbaitFighter-7B", torch_dtype=torch.bfloat16, device_map="auto"
+)
+formatted_prompt = tokenizer.apply_chat_template(
+    [{"role": "user", "content": prompt_text}],
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer(
+    [formatted_prompt], return_tensors="pt", add_special_tokens=False
+)
+model_output = model.generate(
+    **model_inputs.to(model.device),
+    generation_config=GenerationConfig(
+        max_new_tokens=32,
+        min_new_tokens=1,
+        do_sample=False,
+        num_beams=1,
+        use_cache=True,
+    ),
+)
+summary = tokenizer.batch_decode(model_output, skip_special_tokens=True)[0]
+print(summary.strip().split("\n")[-1]) # Get only the summary, without the prompt.
+```
+# Citation
+The paper is coming soon. For now, you can use this citation:
+```bibtex
+@misc{garcia-ferrero-etal-2024-noticia,
+    title = "NoticIA: A Clickbait Article Summarization Dataset in Spanish.",
+    author = "Garc{\'\i}a-Ferrero, Iker  and Altuna, Bego{\~n}a",
+    year = "2024",
+    url = "https://github.com/ikergarcia1996/NoticIA"
+}
+```