license: apache-2.0
language:
  - en
pipeline_tag: text-generation
library_name: transformers

Granite RAG 3.0 8b

Model Summary

Granite RAG 3.0 8b is a RAG-specific LoRA adapter for ibm-granite/granite-3.0-8b-instruct. It adds hallucination detection and citation generation capabilities while retaining the full abilities of the ibm-granite/granite-3.0-8b-instruct model.

Usage

Intended use

Granite RAG 3.0 8b is a LoRA adapter for ibm-granite/granite-3.0-8b-instruct. This RAG-specific adapter lets the model generate a response, detect whether the response contains hallucinations, and generate citations for the response. The output is a JSON object that contains the output sentences, hallucination detections, and citations.

Model input: The input to the model is a list of conversational turns converted to a string with the tokenizer's apply_chat_template function. The first turn of the conversation is a system turn whose content field contains a JSON structure serialized as a string. The JSON structure includes:

  1. instruction: the model is trained with the following system instruction: Respond to the user's latest question based solely on the information provided in the documents. Ensure that your response is strictly aligned with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data. Make sure that your response follows the attributes mentioned in the 'meta' field.
  2. documents: a list of documents, where each item is a dictionary with the fields doc_id and text.
  3. meta: a dictionary in which two fields, hallucination_tags and citations, control the output features.

The remaining turns in the conversation are user and assistant turns, whose content field contains a string.
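The input structure described above can be assembled with the standard json module rather than written as a hand-escaped string. The following is a minimal sketch: `build_system_turn` is a hypothetical helper name, the document text and question are placeholders, and passing the flags as booleans is an assumption based on the quickstart example.

```python
import json

# Instruction text quoted from the model card.
INSTRUCTION = (
    "Respond to the user's latest question based solely on the information "
    "provided in the documents. Ensure that your response is strictly aligned "
    "with the facts in the provided documents. If the information needed to "
    "answer the question is not available in the documents, inform the user "
    "that the question cannot be answered based on the available data. Make "
    "sure that your response follows the attributes mentioned in the 'meta' field."
)


def build_system_turn(documents, hallucination_tags=True, citations=True):
    """Hypothetical helper: serialize instruction, documents and meta into a system turn."""
    payload = {
        "instruction": INSTRUCTION,
        "documents": [{"doc_id": doc_id, "text": text} for doc_id, text in documents],
        "meta": {"hallucination_tags": hallucination_tags, "citations": citations},
    }
    # The system turn carries the JSON structure as a string in its content field.
    return {"role": "system", "content": json.dumps(payload)}


# Placeholder document and question, for illustration only.
chat = [
    build_system_turn([(1, "Placeholder document text.")]),
    {"role": "user", "content": "Placeholder question?"},
]
```

The resulting `chat` list has the shape expected by the tokenizer's apply_chat_template function.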

Model output: The model output is a JSON structure containing a list, where each entry has the following fields:

  1. sentence: the output sentence.
  2. meta: a dictionary with two fields, hallucination_level and citation. hallucination_level takes the value high or low, where high indicates that the generated sentence likely contains hallucinated content that is not grounded in the provided documents. citation is a dictionary with the fields snippet, which holds the sentences that ground the response, and doc_id, which points to the document that contains the snippet.
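Once generated, the output can be parsed to pick out sentences flagged as likely hallucinated. The sketch below is illustrative only: `flag_hallucinations` is a hypothetical helper, the sample output is made up rather than produced by the model, and the spelling of the sentence key (`sentence`) is an assumption that should be checked against real output. Real generations may also fail to parse as JSON, in which case `json.loads` raises an error.

```python
import json


def flag_hallucinations(output_text):
    """Hypothetical helper: return (sentence, doc_id) pairs whose hallucination_level is high."""
    flagged = []
    for item in json.loads(output_text):
        meta = item.get("meta", {})
        if meta.get("hallucination_level") == "high":
            citation = meta.get("citation") or {}
            # The key name "sentence" is an assumption; adjust it to the actual output.
            flagged.append((item.get("sentence"), citation.get("doc_id")))
    return flagged


# Made-up output for illustration only; not produced by the model.
sample_output = json.dumps([
    {"sentence": "A grounded sentence.",
     "meta": {"hallucination_level": "low",
              "citation": {"snippet": "Supporting text.", "doc_id": 1}}},
    {"sentence": "An unsupported sentence.",
     "meta": {"hallucination_level": "high",
              "citation": {"snippet": "", "doc_id": 1}}},
])

print(flag_hallucinations(sample_output))
```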

Granite RAG 3.0 8b is not intended to detect hallucinations in responses generated by any model other than itself or ibm-granite/granite-3.0-8b-instruct.

Quickstart Example

The following code shows how to use Granite RAG 3.0 8b in a RAG setting to generate answers, detect hallucinations, and generate citations.

import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_NAME = "ibm-granite/granite-3.0-8b-instruct"
LORA_NAME = "ibm-granite/granite-rag-3.0-8b-lora"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and base model, then attach the RAG LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side="left", trust_remote_code=True)
model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
model_rag = PeftModel.from_pretrained(model_base, LORA_NAME)

# Conversation: a system turn carrying the JSON structure, followed by the user question.
question_chat = [
{
    "role": "system",
    "content": "{\"instruction\": \"Respond to the user's latest question based solely on the information provided in the documents. Ensure that your response is strictly aligned with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data. Make sure that your response follows the attributes mentioned in the 'meta' field.\", \"documents\": [{\"doc_id\": 1, \"text\": \"Audrey Faith McGraw (born September 21, 1967) is an American singer and record producer. She is one of the most successful country artists of all time, having sold more than 40 million albums worldwide. Hill is married to American singer Tim McGraw, with whom she has recorded several duets. Hill's first two albums, Take Me as I Am (1993) and It Matters to Me (1995), were major successes and placed a combined three number ones on Billboard's country charts. Hill's debut album was Take Me as I Am (1993); sales were strong, buoyed by the chart success of \\\"Wild One\\\". Hill became the first female country singer in 30 years to hold Billboard's number one position for four consecutive weeks when \\\"Wild One\\\" managed the feat in 1994. Her version of \\\"Piece of My Heart\\\", also went to the top of the country charts in 1994. The album sold a total of 3 million copies. Other singles from the album include \\\"Take Me as I Am\\\". The recording of Faith's second album was delayed by surgery to repair a ruptured blood vessel on her vocal cords. It Matters to Me finally appeared in 1995 and was another success, with the title track becoming her third number-one country single. Several other top 10 singles followed, and more than 3 million copies of the album were sold. The fifth single from the album, \\\"I Can't Do That Anymore\\\", was written by country music artist Alan Jackson. Other singles from the album include \\\"You Can't Lose Me\\\", \\\"Someone Else's Dream\\\", and \\\"Let's Go to Vegas\\\". During this period, Hill appeared on the acclaimed PBS music program Austin City Limits. In spring 1996, Hill began the Spontaneous Combustion Tour with country singer Tim McGraw. At that time, Hill had recently become engaged to her former producer, Scott Hendricks, and McGraw had recently broken an engagement. McGraw and Hill were quickly attracted to each other and began an affair. After discovering that Hill was pregnant with their first child, the couple married on October 6, 1996. The couple have three daughters together: Gracie Katherine (born 1997), Maggie Elizabeth (born 1998) and Audrey Caroline (born 2001). Since their marriage, Hill and McGraw have endeavored never to be apart for more than three consecutive days. After the release of It Matters to Me, Hill took a three-year break from recording to give herself a rest from four years of touring and to begin a family with McGraw. During her break, she joined forces with her husband for their first duet, \\\"It's Your Love\\\". The song stayed at number one for six weeks, and won awards from both the Academy of Country Music and the Country Music Association. Hill has remarked that sometimes when they perform the song together, \\\"it [doesn't] feel like anybody else was really watching.\\\"\"}], \"meta\": {\"hallucination_tags\": true, \"citations\": true}}"
},
{
    "role": "user",
    "content": "Did Faith Hill take a break from recording after releasing her second album, It Matters to Me?"
}
]
# Generate the answer.
input_text = tokenizer.apply_chat_template(question_chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
output = model_rag.generate(
    inputs["input_ids"].to(device),
    attention_mask=inputs["attention_mask"].to(device),
    max_new_tokens=500,
)
# Decode only the newly generated tokens, skipping the prompt.
output_text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print("Output: " + output_text)

Training Details

The Granite RAG 3.0 8b model is a LoRA adapter fine-tuned to generate responses, detect hallucinations, and generate citations. The data creation process for RAG response generation is described in the Granite Technical Report. Hallucination labels for the responses were created with the technique of Achintalwar et al. Citations were created by using the ROUGE-L metric to identify the document snippets that best overlap with the responses.
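To illustrate the idea of ROUGE-L based snippet selection, here is a minimal pure-Python sketch. It is not the authors' pipeline: the regex tokenization, the sentence splitting, the use of the F1 variant of ROUGE-L, and the helper names `lcs_length`, `rouge_l_f1` and `best_snippet` are all assumptions made for illustration.

```python
import re


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings, using lowercase word tokens."""
    cand = re.findall(r"\w+", candidate.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_snippet(response_sentence, documents):
    """Return the document sentence with the highest ROUGE-L overlap as a citation dict."""
    best = {"snippet": None, "doc_id": None}
    best_score = 0.0
    for doc in documents:
        # Naive sentence split on end punctuation; an assumption for this sketch.
        for snippet in re.split(r"(?<=[.!?])\s+", doc["text"]):
            score = rouge_l_f1(snippet, response_sentence)
            if score > best_score:
                best_score = score
                best = {"snippet": snippet, "doc_id": doc["doc_id"]}
    return best


# Toy documents, for illustration only.
docs = [
    {"doc_id": 1, "text": "Hill took a three-year break from recording. She toured in 1996."},
    {"doc_id": 2, "text": "The album sold three million copies."},
]
print(best_snippet("Yes, Hill took a three-year break from recording.", docs))
```

The returned dictionary mirrors the citation fields described in the model output (snippet and doc_id).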

Training Data

The following public datasets were used for finetuning the RAG model.

Evaluation

  1. Evaluation of RAG response generation on the test sets of the RAGBench benchmark using the RAGAS evaluation framework. Note: all evaluations were run using the Azure OpenAI Service.

a. Evaluation using RAGAS Faithfulness metric.

| Dataset | Granite 3.0 8B Instruct | Granite RAG 3.0 8b |
|---|---|---|
| CovidQA | 87.08 | 84.00 |
| DelucionQA | 87.97 | 89.85 |
| EManual | 83.83 | 87.74 |
| ExpertQA | 61.01 | 63.36 |
| HAGRID | 85.51 | 85.44 |
| HotpotQA | 88.34 | 86.99 |
| MS Marco | 88.85 | 90.52 |
| PubMedQA | 81.25 | 80.67 |
| TAT-QA | 81.28 | 76.12 |
| TechQA | 31.69 | 54.76 |
| FinQA | 63.34 | 47.79 |
| Average | 76.38 | 77.02 |

b. Evaluation using RAGAS Answer Correctness metric.

| Dataset | Granite 3.0 8B Instruct | Granite RAG 3.0 8b |
|---|---|---|
| CovidQA | 63.23 | 65.70 |
| DelucionQA | 66.95 | 71.05 |
| EManual | 66.30 | 67.85 |
| ExpertQA | 55.71 | 52.00 |
| HAGRID | 64.06 | 69.10 |
| HotpotQA | 75.87 | 76.24 |
| MS Marco | 65.13 | 65.62 |
| PubMedQA | 65.15 | 66.36 |
| TAT-QA | 73.05 | 71.67 |
| TechQA | 38.19 | 41.68 |
| FinQA | 55.78 | 73.25 |
| Average | 62.68 | 65.50 |

  2. Evaluation of the hallucination detection accuracy of the model.

The hallucination detection of the model was evaluated using the BEGIN dataset of the TRUE Factual Consistency Evaluation framework. The following table shows the class-wise F1 scores of Granite RAG 3.0 8b compared with the teacher model of Achintalwar et al.

| Dataset | Class | Achintalwar et al. (DeBERTa) | Granite RAG 3.0 8b |
|---|---|---|---|
| BEGIN | Hallucination | 0.823 | 0.811 |
| BEGIN | No Hallucination | 0.531 | 0.493 |
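For readers unfamiliar with class-wise F1, the sketch below shows how such a score is computed per class from gold and predicted labels. It uses toy labels, not the BEGIN data, and `class_wise_f1` is a hypothetical helper, not the evaluation code used for the table above.

```python
def class_wise_f1(y_true, y_pred, labels):
    """Compute F1 separately for each class, treating that class as the positive one."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels, for illustration only.
gold = ["hallucination", "hallucination", "no_hallucination", "no_hallucination"]
pred = ["hallucination", "no_hallucination", "no_hallucination", "no_hallucination"]
scores = class_wise_f1(gold, pred, ["hallucination", "no_hallucination"])
print(scores)
```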

Model Card Authors

Chulaka Gunasekara