doberst committed (verified) · Commit df060c4 · 1 Parent(s): 22620c4

Upload README.md

Files changed (1): README.md (+116 -1)

README.md CHANGED
---
license: apache-2.0
inference: false
---

# dragon-phi-3-answer-tool

<!-- Provide a quick summary of what the model is/does. -->

dragon-phi-3-answer-tool is part of the DRAGON ("Delivering RAG On ...") model series, RAG-instruct trained on top of a Microsoft Phi-3 base model.

DRAGON models are fine-tuned with high-quality custom instruct datasets, designed for production use in RAG scenarios.


### Benchmark Tests

Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)
Average of 2 test runs, scored with 1 point for a correct answer, 0.5 points for a partially correct or blank / "not found" answer, 0.0 points for an incorrect answer, and -1 point for a hallucination.

- **Accuracy Score**: **100.0** correct out of 100
- Not Found Classification: 95.0%
- Boolean: 97.5%
- Math/Logic: 80.0%
- Complex Questions (1-5): 4 (Above Average - multiple-choice, causal)
- Summarization Quality (1-5): 4 (Above Average)
- Hallucinations: No hallucinations observed in test runs.

For test run results (and a good indicator of target use cases), please see the files "core_rag_test" and "answer_sheet" in this repo.

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** llmware
- **Model type:** Dragon
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** Microsoft Phi-3

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

The intended use of DRAGON models is two-fold:

1. Provide high-quality RAG-Instruct models designed for fact-based, no "hallucination" question-answering in connection with an enterprise RAG workflow.

2. DRAGON models are fine-tuned on top of leading base foundation models and purposefully rolled out across multiple base models to provide choices and "drop-in" replacements for RAG-specific use cases.


### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

DRAGON is designed for enterprise automation use cases, especially in knowledge-intensive industries, such as financial services,
legal and regulatory industries with complex information sources.

DRAGON models have been trained for common RAG scenarios, specifically: question-answering, key-value extraction, and basic summarization as the core instruction types,
without the need for a lot of complex instruction verbiage - provide a text passage context, ask questions, and get clear fact-based responses, as sketched in the short example below.
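
To make the three core instruction types concrete, here is a minimal illustration; the passage, questions, and variable names below are invented for this card and are not drawn from the benchmark or training data:

    # illustrative only - substitute your own retrieved passage and instructions
    text_passage = "The services agreement was signed on March 3, 2023, with a total contract value of $175,000."

    # question-answering
    qa_prompt = text_passage + "\n" + "What is the total contract value?"

    # key-value extraction
    kv_prompt = text_passage + "\n" + "What is the effective date of the agreement?"

    # basic summarization
    summary_prompt = text_passage + "\n" + "Summarize the key points of the agreement."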

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.


## How to Get Started with the Model

The fastest way to get started with DRAGON is through direct import in transformers:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("llmware/dragon-phi-3-answer-tool", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("llmware/dragon-phi-3-answer-tool", trust_remote_code=True)

Please refer to the generation_test .py files in the Files repository, which include 200 samples and scripts to test the model. The **generation_test_llmware_script.py** includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and retrieval, so the test set can be swapped out for a RAG workflow over your own business documents.
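
As a rough sketch of that llmware-based path, the snippet below assumes the standard llmware Prompt interface (Prompt, load_model, prompt_main) and an invented passage and question; please treat generation_test_llmware_script.py in this repo as the authoritative version:

    # minimal sketch - see generation_test_llmware_script.py for the full workflow
    from llmware.prompts import Prompt

    # load this model into a Prompt object
    prompter = Prompt().load_model("llmware/dragon-phi-3-answer-tool")

    # illustrative passage and question - replace with parsed document text in a real RAG pipeline
    text_passage = "The lease term is 36 months, commencing on July 1, 2024."
    question = "What is the length of the lease?"

    # pass the passage as context so the model answers from the supplied text
    response = prompter.prompt_main(question, context=text_passage)

    print(response)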

The dRAGon model was fine-tuned with a simple "\<human>" and "\<bot>" wrapper, so to get the best results, wrap inference entries as:

    full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"

The DRAGON model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:

1. Text Passage Context, and
2. Specific question or instruction based on the text passage

To get the best results, package "my_prompt" as follows:

    my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
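
Putting the two pieces together, a fully packaged prompt looks like the following; the passage and question here are invented purely for illustration:

    # illustrative passage and question - substitute your own retrieved text
    text_passage = "The purchase order totals $42,500 and is due for delivery on June 15."
    question = "What is the total amount of the purchase order?"

    my_prompt = text_passage + "\n" + question
    full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"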

If you are using a HuggingFace generation script:

    import torch

    # prepare prompt packaging used in fine-tuning process
    # "entries" is assumed to be a dict with "context" and "query" keys
    new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"

    inputs = tokenizer(new_prompt, return_tensors="pt")
    start_of_output = len(inputs.input_ids[0])

    # temperature: set at 0.3 for consistency of output
    # max_new_tokens: set at 100 - may prematurely stop a few of the summaries

    # place the inputs on the same device as the model
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    outputs = model.generate(
        inputs.input_ids.to(device),
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.3,
        max_new_tokens=100,
        )

    # decode only the newly generated tokens (everything after the prompt)
    output_only = tokenizer.decode(outputs[0][start_of_output:], skip_special_tokens=True)

## Model Card Contact

Darren Oberst & llmware team