oliverwang15's picture
Update README.md
b313b78
metadata
license: mit
datasets:
  - oliverwang15/fingpt_chatglm2_sentiment_instruction_lora_ft_dataset
language:
  - en
metrics:
  - accuracy
  - f1
tags:
  - ChatGLM2
  - LoRA

FinGPT_ChatGLM2_Sentiment_Instruction_LoRA_FT(FinGPT v3) is a LLM finetuned with LoRA method on the News and Tweets sentiment analysis dataset which achieve best scores on most of the financial sentiment analysis datasets.

Ⅰ. Try our model

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Load Models
base_model = "THUDM/chatglm2-6b"
peft_model = "oliverwang15/FinGPT_ChatGLM2_Sentiment_Instruction_LoRA_FT"
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModel.from_pretrained(base_model, trust_remote_code=True,  device_map = "auto")
model = PeftModel.from_pretrained(model, peft_model)

# Make prompts
prompt = [
'''Instruction: What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}
Input: FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its growth strategy by increasingly focusing on technologically more demanding HDI printed circuit boards PCBs .
Answer: ''',
'''Instruction: What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}
Input: According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .
Answer: ''',
'''Instruction: What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}
Input: A tinyurl link takes users to a scamming site promising that users can earn thousands of dollars by becoming a Google ( NASDAQ : GOOG ) Cash advertiser .
Answer: ''',
]

# Generate results
tokens = tokenizer(prompt, return_tensors='pt', padding=True, max_length=512)
res = model.generate(**tokens, max_length=512)
res_sentences = [tokenizer.decode(i) for i in res]
out_text = [o.split("Answer: ")[1] for o in res_sentences]

# show results
for sentiment in out_text:
    print(sentiment)

# Output:    
# positive
# neutral
# negative

Ⅱ. Benchmark Results

ACC/F1 Micro BloombergGPT ChatGLM2 ChatGLM2 (8-bit*) FinGPT v3 FinGPT v3 (8-bit*)
FPB [1] - 0.464 0.476 0.8 0.784
FiQA-SA [2] - 0.822 0.833 0.815 0.818
TFNS [3] - 0.331 0.332 0.738 0.721
NWGI [4] - 0.560 0.561 0.588 0.588
Macro F1
FPB [1] - 0.487 0.5 0.774 0.754
FiQA-SA [2] - 0.56 0.57 0.665 0.645
TFNS [3] - 0.34 0.34 0.681 0.652
NWGI [4] - 0.489 0.492 0.579 0.576
Weighted F1
FPB [1] 0.511 0.381 0.398 0.795 0.778
FiQA-SA [2] 0.751 0.79 0.801 0.806 0.801
TFNS [3] - 0.189 0.19 0.74 0.721
NWGI [4] - 0.449 0.452 0.578 0.578
  • '8-bit' doesn't refer to finetuning in 8-bit, but refers to loading the trained model and inferencing in 8-bit mode.

[1] Financial_Phrasebank (FPB) is a financial news sentiment analysis benchmark, the labels are "positive", "negative" and "neutral". We use the same split as BloombergGPT. BloombergGPT only use 5-shots in the test to show their model's outstanding performance without further finetuning. However, is our task, all data in the 'train' part were used in finetuning, So our results are far better than Bloomberg's.

[2] FiQA SA consists of 17k sentences from microblog headlines and financial news. These labels were changed to "positive", "negative" and "neutral" according to BloombergGPT's paper. We have tried to use the same split as BloombergGPT's paper. However, the amounts of each label can't match exactly when the seed was set to 42.

[3] Twitter Financial News Sentiment (TFNS) dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their sentiment. The dataset holds 11,932 documents annotated with 3 labels: "Bearish" ("negative"), "Bullish" ("positive"), and "Neutral".

[4] News With GPT Instruction (MWGI) is a dataset whose labels were generated by ChatGPT. The train set has 16.2k samples and the test set has 4.05k samples. The dataset not only contains 7 classification labels: "strong negative", "moderately negative", "mildly negative", "neutral", "mildly positive", "moderately positive", "strong positive". but it also has the reasons for that result, which might be helpful in the instruction finetuning.

Ⅲ. How to Train

Coming Soon.