---
language:
- en
pipeline_tag: text-generation
tags:
- llama-3.1
- finance
- economics
- math
- reasoning
- finetuning
license: other
library_name: transformers
---
![image/jpeg](https://i.imgur.com/sdN0Aqg.jpeg)
## Llama-3.1-Hawkish-8B v1
The model has been further finetuned on ~50M newly generated, high-quality tokens covering financial topics such as Economics, Fixed Income, Equities, Corporate Finance, Derivatives and Portfolio Management. Data was gathered from publicly available sources and distilled from an initial pool of 250M+ tokens into instruction data through several stages of curation. To mitigate forgetting of the base model's capabilities, the financial data was mixed with instruction sets covering Coding, General Knowledge, NLP and Conversational Dialogue.
The model improves over the base model on a number of benchmarks, notably in Math and Economics. This represents the first time an 8B model has convincingly achieved a passing score on a CFA Level 1 mock exam, an exam that typically requires around 300 hours of study, indicating a significant improvement in financial knowledge.
![image/png](https://i.imgur.com/4PzKe7W.png)
## CFA Level 1 Mock Exam Results
If you work in the financial and investment sectors, you will know about the CFA: their exam is known to be **“the world's toughest”**, typically requiring over 1,000 hours of study across all 3 levels. Below is a comparison of different models on a sample CFA Level 1 mock exam, showing Llama Hawkish outperforming much larger models. The same prompt was used for all models, and all results are 0-shot CoT.
| CFA Level 1 | GPT-4o-mini (%) | Llama Hawkish 8B (%) | Meta-Llama Instruct 8B (%) | Meta-Llama Instruct 70B (%) | Palmyra Fin 70B (%) |
|---|---|---|---|---|---|
| Ethical and Professional Standards | 77.77 | 77.77 | 55.55 | 66.60 | 61.11 |
| Quantitative Methods | 71.43 | 71.40 | 64.28 | 85.71 | 71.40 |
| Economics | 66.66 | 75.00 | 58.00 | 58.33 | 41.66 |
| Financial Reporting | 79.20 | 87.50 | 66.60 | 70.83 | 50.00 |
| Corporate Finance | 80.00 | 60.00 | 50.00 | 80.00 | 50.00 |
| Equity Investments | 50.00 | 50.00 | 41.60 | 66.60 | 41.66 |
| Fixed Income | 78.57 | 50.00 | 28.57 | 50.00 | 42.85 |
| Derivatives | 50.00 | 66.70 | 33.30 | 33.30 | 50.00 |
| Alternative Investments | 100.00 | 100.00 | 75.00 | 100.00 | 75.00 |
| Portfolio Management | 83.30 | 83.30 | 50.00 | 100.00 | 83.30 |
| **Weighted Average** | 73.49 | 71.43 | 52.77 | 69.86 | 54.77 |
| **Result** | PASS | PASS | FAIL | PASS | FAIL |
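The overall score is a weighted average of the section scores, weighted by the number of questions in each section. A minimal sketch of that calculation, using hypothetical question counts (the actual mock exam's counts are not reproduced here):

```python
# Hypothetical (section, score %, question count) triples for illustration only;
# the real mock exam uses different section sizes.
sections = [
    ("Economics", 75.0, 12),
    ("Derivatives", 66.7, 6),
    ("Portfolio Management", 83.3, 6),
]

# Weighted average: each section score contributes proportionally
# to the number of questions it covers.
total_questions = sum(n for _, _, n in sections)
weighted_average = sum(score * n for _, score, n in sections) / total_questions

print(f"Weighted average: {weighted_average:.2f}%")  # Weighted average: 75.00%
```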
Mock exams vary in difficulty, and passing scores can be anywhere from 64% to 72% for different Level 1 mock exams, with the average being around 67%, which is above the 12-year average MPS of 65% across all CFA sittings ([300hours.com](https://300hours.com/cfa-passing-score/)).
Some other frontier models were also tested on the same CFA Level 1 mock exam; their scores are shown below for reference. As an 8B model, Hawkish is not expected to outperform them, but they provide benchmarks for larger versions of Hawkish or for others researching financial data mixes in finetuning.
| Model | Performance (%) |
|---|---|
| Qwen2.5 - 70B | 81.66 |
| Nemotron - 70B | 79.16 |
| GPT-4o | 85.83 |
| GPT-4o-mini | 73.49 |
| Gemini 1.5 Pro | 75.00 |
| Llama 405B Instruct | 77.50 |
| DeepSeekChat v2.5 | 66.60 |
### Model Usage Example
```python
from vllm import LLM, SamplingParams
# Load the LLM using vLLM
llm = LLM(model="mukaj/Llama-3.1-Hawkish-8B", dtype="auto")
tokenizer = llm.get_tokenizer()
messages = [
{"role": "system", "content": "You are a Financial Analyst. Reason step by step before answering."},
{"role": "user", "content": """Given that an equal-weighted index and a market-capitalization-weighted index consist of the same securities, underperformance by small-cap stocks will most likely result in the market-capitalization-weighted index exhibiting what price returns compared to the equal weighted index?""" }
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Sampling configuration for vLLM
sampling_params = SamplingParams(temperature=0.2, max_tokens=512)
# Generate response using vLLM
generation = llm.generate(prompt, sampling_params)
# Extract the generated text from the first request's first output
generated_response = generation[0].outputs[0].text
print(generated_response)
```
#### Output
=============
To analyze the impact of underperformance by small-cap stocks on the price returns of an equal-weighted index versus a market-capitalization-weighted index, we need to consider how each index is constructed and how their weights affect their performance.
1. **Index Construction**:
- **Equal-Weighted Index**: Each security in the index has the same weight, regardless of its market capitalization. This means that all stocks contribute equally to the index's performance.
- **Market-Capitalization-Weighted Index**: Each security's weight in the index is proportional to its market capitalization. Larger companies have a greater influence on the index's performance.
2. **Impact of Underperformance**:
- If small-cap stocks underperform, it means that these stocks are not performing as well as they were expected to. In an equal-weighted index, the underperformance of small-cap stocks will still be reflected in the index's performance, but since all stocks have equal weights, the overall impact will be more pronounced.
- In a market-capitalization-weighted index, the underperformance of small-cap stocks will have a more muted effect on the index's performance because larger stocks (which typically have higher market caps) will still dominate the index's returns. The influence of small-cap stocks, even if they are underperforming, is less significant due to their lower weight.
3. **Price Returns Comparison**:
- As a result of the underperformance of small-cap stocks, the equal-weighted index is likely to show a more significant decline in price returns compared to the market-capitalization-weighted index. This is because the equal-weighted index is more sensitive to the performance of all stocks, including the underperforming small-cap stocks.
- Conversely, the market-capitalization-weighted index will likely exhibit less severe price declines because the larger stocks will continue to drive the index's performance, mitigating the impact of the underperformance of smaller stocks.
In summary, underperformance by small-cap stocks will most likely result in the market-capitalization-weighted index exhibiting **higher price returns** compared to the equal-weighted index. The equal-weighted index will likely show a more pronounced decline in price returns due to the equal weighting of all stocks, including the underperforming small-cap stocks.
================
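The model's conclusion can be checked with a small numeric sketch (hypothetical prices and market caps): when the small-cap stock falls while the large cap is flat, the cap-weighted index return is higher (less negative) than the equal-weighted return.

```python
# Hypothetical two-stock universe: one large cap, one small cap (assumed values).
market_caps = {"BIG": 900e9, "SMALL": 10e9}
returns = {"BIG": 0.00, "SMALL": -0.10}  # small cap underperforms

# Equal-weighted index return: simple average of constituent returns.
equal_weighted = sum(returns.values()) / len(returns)

# Cap-weighted index return: constituent returns weighted by market cap.
total_cap = sum(market_caps.values())
cap_weighted = sum(market_caps[s] / total_cap * returns[s] for s in returns)

print(f"Equal-weighted: {equal_weighted:.2%}")  # -5.00%
print(f"Cap-weighted:   {cap_weighted:.2%}")    # -0.11%
```

The small cap's 10% drop is fully reflected in the equal-weighted index but is diluted by its tiny weight in the cap-weighted index, matching the model's answer.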
## Disclaimer & Intended Uses
### Model & License
This research model is derived from Meta's Llama 3.1 architecture and is subject to the Llama 3.1 community license terms, with the additional restrictions below. While based on Llama 3.1, this is an independent research implementation focused on studying the influence of financial data in training. Users acknowledge that this is experimental research, use it at their own risk, and accept full responsibility for any implementation or application.
### Permitted Use
- Academic and research purposes only
- No production environments or real-world applications
- No financial decision-making or advisory use
### Liability & Responsibility
The creators of this model:
- Accept no responsibility for any use of the model
- Provide no warranties or guarantees
- Make no claims about accuracy or reliability
### Intellectual Property & Attribution
- All findings and opinions are solely those of the authors
- Not endorsed by or affiliated with Meta, the CFA Institute, or any other institution
- All trademarks belong to respective owners
The creators reserve the right to modify these terms at any time.