umuthopeyildirim/fin-rwkv-169M



	---
	license: apache-2.0
	datasets:
	- gbharti/finance-alpaca
	language:
	- en
	library_name: transformers
	tags:
	- finance
	---

	# Fin-RWKV: Attention-Free Financial Expert (WIP)
	Fin-RWKV is an attention-free model designed for financial analysis and prediction. Developed as part of a MindsDB Hackathon, it uses the RWKV architecture to process financial data, with the goal of providing insights and forecasts. It is intended for professionals and enthusiasts in the finance sector who want to bring deep learning techniques into their financial analyses.

	## Features
	- Attention-Free Architecture: Uses the RWKV (Receptance Weighted Key Value) architecture, which avoids the complexity of attention mechanisms while maintaining high performance.
	- Lower Costs: 10x to more than 100x lower inference cost and 2x to 10x lower training cost.
	- Tiny: Lightweight enough to run on CPUs in real time without a GPU, so it can run on a laptop today.
	- Finance-Specific Training: Fine-tuned on the gbharti/finance-alpaca dataset so that the model is adapted to financial data analysis.
	- Transformers Library Integration: Built on the popular `transformers` library for easy integration with existing ML pipelines and applications.
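
As a rough illustration of the `transformers` integration listed above, the sketch below loads the model on CPU and generates a reply. It rests on several assumptions not confirmed by this card: that the repository `umuthopeyildirim/fin-rwkv-169M` holds weights in a format the `Auto*` classes can load, that an Alpaca-style prompt matches how the model was fine-tuned, and that the helper names `build_prompt` and `generate_answer` are purely hypothetical.

```python
MODEL_ID = "umuthopeyildirim/fin-rwkv-169M"  # repository name shown on this model card


def build_prompt(instruction: str, context: str = "") -> str:
    """Format a question in an Alpaca-style layout.

    Assumption: the template mirrors the instruction/input/output fields of
    finance-alpaca; the card does not document the actual prompt format.
    """
    if context:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def generate_answer(instruction: str, max_new_tokens: int = 128) -> str:
    """Load the model on CPU and generate a completion for one instruction."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository contains transformers-compatible weights.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_answer("What is the difference between a stock and a bond?")` would download the weights on first use. Given the model's work-in-progress status, treat any output as a test result rather than financial guidance.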

	## Competing Against
	- [BloombergGPT](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/)
	- [FinGPT](https://huggingface.co/FinGPT)

	| Architecture | Status | Compute Efficiency | Largest Model | Trained Tokens | Link |
	|--------------|--------|--------------------|---------------|----------------|------|
	| (Fin)RWKV | In Production | O(N) | 14B | 500B++ (the pile+) | [Paper](https://arxiv.org/abs/2305.13048) |
	| RetNet (Microsoft) | Research | O(N) | 6.7B | 100B (mixed) | [Paper](https://arxiv.org/abs/2307.08621) |
	| State Space (Stanford) | Prototype | O(log N) | 355M | 15B (the pile, subset) | [Paper](https://arxiv.org/abs/2302.10866) |
	| Liquid (MIT) | Research | - | <1M | - | [Paper](https://arxiv.org/abs/2302.10866) |
	| Transformer Architecture (included for contrasting reference) | In Production | O(N^2) | 800B (est) | 13T++ (est) | - |

	<img src="https://cdn-uploads.huggingface.co/production/uploads/631ea4247beada30465fa606/7vAOYsXH1vhTyh22o6jYB.png" width="500" alt="Inference computational cost vs. Number of tokens">
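
To make the O(N) versus O(N^2) contrast in the comparison concrete, here is a small counting sketch. It does not benchmark Fin-RWKV. It only counts idealized steps, assuming causal self-attention scores every token against itself and all earlier tokens while a recurrent model performs one state update per token. The function names are hypothetical, and constant factors such as hidden size are ignored.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Token pairs scored by causal self-attention over a whole sequence.

    Token t attends to itself and the t - 1 tokens before it, so the total
    is 1 + 2 + ... + N = N * (N + 1) / 2, which grows as O(N^2).
    """
    return num_tokens * (num_tokens + 1) // 2


def recurrent_update_count(num_tokens: int) -> int:
    """State updates made by a recurrent model: one per token, so O(N)."""
    return num_tokens


def cost_ratio(num_tokens: int) -> float:
    """How many times more steps the attention count needs at this length."""
    return attention_pair_count(num_tokens) / recurrent_update_count(num_tokens)


if __name__ == "__main__":
    # The gap widens linearly with sequence length.
    for n in (1_000, 10_000, 100_000):
        print(n, attention_pair_count(n), recurrent_update_count(n), cost_ratio(n))
```

Under these assumptions the ratio is (N + 1) / 2, so doubling the context length roughly doubles the relative advantage of the linear-time model.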

	_Note: The model needs more data and training; it is for testing purposes only._