Ambuj Varshney committed 49d06ad (parent: 6082f0b): Create README.md

---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
language:
- en
library_name: transformers
tags:
- IoT
- sensor
- embedded
---

# TinyLLM

## Overview

This repository hosts a small language model developed as part of the TinyLLM framework ([arxiv link]). The models are designed for embedded sensing applications and are trained with sensor data, so that a language model can be hosted locally on low-compute devices such as single-board computers. They are based on the GPT-2 architecture and were trained on NVIDIA H100 GPUs. This repository provides base models that can be further fine-tuned for downstream embedded-sensing tasks.

## Model Information

- **Parameters:** 51M (hidden size = 512)
- **Architecture:** Decoder-only transformer
- **Training Data:** Up to 10B tokens from the [SHL](http://www.shl-dataset.org/) and [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) datasets, combined in a 4:6 ratio
- **Input and Output Modality:** Text
- **Context Length:** 1024 tokens

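The 51M figure can be sanity-checked against the 512 hidden size with standard GPT-2 parameter arithmetic. The sketch below is an illustration under stated assumptions, not the exact configuration of this model: it assumes the stock GPT-2 layout (tied input/output embeddings, learned position embeddings, a 4x-wide MLP, biases everywhere) and the 50,257-token GPT-2 vocabulary. The number of layers is not given in this README, so `n_layer` is left as a free parameter, and `gpt2_param_count` is a hypothetical helper name.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int = 50257, n_ctx: int = 1024) -> int:
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    # Token embedding table (wte) plus learned position embedding table (wpe).
    embeddings = vocab_size * d_model + n_ctx * d_model
    per_layer = (
        2 * d_model                              # ln_1 weight + bias
        + d_model * 3 * d_model + 3 * d_model    # fused Q/K/V projection (c_attn)
        + d_model * d_model + d_model            # attention output projection
        + 2 * d_model                            # ln_2 weight + bias
        + d_model * 4 * d_model + 4 * d_model    # MLP up-projection (4x width)
        + 4 * d_model * d_model + d_model        # MLP down-projection
    )  # simplifies to 12*d^2 + 13*d
    final_norm = 2 * d_model                     # ln_f weight + bias
    # The LM head reuses wte (weight tying), so it adds no parameters.
    return embeddings + n_layer * per_layer + final_norm
```

As a cross-check, `gpt2_param_count(12, 768)` returns 124,439,808, the size of the original GPT-2 small. With `d_model=512`, a hypothetical 8-layer stack gives 51,475,968 (about 51.5M), which is in line with the stated 51M; the actual depth should be read from the model's `config.json` rather than inferred from this estimate.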

## Acknowledgements

We would like to acknowledge the open-source frameworks [llm.c](https://github.com/karpathy/llm.c) and [llama.cpp](https://github.com/ggerganov/llama.cpp), which were instrumental in training and testing these models.

## Usage

The model can be used in two primary ways:

1. **With Hugging Face’s Transformers library**
2. **With llama.cpp**
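For the Transformers route, a minimal sketch might look like the following. It assumes the repository ships a `config.json` and GPT-2-compatible tokenizer files that load through `AutoModelForCausalLM` and `AutoTokenizer`; `"<repo-id>"` is a placeholder for this repository's Hub ID, and `fit_to_context` and `generate_text` are hypothetical helper names. `fit_to_context` keeps the prompt plus the generated tokens within the 1024-token context length listed above by dropping the oldest prompt tokens.

```python
from typing import List


def fit_to_context(token_ids: List[int], max_new_tokens: int, context_length: int = 1024) -> List[int]:
    """Keep only the most recent prompt tokens so prompt + new tokens fit the context window."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    return token_ids[-budget:]


def generate_text(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    # Imported lazily so fit_to_context can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    ids = fit_to_context(tokenizer(prompt)["input_ids"], max_new_tokens)
    if not ids:
        raise ValueError("prompt must contain at least one token")
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        output = model.generate(
            input_ids=input_ids,
            attention_mask=torch.ones_like(input_ids),
            max_new_tokens=max_new_tokens,
            do_sample=False,                      # greedy decoding for reproducible output
            pad_token_id=tokenizer.eos_token_id,  # GPT-2-style tokenizers define no pad token
        )
    # Drop the prompt tokens and decode only the continuation.
    return tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)
```

An example call would be `generate_text("<repo-id>", "Accelerometer readings from a wrist-worn sensor show")`. Greedy decoding keeps repeated runs identical; passing `do_sample=True` together with `temperature` or `top_p` to `generate` gives more varied output.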

## Disclaimer

This model is intended solely for research purposes.