prithivMLmods committed · commit c6a5f53 (verified) · parent 633f667

Update README.md

Files changed (1): README.md (+85 −0)
# Tulu-MathLingo-8B Model Files

The **Tulu-MathLingo-8B** model is a fine-tuned version of **meta-llama/Llama-3.1-8B**, optimized for solving mathematical word problems and reasoning tasks in English and the Tulu language. It pairs general language understanding with reasoning capabilities focused on math-related queries.

| **File Name** | **Size** | **Description** | **Upload Status** |
|-----------------------------------|--------------|------------------------------------------------|-------------------|
| `.gitattributes` | 1.57 kB | Configures LFS tracking for large files. | Updated |
| … | … | … | … |
| `tokenizer.json` | 17.2 MB | Full tokenizer configuration. | Uploaded (LFS) |
| `tokenizer_config.json` | 57.6 kB | Metadata for tokenizer usage. | Uploaded |
### **Key Features**

1. **Multilingual Math Reasoning:**
   - Designed for solving complex math problems in **English** and **Tulu**.

2. **Text Generation:**
   - Generates detailed, contextually accurate text responses.

3. **Fine-Tuned Specialization:**
   - Trained on the **microsoft/orca-math-word-problems-200k** dataset for word-problem solving.

4. **Special Token Mapping:**
   - Maps special tokens such as `<PAD>` and `<EOS>` to their intended functions.

5. **Secure and Efficient Storage:**
   - Model weights are stored in the **Safetensors** format for safer, faster loading.

6. **Large Parameter Count:**
   - 8.03 billion parameters support complex queries and multi-turn conversations.

---
### **Training Details**

- **Base Model:** [meta-llama/Llama-3.1-8B](#)
- **Fine-Tuning:**
  - Performed in multiple stages: **SFT (Supervised Fine-Tuning)** followed by **DPO (Direct Preference Optimization)**.
- **Dataset:**
  - Trained on **200k word problems** from the **microsoft/orca-math-word-problems-200k** dataset.
- **Model Size:**
  - 8.03B parameters, with weights stored in **FP16**.

---
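The DPO stage mentioned above trains the model to prefer chosen answers over rejected ones, relative to a frozen reference model. As a rough illustration only (not the training code used for this model), the per-example DPO loss can be sketched in plain Python; `beta` and the log-probabilities below are made-up values:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Pushes the policy to raise the likelihood of the chosen answer
    (relative to the reference model) and lower it for the rejected one.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Equivalent to -log(sigmoid(margin))
    return math.log1p(math.exp(-margin))

# With no preference margin the loss is ln(2) ≈ 0.693;
# it shrinks as the policy favors the chosen answer more strongly.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

In practice this objective is computed over batches of sequence-level log-probabilities; libraries such as TRL implement it directly.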
### **Applications**

1. **Mathematical Word Problems:**
   - Solves structured or unstructured math problems posed in natural language.

2. **Conversational AI for Math:**
   - Engages users in interactive dialogues focused on math and logical reasoning.

3. **Multilingual Support:**
   - Accepts queries in **Tulu** and **English**, improving accessibility.

4. **Education Tools:**
   - Useful in tutoring systems that help students with step-by-step problem solving.

---
### **Usage**

#### **Loading the Model**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Tulu-MathLingo-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype expects torch.float16 (or the string "float16"), not "fp16"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`; spreads weights across available devices
)
```
---

#### **Math Word Problem**

```python
query = "If a train travels 60 miles in 2 hours, what is its average speed?"
inputs = tokenizer(query, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Answer:", response)
```
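For reference, the expected answer to the sample query is easy to verify by hand: average speed is distance divided by time.

```python
distance_miles = 60
time_hours = 2
average_speed = distance_miles / time_hours  # miles per hour
print(average_speed)  # 30.0
```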
### **Performance Requirements**

- **Hardware:**
  - A GPU with at least **24 GB VRAM** is recommended, given the model size and FP16 weights.

- **Optimization:**
  - Use mixed precision (`fp16`) to reduce the memory footprint.
  - Split inference across multiple GPUs if necessary (e.g., `device_map="auto"`).

---
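The 24 GB figure can be sanity-checked with back-of-the-envelope arithmetic: at FP16 (2 bytes per parameter), the weights alone occupy roughly 16 GB, and the remainder is headroom for activations and the KV cache. A quick sketch (the parameter count comes from this card; everything else is generic arithmetic):

```python
params = 8.03e9      # parameter count, from the model card
bytes_per_param = 2  # FP16 stores each parameter in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"FP16 weights: ~{weights_gb:.1f} GB")  # ~16.1 GB

# On a 24 GB card, what's left covers activations, the KV cache,
# and framework overhead during generation.
print(f"Headroom on 24 GB: ~{24 - weights_gb:.1f} GB")
```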