crumb/nano-mistral

@@ -17,73 +17,62 @@ tags: []
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
 ### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
 ### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
@@ -92,25 +81,23 @@ Use the code below to get started with the model.
 #### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 [More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 #### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
 #### Factors
@@ -122,21 +109,26 @@ Use the code below to get started with the model.
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
 ### Results
-[More Information Needed]
-#### Summary
 ## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
 ## Environmental Impact
@@ -144,29 +136,29 @@ Use the code below to get started with the model.
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
 ## Technical Specifications [optional]
 ### Model Architecture and Objective
-[More Information Needed]
 ### Compute Infrastructure
-[More Information Needed]
 #### Hardware
-[More Information Needed]
 #### Software
-[More Information Needed]
 ## Citation [optional]

 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** me
+- **Model type:** Mistral
+- **Language(s) (NLP):** en
+- **License:** apache
 ## Uses
+general web text completions at extremely low resource use
 ### Out-of-Scope Use
+not an instruct model
 ## Bias, Risks, and Limitations
+trained on web text; it was filtered, but there's no guarantee there's no toxic stuff in there
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
+tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")
+
+# tokenize the prompt and move the tensors to the model's device
+inputs = tokenizer(["Once upon a time,"], return_tensors="pt")
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+# generate() expects input_ids / attention_mask as keyword arguments, so unpack the dict
+outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_k=20, do_sample=True)
+outputs = tokenizer.batch_decode(outputs)
+for text in outputs:
+    print(text)
+```
 ## Training Details
 ### Training Data
+[crumb/askmistral-pile-2-15](https://huggingface.co/datasets/crumb/askmistral-pile-2-15)
 ### Training Procedure
+| Parameter | Value |
+| - | - |
+| Context Length | 2048 |
+| Batch Size | 128 |
+| Learning Rate | 6e-4 |
+| Scheduler | One-Cycle |
+| Adam eps | 1e-8 |
+| Adam beta1 | 0.9 |
+| Adam beta2 | 0.95 |
+| Weight Decay | 0.1 |
+| Max Grad Norm | 1.0 |
+| Optimizer | adamw_torch |
+| Tokens | 3,401,640,960 |
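A rough sanity check on the table above, as a sketch only: it assumes "Batch Size" counts sequences per optimizer step and that every sequence is packed to the full context length. Neither is stated in the card, and the helper names are made up for illustration.

```python
# Values copied from the hyperparameter table.
CONTEXT_LENGTH = 2048
BATCH_SIZE = 128
TOTAL_TOKENS = 3_401_640_960


def tokens_per_step(batch_size: int, context_length: int) -> int:
    """Tokens in one optimizer step, assuming fully packed sequences (an assumption)."""
    return batch_size * context_length


def approx_steps(total_tokens: int, batch_size: int, context_length: int) -> float:
    """Approximate number of optimizer steps implied by the token budget."""
    return total_tokens / tokens_per_step(batch_size, context_length)


if __name__ == "__main__":
    print("tokens per step:", tokens_per_step(BATCH_SIZE, CONTEXT_LENGTH))
    print("approx steps:", round(approx_steps(TOTAL_TOKENS, BATCH_SIZE, CONTEXT_LENGTH), 1))
```

Under those assumptions each step covers 262,144 tokens and the run is a little under 13k steps.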
 #### Preprocessing [optional]
 #### Training Hyperparameters
+- **Training regime:** bf16 non-mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
+- **train_runtime:** 62541.9424 (seconds)
+- **train_samples_per_second:** 26.557
 [More Information Needed]
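The runtime figure can be turned into more familiar units. This is a minimal sketch, assuming `train_runtime` is in seconds (the usual convention for that metric name) and that both A6000s listed under Hardware were busy for the whole run; the function names are hypothetical.

```python
TRAIN_RUNTIME_S = 62541.9424   # train_runtime from the card, assumed to be seconds
TOTAL_TOKENS = 3_401_640_960   # "Tokens" row of the training table
NUM_GPUS = 2                   # assumed from "lambda vector 2xA6000"


def wall_clock_hours(runtime_s: float) -> float:
    """Convert a runtime in seconds to hours."""
    return runtime_s / 3600


def gpu_hours(runtime_s: float, num_gpus: int) -> float:
    """GPU-hours, assuming every GPU ran for the full wall-clock time."""
    return wall_clock_hours(runtime_s) * num_gpus


def tokens_per_second(total_tokens: int, runtime_s: float) -> float:
    """Average training throughput over the whole run."""
    return total_tokens / runtime_s


if __name__ == "__main__":
    print(f"wall-clock hours: {wall_clock_hours(TRAIN_RUNTIME_S):.2f}")
    print(f"GPU-hours:        {gpu_hours(TRAIN_RUNTIME_S, NUM_GPUS):.2f}")
    print(f"tokens/second:    {tokens_per_second(TOTAL_TOKENS, TRAIN_RUNTIME_S):.0f}")
```

Under those assumptions the run took roughly 17.4 wall-clock hours, which lines up with the 34.74 hours reported under Environmental Impact if that figure counts GPU-hours.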
 ## Evaluation
 ### Testing Data, Factors & Metrics
 #### Testing Data
+held out set of [crumb/askmistral-pile-2-15](https://huggingface.co/datasets/crumb/askmistral-pile-2-15)
 #### Factors
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+open llm leaderboard eval datasets and settings
 ### Results
+|    Tasks     |Version|Filter|n-shot| Metric |Value |   |Stderr|
+|--------------|------:|------|-----:|--------|-----:|---|-----:|
+|arc_challenge |      1|none  |    25|acc     |0.1843|±  |0.0113|
+|              |       |none  |    25|acc_norm|0.2167|±  |0.0120|
+|truthfulqa_mc2|      2|none  |     0|acc     |0.4719|±  |0.0156|
+|winogrande    |      1|none  |     5|acc     | 0.517|±  | 0.014|
+|hellaswag     |      1|none  |    10|acc     |0.2803|±  |0.0045|
+|              |       |none  |    10|acc_norm|0.2886|±  |0.0045|
+#### Summary
 ## Model Examination [optional]
+it's ok
 ## Environmental Impact
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** A6000
+- **Hours used:** 34.74
+- **Cloud Provider:** n/a
+- **Compute Region:** Iowa
+- **Carbon Emitted:** 4.5kg CO2eq.
 ## Technical Specifications [optional]
 ### Model Architecture and Objective
+mistral, causal language modelling
 ### Compute Infrastructure
+what
 #### Hardware
+lambda vector 2xA6000
 #### Software
+huggingface transformers / pytorch / custom trainer
 ## Citation [optional]