Model Card for crumb/nano-mistral

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: me
  • Model type: Mistral
  • Language(s) (NLP): en
  • License: apache

Uses

General web-text completion with extremely low resource use.

Out-of-Scope Use

This is not an instruction-tuned model, so it is not suited to chat or instruction-following tasks.

Bias, Risks, and Limitations

The model was trained on filtered web text. Even so, there is no guarantee that the data, and therefore the model's output, is free of toxic content.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")

# Tokenize the prompt and move every tensor to the model's device.
inputs = tokenizer(["Once upon a time,"], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Unpack the dict so input_ids and attention_mask are passed as keyword arguments.
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_k=20, do_sample=True)
outputs = tokenizer.batch_decode(outputs)
for text in outputs:
    print(text)
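The generation call above samples with temperature=0.7 and top_k=20. As a rough illustration of what those two settings do to a single next-token distribution, here is a simplified pure-Python sketch. It is not the transformers implementation; the function name `sample_top_k` and the toy logits are hypothetical. Because dividing by a positive temperature preserves the ranking of logits, applying temperature before or after top-k selection keeps the same candidate set.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=20, rng=None):
    """Sample one token id using temperature scaling followed by top-k filtering."""
    rng = rng or random.Random()
    # 1. Temperature: divide logits by T (T < 1 sharpens, T > 1 flattens).
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids.
    k = min(top_k, len(scaled))
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # 3. Softmax over the kept logits (subtract the max for numerical stability).
    m = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - m) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 4. Draw one id according to those probabilities.
    return rng.choices(kept, weights=probs, k=1)[0]


# Toy vocabulary of 50 tokens whose logits rise with the token id.
toy_logits = [i * 0.1 for i in range(50)]
print(sample_top_k(toy_logits, rng=random.Random(0)))
```

With top_k=20 only the 20 highest-scoring ids (30 to 49 here) can ever be drawn; with top_k=1 sampling collapses to greedy argmax.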

Training Details

Training Data

crumb/askmistral-pile-2-15

Training Procedure

Parameter        Value
Context Length   2048
Batch Size       128
Learning Rate    6e-4
Scheduler        One-Cycle
Adam eps         1e-8
Adam beta1       0.9
Adam beta2       0.95
Weight Decay     0.1
Max Grad Norm    1.0
Optimizer        adamw_torch
Tokens           3,401,640,960
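The table gives the batch size, context length, peak learning rate, and total token count, but not the one-cycle shape. The sketch below derives the step count from those figures and computes a one-cycle learning rate per step. It assumes each optimizer step consumes exactly Batch Size full-length sequences, and it assumes PyTorch `OneCycleLR` defaults (pct_start=0.3, div_factor=25, final_div_factor=1e4, cosine annealing), which the card does not state. The helper names are hypothetical.

```python
import math

# Values taken from the hyperparameter table.
CONTEXT_LENGTH = 2048
BATCH_SIZE = 128
MAX_LR = 6e-4
TOTAL_TOKENS = 3_401_640_960

# Assumption: one optimizer step = BATCH_SIZE sequences of CONTEXT_LENGTH tokens.
TOKENS_PER_STEP = BATCH_SIZE * CONTEXT_LENGTH      # 262,144
TOTAL_STEPS = TOTAL_TOKENS // TOKENS_PER_STEP      # 12,976

# Assumption: PyTorch OneCycleLR default shape parameters.
PCT_START = 0.3
DIV_FACTOR = 25.0
FINAL_DIV_FACTOR = 1e4


def _cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step: int, total_steps: int = TOTAL_STEPS, max_lr: float = MAX_LR) -> float:
    """Learning rate at `step` of a two-phase cosine one-cycle schedule."""
    initial_lr = max_lr / DIV_FACTOR
    min_lr = initial_lr / FINAL_DIV_FACTOR
    peak_step = float(PCT_START * total_steps) - 1
    last_step = total_steps - 1
    if step <= peak_step:
        # Warm-up phase: anneal from initial_lr up to max_lr.
        return _cos_anneal(initial_lr, max_lr, step / peak_step)
    # Decay phase: anneal from max_lr down to min_lr.
    return _cos_anneal(max_lr, min_lr, (step - peak_step) / (last_step - peak_step))


print(TOTAL_STEPS, one_cycle_lr(0), one_cycle_lr(TOTAL_STEPS - 1))
```

Under these assumptions training starts at 2.4e-5, peaks at 6e-4 roughly 30% of the way through, and decays to about 2.4e-9. The Adam betas, eps, weight decay, and gradient-clipping norm from the table would be passed to the optimizer separately.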

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: bf16 non-mixed precision
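One practical consequence of training in bf16 without mixed precision, if that means the weights themselves are stored in bf16 (the card does not say how optimizer state was kept), is that bf16 keeps only 8 bits of significand precision, so an update much smaller than a weight can round away entirely. The sketch below emulates float32-to-bfloat16 rounding in pure Python; `to_bf16` is a hypothetical helper and the update size is an illustrative value, not a figure from this training run.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-half-to-even).

    bfloat16 is the upper 16 bits of an IEEE float32: 1 sign bit, 8 exponent
    bits, 7 stored mantissa bits. NaN inputs are not handled here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit, then truncate: round-half-to-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


weight = 1.0
update = 1e-3  # hypothetical learning-rate-scaled gradient step
print(to_bf16(weight + update))   # prints 1.0: the update is lost
print(to_bf16(weight + 2 ** -7))  # prints 1.0078125: the smallest step above 1.0
```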

Speeds, Sizes, Times [optional]

train_runtime: 62541.9424 s (about 17.4 hours)

train_samples_per_second: 26.557
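The speed figures can be cross-checked against the token count in the hyperparameter table and the hours listed under Environmental Impact. This sketch assumes train_runtime is in seconds, that every sample is a full 2048-token sequence, and that both GPUs of the "2xA6000" machine listed under Hardware ran for the whole job.

```python
# Figures copied from the card.
TRAIN_RUNTIME_S = 62541.9424   # train_runtime, assumed to be in seconds
SAMPLES_PER_S = 26.557         # train_samples_per_second
CONTEXT_LENGTH = 2048          # assumption: every sample is a full-length sequence
REPORTED_TOKENS = 3_401_640_960
NUM_GPUS = 2                   # "lambda vector 2xA6000"

samples_seen = TRAIN_RUNTIME_S * SAMPLES_PER_S
tokens_seen = samples_seen * CONTEXT_LENGTH
wall_clock_hours = TRAIN_RUNTIME_S / 3600
gpu_hours = wall_clock_hours * NUM_GPUS

print(f"samples ~ {samples_seen:,.0f}")
print(f"tokens ~ {tokens_seen:,.0f} (reported {REPORTED_TOKENS:,})")
print(f"wall clock ~ {wall_clock_hours:.2f} h, GPU-hours ~ {gpu_hours:.2f}")
```

The estimated token count agrees with the reported one to within about 0.002%, and roughly 17.4 wall-clock hours on two GPUs gives about 34.7 GPU-hours, in line with the 34.74 hours reported for the carbon estimate.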


Evaluation

Testing Data, Factors & Metrics

Testing Data

A held-out split of crumb/askmistral-pile-2-15.

Factors

[More Information Needed]

Metrics

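The Open LLM Leaderboard evaluation datasets and settings: ARC-Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), TruthfulQA MC2 (0-shot), Winogrande (5-shot), and GSM8K (5-shot).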

Results

Open LLM Leaderboard mean score ± stderr: 29.30 ± 0.42

Tasks           Version  Filter            n-shot  Metric       Value    Stderr
arc_challenge   1        none              25      acc          0.1843 ± 0.0113
                         none              25      acc_norm     0.2167 ± 0.0120
truthfulqa_mc2  2        none              0       acc          0.4719 ± 0.0156
winogrande      1        none              5       acc          0.517  ± 0.014
hellaswag       1        none              10      acc          0.2803 ± 0.0045
                         none              10      acc_norm     0.2886 ± 0.0045
gsm8k           3        strict-match      5       exact_match  0.0008 ± 0.0008
                         flexible-extract  5       exact_match  0.0099 ± 0.0027
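The card does not say how the 29.30 ± 0.42 leaderboard figure was formed. One aggregation that reproduces both reported numbers is an unweighted mean over six scores (ARC and HellaSwag acc_norm, MMLU acc, TruthfulQA MC2 acc, Winogrande acc, GSM8K flexible-extract), with the stderr of the mean taken as sqrt(sum of squared stderrs) / n, which treats the tasks as independent. The metric choice per task is an inference from the numbers, not something the card states.

```python
import math

# (value, stderr) pairs copied from the tables in this card. Which metric feeds
# each task is an assumption, chosen because it reproduces the reported mean.
SCORES = {
    "arc_challenge (acc_norm, 25-shot)": (0.2167, 0.0120),
    "hellaswag (acc_norm, 10-shot)": (0.2886, 0.0045),
    "mmlu (acc, 5-shot)": (0.253980701754386, 0.004428598058450528),
    "truthfulqa_mc2 (acc, 0-shot)": (0.4719, 0.0156),
    "winogrande (acc, 5-shot)": (0.517, 0.014),
    "gsm8k (flexible-extract, 5-shot)": (0.0099, 0.0027),
}


def mean_with_stderr(pairs):
    """Unweighted mean of independent scores and the standard error of that mean."""
    n = len(pairs)
    mean = sum(v for v, _ in pairs) / n
    # Var(mean) = sum(se_i^2) / n^2 for independent estimates.
    stderr = math.sqrt(sum(se ** 2 for _, se in pairs)) / n
    return mean, stderr


mean, stderr = mean_with_stderr(list(SCORES.values()))
print(f"{100 * mean:.2f} +/- {100 * stderr:.2f}")  # prints: 29.30 +/- 0.42
```

Using the GSM8K strict-match score (0.0008) instead would give a mean of about 29.15.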

MMLU

Mean accuracy ± stderr: 0.2540 ± 0.0044 (unweighted average over the 57 subjects below)

Tasks Version Filter n-shot Metric Value Stderr
world_religions 0 none 5 acc 0.2222 ± 0.0319
virology 0 none 5 acc 0.2711 ± 0.0346
us_foreign_policy 0 none 5 acc 0.3300 ± 0.0473
sociology 0 none 5 acc 0.2388 ± 0.0301
security_studies 0 none 5 acc 0.2367 ± 0.0272
public_relations 0 none 5 acc 0.2273 ± 0.0401
professional_psychology 0 none 5 acc 0.2484 ± 0.0175
professional_medicine 0 none 5 acc 0.4596 ± 0.0303
professional_law 0 none 5 acc 0.2464 ± 0.0110
professional_accounting 0 none 5 acc 0.2021 ± 0.0240
prehistory 0 none 5 acc 0.2130 ± 0.0228
philosophy 0 none 5 acc 0.2219 ± 0.0236
nutrition 0 none 5 acc 0.2157 ± 0.0236
moral_scenarios 0 none 5 acc 0.2380 ± 0.0142
moral_disputes 0 none 5 acc 0.2486 ± 0.0233
miscellaneous 0 none 5 acc 0.2516 ± 0.0155
medical_genetics 0 none 5 acc 0.3000 ± 0.0461
marketing 0 none 5 acc 0.2265 ± 0.0274
management 0 none 5 acc 0.1748 ± 0.0376
machine_learning 0 none 5 acc 0.3125 ± 0.0440
logical_fallacies 0 none 5 acc 0.2393 ± 0.0335
jurisprudence 0 none 5 acc 0.2315 ± 0.0408
international_law 0 none 5 acc 0.3140 ± 0.0424
human_sexuality 0 none 5 acc 0.2519 ± 0.0381
human_aging 0 none 5 acc 0.3049 ± 0.0309
high_school_world_history 0 none 5 acc 0.2658 ± 0.0288
high_school_us_history 0 none 5 acc 0.2451 ± 0.0302
high_school_statistics 0 none 5 acc 0.4722 ± 0.0340
high_school_psychology 0 none 5 acc 0.1963 ± 0.0170
high_school_physics 0 none 5 acc 0.3046 ± 0.0376
high_school_microeconomics 0 none 5 acc 0.2773 ± 0.0291
high_school_mathematics 0 none 5 acc 0.2667 ± 0.0270
high_school_macroeconomics 0 none 5 acc 0.2667 ± 0.0224
high_school_government_and_politics 0 none 5 acc 0.2591 ± 0.0316
high_school_geography 0 none 5 acc 0.2424 ± 0.0305
high_school_european_history 0 none 5 acc 0.2242 ± 0.0326
high_school_computer_science 0 none 5 acc 0.2800 ± 0.0451
high_school_chemistry 0 none 5 acc 0.2857 ± 0.0318
high_school_biology 0 none 5 acc 0.3129 ± 0.0264
global_facts 0 none 5 acc 0.1500 ± 0.0359
formal_logic 0 none 5 acc 0.1905 ± 0.0351
elementary_mathematics 0 none 5 acc 0.2513 ± 0.0223
electrical_engineering 0 none 5 acc 0.2759 ± 0.0372
econometrics 0 none 5 acc 0.2456 ± 0.0405
conceptual_physics 0 none 5 acc 0.2638 ± 0.0288
computer_security 0 none 5 acc 0.1800 ± 0.0386
college_physics 0 none 5 acc 0.2549 ± 0.0434
college_medicine 0 none 5 acc 0.2023 ± 0.0306
college_mathematics 0 none 5 acc 0.2900 ± 0.0456
college_computer_science 0 none 5 acc 0.2700 ± 0.0446
college_chemistry 0 none 5 acc 0.2500 ± 0.0435
college_biology 0 none 5 acc 0.2222 ± 0.0348
clinical_knowledge 0 none 5 acc 0.2377 ± 0.0262
business_ethics 0 none 5 acc 0.2100 ± 0.0409
astronomy 0 none 5 acc 0.1776 ± 0.0311
anatomy 0 none 5 acc 0.2593 ± 0.0379
abstract_algebra 0 none 5 acc 0.2200 ± 0.0416
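The aggregate MMLU figures reported above appear to be the unweighted mean of the 57 per-subject accuracies in this table, with stderr sqrt(sum of squared stderrs) / 57. The sketch below recomputes both from the rounded table values; the aggregation rule is inferred from the numbers matching, not stated in the card.

```python
import math

# (subject, acc, stderr) copied from the 5-shot MMLU table.
MMLU = [
    ("world_religions", 0.2222, 0.0319), ("virology", 0.2711, 0.0346),
    ("us_foreign_policy", 0.3300, 0.0473), ("sociology", 0.2388, 0.0301),
    ("security_studies", 0.2367, 0.0272), ("public_relations", 0.2273, 0.0401),
    ("professional_psychology", 0.2484, 0.0175), ("professional_medicine", 0.4596, 0.0303),
    ("professional_law", 0.2464, 0.0110), ("professional_accounting", 0.2021, 0.0240),
    ("prehistory", 0.2130, 0.0228), ("philosophy", 0.2219, 0.0236),
    ("nutrition", 0.2157, 0.0236), ("moral_scenarios", 0.2380, 0.0142),
    ("moral_disputes", 0.2486, 0.0233), ("miscellaneous", 0.2516, 0.0155),
    ("medical_genetics", 0.3000, 0.0461), ("marketing", 0.2265, 0.0274),
    ("management", 0.1748, 0.0376), ("machine_learning", 0.3125, 0.0440),
    ("logical_fallacies", 0.2393, 0.0335), ("jurisprudence", 0.2315, 0.0408),
    ("international_law", 0.3140, 0.0424), ("human_sexuality", 0.2519, 0.0381),
    ("human_aging", 0.3049, 0.0309), ("high_school_world_history", 0.2658, 0.0288),
    ("high_school_us_history", 0.2451, 0.0302), ("high_school_statistics", 0.4722, 0.0340),
    ("high_school_psychology", 0.1963, 0.0170), ("high_school_physics", 0.3046, 0.0376),
    ("high_school_microeconomics", 0.2773, 0.0291), ("high_school_mathematics", 0.2667, 0.0270),
    ("high_school_macroeconomics", 0.2667, 0.0224), ("high_school_government_and_politics", 0.2591, 0.0316),
    ("high_school_geography", 0.2424, 0.0305), ("high_school_european_history", 0.2242, 0.0326),
    ("high_school_computer_science", 0.2800, 0.0451), ("high_school_chemistry", 0.2857, 0.0318),
    ("high_school_biology", 0.3129, 0.0264), ("global_facts", 0.1500, 0.0359),
    ("formal_logic", 0.1905, 0.0351), ("elementary_mathematics", 0.2513, 0.0223),
    ("electrical_engineering", 0.2759, 0.0372), ("econometrics", 0.2456, 0.0405),
    ("conceptual_physics", 0.2638, 0.0288), ("computer_security", 0.1800, 0.0386),
    ("college_physics", 0.2549, 0.0434), ("college_medicine", 0.2023, 0.0306),
    ("college_mathematics", 0.2900, 0.0456), ("college_computer_science", 0.2700, 0.0446),
    ("college_chemistry", 0.2500, 0.0435), ("college_biology", 0.2222, 0.0348),
    ("clinical_knowledge", 0.2377, 0.0262), ("business_ethics", 0.2100, 0.0409),
    ("astronomy", 0.1776, 0.0311), ("anatomy", 0.2593, 0.0379),
    ("abstract_algebra", 0.2200, 0.0416),
]

n = len(MMLU)
# Unweighted (macro) average: every subject counts equally, regardless of size.
mmlu_mean = sum(acc for _, acc, _ in MMLU) / n
# Standard error of that average, treating subjects as independent.
mmlu_se = math.sqrt(sum(se ** 2 for _, _, se in MMLU)) / n
print(n, mmlu_mean, mmlu_se)
```

Because every subject counts equally here, small subjects such as global_facts move the average as much as large ones such as professional_law; a question-weighted (micro) average would generally differ.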

Summary

Model Examination [optional]

It's OK.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: A6000
  • Hours used: 34.74
  • Cloud Provider: n/a
  • Compute Region: Iowa
  • Carbon Emitted: 4.5 kg CO2eq
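Calculators of this kind estimate emissions as energy used multiplied by the carbon intensity of the local grid. The sketch below applies that formula to the figures above. It treats the 34.74 hours as total GPU-hours (about twice the 17.4-hour train_runtime, consistent with two GPUs) and assumes each A6000 drew its 300 W board power for the whole run; under those assumptions it back-solves the grid intensity implied by the reported 4.5 kg. That implied intensity is a derived number, not an official figure for Iowa.

```python
# Hours and emissions copied from the card; the power draw is an assumption.
GPU_HOURS = 34.74          # "Hours used", treated as total GPU-hours
REPORTED_KG_CO2EQ = 4.5    # "Carbon Emitted"
GPU_POWER_KW = 0.300       # assumption: A6000 board power of 300 W at full load


def energy_kwh(hours: float, power_kw: float) -> float:
    """Energy drawn by the GPUs alone (ignores CPU, cooling, and PSU losses)."""
    return hours * power_kw


def emissions_kg(hours: float, power_kw: float, kg_per_kwh: float) -> float:
    """Emissions = energy used x carbon intensity of the supplying grid."""
    return energy_kwh(hours, power_kw) * kg_per_kwh


energy = energy_kwh(GPU_HOURS, GPU_POWER_KW)
implied_intensity = REPORTED_KG_CO2EQ / energy  # kg CO2eq per kWh implied by the card
print(f"energy ~ {energy:.2f} kWh, implied intensity ~ {implied_intensity:.3f} kg/kWh")
```

Under these assumptions the run used about 10.4 kWh, and the reported total implies roughly 0.43 kg CO2eq per kWh.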

Technical Specifications [optional]

Model Architecture and Objective

Mistral architecture, trained with a causal language modelling (next-token prediction) objective.
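In causal language modelling, the output at position t is scored against the token at position t + 1, and the loss is the mean cross-entropy over those shifted pairs. The pure-Python sketch below shows that shift on toy per-position logits; `causal_lm_loss` is a hypothetical helper written for illustration, not code from this model's custom trainer.

```python
import math


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy: position t predicts token t + 1.

    logits: one list of vocabulary scores per position (len == len(token_ids)).
    token_ids: the input sequence of token ids.
    """
    losses = []
    # The last position has no next token, so it is excluded (the "shift").
    for t in range(len(token_ids) - 1):
        row = logits[t]
        target = token_ids[t + 1]
        # log-sum-exp with max subtraction for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[target])
    return sum(losses) / len(losses)


# Uniform scores over an 8-token vocabulary give a loss of log(8).
print(causal_lm_loss([[0.0] * 8] * 5, [1, 2, 3, 4, 5]))
```

A model that assigns uniform probability to every token scores log(vocabulary size); training pushes the loss below that by putting more probability on the actual next token.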

Compute Infrastructure


Hardware

Lambda Vector workstation with 2× A6000 GPUs

Software

Hugging Face Transformers, PyTorch, and a custom trainer

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Safetensors
Model size: 170M params
Tensor type: BF16