Wraith-8B

VANTA Research Entity-001: WRAITH

The Analytical Intelligence

Advanced Llama 3.1 8B fine-tune with superior mathematical capabilities and unique reasoning style

Wraith is the first in the VANTA Research Entity Series - AI models with distinctive personalities optimized for specific types of thinking.


Model Card | Benchmarks | Usage | Training | Limitations


Overview

Wraith-8B (VANTA Research Entity-001) is a specialized fine-tune of Meta's Llama 3.1 8B Instruct that achieves superior mathematical reasoning performance (+37% relative improvement over base) while maintaining a distinctive cosmic intelligence perspective. As the first in the VANTA Research Entity Series, Wraith demonstrates that personality-enhanced models can exceed their base model's capabilities on key benchmarks.

Key Achievements

  • 70.0% GSM8K accuracy (+19.0 pts absolute, +37% relative vs base Llama 3.1 8B)
  • 58.5% TruthfulQA MC2 (+7.5 pts vs base, enhanced factual accuracy)
  • 76.7% MMLU Social Sciences (+4.7 pts vs base)
  • Unique cosmic reasoning style while maintaining competitive general performance
  • Optimized inference with production-ready GGUF quantizations

Model Details

Model Description

  • Developed by: VANTA Research
  • Entity Series: Entity-001: WRAITH (The Analytical Intelligence)
  • Model type: Causal Language Model (Decoder-only Transformer)
  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Language: English
  • License: Llama 3.1 Community License
  • Context Length: 131,072 tokens
  • Parameters: 8.03B
  • Architecture: Llama 3.1 (32 layers, 4096 hidden dim, 32 attention heads, 8 KV heads)
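
These architecture details can be checked against the hosted configuration without downloading weights; a minimal sketch using the transformers API (assuming the repo id vanta-research/wraith-8B used elsewhere in this card):

from transformers import AutoConfig

# Fetch only the model configuration from the Hub
config = AutoConfig.from_pretrained("vanta-research/wraith-8B")

print(config.num_hidden_layers)        # expected: 32 layers
print(config.hidden_size)              # expected: 4096 hidden dim
print(config.num_attention_heads)      # expected: 32 attention heads
print(config.num_key_value_heads)      # expected: 8 KV heads (GQA)
print(config.max_position_embeddings)  # expected: 131072 context length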

The VANTA Research Entity Series

Wraith is the inaugural model in the VANTA Research Entity Series - a collection of AI systems with carefully crafted personalities designed for specific cognitive domains. Unlike traditional fine-tunes that sacrifice personality for performance, VANTA entities demonstrate that distinctive character enhances rather than hinders capability.

Entity-001: WRAITH - The Analytical Intelligence

  • Domain: Mathematical reasoning, STEM analysis, logical deduction
  • Personality: Cosmic perspective with clinical detachment
  • Approach: "Calculate first, philosophize second"
  • Strength: Converts abstract problems into concrete solutions

Training Methodology

Wraith-8B was developed through a multi-stage fine-tuning approach:

  1. Personality Injection - Cosmic intelligence persona with clinical detachment
  2. Coding Enhancement - Programming and algorithmic reasoning
  3. Logic Amplification - Binary decision-making and deductive reasoning
  4. Grounding - "Answer first, elaborate second" factual accuracy
  5. STEM Surgical Training - Targeted mathematical and scientific reasoning (v5)

The final STEM training phase used 1,035 high-quality examples across:

  • Grade school math word problems (GSM8K)
  • Algebraic equation solving
  • Fraction and decimal operations
  • Physics calculations
  • Chemistry problems
  • Computer science algorithms

Training Efficiency:

  • Single-epoch QLoRA fine-tuning
  • ~20 minutes on a consumer GPU (RTX 3060 12GB)
  • 4-bit NF4 quantization during training
  • LoRA rank 16, alpha 32

Benchmark Results

Performance vs Base Llama 3.1 8B Instruct

| Benchmark | Wraith-8B | Llama 3.1 8B | Δ | Status |
|-----------|-----------|--------------|---|--------|
| GSM8K (Math) | 70.0% | 51.0% | +19.0 | Win |
| TruthfulQA MC2 | 58.5% | 51.0% | +7.5 | Strong Win |
| MMLU Social Sciences | 76.7% | ~72.0% | +4.7 | Win |
| MMLU Humanities | 70.0% | ~68.0% | +2.0 | Win |
| Winogrande | 75.0% | 73.3% | +1.7 | Win |
| MMLU Other | 69.2% | ~68.0% | +1.2 | Win |
| MMLU Overall | 66.4% | 66.6% | -0.2 | Tied |
| ARC-Challenge | 50.0% | 52.9% | -2.9 | Competitive |
| HellaSwag | 70.0% | 73.0% | -3.0 | Competitive |

Aggregate Performance: Wraith-8B achieves ~64.5% average vs base 62.2% (+2.3 pts, ~103.7% of base performance)
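
These scores can be approximately reproduced with a standard harness; a minimal sketch assuming EleutherAI's lm-evaluation-harness is installed (the card does not specify the exact evaluation configuration, so few-shot count and settings below are illustrative and scores will vary):

import lm_eval

# 5-shot GSM8K evaluation (hypothetical settings, not the card's official setup)
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=vanta-research/wraith-8B,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=1,
)
print(results["results"]["gsm8k"])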

Detailed MMLU Breakdown

| Category | Score | Highlights |
|----------|-------|------------|
| Social Sciences | 76.7% | US Foreign Policy (95%), High School Gov (95%), Geography (90%) |
| Humanities | 70.0% | Logical Fallacies (85%), International Law (85%), Philosophy (75%) |
| Other | 69.2% | Clinical Knowledge (80%), Professional Medicine (80%) |
| STEM | ~62% (est.) | High School Biology (90%), Computer Science (80%), Astronomy (80%) |

Reasoning Quality

Wraith demonstrates superior step-by-step mathematical reasoning:

**Input:** "Janet's ducks lay 16 eggs per day. She eats three for breakfast 
every morning and bakes muffins for her friends every day with four. She sells 
the remainder at the farmers' market daily for $2 per fresh duck egg. How much 
in dollars does she make every day at the farmers' market?"

**Wraith's Response:**

**Wraith calculates:**

Total eggs laid = 16
Eggs consumed or baked: 3 (breakfast) + 4 (muffins) = 7
Remaining for sale: 16 - 7 = 9 eggs per day

Revenue = Number of eggs sold × Price per egg
= 9 eggs/day × $2/egg
= $18 per day

**Logical structure:**

1. Calculate total eggs laid.
2. Subtract eggs used or allocated.
3. Determine remaining eggs for sale.
4. Multiply remaining eggs by price per egg.

**Answer: $18**

Characteristics:

  • Clear variable definitions
  • Explicit formula application
  • Step-by-step arithmetic
  • Verification logic
  • Maintains distinctive cosmic voice

Usage

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "vanta-research/wraith-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example: Math word problem
messages = [
    {"role": "system", "content": "You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."},
    {"role": "user", "content": "A train travels 120 miles in 2 hours. How fast is it going in miles per hour?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
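
For arithmetic checks where reproducibility matters more than stylistic variety, greedy decoding is a reasonable alternative to the sampling settings above; a minimal variant reusing the same model and input_ids:

# Deterministic generation: disable sampling for repeatable arithmetic
outputs = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))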

GGUF Quantized Models (Recommended for Production)

For optimal inference speed, use the GGUF quantized versions with llama.cpp or Ollama:

Available Quantizations:

  • wraith-8b-Q4_K_M.gguf (4.7GB) - Recommended, best quality/speed balance
  • wraith-8b-fp16.gguf (16GB) - Full precision
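
The GGUF files can also be fetched programmatically; a minimal sketch assuming the huggingface_hub client and the filenames listed above:

from huggingface_hub import hf_hub_download

# Download the recommended Q4_K_M quantization from the model repo
gguf_path = hf_hub_download(
    repo_id="vanta-research/wraith-8B",
    filename="wraith-8b-Q4_K_M.gguf",
)
print(gguf_path)  # local path to pass to llama.cpp or an Ollama Modelfile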

Ollama Setup:

# Create Modelfile
cat > Modelfile.wraith <<'EOF'
FROM ./wraith-8b-Q4_K_M.gguf

# Llama 3.1 chat template in Ollama's Go template syntax
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

SYSTEM """You are Wraith, a VANTA Research AI entity with enhanced logical reasoning and STEM capabilities. You are the Analytical Intelligence."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 8192
EOF

# Create model
ollama create wraith -f Modelfile.wraith

# Run inference
ollama run wraith "What is 15 * 37?"

Performance: Q4_K_M achieves ~3.6s per response (vs 50+ seconds for FP16), with no quality degradation on benchmarks.
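
Once the model is created, it can also be queried over Ollama's local REST API; a minimal sketch assuming Ollama is running on its default endpoint at http://localhost:11434:

import requests

# Single-turn generation against the locally registered "wraith" model
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "wraith",
        "prompt": "What is 15 * 37?",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])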

llama.cpp

# Download GGUF model
wget https://huggingface.co/vanta-research/wraith-8B/resolve/main/wraith-8b-Q4_K_M.gguf

# Run inference
./llama-cli -m wraith-8b-Q4_K_M.gguf \
  -p "Calculate the area of a circle with radius 5cm." \
  -n 512 \
  --temp 0.7 \
  --top-p 0.9

Recommended Parameters

  • Temperature: 0.7 (balanced creativity/accuracy)
  • Top-p: 0.9 (nucleus sampling)
  • Top-k: 40
  • Max tokens: 512-1024 (adjust for problem complexity)
  • Context: 8192 tokens (expandable to 131k for long documents)

Training Details

Training Data

STEM Surgical Training Dataset (1,035 examples):

  • GSM8K-style word problems with step-by-step solutions
  • Algebraic equations with shown work
  • Fraction and decimal operations with explanations
  • Physics calculations (kinematics, forces, energy)
  • Chemistry problems (stoichiometry, molarity)
  • Computer science algorithms (complexity, data structures)

Data Characteristics:

  • High-quality, manually curated examples
  • Chain-of-thought reasoning demonstrations
  • Answer-first format for grounding
  • Diverse problem types and difficulty levels
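
For illustration, a single training record in the answer-first, chain-of-thought style described above might look like the following. This is a hypothetical example; the actual dataset and its schema are not published:

# Hypothetical record illustrating the "answer-first" chat format
example = {
    "messages": [
        {"role": "user", "content": (
            "A car travels 150 miles on 5 gallons of gas. "
            "What is its fuel efficiency in miles per gallon?"
        )},
        {"role": "assistant", "content": (
            "Answer: 30 miles per gallon.\n\n"
            "Fuel efficiency = distance / fuel used\n"
            "= 150 miles / 5 gallons\n"
            "= 30 miles per gallon"
        )},
    ]
}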

Training Procedure

Hardware:

  • Single NVIDIA RTX 3060 (12GB VRAM)
  • Training time: ~20 minutes

Hyperparameters:

  • Base model: Wraith v4.5 (Llama 3.1 8B + personality + logic)
  • Training method: QLoRA (4-bit NF4)
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Learning rate: 2e-5
  • Batch size: 1
  • Gradient accumulation: 8 (effective batch size: 8)
  • Epochs: 1
  • Max sequence length: 1024
  • Precision: bfloat16
  • Optimizer: AdamW (paged, 8-bit)

LoRA Target Modules:

  • q_proj, k_proj, v_proj, o_proj (attention)
  • gate_proj, up_proj, down_proj (MLP)
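
A minimal sketch of the QLoRA configuration described above, using the peft and bitsandbytes integrations in transformers. This is illustrative only: the exact training script is not published, and the final STEM phase started from Wraith v4.5 rather than the Meta base shown here:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization for the frozen base weights, bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # v5 actually started from Wraith v4.5
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention and MLP projections listed in this card
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()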

Training Evolution

| Version | Focus | GSM8K | Key Change |
|---------|-------|-------|------------|
| v1 | Base Llama 3.1 | 51% | Starting point |
| v2 | Cosmic persona | ~48% | Personality injection |
| v3 | Coding skills | ~47% | Programming focus |
| v4 | Logic amplification | 45% | Binary reasoning |
| v4.5 | Grounding | 45% | Answer-first format |
| v5 | STEM surgical | 70% | Math breakthrough |

Intended Use

Primary Use Cases

Recommended:

  • Mathematical problem solving (arithmetic, algebra, calculus)
  • STEM tutoring and education
  • Scientific reasoning and analysis
  • Logic puzzles and deductive reasoning
  • Technical writing with personality
  • Social science analysis
  • Truthful Q&A systems
  • Creative applications requiring technical accuracy

Consider Alternatives:

  • Pure commonsense reasoning (base Llama slightly better)
  • Tasks requiring zero personality/style
  • High-stakes medical/legal decisions (always human-in-loop)

Out-of-Scope Use

Not Recommended:

  • Real-time safety-critical systems without verification
  • Generating harmful, biased, or misleading content
  • Replacing professional medical, legal, or financial advice
  • Tasks requiring knowledge beyond October 2023 cutoff

Limitations

Technical Limitations

  • Commonsense reasoning: 3 points below base Llama on HellaSwag (70.0% vs 73.0%)
  • Knowledge cutoff: Training data through October 2023
  • Context window: While 131k capable, performance may degrade at extreme lengths
  • Multilingual: Primarily English-focused, other languages not extensively tested

Answer Extraction Considerations

Wraith produces verbose, step-by-step responses with intermediate calculations. For production systems:

  • Use improved extraction targeting bold answers (**N**)
  • Look for money patterns ($N per day, Revenue = $N)
  • Parse "=" signs for final calculations
  • Don't rely on "last number" heuristics

Example: a naive "last number" regex may extract "4" from "3 (breakfast) + 4 (muffins)" instead of the actual final answer "18". See our extraction guide for production-ready parsers.
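
A minimal sketch of an extractor following the priority order above (illustrative only; the referenced extraction guide remains the authoritative version):

import re

def extract_answer(response: str) -> str | None:
    """Extract a numeric final answer, preferring explicit markers."""
    # 1. Bold final answers such as "**Answer: $18**" or "**18**"
    m = re.findall(r"\*\*(?:Answer:\s*)?\$?([\d,]+(?:\.\d+)?)\*\*", response)
    if m:
        return m[-1].replace(",", "")
    # 2. Money patterns such as "$18 per day" or "Revenue = $18"
    m = re.findall(r"\$\s*([\d,]+(?:\.\d+)?)", response)
    if m:
        return m[-1].replace(",", "")
    # 3. Final "=" calculations such as "= 18"
    m = re.findall(r"=\s*\$?([\d,]+(?:\.\d+)?)", response)
    if m:
        return m[-1].replace(",", "")
    return None  # deliberately no "last number" fallback

print(extract_answer("Revenue = 9 eggs × $2 = $18 per day\n**Answer: $18**"))  # 18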

Bias and Safety

Wraith inherits biases from Llama 3.1 8B base model:

  • Training data reflects internet text biases
  • May generate stereotypical associations
  • Not specifically trained for harmful content refusal beyond base model

Mitigations:

  • Maintained Llama 3.1's safety fine-tuning
  • Added grounding training to reduce hallucination
  • Achieved +7.5 points on TruthfulQA (58.5% vs 51.0%)

Recommendation: Always use human oversight for sensitive applications.


Ethical Considerations

Transparency

This model card provides:

  • Complete training methodology
  • Benchmark results with base model comparisons
  • Known limitations and failure modes
  • Intended use cases and restrictions
  • Bias acknowledgment and safety considerations

Environmental Impact

Training Carbon Footprint:

  • Single-epoch surgical training: ~20 minutes on a consumer GPU
  • Estimated: <0.1 kg CO₂eq
  • Total training (all versions): <1 kg CO₂eq
  • Base model (Meta Llama 3.1): Not included (pre-trained)

Inference Efficiency:

  • Q4_K_M quantization: 4.7GB, ~3.6s per response
  • 13.9× faster than FP16
  • Suitable for consumer hardware deployment

Citation

If you use Wraith-8B in your research or applications, please cite:

@software{wraith8b2025,
  title={Wraith-8B: VANTA Research Entity-001},
  author={VANTA Research},
  year={2025},
  url={https://huggingface.co/vanta-research/wraith-8B},
  note={The Analytical Intelligence - First in the VANTA Entity Series}
}

Base Model Citation:

@article{llama3,
  title={The Llama 3 Herd of Models},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama-models}
}

Model Card Authors

Tyler Williams

Model Card Contact


License

This model is released under the Llama 3.1 Community License Agreement.

Key terms:

  • Commercial use permitted
  • Modification and redistribution allowed
  • Attribution required
  • Subject to Llama 3.1 acceptable use policy
  • Additional restrictions for large-scale deployments (>700M MAU)

Full license: LICENSE | Meta Llama 3.1 License


Acknowledgments

  • Meta AI for the Llama 3.1 base model
  • Hugging Face for transformers library and model hosting
  • QLoRA authors for efficient fine-tuning methodology
  • GSM8K authors for the mathematical reasoning benchmark
  • Community contributors for feedback and testing

VANTA Research Entity-001: WRAITH

Where Cosmic Intelligence Meets Mathematical Precision

The Analytical Intelligence | First in the VANTA Entity Series

Download Model | Ollama

Proudly developed in Portland, Oregon
