SmolLM-135M Fine-tuned on Dostoyevsky

This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M on a curated dataset of Fyodor Dostoyevsky's major works. The model has been trained to generate text in the distinctive style of the Russian literary master.

Model Details

Model Description

  • Developed by: satyapratheek
  • Model type: Causal Language Model
  • Language(s): English
  • License: MIT
  • Finetuned from model: HuggingFaceTB/SmolLM-135M

Dataset

The model was trained on a custom dataset consisting of four major works by Fyodor Dostoyevsky:

  • Crime and Punishment (Project Gutenberg #2554)
  • The Brothers Karamazov (Project Gutenberg #28054)
  • The Idiot
  • Notes from the Underground (Project Gutenberg #600)

Dataset Statistics:

  • Total chunks: 6,217 text segments
  • Average chunk length: 512 tokens
  • All texts are public domain English translations

Training Details

Training Data

The dataset was preprocessed using the following pipeline (a code sketch follows the list):

  1. Raw texts cleaned with gutenberg-cleaner to remove headers/footers
  2. Text normalization with ftfy and unidecode
  3. Chunking into 512-token segments
  4. Filtering for substantial paragraphs (>200 characters)
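
A minimal sketch of this pipeline, assuming the gutenberg_cleaner package's simple_cleaner helper and the base model's tokenizer for chunking; the exact ordering of the paragraph filter relative to chunking is approximated here, not taken from the card.

from ftfy import fix_text
from unidecode import unidecode
from gutenberg_cleaner import simple_cleaner  # assumed helper; strips Gutenberg headers/footers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

def preprocess(raw_text, chunk_size=512, min_chars=200):
    # 1. Remove Project Gutenberg boilerplate
    text = simple_cleaner(raw_text)
    # 2. Fix mojibake with ftfy, then transliterate to plain ASCII
    text = unidecode(fix_text(text))
    # 3. Keep only substantial paragraphs (> 200 characters)
    paragraphs = [p.strip() for p in text.split("\n\n") if len(p.strip()) > min_chars]
    # 4. Tokenize and split into 512-token chunks
    ids = tokenizer("\n\n".join(paragraphs))["input_ids"]
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]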

Training Procedure

Training Hardware:

  • Device: Apple MacBook Air M1 (8GB unified memory)
  • Compute: Apple Metal Performance Shaders (MPS)
  • Memory Usage: Peak ~6GB unified memory

Training Hyperparameters (a configuration sketch follows the list):

  • Training regime: LoRA (Low-Rank Adaptation)
  • LoRA rank: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • Epochs: 3
  • Batch size: 2 (per device)
  • Gradient accumulation steps: 4
  • Effective batch size: 8
  • Learning rate: 2e-4
  • Optimizer: AdamW
  • Learning rate scheduler: Linear decay
  • Max sequence length: 512 tokens
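
A sketch of how these hyperparameters map onto the peft and transformers APIs; anything not listed above (output directory, target modules, logging options) is an assumption, not part of the card.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

lora_config = LoraConfig(
    r=8,                 # LoRA rank
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # ~460K trainable parameters

training_args = TrainingArguments(
    output_dir="smollm-dostoyevsky",          # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,            # effective batch size 2 * 4 = 8
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    optim="adamw_torch",
)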

Training Results:

  • Total training time: 4 hours, 56 minutes, 35 seconds
  • Training steps: 2,334
  • Final training loss: 3.254
  • Training samples per second: 1.048
  • Trainable parameters: 460,800 (LoRA adapters only; a quick sanity check follows the list)
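
For intuition, the 460,800 figure is consistent with rank-8 adapters on the query and value projections of SmolLM-135M's 30 transformer layers (hidden size 576, grouped-query attention with key/value width 192). The target-module choice is an assumption, since the card does not list it.

# Back-of-the-envelope check, assuming LoRA on q_proj and v_proj only
# (a common PEFT default for Llama-style models) and SmolLM-135M's
# published dimensions.
hidden, kv_width, layers, r = 576, 192, 30, 8
q_adapter = r * hidden + hidden * r       # A: r x hidden, B: hidden x r
v_adapter = r * hidden + kv_width * r     # A: r x hidden, B: kv_width x r
print((q_adapter + v_adapter) * layers)   # 460800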

Framework Versions

  • Transformers: 4.53.0
  • PyTorch: Latest with MPS support
  • PEFT: Latest
  • Datasets: Latest

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("satyapratheek/smollm-dostoyevsky")
tokenizer = AutoTokenizer.from_pretrained("satyapratheek/smollm-dostoyevsky")

# Generate text
prompt = "The man walked through the streets of St. Petersburg, contemplating"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.8,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
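
If the repository ships only the LoRA adapter weights, the adapter can also be attached to the base model explicitly with peft; this is a sketch of that alternative loading path, not the card's verified usage.

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = PeftModel.from_pretrained(base, "satyapratheek/smollm-dostoyevsky")
model = model.merge_and_unload()  # optional: fold the LoRA weights into the base model for inference
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")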

Model Performance

The model demonstrates strong adaptation to Dostoyevsky's writing style, including:

  • Philosophical depth: Captures the existential and psychological themes
  • Character introspection: Generates internal monologues characteristic of Dostoyevsky's protagonists
  • Russian cultural context: Maintains appropriate historical and cultural references
  • Narrative complexity: Preserves the multi-layered storytelling approach

Limitations and Biases

  • Time period bias: Reflects 19th-century perspectives and social norms
  • Translation artifacts: Based on English translations, may not capture original Russian nuances
  • Dataset scope: Limited to four major works, may not represent Dostoyevsky's complete style evolution
  • Model size: At 135M parameters, the model has limited capacity compared to larger language models

Ethical Considerations

This model is trained exclusively on public domain texts and is intended for:

  • Educational purposes
  • Creative writing assistance
  • Literary style analysis
  • Research into author-specific language patterns

Users should be aware that the model may generate content reflecting historical perspectives that may not align with contemporary values.

Citation

@misc{smollm-dostoyevsky-2025,
  author = {satyapratheek},
  title = {SmolLM-135M Fine-tuned on Dostoyevsky},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/satyapratheek/smollm-dostoyevsky}
}

Acknowledgments

  • Base model: HuggingFaceTB/SmolLM-135M
  • Dataset source: Project Gutenberg
  • Training framework: Hugging Face Transformers with PEFT
  • Hardware: Apple M1 MacBook Air (8GB)

This model was trained as part of a fine-tuning experiment to explore author-style adaptation using efficient training methods on consumer hardware.
