Discord: https://discord.gg/DUzP7CXqJt

License

This model is licensed under the MIT License.

ViorikaLM-CHAT

🚧 Experimental, under-trained model (~250M parameters) based on a custom 12-layer/12-head Transformer architecture.
Primarily supports English πŸ‡¬πŸ‡§. This is my first model.

πŸ“– Description

ViorikaLM-CHAT is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline: from implementing the architecture and training from scratch to uploading models to the Hugging Face Hub.

βš™οΈ Model Details

  • Architecture: Custom Transformer Decoder (12 layers, 12 attention heads)
  • Model size: ~250M parameters (a quick way to check this is shown after this list)
  • Training Approach: Pre-trained from scratch on WikiText
  • Languages: Primarily English
  • License: MIT
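
The exact hidden size and vocabulary size are not listed here, but the ~250M figure can be checked directly from the published weights. Below is a minimal sketch, assuming the checkpoint loads with AutoModelForCausalLM as in the Usage section.

from transformers import AutoModelForCausalLM

# Load the published checkpoint and count its parameters.
model = AutoModelForCausalLM.from_pretrained("ViorikaAI/ViorikaLM-CHAT")
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e6:.0f}M parameters")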

πŸ‹οΈ Training Details

  • Dataset: wikitext-103-raw-v1 (or similar WikiText format)
  • Hardware: Single NVIDIA GTX 1070 (8GB VRAM)
  • Training Status: Very early checkpoint (Under-trained)
  • Epochs: 2
  • Batch size: 8
  • Optimizer: Adam, lr = 3e-4
  • Max sequence length: 128 tokens (a sketch of how these settings map onto a standard training script follows below)
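
The original training script is not part of this repository. The sketch below only shows how the listed settings (WikiText-103, 2 epochs, batch size 8, lr 3e-4, 128-token sequences) would map onto a standard Hugging Face Trainer run; the actual model was a custom implementation trained from scratch, so treat the model/tokenizer loading here as a placeholder, not the author's script.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "ViorikaAI/ViorikaLM-CHAT"  # placeholder; the real run trained a custom model from scratch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding to 128 tokens

# WikiText-103 raw corpus, as listed above.
dataset = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    # Truncate/pad every example to the 128-token training length.
    return tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Causal LM objective: the collator copies inputs into labels (no masked-LM masking).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="viorikalm-chat-ckpt",
    num_train_epochs=2,             # Epochs: 2
    per_device_train_batch_size=8,  # Batch size: 8
    learning_rate=3e-4,             # lr = 3e-4 (Trainer defaults to AdamW, close to the listed Adam)
    fp16=True,                      # optional; reduces memory pressure on an 8GB GTX 1070
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()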

πŸš€ Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ViorikaAI/ViorikaLM-CHAT"

# Load the tokenizer and weights from the Hugging Face Hub.
# If the repository ships custom modeling code, trust_remote_code=True may be required.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the prompt into PyTorch tensors.
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Sample up to 50 new tokens; temperature 0.9 keeps the output fairly diverse.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.9
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
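
Since training used 128-token sequences, it is safest to keep the prompt plus the generated tokens within that window; whether longer inputs error out or simply degrade depends on the position embeddings, which are not documented here. One way to enforce this, assuming a standard transformers tokenizer:

# Leave room for the 50 new tokens within the 128-token training length.
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=78)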