Qwen 1.5B Book Rarity Detector (GRPO Fine-tuned)

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct using GRPO (Group Relative Policy Optimization) for improved reasoning about book rarity and marketplace value.

🎯 Training Objective

The model was trained to:

  • Detect and correct foreign language bias in rarity assessments
  • Provide structured reasoning about book value (holdings, tier, language, age)
  • Make nuanced classifications (HIGH_INTEREST, PROMISING, LOW_INTEREST, ELIMINATE)
  • Explain decisions with step-by-step analysis

📊 Training Results

  • Training Method: GRPO with Unsloth
  • Base Model: Qwen 2.5 1.5B Instruct
  • Training Data: 1,602 book classification examples with corrected reasoning
  • Reward Improvement: +70% (1.86 → 3.17)
  • Key Achievement: Successfully learned to identify foreign language bias in rare book detection

🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ambrosfitz/qwen-1.5b-book-rarity-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = """Analyze this book for rarity and marketplace value. Provide step-by-step reasoning.

Title: First Edition Book Title
Author: Author Name
Year: 1990
Holdings: 5 libraries
Tier: 2
Thesis: 0
Gov Doc: 0

Think through: holdings, language, document type, age, and rarity tier."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)  # matches the 256-token training limit
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
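
The snippet above feeds the prompt as raw text. Since the base model is instruction-tuned, it may also be worth trying the tokenizer's chat template; whether GRPO training used raw prompts or chat formatting is not documented here, so treat this as an alternative to experiment with:

messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)
outputs = model.generate(chat_inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))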

📈 Training Metrics

Metric          Start   Final   Improvement
Reward          1.86    3.17    +70%
Reward Std      0.99    0.53    -46% (more stable)
KL Divergence   0.001   0.013   Controlled (stayed low)
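
The improvement figures are relative changes against the starting values, e.g.:

start, final = 1.86, 3.17
print(f"{(final - start) / start:+.0%}")  # +70% reward improvement
print(f"{(0.53 - 0.99) / 0.99:+.0%}")     # -46% reward standard deviation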

🎓 Model Capabilities

Structured Reasoning

The model provides analysis across multiple dimensions:

  1. Holdings Analysis - Library availability assessment
  2. Language Detection - Identifies foreign language bias
  3. Document Type - Recognizes theses, gov docs, etc.
  4. Age Factor - Historical context and value
  5. Rarity Tier - Interprets scarcity indicators
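
For automated pipelines, the useful artifact is usually the final classification rather than the full reasoning chain. Below is a minimal sketch of pulling that label out of the generated text; the extract_classification helper is illustrative, not part of this repository, and it assumes the model states one of the four training labels verbatim:

import re

LABELS = ("HIGH_INTEREST", "PROMISING", "LOW_INTEREST", "ELIMINATE")

def extract_classification(generated_text):
    """Return the last label mentioned, treating it as the final verdict."""
    hits = [(m.start(), label)
            for label in LABELS
            for m in re.finditer(re.escape(label), generated_text)]
    return max(hits)[1] if hits else None

# Reusing `tokenizer` and `outputs` from the usage example above:
label = extract_classification(tokenizer.decode(outputs[0], skip_special_tokens=True))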

Key Improvements Over Base Model

  • ✅ Foreign Language Detection: Correctly identifies non-English titles and adjusts rarity assessments accordingly
  • ✅ Nuanced Classifications: Avoids automatic HIGH_INTEREST for zero-holding foreign-language books
  • ✅ Explainable AI: Provides a reasoning chain for every decision
  • ✅ Consistent Output: Lower variance in reward scores (0.99 → 0.53)

🔧 Training Configuration

  • LoRA r: 16
  • LoRA alpha: 16
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning rate: 5e-6
  • Batch size: 4 (effective 16 with gradient accumulation)
  • GRPO beta: 0.1
  • Training steps: 360
  • Quantization: 4-bit with Unsloth optimizations
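
These hyperparameters map directly onto the standard Unsloth + TRL GRPO setup. The following is a minimal sketch of how such a run could be wired together, assuming the current FastLanguageModel and GRPOTrainer APIs; the dataset, reward function, max_seq_length, and num_generations values below are placeholders and assumptions, not the actual data or reward used for this model:

from unsloth import FastLanguageModel
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model in 4-bit with Unsloth optimizations.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",
    max_seq_length=1024,  # assumption; not documented above
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset: GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_list([
    {"prompt": "Analyze this book for rarity and marketplace value. Title: Example Title"},
])

def reward_fn(completions, **kwargs):
    # Placeholder reward: the real run scored reasoning quality and
    # bias correction; here we only reward emitting a valid label.
    labels = ("HIGH_INTEREST", "PROMISING", "LOW_INTEREST", "ELIMINATE")
    return [1.0 if any(l in c for l in labels) else 0.0 for c in completions]

args = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 16
    num_generations=4,              # completions per prompt (assumption)
    beta=0.1,                       # KL penalty toward the reference model
    max_steps=360,
)

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_fn,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()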

📚 Use Cases

  • Book marketplace valuation
  • Library collection assessment
  • Rare book identification
  • Automated book triage for resellers
  • Detection of common vs. rare editions

⚠️ Limitations

  • Trained primarily on English-language library data
  • Best for books with WorldCat holdings data
  • May need adjustment for specialized collections (art books, music scores, etc.)
  • 256 token generation limit in training

📄 License

Apache 2.0 (inherits from Qwen 2.5 base model)

πŸ™ Acknowledgments

  • Built with Unsloth for optimized training
  • Uses TRL for GRPO implementation
  • Based on Qwen 2.5 by Alibaba Cloud

📧 Contact

For questions or issues, please open an issue on the model repository.
