# Qwen 1.5B Book Rarity Detector (GRPO Fine-tuned)
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct using GRPO (Group Relative Policy Optimization) for improved reasoning about book rarity and marketplace value.
## 🎯 Training Objective
The model was trained to:
- Detect and correct foreign language bias in rarity assessments
- Provide structured reasoning about book value (holdings, tier, language, age)
- Make nuanced classifications (HIGH_INTEREST, PROMISING, LOW_INTEREST, ELIMINATE)
- Explain decisions with step-by-step analysis
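To make the objective concrete, a GRPO reward for this task might score completions on classification accuracy, reasoning coverage, and language-bias avoidance. This is an illustrative sketch only; the labels come from this card, but the weights, helper arguments, and scoring logic are assumptions, not the reward function actually used in training:

```python
# Illustrative only: the real reward function for this model is not published.
LABELS = ["HIGH_INTEREST", "PROMISING", "LOW_INTEREST", "ELIMINATE"]

def reward(completion: str, gold_label: str, is_foreign: bool) -> float:
    score = 0.0
    # Reward the correct final classification.
    predicted = next((label for label in LABELS if label in completion), None)
    if predicted == gold_label:
        score += 2.0
    # Reward explicit step-by-step coverage of the key dimensions.
    for dimension in ("holdings", "language", "age", "tier"):
        if dimension in completion.lower():
            score += 0.25
    # Penalize calling a foreign-language title HIGH_INTEREST without
    # discussing language at all (the bias the training targets).
    if is_foreign and predicted == "HIGH_INTEREST" and "language" not in completion.lower():
        score -= 1.0
    return score
```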
## 📊 Training Results
- Training Method: GRPO with Unsloth
- Base Model: Qwen 2.5 1.5B Instruct
- Training Data: 1,602 book classification examples with corrected reasoning
- Reward Improvement: +67% (1.86 → 3.17)
- Key Achievement: Successfully learned to identify foreign language bias in rare book detection
## 🚀 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ambrosfitz/qwen-1.5b-book-rarity-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = """Analyze this book for rarity and marketplace value. Provide step-by-step reasoning.
Title: First Edition Book Title
Author: Author Name
Year: 1990
Holdings: 5 libraries
Tier: 2
Thesis: 0
Gov Doc: 0
Think through: holdings, language, document type, age, and rarity tier."""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
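The decoded output contains the prompt followed by the model's analysis. Assuming the analysis mentions one of the four classification labels, a small helper (illustrative, not part of the model's API) can pull out the final label:

```python
import re

# The four labels the model was trained to emit (see Training Objective).
LABEL_PATTERN = re.compile(r"HIGH_INTEREST|PROMISING|LOW_INTEREST|ELIMINATE")

def extract_label(generated_text: str) -> str | None:
    """Return the last classification label mentioned in the output, if any."""
    matches = LABEL_PATTERN.findall(generated_text)
    return matches[-1] if matches else None
```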
## 📈 Training Metrics
| Metric | Start | Final | Improvement |
|---|---|---|---|
| Reward | 1.86 | 3.17 | +67% |
| Reward Std | 0.99 | 0.53 | -46% (more stable) |
| KL Divergence | 0.001 | 0.013 | Controlled |
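For context, GRPO computes a group-relative advantage by normalizing rewards within each sampled group, and the β-weighted KL term against the reference policy is what keeps the KL divergence above small. In simplified form (the full TRL implementation also applies clipped importance ratios):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\ldots,r_G)}{\operatorname{std}(r_1,\ldots,r_G)},
\qquad
\mathcal{L}(\theta) = -\frac{1}{G}\sum_{i=1}^{G}\hat{A}_i \log \pi_\theta(o_i \mid q) \;+\; \beta\,\mathbb{D}_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)
$$

Here β = 0.1, as listed under Training Configuration.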
## 🔍 Model Capabilities

### Structured Reasoning
The model provides analysis across multiple dimensions:
- Holdings Analysis: library availability assessment
- Language Detection: identifies foreign-language titles and the rarity bias they can introduce
- Document Type: recognizes theses, government documents, etc.
- Age Factor: historical context and value
- Rarity Tier: interprets scarcity indicators
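To feed these dimensions to the model consistently, the book metadata can be wrapped in a small record that renders the prompt format from the Usage section. The class is an illustrative convenience, not part of the model; only the field names mirror the prompt:

```python
from dataclasses import dataclass

@dataclass
class BookRecord:
    title: str
    author: str
    year: int
    holdings: int  # number of libraries holding the book (e.g., from WorldCat)
    tier: int      # rarity tier indicator
    thesis: int    # 1 if a thesis/dissertation, else 0
    gov_doc: int   # 1 if a government document, else 0

    def to_prompt(self) -> str:
        """Render the record in the prompt format shown under Usage."""
        return (
            "Analyze this book for rarity and marketplace value. "
            "Provide step-by-step reasoning.\n"
            f"Title: {self.title}\nAuthor: {self.author}\nYear: {self.year}\n"
            f"Holdings: {self.holdings} libraries\nTier: {self.tier}\n"
            f"Thesis: {self.thesis}\nGov Doc: {self.gov_doc}\n"
            "Think through: holdings, language, document type, age, and rarity tier."
        )
```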
### Key Improvements Over Base Model
- ✅ Foreign Language Detection: Correctly identifies non-English titles and adjusts rarity assessment
- ✅ Nuanced Classifications: Avoids automatic HIGH_INTEREST for zero-holding foreign-language books
- ✅ Explainable AI: Provides a reasoning chain for every decision
- ✅ Consistent Output: Lower variance in reward scores (0.99 → 0.53)
## 🔧 Training Configuration
- LoRA r: 16
- LoRA alpha: 16
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning rate: 5e-6
- Batch size: 4 (effective 16 with gradient accumulation)
- GRPO beta: 0.1
- Training steps: 360
- Quantization: 4-bit with Unsloth optimizations
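A configuration along these lines can be expressed with Unsloth and TRL. This is a minimal sketch under current Unsloth/TRL APIs, with the dataset and reward function stubbed out as placeholders; it is not the exact script used to train this model:

```python
from unsloth import FastLanguageModel  # import unsloth before trl
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model in 4-bit with Unsloth optimizations.
model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    load_in_4bit=True,
)

# Attach LoRA adapters matching the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def reward_fn(completions, **kwargs):
    # Placeholder: the real reward is not published (see the sketch
    # under Training Objective for one plausible shape).
    return [1.0 if "HIGH_INTEREST" in c or "ELIMINATE" in c else 0.0
            for c in completions]

# Placeholder dataset: the actual run used 1,602 labeled book examples.
train_dataset = Dataset.from_list([{"prompt": "Analyze this book for rarity..."}])

args = GRPOConfig(
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 16
    beta=0.1,                       # KL penalty coefficient
    max_steps=360,
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    reward_funcs=reward_fn,
    processing_class=tokenizer,
)
trainer.train()
```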
## 📚 Use Cases
- Book marketplace valuation
- Library collection assessment
- Rare book identification
- Automated book triage for resellers
- Detection of common vs. rare editions
## ⚠️ Limitations
- Trained primarily on English-language library data
- Best for books with WorldCat holdings data
- May need adjustment for specialized collections (art books, music scores, etc.)
- Generation was capped at 256 new tokens during training, so very long analyses may be truncated
## 📄 License
Apache 2.0 (inherits from Qwen 2.5 base model)
## 🙏 Acknowledgments
- Built with Unsloth for optimized training
- Uses TRL for GRPO implementation
- Based on Qwen 2.5 by Alibaba Cloud
## 📧 Contact
For questions or issues, please open an issue on the model repository.