# Device Ranking System

## Overview

The ranking system uses a multi-dimensional score to evaluate and compare device performance across different aspects of LLM (GGUF) model runs.

## Scoring Algorithm

### Standard Benchmark Conditions

```python
PP_CONFIG = 512  # Standard prompt processing token count
TG_CONFIG = 128  # Standard token generation count

# Component Weights
TG_WEIGHT = 0.6  # Token generation weight (60%)
PP_WEIGHT = 0.4  # Prompt processing weight (40%)
```

- Prompt processing (PP) is given 40% weight because it is a one-time cost per prompt
- Token generation (TG) is given the higher 60% weight because it represents ongoing, per-token performance

### Quantization Quality Factors

```python
QUANT_TIERS = {
    "F16": 1.0,
    "F32": 1.0,
    "Q8": 0.8,
    "Q6": 0.6,
    "Q5": 0.5,
    "Q4": 0.4,
    "Q3": 0.3,
    "Q2": 0.2,
    "Q1": 0.1,
}
```

- Linear scale from 0.1 to 1.0 based on quantization level
- F16 and F32 are both treated as 1.0, which skews the results a bit in favor of unquantized models

### Performance Score Formula

The final performance score is calculated in three steps (an end-to-end sketch appears at the end of this document):

1. **Base Performance**:

   ```
   base_score = (TG_speed * TG_WEIGHT + PP_speed * PP_WEIGHT)
   ```

2. **Size and Quantization Adjustment**:

   ```
   # Direct multiplication by model size (in billions of parameters)
   performance_score = base_score * model_size * quant_factor
   ```

   Model size acts as a linear multiplier, so larger models score proportionally higher at the same speeds.

3. **Normalization**:

   ```
   normalized_score = (performance_score / max_performance_score) * 100
   ```

   Scores are scaled so the best-performing entry maps to 100.

### Filtering

Only benchmarks matching the standard conditions are considered:

- PP_CONFIG (512) tokens for prompt processing
- TG_CONFIG (128) tokens for token generation

## Data Aggregation Strategy

### Primary Grouping

- Groups data by `Normalized Device ID` and `Platform`
- Uses normalized device IDs to ensure consistent device identification across different submissions

```python
def normalize_device_id(device_info: dict) -> str:
    # iOS devices are keyed by platform and model only
    if device_info["systemName"].lower() == "ios":
        return f"iOS/{device_info['model']}"
    # Other devices include brand, model, and a rounded-down memory tier,
    # since the same model may ship with different RAM configurations
    memory_tier = f"{device_info['totalMemory'] // (1024**3)}GB"
    return f"{device_info['brand']}/{device_info['model']}/{memory_tier}"
```
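
For example, with illustrative device payloads (the field values below are made up for demonstration), the normalizer produces IDs like these:

```python
ios_device = {
    "systemName": "iOS",
    "brand": "Apple",
    "model": "iPhone 15 Pro",
    "totalMemory": 8 * 1024**3,
}
android_device = {
    "systemName": "Android",
    "brand": "Google",
    "model": "Pixel 8 Pro",
    "totalMemory": 12 * 1024**3,
}

print(normalize_device_id(ios_device))      # -> "iOS/iPhone 15 Pro"
print(normalize_device_id(android_device))  # -> "Google/Pixel 8 Pro/12GB"
```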
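
To tie the scoring steps together, here is a minimal sketch of the full pipeline. It assumes the constants defined above are in scope; the record field names (`pp_config`, `tg_config`, `tg_speed`, `pp_speed`, `model_size_b`, `quant`) and the prefix-based matching of quantization labels are assumptions made for illustration, not the actual data model.

```python
from typing import Optional


def quant_factor(quant_label: str) -> float:
    """Map a quantization label (e.g. 'Q4_K_M') to its quality tier factor."""
    for tier, factor in QUANT_TIERS.items():
        if quant_label.upper().startswith(tier):
            return factor
    return 0.1  # unknown label: assume the lowest tier


def performance_score(record: dict) -> Optional[float]:
    """Raw (un-normalized) score for one benchmark record,
    or None if it does not match the standard conditions."""
    # Filtering: only standard-condition runs are ranked
    if record["pp_config"] != PP_CONFIG or record["tg_config"] != TG_CONFIG:
        return None
    # Base performance: weighted combination of TG and PP speeds
    base_score = record["tg_speed"] * TG_WEIGHT + record["pp_speed"] * PP_WEIGHT
    # Size and quantization adjustment
    return base_score * record["model_size_b"] * quant_factor(record["quant"])


def normalized_scores(records: list[dict]) -> list[float]:
    """Scale raw scores so the best record in the set maps to 100."""
    raw = [s for s in (performance_score(r) for r in records) if s is not None]
    if not raw:
        return []
    max_score = max(raw)
    return [s / max_score * 100 for s in raw]
```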