Device Ranking System

Overview

The ranking system scores and compares devices on several aspects of running LLMs in GGUF format: prompt processing speed, token generation speed, model size, and quantization level.

Scoring Algorithm

Standard Benchmark Conditions

PP_CONFIG = 512  # Standard prompt processing token count
TG_CONFIG = 128  # Standard token generation count

# Component Weights
TG_WEIGHT = 0.6  # Token generation weight (60%) 
PP_WEIGHT = 0.4  # Prompt processing weight (40%)
  • PP is weighted at 40% because it is a one-time cost per prompt
  • TG is weighted higher, at 60%, because it represents ongoing performance throughout generation

Quantization Quality Factors

QUANT_TIERS = {
    "F16": 1.0,
    "F32": 1.0,
    "Q8": 0.8,
    "Q6": 0.6,
    "Q5": 0.5,
    "Q4": 0.4,
    "Q3": 0.3,
    "Q2": 0.2, 
    "Q1": 0.1, 
}
  • Factors scale linearly with quantization bit width, from 0.1 (Q1) to 0.8 (Q8)
  • F16 and F32 are both capped at 1.0, which skews the results slightly in favor of quantized models
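
The tier table can be applied to a model's quantization label with a small lookup. This is a minimal sketch, assuming labels follow common GGUF naming such as `Q4_K_M` or `Q8_0` and are matched by prefix. `get_quant_factor` and `DEFAULT_FACTOR` are hypothetical names introduced for illustration, not part of the leaderboard code, and the fallback value is an assumption:

```python
QUANT_TIERS = {
    "F16": 1.0, "F32": 1.0, "Q8": 0.8, "Q6": 0.6, "Q5": 0.5,
    "Q4": 0.4, "Q3": 0.3, "Q2": 0.2, "Q1": 0.1,
}

# Hypothetical fallback for labels that match no tier (an assumption).
DEFAULT_FACTOR = 0.1


def get_quant_factor(quant_label: str) -> float:
    """Return the quality factor for a quantization label such as 'Q4_K_M'."""
    label = quant_label.upper()
    # Prefix match: 'Q4_K_M' starts with 'Q4', 'Q8_0' starts with 'Q8'.
    for prefix, factor in QUANT_TIERS.items():
        if label.startswith(prefix):
            return factor
    return DEFAULT_FACTOR
```

Prefix matching keeps the sketch short, but labels that do not begin with one of the tier keys fall through to the default.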

Performance Score Formula

The final performance score is calculated as follows:

  1. Base Performance:

    base_score = (TG_speed * TG_WEIGHT + PP_speed * PP_WEIGHT)
    
  2. Size and Quantization Adjustment:

    # Direct multiplication by model size (in billions)
    performance_score = base_score * model_size * quant_factor
    
    • The score scales linearly with model size (in billions of parameters)
  3. Normalization:

    normalized_score = (performance_score / max_performance_score) * 100
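
The three steps above can be sketched end to end as follows. The function names and the sample runs are made up for illustration, speeds are assumed to be in tokens per second, and the maximum is assumed to be taken over whatever set of scores is being ranked together:

```python
TG_WEIGHT = 0.6  # Token generation weight
PP_WEIGHT = 0.4  # Prompt processing weight


def performance_score(tg_speed, pp_speed, model_size, quant_factor):
    """Steps 1 and 2: weighted base score, scaled by model size and quant factor."""
    base_score = tg_speed * TG_WEIGHT + pp_speed * PP_WEIGHT
    return base_score * model_size * quant_factor


def normalize_scores(scores):
    """Step 3: rescale so the best score in the list becomes 100."""
    max_score = max(scores, default=0.0)
    if max_score <= 0:
        # Guard against division by zero when there is nothing to rank.
        return [0.0 for _ in scores]
    return [score / max_score * 100 for score in scores]


# Made-up runs: (tg_speed, pp_speed, model_size_in_billions, quant_factor)
runs = [
    (20.0, 100.0, 1.0, 0.4),  # faster run of a 1B model at Q4
    (10.0, 50.0, 3.0, 0.8),   # slower run of a 3B model at Q8
]
raw = [performance_score(*run) for run in runs]
normalized = normalize_scores(raw)
```

With these inputs the raw scores come out at roughly 20.8 and 62.4, so the slower run of the larger, less aggressively quantized model ranks first after normalization.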
    

Filtering

  • Only benchmarks matching standard conditions are considered:
    • PP_CONFIG (512) tokens for prompt processing
    • TG_CONFIG (128) tokens for token generation
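
A minimal sketch of this filter, assuming each benchmark record is a dict whose token counts are stored under the hypothetical keys `pp` and `tg` (the real field names are not given here):

```python
PP_CONFIG = 512  # Standard prompt processing token count
TG_CONFIG = 128  # Standard token generation count


def matches_standard_conditions(benchmark: dict) -> bool:
    """True only if both token counts equal the standard configuration."""
    # "pp" and "tg" are hypothetical field names used for illustration.
    return benchmark.get("pp") == PP_CONFIG and benchmark.get("tg") == TG_CONFIG


# Made-up records for illustration.
benchmarks = [
    {"device": "A", "pp": 512, "tg": 128},
    {"device": "B", "pp": 1024, "tg": 128},
    {"device": "C", "pp": 512, "tg": 256},
]
standard = [b for b in benchmarks if matches_standard_conditions(b)]
```

Only device A survives the filter, since B and C each deviate from one of the two standard token counts.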

Data Aggregation Strategy

Primary Grouping

  • Groups data by Normalized Device ID and Platform
  • Uses normalized device IDs to ensure consistent device identification across different submissions
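
def normalize_device_id(device_info: dict) -> str:
    """Build a stable device ID so submissions from the same device group together."""
    # iOS devices are identified by model name alone
    if device_info["systemName"].lower() == "ios":
        return f"iOS/{device_info['model']}"

    # Other platforms also include the brand and a memory tier
    # (totalMemory is divided by 1024**3, i.e. bytes floored to whole GiB)
    memory_tier = f"{device_info['totalMemory'] // (1024**3)}GB"
    return f"{device_info['brand']}/{device_info['model']}/{memory_tier}"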
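
The grouping step can be illustrated with a few made-up submissions. This sketch repeats `normalize_device_id` so that it runs on its own, and assumes the platform is the lowercased `systemName`; the device values are illustrative only:

```python
from collections import defaultdict


def normalize_device_id(device_info: dict) -> str:
    if device_info["systemName"].lower() == "ios":
        return f"iOS/{device_info['model']}"

    memory_tier = f"{device_info['totalMemory'] // (1024**3)}GB"
    return f"{device_info['brand']}/{device_info['model']}/{memory_tier}"


# Made-up submissions; totalMemory is given as an integer number of bytes.
submissions = [
    {"systemName": "iOS", "brand": "Apple", "model": "iPhone 15", "totalMemory": 6 * 1024**3},
    {"systemName": "ios", "brand": "Apple", "model": "iPhone 15", "totalMemory": 6 * 1024**3},
    {"systemName": "Android", "brand": "google", "model": "Pixel 8", "totalMemory": 8 * 1024**3},
]

groups = defaultdict(list)
for info in submissions:
    # Assumption: the platform half of the key is the lowercased systemName.
    key = (normalize_device_id(info), info["systemName"].lower())
    groups[key].append(info)
```

The two iOS submissions collapse into one group despite the different casing of `systemName`. Because the memory tier uses floor division, a device reporting slightly less than a whole number of GiB lands in the lower tier (7.5 GiB maps to `7GB`).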