# Device Ranking System

## Overview
The ranking system uses a multi-dimensional approach to evaluate and compare device performance across different aspects of LLM (GGUF) model runs.

## Scoring Algorithm

### Standard Benchmark Conditions
```python
PP_CONFIG = 512  # Standard prompt processing token count
TG_CONFIG = 128  # Standard token generation count

# Component Weights
TG_WEIGHT = 0.6  # Token generation weight (60%) 
PP_WEIGHT = 0.4  # Prompt processing weight (40%)
```
- PP is weighted 40% because prompt processing is a one-time cost per prompt
- TG is weighted 60% because token generation represents ongoing performance

### Quantization Quality Factors
```python
QUANT_TIERS = {
    "F16": 1.0,
    "F32": 1.0,
    "Q8": 0.8,
    "Q6": 0.6,
    "Q5": 0.5,
    "Q4": 0.4,
    "Q3": 0.3,
    "Q2": 0.2, 
    "Q1": 0.1, 
}
```

- Linear scale from 0.1 to 1.0 based on quantization level
- F16/F32 both map to 1.0, which slightly biases the rankings toward higher-precision formats
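In practice, GGUF quantization labels carry suffixes such as `Q4_K_M` or `Q8_0`. A minimal sketch of resolving a label to its tier is shown below; the prefix matching and the fallback for unknown labels are assumptions for illustration, not the project's documented behavior (the tier table is repeated so the snippet is self-contained):

```python
# Quality factors per quantization tier (from the table above).
QUANT_TIERS = {
    "F16": 1.0, "F32": 1.0,
    "Q8": 0.8, "Q6": 0.6, "Q5": 0.5, "Q4": 0.4,
    "Q3": 0.3, "Q2": 0.2, "Q1": 0.1,
}

def quant_factor(quant_label: str) -> float:
    """Return the quality factor for a quantization label.

    Matches by tier prefix, so "Q4_K_M" and "Q4_0" both resolve to the
    "Q4" tier (an assumed convention). Unknown labels fall back to 1.0.
    """
    label = quant_label.upper()
    for tier, factor in QUANT_TIERS.items():
        if label.startswith(tier):
            return factor
    return 1.0  # assumed default for unrecognized labels
```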


### Performance Score Formula
The final performance score is calculated as follows:

1. **Base Performance**:
   ```
   base_score = (TG_speed * TG_WEIGHT + PP_speed * PP_WEIGHT)
   ```

2. **Size and Quantization Adjustment**:
   ```
   # Direct multiplication by model size (in billions)
   performance_score = base_score * model_size * quant_factor
   ```
   - Score scales linearly with model size (in billions of parameters), so a larger model at the same speed scores proportionally higher

3. **Normalization**:
   ```
   normalized_score = (performance_score / max_performance_score) * 100
   ```
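Putting the three steps together, a minimal sketch follows; the function and parameter names are illustrative, not the project's actual API:

```python
def performance_score(tg_speed: float, pp_speed: float,
                      model_size_b: float, quant_factor: float) -> float:
    """Raw score: weighted speeds scaled by model size and quant quality.

    Speeds are in tokens/second; model_size_b is the parameter count in
    billions. Names are assumptions for illustration.
    """
    TG_WEIGHT, PP_WEIGHT = 0.6, 0.4
    base_score = tg_speed * TG_WEIGHT + pp_speed * PP_WEIGHT
    return base_score * model_size_b * quant_factor

def normalize_scores(raw_scores: list[float]) -> list[float]:
    """Scale raw scores so the best device maps to 100."""
    top = max(raw_scores)
    return [s / top * 100 for s in raw_scores]
```

For example, a 7B Q4 model generating 10 t/s and processing 20 t/s yields a base score of 14, and a raw score of 14 × 7 × 0.4 = 39.2 before normalization.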

### Filtering
- Only benchmarks matching standard conditions are considered:
  - PP_CONFIG (512) tokens for prompt processing
  - TG_CONFIG (128) tokens for token generation
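The filter reduces to a simple predicate over each benchmark record; the field names (`pp_tokens`, `tg_tokens`) are assumptions about the record layout, not the project's actual schema:

```python
PP_CONFIG = 512  # standard prompt processing token count
TG_CONFIG = 128  # standard token generation count

def is_standard_run(benchmark: dict) -> bool:
    """Keep only runs that used the standard PP/TG configuration."""
    return (benchmark.get("pp_tokens") == PP_CONFIG
            and benchmark.get("tg_tokens") == TG_CONFIG)
```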

## Data Aggregation Strategy

### Primary Grouping
- Groups data by `Normalized Device ID` and `Platform`
- Uses normalized device IDs to ensure consistent device identification across different submissions

```python
def normalize_device_id(device_info: dict) -> str:
    # iOS devices are uniquely identified by their model name alone
    if device_info["systemName"].lower() == "ios":
        return f"iOS/{device_info['model']}"

    # Other platforms: bucket by brand, model, and total memory in GiB,
    # since the same model may ship with different memory configurations
    memory_tier = f"{device_info['totalMemory'] // (1024**3)}GB"
    return f"{device_info['brand']}/{device_info['model']}/{memory_tier}"
```
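The grouping itself can then be a straightforward bucketing by the (device ID, platform) pair. A sketch follows; the row field names (`device_id`, `platform`) are assumptions about the aggregated data layout:

```python
from collections import defaultdict

def group_benchmarks(rows: list[dict]) -> dict:
    """Group benchmark rows by (normalized device ID, platform)."""
    groups: dict[tuple, list] = defaultdict(list)
    for row in rows:
        # device_id is assumed to be precomputed via normalize_device_id
        groups[(row["device_id"], row["platform"])].append(row)
    return dict(groups)
```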