bradduy committed aa63a75 (verified · 1 parent: d91a75b)

Upload blur-detector checkpoints (exp29, exp45, exp42) + model card

README.md ADDED

---
license: mit
library_name: pytorch
tags:
- image-classification
- blur-detection
- image-quality
- mobilenet
- efficientnet
- magika
pipeline_tag: image-classification
datasets:
- GoPro-Large
metrics:
- f1
- accuracy
- roc_auc
---

# Magika Blur Detector

A **Magika-inspired lightweight blur detector**: a fast image-quality gate that classifies images as `sharp` or `blurred` and routes low-confidence predictions to `uncertain`. It is trained on the GoPro Large dataset (paired sharp / motion-blurred frames).

> "Is this photo sharp enough to OCR, archive, or show the user?" The model answers in a few milliseconds on CPU, and each checkpoint is under 20 MB.

## Results (GoPro Large official test split)

| Rank | Checkpoint | Backbone | Resolution (px) | F1 | Accuracy | Precision | Recall | AUC |
|---|---|---|---:|---:|---:|---:|---:|---:|
| 1 | `mobilenet_v3_large_384_gamma` + multi-scale TTA | MobileNetV3-Large | 320/384/448 | **0.9745** | 0.9748 | 0.9880 | 0.9613 | 0.9979 |
| 2 | `mobilenet_v3_large_384_gamma` (single-scale) | MobileNetV3-Large | 384 | 0.9722 | - | - | - | - |
| 3 | `efficientnet_b0_384` | EfficientNet-B0 | 384 | ~0.965 | - | - | - | - |
| 4 | `mobilenet_v3_large_384_wd` | MobileNetV3-Large | 384 | 0.9597 | - | - | - | - |
| - | Baseline (MNV3-Small 128px) | MobileNetV3-Small | 128 | 0.8188 | 0.8218 | 0.8326 | 0.8056 | 0.9061 |

A `-` means the metric was not reported.

**46 experiments across 8 sweeps** were run to reach these results. See the [project repository](https://github.com/bradduy/MagikaDocumentFromPixel) for the full autoresearch log.
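
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal sketch below recomputes it for the two rows that report both columns. The helper name `f1_from_pr` is illustrative. Small differences are expected because the inputs are rounded to four decimals.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
rows = {
    "champion + multi-scale TTA": (0.9880, 0.9613, 0.9745),
    "baseline MNV3-Small 128px": (0.8326, 0.8056, 0.8188),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_from_pr(p, r):.4f}, reported = {reported:.4f}")
```

Both recomputed values agree with the reported F1 to within 1e-4.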

## Files

| Path | Description |
|---|---|
| `mobilenet_v3_large_384_gamma/best.pt` | **Recommended.** Champion checkpoint: a PyTorch `state_dict` for torchvision `mobilenet_v3_large` with the final classifier layer replaced by a 2-class `Linear`. |
| `mobilenet_v3_large_384_wd/best.pt` | Regularized sibling (weight decay 5e-3). Pairs well with the champion in an ensemble. |
| `efficientnet_b0_384/best.pt` | EfficientNet-B0 alternative. |

All checkpoints are raw `state_dict`s (no optimizer or scheduler state). Load them into a torchvision backbone whose final classifier layer has been swapped for `Linear(in_features, 2)`:
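
```python
import torch
import torchvision.models as tvm
from huggingface_hub import hf_hub_download

# Download the champion checkpoint (cached locally after the first call).
ckpt = hf_hub_download(
    repo_id="bradduy/magika-blur-detector",
    filename="mobilenet_v3_large_384_gamma/best.pt",
)

# Build the backbone without pretrained weights, then swap the last
# classifier layer for a 2-class head (0 = sharp, 1 = blurred).
model = tvm.mobilenet_v3_large(weights=None)
in_features = model.classifier[-1].in_features
model.classifier[-1] = torch.nn.Linear(in_features, 2)

# The files are plain state_dicts, so weights_only=True is sufficient.
model.load_state_dict(torch.load(ckpt, map_location="cpu", weights_only=True))
model.eval()
```

For `efficientnet_b0_384`, build `tvm.efficientnet_b0(weights=None)` instead. Its last classifier layer is also a `Linear`, so the same head swap applies.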

## Inference

Classes: `0 = sharp`, `1 = blurred`. The project applies a confidence threshold (default `0.60`) and routes low-confidence predictions to an `uncertain` bucket for human review.
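
The three-way rule described above can also be written as a small, framework-free helper with a configurable threshold. This is a sketch, not the project's code. It assumes the symmetric band used in the inference snippet that follows: blurred if p(blur) ≥ t, sharp if p(blur) ≤ 1 − t, otherwise uncertain. The name `route` is hypothetical.

```python
def route(blur_prob: float, threshold: float = 0.60) -> str:
    """Map p(blurred) to 'blurred', 'sharp', or 'uncertain'.

    Hypothetical helper: assumes a symmetric band, so the default
    threshold of 0.60 sends 0.40 < p < 0.60 to human review.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.5, 1.0]")
    if blur_prob >= threshold:
        return "blurred"
    if blur_prob <= 1.0 - threshold:
        return "sharp"
    return "uncertain"


print([route(p) for p in (0.05, 0.45, 0.92)])  # ['sharp', 'uncertain', 'blurred']
```

Raising `threshold` (for example to 0.80) widens the uncertain band to 0.20 < p < 0.80, so more images go to review.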

```python
from PIL import Image
import torch
from torchvision import transforms

# ImageNet mean/std normalization, as used by torchvision backbones.
preprocess = transforms.Compose([
    transforms.Resize(384),      # shorter edge -> 384 px
    transforms.CenterCrop(384),  # central 384x384 square
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)  # add batch dimension: (1, 3, 384, 384)

# `model` is the network loaded in the previous snippet.
with torch.inference_mode():
    logits = model(x)
    probs = logits.softmax(dim=-1)[0]

blur_prob = probs[1].item()
if blur_prob >= 0.60:
    label = "blurred"
elif blur_prob <= 0.40:
    label = "sharp"
else:
    label = "uncertain"
print(label, f"(p(blur)={blur_prob:.3f})")
```
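
torchvision documents that an integer `size` in `Resize` matches the image's shorter edge. `CenterCrop(384)` then keeps only the central square, so on wide frames the left and right edges never reach the model. The stdlib sketch below reproduces that geometry. It is written to follow torchvision's behaviour for an int `size`, and it ignores `max_size`. The helper name is hypothetical.

```python
def resize_then_center_crop(width: int, height: int, size: int):
    """Return the resized (w, h) and the crop box (left, top, right, bottom).

    Follows transforms.Resize(size) then transforms.CenterCrop(size) for an
    int `size`: the shorter edge becomes `size`, the longer edge is scaled
    proportionally (truncated to int), then a size x size square is taken
    from the middle.
    """
    if width <= height:
        new_w, new_h = size, int(size * height / width)
    else:
        new_w, new_h = int(size * width / height), size
    left = int(round((new_w - size) / 2.0))
    top = int(round((new_h - size) / 2.0))
    return (new_w, new_h), (left, top, left + size, top + size)


# Example: a 1280x720 (16:9) frame.
resized, box = resize_then_center_crop(1280, 720, 384)
kept = 384 / resized[0]
print(resized, box, f"{kept:.0%} of the width kept")  # (682, 384) (149, 0, 533, 384) 56% of the width kept
```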

### Multi-scale TTA (+0.0023 F1, no retraining)

The champion checkpoint reaches F1 0.9745 (vs 0.9722 single-scale) by averaging softmax probabilities across input resolutions of 320, 384, and 448 px:

```python
# Reuses `model`, `img`, `torch`, and `transforms` from the snippets above.
scales = [320, 384, 448]
probs_sum = 0
with torch.inference_mode():
    for s in scales:
        t = transforms.Compose([
            transforms.Resize(s),
            transforms.CenterCrop(s),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])
        # Accumulate per-scale class probabilities, shape (1, 2).
        probs_sum = probs_sum + model(t(img).unsqueeze(0)).softmax(dim=-1)
probs = probs_sum / len(scales)
blur_prob = probs[0, 1].item()  # then apply the same 0.60 / 0.40 thresholds
```
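
The TTA snippet averages softmax probabilities rather than raw logits. The framework-free sketch below isolates just that combining step. It assumes one `[sharp, blurred]` logit pair per scale. The helper names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tta_average(per_scale_logits):
    """Mean of per-scale class probabilities (one logit list per scale)."""
    per_scale_probs = [softmax(logits) for logits in per_scale_logits]
    n_scales = len(per_scale_probs)
    n_classes = len(per_scale_probs[0])
    return [sum(p[c] for p in per_scale_probs) / n_scales for c in range(n_classes)]


# Hypothetical [sharp, blurred] logits from the 320, 384 and 448 px passes.
avg = tta_average([[0.2, 1.1], [-0.3, 2.0], [0.5, 0.4]])
print(f"p(blur) = {avg[1]:.3f}")
```

Each scale contributes a probability in [0, 1]. With three scales, no single pass can move the averaged probability by more than 1/3. Averaging raw logits would instead let one extreme logit dominate.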

## Training

- **Dataset**: GoPro Large, paired sharp / motion-blurred frames (Strategy A labeling).
- **Augmentation**: random crop, horizontal flip, mild brightness/contrast jitter (±0.2).
- **blur_gamma**: the champion additionally trains on GoPro Large's "gamma" variant, which ships a second set of synthetically blurred frames rendered from the same sharp sources and so effectively doubles the blurred class. It adds a consistent ~+0.01 F1.
- **Optimizer**: AdamW, lr=1e-4, weight_decay=1e-4 (5e-3 for the `_wd` variant).
- **Schedule**: CosineAnnealingLR over 25 epochs, with early stopping on validation F1.
- **Loss**: cross-entropy (label smoothing and focal loss were also tried; both lowered F1).
- **Hardware**: a single GPU on a GCP VM.
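
The schedule and stopping rule listed above can be sketched without a framework. PyTorch documents the closed form of `CosineAnnealingLR` as η_t = η_min + ½(η_max − η_min)(1 + cos(π·t / T_max)). Early stopping keeps the epoch with the best validation F1. The card does not state the following, so they are assumptions here:

- the learning rate is stepped once per epoch;
- η_min = 0 and T_max = 25;
- patience is 5 epochs (hypothetical).

```python
import math


def cosine_lr(epoch: float, base_lr: float = 1e-4, t_max: int = 25,
              eta_min: float = 0.0) -> float:
    """Closed form of cosine annealing (eta_min and t_max are assumptions)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


class EarlyStopping:
    """Track the best validation F1 and signal a stop after `patience` flat epochs.

    The default patience is hypothetical; the card does not state one.
    """

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best_f1 = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_f1: float) -> bool:
        """Record one epoch's validation F1; return True when training should stop."""
        if val_f1 > self.best_f1:
            self.best_f1, self.best_epoch, self.bad_epochs = val_f1, epoch, 0
            # In a real loop, save the model's state_dict here (e.g. as best.pt).
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


for epoch in (0, 12, 24):
    print(epoch, f"{cosine_lr(epoch):.2e}")
```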

### Key findings from the sweep

1. **Resolution is the biggest lever.** Going from 128 px to 384 px moved F1 from 0.82 to 0.97. MobileNetV3-Large helps *only* at ≥384 px; at 160 to 320 px it performed on par with Small.
2. **blur_gamma extra data** gives a consistent ~+0.01 F1.
3. **Multi-scale TTA is free money**: +0.0023 F1 with no retraining, at the cost of three forward passes per image.
4. **Ensembles and threshold tuning are marginal at this ceiling.** The single-scale champion (0.9722) is already within 0.0023 F1 of the multi-scale TTA result (0.9745).
5. **Training past 25 epochs at 384 px overfits.**

## Intended use

- **Document / receipt capture**: reject blurred shots before OCR.
- **Archival pipelines**: flag low-quality frames for re-capture.
- **Smartphone camera apps**: real-time shutter hints / auto-retake.

This is a binary gate trained specifically on motion blur from the GoPro dataset. It is not designed to distinguish motion blur from defocus blur or low-light noise.

## Citation

```bibtex
@misc{magika-blur-detector,
  author       = {Duy Tran Thanh (Brad Duy)},
  title        = {Magika Blur Detector},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/bradduy/magika-blur-detector}},
}
```

## License

MIT. See the project repository for details.

efficientnet_b0_384/best.pt ADDED (Git LFS pointer)
version https://git-lfs.github.com/spec/v1
oid sha256:1b17779c640c7704b12f1e390469027421bb140857e3b810c6f40a93be8cf331
size 16319717

mobilenet_v3_large_384_gamma/best.pt ADDED (Git LFS pointer)
version https://git-lfs.github.com/spec/v1
oid sha256:fea1314291ac90e12d2c3343f51e0dd4f8b7f7d5b180ab53084aa077b7ef634d
size 17011669

mobilenet_v3_large_384_wd/best.pt ADDED (Git LFS pointer)
version https://git-lfs.github.com/spec/v1
oid sha256:ecedd7c7fd1a9e5e1289e5bfc2f4da5e6790d197466634cfb70a59191121517e
size 17011669
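
Each pointer above records the SHA-256 (`oid`) and byte size of the real checkpoint. The stdlib sketch below checks a downloaded file against its pointer. The function names are hypothetical. The parsing assumes the simple `key value` line layout shown in these v1 pointers.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Extract (sha256 hex digest, size in bytes) from a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path: str, digest: str, size: int) -> bool:
    """True if the file at `path` has the expected size and SHA-256."""
    if os.path.getsize(path) != size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == digest


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fea1314291ac90e12d2c3343f51e0dd4f8b7f7d5b180ab53084aa077b7ef634d
size 17011669"""
digest, size = parse_lfs_pointer(pointer)
print(digest[:12], size)  # fea1314291ac 17011669
# e.g. matches_pointer(ckpt, digest, size) with `ckpt` from hf_hub_download
```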