Upload blur-detector checkpoints (exp29, exp45, exp42) + model card
- README.md +172 -0
- efficientnet_b0_384/best.pt +3 -0
- mobilenet_v3_large_384_gamma/best.pt +3 -0
- mobilenet_v3_large_384_wd/best.pt +3 -0
README.md
ADDED
@@ -0,0 +1,172 @@
---
license: mit
library_name: pytorch
tags:
- image-classification
- blur-detection
- image-quality
- mobilenet
- efficientnet
- magika
pipeline_tag: image-classification
datasets:
- GoPro-Large
metrics:
- f1
- accuracy
- roc_auc
---

# Magika Blur Detector

A **Magika-inspired lightweight blur detector**: a fast image-quality gate that
classifies images as `sharp` or `blurred` and routes low-confidence predictions
to `uncertain`. Trained on the GoPro Large dataset (paired sharp /
motion-blurred frames).

> "Is this photo sharp enough to OCR / archive / show the user?" Answered in
> a few milliseconds on CPU with a < 20 MB model.

## Results (GoPro Large official test split)

| Rank | Checkpoint | Backbone | Resolution (px) | F1 | Accuracy | Precision | Recall | ROC AUC |
|---|---|---|---:|---:|---:|---:|---:|---:|
| 1 | `mobilenet_v3_large_384_gamma` + multi-scale TTA | MobileNetV3-Large | 320/384/448 | **0.9745** | 0.9748 | 0.9880 | 0.9613 | 0.9979 |
| 2 | `mobilenet_v3_large_384_gamma` (single-scale) | MobileNetV3-Large | 384 | 0.9722 | - | - | - | - |
| 3 | `mobilenet_v3_large_384_wd` | MobileNetV3-Large | 384 | 0.9597 | - | - | - | - |
| 4 | `efficientnet_b0_384` | EfficientNet-B0 | 384 | ~0.965 | - | - | - | - |
| - | Baseline (MNV3-Small 128px) | MobileNetV3-Small | 128 | 0.8188 | 0.8218 | 0.8326 | 0.8056 | 0.9061 |

**46 experiments across 8 sweeps** were run to reach these numbers. See the
[project repository](https://github.com/bradduy/MagikaDocumentFromPixel) for the
full autoresearch log.

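As a quick consistency check on the results table, F1 is the harmonic mean of precision and recall, so it can be recomputed for the two rows that report all metrics. This is only a sketch: the helper name `f1_from_pr` is illustrative and not part of the project, and the inputs are the four-decimal values from the table, so the last digit may differ by rounding.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
rows = {
    "champion + multi-scale TTA": (0.9880, 0.9613, 0.9745),
    "baseline MNV3-Small 128px": (0.8326, 0.8056, 0.8188),
}

for name, (p, r, reported) in rows.items():
    recomputed = f1_from_pr(p, r)
    # Inputs are rounded to four decimals, so compare with a small tolerance.
    print(f"{name}: recomputed={recomputed:.4f} reported={reported:.4f}")
```
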
## Files

| Path | Description |
|---|---|
| `mobilenet_v3_large_384_gamma/best.pt` | **Recommended.** Champion checkpoint. PyTorch `state_dict` for torchvision `mobilenet_v3_large` with the final classifier layer replaced by a 2-class head. |
| `mobilenet_v3_large_384_wd/best.pt` | Regularized sibling (weight decay 5e-3). Diversifies well with the champion for ensembling. |
| `efficientnet_b0_384/best.pt` | EfficientNet-B0 alternative. |

All checkpoints are raw `state_dict`s with no optimizer or scheduler state. Load
them into a torchvision backbone with the final classifier layer swapped to
`Linear(in_features, 2)`:

```python
import torch
import torchvision.models as tvm
from huggingface_hub import hf_hub_download

# Download the recommended checkpoint from the Hub (cached after the first call).
ckpt = hf_hub_download(
    repo_id="bradduy/magika-blur-detector",
    filename="mobilenet_v3_large_384_gamma/best.pt",
)

# Build the architecture without pretrained weights and swap in a 2-class head.
model = tvm.mobilenet_v3_large(weights=None)
in_features = model.classifier[-1].in_features
model.classifier[-1] = torch.nn.Linear(in_features, 2)

# The checkpoint is a plain state_dict, so it loads directly into the model.
model.load_state_dict(torch.load(ckpt, map_location="cpu"))
model.eval()
```

## Inference

Classes: `0 = sharp`, `1 = blurred`. The project applies a confidence threshold
(default `0.60`) and routes predictions whose top-class probability falls below
it to an `uncertain` bucket for human review.

```python
from PIL import Image
import torch
from torchvision import transforms

# `model` is the network loaded in the Files section above.
preprocess = transforms.Compose([
    transforms.Resize(384),
    transforms.CenterCrop(384),
    transforms.ToTensor(),
    # ImageNet channel statistics
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)  # add a batch dimension: (1, 3, 384, 384)

with torch.inference_mode():
    logits = model(x)
    probs = logits.softmax(dim=-1)[0]

blur_prob = probs[1].item()  # index 1 = blurred
if blur_prob >= 0.60:
    label = "blurred"
elif blur_prob <= 0.40:
    label = "sharp"
else:
    label = "uncertain"
print(label, f"(p(blur)={blur_prob:.3f})")
```

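The three-way routing can also be written as a small pure function with the threshold as a parameter. This is a minimal sketch: the function name `route` is hypothetical, and it assumes the rule is symmetric, meaning a label is emitted only when the winning class reaches the threshold, which matches the `0.60` / `0.40` cut-offs used in the inference snippet.

```python
def route(blur_prob: float, threshold: float = 0.60) -> str:
    """Map p(blurred) to 'blurred', 'sharp', or 'uncertain'.

    With two classes, the winning class reaches `threshold` exactly when
    p(blur) >= threshold or p(blur) <= 1 - threshold.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.5, 1.0]")
    if blur_prob >= threshold:
        return "blurred"
    if blur_prob <= 1.0 - threshold:
        return "sharp"
    return "uncertain"


for p in (0.05, 0.50, 0.93):
    print(p, route(p))
```

Raising `threshold` widens the `uncertain` band and sends more images to human review. Lowering it toward `0.5` turns the gate into a plain argmax.
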
### Multi-scale TTA (+0.23 pp F1, no retraining)

The champion checkpoint reaches F1 0.9745 (vs. 0.9722 single-scale) by averaging
softmax probabilities across resolutions 320, 384, and 448. This needs no
retraining, but it runs three forward passes per image:

```python
scales = [320, 384, 448]
probs_sum = 0
for s in scales:
    # Resize and crop to each test-time resolution; normalization is unchanged.
    t = transforms.Compose([
        transforms.Resize(s),
        transforms.CenterCrop(s),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    with torch.inference_mode():
        probs_sum = probs_sum + model(t(img).unsqueeze(0)).softmax(dim=-1)
probs = probs_sum / len(scales)  # shape (1, 2): mean class probabilities
```

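TTA avoids retraining but not inference cost. As a rough estimate, and assuming the compute of a convolutional backbone grows roughly in proportion to the number of input pixels (an approximation, not a measured latency), the three scales can be compared with a single 384 px pass:

```python
scales = [320, 384, 448]
base = 384

# Pixels processed per image at each square resolution.
pixels = {s: s * s for s in scales}
relative_cost = sum(pixels.values()) / (base * base)

print(f"TTA pixel count relative to one {base}px pass: {relative_cost:.2f}x")
```

Under that assumption, multi-scale TTA costs a little over three single-scale passes, which is worth weighing against the +0.23 pp F1 gain in latency-sensitive settings.
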
## Training

- **Dataset**: GoPro Large, paired sharp / motion-blurred frames (Strategy A labeling).
- **Augmentation**: random crop, horizontal flip, mild brightness/contrast jitter (±0.2).
- **blur_gamma**: the champion uses the "gamma" variant of GoPro Large, which ships
  a second set of synthetically blurred frames from the same sharp sources,
  effectively doubling the blurred class. Worth a consistent +1% F1.
- **Optimizer**: AdamW, lr=1e-4, weight_decay=1e-4 (5e-3 for the `_wd` variant).
- **Schedule**: CosineAnnealingLR, 25 epochs, early stopping on val F1.
- **Loss**: cross-entropy (label smoothing and focal loss were tried and hurt).
- **Hardware**: single GPU (GCP VM).

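To make the schedule concrete, the closed form behind cosine annealing can be evaluated directly. This sketch assumes the scheduler is stepped once per epoch, that its period spans the full 25-epoch run (`t_max=25`), and that the minimum learning rate is `0.0` (the PyTorch default). None of these three settings is stated in the list above, and `cosine_lr` is an illustrative name.

```python
import math


def cosine_lr(epoch: int, base_lr: float = 1e-4, t_max: int = 25,
              eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: learning rate at a given epoch."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 5, 10, 15, 20, 24):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.2e}")
```

The learning rate starts at the AdamW value of `1e-4` and decays smoothly toward zero by the last epoch.
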
### Key findings from the sweep

1. **Resolution is the biggest lever.** 128 -> 384 px moved F1 from 0.82 to 0.97.
   MobileNetV3-Large helps *only* at >=384 px; at 160-320 px it was neutral vs. Small.
2. **blur_gamma extra data** gives a consistent +1% F1.
3. **Multi-scale TTA is free money**: +0.23 pp F1 with no retraining.
4. **Ensembles and threshold tuning** are marginal at this ceiling; the
   single-model champion is already less than 1% below the multi-scale TTA result.
5. **Training past 25 epochs at 384 px overfits.**

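The model card does not say how the ensembles were formed. One common choice, shown here purely as an assumption, is to average the per-class probabilities of the individual checkpoints. The helper `average_probs` and the probability values are made up for illustration.

```python
def average_probs(per_model_probs: list[list[float]]) -> list[float]:
    """Average class-probability vectors from several models."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


# Made-up [p(sharp), p(blurred)] outputs for one image from two checkpoints.
champion = [0.30, 0.70]
wd_sibling = [0.45, 0.55]

ensemble = average_probs([champion, wd_sibling])
print(ensemble)
```

The averaged vector can then go through the same confidence threshold as a single-model prediction.
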
## Intended use

- **Document / receipt capture**: reject blurred shots before OCR.
- **Archival pipelines**: flag low-quality frames for re-capture.
- **Smartphone camera apps**: real-time shutter-hint / auto-retake.

Not designed to distinguish motion blur from defocus or low-light noise. It is a
binary gate trained specifically on motion blur from the GoPro dataset.

## Citation

```
@misc{magika-blur-detector,
  author = {Duy Tran Thanh (Brad Duy)},
  title = {Magika Blur Detector},
  year = {2026},
  howpublished = {\url{https://huggingface.co/bradduy/magika-blur-detector}},
}
```

## License

MIT. See the project repository for details.

efficientnet_b0_384/best.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b17779c640c7704b12f1e390469027421bb140857e3b810c6f40a93be8cf331
size 16319717
mobilenet_v3_large_384_gamma/best.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fea1314291ac90e12d2c3343f51e0dd4f8b7f7d5b180ab53084aa077b7ef634d
size 17011669
mobilenet_v3_large_384_wd/best.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ecedd7c7fd1a9e5e1289e5bfc2f4da5e6790d197466634cfb70a59191121517e
size 17011669