Upload blur-detector checkpoints (exp29, exp45, exp42) + model card


Files changed (4)

README.md +172 -0
efficientnet_b0_384/best.pt +3 -0
mobilenet_v3_large_384_gamma/best.pt +3 -0
mobilenet_v3_large_384_wd/best.pt +3 -0

README.md ADDED

	@@ -0,0 +1,172 @@

+---
+license: mit
+library_name: pytorch
+tags:
+  - image-classification
+  - blur-detection
+  - image-quality
+  - mobilenet
+  - efficientnet
+  - magika
+pipeline_tag: image-classification
+datasets:
+  - GoPro-Large
+metrics:
+  - f1
+  - accuracy
+  - roc_auc
+---
+# Magika Blur Detector
+A **Magika-inspired lightweight blur detector**: a fast image-quality gate that
+labels images as `sharp`, `blurred`, or `uncertain` (a binary sharp/blurred
+classifier whose low-confidence predictions are routed to `uncertain`). Trained on
+the GoPro Large dataset (paired sharp / motion-blurred frames).
+> "Is this photo sharp enough to OCR / archive / show the user?" — answered in
+> a few milliseconds on CPU with a < 20 MB model.
+## Results (GoPro Large official test split)
+| Rank | Checkpoint | Backbone | Res (px) | F1 | Accuracy | Precision | Recall | AUC |
+|---|---|---|---:|---:|---:|---:|---:|---:|
+| 1 | `mobilenet_v3_large_384_gamma` + multi-scale TTA | MobileNetV3-Large | 320/384/448 | **0.9745** | 0.9748 | 0.9880 | 0.9613 | 0.9979 |
+| 2 | `mobilenet_v3_large_384_gamma` (single-scale) | MobileNetV3-Large | 384 | 0.9722 | - | - | - | - |
+| 3 | `efficientnet_b0_384` | EfficientNet-B0 | 384 | ~0.965 | - | - | - | - |
+| 4 | `mobilenet_v3_large_384_wd` | MobileNetV3-Large | 384 | 0.9597 | - | - | - | - |
+| - | Baseline (MNV3-Small 128px) | MobileNetV3-Small | 128 | 0.8188 | 0.8218 | 0.8326 | 0.8056 | 0.9061 |
+**46 experiments across 8 sweeps** were run to reach these results. See the
+[project repository](https://github.com/bradduy/MagikaDocumentFromPixel) for the
+full autoresearch log.
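The F1 column is the harmonic mean of the Precision and Recall columns, F1 = 2PR / (P + R). As a quick sanity check on the table, this dependency-free sketch recomputes F1 for the two rows that report both values; the helper name `f1_from_pr` is illustrative, not part of the project.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table
rows = {
    "mobilenet_v3_large_384_gamma + TTA": (0.9880, 0.9613, 0.9745),
    "baseline MNV3-Small 128px": (0.8326, 0.8056, 0.8188),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_from_pr(p, r):.4f} (reported {reported})")
```

The champion row recomputes to 0.9745 as reported; the baseline row gives 0.8189 against the reported 0.8188, a gap within the rounding of the four-decimal precision and recall values.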
+## Files
+| Path | What |
+|---|---|
+| `mobilenet_v3_large_384_gamma/best.pt` | **Recommended.** Champion checkpoint. PyTorch `state_dict` for torchvision `mobilenet_v3_large` with the final classifier re-initialized to 2 classes. |
+| `mobilenet_v3_large_384_wd/best.pt` | Regularized sibling (weight decay 5e-3). Its predictions differ enough from the champion's to make it a useful ensemble partner. |
+| `efficientnet_b0_384/best.pt` | EfficientNet-B0 alternative. |
+All checkpoints are raw `state_dict`s — no optimizer / scheduler state. Load
+them into a torchvision backbone with the final classifier layer swapped to
+`Linear(in_features, 2)`:
+```python
+import torch
+import torchvision.models as tvm
+from huggingface_hub import hf_hub_download
+ckpt = hf_hub_download(
+    repo_id="bradduy/magika-blur-detector",
+    filename="mobilenet_v3_large_384_gamma/best.pt",
+)
+model = tvm.mobilenet_v3_large(weights=None)
+in_features = model.classifier[-1].in_features
+model.classifier[-1] = torch.nn.Linear(in_features, 2)
+model.load_state_dict(torch.load(ckpt, map_location="cpu", weights_only=True))  # plain state_dict, no pickled objects
+model.eval()
+```
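The `_wd` checkpoint is listed as an ensembling companion for the champion (the sweep found ensemble gains marginal, but the option is there). A common way to combine two classifiers is soft voting: average each model's softmax probabilities, then apply the usual threshold. The sketch below shows only that arithmetic in plain Python so it runs without the checkpoints; the function names and example logits are hypothetical. With real models, each logit vector would come from something like `model(x)[0].tolist()` for each loaded checkpoint.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(per_model_logits: Sequence[Sequence[float]]) -> list[float]:
    """Soft voting: average class probabilities across models."""
    probs = [softmax(logit_vec) for logit_vec in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


# Hypothetical [sharp, blurred] logits for one image from the two checkpoints
champion_logits = [0.2, 1.9]
wd_logits = [0.8, 1.1]
p_sharp, p_blur = ensemble_probs([champion_logits, wd_logits])
print(f"ensemble p(blur) = {p_blur:.3f}")
```

Averaging probabilities rather than logits keeps the result on the same 0 to 1 scale the confidence threshold expects.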
+## Inference
+Classes: `0 = sharp`, `1 = blurred`. The project applies a confidence threshold
+(default `0.60`): an image is labeled `blurred` if p(blur) >= 0.60, `sharp` if
+p(sharp) >= 0.60 (i.e. p(blur) <= 0.40), and otherwise routed to an `uncertain`
+bucket for human review.
+```python
+from PIL import Image
+import torch
+from torchvision import transforms
+preprocess = transforms.Compose([
+    transforms.Resize(384),
+    transforms.CenterCrop(384),
+    transforms.ToTensor(),
+    transforms.Normalize(mean=[0.485, 0.456, 0.406],
+                         std=[0.229, 0.224, 0.225]),
+])
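+# `model` is the network loaded in the previous snippet
+img = Image.open("photo.jpg").convert("RGB")
+x = preprocess(img).unsqueeze(0)  # add batch dimension: (1, 3, 384, 384)
+with torch.inference_mode():
+    logits = model(x)
+    probs = logits.softmax(dim=-1)[0]
+THRESHOLD = 0.60  # confidence required for a hard sharp/blurred call
+blur_prob = probs[1].item()
+if blur_prob >= THRESHOLD:
+    label = "blurred"
+elif blur_prob <= 1 - THRESHOLD:  # i.e. p(sharp) >= THRESHOLD
+    label = "sharp"
+else:
+    label = "uncertain"
+print(label, f"(p(blur)={blur_prob:.3f})")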
+```
+### Multi-scale TTA (free +0.0023 F1)
+The champion checkpoint reaches F1 0.9745 (vs. 0.9722 single-scale) by averaging
+softmax probabilities across input resolutions of 320, 384 and 448 px:
+```python
+# Reuses `model`, `img`, `torch` and `transforms` from the snippets above
+scales = [320, 384, 448]
+probs_sum = 0
+with torch.inference_mode():
+    for s in scales:
+        t = transforms.Compose([
+            transforms.Resize(s),
+            transforms.CenterCrop(s),
+            transforms.ToTensor(),
+            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
+        ])
+        probs_sum = probs_sum + model(t(img).unsqueeze(0)).softmax(dim=-1)
+probs = (probs_sum / len(scales))[0]  # mean over scales -> [p(sharp), p(blur)]
+```
+## Training
+- **Dataset**: GoPro Large, paired sharp / motion-blurred frames (Strategy A labeling).
+- **Augmentation**: random crop, horizontal flip, mild brightness/contrast jitter (±0.2).
+- **blur_gamma**: the champion also trains on GoPro Large's `blur_gamma` frames, a
+  second set of synthetically blurred frames rendered from the same sharp sources,
+  which effectively doubles the blurred class. Worth a consistent +1% F1.
+- **Optimizer**: AdamW, lr=1e-4, weight_decay=1e-4 (5e-3 for the `_wd` variant).
+- **Schedule**: CosineAnnealingLR, 25 epochs, early stopping on val F1.
+- **Loss**: cross-entropy (label smoothing and focal loss were tried and hurt).
+- **Hardware**: single GPU (GCP VM).
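A minimal sketch of how the sharp/blurred pairs, plus the extra `blur_gamma` frames, could be turned into labeled samples. It assumes the usual GoPro Large layout of `<split>/<sequence>/{sharp,blur,blur_gamma}/*.png` and maps `sharp` to class 0 and both blurred folders to class 1; the project's actual "Strategy A" labeling may differ, and `collect_samples` is a hypothetical helper.

```python
from pathlib import Path

# Assumed folder-name -> label mapping (0 = sharp, 1 = blurred)
BASE_LABELS = {"sharp": 0, "blur": 1}
WITH_GAMMA_LABELS = {**BASE_LABELS, "blur_gamma": 1}


def collect_samples(split_dir: Path, use_blur_gamma: bool = True) -> list[tuple[Path, int]]:
    """Return (image_path, label) pairs from <split_dir>/<sequence>/<folder>/*.png."""
    labels = WITH_GAMMA_LABELS if use_blur_gamma else BASE_LABELS
    samples: list[tuple[Path, int]] = []
    for seq_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for folder, label in labels.items():
            folder_dir = seq_dir / folder
            if not folder_dir.is_dir():
                continue  # tolerate sequences that lack one of the folders
            for img_path in sorted(folder_dir.glob("*.png")):
                samples.append((img_path, label))
    return samples
```

Usage would look like `collect_samples(Path("GOPRO_Large/train"))` (path hypothetical). If each folder holds the same number of frames, enabling `blur_gamma` yields two blurred samples per sharp one, which is the "doubling" described above.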
+### Key findings from the sweep
+1. **Resolution is the biggest lever.** 128 -> 384 px moved F1 from 0.82 to 0.97.
+   MobileNetV3-Large helps *only* at >=384 px; at 160-320 it was neutral vs Small.
+2. **blur_gamma extra data** gives a consistent +1% F1.
+3. **Multi-scale TTA is free money**: +0.0023 F1 (0.9722 -> 0.9745) with no retraining.
+4. **Ensembles and threshold tuning** are marginal at this ceiling; the
+   single-scale champion (0.9722) is already within 0.0023 F1 of the multi-scale TTA result.
+5. **Training past 25 epochs at 384 px overfits.**
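Finding 4 mentions threshold tuning. One straightforward form of it, sketched below in plain Python, sweeps the decision threshold over held-out p(blur) scores and keeps the value with the best F1 for the blurred class. The function names, threshold grid, and toy scores are illustrative assumptions, not the project's code.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 for the 'blurred' class (label 1) when predicting blurred iff p >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR / (P + R)


def best_threshold(probs, labels, grid=None):
    """Return (threshold, f1) maximizing F1; ties go to the lowest threshold."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05 .. 0.95
    return max(((t, f1_at_threshold(probs, labels, t)) for t in grid),
               key=lambda pair: pair[1])


# Toy validation scores: p(blur) and ground-truth labels (1 = blurred)
val_probs = [0.05, 0.20, 0.35, 0.55, 0.62, 0.80, 0.97, 0.45]
val_labels = [0, 0, 0, 1, 1, 1, 1, 0]
t, f1 = best_threshold(val_probs, val_labels)
print(f"best threshold {t:.2f}, F1 {f1:.3f}")
```

This treats the gate as strictly binary; with the three-way `uncertain` band, the share of images sent to review would also need tracking alongside F1.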
+## Intended use
+- **Document / receipt capture**: reject blurred shots before OCR.
+- **Archival pipelines**: flag low-quality frames for re-capture.
+- **Smartphone camera apps**: real-time shutter-hint / auto-retake.
+It is not designed to tell motion blur apart from defocus blur or low-light noise:
+it is a binary gate trained specifically on motion blur from the GoPro dataset.
+## Citation
+```
+@misc{magika-blur-detector,
+  author = {Duy Tran Thanh (Brad Duy)},
+  title  = {Magika Blur Detector},
+  year   = {2026},
+  howpublished = {\url{https://huggingface.co/bradduy/magika-blur-detector}},
+}
+```
+## License
+MIT. See the project repository for details.

efficientnet_b0_384/best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b17779c640c7704b12f1e390469027421bb140857e3b810c6f40a93be8cf331
+size 16319717

mobilenet_v3_large_384_gamma/best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fea1314291ac90e12d2c3343f51e0dd4f8b7f7d5b180ab53084aa077b7ef634d
+size 17011669

mobilenet_v3_large_384_wd/best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ecedd7c7fd1a9e5e1289e5bfc2f4da5e6790d197466634cfb70a59191121517e
+size 17011669