---
tags:
- pytorch
- safetensors
- pose-estimation
- head-pose
- landmark-to-pose
- distillation
- py-feat
library_name: py-feat
pipeline_tag: image-feature-extraction
license: mit
---
# Py-Feat Pose-MLP v2: Landmark-to-6DoF Head Pose
A small distilled MLP that takes 68 face landmarks (the dlib-68 / OpenFace
layout produced by `mobilefacenet`, OpenFace, etc.) and emits 6DoF head
pose calibrated to img2pose's coordinate frame. Designed for `py-feat`
pipelines that use a face detector without a built-in pose head (e.g.
RetinaFace in `py-feat ≥ 0.7`).
## Model Description
`py-feat`'s v0.6 production pipeline used `img2pose` as its face detector,
which multi-tasks face localization with 6DoF head pose regression, so
pose came "for free" from the detector. In v0.7 the default face detector
became `RetinaFace` (much higher WIDERFACE Hard AP) which only detects
faces. To preserve the `Fex` schema (`pitch`, `roll`, `yaw`, `x`, `y`,
`z` columns), `py-feat` distills img2pose's pose regression into a small
MLP that operates entirely on already-computed landmarks.
The MLP is bbox-free: it normalizes incoming landmarks by their centroid
and inter-eye distance, so the same model works regardless of whether
the upstream detector produced loose (img2pose) or tight (RetinaFace)
face crops.
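The normalization described above can be sketched as follows. This is an illustrative version, not the body of `feat.utils.face_pose_mlp.normalize_landmarks`: the function name `normalize_landmarks_sketch` is hypothetical, and the eye index ranges (36-41 and 42-47, 0-indexed) are an assumption based on the common 68-point convention.

```python
import torch

# Assumed 0-indexed eye ranges in the 68-point layout (not confirmed by this card).
RIGHT_EYE = slice(36, 42)
LEFT_EYE = slice(42, 48)


def normalize_landmarks_sketch(landmarks: torch.Tensor) -> torch.Tensor:
    """Center [N, 68, 2] landmarks on their centroid and scale by inter-eye distance."""
    centroid = landmarks.mean(dim=1, keepdim=True)           # [N, 1, 2]
    centered = landmarks - centroid
    right = landmarks[:, RIGHT_EYE].mean(dim=1)              # [N, 2] eye center
    left = landmarks[:, LEFT_EYE].mean(dim=1)                # [N, 2] eye center
    inter_eye = (left - right).norm(dim=-1).clamp_min(1e-6)  # [N], guarded against zero
    return centered / inter_eye[:, None, None]
```

Because both the centroid and the inter-eye distance are computed from the landmarks themselves, shifting or uniformly rescaling the input (as a looser or tighter crop would) leaves the output unchanged.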
## Model Details
- **Model type**: Multi-layer perceptron (MLP)
- **Architecture**: `Linear(136→512) → LayerNorm → GELU → Dropout(0.15)
→ Linear(512→256) → LayerNorm → GELU → Dropout → Linear(256→128) →
LayerNorm → GELU → Dropout → Linear(128→6)`
- **Parameter count**: 236,934 (~0.9 MB safetensors)
- **Input**: 68 2D landmarks, normalized by landmark centroid and
inter-eye distance (`feat.utils.face_pose_mlp.normalize_landmarks`).
- **Output**: 6 values, `[Pitch, Roll, Yaw, X, Y, Z]`. The MLP emits
z-scored values; the loader de-normalizes using `mean`/`std` stored in
the sidecar `pose_mlp_v2.json`. Angles are radians, calibrated to
img2pose's coordinate frame.
- **Framework**: PyTorch (safetensors weight file, no pickle).
- **Inference cost**: ~10 µs / face on CPU (batched), negligible vs.
the upstream face/landmark stages.
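As a rough check on the architecture and parameter count listed above, the layer stack can be rebuilt in PyTorch. This is a sketch under assumptions: `build_pose_mlp` is a hypothetical name, the later `Dropout` layers are assumed to share the 0.15 rate, and the module naming is not guaranteed to match the keys in the published checkpoint, so the weights may not load into it as-is.

```python
import torch
from torch import nn


def build_pose_mlp(dropout: float = 0.15) -> nn.Sequential:
    """Rebuild the layer stack from the card: 136 -> 512 -> 256 -> 128 -> 6."""
    dims = [136, 512, 256, 128]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.LayerNorm(d_out), nn.GELU(), nn.Dropout(dropout)]
    layers.append(nn.Linear(dims[-1], 6))  # z-scored [Pitch, Roll, Yaw, X, Y, Z]
    return nn.Sequential(*layers)


model = build_pose_mlp().eval()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 236934, the same count the card lists

with torch.no_grad():
    z = model(torch.zeros(1, 136))  # 68 normalized (x, y) landmarks, flattened
print(z.shape)
```

At 4 bytes per float32 parameter, 236,934 parameters come to roughly 0.95 MB, consistent with the stated file size.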
## Training Details
- **Teacher**: `img2pose` (Albiero et al., 2021). The MLP is trained to
match img2pose's regressed `[Pitch, Roll, Yaw, X, Y, Z]` outputs.
- **Training corpus**: CelebV-HQ, with `n_clips = 35,445`,
`n_train_frames = 2,783,134`, `n_val_frames = 154,619`. Frames with
`FaceScore < 0.8` or `|pose| > 75°` are dropped (filters bad teacher
signal on degenerate poses).
- **Loss**: MSE on z-scored 6D output.
- **Optimizer**: Adam, `lr=1e-3`, `batch_size=1024`.
- **Epochs**: 40 (best val loss at last epoch; see `pose_mlp_v2.json`
for per-epoch history).
- **Hardware**: single GPU (training takes ~2 hr).
- **Seed**: 42.
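The distillation objective listed above (MSE on z-scored targets, Adam at `lr=1e-3`, batches of 1024) can be illustrated with a toy loop. Everything here is a stand-in: the data is random, the `student` network is a small placeholder rather than the real MLP, and the loop is not the project's training script.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Stand-ins for one batch: normalized landmarks and img2pose teacher outputs.
x = torch.randn(1024, 136)
teacher_pose = torch.randn(1024, 6) * 0.3

# Z-score the teacher targets; the card stores the real statistics
# as mean/std in pose_mlp_v2.json.
mean = teacher_pose.mean(dim=0)
std = teacher_pose.std(dim=0).clamp_min(1e-8)
target_z = (teacher_pose - mean) / std

student = nn.Sequential(nn.Linear(136, 64), nn.GELU(), nn.Linear(64, 6))  # placeholder net
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(student(x), target_z)  # MSE in z-scored space
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# At inference, undo the z-scoring to recover pose in the teacher's units.
with torch.no_grad():
    pose = student(x) * std + mean
```

Z-scoring puts the angle and translation outputs on a comparable scale, so no single axis dominates the MSE.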
### Held-out validation MAE on CelebV-HQ (clip-disjoint split)
| Axis | MAE (°) |
|---|---|
| Pitch | 2.66 |
| Roll | 2.34 |
| Yaw | 1.58 |
For reference, img2pose's reported MAE on the AFLW2000-3D / BIWI test
sets is ~4° on average. The MLP cannot exceed its teacher: the values
above measure the gap between the MLP's and the teacher's predictions,
not error against a ground-truth motion-capture rig.
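Since the model emits angles in radians while the table reports degrees, a per-axis student-versus-teacher MAE could be computed along these lines. The helper `mae_degrees` and the sample values are illustrative only; they are not taken from the evaluation code or data.

```python
import math


def mae_degrees(student_rad, teacher_rad):
    """Per-axis MAE in degrees between two lists of (pitch, roll, yaw) tuples in radians."""
    n = len(student_rad)
    sums = [0.0, 0.0, 0.0]
    for s, t in zip(student_rad, teacher_rad):
        for axis in range(3):
            sums[axis] += abs(s[axis] - t[axis])
    return [math.degrees(total / n) for total in sums]


# Made-up predictions for two frames, in radians.
student = [(0.10, 0.00, -0.20), (0.30, 0.05, 0.10)]
teacher = [(0.12, 0.02, -0.25), (0.26, 0.05, 0.13)]
pitch_mae, roll_mae, yaw_mae = mae_degrees(student, teacher)
print(round(pitch_mae, 2), round(roll_mae, 2), round(yaw_mae, 2))
```

The sketch ignores angle wrap-around, which is a reasonable simplification only because frames beyond 75° are filtered out of the corpus.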
### v1 → v2 changelog
| Aspect | v1 | v2 |
|---|---|---|
| Hidden | 256→128→64 | 512→256→128 |
| Activation | Linear → ReLU → Dropout | Linear → LayerNorm → GELU → Dropout |
| Dropout | 0.10 | 0.15 |
| Training frames | 569,678 | 2,783,134 |
| Epochs | 30 | 40 |
| Best val loss | 0.0809 | 0.0777 |
| Roll MAE (°) | 2.530 | 2.335 |
## Intended Use
- **Primary**: Drop-in replacement for img2pose's pose head when using
`py-feat` with a face detector that doesn't predict pose
(`face_model='retinaface'` in `feat.Detector`, MediaPipe in
`feat.MPDetector`).
- **Secondary**: Any pipeline that produces 68 dlib-style face landmarks
and wants img2pose-compatible head pose without re-running img2pose.
### Out of scope
- Eye / gaze direction: use `L2CS-Net` for gaze.
- MediaPipe-478 landmarks: translate to 68 dlib landmarks first.
- Head-pose inference from a partial landmark set (fewer than 68 points).
## Usage
The MLP is loaded automatically by `feat.Detector` when
`face_model != 'img2pose'`. To call it directly:
```python
import torch
from feat.utils.face_pose_mlp import pose_from_landmarks_mlp
# 68 (x, y) landmarks in image-pixel coordinates, e.g. from mobilefacenet.
# Placeholder values so the snippet runs; substitute real landmarks of shape [68, 2].
landmarks = (torch.rand(68, 2) * 256).unsqueeze(0)  # [1, 68, 2], float32
pose = pose_from_landmarks_mlp(landmarks) # [1, 6]: (Pitch, Roll, Yaw, X, Y, Z)
print(pose)
```
Weights resolve from (in order):
1. `FEAT_POSE_MLP_PATH` environment variable
2. `models/pose_mlp_v2.safetensors` in the repo
3. This HuggingFace repo (`py-feat/pose_mlp_v2`)
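The lookup order above might be implemented roughly as follows. This is a sketch, not `py-feat`'s loader: `resolve_pose_mlp_weights` and its `repo_root` parameter are hypothetical, and only `hf_hub_download` from `huggingface_hub` is a real library call.

```python
import os
from pathlib import Path


def resolve_pose_mlp_weights(repo_root: str = ".") -> str:
    """Return a path to the weights, trying the three sources in order."""
    # 1. Explicit override via environment variable.
    env_path = os.environ.get("FEAT_POSE_MLP_PATH")
    if env_path:
        return env_path

    # 2. Local copy inside a repository checkout.
    local = Path(repo_root) / "models" / "pose_mlp_v2.safetensors"
    if local.is_file():
        return str(local)

    # 3. Fall back to the Hub; imported lazily so steps 1-2 need no network access.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id="py-feat/pose_mlp_v2", filename="pose_mlp_v2.safetensors")
```

Checking the environment variable first makes it easy to pin a specific weight file in offline or reproducible setups.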
## Limitations
- The MLP cannot improve on img2pose's accuracy; it only matches it
more efficiently with bbox-free input. Use img2pose directly if you
need img2pose's exact behavior (a tiny ~1° distillation gap may remain).
- Trained on CelebV-HQ: performance on non-frontal, occluded, or
heavily-rotated faces (>75°) is degraded by both the teacher and the
data filter.
- Output coordinates are img2pose's frame, not a standard FACS / BIWI
frame. Pose values are interpretable across the `py-feat` pipeline
but may need recalibration to compare with other tools.
## Citation
If you use `py-feat` and this pose-MLP, please cite both `py-feat` and
img2pose:
```bibtex
@article{cheong2023pyfeat,
title={Py-Feat: Python Facial Expression Analysis Toolbox},
author={Cheong, Jin Hyun and Jolly, Eshin and Xie, Tiankang and Byrne, Sophie and Kenney, Matthew and Chang, Luke J.},
journal={Affective Science},
volume={4},
pages={781--796},
year={2023}
}
@inproceedings{albiero2021img2pose,
title={img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation},
author={Albiero, Vítor and Chen, Xingyu and Yin, Xi and Pang, Guan and Hassner, Tal},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={7617--7627},
year={2021}
}
@inproceedings{zhu2022celebvhq,
title={CelebV-HQ: A Large-Scale Video Facial Attributes Dataset},
author={Zhu, Hao and Wu, Wayne and Zhu, Wentao and Jiang, Liming and Tang, Siwei and Zhang, Li and Liu, Ziwei and Loy, Chen Change},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2022}
}
```
## License
MIT (this distillation). The teacher (`img2pose`) is BSD-3, and the
training corpus (CelebV-HQ) is released for non-commercial research
use; please honor each upstream license if you re-train or
re-distribute.
## Files
- `pose_mlp_v2.safetensors`: model weights (1 MB)
- `pose_mlp_v2.json`: architecture, output-normalization stats, training
history, validation MAE per epoch
- `README.md`: this card
## Acknowledgments
Distilled from img2pose by Vítor Albiero et al. (Meta AI / NVIDIA),
trained on CelebV-HQ by Hao Zhu et al. (CUHK / S-Lab NTU). Built and
maintained by [Cosanlab](https://cosanlab.com) at Dartmouth.