---
license: apache-2.0
---

# SynthPose (MMPose HRNet48+DarkPose variant)

The SynthPose model was proposed in [OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics](https://arxiv.org/abs/2406.09788) by Yoni Gozlan, Antoine Falisse, Scott Uhlrich, Anthony Gatti, Michael Black, and Akshay Chaudhari.

# Intended use cases

This model uses DarkPose with an HRNet backbone.
SynthPose is an approach that uses synthetic data to finetune pre-trained 2D human pose models so that they predict an arbitrarily dense set of keypoints for accurate kinematic analysis.
More details are available in [OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics](https://arxiv.org/abs/2406.09788).
This particular variant was finetuned on a set of keypoints commonly used in motion capture setups, and it also includes the COCO keypoints.

The model predicts the following 52 markers:

```
[
    'nose',
    'left_eye',
    'right_eye',
    'left_ear',
    'right_ear',
    'left_shoulder',
    'right_shoulder',
    'left_elbow',
    'right_elbow',
    'left_wrist',
    'right_wrist',
    'left_hip',
    'right_hip',
    'left_knee',
    'right_knee',
    'left_ankle',
    'right_ankle',
    'sternum',
    'rshoulder',
    'lshoulder',
    'r_lelbow',
    'l_lelbow',
    'r_melbow',
    'l_melbow',
    'r_lwrist',
    'l_lwrist',
    'r_mwrist',
    'l_mwrist',
    'r_ASIS',
    'l_ASIS',
    'r_PSIS',
    'l_PSIS',
    'r_knee',
    'l_knee',
    'r_mknee',
    'l_mknee',
    'r_ankle',
    'l_ankle',
    'r_mankle',
    'l_mankle',
    'r_5meta',
    'l_5meta',
    'r_toe',
    'l_toe',
    'r_big_toe',
    'l_big_toe',
    'l_calc',
    'r_calc',
    'C7',
    'L2',
    'T11',
    'T6',
]
```
The first 17 keypoints are the COCO keypoints, and the remaining 35 are anatomical markers.
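
To work with the two groups separately, the marker list above can be split by index. This is a minimal sketch: the names `MARKER_NAMES`, `COCO_NAMES`, `ANATOMICAL_NAMES`, and `NAME_TO_INDEX` are illustrative and not shipped with the model, and the code assumes the model's output channels follow the order listed above.

```python
# Marker names in the order listed above.
# Assumption: output channel i of the model corresponds to MARKER_NAMES[i].
MARKER_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
    "sternum", "rshoulder", "lshoulder",
    "r_lelbow", "l_lelbow", "r_melbow", "l_melbow",
    "r_lwrist", "l_lwrist", "r_mwrist", "l_mwrist",
    "r_ASIS", "l_ASIS", "r_PSIS", "l_PSIS",
    "r_knee", "l_knee", "r_mknee", "l_mknee",
    "r_ankle", "l_ankle", "r_mankle", "l_mankle",
    "r_5meta", "l_5meta", "r_toe", "l_toe",
    "r_big_toe", "l_big_toe", "l_calc", "r_calc",
    "C7", "L2", "T11", "T6",
]

NUM_COCO = 17  # the first 17 entries are the COCO keypoints

COCO_NAMES = MARKER_NAMES[:NUM_COCO]
ANATOMICAL_NAMES = MARKER_NAMES[NUM_COCO:]

# Look up a marker's position in the list by name.
NAME_TO_INDEX = {name: i for i, name in enumerate(MARKER_NAMES)}
```

With this mapping, `NAME_TO_INDEX["sternum"]` gives the index of the first anatomical marker.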

# Usage

## Installation
This implementation is based on [MMPose](https://mmpose.readthedocs.io/en/latest/).
MMPose requires PyTorch. Once PyTorch is installed, install MMPose and its dependencies as follows:
```bash
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
```
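
After installing, it can be useful to confirm which versions ended up in the environment. The following sketch uses only the standard library; the helper name `installed_versions` is hypothetical, and it assumes the pip distribution names match the package names used in the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "mmengine", "mmcv", "mmdet", "mmpose")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a short report for the packages this model card relies on.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```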

## Image inference

Here's how to load the model and run inference on an image:

```python
from huggingface_hub import snapshot_download
from mmpose.apis import MMPoseInferencer

snapshot_download(repo_id="yonigozlan/synthpose-hrnet-48-mmpose", local_dir="./synthpose-hrnet-48-mmpose")
inferencer = MMPoseInferencer(
    pose2d='./synthpose-hrnet-48-mmpose/td-hm_hrnet-w48_dark-8xb32-210e_synthpose_inference.py',
    pose2d_weights='./synthpose-hrnet-48-mmpose/hrnet-w48_dark.pth'
)

url = "https://farm7.staticflickr.com/6105/6218847094_20deb6b938_z.jpg"
result_generator = inferencer([url], pred_out_dir='predictions', vis_out_dir='visualizations')
results = next(result_generator)
```
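
The keypoints can then be read from `results`. The sketch below assumes the result layout described in the MMPose inferencer documentation: a dict whose `predictions` entry holds, for each input image, a list of per-person dicts with `keypoints` and `keypoint_scores`. The helper `keypoints_by_name` and the stand-in `fake_result` are hypothetical, so check the layout against your installed MMPose version.

```python
def keypoints_by_name(result, marker_names, image_index=0):
    """Return one {marker_name: (x, y, score)} dict per detected person.

    Assumes result["predictions"][image_index] is a list of per-person dicts
    with "keypoints" ([[x, y], ...]) and "keypoint_scores" ([score, ...]).
    """
    people = []
    for instance in result["predictions"][image_index]:
        named = {}
        for name, (x, y), score in zip(
            marker_names, instance["keypoints"], instance["keypoint_scores"]
        ):
            named[name] = (float(x), float(y), float(score))
        people.append(named)
    return people


# Stand-in for a real inferencer result, with two markers for one person.
fake_result = {
    "predictions": [
        [{"keypoints": [[10.0, 20.0], [30.0, 40.0]], "keypoint_scores": [0.9, 0.5]}]
    ]
}
people = keypoints_by_name(fake_result, ["nose", "left_eye"])
print(people[0]["nose"])
```

With a real result, pass the full list of 52 marker names in the order given in this model card.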

## Video inference

To run inference on a video, replace the last two lines with the following, where `football.mp4` is the path to a local video file:

```python
result_generator = inferencer("football.mp4", pred_out_dir='predictions', vis_out_dir='visualizations')
results = list(result_generator)
```
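
For kinematic analysis, a common next step is to follow a single marker across frames. The sketch below assumes each per-frame result has the same layout as an image result (a `predictions` list whose first entry holds the per-person dicts for that frame). The helper `marker_trajectory`, the 0.3 score threshold, and the choice of the first detected person are illustrative assumptions, not part of MMPose or SynthPose.

```python
def marker_trajectory(results, marker_index, min_score=0.3):
    """Collect (frame_index, x, y) for one marker of the first person per frame.

    Frames with no detection, or with a score below min_score, are skipped.
    """
    trajectory = []
    for frame_index, frame in enumerate(results):
        instances = frame["predictions"][0]
        if not instances:
            continue  # nobody detected in this frame
        person = instances[0]
        if person["keypoint_scores"][marker_index] < min_score:
            continue  # marker too uncertain to keep
        x, y = person["keypoints"][marker_index]
        trajectory.append((frame_index, float(x), float(y)))
    return trajectory


# Stand-in for per-frame results: a good frame, an empty one, a low-score one, a good one.
fake_frames = [
    {"predictions": [[{"keypoints": [[1.0, 2.0]], "keypoint_scores": [0.9]}]]},
    {"predictions": [[]]},
    {"predictions": [[{"keypoints": [[5.0, 6.0]], "keypoint_scores": [0.1]}]]},
    {"predictions": [[{"keypoints": [[7.0, 8.0]], "keypoint_scores": [0.8]}]]},
]
print(marker_trajectory(fake_frames, marker_index=0))
```

Taking the first person in each frame does not track identity across frames, so multi-person videos would need an additional association step.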