Commit 9fbc66e

Merge branch 'dev-1.x' of https://github.com/open-mmlab/mmpose into dev-1.x
2 parents 0e27c3b + 464635a commit 9fbc66e

130 files changed (+23236, -199 lines)


README.md

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ https://user-images.githubusercontent.com/15977946/124654387-0fd3c500-ded1-11eb-
 - More flexible code structure and style, fewer restrictions, and a shorter code review process.
 - Utilize the powerful capabilities of MMPose in the form of independent projects without being constrained by the code framework.
 - Newly added projects include:
+- [Pose Anything](/projects/pose_anything/)
 - [RTMPose](/projects/rtmpose/)
 - [YOLOX-Pose](/projects/yolox_pose/)
 - [MMPose4AIGC](/projects/mmpose4aigc/)

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

<!-- [ALGORITHM] -->

<details>
<summary align="right"><a href="https://arxiv.org/abs/2312.07526">RTMO</a></summary>

```bibtex
@misc{lu2023rtmo,
  title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
  author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
  year={2023},
  eprint={2312.07526},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

</details>

RTMO is a one-stage pose estimation model that seamlessly integrates coordinate classification into the YOLO architecture. It introduces a Dynamic Coordinate Classifier (DCC) module that handles keypoint localization through dual 1D heatmaps. The DCC employs dynamic bin allocation, localizing the coordinate bins to each predicted bounding box to improve efficiency. It also uses learnable bin representations based on positional encodings, enabling computation of bin-keypoint similarity for precise localization.
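
The DCC idea described above can be sketched for a single axis in plain Python. This is only an illustration under stated assumptions, not the RTMO implementation: the function names, the fixed sinusoidal encoding (RTMO's bin representations are learnable), the dot-product similarity, and the expected-value readout are all choices made here for clarity.

```python
import math


def bin_centers(box_min, box_max, num_bins):
    """Place num_bins bin centers evenly along one axis of a bounding box."""
    width = (box_max - box_min) / num_bins
    return [box_min + (i + 0.5) * width for i in range(num_bins)]


def positional_encoding(x, dim=8, scale=100.0):
    """Sinusoidal encoding of a scalar coordinate (fixed here; an assumption)."""
    enc = []
    for k in range(dim // 2):
        freq = 1.0 / (scale ** (2 * k / dim))
        enc.extend([math.sin(x * freq), math.cos(x * freq)])
    return enc


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_axis(keypoint_feature, box_min, box_max, num_bins=16):
    """Return (1D heatmap, expected coordinate) for one keypoint on one axis."""
    # Dynamic bin allocation: bins cover only the predicted box extent.
    centers = bin_centers(box_min, box_max, num_bins)
    # Bin-keypoint similarity: dot product with each bin's encoding.
    logits = [
        sum(f * e for f, e in zip(keypoint_feature, positional_encoding(c)))
        for c in centers
    ]
    heatmap = softmax(logits)
    coord = sum(p * c for p, c in zip(heatmap, centers))
    return heatmap, coord
```

Calling `decode_axis` once for x and once for y gives the "dual 1D heatmaps" for a keypoint. In this sketch the keypoint feature must have the same length as the encoding (`dim=8`).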

RTMO is trained end-to-end using a multi-task loss, with losses for bounding box regression, keypoint heatmap classification via a novel MLE loss, keypoint coordinate proxy regression, and keypoint visibility classification. The MLE loss models annotation uncertainty and balances optimization between easy and hard samples.
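
As a rough illustration of how such a multi-task objective fits together, the sketch below combines four named loss terms with a weighted sum. The Laplace negative log-likelihood stands in for an uncertainty-aware likelihood loss in general; it is not RTMO's MLE loss formula, and the term names, weights, and numeric values are placeholders invented for this example.

```python
import math


def laplace_nll(pred, target, sigma):
    """Negative log-likelihood of target under a Laplace(pred, sigma) density."""
    return math.log(2.0 * sigma) + abs(target - pred) / sigma


def total_loss(terms, weights):
    """Weighted sum of named loss terms; every term needs a matching weight."""
    return sum(weights[name] * value for name, value in terms.items())


# Placeholder values, not measurements from any training run.
terms = {
    "bbox": 0.8,
    "mle": laplace_nll(pred=12.0, target=16.0, sigma=4.0),
    "proxy": 0.5,
    "visibility": 0.2,
}
weights = {"bbox": 1.0, "mle": 1.0, "proxy": 1.0, "visibility": 1.0}
loss = total_loss(terms, weights)
```

With a predicted scale `sigma`, a large error is penalized less when `sigma` is large and a small error is rewarded more when `sigma` is small, which is one common way a likelihood-based loss trades off hard and easy samples.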

During inference, RTMO employs grid-based dense predictions to simultaneously output human detection boxes and poses in a single pass. It selectively decodes heatmaps only for high-scoring grids after NMS, minimizing computational cost.
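
The selective decoding step can be sketched as score filtering, greedy NMS, and then a pose decode only for the surviving grids. The helper names, thresholds, and the `decode_pose` callback are hypothetical; this is a minimal pure-Python outline, not MMPose's post-processing code.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_grids(boxes, scores, score_thr=0.3, iou_thr=0.5):
    """Keep high-scoring grids, then suppress overlapping ones greedily."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def infer(boxes, scores, decode_pose, score_thr=0.3, iou_thr=0.5):
    """Run the (expensive) pose decode only for grids that survive NMS."""
    keep = select_grids(boxes, scores, score_thr, iou_thr)
    return [(boxes[i], scores[i], decode_pose(i)) for i in keep]
```

Because `decode_pose` is called once per kept grid, the heatmap decoding cost depends on the number of detections that survive NMS rather than on the number of grid cells.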

Compared to prior one-stage methods that regress keypoint coordinates directly, RTMO achieves higher accuracy through coordinate classification while retaining real-time speeds. It also outperforms lightweight top-down approaches for images with many people, as the latter have inference times that scale linearly with the number of human instances.
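
The linear-scaling argument can be made concrete with a toy cost model: a top-down pipeline pays a detector cost plus a per-person pose cost, while a one-stage model is treated as roughly constant per image. The function names are hypothetical and the timing values in the usage note are placeholders, not benchmark results.

```python
def top_down_latency(detector_ms, pose_ms_per_person, num_people):
    """Detector runs once; the pose network runs once per detected person."""
    return detector_ms + pose_ms_per_person * num_people


def crossover_people(detector_ms, pose_ms_per_person, one_stage_ms):
    """Smallest person count at which top-down becomes slower than one-stage."""
    if pose_ms_per_person <= 0:
        raise ValueError("pose_ms_per_person must be positive")
    n = 0
    while top_down_latency(detector_ms, pose_ms_per_person, n) <= one_stage_ms:
        n += 1
    return n
```

For example, with placeholder costs of 10 ms for the detector, 2 ms per person, and 20 ms for the one-stage model, `crossover_people(10.0, 2.0, 20.0)` returns 6: the top-down pipeline is slower once six or more people appear in the image.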
