Commit a4df14b

Merge pull request #12 from deruyter92/jaap/minor_refactors
Minor refactors
2 parents d76b239 + 6b5d354

11 files changed: 68 additions & 62 deletions


.gitignore

Lines changed: 5 additions & 0 deletions

@@ -45,3 +45,8 @@ htmlcov/
 *.pkl
 *.h5
 *.ckpt
+
+# Excluded directories
+pre_trained_models/
+demo/predictions/
+demo/images/

README.md

Lines changed: 25 additions & 17 deletions

@@ -1,12 +1,12 @@
 # FMPose3D: monocular 3D pose estimation via flow matching
 
 ![Version](https://img.shields.io/badge/python_version-3.10-purple)
-[![PyPI version](https://badge.fury.io/py/fmpose3d.svg)](https://badge.fury.io/py/fmpose3d)
-[![License: LApache 2.0](https://img.shields.io/badge/License-Apache2.0-blue.svg)](https://www.gnu.org/licenses/apach2.0)
+[![PyPI version](https://badge.fury.io/py/fmpose3d.svg?icon=si%3Apython)](https://badge.fury.io/py/fmpose3d)
+[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
 
-This is the official implementation of the approach described in the paper:
+This is the official implementation of the approach described in the preprint:
 
-[**FMPose3D: monocular 3D Pose Estimation via Flow Matching**](xxx)
+[**FMPose3D: monocular 3D pose estimation via flow matching**](https://arxiv.org/abs/2602.05755)
 Ti Wang, Xiaohang Yu, Mackenzie Weygandt Mathis
 
 <!-- <p align="center"><img src="./images/Frame 4.jpg" width="50%" alt="" /></p> -->
@@ -15,13 +15,13 @@ Ti Wang, Xiaohang Yu, Mackenzie Weygandt Mathis
 
 ## 🚀 TL;DR
 
-FMPose3D replaces slow diffusion models for monocular 3D pose estimation with fast Flow Matching, generating multiple plausible 3D poses via an ODE in just a few steps, then aggregates them using a reprojection-based Bayesian module (RPEA) for accurate predictions, achieving state-of-the-art results on human and animal 3D pose benchmarks.
+FMPose3D creates a 3D pose from a single 2D image. It leverages fast Flow Matching, generating multiple plausible 3D poses via an ODE in just a few steps, then aggregates them using a reprojection-based Bayesian module (RPEA) for accurate predictions, achieving state-of-the-art results on human and animal 3D pose benchmarks.
 
 
 
 ## News!
 
-- [X] Feb 2026: FMPose3D code and arXiv paper is released - check out the demos here or on our [project page](https://xiu-cs.github.io/FMPose3D/)
+- [X] Feb 2026: the FMPose3D code and our arXiv paper is released - check out the demos here or on our [project page](https://xiu-cs.github.io/FMPose3D/)
 - [ ] Planned: This method will be integrated into [DeepLabCut](https://www.mackenziemathislab.org/deeplabcut)
 
 ## Installation
@@ -32,17 +32,11 @@ Make sure you have Python 3.10+. You can set this up with:
 ```bash
 conda create -n fmpose_3d python=3.10
 conda activate fmpose_3d
-```
-<!-- test version -->
-```bash
-git clone https://github.com/AdaptiveMotorControlLab/FMPose3D.git
-# TestPyPI (pre-release/testing build)
-pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ fmpose3d==0.0.7
-# Future Official PyPI release
-# pip install fmpose3d
+
+pip install fmpose3d
 ```
 
-## Demo
+## Demos
 
 ### Testing on in-the-wild images (humans)
 
@@ -85,7 +79,7 @@ The training logs, checkpoints, and related files of each training time will be
 
 For training on Human3.6M:
 ```bash
-sh /scripts/FMPose3D_train.sh
+sh ./scripts/FMPose3D_train.sh
 ```
 
 ### Inference
@@ -98,10 +92,24 @@ To run inference on Human3.6M:
 sh ./scripts/FMPose3D_test.sh
 ```
 
-## Experiments Animals
+## Experiments on non-human animals
 
 For animal training/testing and demo scripts, see [animals/README.md](animals/README.md).
 
+## Citation
+
+```
+@misc{wang2026fmpose3dmonocular3dpose,
+      title={FMPose3D: monocular 3D pose estimation via flow matching},
+      author={Ti Wang and Xiaohang Yu and Mackenzie Weygandt Mathis},
+      year={2026},
+      eprint={2602.05755},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2602.05755},
+}
+```
+
 ## Acknowledgements
 
 We thank the Swiss National Science Foundation (SNSF Project # 320030-227871) and the Kavli Foundation for providing financial support for this project.
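The README's TL;DR describes generating 3D poses by integrating an ODE for just a few Euler steps. As a rough illustration of what such a fixed-step sampler looks like, here is a minimal sketch with a toy velocity field standing in for the actual CFM network (this commit does not touch the sampler itself):

```python
import torch

def euler_sample(velocity_model, y0, cond, steps):
    """Integrate dy/dt = v(y, t, cond) from t=0 to t=1 with fixed Euler steps."""
    y = y0
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((y.shape[0],), i * dt)  # current time, one value per batch item
        y = y + dt * velocity_model(y, t, cond)
    return y

# Toy velocity field: pull samples toward the conditioning tensor.
# (The real model predicts a learned velocity from 2D keypoints.)
def toy_velocity(y, t, cond):
    return cond - y

noise = torch.zeros(2, 17, 3)   # batch of 2 "poses", 17 joints, xyz
target = torch.ones(2, 17, 3)
out = euler_sample(toy_velocity, noise, target, steps=10)
# With this linear field, 10 Euler steps from 0 land at 1 - 0.9**10 of the target.
```

Names and shapes here are illustrative only; the repo's own `euler_sample` is conditioned on 2D keypoints and wraps the CFM model.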

animals/demo/vis_animals.py

Lines changed: 7 additions & 7 deletions

@@ -8,7 +8,6 @@
 """
 
 # SuperAnimal Demo: https://github.com/DeepLabCut/DeepLabCut/blob/main/examples/COLAB/COLAB_YOURDATA_SuperAnimal.ipynb
-import sys
 import os
 import numpy as np
 import glob
@@ -25,8 +24,6 @@
 from fmpose3d.animals.common.arguments import opts as parse_args
 from fmpose3d.common.camera import normalize_screen_coordinates, camera_to_world
 
-sys.path.append(os.getcwd())
-
 args = parse_args().parse()
 os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
 
@@ -334,13 +331,15 @@ def get_pose3D(path, output_dir, type='image'):
     print(f"args.n_joints: {args.n_joints}, args.out_joints: {args.out_joints}")
 
     ## Reload model
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
     model = {}
-    model['CFM'] = CFM(args).cuda()
+    model['CFM'] = CFM(args).to(device)
 
     model_dict = model['CFM'].state_dict()
     model_path = args.saved_model_path
     print(f"Loading model from: {model_path}")
-    pre_dict = torch.load(model_path)
+    pre_dict = torch.load(model_path, map_location=device, weights_only=True)
     for name, key in model_dict.items():
         model_dict[name] = pre_dict[name]
     model['CFM'].load_state_dict(model_dict)
@@ -400,7 +399,8 @@ def get_3D_pose_from_image(args, keypoints, i, img, model, output_dir):
     input_2D = np.expand_dims(input_2D, axis=0)  # (1, J, 2)
 
     # Convert to tensor format matching visualize_animal_poses.py
-    input_2D = torch.from_numpy(input_2D.astype('float32')).cuda()  # (1, J, 2)
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    input_2D = torch.from_numpy(input_2D.astype('float32')).to(device)  # (1, J, 2)
     input_2D = input_2D.unsqueeze(0)  # (1, 1, J, 2)
 
     # Euler sampler for CFM
@@ -418,7 +418,7 @@ def euler_sample(c_2d, y_local, steps, model_3d):
 
     # Single inference without flip augmentation
     # Create 3D random noise with shape (1, 1, J, 3)
-    y = torch.randn(input_2D.size(0), input_2D.size(1), input_2D.size(2), 3).cuda()
+    y = torch.randn(input_2D.size(0), input_2D.size(1), input_2D.size(2), 3, device=device)
    output_3D = euler_sample(input_2D, y, steps=args.sample_steps, model_3d=model)
 
    output_3D = output_3D[0:, args.pad].unsqueeze(1)
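The recurring pattern in this commit swaps hard-coded `.cuda()` calls for an explicit `device`, so the scripts also run on CPU-only machines, and hardens `torch.load` with `map_location` and `weights_only=True`. A self-contained sketch of that checkpoint round-trip, with a stand-in `nn.Linear` in place of the real CFM network:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Pick CUDA when available; otherwise everything stays on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # stand-in for the real CFM model

# Round-trip a checkpoint. map_location re-maps GPU-saved tensors onto the
# current device, and weights_only=True (available since PyTorch 1.13)
# refuses to unpickle arbitrary Python objects from the file.
ckpt_path = os.path.join(tempfile.mkdtemp(), "demo_ckpt.pt")
torch.save(model.state_dict(), ckpt_path)
state = torch.load(ckpt_path, map_location=device, weights_only=True)
model.load_state_dict(state)
```

The same three-line recipe (device selection, `.to(device)`, guarded `torch.load`) is what each changed file below applies in place.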

animals/scripts/main_animal3d.py

Lines changed: 5 additions & 3 deletions

@@ -75,7 +75,7 @@ def step(split, args, actions, dataLoader, model, optimizer=None, epoch=None, st
     # gt_3D shape: torch.Size([B, J, 4]) (x,y,z + homogeneous coordinate)
     gt_3D = gt_3D[:,:,:3]  # only use x,y,z for 3D ground truth
 
-    # [input_2D, gt_3D, batch_cam, vis_3D] = get_varialbe(split, [input_2D, gt_3D, batch_cam, vis_3D])
+    # [input_2D, gt_3D, batch_cam, vis_3D] = get_variable(split, [input_2D, gt_3D, batch_cam, vis_3D])
 
     # unsqueeze frame dimension
     input_2D = input_2D.unsqueeze(1)  # (B,F,J,C)
@@ -264,15 +264,17 @@ def get_parameter_number(net):
     test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=args.batch_size,
                                                   shuffle=False, num_workers=int(args.workers), pin_memory=True)
 
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
     model = {}
-    model['CFM'] = CFM(args).cuda()
+    model['CFM'] = CFM(args).to(device)
 
     if args.reload:
         model_dict = model['CFM'].state_dict()
         # Prefer explicit saved_model_path; otherwise fallback to previous_dir glob
         model_path = args.saved_model_path
         print(model_path)
-        pre_dict = torch.load(model_path)
+        pre_dict = torch.load(model_path, weights_only=True, map_location=device)
         for name, key in model_dict.items():
             model_dict[name] = pre_dict[name]
         model['CFM'].load_state_dict(model_dict)

demo/vis_in_the_wild.py

Lines changed: 8 additions & 8 deletions

@@ -7,7 +7,6 @@
 Licensed under Apache 2.0
 """
 
-import sys
 import cv2
 import os
 import numpy as np
@@ -16,8 +15,6 @@
 from tqdm import tqdm
 import copy
 
-sys.path.append(os.getcwd())
-
 # Auto-download checkpoint files if missing
 from fmpose3d.lib.checkpoint.download_checkpoints import ensure_checkpoints
 ensure_checkpoints()
@@ -213,7 +210,8 @@ def get_3D_pose_from_image(args, keypoints, i, img, model, output_dir):
 
     input_2D = input_2D[np.newaxis, :, :, :, :]
 
-    input_2D = torch.from_numpy(input_2D.astype('float32')).cuda()
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    input_2D = torch.from_numpy(input_2D.astype('float32')).to(device)
 
     N = input_2D.size(0)
 
@@ -229,10 +227,10 @@ def euler_sample(c_2d, y_local, steps, model_3d):
 
     ## estimation
 
-    y = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3).cuda()
+    y = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3, device=device)
     output_3D_non_flip = euler_sample(input_2D[:, 0], y, steps=args.sample_steps, model_3d=model)
 
-    y_flip = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3).cuda()
+    y_flip = torch.randn(input_2D.size(0), input_2D.size(2), input_2D.size(3), 3, device=device)
     output_3D_flip = euler_sample(input_2D[:, 1], y_flip, steps=args.sample_steps, model_3d=model)
 
     output_3D_flip[:, :, :, 0] *= -1
@@ -280,14 +278,16 @@ def get_pose3D(path, output_dir, type='image'):
     # args.type = type
 
     ## Reload
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
     model = {}
-    model['CFM'] = CFM(args).cuda()
+    model['CFM'] = CFM(args).to(device)
 
     # if args.reload:
     model_dict = model['CFM'].state_dict()
     model_path = args.model_weights_path
     print(model_path)
-    pre_dict = torch.load(model_path)
+    pre_dict = torch.load(model_path, map_location=device, weights_only=True)
     for name, key in model_dict.items():
         model_dict[name] = pre_dict[name]
     model['CFM'].load_state_dict(model_dict)
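This demo averages a direct prediction with one obtained from horizontally flipped 2D input, un-mirroring the flipped output by negating x (the `output_3D_flip[:, :, :, 0] *= -1` line above). A stripped-down sketch of that merge; note the real script also swaps left/right joint indices before averaging, which is omitted here:

```python
import torch

def flip_average(pred, pred_flip):
    """Merge a prediction with its flipped-input counterpart:
    undo the horizontal mirror (negate x) and average the two."""
    unflipped = pred_flip.clone()
    unflipped[..., 0] *= -1  # x axis was mirrored; y and z are unaffected
    return (pred + unflipped) / 2

pred = torch.tensor([[1.0, 2.0, 3.0]])        # one joint, xyz
pred_flip = torch.tensor([[-1.0, 2.0, 3.0]])  # mirrored x, same y/z
merged = flip_average(pred, pred_flip)        # -> [[1.0, 2.0, 3.0]]
```

Averaging the two samples is a cheap test-time augmentation: left/right symmetry of the skeleton means both passes should agree after un-mirroring, so their mean reduces sampling noise.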

fmpose3d/animals/common/arber_dataset.py

Lines changed: 2 additions & 5 deletions

@@ -12,7 +12,6 @@
 import glob
 import os
 import random
-import sys
 
 import cv2
 import matplotlib.pyplot as plt
@@ -23,10 +22,8 @@
 from torch.utils.data import Dataset
 from tqdm import tqdm
 
-sys.path.append(os.path.dirname(sys.path[0]))
-
-from common.camera import normalize_screen_coordinates
-from common.lifter3d import load_camera_params, load_h5_keypoints
+from fmpose3d.common.camera import normalize_screen_coordinates
+from fmpose3d.animals.common.lifter3d import load_camera_params, load_h5_keypoints
 
 
 class ArberDataset(Dataset):

fmpose3d/animals/common/utils.py

Lines changed: 4 additions & 6 deletions

@@ -15,7 +15,6 @@
 
 import numpy as np
 import torch
-from torch.autograd import Variable
 
 
 def mpjpe_cal(predicted, target):
@@ -220,18 +219,17 @@ def update(self, val, n=1):
         self.avg = self.sum / self.count
 
 
-def get_varialbe(split, target):
+def get_variable(split, target):
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
     num = len(target)
     var = []
     if split == "train":
         for i in range(num):
-            temp = (
-                Variable(target[i], requires_grad=False).contiguous().type(torch.cuda.FloatTensor)
-            )
+            temp = target[i].requires_grad_(False).contiguous().float().to(device)
             var.append(temp)
     else:
         for i in range(num):
-            temp = Variable(target[i]).contiguous().cuda().type(torch.cuda.FloatTensor)
+            temp = target[i].contiguous().float().to(device)
             var.append(temp)
 
     return var

fmpose3d/common/__init__.py

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@
     save_top_N_models,
     test_calculation,
     print_error,
-    get_varialbe,
+    get_variable,
 )
 
 __all__ = [
@@ -36,6 +36,6 @@
     "save_top_N_models",
     "test_calculation",
     "print_error",
-    "get_varialbe",
+    "get_variable",
 ]

fmpose3d/common/utils.py

Lines changed: 4 additions & 8 deletions

@@ -15,7 +15,6 @@
 
 import numpy as np
 import torch
-from torch.autograd import Variable
 
 def deterministic_random(min_value, max_value, data):
     digest = hashlib.sha256(data.encode()).digest()
@@ -186,20 +185,17 @@ def update(self, val, n=1):
         self.avg = self.sum / self.count
 
 
-def get_varialbe(split, target):
+def get_variable(split, target):
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
     num = len(target)
     var = []
     if split == "train":
         for i in range(num):
-            temp = (
-                Variable(target[i], requires_grad=False)
-                .contiguous()
-                .type(torch.cuda.FloatTensor)
-            )
+            temp = target[i].requires_grad_(False).contiguous().float().to(device)
             var.append(temp)
     else:
         for i in range(num):
-            temp = Variable(target[i]).contiguous().cuda().type(torch.cuda.FloatTensor)
+            temp = target[i].contiguous().float().to(device)
             var.append(temp)
 
     return var
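The refactor above (and the identical one in `fmpose3d/animals/common/utils.py`) drops `torch.autograd.Variable`, which has been a deprecated no-op wrapper since PyTorch 0.4: tensors themselves carry autograd state now, and `torch.cuda.FloatTensor` fails outright on CPU-only machines. The rewritten helper can be exercised standalone:

```python
import torch

def get_variable(split, target):
    """Move a list of tensors to the available device as contiguous float32."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    var = []
    for t in target:
        if split == "train":
            t = t.requires_grad_(False)  # explicit: no grads needed on inputs
        var.append(t.contiguous().float().to(device))
    return var

# Mixed-dtype inputs come out as float32 on whatever device is available.
batch = [torch.ones(2, 3, dtype=torch.float64), torch.zeros(4)]
out = get_variable("train", batch)
```

This sketch merges the two near-identical loops of the original into one; the committed version keeps the train/eval branches separate but is otherwise equivalent.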

fmpose3d/models/model_GAMLP.py

Lines changed: 0 additions & 2 deletions

@@ -7,8 +7,6 @@
 Licensed under Apache 2.0
 """
 
-import sys
-sys.path.append("..")
 import torch
 import torch.nn as nn
 import math
