Multimodal Flow Matching for Large-Scale Human Mobility Flow Generation Using Satellite Imagery and Social Sensing Data
Human mobility flow captures the intensity of spatial interactions within a city and offers insights into both the spatial structure of urban systems and human dynamics. However, the acquisition of high-quality human mobility data remains challenging due to high collection costs, limited accessibility, and sparse spatial coverage, motivating the adoption of generative AI to produce representative and scalable human mobility data. In this research, we propose a generative GeoAI-based multimodal flow matching method for large-scale origin–destination (OD) mobility flow generation by fusing remote sensing and social sensing data. This method demonstrates superior performance in both intra-city and cross-city scenarios in the three largest metropolitan areas in the United States, evaluated across metrics regarding mobility flow volume estimation error and graph structure distribution. We further develop an interpretability framework tailored to this model, revealing the role of attention in heterogeneous modality alignment and interpreting the stepwise inference process of the generative model. Through visual and quantitative analysis of the attention scores before and after multimodal alignment, we identify explainable fusion patterns between remote sensing and social sensing data. The deterministic and continuous OD flow generation process defined by flow matching provides transparent evolutionary paths. This study not only accurately models human mobility flows and urban functional coupling between regions but also offers insights into the use of generative AI for understanding urban dynamics.
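To make the "deterministic and continuous" generation process more concrete, here is a minimal toy sketch of flow-matching sampling by Euler integration. It is not the repository's implementation: the function names, the closed-form straight-line velocity field, and the toy flow values are all assumptions chosen for illustration. In a trained model, a learned network (presumably conditioned on the fused node features) would stand in for `conditional_velocity`.

```python
def conditional_velocity(x, t, target):
    """Velocity of the straight-line path from the current state toward `target`.

    Hypothetical stand-in for a learned velocity network; valid for t < 1.
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]


def euler_generate(x0, velocity_fn, n_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps.

    Returns every intermediate state, so the whole path can be inspected.
    """
    dt = 1.0 / n_steps
    x = list(x0)
    path = [x]
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        path.append(x)
    return path


# Toy inputs (assumptions): a fixed "noise" start and made-up OD flow values.
noise = [0.3, -1.2, 0.8]
toy_flows = [5.0, 0.0, 2.0]

path = euler_generate(noise, lambda x, t: conditional_velocity(x, t, toy_flows))
for step, state in enumerate(path):
    print(step, [round(v, 3) for v in state])
```

Because the integration has no random component after the starting point is fixed, rerunning it with the same start reproduces the same path, which is what allows each intermediate state to be examined step by step.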
Xu, Y., Hu, Y., Gao, S., Zhu, Q., & Zhang, F. (2026). Multimodal flow matching for large-scale human mobility flow generation using satellite imagery and social sensing data. ISPRS Journal of Photogrammetry and Remote Sensing, 239, 291–308. https://doi.org/10.1016/j.isprsjprs.2026.06.006
@article{XU2026291,
title = {Multimodal flow matching for large-scale human mobility flow generation using satellite imagery and social sensing data},
journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
volume = {239},
pages = {291-308},
year = {2026},
issn = {0924-2716},
doi = {https://doi.org/10.1016/j.isprsjprs.2026.06.006},
url = {https://www.sciencedirect.com/science/article/pii/S0924271626003126},
author = {Yichen Xu and Yusen Hu and Song Gao and Qiang Zhu and Feng Zhang}
}
train_img_encoder.py: train image and tabular encoders with contrastive loss.
get_img_embedding.py: export image encoder representations to a feature CSV.
get_tab_embedding.py: export tabular encoder representations to a feature CSV.
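The contrastive objective used to align the image and tabular encoders is not spelled out here. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over paired embeddings; the function names, the temperature value, and the toy embeddings are hypothetical and may differ from what train_img_encoder.py actually does.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(img_emb, tab_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th image and i-th tabular row form a positive pair."""
    img = [l2_normalize(v) for v in img_emb]
    tab = [l2_normalize(v) for v in tab_emb]
    n = len(img)
    # Cosine similarity between every image/tabular pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img[i], tab[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    logits_t = [list(col) for col in zip(*logits)]
    loss_i2t = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    loss_t2i = -sum(log_softmax_row(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy embeddings (assumptions): two regions, two-dimensional features.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

The loss is small when each region's image embedding is closest to its own tabular embedding and large when the pairs are swapped, which is the behavior a contrastive alignment step relies on.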
python train_img_encoder.py --log 'log/M1_s2_vitB32.log' --pretrained '.../Image2Flow/code/ckpt/M1bands3-s2-vitB32_1e-4_img_120.pth'
python get_img_embedding.py --log 'log/M1_s2_vitB32.log' --ckpt 'M1_s2_vitB32_img_120.pth' --data_path 'M1_NY/s2_clip' --output_path 'outputs/M1/M1_s2_vitB32.csv'
Main M1 training:
python train_OTCNF.py \
--log g64_MMs10_vitB32P \
--node_feats_path outputs/M1/M1_s2_vitB32.csv \
--region M1 \
--year 2020 \
--device cuda:0
M1 evaluation:
python test_OTCNF.py \
--log g64_MMs10_vitB32P \
--node_feats_path outputs/M1/M1_s2_vitB32.csv \
--seed 1 \
--region M1
Transfer evaluation on M2/M3:
python test_OTCNF_transfer.py \
--log g64_MMs10_vitB32P \
--node_feats_path outputs/M1/M2_s2_vitB32.csv \
--tag trainon \
--region M2
This project benefited from Imagery2Flow, CommutingODGen-Dataset, and MMGR. Please refer to those source projects for further details.