Spatio-Temporal Calibration of Multiple Kinect Cameras Using 3D Human Pose

Research output: Contribution to journal › Article › peer-review


RGB and depth cameras are widely used for the 3D tracking of human pose and motion. Typically, these cameras calculate a set of 3D points representing the human body as a skeletal structure. The tracking capabilities of a single camera are often affected by noise and by inaccuracies due to occluded body parts. Multiple-camera setups offer a way to maximize coverage of the captured human body and to minimize occlusions. According to best practices, fusing information across multiple cameras typically requires spatio-temporal calibration. First, the cameras must synchronize their internal clocks. This is typically done by physically connecting the cameras to each other using an external device or cable. Second, the pose of each camera relative to the other cameras must be calculated (extrinsic calibration). State-of-the-art methods use specialized calibration sessions and devices such as a checkerboard to perform calibration. In this paper, we introduce an approach to the spatio-temporal calibration of multiple cameras that is designed to run on the fly without specialized devices or equipment, requiring only the motion of the human body in the scene. As an example, the system is implemented and evaluated using Microsoft Azure Kinect. The study shows that the accuracy and robustness of this approach are on par with state-of-the-art practices.
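
The two calibration steps named in the abstract (temporal synchronization and extrinsic calibration) can be illustrated with a small sketch. This is not the method of the paper, whose details the abstract does not give. It is a generic illustration under stated assumptions: both cameras run at the same frame rate, the clock offset is a constant whole number of frames, and each camera reports the 3D position of the same single joint in every frame. The function names `estimate_lag`, `kabsch`, and `calibrate_pair` are hypothetical. The offset is found by cross-correlating joint speed, which does not depend on the camera's pose, and the relative pose is then fitted with the standard Kabsch (SVD-based) rigid alignment.

```python
import numpy as np

def estimate_lag(a, b, max_lag):
    """Return the integer lag k (in frames) such that b[i + k] best matches a[i]."""
    best_lag, best_score = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = a[:len(a) - k], b[k:]
        else:
            x, y = a[-k:], b[:len(b) + k]
        n = min(len(x), len(y))
        x, y = x[:n] - x[:n].mean(), y[:n] - y[:n].mean()
        # Normalized cross-correlation of the overlapping parts.
        score = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag

def kabsch(P, Q):
    """Return R, t minimizing sum ||R @ P[i] + t - Q[i]||^2 for N x 3 arrays."""
    P_mean, Q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - P_mean).T @ (Q - Q_mean)            # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Q_mean - R @ P_mean
    return R, t

def calibrate_pair(traj_a, traj_b, max_lag=30):
    """Estimate the frame lag and the rigid transform from camera A to camera B."""
    # Joint speed is unchanged by a rigid transform, so it can be compared
    # across cameras before the extrinsics are known.
    speed_a = np.linalg.norm(np.diff(traj_a, axis=0), axis=1)
    speed_b = np.linalg.norm(np.diff(traj_b, axis=0), axis=1)
    lag = estimate_lag(speed_a, speed_b, max_lag)
    if lag >= 0:
        a, b = traj_a[:len(traj_a) - lag], traj_b[lag:]
    else:
        a, b = traj_a[-lag:], traj_b[:len(traj_b) + lag]
    n = min(len(a), len(b))
    R, t = kabsch(a[:n], b[:n])
    return lag, R, t

# Synthetic check (made-up data): one joint's random-walk path seen by two cameras.
rng = np.random.default_rng(0)
world = np.cumsum(rng.normal(scale=0.01, size=(220, 3)), axis=0)
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
traj_a = world[10:210]                        # camera A frame i shows world[i + 10]
traj_b = world[5:205] @ R_true.T + t_true     # camera B frame j shows world[j + 5]

lag, R, t = calibrate_pair(traj_a, traj_b, max_lag=20)
print("lag:", lag)
print("rotation recovered:", np.allclose(R, R_true, atol=1e-6))
print("translation recovered:", np.allclose(t, t_true, atol=1e-6))
```

With real skeleton data the joint positions are noisy and sometimes missing, so a practical version would likely need outlier rejection, several joints, and sub-frame offset handling. None of these are covered by this sketch.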

Original language: English
Article number: 8900
Issue number: 22
State: Published - Nov 2022

Bibliographical note

Publisher Copyright:
© 2022 by the authors.

Keywords


  • 3D human pose estimation
  • Azure Kinect
  • depth sensor
  • extrinsic calibration
  • motion capture
  • multiple-camera setup
  • synchronization

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Biochemistry
  • Atomic and Molecular Physics, and Optics
  • Instrumentation
  • Electrical and Electronic Engineering

