Robot Information Intelligent Perception and Navigation Based on Multi-Sensor Fusion

Authors

  • Xibin Li School of Safety Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
  • Yiyang Luo Purchasing Department, Sinopec International Business Tianjin Co., Ltd., Tianjin 300042, Tianjin, China
  • Mingsong Bao School of Safety Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
  • Haoming Sun Technical Research and Development Department, Shandong Tesla Robot Co., Ltd., Yantai 264006, Shandong, China

Abstract

Sensors differ in time synchronization, data dimensionality, and sampling frequency, which makes the deep fusion of heterogeneous data difficult. In addition, existing high-precision fusion algorithms rely heavily on computational resources and cannot meet the needs of lightweight robots with limited computing power. To address these issues, this work applies a fusion method based on a multimodal convolutional neural network (2D-ResNet-50 + 3D-CNN) and a cross-modal attention mechanism. First, data preprocessing and time alignment bring the multi-sensor data into a synchronized, unified format. Then, a convolutional neural network extracts features from visual image data, LiDAR point cloud data, and inertial measurement unit (IMU) data, and the information from the different sensors is fused through the cross-modal attention mechanism. A modular architecture is used to optimize the computational efficiency of the system: the system is divided into multiple independent modules, each responsible for a specific task, and an event-triggered mechanism dynamically activates and schedules the relevant modules to enhance the system's intelligence. The method is deployed on NVIDIA's Jetson Xavier NX platform, and the experiments are conducted under the Robot Operating System (ROS) framework. Experiments show that the robot's control error does not exceed 0.25 in path tracking tasks, and that the path planning time does not exceed 150 milliseconds in the tested environments. The method improves perception precision while maintaining high real-time performance and efficiency with limited computational resources, significantly improving the robot's navigation performance in complex dynamic environments.
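
The cross-modal attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact formulation, so this assumes plain scaled dot-product attention in which feature vectors from one modality (here, image features) act as queries over feature vectors from another (here, point cloud features). The function names and the toy feature values are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across modalities (assumed form).

    Each query vector from one modality attends over the key/value
    vectors of another modality and returns a weighted sum of values.
    """
    dim = len(keys[0])
    out_dim = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Attention-weighted combination of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(out_dim)
        ])
    return fused


# Hypothetical toy features: 2 image tokens query 3 point cloud tokens.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
lidar_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(image_feats, lidar_feats, lidar_feats)
```

Each fused vector is a convex combination of the point cloud features, weighted toward those most similar to the querying image feature. A practical system would typically add learned projection matrices for queries, keys, and values, which this sketch omits.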

Keywords: multi-sensor fusion, robot navigation, deep learning, cross-modal attention mechanism, modular architecture

Cite As

X. Li, Y. Luo, M. Bao, H. Sun, "Robot Information Intelligent Perception and Navigation Based on Multi-Sensor Fusion",
Engineering Intelligent Systems, vol. 34, no. 2, pp. 273-284, 2026.

Published

2026-03-01