Hierarchical Multi-Agent Deep Reinforcement Learning Architecture in Complex Industrial Production Scheduling

Authors

  • Haibo Peng, Yunnan Open University, Kunming 650500, Yunnan, China
  • Guixiong Li, Baosight (Yunnan) Co., Ltd., Kunming 650500, Yunnan, China
  • Zhibo Zhang, Yunnan Open University, Kunming 650500, Yunnan, China
  • Rong Zhou, Baosight (Yunnan) Co., Ltd., Kunming 650500, Yunnan, China

Abstract


Complex industrial production scheduling problems are high-dimensional, dynamic, and multi-objective. Existing scheduling methods are usually based on mathematical programming or a single-agent architecture. They model complexity and dynamics inadequately, struggle with high-dimensional problems, and scale poorly. In particular, their performance is limited under multi-task collaborative optimization and real-time environmental changes. We therefore designed a hierarchical multi-agent deep reinforcement learning (DRL) architecture that separates global task allocation from local execution. A high-level deep Q-network (DQN) handles resource allocation, while low-level proximal policy optimization (PPO) handles fine-grained scheduling. A centralized training with decentralized execution (CTDE) framework coordinates the behavior of the agents. In addition, a graph neural network (GNN) models task dependencies, and the reward functions are designed to balance short-term and long-term objectives, improving scheduling efficiency, flexibility, and adaptability to uncertainty. Experimental results show that the hierarchical multi-agent DRL system significantly outperforms the compared algorithms on key indicators: task completion time, resource utilization, system delay rate, cost saving rate, and scheduling failure rate. In large-scale task scheduling, the average task completion time is reduced by 15%–25% compared with other methods, and the system delay rate is kept at around 5%. In an equipment failure recovery scenario in a dynamic environment, the scheduling failure rate drops rapidly from 18% to below 10%, demonstrating efficient global optimization and local adjustment. These results indicate that the hierarchical multi-agent method provides an efficient and flexible solution for complex industrial production scheduling, with both theoretical value and practical significance.
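
The two-level split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`high_level_allocate`, `low_level_sequence`, `schedule`) are hypothetical, the Q-values and policy logits are hand-set placeholders standing in for a trained DQN and a trained PPO policy, and the CTDE training loop, the GNN encoder, and the reward design are omitted entirely.

```python
import math
import random


def high_level_allocate(q_values, epsilon, rng):
    """Epsilon-greedy machine choice over per-machine Q-values.

    Assumption: q_values would come from a trained DQN; here they are
    plain numbers supplied by the caller.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda m: q_values[m])


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def low_level_sequence(tasks, logits, rng):
    """Sample a processing order without replacement from a softmax policy.

    Assumption: logits stand in for the output of a PPO-trained
    stochastic policy on one machine.
    """
    remaining = list(range(len(tasks)))
    order = []
    while remaining:
        probs = softmax([logits[i] for i in remaining])
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        remaining.remove(pick)
        order.append(tasks[pick])
    return order


def schedule(tasks, q_table, logits, n_machines, epsilon=0.0, seed=0):
    """Allocate every task to a machine, then order tasks on each machine."""
    rng = random.Random(seed)
    assignment = {m: [] for m in range(n_machines)}
    # High level: global allocation of tasks to machines.
    for i in range(len(tasks)):
        machine = high_level_allocate(q_table[i], epsilon, rng)
        assignment[machine].append(i)
    # Low level: local sequencing on each machine.
    plan = {}
    for machine, idxs in assignment.items():
        plan[machine] = low_level_sequence(
            [tasks[i] for i in idxs], [logits[i] for i in idxs], rng
        )
    return plan
```

Calling `schedule(["cut", "weld", "paint"], [[0.2, 0.9], [0.8, 0.1], [0.3, 0.7]], [0.0, 1.0, 2.0], n_machines=2)` with `epsilon=0.0` sends each task to the machine with the highest placeholder Q-value, then samples an order for the tasks on each machine. The point of the split is that the allocation step sees all tasks at once, while each sequencing step only sees the tasks on its own machine.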

Keywords: Complex Industrial Production Scheduling, Deep Reinforcement Learning, Hierarchical Multi-agent, Centralized Training with Decentralized Execution, Graph Neural Network

Cite As

H. Peng, G. Li, Z. Zhang, R. Zhou, "Hierarchical Multi-Agent Deep Reinforcement Learning Architecture in Complex Industrial Production Scheduling", Engineering Intelligent Systems, vol. 34, no. 2, pp. 221-233, 2026.



Published

2026-03-01