Researchers from NUST MISIS and NRU HSE found a way to enhance reinforcement learning for neural networks specialized in spatial orientation. By incorporating an attention mechanism, which allows a model to focus on the most relevant parts of its input when making predictions, they increased the efficiency of graph neural networks by up to 15%.
Neural networks are essential for autonomous devices navigating three-dimensional space, because the surrounding environment demands rapid reactions to changing conditions. This is one of the most challenging tasks, however, since the networks often lack complete information about their current surroundings, such as depth or terrain maps. Moreover, the network has limited knowledge of the reward, expressed as a mathematical function that guides learning: the reward is issued not incrementally but only once, after the entire task is completed.
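To make this sparse-reward setting concrete, here is a minimal Python sketch. The environment, state names, and reward values are invented for illustration and are not from the paper; the point is that the agent receives zero feedback at every intermediate step and a single reward only when the whole task is done.

```python
# Minimal sketch of a sparse-reward task (all names are illustrative).

def sparse_reward(state: str, goal_state: str) -> float:
    """The reward is issued only once, when the task is fully completed."""
    return 1.0 if state == goal_state else 0.0

# During an episode the agent gets no signal at intermediate steps,
# which is what makes learning difficult:
episode = ["start", "corridor", "room", "goal"]
rewards = [sparse_reward(s, "goal") for s in episode]
print(rewards)  # [0.0, 0.0, 0.0, 1.0]
```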
The authors of the study proposed a new method for forming the reward function that accounts for this specificity: the reward arrives only once, after the entire task is completed. The method is based on adding auxiliary intermediate rewards, a technique known as reward shaping. The scientists applied two improvements to a technique proposed by researchers from McGill University in Canada in 2020. The first uses advanced aggregation functions, and the second employs an attention mechanism. The advanced aggregation functions take into account both the order and the content of the observations the neural network receives.
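A standard formulation in this family of methods is potential-based reward shaping (Ng et al., 1999), which adds auxiliary rewards without changing the optimal policy. The sketch below illustrates that general idea under stated assumptions: the potential function is a hand-written placeholder table, whereas in the study the corresponding quantity would be produced by a graph neural network over the agent's trajectories.

```python
# Hedged sketch of potential-based reward shaping (Ng et al., 1999).
# The potentials below are hand-picked for illustration; in the paper
# the analogous signal would come from a graph neural network.

GAMMA = 0.99  # discount factor (illustrative value)

# Hypothetical potentials: states closer to the goal get higher values.
phi = {"start": 0.0, "corridor": 0.3, "room": 0.6, "goal": 1.0}

def shaped_reward(r_env: float, s: str, s_next: str) -> float:
    """Augment the sparse environment reward with the auxiliary term
    F(s, s') = gamma * phi(s') - phi(s)."""
    return r_env + GAMMA * phi[s_next] - phi[s]

# Intermediate steps now carry a learning signal even though the
# environment reward stays zero until the goal is reached:
print(shaped_reward(0.0, "start", "corridor"))  # 0.297
```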
The researchers conducted a series of experiments with shaped rewards, using two orientation tasks in virtual spaces: 4 Rooms, where the neural network learns in 16 spaces simultaneously, performing 5 million actions to find a box, and Labyrinth, which is randomly generated each time and requires 20 million training steps for the model to learn to find the exit. The scientists found that when the reward function is formed with the attention mechanism, the agent learns to focus on the edges of the graph that correspond to important transitions in the three-dimensional environment, namely those where the goal comes into the agent's field of view. This increases the efficiency of the neural networks by up to 15%. The details of the experiments are published in IEEE Access (a Q1 journal).
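How attention can single out such transitions is easy to illustrate with a toy example. In the NumPy sketch below, the edge features and query vector are invented stand-ins rather than the paper's trained model; the edge whose features signal "goal in view" receives the largest attention weight and therefore dominates the aggregated representation.

```python
import numpy as np

# Toy illustration of attention over graph edges (all values invented).

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

edge_features = np.array([[0.1, 0.0],   # ordinary corridor transition
                          [0.2, 0.1],   # ordinary room transition
                          [0.9, 0.8]])  # transition where the goal appears
query = np.array([1.0, 1.0])            # hypothetical learned query vector

scores = edge_features @ query           # relevance score of each edge
weights = softmax(scores)                # attention weights sum to 1
aggregated = weights @ edge_features     # attention-weighted aggregation
print(weights.round(3))  # [0.139 0.17  0.69]: the "goal in view" edge dominates
```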
“It was crucial for us to optimize the learning process specifically for graph neural networks. The graph cannot be observed as a whole directly, but for effective training of a graph neural network it is sufficient to consider its parts, which can be observed as individual trajectories of the agent’s movement. Thus, not all trajectory options are necessary for training. The application of the attention mechanism is a promising solution, as it significantly accelerates the learning process. This acceleration comes from taking into account the structure of the Markov process graph, which is unavailable to non-graph neural networks,” said Ilya Makarov, director of the artificial intelligence center at NUST MISIS and head of the AI in Industry group at the AIRI Institute.