
Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling

2020-12-07 19:17:13 田冠宇

In practice, it is common to face combinatorial optimization problems that combine uncertainty, non-determinism, and dynamicity. These three properties call for appropriate algorithms, and reinforcement learning (RL) handles them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem and apply it to the Cholesky factorization, an algorithm commonly executed in the high-performance computing community. In contrast to static scheduling, where tasks are assigned to processors in a predetermined order before the parallel execution begins, our method is dynamic: task allocations and their execution order are decided at runtime, based on the system state and unexpected events, which allows much more flexibility. To do so, our algorithm combines graph neural networks with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly. We show that this approach is competitive with the state-of-the-art heuristics used in high-performance computing runtime systems. Moreover, our algorithm does not require an explicit model of the environment, but we demonstrate that extra knowledge can easily be incorporated and improves performance. We also exhibit key properties provided by this RL approach and study its transfer abilities to other instances.
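To make the idea concrete, here is a minimal, illustrative PyTorch sketch of the general architecture described in the abstract: a small message-passing encoder over the task DAG, an actor head that scores only the currently schedulable ("ready") tasks, and a critic head that values the scheduler state. This is not the authors' implementation; the class names, node features, and the toy DAG below are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the paper's code): GNN encoder over the task DAG
# plus A2C-style actor-critic heads for dynamic task selection.
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """One round of mean-aggregation message passing over the task DAG."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, hid_dim)
        self.lin_neigh = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        # x: (num_tasks, in_dim) node features (e.g. task type, readiness, cost estimate)
        # adj: (num_tasks, num_tasks), adj[i, j] = 1 if task j depends on task i
        deg = adj.sum(dim=0, keepdim=True).clamp(min=1.0)   # number of predecessors per task
        neigh = (adj.t() @ x) / deg.t()                     # mean over predecessor features
        return torch.relu(self.lin_self(x) + self.lin_neigh(neigh))

class ActorCritic(nn.Module):
    """Actor scores each ready task; critic estimates the value of the scheduler state."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = GraphEncoder(in_dim, hid_dim)
        self.actor = nn.Linear(hid_dim, 1)    # per-task logit
        self.critic = nn.Linear(hid_dim, 1)   # state value from pooled node embeddings

    def forward(self, x, adj, ready_mask):
        h = self.encoder(x, adj)
        logits = self.actor(h).squeeze(-1)
        logits = logits.masked_fill(~ready_mask, float("-inf"))  # only ready tasks are selectable
        value = self.critic(h.mean(dim=0))
        return torch.distributions.Categorical(logits=logits), value

# Toy usage: 4 tasks with a dependency chain 0 -> 1 -> 2 and an independent task 3.
num_tasks, feat_dim = 4, 8
x = torch.randn(num_tasks, feat_dim)
adj = torch.zeros(num_tasks, num_tasks)
adj[0, 1] = adj[1, 2] = 1.0
ready = torch.tensor([True, False, False, True])  # tasks whose predecessors are all finished
policy, value = ActorCritic(feat_dim, 16)(x, adj, ready)
action = policy.sample()                          # task assigned to the next free processor
```

In an A2C-style training loop, the sampled action's log-probability and the critic's value estimate would be combined with the observed makespan-related reward to compute the policy and value losses; the sketch above only covers the forward pass.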

Original title: Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling

Original abstract: In practice, it is quite common to face combinatorial optimization problems which contain uncertainty along with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) is dealing with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem, and apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization. On the contrary to static scheduling, where tasks are assigned to processors in a predetermined ordering before the beginning of the parallel execution, our method is dynamic: task allocations and their execution ordering are decided at runtime, based on the system state and unexpected events, which allows much more flexibility. To do so, our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly. We show that this approach is competitive with state-of-the-art heuristics used in high-performance computing runtime systems. Moreover, our algorithm does not require an explicit model of the environment, but we demonstrate that extra knowledge can easily be incorporated and improves performance. We also exhibit key properties provided by this RL approach, and study its transfer abilities to other instances.

Original authors: Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux

Original link: https://arxiv.org/abs/2011.04333
