In the scene where multiple agents exist, agents complete a task in a cooperative way (maximizing a global cumulative reward), and users accessing the same channel can interact with each other. It is required that each agent not only consider the observed state of itself, but also consider the actions and strategies of other agents, so as to learn a strategy to maximize the global cumulative reward.
正在翻译中..