Based on MARL (Multi-Agent Reinforcement Learning),
this project provides a dynamic coverage control algorithm for UAV swarms.
The task is to plan the flight routes of the UAV swarm
so that every discrete PoI (Point of Interest) is covered for a required period of time.
Considering the particular demands of UAV swarm control,
this project further analyzes how to maintain the communication connectivity of the swarm during task execution (connectivity maintenance is defined later).
The problem can be formulated as follows: plan the UAV trajectories so that, within the given time horizon, every PoI accumulates at least its required coverage energy, subject to the UAVs' motion constraints.
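One plausible formalization consistent with this description, using symbols that are this sketch's assumptions ($x_i(t)$: position of UAV $i$, $q_j$: position of PoI $j$, $P(\cdot)$: received-power law, $E_j^{\mathrm{req}}$: required energy of PoI $j$, $T$: mission horizon):

```latex
% Every PoI j must accumulate its required energy within the horizon T,
% subject to the UAVs' motion and (optionally) connectivity constraints.
\int_0^T \sum_{i=1}^{N} P\!\left(\lVert x_i(t) - q_j \rVert\right)\,\mathrm{d}t
\;\ge\; E_j^{\mathrm{req}}, \qquad j = 1, \dots, M
```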
We build a dynamic coverage environment on top of Multiagent-Particle-Envs.
The class CoverageWorld inherits from multiagent.core.world; in its step() method, the transmitted power and accumulated energy are computed, and each PoI's state is updated (a PoI is marked as covered once its accumulated energy reaches its required energy).
The files under multiagent/scenarios/ describe the dynamic coverage scenario: coverage1.py is the version without connectivity maintenance, and coverage2.py is the version with it.
multiagent/render.py has been modified to display, in real time, the power currently received by each PoI and the communication links between the UAVs.
Other changes, such as adding connectivity-maintenance constraints and revising actions according to those constraints, are described later.
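A minimal sketch of the energy bookkeeping performed in the environment's step(). The coverage radius, power law, and names (received_power, step_energy) are assumptions of this sketch, not the project's actual parameters:

```python
import numpy as np

R_COV = 0.4  # assumed coverage radius of a UAV
P_MAX = 1.0  # assumed power received by a PoI at zero distance


def received_power(dist: float) -> float:
    """Power a PoI receives from one UAV: decays linearly to 0 at R_COV."""
    return max(0.0, P_MAX * (1.0 - dist / R_COV))


def step_energy(uav_pos, poi_pos, poi_energy, e_req, dt=0.1):
    """Accumulate energy at each PoI and return a per-PoI 'covered' flag."""
    uav_pos = np.asarray(uav_pos, dtype=float)
    covered = []
    for j, q in enumerate(np.asarray(poi_pos, dtype=float)):
        dists = np.linalg.norm(uav_pos - q, axis=1)
        power = sum(received_power(d) for d in dists)  # powers from UAVs add up
        poi_energy[j] += power * dt
        covered.append(poi_energy[j] >= e_req[j])
    return poi_energy, covered
```

A PoI directly under a UAV accumulates energy quickly; a PoI outside every UAV's coverage radius accumulates none.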
Each agent's observation includes its own position and velocity, the relative positions of the other agents and of the PoIs, and, for each PoI, its accumulated energy, its required energy, and a flag indicating whether its coverage is complete.
The agent's actions are moving forward, backward, left, or right, and keeping still.
As a purely cooperative scenario, all agents share the same team reward.
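The observation described above can be assembled into a flat vector along these lines; the exact ordering used by the project may differ, and make_obs is a name invented for this sketch:

```python
import numpy as np


def make_obs(self_pos, self_vel, other_pos, poi_pos, poi_energy, poi_req, poi_done):
    """Concatenate own state, relative agent positions, and per-PoI info."""
    self_pos = np.asarray(self_pos, dtype=float)
    parts = [self_pos, np.asarray(self_vel, dtype=float)]
    for p in other_pos:  # relative positions of the other agents
        parts.append(np.asarray(p, dtype=float) - self_pos)
    for q, e, r, d in zip(poi_pos, poi_energy, poi_req, poi_done):
        parts.append(np.asarray(q, dtype=float) - self_pos)  # relative PoI position
        parts.append(np.array([e, r, float(d)]))  # energy, required energy, done flag
    return np.concatenate(parts)
```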
If the swarm would lose connectivity at the next time step, a connectivity-preserving force is generated between the UAVs that are about to disconnect; the force satisfies a condition whose proof is omitted here.
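A minimal sketch of this rule, assuming a communication radius R_COMM and a simple linear attractive force (both are assumptions; the project's actual force law and connectivity test may differ):

```python
import numpy as np

R_COMM = 1.0  # assumed communication radius


def is_connected(pos, r=R_COMM):
    """BFS on the graph whose edges are UAV pairs within communication range."""
    n = len(pos)
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and np.linalg.norm(pos[i] - pos[j]) <= r:
                seen.add(j)
                stack.append(j)
    return len(seen) == n


def connectivity_force(pos, vel, dt=0.1, k=1.0):
    """Per-UAV corrective forces; zero if the next step stays connected."""
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    forces = np.zeros_like(pos)
    nxt = pos + vel * dt  # predicted positions at the next step
    if is_connected(nxt):
        return forces
    # Attract every pair that is currently in range but would leave it.
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            now = np.linalg.norm(pos[i] - pos[j])
            new = np.linalg.norm(nxt[i] - nxt[j])
            if now <= R_COMM < new:
                direction = (pos[j] - pos[i]) / (now + 1e-9)
                forces[i] += k * direction
                forces[j] -= k * direction
    return forces
```

Applying this force only when the predicted graph disconnects leaves the learned policy untouched as long as the swarm stays connected.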
The training results are shown below (runs 2 and 3 use connectivity preservation).
The MAPPO-based PyTorch code is in uav_dcc_control; it currently implements scenario 1 (coverage without connectivity constraints) and scenario 2 (connectivity-preserving coverage under rule-based constraints).
Scenario 3 (connectivity-preserving coverage with an action corrector) exists only in the TensorFlow code, which I can no longer follow; anyone comfortable with TF1 is welcome to take a look.
conda create -n dcc python=3.9
pip3 install torch torchvision torchaudio omegaconf wandb
pip install gym==0.10.5
pip install pyglet==1.5.27  # optional, needed only for rendering
python train.py 0  # the argument "0" selects cuda:0; if CUDA is unavailable, the CPU is used regardless of the value