DC-58 - Distributed Intelligence on a group of autonomous systems under resource and communication constraints

Computer sciences and mathematics

IMT Atlantique and Flinders

Brest (France) and Adelaide (Australia)

Host organizations

Hiring Institution
Institut Mines Telecom Atlantique (IMT)

PhD-Awarding Institutions
Institut Mines Telecom Atlantique (IMT)
Flinders University

Position Description

Proposed projects

Option 1

Distributed Qualitative Case-Based Reasoning and Learning Applied to a Team of Autonomous Maritime Vessels

In [1] we introduced a case-based reasoning (CBR) algorithm, called Qualitative Case-Based Reasoning and Learning (QCBRL), that represents cases by means of a Qualitative Spatial Reasoning (QSR) formalism, which also serves as the basis for the case retrieval and reuse methods. New cases are learned by partially running a Reinforcement Learning (RL) procedure. The central idea of QCBRL is to model domains using a qualitative spatial representation of directions called Elevated Oriented Point Relation Algebra (EOPRA) [2]. QCBRL creates a compact world model representation and solves the exponential space complexity problem described in [3], while also providing a qualitative (and therefore closer to human) conceptualisation of spatial information.

Case retrieval and reuse are executed in QCBRL using Conceptual Neighbourhood Diagrams (CND) [4] and a qualitative similarity function. These tools are used to compute the similarity between a new problem and each element of the case base, retrieving the case most similar to a given situation and reusing its description to solve the new problem.

New cases are learned by executing a partial RL method; that is, when no similar case is available, QCBRL iterates the RL method (via multiple simulations) until the first successful episode occurs. The set of states and actions of this successful episode is then stored as a new case. Case-base maintenance is performed by keeping a trust value for each case, which is incremented when the retrieved case solves the problem and decremented when it does not. Since the partial RL algorithm may return non-optimal actions, cases whose trust values meet the removal conditions are deleted. With these procedures, the method described in [1] executes the complete CBR cycle of Retrieval, Reuse, Revision and Retention [5] with an explicit representation of the world that allows both human readability and direct system accountability.
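
To make the retrieval and maintenance steps more concrete, the following Python fragment is a minimal sketch only, not the implementation of [1]. It assumes a toy qualitative description in which each observed object is a (direction sector, distance ring) pair, a neighbourhood distance that simply counts sector and ring steps, and hypothetical values for the similarity threshold and the trust-removal condition. All names (`Case`, `retrieve`, `revise`, `NUM_SECTORS`, `NUM_RINGS`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

NUM_SECTORS = 8  # assumed angular granularity (hypothetical)
NUM_RINGS = 4    # assumed number of distance rings (hypothetical)


@dataclass
class Case:
    problem: tuple[tuple[int, int], ...]  # one (sector, ring) pair per object
    actions: list[str]                    # actions of the successful episode
    trust: int = 0


def neighbourhood_distance(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of sector and ring steps separating two qualitative relations."""
    sector_gap = abs(a[0] - b[0]) % NUM_SECTORS
    sector_gap = min(sector_gap, NUM_SECTORS - sector_gap)  # sectors wrap around
    return sector_gap + abs(a[1] - b[1])


def similarity(p, q) -> float:
    """1.0 for identical descriptions, decreasing towards 0.0."""
    if len(p) != len(q) or not p:
        return 0.0
    max_gap = NUM_SECTORS // 2 + (NUM_RINGS - 1)
    total = sum(neighbourhood_distance(a, b) for a, b in zip(p, q))
    return 1.0 - total / (max_gap * len(p))


def retrieve(case_base: list[Case], problem, threshold: float = 0.8):
    """Return the most similar case, or None if nothing is similar enough."""
    best = max(case_base, key=lambda c: similarity(c.problem, problem), default=None)
    if best is None or similarity(best.problem, problem) < threshold:
        return None  # the caller would fall back to the partial RL procedure
    return best


def revise(case_base: list[Case], case: Case, solved: bool, min_trust: int = -2) -> None:
    """Update the trust value and delete the case once it is no longer trusted."""
    case.trust += 1 if solved else -1
    if case.trust <= min_trust:
        case_base.remove(case)
```

When `retrieve` returns `None`, a new case would be learned from the first successful RL episode and appended to the case base; that learning step is deliberately left out of this sketch.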

In this preliminary work, although the environment was perceived by multiple agents, the reasoning was centralised on a single agent. In addition, the experiments were conducted on a constrained robot soccer scenario. The aim of this PhD project is to investigate distributed implementations of QCBRL in the domain of hybrid teams of autonomous maritime vessels [6] executing search and rescue tasks. Multi-Agent Reinforcement Learning (MARL) is a promising approach that we will consider to learn typical cases and retrain the distributed system in the field. Recent theoretical results [7] show how the convergence of a truly decentralized MARL can be guaranteed. Navigation tasks can be based on pre-trained models, but they are more efficient if the agents can learn online from their actions while detecting and identifying obstacles and targets, which are used to asynchronously update a partially shared model of the environment [8].
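
As a rough illustration of what "truly decentralized" can mean in practice, the sketch below shows a single building block often used in networked multi-agent learning: each agent repeatedly averages its local parameter vector with those of its direct neighbours, so that the team agrees on shared parameters without any central node. This is a simplified sketch under our own assumptions (uniform weights, synchronous rounds, a fixed communication graph, no local learning update) and is not the algorithm of [7]; the function names are hypothetical.

```python
def consensus_step(params, neighbours):
    """One round: every agent averages its vector with its neighbours' vectors."""
    updated = {}
    for agent, own in params.items():
        group = [own] + [params[n] for n in neighbours[agent]]
        # Component-wise mean over the agent itself and its neighbours.
        updated[agent] = [sum(values) / len(group) for values in zip(*group)]
    return updated


def run_consensus(params, neighbours, rounds):
    """Apply several rounds; the input dictionary is left unchanged."""
    for _ in range(rounds):
        params = consensus_step(params, neighbours)
    return params


# Three vessels on a line: only adjacent vessels can communicate.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
start = {"a": [0.0], "b": [3.0], "c": [9.0]}
final = run_consensus(start, neighbours, rounds=50)
```

With uniform weights the agents converge to a common value that lies between the initial extremes but is not necessarily the exact average; in a full MARL scheme such an averaging step would be interleaved with each agent's local learning update.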

A second objective of this PhD project is to test and compare the proposed solutions in terms of efficiency, adaptability and implementation feasibility with a hardware-in-the-loop approach [9]. With such an approach, each agent is implemented on a real-life embedded board (e.g. NVIDIA Jetson boards) connected to a virtual-reality simulator. In this regard, the AirSim simulation platform (https://microsoft.github.io/AirSim/), based on the Unreal Engine, seems to be an interesting solution for providing a scalable and photo-realistic training environment.

Part of this research will be carried out in collaboration with industrial partners in order to study, build and characterize a prototype of the system.

[1] Thiago Pedro Donadon Homem, Paulo Eduardo Santos, Anna Helena Reali Costa, Reinaldo Augusto da Costa Bianchi, and Ramon Lopez de Mantaras. Qualitative case-based reasoning and learning. Artificial Intelligence, 283:103258, 2020.
[2] Reinhard Moratz and Jan Oliver Wallgrun. Spatial reasoning with augmented points: Extending cardinal directions with local distances. Journal of Spatial Information Science, 5(5):1–30, dec 2012.
[3] Stefano V. Albrecht and Peter Stone. Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence, 258:66 – 95, 2018.
[4] Christian Freksa. Temporal reasoning based on semi-intervals. Artificial Intelligence, 54(1-2):199–227, March 1992.
[5] Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, March 1994.
[6] Frank Dylla. Qualitative spatial reasoning for navigating agents – behavior formalization with qualitative representations. pages 98–128, 01 2009.
[7] K. Zhang, Z. Yang, H. Liu, T. Zhang and T. Basar, Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents, Proceedings of the 35th Int. Conference on Machine Learning, 2018
[8] H.V. Nguyen, H. Rezatofighi, B.N. Vo, D.C. Ranasinghe, Multi-objective multi-agent planning for jointly discovering and tracking mobile objects. Proc. of the 34th AAAI Conf. on Artificial Intelligence, 2020
[9] E. Moréac, E. Abdali, F. Berry, D. Heller and J-Ph. Diguet, Hardware-in-the-loop simulation with dynamic partial FPGA reconfiguration applied to computer vision in ROS-based UAV, 31st Int. Work. on Rapid System Prototyping (RSP) 2020.

Option 2

Collaborative Mission Planning and Control of Autonomous Maritime Vehicles

The exploration of unknown environments often presents unforeseen challenges and inherent risks due to the uncertainties involved. While single uninhabited systems are capable of completing complex missions, the introduction of multi-robot teams can bring an increased level of efficiency to a given task, especially when the task covers a large area and mission completion time is a critical constraint. The use of heterogeneous robotic teams comprising autonomous underwater vehicles (AUVs), autonomous surface vehicles (ASVs) and seabed crawler vehicles can further provide a higher degree of flexibility and redundancy when analysing complex environments with diverse environmental conditions. The challenges of traditional team-based localisation and control techniques are, however, considerably magnified in the underwater domain by the lower reliability and potential asynchronicity of underwater acoustic communications compared to RF-based communication in the above-water domain. This complicates mission tasking approaches for hybrid teams of autonomous marine vehicles, in particular obtaining a shared understanding of the environment and of the team status, and thus requires new solutions for control, coordination, collaboration and communication.

This project will investigate efficient reasoning processes, communication strategies and underlying low-level control mechanisms necessary to coordinate heterogeneous teams of autonomous marine vehicles, in dynamic and uncertain environments. In this work, the autonomous agents have to achieve a common agreement, via a negotiation procedure, in order to solve complex problems collaboratively. Under this collaborative framework, each agent will have distinct resources, abilities, viewpoints and priorities. We expect this solution to be general enough to allow for heterogeneous teams operating across air, surface and underwater domains, and also between artificial and human agents. Therefore, our aim is the development of a mixed-initiative system [1], where the interaction and negotiation between the agents will maximise their resources in order to optimise the successful execution of a common task.

The negotiation procedure between vehicles will be conducted in game-theoretic terms [2], where the agent interaction is modelled as a cooperative game and the Nash equilibrium (representing the agents’ agreement) will be obtained by online distributed algorithms [3]. This provides efficient task allocation solutions that can be easily extended to consider outside threats along with team collaboration, where the interactions with additional agents are modelled as a non-cooperative game [4]. Negotiation, however, occurs only when there is some level of conflict in the perceived states, in the actions assigned across the team or in the availability of resources within the heterogeneous team. To obtain an efficient problem-solving policy for any given problem, we propose to use a novel algorithm, Qualitative Case-Based Reasoning and Learning (QCBRL) [5]. In QCBRL, cases are predetermined solutions for groups of autonomous agents that can be adapted to similar situations. A reinforcement learning (RL) module enables the team of agents to learn new solutions to unforeseen situations at runtime, without assuming a pre-processing step. Extending QCBRL with the game-theoretic delegation model to a team of underwater vehicles is one of the contributions of this project. The interactions between agents during the QCBRL machine learning step will be guided by the search for a Nash equilibrium, guaranteeing a maximal utility value (or common agreement) between the agents.
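
The notion of agreement as a Nash equilibrium can be illustrated on a very small task-allocation game. The Python sketch below is only an illustration under invented assumptions: two vehicles, two tasks, and made-up task values and costs. It finds pure-strategy equilibria by brute-force enumeration, which is not the online distributed method of [3] and would not scale to realistic teams.

```python
from itertools import product

AGENTS = ("auv", "asv")        # hypothetical team
TASKS = ("survey", "relay")    # hypothetical tasks
TASK_VALUE = {"survey": 5, "relay": 4}  # invented team reward per covered task
COST = {                       # invented individual cost of each task
    ("auv", "survey"): 1, ("auv", "relay"): 3,
    ("asv", "survey"): 3, ("asv", "relay"): 1,
}


def utility(agent_index, profile):
    """Team value of the covered tasks minus the agent's own cost."""
    team_value = sum(TASK_VALUE[task] for task in set(profile))
    return team_value - COST[(AGENTS[agent_index], profile[agent_index])]


def is_nash(profile):
    """True if no single agent gains by unilaterally switching task."""
    for i in range(len(AGENTS)):
        current = utility(i, profile)
        for alternative in TASKS:
            deviation = profile[:i] + (alternative,) + profile[i + 1:]
            if utility(i, deviation) > current:
                return False
    return True


def pure_nash_equilibria():
    """All pure-strategy equilibria, found by exhaustive enumeration."""
    return [p for p in product(TASKS, repeat=len(AGENTS)) if is_nash(p)]


def agreed_allocation():
    """Pick the equilibrium with the highest summed utility (one possible rule)."""
    return max(
        pure_nash_equilibria(),
        key=lambda p: sum(utility(i, p) for i in range(len(AGENTS))),
    )
```

With these invented numbers the game has two equilibria, in each of which the vehicles take different tasks; `agreed_allocation()` selects the one where the AUV surveys and the ASV relays. Choosing among several equilibria by summed utility is an assumption of this sketch.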

Part of this research will be carried out in collaboration with industrial partners in order to study, build and characterize a prototype of the system.

[1] Jean Bouchard, Jonathan Gaudreault, Claude-Guy Quimper, Philippe Marier, Edith Brotherton, and Nathaniel Simard. Mixed-initiative system for tactical planning allowing real-time constraint insertions. IFAC-PapersOnLine, 50(1):15233 – 15240, 2017. 20th IFAC World Congress.
[2] Simon Parsons and Michael Wooldridge. Game theory and decision theory in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 5:243–254, 09 2002.
[3] K. Lu, G. Li, and L. Wang. Online distributed algorithms for seeking generalized Nash equilibria in dynamic environments. IEEE Transactions on Automatic Control, pages 1–1, 2020.
[4] Cheng-Kuang Wu and Xingwei Hu. A game theory approach for multi-agent system resources allocation against outside threats. Journal of Risk Analysis and Crisis Response, 9, 11 2019.
[5] Thiago Homem, Paulo E. Santos, Anna H. Reali Costa, Reinaldo A. C. Bianchi, and Ramon Lopez de Mantaras. Qualitative case-based reasoning and learning. Artificial Intelligence, 283:103258, 2020.

Option 3

Case-Based Reasoning with Multi-Agent Reinforcement Learning on a Distributed System Including Humans

As in the other two PhD projects, we consider a swarm of autonomous or remote-controlled vehicles or robots carrying out a mission in an environment that is dynamic and not fully known. A typical scenario consists of a search and rescue mission involving UAVs (unmanned aerial vehicles) together with AGVs (autonomous ground vehicles), robots or USVs (unmanned surface vehicles) on the water. In this context, UAVs can explore and detect from a top view, while AGVs, USVs and robots can explore and detect from a side view and perform the rescue mission. All of these vehicles and robots have embedded computing, storage and communication resources, which are limited but can be used to learn and execute.

This PhD project aims to investigate the use of case-based reasoning [1] to decide on and apply the best strategy to maximize mission success under the following real-life constraints: i) a truly distributed system with ego-centric viewpoints and unsynchronized agent states, world knowledge and models, ii) limited communications that allow neither a centralized solution nor a synchronous approach and iii) limited computing and storage resources. Case-based reasoning relies on four main steps: case retrieval, adaptation, evaluation and training. The first important objective of the PhD project is to investigate how to leverage the group of agents to parallelize the execution of each step, including retraining, which can be extremely computationally demanding. Multi-Agent Reinforcement Learning (MARL) is a promising approach that we will consider to learn typical cases and retrain the distributed system in the field. Recent theoretical results [2] show how the convergence of a truly decentralized MARL can be guaranteed. Navigation tasks can be based on pre-trained models, but they are more efficient if the agents can learn online from their actions while detecting and identifying obstacles and targets, which are used to asynchronously update a partially shared model of the environment [3].

The second objective of the PhD project is to consider a hybrid team of machines and humans. There are different ways to include humans in such a system. We are interested in studying how a truly distributed group of artificial agents with local perceptions can adapt to and learn from human behaviours and decisions, and then how humans adapt their strategies accordingly. We aim to test and compare the proposed approach within an environment where humans can control some of the drones or robots. The COGMENT framework [5], recently introduced by AIR (https://ai-r.com/), is a good candidate, as it is the first one that allows Human-MARL systems to be trained and tested. The second dimension to explore is the role of human observers who can, from time to time, adapt the rewards [6] based on subjective criteria.

Part of this research will be carried out in collaboration with industrial partners in order to study, build and characterize a prototype of the system.

[1] T.P. Donadon Homem, P.E. Santos, A.H.R. Costa, R.A. da Costa Bianchi, and R.L. de Mantaras. Qualitative case-based reasoning and learning. Artificial Intelligence, 2020.
[2] K. Zhang, Z. Yang, H. Liu, T. Zhang and T. Basar, Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents, Proceedings of the 35th Int. Conference on Machine Learning, 2018
[3] H.V. Nguyen, H. Rezatofighi, B.N. Vo, D.C. Ranasinghe, Multi-objective multi-agent planning for jointly discovering and tracking mobile objects. Proc. of the 34th AAAI Conf. on Artificial Intelligence, 2020
[4] P. Gautier, J. Laurent, J-Ph. Diguet. Deep Q-Learning-Based Dynamic Management of a Robotic Cluster. IEEE Transactions on Automation Science and Engineering, 2022.
[5] N. Navidi, F. Chabot, S. Kurandwad, I. Lutigman, V. Robert, G. Szriftgiser and A. Schuch, Human and Multi-Agent collaboration in a human-MARL teaming framework, arXiv:2006.07301v2.
[6] P. Christiano, J. Leike, T.B. Brown, M. Martic, S. Legg and D. Amodei, Deep Reinforcement Learning from Human Preferences, NIPS 2017.

Supervisors

Amer Baghdadi
Paulo Santos
Karl Sammut

Research Areas

Artificial Intelligence, Multi-agent systems, Embedded Systems, Maritime Engineering.