Learning Trustworthy Planning Algorithms (LTPA)

This project tackles one of the key challenges in AI: bridging the gap between learning and reasoning. Each approach has solved challenging problems on its own, but we need to combine their strengths to make the next leap forward in AI. This project will contribute to this goal by using machine learning to automate the design of trustworthy planning algorithms. In particular, we aim to learn optimal planning algorithms, dynamic planning algorithms, and trustworthy agent behavior.

Solving these scientific challenges will allow us to build planners that scale to industrial-size tasks and that can be trusted to support humans.

Background

The project is situated in the area of automated planning, one of the original core AI research fields. Automated planning is the task of finding an action sequence that transforms a given initial state into one that satisfies a given goal condition. Traditionally, planning algorithms are domain-independent, that is, they can solve any planning task from any application domain for which there is a model defining the action dynamics. Due to this generality, planners have been successfully used in a wide range of industrial settings, such as robot task planning, space mission planning, computer games, analyzing computer network vulnerabilities, controlling elevators, and automating greenhouses.
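To make this setting concrete, here is a minimal sketch in Python of a planning task in a STRIPS-like facts-and-actions representation, together with a blind breadth-first search for a plan. The dictionary-based action format and the toy logistics task are illustrative assumptions; real planners read such models from languages like PDDL.

```python
from collections import deque

def applicable(state, action):
    """An action is applicable if its precondition holds in the current state."""
    return action["pre"] <= state

def apply_action(state, action):
    """Applying an action removes its delete effects and adds its add effects."""
    return (state - action["del"]) | action["add"]

def breadth_first_plan(initial_state, goal, actions):
    """Return a shortest action sequence from the initial state to a goal state."""
    initial_state = frozenset(initial_state)
    queue = deque([(initial_state, [])])
    visited = {initial_state}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            return plan
        for action in actions:
            if applicable(state, action):
                successor = frozenset(apply_action(state, action))
                if successor not in visited:
                    visited.add(successor)
                    queue.append((successor, plan + [action["name"]]))
    return None  # the task has no solution

# Toy task: move a package from A to B by loading it, driving the truck, and unloading it.
actions = [
    {"name": "load", "pre": {"truck-at-A", "package-at-A"},
     "add": {"package-in-truck"}, "del": {"package-at-A"}},
    {"name": "drive-A-B", "pre": {"truck-at-A"},
     "add": {"truck-at-B"}, "del": {"truck-at-A"}},
    {"name": "unload", "pre": {"truck-at-B", "package-in-truck"},
     "add": {"package-at-B"}, "del": {"package-in-truck"}},
]
print(breadth_first_plan({"truck-at-A", "package-at-A"}, {"package-at-B"}, actions))
# ['load', 'drive-A-B', 'unload']
```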

We aim to tackle three main scientific challenges within this project.

Learning optimal planning algorithms

The main approach to solving automated planning tasks is state space search with a heuristic, that is, a function that estimates the distance of a given state to the nearest goal state. The heuristics that we created for optimal classical planning are the state of the art in the field. However, devising, understanding, and implementing a heuristic is a tedious and time-intensive task. More importantly, the design space of heuristics is huge, and it is infeasible to explore it manually in a meaningful way. Therefore, we aim to learn heuristics automatically with machine learning.
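For reference, the sketch below shows the standard search scheme in this setting: A* guided by a heuristic h. The successors(state) interface, assumed here to yield (action, successor, cost) triples, is a placeholder for the task model; with an admissible heuristic, this search returns a cheapest plan.

```python
import heapq
import itertools

def astar(initial_state, is_goal, successors, h):
    """Return a cheapest plan if h never overestimates the true goal distance."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(h(initial_state), 0, next(counter), initial_state, [])]
    best_g = {initial_state: 0}
    while frontier:
        f, g, _, state, plan = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # a cheaper path to this state was found in the meantime
        if is_goal(state):
            return plan
        for action, successor, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(successor, float("inf")):
                best_g[successor] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h(successor), new_g, next(counter), successor, plan + [action]),
                )
    return None  # no plan exists
```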

Existing work on this topic usually learns black-box models such as deep neural networks. In contrast, we will learn explainable models, that is, models with a predetermined target structure for representing information. Explainable models not only allow reasoning about their decisions, but also often generalize better to unseen data. Most importantly, we will address one of the main drawbacks of heuristic functions approximated by machine learning models: they usually come with no guarantees about the learned function. However, to find provably cheapest plans, we need admissible heuristics, that is, distance estimators that never overestimate the true goal distance. Therefore, instead of learning the heuristic directly, as done in all previous work, we will use machine learning to learn how to build heuristics that never overestimate goal distances by design. Using this novel angle of attack, we will be the first to construct admissible heuristics with machine learning.
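To illustrate what "admissible by design" means, here is a small sketch of one classical construction: the pointwise maximum of admissible estimates is itself admissible, regardless of how the individual components were obtained, for example, by a learned model.

```python
def max_heuristic(components):
    """Combine admissible heuristics into a single admissible heuristic.

    If every component satisfies h_i(s) <= h*(s), where h* is the true goal
    distance, then max_i h_i(s) <= h*(s) as well, so the combined heuristic
    never overestimates by construction.
    """
    def h(state):
        return max(h_i(state) for h_i in components)
    return h
```

A stronger combination used in modern optimal planners is cost partitioning, which splits the action costs among the components so that even their sum remains admissible.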

Learning dynamic planning algorithms

Today's planners are highly parameterized systems, and different planning tasks require different planner configurations to be solved efficiently. Consequently, researchers (including ourselves) have developed systems that automatically choose which planner configuration to run for a new task. However, all of these systems use the same static configuration for the whole solving process, even though different parts of the search space might require different algorithms. For example, it can be beneficial to switch between different heuristics while solving a single planning task. Therefore, we will use reinforcement learning to obtain planning algorithms that dynamically adapt their behavior to different parts of the search space while they search for solutions. The result of our work will be the first fully dynamic, domain-independent planner.
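The following sketch illustrates the idea, assuming a hypothetical learned controller with a choose_heuristic(features) method: instead of committing to a single heuristic up front, the search asks the controller before each expansion which of several heuristics to trust, based on simple statistics of the ongoing search.

```python
import heapq
import itertools

def dynamic_greedy_search(initial_state, is_goal, successors, heuristics, controller):
    """Greedy best-first search whose guiding heuristic is chosen per expansion."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(0, next(counter), initial_state, [])]
    visited = {initial_state}
    expansions = 0
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        expansions += 1
        # The controller observes simple search statistics and picks a heuristic
        # (hypothetical interface; in the project this would be a learned policy).
        features = {"expansions": expansions, "frontier_size": len(frontier)}
        h = heuristics[controller.choose_heuristic(features)]
        for action, successor, _ in successors(state):
            if successor not in visited:
                visited.add(successor)
                heapq.heappush(
                    frontier,
                    (h(successor), next(counter), successor, plan + [action]),
                )
    return None  # no plan exists
```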

Beyond automated planning, the general dynamic algorithm configuration framework for learning adaptive solvers is relevant to many other research and industrial settings. Many (if not all) parameterized solvers can benefit from adapting their behavior to the given input, which is why we believe that our work will spark further research into this topic and produce more efficient solvers.

Learning trustworthy agent behavior

Recent years have seen a tremendous amount of progress in the field of artificial intelligence. For example, computer agents can now predict protein structures and play the game of Go far better than human experts. The drawback of these agents is that they learn a black-box behavior model that is impossible for humans to understand and trust. As a consequence, it is dangerous for humans to interact with agents using such models. This includes both physical agents such as robots and nonphysical agents such as computer programs. To obtain trustworthy and reliable agents, we will make a radical departure from black-box models. Instead, we will let the agents learn interpretable models for defining their behavior, based on well-understood modeling languages that humans can reason about.

An agent's behavior is usually modeled by a policy that defines which action to choose in a given state. We will begin by learning policies in the most fundamental setting, deterministic and fully observable environments, and then gradually extend our approach to more complex scenarios such as probabilistic actions and partially observable world states. To let our algorithms scale to large real-world scenarios, we will learn the policies incrementally, starting with simple models and refining them only where necessary. At the end of our project, we will be able to deploy our agents with confidence in settings where they assist and interact with humans.
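To make the notion of an interpretable policy concrete, the sketch below represents a policy as an ordered list of human-readable rules and shows how such a model can justify its decisions. The rule format and the refine step are illustrative assumptions, not the project's actual learning method.

```python
class RulePolicy:
    """A policy given by an ordered list of human-readable rules; the first matching rule fires."""

    def __init__(self, rules, default_action):
        # Each rule is a triple (description, condition, action), where
        # condition is a predicate over states and action is the chosen action.
        self.rules = rules
        self.default_action = default_action

    def act(self, state):
        for description, condition, action in self.rules:
            if condition(state):
                return action
        return self.default_action

    def explain(self, state):
        """Return the rule that justifies the chosen action, for human inspection."""
        for description, condition, action in self.rules:
            if condition(state):
                return f"{description} -> {action}"
        return f"no rule matched -> {self.default_action}"

def refine(policy, condition, action, description):
    """Sketch of incremental refinement: add a more specific rule in front of the
    existing ones to fix the policy's behavior where it chose a bad action."""
    policy.rules.insert(0, (description, condition, action))
```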

PI: Jendrik Seipp

Funding: This project is partially supported by the Zenith research organization from the Faculty of Science and Engineering at Linköping University.