In the game-theoretic approach to controller synthesis, we model the interaction between a system to control and its environment as (some variation of) a game between these entities, and we look for an appropriate (e.g., winning or optimal) strategy for the system. This strategy then constitutes a formal blueprint for a real-world controller for the system (Randour, 2013; Clarke et al., 2018).
The general consensus is that simple (e.g., limited-memory) strategies are better: the corresponding controllers are easier to conceive and understand, and cheaper to produce and maintain. Yet, the traditional formal model used to represent strategies is some sort of finite automaton (Fijalkow et al., 2023), which is usually far from what we expect in practice. In particular, our theoretical measures of strategy complexity are somewhat disconnected from what can be considered complex in implemented controllers. Hence, we challenge the key concept of strategy and how its complexity is assessed, and we study alternative strategy models (e.g., augmented automata, decision trees, RNNs) over the complete span from theoretical power to practical applicability. My FNRS project ControlleRS is dedicated to this research program.
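To fix ideas, a finite-memory strategy in the automata-based sense is essentially a Mealy machine: a finite set of memory states updated along the play, together with an output function prescribing actions. The following minimal sketch (all names are illustrative, not tied to any specific tool) shows the idea:

```python
# A minimal sketch of a finite-memory strategy as a Mealy machine: a finite
# set of memory states, an update function reading observations (colors),
# and an output function choosing the next action.

class MealyStrategy:
    def __init__(self, initial, update, output):
        self.state = initial
        self.update = update      # (memory state, observation) -> memory state
        self.output = output      # memory state -> action

    def act(self, observation):
        self.state = self.update(self.state, observation)
        return self.output(self.state)

# Toy example: with one bit of memory, switch between two actions
# ("go_left"/"go_right") each time color "b" is observed.
strat = MealyStrategy(
    initial=0,
    update=lambda m, obs: 1 - m if obs == "b" else m,
    output=lambda m: "go_left" if m == 0 else "go_right",
)

print([strat.act(c) for c in ["a", "b", "a", "b"]])
# ['go_left', 'go_right', 'go_right', 'go_left']
```

The gap mentioned above is visible even here: the number of memory states is a natural theoretical complexity measure, but it says little about how readable or maintainable the corresponding controller code would be.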
References
ECCS
Automated Synthesis of Reliable and Efficient Systems Through Game Theory: A Case Study
Mickael Randour
In Proceedings of the European Conference on Complex Systems 2012, ECCS 2012, Brussels, Belgium, September 2-7, 2012, 2013
Reactive computer systems bear inherent complexity due to continuous interactions with their environment. While this environment often proves to be uncontrollable, we still want to ensure that critical computer systems will not fail, no matter what they face. Examples are legion: railway traffic, power plants, plane navigation systems, etc. Formal verification of a system may ensure that it satisfies a given specification, but only applies to an already existing model of a system. In this work, we address the problem of synthesis: starting from a specification of the desired behavior, we show how to build a suitable system controller that will enforce this specification. In particular, we discuss recent developments of that approach for systems that must ensure Boolean behaviors (e.g., reachability, liveness) along with quantitative requirements over their execution (e.g., never run out of fuel, ensure a suitable mean response time). We notably illustrate a powerful, practically usable algorithm for the automated synthesis of provably safe reactive systems.
Book
Handbook of Model Checking
Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem
Randomness in strategies
Optimal strategies may require randomization (i.e., the ability to take probabilistic decisions) when dealing with inherently probabilistic goals, balancing multiple objectives, or in contexts of partial information. There are multiple ways to augment strategies (and related strategy representation models) with randomness. We study the expressive power of these different types of randomness: we notably establish a complete taxonomy for finite-memory strategies, valid up to multi-player games of imperfect information, in (Main & Randour, 2022).
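As a rough sketch of the objects involved (all names here are ours, purely illustrative), a randomised finite-memory strategy can carry randomness in three places, its initialisation, its outputs, and its transitions, matching the components varied in the taxonomy below:

```python
import random

# Illustrative sketch of the three components of a finite-memory strategy
# that can be randomised: initialisation, outputs, and transitions.

def sample(dist):
    """Draw a key of `dist` with probability proportional to its value."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

class RandomisedFiniteMemoryStrategy:
    def __init__(self, init_dist, output_dist, update_dist):
        self.state = sample(init_dist)    # randomised initialisation
        self.output_dist = output_dist    # memory state -> {action: prob}
        self.update_dist = update_dist    # (state, obs) -> {state: prob}

    def act(self, observation):
        # Randomised transition: sample the next memory state...
        self.state = sample(self.update_dist[(self.state, observation)])
        # ...then randomised output: sample an action from the state's mix.
        return sample(self.output_dist[self.state])

# With point (Dirac) distributions, each component degenerates to the
# deterministic case; putting genuine probability mixes in one, two, or all
# three spots yields the different strategy classes of the taxonomy.
strat = RandomisedFiniteMemoryStrategy(
    init_dist={0: 1.0},
    output_dist={0: {"wait": 1.0}, 1: {"move": 1.0}},
    update_dist={(0, "go"): {1: 1.0}, (1, "go"): {1: 1.0}},
)
print([strat.act("go"), strat.act("go")])  # ['move', 'move']
```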
References
CONCUR
Different Strokes in Randomised Strategies: Revisiting Kuhn’s Theorem Under Finite-Memory Assumptions
Two-player (antagonistic) games on (possibly stochastic) graphs are a prevalent model in theoretical computer science, notably as a framework for reactive synthesis. Optimal strategies may require randomisation when dealing with inherently probabilistic goals, balancing multiple objectives, or in contexts of partial information. There is no unique way to define randomised strategies. For instance, one can use so-called mixed strategies or behavioural ones. In the most general settings, these two classes do not share the same expressiveness. A seminal result in game theory - Kuhn’s theorem - asserts their equivalence in games of perfect recall. This result crucially relies on the possibility for strategies to use infinite memory, i.e., unlimited knowledge of all past observations. However, computer systems are finite in practice. Hence it is pertinent to restrict our attention to finite-memory strategies, defined as automata with outputs. Randomisation can be implemented in these in different ways: each of the initialisation, the outputs, and the transitions can be randomised or deterministic. Depending on which aspects are randomised, the expressiveness of the corresponding class of finite-memory strategies differs. In this work, we study two-player turn-based stochastic games and provide a complete taxonomy of the classes of finite-memory strategies obtained by varying which of the three aforementioned components are randomised. Our taxonomy holds both in settings of perfect and imperfect information, and in games with more than two players.
Memory bounds
The vast majority of simple objectives (which encode specifications) in two-player (stochastic) games admit memoryless optimal strategies. That is, controllers for the system only need to look at the current state to act optimally, regardless of what happened in the past. This usually does not hold for more complex objectives, such as conjunctions of simple objectives, which are needed to accurately model the richness of real-world situations. An important research area is therefore devoted to understanding which objectives require memory, and how much. We notably characterize the objectives for which arena-independent memory suffices, in a variety of settings (Bouyer et al., 2022; Bouyer et al., 2023; Bouyer et al., 2023; Bouyer et al., 2022), or provide tight bounds for specific types of objectives (Bouyer et al., 2022).
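A classic toy example conveys why conjunctions call for memory. Consider a single state with two self-loops colored a and b, and a conjunction of two Büchi objectives: see a infinitely often and see b infinitely often. The sketch below (illustrative code, not taken from the cited works) contrasts a memoryless strategy, which always repeats the same choice, with a one-bit-of-memory strategy:

```python
# One state, two self-loops colored "a" and "b"; objective: a conjunction
# of two Büchi conditions (see "a" infinitely often AND "b" infinitely
# often). Illustrative code; we approximate the infinite play by a prefix.

def colors_seen(strategy, steps=10):
    """Simulate `steps` rounds; a strategy maps the history to a loop color."""
    history = []
    for _ in range(steps):
        history.append(strategy(history))
    return set(history)

def memoryless(history):
    # A memoryless strategy ignores the history: it always takes the same
    # loop, so the other color is never seen -- the conjunction fails.
    return "a"

def one_bit(history):
    # One bit of memory (the parity of the step count) suffices:
    # alternating between the loops satisfies both Büchi conditions.
    return "a" if len(history) % 2 == 0 else "b"

print(colors_seen(memoryless))  # {'a'}: the "b"-Büchi condition fails
print(colors_seen(one_bit))     # both colors seen: the conjunction holds
```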
References
FSTTCS
The True Colors of Memory: A Tour of Chromatic-Memory Strategies in Zero-Sum Games on Graphs (Invited Talk)
In 42nd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2022, December 18-20, 2022, IIT Madras, Chennai, India, 2022
Two-player turn-based zero-sum games on (finite or infinite) graphs are a central framework in theoretical computer science - notably as a tool for controller synthesis, but also due to their connection with logic and automata theory. A crucial challenge in the field is to understand how complex strategies need to be to play optimally, given a type of game and a winning objective. In this invited contribution, we give a tour of recent advances aiming to characterize games where finite-memory strategies suffice (i.e., using a limited amount of information about the past). We mostly focus on so-called chromatic memory, which is limited to using colors - the basic building blocks of objectives - seen along a play to update itself. Chromatic memory has the advantage of being usable in different game graphs, and the corresponding class of strategies turns out to be of great interest to both the practical and the theoretical sides.
TheoretiCS
Characterizing Omega-Regularity through Finite-Memory Determinacy of Games on Infinite Graphs
We consider zero-sum games on infinite graphs, with objectives specified as sets of infinite words over some alphabet of colors. A well-studied class of objectives is the one of ω-regular objectives, due to its relation to many natural problems in theoretical computer science. We focus on the strategy complexity question: given an objective, how much memory does each player require to play as well as possible? A classical result is that finite-memory strategies suffice for both players when the objective is ω-regular. We show a reciprocal of that statement: when both players can play optimally with a chromatic finite-memory structure (i.e., whose updates can only observe colors) in all infinite game graphs, then the objective must be ω-regular. This provides a game-theoretic characterization of ω-regular objectives, and this characterization can help in obtaining memory bounds. Moreover, a by-product of our characterization is a new one-to-two-player lift: to show that chromatic finite-memory structures suffice to play optimally in two-player games on infinite graphs, it suffices to show it in the simpler case of one-player games on infinite graphs. We illustrate our results with the family of discounted-sum objectives, for which ω-regularity depends on the value of some parameters.
LMCS
Arena-Independent Finite-Memory Determinacy in Stochastic Games
We study stochastic zero-sum games on graphs, which are prevalent tools to model decision-making in presence of an antagonistic opponent in a random environment. In this setting, an important question is the one of strategy complexity: what kinds of strategies are sufficient or required to play optimally (e.g., randomization or memory requirements)? Our contributions further the understanding of arena-independent finite-memory (AIFM) determinacy, i.e., the study of objectives for which memory is needed, but in a way that only depends on limited parameters of the game graphs. First, we show that objectives for which pure AIFM strategies suffice to play optimally also admit pure AIFM subgame perfect strategies. Second, we show that we can reduce the study of objectives for which pure AIFM strategies suffice in two-player stochastic games to the easier study of one-player stochastic games (i.e., Markov decision processes). Third, we characterize the sufficiency of AIFM strategies through two intuitive properties of objectives. This work extends a line of research started on deterministic games to stochastic ones.
LMCS
Games Where You Can Play Optimally with Arena-Independent Finite Memory
For decades, two-player (antagonistic) games on graphs have been a framework of choice for many important problems in theoretical computer science. A notorious one is controller synthesis, which can be rephrased through the game-theoretic metaphor as the quest for a winning strategy of the system in a game against its antagonistic environment. Depending on the specification, optimal strategies might be simple or quite complex, for example having to use (possibly infinite) memory. Hence, research strives to understand which settings allow for simple strategies. In 2005, Gimbert and Zielonka provided a complete characterization of preference relations (a formal framework to model specifications and game objectives) that admit memoryless optimal strategies for both players. In the last fifteen years however, practical applications have driven the community toward games with complex or multiple objectives, where memory – finite or infinite – is almost always required. Despite much effort, the exact frontiers of the class of preference relations that admit finite-memory optimal strategies still elude us. In this work, we establish a complete characterization of preference relations that admit optimal strategies using arena-independent finite memory, generalizing the work of Gimbert and Zielonka to the finite-memory case. We also prove an equivalent to their celebrated corollary of great practical interest: if both players have optimal (arena-independent-)finite-memory strategies in all one-player games, then it is also the case in all two-player games. Finally, we pinpoint the boundaries of our results with regard to the literature: our work completely covers the case of arena-independent memory (e.g., multiple parity objectives, lower- and upper-bounded energy objectives), and paves the way to the arena-dependent case (e.g., multiple lower-bounded energy objectives).
CONCUR
Half-Positional Objectives Recognized by Deterministic Büchi Automata
In two-player games on graphs, the simplest possible strategies are those that can be implemented without any memory. These are called positional strategies. In this paper, we characterize objectives recognizable by deterministic Büchi automata (a subclass of omega-regular objectives) that are half-positional, that is, for which the protagonist can always play optimally using positional strategies (both over finite and infinite graphs). Our characterization consists of three natural conditions linked to the language-theoretic notion of right congruence. Furthermore, this characterization yields a polynomial-time algorithm to decide half-positionality of an objective recognized by a given deterministic Büchi automaton.
Refinement mechanisms in synthesis
Programming is an inherently iterative process: when was the last time you coded a feature that worked right away without any debugging? Yet, we often assume that synthesis must be a push-a-button-once process where either you get a perfect controller from the get-go, or you are guaranteed there is no suitable one. But what happens if your specification is ill-defined and needs to be amended, or if your system needs to be patched to permit the existence of a suitable controller? This is what we are working on: techniques to help when synthesis fails. In particular, we are studying appropriate notions of counter-examples à la (Wimmer et al., 2012) to understand where a system fails and how to fix it, hence enabling synthesis through iterative refinements.
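The intended workflow can be pictured as a small loop (a hypothetical sketch; all function names are placeholders, not an actual tool's API): attempt synthesis, and on failure, use a counter-example to amend the specification or patch the system before trying again.

```python
# Hypothetical sketch of synthesis by iterative refinement: instead of a
# single push-the-button attempt, a failed synthesis yields a
# counter-example used to amend the specification, and we try again.

def iterative_synthesis(spec, synthesize, diagnose, amend, max_rounds=10):
    for _ in range(max_rounds):
        controller = synthesize(spec)
        if controller is not None:
            return controller, spec
        spec = amend(spec, diagnose(spec))  # refine using the counter-example
    return None, spec

# Dummy instantiation: specs are integers, and synthesis succeeds once the
# spec has been amended three times.
ctrl, final_spec = iterative_synthesis(
    spec=0,
    synthesize=lambda s: "controller" if s >= 3 else None,
    diagnose=lambda s: "counter-example",
    amend=lambda s, cex: s + 1,
)
print(ctrl, final_spec)  # controller 3
```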
References
TACAS
Minimal Critical Subsystems for Discrete-Time Markov Models
Ralf Wimmer, Nils Jansen, Erika Ábrahám, Bernd Becker, and Joost-Pieter Katoen
In Tools and Algorithms for the Construction and Analysis of Systems - 18th International Conference, TACAS 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24 - April 1, 2012. Proceedings, 2012
Abstraction
Abstraction is at the core of formal methods: we analyze complex systems by looking at simplified versions of them. Yet, it is often the case that the first formal model we work with is not abstract enough, and that we need to abstract even further to enable effective model checking or synthesis. This may be necessary because the model under study is simply too large for practical purposes, or possesses complex features that are difficult to handle with classical techniques (e.g., continuous state spaces). Hence, several axes described on this page involve abstraction as a key mechanism. We notably use it to provide faster algorithms for classical problems (à la (Kattenbelt et al., 2010)), or to enable guarantee-driven reasoning in continuous models (à la (Wooding & Lavaei, 2024)).
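For a concrete flavour of the mechanism, here is a grid-based abstraction of a one-dimensional dynamics into a finite transition system. This is an illustrative sketch only: real tools build sound over-approximations of the abstract transition relation, whereas the naive sampling below merely under-approximates it.

```python
# Illustrative sketch: a 1-D deterministic dynamics x' = f(x) abstracted
# into a finite transition system over interval cells, so that finite-state
# analysis can stand in for the continuous model.

def abstract_transitions(f, cells):
    """cells: list of (lo, hi) intervals covering the state space.
    We add an abstract edge i -> j whenever some sampled point of cell i
    is mapped by f into cell j (a crude under-sampling of the true
    relation; sound tools would over-approximate instead)."""
    edges = set()
    for i, (lo, hi) in enumerate(cells):
        for k in range(5):  # sample a few points per cell
            x = lo + (hi - lo) * k / 4
            y = f(x)
            for j, (lo2, hi2) in enumerate(cells):
                if lo2 <= y <= hi2:
                    edges.add((i, j))
    return edges

cells = [(0.0, 0.5), (0.5, 1.0)]
print(sorted(abstract_transitions(lambda x: x / 2 + 0.4, cells)))
# [(0, 0), (0, 1), (1, 1)]
```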
References
FMSD
A game-based abstraction-refinement framework for Markov decision processes
Mark Kattenbelt, Marta Z. Kwiatkowska, Gethin Norman, and David Parker
IMPaCT: A Parallelized Software Tool for IMDP Construction and Controller Synthesis with Convergence Guarantees
Ben Wooding, and Abolfazl Lavaei
In Proceedings of the 27th ACM International Conference on Hybrid Systems: Computation and Control, HSCC 2024, Hong Kong SAR, China, May 14-16, 2024, 2024
Multi-objective reasoning
Historically, game-theoretic models have focused on single-objective scenarios, where the goal of the system is either to ensure satisfaction of a qualitative property (e.g., avoiding unsafe states) or to optimize a given payoff function (e.g., minimize energy consumption). Richer models are needed to accurately capture many real-world situations: trade-offs may occur between different resources (e.g., decreasing response time requires additional computing power and energy consumption) but also between different behavioral models (e.g., average-case vs. worst-case performance). Our group develops numerous techniques and tools dedicated to such multi-objective reasoning, e.g., (Brihaye et al., 2020; Randour et al., 2017; Bruyère et al., 2017). My FNRS project ManySynth was dedicated to this research program. See also my chapter on the subject in (Fijalkow et al., 2023).
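At the heart of such reasoning lies the Pareto frontier of achievable value vectors: once a strategy is scored on several objectives, no single optimum exists in general, only a set of incomparable trade-offs. A small sketch (with hypothetical (response time, energy) pairs, both to be minimized):

```python
# Illustrative sketch of multi-objective trade-offs: each candidate
# strategy yields a value vector, and we keep the Pareto-optimal ones.

def pareto_front(points):
    """Keep each point for which no other point is at least as good
    (i.e., less than or equal) in every coordinate."""
    front = []
    for p in points:
        dominated = any(
            all(q[i] <= p[i] for i in range(len(p))) and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (response time, energy) values of five candidate strategies.
candidates = [(2, 9), (3, 3), (5, 2), (4, 4), (6, 6)]
print(pareto_front(candidates))  # [(2, 9), (3, 3), (5, 2)]
```

No point on the returned frontier is better than another in both coordinates at once, which is exactly the trade-off situation described above.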
References
Book
Games on Graphs
Nathanaël Fijalkow, Nathalie Bertrand, Patricia Bouyer-Decitre, Romain Brenguier, Arnaud Carayol, John Fearnley, Hugo Gimbert, Florian Horn, Rasmus Ibsen-Jensen, Nicolas Markey, Benjamin Monmege, Petr Novotný, Mickael Randour, Ocan Sankur, Sylvain Schmitz, Olivier Serre, and Mateusz Skomra
The objective of this collaborative textbook is to present the state of the art on games on graphs, which is part of a larger research topic called game theory. Games on graphs is the field concerned with games whose rules and evolution are represented by a graph.
LMCS
Life is Random, Time is Not: Markov Decision Processes with Window Objectives
The window mechanism was introduced by Chatterjee et al. to strengthen classical game objectives with time bounds. It makes it possible to synthesize system controllers that exhibit acceptable behaviors within a configurable time frame, all along their infinite execution, in contrast to the traditional objectives that only require correctness of behaviors in the limit. The window concept has proved its interest in a variety of two-player zero-sum games because it enables reasoning about such time bounds in system specifications, but also thanks to the increased tractability that it usually yields. In this work, we extend the window framework to stochastic environments by considering Markov decision processes. A fundamental problem in this context is the threshold probability problem: given an objective, it asks to synthesize strategies that guarantee satisfying runs with at least a given probability. We solve it for the usual variants of window objectives, where either the time frame is set as a parameter, or we ask if such a time frame exists. We develop a generic approach for window-based objectives and instantiate it for the classical mean-payoff and parity objectives, already considered in games. Our work paves the way to a wide use of the window mechanism in stochastic models.
FMSD
Percentile queries in multi-dimensional Markov decision processes
Markov decision processes (MDPs) with multi-dimensional weights are useful to analyze systems with multiple objectives that may be conflicting and require the analysis of trade-offs. We study the complexity of percentile queries in such MDPs and give algorithms to synthesize strategies that enforce such constraints. Given a multi-dimensional weighted MDP and a quantitative payoff function f, thresholds v_i (one per dimension), and probability thresholds α_i, we show how to compute a single strategy to enforce that for all dimensions i, the probability of outcomes ρ satisfying f_i(ρ)≥v_i is at least α_i. We consider classical quantitative payoffs from the literature (sup, inf, lim sup, lim inf, mean-payoff, truncated sum, discounted sum). Our work extends to the quantitative case the multi-objective model checking problem studied by Etessami et al. in unweighted MDPs.
Inf. Comput.
Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games
We extend the quantitative synthesis framework by going beyond the worst-case. On the one hand, classical analysis of two-player games involves an adversary (modeling the environment of the system) which is purely antagonistic and asks for strict guarantees. On the other hand, stochastic models like Markov decision processes represent situations where the system is faced with a purely randomized environment: the aim is then to optimize the expected payoff, with no guarantee on individual outcomes. We introduce the beyond worst-case synthesis problem, which is to construct strategies that guarantee some quantitative requirement in the worst-case while providing a higher expected value against a particular stochastic model of the environment given as input. This problem is relevant to produce system controllers that provide nice expected performance in the everyday situation while ensuring a strict (but relaxed) performance threshold even in the event of very bad (while unlikely) circumstances. We study the beyond worst-case synthesis problem for two important quantitative settings: the mean-payoff and the shortest path. In both cases, we show how to decide the existence of finite-memory strategies satisfying the problem and how to synthesize one if one exists. We establish algorithms and we study complexity bounds and memory requirements.
Continuous and timed systems
A key decision in modeling real-world systems is how we handle time: either as a sequence of discrete steps, with events only happening at predefined ticks, or as a continuous flow, where events can take place at any point. While the latter model is arguably more realistic (which can be necessary for time-sensitive applications), the resulting models are usually more difficult to analyze, often leading to undecidability when combined with complex specifications. Our team focuses on developing expressive frameworks while maintaining tractability through novel concepts, both in timed automata and games (Main et al., 2022; Main et al., 2021) and in hybrid systems (Bouyer et al., 2022).
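To give a flavour of the window mechanism in its simplest, untimed form (illustrative code over a finite play, not taken from the cited papers): a direct fixed-window mean-payoff requirement asks that, from every position along the play, the running sum of weights turns non-negative within a given bound.

```python
# Illustrative check of a direct fixed-window requirement on a finite play:
# from every position, the running sum of weights must become non-negative
# within `bound` steps (the window "closes").

def fixed_window_ok(weights, bound):
    for start in range(len(weights)):
        total, closed = 0, False
        for w in weights[start:start + bound]:
            total += w
            if total >= 0:
                closed = True
                break
        if not closed:
            return False
    return True

print(fixed_window_ok([1, -1, 2, -1, 1], bound=2))  # True: all windows close
print(fixed_window_ok([1, -2, 1, 1], bound=2))      # False: fails at the -2
```

The timed variants studied by the team replace step counts with real-time bounds, but the underlying idea of local, bounded windows is the same.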
References
FORMATS
In Formal Modeling and Analysis of Timed Systems - 20th International Conference, FORMATS 2022, Warsaw, Poland, September 13-15, 2022, Proceedings, 2022
The window mechanism, introduced by Chatterjee et al. for mean-payoff and total-payoff objectives in two-player turn-based games on graphs, refines long-term objectives with time bounds. This mechanism has proven useful in a variety of settings, and most recently in timed systems. In the timed setting, the so-called fixed timed window parity objectives have been studied. A fixed timed window parity objective is defined with respect to some time bound and requires that, at all times, we witness a time frame, i.e., a window, of size less than the fixed bound in which the smallest priority is even. In this work, we focus on the bounded timed window parity objective. Such an objective is satisfied if there exists some bound for which the fixed objective is satisfied. The satisfaction of bounded objectives is robust to modeling choices such as constants appearing in constraints, unlike fixed objectives, for which the choice of constants may affect the satisfaction for a given bound. We show that verification of bounded timed window objectives in timed automata can be performed in polynomial space, and that timed games with these objectives can be solved in exponential time, even for multi-objective extensions. This matches the complexity classes of the fixed case. We also provide a comparison of the different variants of window parity objectives.
CONCUR
Time Flies When Looking out of the Window: Timed Games with Window Parity Objectives
The window mechanism was introduced by Chatterjee et al. to reinforce mean-payoff and total-payoff objectives with time bounds in two-player turn-based games on graphs. It has since proved useful in a variety of settings, including parity objectives in games and both mean-payoff and parity objectives in Markov decision processes. We study window parity objectives in timed automata and timed games: given a bound on the window size, a path satisfies such an objective if, in all states along the path, we see a sufficiently small window in which the smallest priority is even. We show that checking that all time-divergent paths of a timed automaton satisfy such a window parity objective can be done in polynomial space, and that the corresponding timed games can be solved in exponential time. This matches the complexity class of timed parity games, while adding the ability to reason about time bounds. We also consider multi-dimensional objectives and show that the complexity class does not increase. To the best of our knowledge, this is the first study of the window mechanism in a real-time setting.
Inf. Comput.
Decisiveness of stochastic systems and its application to hybrid models
In [ABM07], Abdulla et al. introduced the concept of decisiveness, an interesting tool for lifting good properties of finite Markov chains to denumerable ones. Later, this concept was extended to more general stochastic transition systems (STSs), allowing the design of various verification algorithms for large classes of (infinite) STSs. We further improve the understanding and utility of decisiveness in two ways. First, we provide a general criterion for proving decisiveness of general STSs. This criterion, which is very natural but whose proof is rather technical, (strictly) generalizes all known criteria from the literature. Second, we focus on stochastic hybrid systems (SHSs), a stochastic extension of hybrid systems. We establish the decisiveness of a large class of SHSs and, under a few classical hypotheses from mathematical logic, we show how to decide reachability problems in this class, even though they are undecidable for general SHSs. This provides a decidable stochastic extension of o-minimal hybrid systems. [ABM07] Parosh A. Abdulla, Noomene Ben Henda, and Richard Mayr. 2007. Decisive Markov Chains. Log. Methods Comput. Sci. 3, 4 (2007).
RL x FM
In a sense, there are two worlds: the world of AI and learning, which offers unparalleled efficiency but few guarantees, and that of logic, which provides a fundamental framework to reason about safety but usually lacks efficiency (as most formal-methods problems have inherently high computational complexity and thus suffer from poor scalability). Our group participates in a growing effort to combine the formal reasoning of logic with the power of learning (e.g., (Brázdil et al., 2020)). We develop techniques mixing reinforcement learning and exact methods for reactive synthesis, with a particular focus on the application contexts developed in the axes above, such as the expressive multi-objective framework of (Randour et al., 2017).
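One popular pattern in this area is shielding: a shield precomputed by exact methods masks unsafe actions, and reinforcement learning then optimizes only among the remaining safe ones. The sketch below (the tiny chain-world, its shield, and all parameters are hypothetical, chosen purely for illustration) combines tabular Q-learning with such a mask:

```python
import random

# Illustrative shielded Q-learning on a hypothetical 5-state chain:
# state 0 is unsafe, reward lives at state 4. The shield blocks any
# action leading into an unsafe (or out-of-range) state, so the learner
# never visits state 0 while optimizing among the safe actions.

STATES, ACTIONS = range(5), ("left", "right")
UNSAFE = {0}

def step(state, action):
    return state - 1 if action == "left" else state + 1

def shield(state, action):
    """Allow an action only if it cannot enter an unsafe state."""
    nxt = step(state, action)
    return 0 <= nxt <= 4 and nxt not in UNSAFE

def q_learning(episodes=200, alpha=0.5, gamma=0.9, eps=0.2):
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = 2
        for _ in range(20):
            safe = [a for a in ACTIONS if shield(s, a)]
            a = (random.choice(safe) if random.random() < eps
                 else max(safe, key=lambda a: Q[(s, a)]))
            nxt = step(s, a)
            r = 1.0 if nxt == 4 else 0.0
            best = max(Q[(nxt, b)] for b in ACTIONS if shield(nxt, b))
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = nxt
    return Q
```

The shield gives a hard safety guarantee (the blocked pair below is never even tried), while learning handles the optimization that exact methods alone would struggle to scale to.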
References
FMSD
Percentile queries in multi-dimensional Markov decision processes
Markov decision processes (MDPs) with multi-dimensional weights are useful to analyze systems with multiple objectives that may be conflicting and require the analysis of trade-offs. We study the complexity of percentile queries in such MDPs and give algorithms to synthesize strategies that enforce such constraints. Given a multi-dimensional weighted MDP and a quantitative payoff function f, thresholds v_i (one per dimension), and probability thresholds α_i, we show how to compute a single strategy to enforce that for all dimensions i, the probability of outcomes ρ satisfying f_i(ρ)≥v_i is at least α_i. We consider classical quantitative payoffs from the literature (sup, inf, lim sup, lim inf, mean-payoff, truncated sum, discounted sum). Our work extends to the quantitative case the multi-objective model checking problem studied by Etessami et al. in unweighted MDPs.
AAAI
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, 2020
Epidemic modeling
In collaboration with a local research group in applied probability, we are exploring how formal methods can help in reasoning about stochastic processes used in epidemic modeling. These stochastic processes are complex both due to their size (capturing huge populations) and their non-linear dynamics. Work in this direction has notably been inspired by the recent pandemic (Xu et al., 2021; Svoboda et al., 2022; Leys & Pérez, 2024). Our focus is on rich specifications and control, i.e., enabling reasoning about strategies in epidemic stochastic models.
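As a hypothetical flavour of the models involved (all rates and the control knob below are made up for illustration), a discrete-time stochastic SIR process is a Markov chain over population counts whose transitions depend non-linearly on the current state, which is precisely what makes formal reasoning about it hard:

```python
import random

# Illustrative discrete-time stochastic SIR process over population counts
# (susceptible, infected, recovered). The `control` factor scales the
# contact rate, standing in for the kind of strategies one would like to
# reason about formally. All rates are hypothetical.

def sir_step(s, i, r, beta=0.3, gamma=0.1, control=1.0, n=100):
    # Non-linear dynamics: each susceptible is infected with a probability
    # proportional to the current fraction of infected individuals.
    new_inf = sum(random.random() < control * beta * i / n for _ in range(s))
    new_rec = sum(random.random() < gamma for _ in range(i))
    return s - new_inf, i + new_inf - new_rec, r + new_rec

def peak_infected(control, steps=100):
    """Simulate one trajectory and report the peak number of infected."""
    s, i, r = 95, 5, 0
    peak = i
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, control=control)
        peak = max(peak, i)
    return peak

# With control=0.0 (no contacts), no new infections occur, so the peak
# deterministically stays at the initial 5 infected.
print(peak_infected(control=0.0))  # 5
```

A control question in this spirit would ask for a strategy choosing `control` over time so that, say, the probability of the peak exceeding a threshold stays below a given bound.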
References
PLOS ONE
Control strategies for COVID-19 epidemic with vaccination, shield immunity and quarantine: A metric temporal logic approach