Max Weltevrede
PhD Researcher, TU Delft
About me
I’m a PhD researcher in the Sequential Decision Making group at the Delft University of Technology, supervised by Matthijs Spaan and Wendelin Böhmer. I do research in reinforcement learning with a focus on developing RL agents that can generalise to new scenarios. I have investigated several ways of improving generalisation performance: exploring more of the training environments, using ensembles and distillation after training, and data augmentation in an offline RL setting. I am currently doing an Applied Science internship at Wayve in London, where I, among other things, investigate data augmentation techniques to improve real-world autonomous driving performance.
Generally, I am interested in many things. At the moment this includes generalisation, adaptation, continual learning, causality, physics, the scientific method, software engineering, playing guitar, singing, painting and collecting fossils.
News
| Date | News |
|---|---|
| May 06, 2026 | Our paper “Training on Irrelevant States Implies Data Augmentation: Generalization in Contextual MDPs” got accepted at RLC 2026! |
| Jan 01, 2026 | Started an internship at Wayve in London. |
| Sep 18, 2025 | Our paper “How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning” got accepted at NeurIPS 2025! |
| Jul 10, 2025 | Presented our work “Exploration Implies Data Augmentation” in Cathy Wu’s lab at MIT. |
Publications
- Generalization in Offline RL: The Structure of Pessimism Matters More than How Pessimistic You Are. Preprint, May 2026