Max Weltevrede

PhD Researcher, TU Delft

About me

I’m a PhD researcher in the Sequential Decision Making group at the Delft University of Technology, supervised by Matthijs Spaan and Wendelin Böhmer. I do research in reinforcement learning with a focus on developing RL agents that can generalise to new scenarios. I have investigated several ways of improving generalisation performance: exploring more of the training environments, using ensembles and distillation after training, and applying data augmentation in an offline RL setting. I am currently doing an Applied Science internship at Wayve in London, where, among other things, I investigate data augmentation techniques to improve real-world autonomous driving performance.

Generally, I am interested in many things. At the moment this includes generalisation, adaptation, continual learning, causality, physics, the scientific method, software engineering, playing guitar, singing, painting and collecting fossils.

News

May 06, 2026 Our paper “Training on Irrelevant States Implies Data Augmentation: Generalization in Contextual MDPs” got accepted at RLC 2026!
Jan 01, 2026 Started an internship at Wayve in London.
Sep 18, 2025 Our paper “How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning” got accepted at NeurIPS 2025!
Jul 10, 2025 Presented our work “Exploration Implies Data Augmentation” in Cathy Wu’s lab at MIT.

Publications

    • Generalization in Offline RL: The Structure of Pessimism Matters More than How Pessimistic You Are
      Max Weltevrede, Matthijs T. J. Spaan, and Wendelin Böhmer
      Preprint, May 2026
    • Training on Irrelevant States Implies Data Augmentation: Generalization in Contextual MDPs
      Max Weltevrede, Caroline Horsch, Matthijs T. J. Spaan, and Wendelin Böhmer
      Reinforcement Learning Conference (RLC 2026), May 2026
    • Sparse Masked Attention Policies for Reliable Generalization
      Caroline Horsch, Laurens Engwegen, Max Weltevrede, Matthijs T. J. Spaan, and Wendelin Böhmer
      Preprint, Under Review, Feb 2026
    • Universal Value-Function Uncertainties
      Moritz A. Zanger, Max Weltevrede, Yaniv Oren, Pascal R. Van der Vaart, Caroline Horsch, Wendelin Böhmer, and Matthijs T. J. Spaan
      International Conference on Learning Representations (ICLR 2026), Feb 2026
    • How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
      Max Weltevrede, Moritz A. Zanger, Matthijs T. J. Spaan, and Wendelin Böhmer
      Conference on Neural Information Processing Systems (NeurIPS 2025), May 2025
    • Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning
      Max Weltevrede, Felix Kaubek, Matthijs T. J. Spaan, and Wendelin Böhmer
      Seventeenth European Workshop on Reinforcement Learning (EWRL), Sep 2024
    • The Role of Diverse Replay for Generalisation in Reinforcement Learning
      Max Weltevrede, Matthijs T. J. Spaan, and Wendelin Böhmer
      Sixteenth European Workshop on Reinforcement Learning (EWRL), Aug 2023