I am currently a postdoctoral fellow at Globus Labs (University of Chicago) and at Argonne National Laboratory. My work is under the supervision of Ian Foster and Kyle Chard.
I completed my PhD at ENS Lyon in 2023. I worked in the ROMA team at the LIP laboratory under the supervision of Loris Marchal (ENS Lyon, CNRS) and in the STORM team at Inria Bordeaux under the supervision of Samuel Thibault (Université de Bordeaux, Inria).
Email: mgonthier (at) uchicago (dot) edu
My research focuses on scheduling problems. During my Ph.D., I studied the challenge of scheduling tasks sharing data under memory constraints. To address this issue, we developed new scheduling and eviction algorithms and implemented them in the StarPU runtime.
I also had the opportunity to collaborate with the Division of Scientific Computing at Uppsala University, where I worked on batch scheduling for jobs that require large input files.
Since the beginning of my postdoctoral fellowship at the University of Chicago, I have broadened my research interests while continuing to focus on scheduling for high-performance computing. Notably, I work on mixed-criticality scheduling problems, study resilience in data storage, develop scheduling strategies to promote energy efficiency, and contribute to the development of HPC software.
My PhD defense took place on September 25th, 2023 at the LaBRI in Bordeaux, France. The title of the presentation was Scheduling Under Memory Constraint in Task-based Runtime Systems.
Abstract: Hardware accelerators, such as GPUs, now provide a large part of the computational power used for scientific simulations. GPUs come with their own limited memory and are connected to the main memory of the machine via a bus with limited bandwidth. Scientific simulations often operate on very large data sets, to the point of not fitting in the limited GPU memory. In this case, one has to turn to out-of-core computing, where data movement quickly becomes a performance bottleneck. During this thesis, we worked on the problem of scheduling for a task-based runtime to improve data locality in an out-of-core setting, in order to reduce data movements. We designed strategies for both task scheduling and data eviction from limited memories. We implemented them in the StarPU runtime and compared them to existing scheduling techniques. Our strategies achieve significantly better performance when scheduling tasks on multiple GPUs with limited memory, as well as on multiple CPU cores with limited main memory.
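The interplay between task ordering and eviction described above can be illustrated with a small toy model (this is an illustrative sketch, not the actual StarPU algorithms from the thesis): tasks read sets of data blocks, a greedy scheduler picks the ready task missing the fewest blocks from the limited "device" memory, and a full memory evicts the block whose next use lies furthest in the future.

```python
# Toy locality-aware scheduler: order tasks to reuse resident data and
# count the data transfers incurred under a limited memory capacity.

def schedule(tasks, capacity):
    """tasks: list of sets of block ids each task reads.
    capacity: number of blocks the device memory can hold.
    Returns (execution order, number of block transfers)."""
    memory, remaining, transfers, order = set(), list(tasks), 0, []
    while remaining:
        # Greedy locality: run the task missing the fewest blocks.
        task = min(remaining, key=lambda t: len(t - memory))
        remaining.remove(task)
        order.append(task)
        for block in task - memory:
            if len(memory) >= capacity:
                # Belady-like eviction: drop the resident block whose
                # next use by a remaining task is furthest away.
                def next_use(b):
                    for i, t in enumerate(remaining):
                        if b in t:
                            return i
                    return len(remaining)
                memory.remove(max(memory, key=next_use))
            memory.add(block)
            transfers += 1
    return order, transfers

# Four tasks sharing blocks pairwise: a locality-aware order loads each
# of the four distinct blocks exactly once even with room for only two.
order, transfers = schedule([{1, 2}, {2, 3}, {3, 4}, {1, 4}], capacity=2)
```

In this example the greedy order achieves the minimum of four transfers, whereas a submission-order schedule with a naive eviction policy can reload shared blocks repeatedly; reducing exactly this kind of redundant traffic is the goal of the thesis strategies.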
My PhD manuscript is available here. A recording of my PhD defense is also available:
Year | Topic | Level | Location |
---|---|---|---|
2023 | Computer hardware architecture | L3 | Enseirb-Matmeca Bordeaux |
2022 | Algorithms | L3 | Enseirb-Matmeca Bordeaux |
2022 | Internship tutoring and member of the jury | M2 | Enseirb-Matmeca Bordeaux |
2022 | Network programming | M1 | Enseirb-Matmeca Bordeaux |
2022 | Internship tutoring | M1 | Enseirb-Matmeca Bordeaux |
2021 | Network programming | M1 | Enseirb-Matmeca Bordeaux |
2020 | Systems | L1 | Université Lyon 1 |
Alok Kamatar: Core Hours and Carbon Credits: Incentivizing Sustainability in HPC, PhD student at University of Chicago (Jan 2024–present). We developed models and tools for allocating carbon credits, surveyed HPC usage, and prototyped a game to promote energy efficiency. He also developed the HPC Scheduling Game (https://game.funcx.org/).
Dante D. Sanchez-Gallegos: Building a Wide-Area Distribution System for the Management of Data Over Heterogeneous Storage, PhD student at University Carlos III of Madrid (Jan 2024–present). Together, we built algorithms for erasure coding and data distribution. His work led to the creation of DynoStore (https://github.com/dynostore/dynostore), which facilitates data transfers between storage systems.
Wenyi Wang: Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems, PhD student at University of Chicago (Feb 2024–present). His research is about building a lock-free runtime system called XQueue (https://gitlab.com/pnookala/gnu-openmp/-/tree/xtask?ref_type=heads), which aims at reducing synchronization overhead for fine-grained task-based applications.
Greg Pauloski: Programming the Continuum: Towards Better Techniques for Developing Distributed Science Applications, PhD student at University of Chicago (Jan–Dec 2024). I contributed to the development of TaPS (https://github.com/proxystore/taps), a task-based application suite.
Haochen Pan: Building a Hybrid Event-Driven Architecture for Distributed Scientific Computing, PhD student at University of Chicago (Jan 2024–present). I worked on enhancing resilience for scientific applications through the development of Octopus (https://github.com/globus-labs/diaspora-event-sdk), cloud-edge communication software.
Shu Shi: Multi-LLM Serving for HPC, PhD student at University of Chicago (Nov 2024–present). We are developing scheduling solutions to optimize the placement and reuse of large language models on GPUs.
Sicheng Zhou: Workload Resilience Across Task Hierarchies in Task-based Parallel Programming Frameworks, undergraduate intern at University of Chicago (Mar–Jun 2024). Our collaboration on improving error management in task-based systems resulted in WRATH (https://github.com/ClaudiaCumberbatch/resilient_compute), a tool for creating and dynamically reconfiguring compute pools.
Data-Aware Reactive Task Scheduling for StarPU (DARTS): URL. DARTS improves performance for memory-intensive task-based applications by exploiting data locality and a custom eviction policy. Integrated into StarPU 1.5.0.
Hierarchical Fair Packing for StarPU (HFP): URL. HFP is an offline scheduling strategy to enhance performance under memory constraints. Integrated into StarPU 1.5.0.
Visualization tool for StarPU schedulers: URL. This tool visualizes task scheduling order and data loads (supporting outer product, GEMM, and Cholesky factorization) for StarPU's schedulers.
Batch simulator: URL 1 and URL 2. This simulator collects job submission logs and incorporates scheduling policies for large-scale simulations of batch scheduling.
Data replication and mapping simulator (D-Rex): URL. This simulator models erasure coding, replicating data over heterogeneous storage nodes and simulating node failures.
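The core idea behind erasure coding, as modeled in a simulator like D-Rex, can be sketched with a deliberately simple k+1 XOR parity scheme (real systems typically use Reed-Solomon codes tolerating multiple failures; this sketch survives the loss of any single chunk):

```python
from functools import reduce

def encode(data: bytes, k: int):
    """Split data into k equal-size chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks + [parity]

def recover(stripes, lost_index):
    """Rebuild the chunk at lost_index by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripes) if i != lost_index]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

# Distribute 4 stripes (3 data + 1 parity) over 4 nodes; if any one
# node fails, its stripe is rebuilt from the other three.
stripes = encode(b"scheduling", k=3)
rebuilt = recover(stripes, lost_index=1)  # equals stripes[1]
```

Because the parity chunk is the XOR of all data chunks, XOR-ing the survivors cancels every chunk except the lost one; a simulator adds to this the placement decision, i.e., which heterogeneous node stores which stripe and what failure probabilities imply for durability.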