I am currently a postdoctoral fellow at Globus Labs (University of Chicago) and at Argonne National Laboratory. My work is under the supervision of Ian Foster and Kyle Chard.
I completed my PhD at ENS Lyon in 2023. I worked in the ROMA team at the LIP laboratory under the supervision of Loris Marchal (ENS Lyon, CNRS) and in the STORM team at Inria Bordeaux under the supervision of Samuel Thibault (Université de Bordeaux, Inria).
Email: mgonthier (at) uchicago (dot) edu
My research focuses on scheduling problems. During my Ph.D., I studied the challenge of scheduling tasks sharing data under memory constraints. To address this issue, we developed new scheduling and eviction algorithms and implemented them in the StarPU runtime.
I also had the opportunity to collaborate with the Division of Scientific Computing at Uppsala University, where I worked on batch scheduling for jobs that require large input files.
Since the beginning of my postdoctoral fellowship at the University of Chicago, I have broadened my research interests while continuing to focus on scheduling for high-performance computing. Notably, I work on mixed-criticality scheduling problems, study resilience in data storage, develop scheduling strategies to promote energy efficiency, and contribute to the development of HPC software.
My PhD defense took place on September 25th, 2023 at the LaBRI in Bordeaux, France. The title of the presentation was Scheduling Under Memory Constraint in Task-based Runtime Systems.
Abstract: Hardware accelerators, such as GPUs, now provide a large part of the computational power used for scientific simulations. GPUs come with their own limited memory and are connected to the main memory of the machine via a bus with limited bandwidth. Scientific simulations often operate on very large data sets, to the point of not fitting in the limited GPU memory. In this case, one has to turn to out-of-core computing, where data movement quickly becomes a performance bottleneck. During this thesis, we worked on the problem of scheduling for a task-based runtime to improve data locality in an out-of-core setting, in order to reduce data movements. We designed strategies for both task scheduling and data eviction from limited memories. We implemented them in the StarPU runtime and compared them to existing scheduling techniques. Our strategies achieve significantly better performance when scheduling tasks on multiple GPUs with limited memory, as well as on multiple CPU cores with limited main memory.
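The interplay between task ordering and eviction described above can be illustrated with a small toy model (this is an illustrative sketch, not the actual StarPU algorithms from the thesis): tasks read sets of data blocks, a greedy scheduler picks the ready task missing the fewest blocks from the limited "device" memory, and a full memory evicts the block whose next use lies furthest in the future.

```python
# Toy locality-aware scheduler: order tasks to reuse resident data and
# count the data transfers incurred under a limited memory capacity.

def schedule(tasks, capacity):
    """tasks: list of sets of block ids each task reads.
    capacity: number of blocks the device memory can hold.
    Returns (execution order, number of block transfers)."""
    memory, remaining, transfers, order = set(), list(tasks), 0, []
    while remaining:
        # Greedy locality: run the task missing the fewest blocks.
        task = min(remaining, key=lambda t: len(t - memory))
        remaining.remove(task)
        order.append(task)
        for block in task - memory:
            if len(memory) >= capacity:
                # Belady-like eviction: drop the resident block whose
                # next use by a remaining task is furthest away.
                def next_use(b):
                    for i, t in enumerate(remaining):
                        if b in t:
                            return i
                    return len(remaining)
                memory.remove(max(memory, key=next_use))
            memory.add(block)
            transfers += 1
    return order, transfers

# Four tasks sharing blocks pairwise: a locality-aware order loads each
# of the four distinct blocks exactly once even with room for only two.
order, transfers = schedule([{1, 2}, {2, 3}, {3, 4}, {1, 4}], capacity=2)
```

In this example the greedy order achieves the minimum of four transfers, whereas a submission-order schedule with a naive eviction policy can reload shared blocks repeatedly; reducing exactly this kind of redundant traffic is the goal of the thesis strategies.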
My PhD manuscript is available here. A recording of my PhD defense is also available:
Year | Topic | Level | Location |
---|---|---|---|
2023 | Computer hardware architecture | L3 | Enseirb-Matmeca Bordeaux |
2022 | Algorithms | L3 | Enseirb-Matmeca Bordeaux |
2022 | Internship tutoring and member of the jury | M2 | Enseirb-Matmeca Bordeaux |
2022 | Network programming | M1 | Enseirb-Matmeca Bordeaux |
2022 | Internship tutoring | M1 | Enseirb-Matmeca Bordeaux |
2021 | Network programming | M1 | Enseirb-Matmeca Bordeaux |
2020 | Systems | L1 | Université Lyon 1 |
Alok Kamatar: Core Hours and Carbon Credits: Incentivizing Sustainability in HPC, PhD student at University of Chicago (Jan 2024–present). We developed models and tools for allocating carbon credits, surveyed HPC usage, and prototyped a game to promote energy efficiency. He also developed the HPC Scheduling Game (https://game.funcx.org/).
Dante D. Sanchez-Gallegos: Building a Wide-Area Distribution System for the Management of Data Over Heterogeneous Storage, PhD student at University Carlos III of Madrid (Jan 2024–present). Together, we built algorithms for erasure coding and data distribution. His work led to the creation of DynoStore (https://github.com/dynostore/dynostore), which facilitates data transfers between storage systems.
Wenyi Wang: Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems, PhD student at University of Chicago (Feb 2024–present). His research is about building a lock-free runtime system called XQueue (https://gitlab.com/pnookala/gnu-openmp/-/tree/xtask?ref_type=heads), which aims at reducing synchronization overhead for fine-grained task-based applications.
Greg Pauloski: Programming the Continuum: Towards Better Techniques for Developing Distributed Science Applications, PhD student at University of Chicago (Jan–Dec 2024). I contributed to the development of TaPS (https://github.com/proxystore/taps), a task-based application suite.
Haochen Pan: Building a Hybrid Event-Driven Architecture for Distributed Scientific Computing, PhD student at University of Chicago (Jan 2024–present). I worked on enhancing resilience for scientific applications through the development of Octopus (https://github.com/globus-labs/diaspora-event-sdk), cloud-edge communication software.
Shu Shi: Multi-LLM Serving for HPC, PhD student at University of Chicago (Nov 2024–present). We are developing scheduling solutions to optimize the placement and reuse of large language models on GPUs.
Sicheng Zhou: Workload Resilience Across Task Hierarchies in Task-based Parallel Programming Frameworks, undergraduate intern at University of Chicago (Mar–Jun 2024). Our collaboration on improving error management in task-based systems resulted in WRATH (https://github.com/ClaudiaCumberbatch/resilient_compute), a tool for creating and dynamically reconfiguring compute pools.
Data-Aware Reactive Task Scheduling for StarPU (DARTS): URL. DARTS improves performance for memory-intensive task-based applications by exploiting data locality and a custom eviction policy. Integrated into StarPU 1.5.0.
Hierarchical Fair Packing for StarPU (HFP): URL. HFP is an offline scheduling strategy to enhance performance under memory constraints. Integrated into StarPU 1.5.0.
Visualization tool for StarPU schedulers: URL. This tool visualizes task scheduling order and data loads (supporting outer product, GEMM, and Cholesky factorization) for StarPU's schedulers.
Batch simulator: URL 1 and URL 2. This simulator collects job submission logs and incorporates scheduling policies for large-scale simulations of batch scheduling.
Data replication and mapping simulator (D-Rex): URL. This simulator models erasure coding, replicating data over heterogeneous storage nodes and simulating node failures.
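The core idea behind erasure coding, as modeled in a simulator like D-Rex, can be sketched with a deliberately simple k+1 XOR parity scheme (real systems typically use Reed-Solomon codes tolerating multiple failures; this sketch survives the loss of any single chunk):

```python
from functools import reduce

def encode(data: bytes, k: int):
    """Split data into k equal-size chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks + [parity]

def recover(stripes, lost_index):
    """Rebuild the chunk at lost_index by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripes) if i != lost_index]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

# Distribute 4 stripes (3 data + 1 parity) over 4 nodes; if any one
# node fails, its stripe is rebuilt from the other three.
stripes = encode(b"scheduling", k=3)
rebuilt = recover(stripes, lost_index=1)  # equals stripes[1]
```

Because the parity chunk is the XOR of all data chunks, XOR-ing the survivors cancels every chunk except the lost one; a simulator adds to this the placement decision, i.e., which heterogeneous node stores which stripe and what failure probabilities imply for durability.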