Trinity Program Pursuing “Power-Aware Scheduling” Through Contract with Adaptive Computing

The Trinity project has placed a new “Non-Recurring Engineering” contract with Adaptive Computing to develop “Power-Aware Scheduling” capabilities for the Trinity supercomputer, which, when fully installed, will consume 8-10 megawatts (MW) and have a peak computational capability of more than 40 petaflops. This work will enable the system’s workload manager to manage power as a resource and will lead to a better understanding of how to operate future ASC platforms to maximize the benefit derived from the available power budget.

This is important because each generation of supercomputers consumes significantly more power than the previous one. Given practical limits of 20-40 MW on a system-level power budget, this trend is expected to make power, rather than the amount of available computing hardware, the primary constraint on supercomputer performance. Power budgets must therefore be managed as a first-class resource and distributed intelligently by the workload manager.

Trinity project staff at SNL and LANL will collaborate closely with Adaptive Computing to develop power-aware scheduling capabilities and exercise them at scale on Trinity. Areas to be explored include system-, job-, and component-level power budgeting; job scheduling within power budgets; user interfaces for specifying power-related quality-of-service requirements; system power usage ramp-rate control; and power usage reporting and billing. The work will leverage and interface with the Power API implementation and other advanced power management capabilities that Cray is developing for the Trinity platform. The overall insight gained from this work will be used to help specify the requirements and features needed on future supercomputer platforms.

Notional framework for power-aware scheduling. To maximize the benefit derived from a system-level power budget of 20 MW, the workload manager allocates power to jobs based on historical and real-time power usage information obtained from the power usage database.
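To make the notional framework concrete, the following Python sketch shows one simple way a workload manager might admit jobs under a system-level power budget. It is a hypothetical illustration, not Adaptive Computing's or Trinity's implementation: the `Job` class, the `schedule` function, the per-node historical power figures, and the 300 W default for jobs with no history are all assumptions. The measured draw of running jobs stands in for the real-time data, and per-node averages from past runs stand in for the historical data held in the power usage database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    nodes: int
    # Hypothetical: mean per-node power (watts) from past runs of this job,
    # or None if the power usage database has no history for it.
    hist_watts_per_node: Optional[float]


def schedule(queue: List[Job], budget_watts: float, running_watts: float,
             default_watts_per_node: float = 300.0) -> Tuple[List[str], List[str], float]:
    """Greedily start queued jobs whose estimated power fits the remaining budget.

    running_watts is the measured (real-time) draw of jobs already running;
    each queued job's draw is estimated from its historical per-node average,
    falling back to an assumed default when no history exists.
    """
    remaining = budget_watts - running_watts
    started, deferred = [], []
    for job in queue:
        per_node = (job.hist_watts_per_node
                    if job.hist_watts_per_node is not None
                    else default_watts_per_node)
        estimate = job.nodes * per_node
        if estimate <= remaining:
            # Job fits: start it and reserve its estimated power.
            started.append(job.name)
            remaining -= estimate
        else:
            # Job would exceed the budget: leave it queued; later, smaller
            # jobs may still fit in the leftover headroom.
            deferred.append(job.name)
    return started, deferred, remaining


if __name__ == "__main__":
    queue = [Job("A", 9000, 400.0), Job("B", 10000, 500.0), Job("C", 2000, None)]
    # 20 MW system budget, 12 MW currently measured in use by running jobs.
    started, deferred, remaining = schedule(queue, 20_000_000.0, 12_000_000.0)
    print("started:", started, "deferred:", deferred, "headroom W:", remaining)
```

In this example, job A (an estimated 3.6 MW) starts, job B (5 MW) is deferred because only 4.4 MW of headroom remains, and job C (0.6 MW, using the default estimate) starts in the leftover headroom. Skipping past a job that does not fit resembles backfilling, but a production scheduler would also need reservations to keep large jobs such as B from being starved, plus safety margins for estimation error and ramp-rate limits; the sketch omits all of these.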
Contact
Kevin Pedretti, ktpedre@sandia.gov

October 1, 2016