The Process Management Interface (PMI) has been used for quite some time as a means of exchanging wireup information needed for interprocess communication. Two versions (PMI-1 and PMI-2) have been released as part of the MPICH effort. While PMI-2 demonstrates better scaling properties than its PMI-1 predecessor, attaining rapid launch and wireup of the roughly 1M processes executing across 100k nodes expected for exascale operations remains challenging.
PMI Exascale (PMIx) represents an attempt to address these challenges by providing an extended version of the PMI definitions specifically designed to support clusters up to and including exascale sizes. The overall objective of the project is not to branch the existing definitions (in fact, PMIx fully supports both the existing PMI-1 and PMI-2 APIs) but rather to (a) augment and extend those APIs to eliminate some current restrictions that impact scalability, (b) establish a standards-like body for maintaining the definitions, and (c) provide a reference implementation of the PMIx standard that demonstrates the desired level of scalability.
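To make "exchanging wireup information" concrete, the following is a minimal in-memory sketch of the put / fence / get key-value pattern that PMI-style interfaces are generally built around. It does not call any PMI or PMIx library. The class `ToyKeyValueServer`, the function `wire_up`, the `"endpoint"` key, and the `nodeN:port` address format are all hypothetical names invented for illustration.

```python
# Toy, in-memory model of a PMI-style wireup exchange.
# Every name below is hypothetical; this is NOT the PMI or PMIx API.


class ToyKeyValueServer:
    """Stands in for the key-value store a resource manager might host for a job."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.staged = {}     # rank -> {key: value}; not yet visible to peers
        self.published = {}  # (rank, key) -> value; visible after a fence

    def put(self, rank, key, value):
        # Stage a value for this rank; peers cannot read it yet.
        self.staged.setdefault(rank, {})[key] = value

    def fence(self):
        # Collective step: refuse to publish until every rank has contributed.
        missing = [r for r in range(self.num_procs) if r not in self.staged]
        if missing:
            raise RuntimeError(f"ranks {missing} have not contributed")
        for rank, kvs in self.staged.items():
            for key, value in kvs.items():
                self.published[(rank, key)] = value
        self.staged.clear()

    def get(self, rank, key):
        # Look up a value that some rank published before the last fence.
        return self.published[(rank, key)]


def wire_up(num_procs):
    """Each rank publishes one endpoint, then reads every peer's endpoint."""
    server = ToyKeyValueServer(num_procs)
    for rank in range(num_procs):
        # Assumed layout: two ranks per node, one made-up port per rank.
        server.put(rank, "endpoint", f"node{rank // 2}:{5000 + rank}")
    server.fence()
    return {
        rank: [server.get(peer, "endpoint") for peer in range(num_procs)]
        for rank in range(num_procs)
    }


if __name__ == "__main__":
    tables = wire_up(4)
    print(tables[0])
```

In this toy model every one of N processes reads N entries, so the total number of lookups grows as N squared. That is one simple way to see why launch and wireup become harder as process counts approach the scales mentioned above.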
The charter of the PMIx community is to:
- Define a set of agnostic APIs (not affiliated with any specific programming model code base) to support interactions between application processes and the system management software stack (SMS)
- Develop an open source (non-copy-left licensed) standalone “convenience” library to facilitate adoption of the PMIx standard
- Retain transparent backward compatibility with the existing PMI-1 and PMI-2 definitions and any future PMI releases, and across all PMIx versions
- Support the Instant On initiative for rapid startup of applications at exascale and beyond
- Work with the HPC community to define and implement new APIs that support evolving programming model requirements for interactions between applications and the resource manager (RM)
Note that the definition of the PMIx standard is not contingent upon use of the convenience library. Any implementation that supports the defined APIs is perfectly acceptable, and some environments have chosen to pursue that route. The convenience library is provided solely for the following purposes:
- Validation of the standard. No proposed change and/or extension to the standard is accepted without an accompanying prototype implementation in the convenience library. This ensures that the proposal has undergone at least some minimal level of scrutiny and testing before being considered.
- Ease of adoption. The PMIx convenience library is designed to be particularly easy for resource managers (and the SMS in general) to adopt, thus facilitating a rapid uptake into that community for application portability. Both client and server libraries are included, along with reference examples of client usage and server-side integration. A list of supported environments and versions is provided here - please check regularly as the list is changing!
The convenience library targets support for the Linux operating system. A reasonable effort is made to support all major, modern Linux distributions; however, validation is limited to the most recent 2-3 releases of Red Hat Enterprise Linux (RHEL), Fedora, CentOS, and SUSE Linux Enterprise Server (SLES). In addition, development support is maintained for Mac OS X. Production support for vendor-specific operating systems is included as provided by the vendor.
Participation in the PMIx community is open to anyone and is not restricted to code contributors to the convenience library. Current community members are listed here.
Overview of PMIx
The following publications (with accompanying citation info) may help provide some background on PMIx and a perspective on its role in future HPC resource management:
- PMIx: Process Management for Exascale Environments [pdf] [ppt]
- PMIx: Process Management for Exascale Environments. Ralph H Castain, David G. Solt, Joshua Hursey, and Aurelien Bouteiller. Proceedings of the 24th European MPI Users’ Group Meeting (EuroMPI/USA 2017), Chicago IL, USA, Sept 24-28, 2017, 14:1–14:10.
- PMIx: Storage Integration [pdf] [ppt]
- PMIx: Storage Integration. Ralph H Castain. Presented at DOE Tiered Storage Working Group meeting, Feb 2017.
- PMIx Birds-of-a-Feather at SC’16 [pdf] [ppt]
- Charting the PMIx Roadmap. Ralph H Castain, David Solt, and Artem Polyakov. Presented at Birds-of-a-Feather Meeting, Supercomputing 2016, November 2016.
- PMIx: Scalable Debugger Support [pdf] [ppt]
- PMIx: Scalable Debugger Support. Ralph H Castain. Presented to MPI Forum Debugger Working Group, July 28, 2016.
- PMIx State-of-the-Union at SIAM’16 [pdf] [ppt]
- Process Management Interface-Exascale. David Solt and Ralph H Castain. Presented at SIAM’16, April 12-18 2016.
- PMIx Birds-of-a-Feather at SC’15 [pdf] [ppt]
- Charting the PMIx Roadmap. Ralph H Castain, Joshua Ladd, David Solt, and Gary Brown. Presented at Birds-of-a-Feather Meeting, Supercomputing 2015, November 2015.
- Exascale Process Management Interface (SLURM User’s Group 2015) [pdf] [ppt]
- Exascale Process Management Interface. Ralph H Castain, Joshua Ladd, Artem Polyakov, David Bigagli, and Gary Brown. Presented at SLURM User’s Group Meeting, Sept 2015, Washington DC
- HPC Resource Management: View to the Future [pdf] [ppt]
- HPC Resource Management: View to the Future. Ralph H Castain. Presented at Open MPI Developer’s Meeting, Feb 2016, Dallas TX
How do I get involved?
- Read the documentation. The header files contain the “official” definition of the standard, and the comments in them are intended to provide explanation and guidance. A set of man pages is also being written to describe their implementation in the convenience library (not much is there just yet, but we are working on it). In addition, a set of example applications has been developed to highlight how an application might use various aspects of PMIx.
- Review the wiki for an explanation of the standard definitions and an overview of the convenience library.
- Scan the FAQs.
- Download the latest release or the master:
  - The PMIx library itself (including documentation)
  - The PMIx code base is being developed in the PMIx GitHub repository.
- Become part of the conversation:
  - Join the mailing list. The list sees a fairly low rate of posting, as most discussion occurs in GitHub “issues” and in the teleconferences. However, broad-ranging issues and release announcements are among the topics handled on the mailing list.
  - Participate in the teleconferences. PMIx developers meet every Tuesday and Thursday to discuss the project, review and approve pending RFCs, and talk about their ongoing efforts. Anyone can join the calls to listen and participate in the design of PMIx.