Predicting Output Performance of A Petascale Supercomputer
Published in HPDC, 2017
In this paper, we develop a predictive model useful for output performance prediction of supercomputer ile systems under production load. Our target environment is TitanÐthe 3rd fastest supercomputer in the worldÐand its Lustre-based multi-stage write path. We observe from Titan that although output performance is highly variable at small time scales, the mean performance is stable and consistent over typical application run times. Moreover, we ind that output performance is non-linearly related to its correlated parameters due to interference and saturation on individual stages on the path. These observations enable us to build a predictive model of expected write times of output patterns and I/O conigurations, using feature transformations to capture non-linear relationships. We identify the candidate features based on the structure of the Lustre/Titan write path, and use feature transformation functions to produce a model space with 135,000 candidate models. By searching for the minimal mean square error in this space we identify a good model and show that it is efective.