Diagnosing performance bottlenecks in HPC applications