Sascha Hunold

Associate Prof. Dr.rer.nat.

Research Focus

Computer Engineering: 100%

Research Areas

Scheduling, parallel computing, parallel programming, parallel algorithms, high-performance computing

Roles

Vice Dean of Academic Affairs
Informatics Bachelor
Associate Professor
Parallel Computing, E191-04
Curriculum Coordinator
Master / Area / High Performance Computing

Courses

2025W

Advanced Multiprocessor Programming / 191.022 / VU
Bachelor Thesis for Informatics and Business Informatics / 184.716 / PR
Computer Engineering Practical / 191.005 / PR
Computer Engineering Project / 191.006 / PR
Project in Computer Science 1 / 191.008 / PR
Project in Computer Science 2 / 191.009 / PR
Scientific Programming with Python / 191.125 / VU
Scientific Project Computer Engineering / 191.007 / PR
Scientific Research and Writing / 193.052 / SE
Seminar for Master Students in Computer Engineering / 180.778 / SE
Seminar for PhD Students / 184.739 / SE

2026S

Bachelor Thesis for Informatics and Business Informatics / 184.716 / PR
Basics of Parallel Computing / 191.114 / VU
Computer Engineering Practical / 191.005 / PR
Computer Engineering Project / 191.006 / PR
High Performance Computing / 191.029 / VU
Project in Computer Science 1 / 191.008 / PR
Project in Computer Science 2 / 191.009 / PR
Scientific Project Computer Engineering / 191.007 / PR
Seminar for Master Students in Computer Engineering / 180.778 / SE
Seminar for Master Students in Software Engineering (Computer Engineering) / 180.011 / SE

Projects

High Performance Molecular Screening at Massive Scale
2022 – 2023 / Austrian Research Promotion Agency (FFG)
Publication: 192194
Offline and Online Autotuning of Parallel Applications
2021 – 2025 / Austrian Science Fund (FWF)
Publications: 136174 / 153709 / 188027 / 188934 / 188980 / 190663 / 192196 / 204353 / 204481 / 209941 / 219282 / 219281 / 222800 / 58614 / 135871

Publications

2026

To ncclsee, or Not to ncclsee: That is the Profiling Question / Laso Rodriguez, R., Salimi Beni, M., Vardas, I., Benkner, S., & Hunold, S. (2026). To ncclsee, or Not to ncclsee: That is the Profiling Question. In Austrian-Slovenian HPC Meeting 2026 – ASHPC26 (pp. 13–13).

2025

Exploring NCCL Tuning Strategies for Distributed Deep Learning / Salimi Beni, M., Laso, R., Cosenza, B., Benkner, S., & Hunold, S. (2025). Exploring NCCL Tuning Strategies for Distributed Deep Learning. In 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 59–62). IEEE. https://doi.org/10.1109/IPDPSW66978.2025.00015
Project: Autotune (2021–2025)
Mpisee: communicator-centric profiling of MPI applications / Vardas, I., Träff, J. L., Laso, R., & Hunold, S. (2025). Mpisee: communicator-centric profiling of MPI applications. Concurrency and Computation: Practice and Experience, 37(15–17), Article e70158. https://doi.org/10.1002/cpe.70158
Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
Optimizing Distributed Deep Learning Training by Tuning NCCL / Salimi Beni, M., Laso, R., Cosenza, B., Benkner, S., & Hunold, S. (2025). Optimizing Distributed Deep Learning Training by Tuning NCCL. In ASHPC25 : Austrian-Slovenian HPC Meeting 2025 : Rimske Terme, Slovenia : 19-22 May 2025 (pp. 38–38). https://doi.org/10.34726/10424

2024

MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns / Salimibeni, M., Cosenza, B., & Hunold, S. (2024). MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns. In Proceedings : 2024 IEEE International Conference on Cluster Computing : 24 – 27 September 2024 Kobe, Japan (pp. 108–119). https://doi.org/10.1109/CLUSTER59578.2024.00017
Project: Autotune (2021–2025)
Exploring Mapping Strategies for Co-allocated HPC Applications / Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2024). Exploring Mapping Strategies for Co-allocated HPC Applications. In D. Zeinalipour, D. Blanco Heras, G. Pallis, H. Herodotou, D. Trihinas, D. Balouek, P. Diehl, T. Cojean, K. Fürlinger, M. H. Kirkeby, M. Nardelli, & P. Di Sanzo (Eds.), Euro-Par 2023: Parallel Processing Workshops : Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers, Part II (pp. 271–276). Springer Nature. https://doi.org/10.1007/978-3-031-48803-0_41
Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations / Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations. arXiv. https://doi.org/10.48550/arXiv.2402.06384
Exploring Scalability in C++ Parallel STL Implementations / Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). Exploring Scalability in C++ Parallel STL Implementations. In ICPP ’24: Proceedings of the 53rd International Conference on Parallel Processing (pp. 284–293). ACM. https://doi.org/10.1145/3673038.3673065
Project: Autotune (2021–2025)
Benchmarking, Measuring, and Optimizing : 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers / Hunold, S., Xie, B., & Shu, K. (Eds.). (2024). Benchmarking, Measuring, and Optimizing : 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers (Vol. 14521). Springer Singapore. https://doi.org/10.1007/978-981-97-0316-6
Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping / Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2024). Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping. In 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 119–124). IEEE. https://doi.org/10.1109/CCGrid59990.2024.00023
Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
Analysis and prediction of performance variability in large-scale computing systems / Salimi Beni, M., Hunold, S., & Cosenza, B. (2024). Analysis and prediction of performance variability in large-scale computing systems. Journal of Supercomputing, 80(10), 14978–15005. https://doi.org/10.1007/s11227-024-06040-w

2023

Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems / Hunold, S. (2023, December 8). Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems [Presentation]. Universität Münster, Münster, Germany.
Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures / Swartvagher, P., Hunold, S., Träff, J. L., & Vardas, I. (2023). Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures. In Proceedings of 2023 SC23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (SC 2023 Workshops) (pp. 405–415). ACM. https://doi.org/10.1145/3624062.3624109
Project: Process Mapping (2019–2024)
Verifying Performance Guidelines for MPI Collectives at Scale / Hunold, S. (2023). Verifying Performance Guidelines for MPI Collectives at Scale. In Proceedings of 2023 SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC23 Workshops) (pp. 1264–1268). ACM. https://doi.org/10.1145/3624062.3625532
Project: Autotune (2021–2025)
Synchronizing MPI Processes in Space and Time / Schuchart, J., Hunold, S., & Bosilca, G. (2023). Synchronizing MPI Processes in Space and Time. In EuroMPI ’23: Proceedings of the 30th European MPI Users’ Group Meeting (pp. 1–11). ACM. https://doi.org/10.1145/3615318.3615325
Project: Autotune (2021–2025)
A Quantitative Analysis of OpenMP Task Runtime Systems / Hunold, S., & Kraßnitzer, K. D. V. (2023). A Quantitative Analysis of OpenMP Task Runtime Systems. In A. Gainaru, C. Zhang, & C. Luo (Eds.), Benchmarking, Measuring, and Optimizing : 14th BenchCouncil International Symposium, Bench 2022, Virtual Event, November 7-9, 2022, Revised Selected Papers (pp. 3–18). Springer. https://doi.org/10.1007/978-3-031-31180-2_1
Project: Autotune (2021–2025)
Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers / Swartvagher, P., Vardas, I., Hunold, S., & Träff, J. L. (2023). Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 61–61). EuroCC Austria. https://doi.org/10.34726/5368
Project: Process Mapping (2019–2024)
MPI is Good, Control is Better: Checking Performance Guidelines of Collectives / Hunold, S., & Hagn, M. (2023). MPI is Good, Control is Better: Checking Performance Guidelines of Collectives. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 60–60). EuroCC Austria. https://doi.org/10.34726/5367
Project: Autotune (2021–2025)
Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications / Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2023). Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 10–10). EuroCC Austria. https://doi.org/10.34726/5330
Project: Process Mapping (2019–2024)
Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers / Hunold, S., Vardas, I., Ibis, G., & Langer, T. (2023). Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 51–51). EuroCC Austria. https://doi.org/10.34726/5366
Project: HPsCreen (2022–2023)
Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI / Träff, J. L., Hunold, S., Vardas, I., & Funk, N. M. (2023). Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI. In 2023 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 284–294). IEEE. https://doi.org/10.1109/CLUSTER52292.2023.00031
OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning / Hunold, S., & Steiner, S. (2023). OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning. In Proceedings of PMBS 2022: performance modeling, benchmarking and simulation of high performance computer systems (pp. 123–128). IEEE. https://doi.org/10.1109/PMBS56514.2022.00016
Project: Autotune (2021–2025)

2022

An Overhead Analysis of MPI Profiling and Tracing Tools / Hunold, S., Ajanohoun, J. I., Vardas, I., & Träff, J. L. (2022). An Overhead Analysis of MPI Profiling and Tracing Tools. In C. Scully-Allison, R. Liem, & A. V. Solorzano (Eds.), PERMAVOST 2022: Proceedings of the 2nd Workshop on Performance Engineering, Modelling, Analysis, and Visualization Strategy (pp. 5–13). Association for Computing Machinery (ACM). https://doi.org/10.1145/3526063.3535353
Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia / Hunold, S., & Przybylski, B. (2022, May 18). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia [Conference Presentation]. New Challenges in Scheduling Theory (Centre CNRS “Paul-Langevin”, Aussois, France), Aussois, France. http://hdl.handle.net/20.500.12708/153814
Performance Tuning of MPI Collectives - Status Quo and Open Problems / Hunold, S. (2022). Performance Tuning of MPI Collectives - Status Quo and Open Problems [Presentation]. CaSToRC HPC National Competence Center Fall Seminar Series 2022. http://hdl.handle.net/20.500.12708/153709
Project: Autotune (2021–2025)
MPI Performance Tools under the Microscope: A Thorough Overhead Analysis / Ajanohoun, J. I., Vardas, I., Träff, J. L., & Hunold, S. (2022). MPI Performance Tools under the Microscope: A Thorough Overhead Analysis. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 16). EuroCC Austria. http://hdl.handle.net/20.500.12708/55697
mpisee: MPI Profiling for Communication and Communicator Structure / Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 15). EuroCC Austria. http://hdl.handle.net/20.500.12708/55696
mpisee: MPI Profiling for Communication and Communicator Structure / Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022) (pp. 520–529). IEEE. https://doi.org/10.1109/IPDPSW55747.2022.00092
Projects: Autotune (2021–2025) / Process Mapping (2019–2024)

2021

MicroBench Maker: Reproduce, Reuse, Improve / Hunold, S., Ajanohoun, J. I., & Carpen-Amarie, A. (2021). MicroBench Maker: Reproduce, Reuse, Improve. In 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). 12th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2021) in conjunction with SC 2021, St. Louis, Missouri, United States. IEEE. https://doi.org/10.1109/pmbs54543.2021.00013
Project: Autotune (2021–2025)
Teaching Complex Scheduling Algorithms / Hunold, S., & Przybylski, B. (2021). Teaching Complex Scheduling Algorithms. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 11th NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar 2021) in conjunction with 35th IEEE IPDPS 2021 - Online Conference, Portland, Oregon, United States. IEEE. https://doi.org/10.1109/ipdpsw52791.2021.00058
MPI collective communication through a single set of interfaces: A case for orthogonality / Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2021). MPI collective communication through a single set of interfaces: A case for orthogonality. Parallel Computing: Systems & Applications, 107, Article 102826. https://doi.org/10.1016/j.parco.2021.102826
Project: Process Mapping (2019–2024)

2020

Collectives and Communicators: A Case for Orthogonality: (Or: How to get rid of MPI neighbor and enhance Cartesian collectives) / Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2020). Collectives and Communicators: A Case for Orthogonality: (Or: How to get rid of MPI neighbor and enhance Cartesian collectives). In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States. ACM. https://doi.org/10.1145/3416315.3416319
Efficient Process-to-Node Mapping Algorithms for Stencil Computations / Hunold, S., von Kirchbach, K., Lehr, M., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. arXiv. https://doi.org/10.48550/arXiv.2005.09521
Project: Process Mapping (2019–2024)
Decomposing MPI Collectives for Exploiting Multi-lane Communication / Träff, J. L., & Hunold, S. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00037
Predicting MPI Collective Communication Performance Using Machine Learning / Hunold, S., Bhatele, A., Bosilca, G., & Knees, P. (2020). Predicting MPI Collective Communication Performance Using Machine Learning. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00036
Efficient Process-to-Node Mapping Algorithms for Stencil Computations / von Kirchbach, K., Lehr, M., Hunold, S., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00011
Project: Process Mapping (2019–2024)
Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia / Hunold, S., & Przybylski, B. (2020). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia. arXiv. https://doi.org/10.48550/arXiv.2003.05217

2019

Cartesian Collective Communication / Träff, J. L., & Hunold, S. (2019). Cartesian Collective Communication. In Proceedings of the 48th International Conference on Parallel Processing. 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan. ACM. https://doi.org/10.1145/3337821.3337848
On the Importance of Data Quality when Tuning MPI Libraries / Hunold, S., & Carpen-Amarie, A. (2019). On the Importance of Data Quality when Tuning MPI Libraries. In G. Haase (Ed.), Austrian HPC Meeting 2019 - AHPC19 (AHPC19 booklet of abstracts) (p. 15). Institut für Mathematik und wissenschaftliches Rechnen der Universität Graz. http://hdl.handle.net/20.500.12708/57798
LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources / Kainrad, T., Hunold, S., Seidel, T., & Langer, T. (2019). LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources. Journal of Chemical Information and Modeling, 59(1), 31–37. https://doi.org/10.1021/acs.jcim.8b00716
Benchmarking and scheduling on parallel machines / Hunold, S. (2019). Benchmarking and scheduling on parallel machines [Professorial Dissertation, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/159450

2018

Autotuning MPI Collectives using Performance Guidelines / Hunold, S., & Carpen-Amarie, A. (2018). Autotuning MPI Collectives using Performance Guidelines. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2018), Tokyo, Japan. ACM. https://doi.org/10.1145/3149457.3149461
Algorithm Selection of MPI Collectives Using Machine Learning Techniques / Hunold, S., & Carpen-Amarie, A. (2018). Algorithm Selection of MPI Collectives Using Machine Learning Techniques. In 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2018) in conjunction with SC 2018, Dallas, United States. IEEE. https://doi.org/10.1109/pmbs.2018.8641622
Hierarchical Clock Synchronization in MPI / Hunold, S., & Carpen-Amarie, A. (2018). Hierarchical Clock Synchronization in MPI. In 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, United Kingdom. IEEE. https://doi.org/10.1109/cluster.2018.00050

2017

Euro-Par 2016: Parallel Processing Workshops : Euro-Par 2016 International Workshops, Grenoble, France, August 24-26, 2016, Revised Selected Papers / Desprez, F., Dutot, P.-F., Kaklamanis, C., Marchal, L., Molitorisz, K., Ricci, L., Scarano, V., Vega-Rodriguez, M. A., Varbanescu, A. L., Hunold, S., Scott, S. L., Lankes, S., & Weidendorfer, J. (Eds.). (2017). Euro-Par 2016: Parallel Processing Workshops : Euro-Par 2016 International Workshops, Grenoble, France, August 24-26, 2016, Revised Selected Papers. Springer International Publishing. https://doi.org/10.1007/978-3-319-58943-5
Introduction to REPPAR Workshop / Hunold, S., Legrand, A., & Nussbaum, L. (2017). Introduction to REPPAR Workshop. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lake Buena Vista, United States. IEEE. https://doi.org/10.1109/ipdpsw.2017.221
Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node / Heinrich, F. C., Cornebize, T., Degomme, A., Legrand, A., Carpen-Amarie, A., Hunold, S., Orgerie, A.-C., & Quinson, M. (2017). Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node. In 2017 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (CLUSTER 2017), Honolulu, Hawaii, United States. IEEE. https://doi.org/10.1109/cluster.2017.66
Autotuning MPI Collectives using Performance Guidelines / Hunold, S., & Carpen-Amarie, A. (2017). Autotuning MPI Collectives using Performance Guidelines. LIG - Bâtiment IMAG, St Martin d’Hères, France. http://hdl.handle.net/20.500.12708/86599
Tuning MPI Collectives by Verifying Performance Guidelines / Hunold, S., & Carpen-Amarie, A. (2017). Tuning MPI Collectives by Verifying Performance Guidelines. arXiv. https://doi.org/10.48550/arXiv.1707.09965
On expected and observed communication performance with MPI derived datatypes / Carpen-Amarie, A., Hunold, S., & Träff, J. L. (2017). On expected and observed communication performance with MPI derived datatypes. Parallel Computing: Systems & Applications, 69, 98–117. https://doi.org/10.1016/j.parco.2017.08.006
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs / Bleuse, R., Hunold, S., Kedad-Sidhoum, S., Monna, F., Mounie, G., & Trystram, D. (2017). Scheduling Independent Moldable Tasks on Multi-Cores with GPUs. IEEE Transactions on Parallel and Distributed Systems, 28(9), 2689–2702. https://doi.org/10.1109/tpds.2017.2675891

2016

The art of benchmarking MPI libraries / Hunold, S., Carpen-Amarie, A., & Träff, J. L. (2016). The art of benchmarking MPI libraries. In I. Reichl, C. Blaas-Schenner, & J. Zabloudil (Eds.), Austrian HPC Meeting 2016 - AHPC 2016 (p. 45). Vienna Scientific Cluster (VSC). http://hdl.handle.net/20.500.12708/56921
The Art of MPI Benchmarking / Hunold, S. (2016). The Art of MPI Benchmarking. 45th SPEEDUP Workshop on High-Performance Computing, Basel, Switzerland. http://hdl.handle.net/20.500.12708/86310
On the Expected and Observed Communication Performance with MPI Derived Datatypes / Carpen-Amarie, A., Hunold, S., & Träff, J. L. (2016). On the Expected and Observed Communication Performance with MPI Derived Datatypes. In D. Holmes, A. Collis, J. L. Träff, & L. Smith (Eds.), Proceedings of the 23rd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2966884.2966905
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
Automatic Verification of Self-consistent MPI Performance Guidelines / Hunold, S., Carpen-Amarie, A., Lübbe, F. D., & Träff, J. L. (2016). Automatic Verification of Self-consistent MPI Performance Guidelines. In P.-F. Dutot & D. Trystram (Eds.), Euro-Par 2016: Parallel Processing (pp. 433–446). Springer International Publishing. https://doi.org/10.1007/978-3-319-43659-3_32
Projects: MPI (2013–2018) / ReproPC (2013–2016)
Clock Synchronization Algorithms and SimGrid / Hunold, S. (2016). Clock Synchronization Algorithms and SimGrid. SimGrid User Days, Fréjus, France. http://hdl.handle.net/20.500.12708/86260
Message-Combining Algorithms for Isomorphic, Sparse Collective Communication / Träff, J. L., Carpen-Amarie, A., Hunold, S., & Rougier, A. (2016). Message-Combining Algorithms for Isomorphic, Sparse Collective Communication. arXiv. https://doi.org/10.48550/arXiv.1606.07676
PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines / Hunold, S., Carpen-Amarie, A., Lübbe, F. D., & Träff, J. L. (2016). PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines. arXiv. https://doi.org/10.48550/arXiv.1606.00215
Projects: MPI (2013–2018) / ReproPC (2013–2016)
MPI Derived Datatypes: Performance Expectations and Status Quo / Carpen-Amarie, A., Hunold, S., & Träff, J. L. (2016). MPI Derived Datatypes: Performance Expectations and Status Quo. arXiv. https://doi.org/10.48550/arXiv.1607.00178
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
The art of benchmarking MPI libraries / Hunold, S. (2016). The art of benchmarking MPI libraries. Austrian HPC Meeting 2016 - AHPC16, Grundlsee, Austria. http://hdl.handle.net/20.500.12708/86269
The Art of MPI Benchmarking / Hunold, S. (2016). The Art of MPI Benchmarking. Lunchtime Seminar, Department of Computer Science, University of Innsbruck, Innsbruck, Austria. http://hdl.handle.net/20.500.12708/86282

2015

Euro-Par 2015: Parallel Processing Workshops : Euro-Par 2015 International Workshops, Vienna, Austria, August 24-25, 2015, Revised Selected Papers / Hunold, S., Costan, A., Gimenez, D., Iosup, A., Ricci, L., Gomez Requena, M. E., Scarano, V., Varbanescu, A. L., Scott, S. L., Lankes, S., Weidendorfer, J., & Alexander, M. (Eds.). (2015). Euro-Par 2015: Parallel Processing Workshops : Euro-Par 2015 International Workshops, Vienna, Austria, August 24-25, 2015, Revised Selected Papers. Springer International Publishing. https://doi.org/10.1007/978-3-319-27308-2
Euro-Par 2015: Parallel Processing : 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24-28, 2015, Proceedings / Träff, J. L., Hunold, S., & Versaci, F. (Eds.). (2015). Euro-Par 2015: Parallel Processing : 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24-28, 2015, Proceedings. Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-662-48096-0
Reproducibility in Parallel Computing / Hunold, S. (2015). Reproducibility in Parallel Computing. Session: Performance Reproducibility in HPC - Challenges and State-of-the-Art at the 27th International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2015), Austin, United States. http://hdl.handle.net/20.500.12708/86091
Project: ReproPC (2013–2016)
On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives / Hunold, S., & Carpen-Amarie, A. (2015). On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives. In J. Dongarra, A. Denis, B. Goglin, E. Jeannot, & G. Mercier (Eds.), Proceedings of the 22nd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2802658.2802662
Projects: MPI (2013–2018) / ReproPC (2013–2016)
Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations / Träff, J. L., Lübbe, F. D., Rougier, A., & Hunold, S. (2015). Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations. In J. Dongarra, A. Denis, B. Goglin, E. Jeannot, & G. Mercier (Eds.), Proceedings of the 22nd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2802658.2802663
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2015). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. Wirtschaftswissenschaftliche Fakultät, Universität Augsburg, Augsburg, Germany. http://hdl.handle.net/20.500.12708/86038
Accurately Measuring MPI Collectives with Synchronized Clocks / Hunold, S. (2015). Accurately Measuring MPI Collectives with Synchronized Clocks. Dagstuhl Seminar 15281: Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems, Wadern, Germany. http://hdl.handle.net/20.500.12708/86057
A Survey on Reproducibility in Parallel Computing / Hunold, S. (2015). A Survey on Reproducibility in Parallel Computing. arXiv. https://doi.org/10.48550/arXiv.1511.04217
MPI Benchmarking Revisited: Experimental Design and Reproducibility / Hunold, S., & Carpen-Amarie, A. (2015). MPI Benchmarking Revisited: Experimental Design and Reproducibility. arXiv. https://doi.org/10.48550/arXiv.1505.07734
One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2015). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. Concurrency and Computation: Practice and Experience, 27(4), 1010–1026. http://hdl.handle.net/20.500.12708/150641
Project: ReproPC (2013–2016)
Energy Characterization and Optimization of Parallel Prefix-Sums Kernels / Papatriantafyllou, A. (2015). Energy Characterization and Optimization of Parallel Prefix-Sums Kernels. In S. Hunold, A. Costan, D. Gimenez, A. Iosup, L. Ricci, M. E. Gomez Requena, V. Scarano, A. L. Varbanescu, S. L. Scott, S. Lankes, J. Weidendorfer, & M. Alexander (Eds.), Euro-Par 2015: Parallel Processing Workshops (pp. 685–696). Springer International Publishing. https://doi.org/10.1007/978-3-319-27308-2_55

2014

Implementing a classic: zero-copy all-to-all communication with MPI datatypes / Träff, J. L., Rougier, A., & Hunold, S. (2014). Implementing a classic: zero-copy all-to-all communication with MPI datatypes. In M. Gerndt, P. Stenström, L. Rauchwerger, B. Miller, & M. Schulz (Eds.), Proceedings of the 28th ACM international conference on Supercomputing - ICS ’14. ACM. https://doi.org/10.1145/2597652.2597662
Euro-Par 2014: Parallel Processing Workshops : Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part I / Lopes, L., Zilinskas, J., Costan, A., Cascella, R. G., Kecskemeti, G., Jeannot, E., Cannataro, M., Ricci, L., Benkner, S., Petit, S., Scarano, V., Gracia, J., Hunold, S., Scott, S. L., Lankes, S., Lengauer, C., Carretero, J., Breitbart, J., & Alexander, M. (Eds.). (2014). Euro-Par 2014: Parallel Processing Workshops : Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part I. Springer. https://doi.org/10.1007/978-3-319-14325-5
Euro-Par 2014: Parallel Processing Workshops : Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part II / Lopes, L., Zilinskas, J., Costan, A., Cascella, R. G., Kecskemeti, G., Jeannot, E., Cannataro, M., Ricci, L., Benkner, S., Petit, S., Scarano, V., Gracia, J., Hunold, S., Scott, S. L., Lankes, S., Lengauer, C., Carretero, J., Breitbart, J., & Alexander, M. (Eds.). (2014). Euro-Par 2014: Parallel Processing Workshops : Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part II. Springer. https://doi.org/10.1007/978-3-319-14313-2
Scheduling Moldable Tasks with Precedence Constraints and Arbitrary Speedup Functions on Multiprocessors / Hunold, S. (2014). Scheduling Moldable Tasks with Precedence Constraints and Arbitrary Speedup Functions on Multiprocessors. In R. Wyrzykowski, J. Dongarra, K. Karczewski, & J. Wasniewski (Eds.), Parallel Processing and Applied Mathematics (pp. 13–25). Springer. https://doi.org/10.1007/978-3-642-55195-6_2
Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think / Hunold, S., Carpen-Amarie, A., & Träff, J. L. (2014). Reproducible MPI Micro-Benchmarking Isn’t As Easy As You Think. In J. Dongarra, Y. Ishikawa, & A. Hori (Eds.), Proceedings of the 21st European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2642769.2642785
Projects: MPI (2013–2018) / ReproPC (2013–2016)
Stepping Stones to Reproducible Research: A Study of Current Practices in Parallel Computing / Carpen-Amarie, A., Rougier, A., & Lübbe, F. D. (2014). Stepping Stones to Reproducible Research: A Study of Current Practices in Parallel Computing. In L. Lopes, J. Zilinskas, A. Costan, R. G. Cascella, G. Kecskemeti, E. Jeannot, M. Cannataro, L. Ricci, S. Benkner, S. Petit, V. Scarano, J. Gracia, S. Hunold, S. L. Scott, S. Lankes, C. Lengauer, J. Carretero, J. Breitbart, & M. Alexander (Eds.), Euro-Par 2014: Parallel Processing Workshops Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part I (pp. 499–510). Springer International Publishing. https://doi.org/10.1007/978-3-319-14325-5_43
One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2014). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. 9th Scheduling for Large Scale Systems Workshop, Lyon, France. http://hdl.handle.net/20.500.12708/85812
Moldable Task Scheduling: Theory and Practice / Hunold, S. (2014). Moldable Task Scheduling: Theory and Practice. Workshop on New Challenges in Scheduling Theory, Aussois, France. http://hdl.handle.net/20.500.12708/85817
Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think / Hunold, S., Carpen-Amarie, A., & Träff, J. L. (2014). Reproducible MPI Micro-Benchmarking Isn’t As Easy As You Think. Research Group Theory and Applications of Algorithms, University of Vienna, Vienna, Austria. http://hdl.handle.net/20.500.12708/85872
Projects: MPI (2013–2018) / ReproPC (2013–2016)
One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2014). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. AIT Austrian Institute of Technology, Seibersdorf, Austria. http://hdl.handle.net/20.500.12708/85871
Project: ReproPC (2013–2016)
Reproducibility of Experiments: It's about the WHO and less the HOW / Hunold, S. (2014). Reproducibility of Experiments: It’s about the WHO and less the HOW. Panel on reproducible research methodologies and new publication models, 4th International Workshop on Adaptive Self-tuning Computing Systems (ADAPT 2014) co-located with HiPEAC 2014, Vienna, Austria. http://hdl.handle.net/20.500.12708/85814
Project: ReproPC (2013–2016)

2013

Can I repeat your parallel computing experiment? Yes, you can't / Hunold, S. (2013). Can I repeat your parallel computing experiment? Yes, you can’t. Technische Universität Dresden, Zentrale für Informationsdienste und Hochleistungsrechnen (ZIH), Dresden, Germany. http://hdl.handle.net/20.500.12708/85620
On the Scalability of Moldable Task Scheduling Algorithms / Hunold, S. (2013). On the Scalability of Moldable Task Scheduling Algorithms. Dagstuhl Seminar 13381: Algorithms and Scheduling Techniques for Exascale Systems, Schloss Dagstuhl, Germany. http://hdl.handle.net/20.500.12708/85623
Fair scheduling of bag-of-tasks applications using distributed Lagrangian optimization / Bertin, R., Hunold, S., Legrand, A., & Touati, C. (2013). Fair scheduling of bag-of-tasks applications using distributed Lagrangian optimization. Journal of Parallel and Distributed Computing, 74(1), 1914–1929. https://doi.org/10.1016/j.jpdc.2013.08.011
On the State and Importance of Reproducible Experimental Research in Parallel Computing / Hunold, S., & Träff, J. L. (2013). On the State and Importance of Reproducible Experimental Research in Parallel Computing. arXiv. https://doi.org/10.48550/arXiv.1308.3648

2012

Evolutionary Scheduling of Parallel Tasks Graphs onto Homogeneous Clusters / Hunold, S., & Lepping, J. (2012). Evolutionary Scheduling of Parallel Tasks Graphs onto Homogeneous Clusters. New Challenges in Scheduling Theory, Frejus, France. http://hdl.handle.net/20.500.12708/85392
Reproducibility and Data Provenance with VisTrails / Hunold, S. (2012). Reproducibility and Data Provenance with VisTrails. WP8 meeting, ANR SONGS project, Paris, France. http://hdl.handle.net/20.500.12708/85431

Supervisions

Enhancing the Performance Analysis of NCCL GPU Collectives / Cerar, J. (2026). Enhancing the Performance Analysis of NCCL GPU Collectives [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.136848
LLM-driven Translation of GPU Code Across Parallel Execution Models / Hagn, M. (2026). LLM-driven Translation of GPU Code Across Parallel Execution Models [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.138663
Performance and Scalability Analysis of Dask Applications on Large Scale Systems / Chakarov, T. (2026). Performance and Scalability Analysis of Dask Applications on Large Scale Systems [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.131836
pSTL-Bench : evaluating the capabilities of ISO C++ parallel STL implementations on modern parallel hardware using microbenchmarking / Krupitza, D. (2023). pSTL-Bench : evaluating the capabilities of ISO C++ parallel STL implementations on modern parallel hardware using microbenchmarking [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.113280
Online algorithm selection of MPI collective communication operations / Steiner, S. (2023). Online algorithm selection of MPI collective communication operations [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.105821
The Causes of run time variability in HPC, how to pin them down and how to handle them / Roth, N. (2021). The Causes of run time variability in HPC, how to pin them down and how to handle them [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.83802
To Co-schedule or not to co-schedule? Efficiently utilizing large multicore machines / Sarközi, B. A. (2021). To Co-schedule or not to co-schedule? Efficiently utilizing large multicore machines [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.87732
Providing transparent remote access to HPC resources for graphical desktop applications / Kainrad, T. (2018). Providing transparent remote access to HPC resources for graphical desktop applications [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.45383

Awards

Best Short Paper / PMBS@Supercomputing
2022 / USA
Best Paper Award / IEEE CLUSTER
2020 / Japan
Best Paper Award EuroMPI/Asia
2014 / Japan

And more…

Soon, this page will include additional information such as reference projects, activities as journal reviewer and editor, memberships in councils and committees, and other research activities.

Until then, please visit Sascha Hunold’s research profile in TISS.
