Parallel Computing E191-04
Contact
- Head: Jesper Larsson Träff
- Web: www.par.tuwien.ac.at
- Location: Treitlstrasse 1 - 3
About
Parallel Computing deals with the efficient utilization of parallel processing resources for the solution of computational problems. This sounds dry, but since all modern, general-purpose computing devices are in one way or another parallel computers, parallel computing is ubiquitous and inevitable.
Because not all computational problems are easily solved in parallel, the field is fascinating and challenging, and it abounds with open problems that call for better solutions. Parallel Computing at TU Wien focuses on the efficient utilization and modeling of real, existing architectures and systems (shared-memory multi-cores, distributed-memory systems, hybrid and accelerated systems); on algorithms, interfaces, libraries and, to some extent, applications; and on idealized models of parallel computation to explore the limits of parallelization.
The research unit has specific expertise and interest in message-passing parallel computing, interfaces like MPI, benchmarking of parallel algorithms, scheduling, shared-memory algorithms and data structures, and parallel algorithms. All these topics are covered extensively in the lectures offered by the research unit.
The Research Unit Parallel Computing is part of the Institute of Computer Engineering.
Courses
2024W
- Advanced Multiprocessor Programming / 184.726 / VU
- Bachelor Thesis for Informatics and Business Informatics / 184.716 / PR
- Computer Engineering Practical / 191.005 / PR
- Computer Engineering Project / 191.006 / PR
- Parallel Algorithms / 184.727 / VU
- Project in Computer Science 1 / 191.008 / PR
- Project in Computer Science 2 / 191.009 / PR
- Scientific Programming with Python / 191.125 / VU
- Scientific Project Computer Engineering / 191.007 / PR
- Scientific Research and Writing / 193.052 / SE
- Seminar Computer Engineering / 191.108 / SE
- Seminar for Master Students in Computer Engineering / 180.778 / SE
- Seminar for PhD Students / 184.739 / SE
- Seminar in Software Engineering / 184.758 / SE
- Seminar in Theoretical Computer Science / 184.753 / SE
- Seminar on Algorithms / 184.754 / SE
Projects
- High Performance Molecular Screening at Massive Scale / 2022 – 2023 / Austrian Research Promotion Agency (FFG)
  Publication: 192194
- Offline and Online Autotuning of Parallel Applications / 2021 – 2025 / Austrian Science Fund (FWF)
  Publications: 136174 / 153709 / 188027 / 188934 / 188980 / 190663 / 192196 / 204353 / 204481 / 58614 / 135871
- Algorithm Engineering for Process Mapping / 2019 – 2024 / Austrian Science Fund (FWF)
  Publications: 136174 / 137932 / 141845 / 176488 / 189827 / 191197 / 191155 / 192198 / 204353 / 55551 / 58196 / 135871
- Resilience versus Performance in Numerical Linear Algebra / 2016 – 2020 / Vienna Science and Technology Fund (WWTF)
  Publications: 176471 / 57831
- Verifying Self-consistent MPI Performance Guidelines / 2013 – 2018 / Austrian Science Fund (FWF)
  Publications: 145229 / 146889 / 176447 / 176449 / 176452 / 55184 / 56109 / 56110 / 56111 / 56523 / 56570 / 56975 / 56997 / 85805 / 85872 / 86357
- Exascale Programming Models / 2013 – 2016 / European Commission
  Publications: 146889 / 176449 / 55179 / 55431 / 56108 / 56109 / 56569 / 56570 / 56571 / 85805
Publications
2024
- pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations / Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations. arXiv. https://doi.org/10.48550/arXiv.2402.06384
- Exploring Scalability in C++ Parallel STL Implementations / Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). Exploring Scalability in C++ Parallel STL Implementations. In ICPP ’24: Proceedings of the 53rd International Conference on Parallel Processing (pp. 284–293). ACM. https://doi.org/10.1145/3673038.3673065
  Project: Autotune (2021–2025)
- Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping / Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2024). Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping. In 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 119–124). IEEE. https://doi.org/10.1109/CCGrid59990.2024.00023
  Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
- Analysis and prediction of performance variability in large-scale computing systems / Salimi Beni, M., Hunold, S., & Cosenza, B. (2024). Analysis and prediction of performance variability in large-scale computing systems. Journal of Supercomputing, 80(10), 14978–15005. https://doi.org/10.1007/s11227-024-06040-w
- Benchmarking, Measuring, and Optimizing: 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers / Hunold, S., Xie, B., & Shu, K. (Eds.). (2024). Benchmarking, Measuring, and Optimizing: 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers (Vol. 14521). Springer Singapore. https://doi.org/10.1007/978-981-97-0316-6
2023
- Round-optimal 𝑛-Block Broadcast Schedules in Logarithmic Time / Träff, J. L. (2023). Round-optimal 𝑛-Block Broadcast Schedules in Logarithmic Time. arXiv. https://doi.org/10.34726/7320
- Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems / Hunold, S. (2023, December 8). Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems [Presentation]. Universität Münster, Münster, Germany.
- Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures / Swartvagher, P., Hunold, S., Träff, J. L., & Vardas, I. (2023). Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures. In Proceedings of 2023 SC23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (SC 2023 Workshops) (pp. 405–415). ACM. https://doi.org/10.1145/3624062.3624109
  Project: Process Mapping (2019–2024)
- Verifying Performance Guidelines for MPI Collectives at Scale / Hunold, S. (2023). Verifying Performance Guidelines for MPI Collectives at Scale. In Proceedings of 2023 SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC23 Workshops) (pp. 1264–1268). ACM. https://doi.org/10.1145/3624062.3625532
  Project: Autotune (2021–2025)
- The research career after the PhD / Laso Rodriguez, R., & Casado, F. E. (2023, November 3). The research career after the PhD [Presentation]. CiTIUS (USC), Santiago de Compostela, Spain. http://hdl.handle.net/20.500.12708/189614
- Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes / Träff, J. L., & Vardas, I. (2023). Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes. In Proceedings of the 30th European MPI Users’ Group Meeting (EUROMPI 23). 30th European MPI Users’ Group Meeting (EuroMPI 2023), Bristol, United Kingdom. ACM. https://doi.org/10.1145/3615318.3615323
  Project: Process Mapping (2019–2024)
- Synchronizing MPI Processes in Space and Time / Schuchart, J., Hunold, S., & Bosilca, G. (2023). Synchronizing MPI Processes in Space and Time. In EuroMPI ’23: Proceedings of the 30th European MPI Users’ Group Meeting (pp. 1–11). ACM. https://doi.org/10.1145/3615318.3615325
  Project: Autotune (2021–2025)
- Realizing multioperations and multiprefixes in Thick Control Flow processors / Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2023). Realizing multioperations and multiprefixes in Thick Control Flow processors. Microprocessors and Microsystems, 98, Article 104807. https://doi.org/10.1016/j.micpro.2023.104807
- Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors / Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2023). Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors. In J. Nurmi, M. Shen, P. Ellervee, P. Koch, & F. Moradi (Eds.), Proceedings 2023 IEEE Nordic Circuits and Systems Conference (NorCAS) (pp. 1–7). IEEE. https://doi.org/10.1109/NorCAS58970.2023.10305463
- MPI is Good, Control is Better: Checking Performance Guidelines of Collectives / Hunold, S., & Hagn, M. (2023). MPI is Good, Control is Better: Checking Performance Guidelines of Collectives. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 60–60). EuroCC Austria. https://doi.org/10.34726/5367
  Project: Autotune (2021–2025)
- Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers / Swartvagher, P., Vardas, I., Hunold, S., & Träff, J. L. (2023). Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 61–61). EuroCC Austria. https://doi.org/10.34726/5368
  Project: Process Mapping (2019–2024)
- Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers / Hunold, S., Vardas, I., Ibis, G., & Langer, T. (2023). Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 51–51). EuroCC Austria. https://doi.org/10.34726/5366
  Project: HPsCreen (2022–2023)
- Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications / Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2023). Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 10–10). EuroCC Austria. https://doi.org/10.34726/5330
  Project: Process Mapping (2019–2024)
- A Quantitative Analysis of OpenMP Task Runtime Systems / Hunold, S., & Kraßnitzer, K. D. V. (2023). A Quantitative Analysis of OpenMP Task Runtime Systems. In A. Gainaru, C. Zhang, & C. Luo (Eds.), Benchmarking, Measuring, and Optimizing : 14th BenchCouncil International Symposium, Bench 2022, Virtual Event, November 7-9, 2022, Revised Selected Papers (pp. 3–18). Springer. https://doi.org/10.1007/978-3-031-31180-2_1
  Project: Autotune (2021–2025)
- Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI / Träff, J. L., Hunold, S., Vardas, I., & Funk, N. M. (2023). Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI. In 2023 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 284–294). IEEE. https://doi.org/10.1109/CLUSTER52292.2023.00031
- OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning / Hunold, S., & Steiner, S. (2023). OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning. In Proceedings of PMBS 2022: performance modeling, benchmarking and simulation of high performance computer systems (pp. 123–128). IEEE. https://doi.org/10.1109/PMBS56514.2022.00016
  Project: Autotune (2021–2025)
2022
- Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules / Träff, J. L. (2022). Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules. In K. Agrawal & I.-T. A. Lee (Eds.), Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2022) (pp. 143–146). ACM. https://doi.org/10.1145/3490148.3538560
- An Overhead Analysis of MPI Profiling and Tracing Tools / Hunold, S., Ajanohoun, J. I., Vardas, I., & Träff, J. L. (2022). An Overhead Analysis of MPI Profiling and Tracing Tools. In C. Scully-Allison, R. Liem, & A. V. Solorzano (Eds.), PERMAVOST 2022: Proceedings of the 2nd Workshop on Performance Engineering, Modelling, Analysis, and Visualization Strategy (pp. 5–13). Association for Computing Machinery (ACM). https://doi.org/10.1145/3526063.3535353
  Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
- Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia / Hunold, S., & Przybylski, B. (2022, May 18). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia [Conference Presentation]. New Challenges in Scheduling Theory (Centre CNRS “Paul-Langevin”, Aussois, France), Aussois, France. http://hdl.handle.net/20.500.12708/153814
- (Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI / Träff, J. L. (2022). (Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI. arXiv. https://doi.org/10.48550/arXiv.2205.10072
- Performance Tuning of MPI Collectives - Status Quo and Open Problems / Hunold, S. (2022). Performance Tuning of MPI Collectives - Status Quo and Open Problems [Presentation]. CaSToRC HPC National Competence Center Fall Seminar Series 2022, Unknown. http://hdl.handle.net/20.500.12708/153709
  Project: Autotune (2021–2025)
- MPI Performance Tools under the Microscope: A Thorough Overhead Analysis / Ajanohoun, J. I., Vardas, I., Träff, J. L., & Hunold, S. (2022). MPI Performance Tools under the Microscope: A Thorough Overhead Analysis. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 16). EuroCC Austria. http://hdl.handle.net/20.500.12708/55697
- mpisee: MPI Profiling for Communication and Communicator Structure / Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 15). EuroCC Austria. http://hdl.handle.net/20.500.12708/55696
- Fast(er) Construction of Round-optimal n-Block Broadcast Schedules / Träff, J. L. (2022). Fast(er) Construction of Round-optimal n-Block Broadcast Schedules. In Proceedings IEEE International Conference on Cluster Computing (CLUSTER 2022) (pp. 142–151). IEEE. https://doi.org/10.1109/CLUSTER51413.2022.00028
- mpisee: MPI Profiling for Communication and Communicator Structure / Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022) (pp. 520–529). IEEE. https://doi.org/10.1109/IPDPSW55747.2022.00092
  Projects: Autotune (2021–2025) / Process Mapping (2019–2024)
- Performance and programmability comparison of the thick control flow architecture and current multicore processors / Forsell, M., Nikula, S., Roivainen, J., Leppänen, V., & Träff, J. L. (2022). Performance and programmability comparison of the thick control flow architecture and current multicore processors. The Journal of Supercomputing, 78(3), 3152–3183. https://doi.org/10.1007/s11227-021-03985-0
2021
- A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation / Träff, J. L. (2021). A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation. arXiv. https://doi.org/10.48550/arXiv.2109.12626
- A more pragmatic implementation of the lock-free, ordered, linked list / Träff, J. L., & Pöter, M. (2021). A more pragmatic implementation of the lock-free, ordered, linked list. In J. Lee & E. Petrank (Eds.), Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM. https://doi.org/10.1145/3437801.3441579
- MicroBench Maker: Reproduce, Reuse, Improve / Hunold, S., Ajanohoun, J. I., & Carpen-Amarie, A. (2021). MicroBench Maker: Reproduce, Reuse, Improve. In 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). 12th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2021) in conjunction with SC 2021, St. Louis, Missouri, United States of America. IEEE. https://doi.org/10.1109/pmbs54543.2021.00013
  Project: Autotune (2021–2025)
- Teaching Complex Scheduling Algorithms / Hunold, S., & Przybylski, B. (2021). Teaching Complex Scheduling Algorithms. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 11th NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar 2021) in conjunction with 35th IEEE IPDPS 2021 - Online Conference, Portland, Oregon, USA. IEEE. https://doi.org/10.1109/ipdpsw52791.2021.00058
- MPI collective communication through a single set of interfaces: A case for orthogonality / Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2021). MPI collective communication through a single set of interfaces: A case for orthogonality. Parallel Computing: Systems & Applications, 107, Article 102826. https://doi.org/10.1016/j.parco.2021.102826
  Project: Process Mapping (2019–2024)
2020
- Classical and pipelined preconditioned conjugate gradient methods with node-failure resilience / Pachajoa, C., Levonyak, M., Pacher, C., Träff, J. L., & Gansterer, W. (2020). Classical and pipelined preconditioned conjugate gradient methods with node-failure resilience. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 13). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- Special issue: Selected papers from EuroMPI 2019 / Träff, J. L., & Hoefler, T. (2020). Special issue: Selected papers from EuroMPI 2019. Parallel Computing, 99, Article 102695. https://doi.org/10.1016/j.parco.2020.102695
- High-Quality Hierarchical Process Mapping / Faraj, M. F., van der Grinten, A., Meyerhenke, H., Träff, J. L., & Schulz, C. (2020). High-Quality Hierarchical Process Mapping. arXiv. https://doi.org/10.48550/arXiv.2001.07134
- k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms / Träff, J. L. (2020). k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms. arXiv. https://doi.org/10.48550/arXiv.2008.12144
- Efficient Process-to-Node Mapping Algorithms for Stencil Computations / Hunold, S., von Kirchbach, K., Lehr, M., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. arXiv. https://doi.org/10.48550/arXiv.2005.09521
  Project: Process Mapping (2019–2024)
- Decomposing MPI Collectives for Exploiting Multi-lane Communication / Träff, J. L. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. SPCL_Bcast, ETH Zürich, Zürich, Switzerland. http://hdl.handle.net/20.500.12708/87082
- High-Quality Hierarchical Process Mapping / Faraj, M. F., van der Grinten, A., Meyerhenke, H., Träff, J. L., & Schulz, C. (2020). High-Quality Hierarchical Process Mapping. In S. Faro & D. Cantone (Eds.), 18th International Symposium on Experimental Algorithms, SEA 2020 (pp. 4:1-4:15). Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.SEA.2020.4
  Project: Process Mapping (2019–2024)
- Decomposing MPI Collectives for Exploiting Multi-lane Communication / Träff, J. L., & Hunold, S. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00037
- Predicting MPI Collective Communication Performance Using Machine Learning / Hunold, S., Bhatele, A., Bosilca, G., & Knees, P. (2020). Predicting MPI Collective Communication Performance Using Machine Learning. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00036
- Signature Datatypes for Type Correct Collective Operations, Revisited / Träff, J. L. (2020). Signature Datatypes for Type Correct Collective Operations, Revisited. In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States of America. IEEE. https://doi.org/10.1145/3416315.3416324
- Collectives and Communicators: A Case for Orthogonality / Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2020). Collectives and Communicators: A Case for Orthogonality. In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States of America. IEEE. https://doi.org/10.1145/3416315.3416319
- Efficient Process-to-Node Mapping Algorithms for Stencil Computations / von Kirchbach, K., Lehr, M., Hunold, S., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00011
  Project: Process Mapping (2019–2024)
- Optimizing Memory Access in TCF Processors with Compute-Update Operations / Forsell, M., Roivainen, J., & Träff, J. L. (2020). Optimizing Memory Access in TCF Processors with Compute-Update Operations. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020) in conjunction with IPDPS 2020 - Online Conference, New Orleans, United States of America. IEEE. https://doi.org/10.1109/ipdpsw50202.2020.00100
- A more Pragmatic Implementation of the Lock-free, Ordered, Linked List / Träff, J. L., & Pöter, M. (2020). A more Pragmatic Implementation of the Lock-free, Ordered, Linked List. arXiv. https://doi.org/10.48550/arXiv.2010.15755
- Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia / Hunold, S., & Przybylski, B. (2020). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia. arXiv. https://doi.org/10.48550/arXiv.2003.05217
- Better Process Mapping and Sparse Quadratic Assignment / Kirchbach, K. V., Schulz, C., & Träff, J. L. (2020). Better Process Mapping and Sparse Quadratic Assignment. ACM Journal on Experimental Algorithmics, 25, 1–19. https://doi.org/10.1145/3409667
  Project: Process Mapping (2019–2024)
- Improved Cartesian Topology Mapping in MPI / Lehr, M., & von Kirchbach, K. (2020). Improved Cartesian Topology Mapping in MPI. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 27). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- Exploiting Multi-lane Communication in MPI Collectives / Träff, J. L. (2020). Exploiting Multi-lane Communication in MPI Collectives. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 30). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
2019
- On Optimal Trees for Irregular Gather and Scatter Collectives / Träff, J. L. (2019). On Optimal Trees for Irregular Gather and Scatter Collectives. IEEE Transactions on Parallel and Distributed Systems, 30(9), 2060–2074. https://doi.org/10.1109/tpds.2019.2899843
- How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures / Pachajoa, C., Levonyak, M., Gansterer, W., & Träff, J. L. (2019). How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures (1907.13077). arXiv. https://doi.org/10.48550/arXiv.1907.13077
  Project: REPEAL (2016–2020)
- How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures / Pachajoa, C., Levonyak, M., Gansterer, W. N., & Träff, J. L. (2019). How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures. In Proceedings of the 48th International Conference on Parallel Processing. 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan. ACM. https://doi.org/10.1145/3337821.3337849
  Project: REPEAL (2016–2020)
- More Parallelism in Dijkstra's Single-Source Shortest Path Algorithm / Kainer, M., & Träff, J. L. (2019). More Parallelism in Dijkstra’s Single-Source Shortest Path Algorithm. arXiv. https://doi.org/10.48550/arXiv.1903.12085
- Decomposing Collectives for Exploiting Multi-lane Communication / Träff, J. L. (2019). Decomposing Collectives for Exploiting Multi-lane Communication. arXiv. https://doi.org/10.48550/arXiv.1910.13373
- Cartesian Collective Communication / Träff, J. L., & Hunold, S. (2019). Cartesian Collective Communication. In Proceedings of the 48th International Conference on Parallel Processing. 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan. ACM. https://doi.org/10.1145/3337821.3337848
- Cartesian Collective Communication: "Advice to users", "Advice to implementers", and "Advice to Standardizers" / Träff, J. L. (2019). Cartesian Collective Communication: “Advice to users”, “Advice to implementers”, and “Advice to Standardizers.” University of Bordeaux, Bordeaux, France. http://hdl.handle.net/20.500.12708/86914
- On optimal Trees for irregular gather and scatter collectives? / Träff, J. L. (2019). On optimal Trees for irregular gather and scatter collectives? FernUniversität in Hagen, Prof. Dr. Jörg Keller, Hagen, Germany. http://hdl.handle.net/20.500.12708/86906
- On optimal Trees for irregular gather and scatter collectives? / Träff, J. L. (2019). On optimal Trees for irregular gather and scatter collectives? Humboldt-Universität zu Berlin, Research Group on Modeling and Analysis of Complex Systems, Berlin, Germany. http://hdl.handle.net/20.500.12708/86893
- On Optimal Trees for Irregular Gather and Scatter Collectives / Träff, J. L. (2019). On Optimal Trees for Irregular Gather and Scatter Collectives. Kolloquium Mathematische Informatik, Goethe-Universität Frankfurt am Main, Frankfurt am Main, Germany. http://hdl.handle.net/20.500.12708/86874
- On the Importance of Data Quality when Tuning MPI Libraries / Hunold, S., & Carpen-Amarie, A. (2019). On the Importance of Data Quality when Tuning MPI Libraries. In G. Haase (Ed.), Austrian HPC Meeting 2019 - AHPC19 (AHPC19 booklet of abstracts) (p. 15). Institut für Mathematik und wissenschaftliches Rechnen der Universität Graz. http://hdl.handle.net/20.500.12708/57798
- LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources / Kainrad, T., Hunold, S., Seidel, T., & Langer, T. (2019). LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources. Journal of Chemical Information and Modeling, 59(1), 31–37. https://doi.org/10.1021/acs.jcim.8b00716
- Scalable Algorithms for MPI Intergroup Allgather and Allgatherv / Kang, Q., Träff, J. L., Al-Bahrani, R., Agrawal, A., Choudhary, A., & Liao, W. (2019). Scalable Algorithms for MPI Intergroup Allgather and Allgatherv. Parallel Computing: Systems & Applications, 85, 220–230. https://doi.org/10.1016/j.parco.2019.04.015
- Foreword EuroMPI 2019 / Träff, J. L., & Hoefler, T. (2019). Foreword EuroMPI 2019. In T. Hoefler & J. L. Träff (Eds.), Proceedings of the 26th European MPI Users’ Group Meeting on - EuroMPI ’19. ACM. https://doi.org/10.1145/3343211.3343212
- Proceedings of the 26th European MPI Users' Group Meeting, EuroMPI 2019 / Hoefler, T., & Träff, J. L. (Eds.). (2019). Proceedings of the 26th European MPI Users’ Group Meeting, EuroMPI 2019. ACM. http://hdl.handle.net/20.500.12708/24628
2018
- Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model / Pöter, M., & Träff, J. L. (2018). Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model. arXiv. https://doi.org/10.48550/arXiv.1805.08639
- Memory Models for C/C++ Programmers / Pöter, M., & Träff, J. L. (2018). Memory Models for C/C++ Programmers. arXiv. https://doi.org/10.48550/arXiv.1803.04432
- Parallel Quicksort without Pairwise Element Exchange / Träff, J. L. (2018). Parallel Quicksort without Pairwise Element Exchange. arXiv. https://doi.org/10.48550/arXiv.1804.07494
- Hierarchical Clock Synchronization in MPI / Hunold, S., & Carpen-Amarie, A. (2018). Hierarchical Clock Synchronization in MPI. In 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, United Kingdom. IEEE. https://doi.org/10.1109/cluster.2018.00050
- Algorithm Selection of MPI Collectives Using Machine Learning Techniques / Hunold, S., & Carpen-Amarie, A. (2018). Algorithm Selection of MPI Collectives Using Machine Learning Techniques. In 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). 9th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2018) in conjunction with SC 2018, Dallas, Texas, USA. IEEE. https://doi.org/10.1109/pmbs.2018.8641622
- Brief Announcement / Pöter, M., & Träff, J. L. (2018). Brief Announcement. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures. 30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2018), Vienna, Austria. ACM. https://doi.org/10.1145/3210377.3210661
- Stamp-it, amortized constant-time memory reclamation in comparison to five other schemes / Pöter, M., & Träff, J. L. (2018). Stamp-it, amortized constant-time memory reclamation in comparison to five other schemes. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 23rd Symposium on Principles and Practice of Parallel Programming (PPoPP 2018), Vienna, Austria. ACM. https://doi.org/10.1145/3178487.3178532
- Practical, distributed, low overhead algorithms for irregular gather and scatter collectives / Träff, J. L. (2018). Practical, distributed, low overhead algorithms for irregular gather and scatter collectives. Parallel Computing: Systems & Applications, 75, 100–117. https://doi.org/10.1016/j.parco.2018.04.003
  Project: MPI (2013–2018)
- Supporting concurrent memory access in TCF processor architectures / Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2018). Supporting concurrent memory access in TCF processor architectures. Microprocessors and Microsystems, 63, 226–236. https://doi.org/10.1016/j.micpro.2018.09.013
- On Optimal trees for Irregular Gather and Scatter Collectives / Träff, J. L. (2018). On Optimal trees for Irregular Gather and Scatter Collectives. Uppsala University, Uppsala, Sweden. http://hdl.handle.net/20.500.12708/86726
- Full-Duplex Inter-Group All-to-All Broadcast Algorithms with Optimal Bandwidth / Kang, Q., Träff, J. L., Al-Bahrani, R., Agrawal, A., Choudhary, A., & Liao, W. (2018). Full-Duplex Inter-Group All-to-All Broadcast Algorithms with Optimal Bandwidth. In Proceedings of the 25th European MPI Users’ Group Meeting. 25th European MPI Users’ Group Meeting (EuroMPI 2018), Barcelona, Spain. ACM. https://doi.org/10.1145/3236367.3236374
- Implementation of Multioperations in Thick Control Flow Processors / Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2018). Implementation of Multioperations in Thick Control Flow Processors. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 20th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2018) in conjunction with IPDPS 2018, Vancouver, British Columbia, Canada. IEEE. https://doi.org/10.1109/ipdpsw.2018.00121
- Autotuning MPI Collectives using Performance Guidelines / Hunold, S., & Carpen-Amarie, A. (2018). Autotuning MPI Collectives using Performance Guidelines. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2018), Tokyo, Japan. ACM. https://doi.org/10.1145/3149457.3149461
2017
- A new and five older Concurrent Memory Reclamation Schemes in Comparison (Stamp-it) / Pöter, M., & Träff, J. L. (2017). A new and five older Concurrent Memory Reclamation Schemes in Comparison (Stamp-it). arXiv. https://doi.org/10.48550/arXiv.1712.06134
- On Optimal Trees for Irregular Gather and Scatter Collectives / Träff, J. L. (2017). On Optimal Trees for Irregular Gather and Scatter Collectives. arXiv. https://doi.org/10.48550/arXiv.1711.08731
- Better Process Mapping and Sparse Quadratic Assignment / Schulz, C., & Träff, J. L. (2017). Better Process Mapping and Sparse Quadratic Assignment. arXiv. https://doi.org/10.48550/arXiv.1702.04164
- Practical, Linear-time, Fully Distributed Algorithms for Irregular Gather and Scatter / Träff, J. L. (2017). Practical, Linear-time, Fully Distributed Algorithms for Irregular Gather and Scatter (1702.05967). arXiv. https://doi.org/10.48550/arXiv.1702.05967
Project: MPI (2013–2018)
- VieM v1.00 - Vienna Mapping and Sparse Quadratic Assignment User Guide / Schulz, C., & Träff, J. L. (2017). VieM v1.00 - Vienna Mapping and Sparse Quadratic Assignment User Guide. arXiv. https://doi.org/10.48550/arXiv.1703.05509
- Micro-benchmarking MPI Neighborhood Collective Operations / Lübbe, F. D. (2017). Micro-benchmarking MPI Neighborhood Collective Operations. In F. F. Rivera, T. F. Pena, & J. C. Cabaleiro (Eds.), Euro-Par 2017: Parallel Processing 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28 – September 1, 2017, Proceedings (pp. 65–78). Springer. https://doi.org/10.1007/978-3-319-64203-1_5
Project: MPI (2013–2018)
- Tuning MPI Collectives by Verifying Performance Guidelines / Hunold, S., & Carpen-Amarie, A. (2017). Tuning MPI Collectives by Verifying Performance Guidelines. arXiv. https://doi.org/10.48550/arXiv.1707.09965
- Better Process Mapping and Sparse Quadratic Assignment / Schulz, C., & Träff, J. L. (2017). Better Process Mapping and Sparse Quadratic Assignment. In C. S. Iliopoulos, S. P. Pissis, S. J. Puglisi, & R. Raman (Eds.), 16th International Symposium on Experimental Algorithms, SEA 2017 (pp. 4:1-4:15). Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH. https://doi.org/10.4230/LIPIcs.SEA.2017.4
- On expected and observed communication performance with MPI derived datatypes / Carpen-Amarie, A., Hunold, S., & Träff, J. L. (2017). On expected and observed communication performance with MPI derived datatypes. Parallel Computing: Systems & Applications, 69, 98–117. https://doi.org/10.1016/j.parco.2017.08.006
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
- Scheduling Independent Moldable Tasks on Multi-Cores with GPUs / Bleuse, R., Hunold, S., Kedad-Sidhoum, S., Monna, F., Mounie, G., & Trystram, D. (2017). Scheduling Independent Moldable Tasks on Multi-Cores with GPUs. IEEE Transactions on Parallel and Distributed Systems, 28(9), 2689–2702. https://doi.org/10.1109/tpds.2017.2675891
- MPI Is 25 Years Old! / Lusk, E., & Träff, J. L. (2017). MPI Is 25 Years Old! HPCwire, MAY 1. http://hdl.handle.net/20.500.12708/146783
- Autotuning MPI Collectives using Performance Guidelines / Hunold, S., & Carpen-Amarie, A. (2017). Autotuning MPI Collectives using Performance Guidelines. LIG - Bâtiment IMAG, St Martin d’Hères, France, EU. http://hdl.handle.net/20.500.12708/86599
- The past 25 years of MPI / Träff, J. L. (2017). The past 25 years of MPI. Panel at ISC High Performance Conference 2017 - The HPC Event, Intel booth, Frankfurt, Germany, EU. http://hdl.handle.net/20.500.12708/86517
- Fast Processing of MPI Derived Datatypes? / Träff, J. L. (2017). Fast Processing of MPI Derived Datatypes? Mini Workshop Algorithms Engineering, Uni Wien, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/86518
- High Performance Expectations for MPI / Träff, J. L. (2017). High Performance Expectations for MPI. Friedrich-Alexander-Universität Erlangen-Nürnberg, Prof. Dr. Gerhard Wellein, Erlangen, Germany, EU. http://hdl.handle.net/20.500.12708/86505
- Is Gossip-inspired reduction competitive in high performance computing? / Wimmer, E. (2017). Is Gossip-inspired reduction competitive in high performance computing? International Workshop on Parallel Numerics (PARNUM 2017), Waischenfeld, Germany, EU. http://hdl.handle.net/20.500.12708/86504
- Euro-Par 2016: Parallel Processing Workshops / Desprez, F., Dutot, P.-F., Kaklamanis, C., Marchal, L., Molitorisz, K., Ricci, L., Scarano, V., Vega-Rodriguez, M. A., Varbanescu, A. L., Hunold, S., Scott, S. L., Lankes, S., & Weidendorfer, J. (Eds.). (2017). Euro-Par 2016: Parallel Processing Workshops. Springer Nature Switzerland AG 2021. https://doi.org/10.1007/978-3-319-58943-5
- Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives / Mirsadeghi, S. H., Träff, J. L., Balaji, P., & Afsahi, A. (2017). Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives. In 2017 IEEE 24th International Conference on High Performance Computing (HiPC). 24th IEEE International Conference on High Performance Computing (HiPC 2017), Jaipur, India, Non-EU. IEEE. https://doi.org/10.1109/hipc.2017.00047
- Supporting concurrent memory access in TCF-aware processor architectures / Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2017). Supporting concurrent memory access in TCF-aware processor architectures. In J. Nurmi, M. Vesterbacka, J. J. Wikner, A. Alvandpour, M. Nielsen-Lönn, & I. R. Nielsen (Eds.), 2017 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC). IEEE. https://doi.org/10.1109/norchip.2017.8124962
- Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node / Heinrich, F. C., Cornebize, T., Degomme, A., Legrand, A., Carpen-Amarie, A., Hunold, S., Orgerie, A.-C., & Quinson, M. (2017). Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node. In 2017 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (CLUSTER 2017), Honolulu, Hawaii, USA, Non-EU. IEEE. https://doi.org/10.1109/cluster.2017.66
- Practical, linear-time, fully distributed algorithms for irregular gather and scatter / Träff, J. L. (2017). Practical, linear-time, fully distributed algorithms for irregular gather and scatter. In Proceedings of the 24th European MPI Users’ Group Meeting on - EuroMPI ’17. 24th European MPI Users’ Group Meeting (EuroMPI/USA 2017), Chicago, IL, USA, Non-EU. ACM. https://doi.org/10.1145/3127024.3127025
Project: MPI (2013–2018)
- Introduction to REPPAR Workshop / Hunold, S., Legrand, A., & Nussbaum, L. (2017). Introduction to REPPAR Workshop. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE. https://doi.org/10.1109/ipdpsw.2017.221
- High Performance Expectations for MPI / Träff, J. L. (2017). High Performance Expectations for MPI. In G. Baumgartner & J. Courian (Eds.), AHPC 2017, Austrian HPC Meeting 2017 (p. 33). FSP Scientific Computing, University of Innsbruck. http://hdl.handle.net/20.500.12708/56920
2016
- Message-Combining Algorithms for Isomorphic, Sparse Collective Communication / Träff, J. L., Carpen-Amarie, A., Hunold, S., & Rougier, A. (2016). Message-Combining Algorithms for Isomorphic, Sparse Collective Communication. arXiv. https://doi.org/10.48550/arXiv.1606.07676
- Benchmarking Concurrent Priority Queues: Performance of k-LSM and Related Data Structures / Gruber, J., Träff, J. L., & Wimmer, M. (2016). Benchmarking Concurrent Priority Queues: Performance of k-LSM and Related Data Structures. arXiv. https://doi.org/10.48550/arXiv.1603.05047
- PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines / Hunold, S., Carpen-Amarie, A., Lübbe, F. D., & Träff, J. L. (2016). PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines. arXiv. https://doi.org/10.48550/arXiv.1606.00215
Projects: MPI (2013–2018) / ReproPC (2013–2016)
- MPI Derived Datatypes: Performance Expectations and Status Quo / Carpen-Amarie, A., Hunold, S., & Träff, J. L. (2016). MPI Derived Datatypes: Performance Expectations and Status Quo. arXiv. https://doi.org/10.48550/arXiv.1607.00178
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
- The EPiGRAM Project: Preparing Parallel Programming Models for Exascale / Markidis, S., Peng, I. B., Larsson Träff, J., Rougier, A., Bartsch, V., Machado, R., Rahn, M., Hart, A., Holmes, D., Bull, M., & Laure, E. (2016). The EPiGRAM Project: Preparing Parallel Programming Models for Exascale. In M. Taufer, B. Mohr, & J. M. Kunkel (Eds.), High Performance Computing : ISC High Performance 2016 International Workshops, ExaComm, E-MuCoCoS, HPC-IODC, IXPUG, IWOPH, P^3MA, VHPC, WOPSSS, Frankfurt, Germany, June 19–23, 2016, Revised Selected Papers (pp. 56–68). Springer International Publishing. https://doi.org/10.1007/978-3-319-46079-6_5
Project: EPiGRAM (2013–2016)
- The art of benchmarking MPI libraries / Hunold, S. (2016). The art of benchmarking MPI libraries. Austrian HPC Meeting 2016 - AHPC16, Grundlsee, Austria. http://hdl.handle.net/20.500.12708/86269
- Brief Announcement: Benchmarking Concurrent Priority Queues: / Gruber, J., Träff, J. L., & Wimmer, M. (2016). Brief Announcement: Benchmarking Concurrent Priority Queues: In SPAA ’16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 361–362). ACM. https://doi.org/10.1145/2935764.2935803
- Viewpoint: (Mis)Managing Parallel Computing Research through EU Project Funding / Träff, J. L. (2016). Viewpoint: (Mis)Managing Parallel Computing Research through EU Project Funding. Communications of the ACM, 59(12), 46–48. https://doi.org/10.1145/2948893
- Governing energy consumption in Hadoop through CPU frequency scaling: An analysis / Ibrahim, S., Phan, T.-D., Carpen-Amarie, A., Chihoub, H.-E., Moise, D., & Antoniu, G. (2016). Governing energy consumption in Hadoop through CPU frequency scaling: An analysis. Future Generation Computer Systems: The International Journal of EScience, 54, 219–232. http://hdl.handle.net/20.500.12708/148922
- Editorial: Special Issue: Euro-Par 2015 / Lengauer, C., Bougé, L., & Träff, J. L. (2016). Editorial: Special Issue: Euro-Par 2015. Concurrency and Computation: Practice and Experience, 28(12), 3445–3446. http://hdl.handle.net/20.500.12708/148865
- Polynomial-Time Construction of Optimal MPI Derived Datatype Trees / Träff, J. L. (2016). Polynomial-Time Construction of Optimal MPI Derived Datatype Trees. Leibniz-Rechenzentrum (LRZ), Garching bei München, Germany, EU. http://hdl.handle.net/20.500.12708/86364
- On The Power of Structured Data in MPI / Träff, J. L. (2016). On The Power of Structured Data in MPI. Guest Lecture of the course: Parallel and High Performance Computing, LMU Munich, Munich, Germany, EU. http://hdl.handle.net/20.500.12708/86357
Project: MPI (2013–2018)
- The Art of MPI Benchmarking / Hunold, S. (2016). The Art of MPI Benchmarking. 45th SPEEDUP Workshop on High-Performance Computing, Basel, Switzerland, Non-EU. http://hdl.handle.net/20.500.12708/86310
- Tutorial: Effective MPI Programming: concepts, advanced features, do's and dont's / Träff, J. L. (2016). Tutorial: Effective MPI Programming: concepts, advanced features, do’s and dont’s. Tutorial on MPI at the 22nd International European Conference on Parallel and Distributed Computing (Euro-Par 2016), Grenoble, France, EU. http://hdl.handle.net/20.500.12708/86292
- The Art of MPI Benchmarking / Hunold, S. (2016). The Art of MPI Benchmarking. Lunchtime Seminar, Department of Computer Science, University of Innsbruck, Innsbruck, Austria, Austria. http://hdl.handle.net/20.500.12708/86282
- Clock Synchronization Algorithms and SimGrid / Hunold, S. (2016). Clock Synchronization Algorithms and SimGrid. SimGrid User Days, CNRS center Villa Clythia, Fréjus, France, EU. http://hdl.handle.net/20.500.12708/86260
- Effective MPI Programming: Concepts, Advanced Features, Do's and Don'ts / Träff, J. L. (2016). Effective MPI Programming: Concepts, Advanced Features, Do’s and Don’ts. Vienna Scientific Cluster: VSC School Seminar, TU Wien, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/86253
- Polynomial-Time Construction of Optimal MPI Derived Datatype Trees / Ganian, R., Kalany, M., Szeider, S., & Träff, J. L. (2016). Polynomial-Time Construction of Optimal MPI Derived Datatype Trees. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE 30th International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, Illinois, USA, Non-EU. IEEE Computer Society. https://doi.org/10.1109/ipdps.2016.13
Project: EPiGRAM (2013–2016)
- The art of benchmarking MPI libraries / Hunold, S., Carpen-Amarie, A., & Träff, J. L. (2016). The art of benchmarking MPI libraries. In I. Reichl, C. Blaas-Schenner, & J. Zabloudil (Eds.), Austrian HPC Meeting 2016 - AHPC 2016 (p. 45). Vienna Scientific Cluster (VSC). http://hdl.handle.net/20.500.12708/56921
- On the Expected and Observed Communication Performance with MPI Derived Datatypes / Carpen-Amarie, A., Hunold, S., & Träff, J. L. (2016). On the Expected and Observed Communication Performance with MPI Derived Datatypes. In D. Holmes, A. Collis, J. L. Träff, & L. Smith (Eds.), Proceedings of the 23rd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2966884.2966905
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
- A Library for Advanced Datatype Programming / Träff, J. L. (2016). A Library for Advanced Datatype Programming. In D. Holmes, A. Collis, J. L. Träff, & L. Smith (Eds.), Proceedings of the 23rd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2966884.2966904
Project: EPiGRAM (2013–2016)
- High Performance Parallel Summed-Area Table Kernels for Multi-core and Many-core Systems / Papatriantafyllou, A., & Sacharidis, D. (2016). High Performance Parallel Summed-Area Table Kernels for Multi-core and Many-core Systems. In P.-F. Dutot & D. Trystram (Eds.), Euro-Par 2016: Parallel Processing (pp. 306–318). Springer International Publishing. https://doi.org/10.1007/978-3-319-43659-3_23
- Automatic Verification of Self-consistent MPI Performance Guidelines / Hunold, S., Carpen-Amarie, A., Lübbe, F. D., & Träff, J. L. (2016). Automatic Verification of Self-consistent MPI Performance Guidelines. In P.-F. Dutot & D. Trystram (Eds.), Euro-Par 2016: Parallel Processing (pp. 433–446). Springer International Publishing. https://doi.org/10.1007/978-3-319-43659-3_32
Projects: MPI (2013–2018) / ReproPC (2013–2016)
- Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016 / Holmes, D., Collis, A., Träff, J. L., & Smith, L. (Eds.). (2016). Proceedings of the 23rd European MPI Users’ Group Meeting, EuroMPI 2016. ACM. http://hdl.handle.net/20.500.12708/24173
2015
- The Shortest Path Problem with Edge Information Reuse is NP-Complete / Träff, J. L. (2015). The Shortest Path Problem with Edge Information Reuse is NP-Complete. arXiv. https://doi.org/10.48550/arXiv.1509.05637
- Polynomial-time Construction of Optimal Tree-structured Communication Data Layout Descriptions / Ganian, R., Kalany, M., Szeider, S., & Träff, J. L. (2015). Polynomial-time Construction of Optimal Tree-structured Communication Data Layout Descriptions. arXiv. https://doi.org/10.48550/arXiv.1506.09100
- The Lock-free k-LSM Relaxed Priority Queue / Wimmer, M., Gruber, J., Träff, J. L., & Tsigas, P. (2015). The Lock-free k-LSM Relaxed Priority Queue. arXiv. https://doi.org/10.48550/arXiv.1503.05698
- A Survey on Reproducibility in Parallel Computing / Hunold, S. (2015). A Survey on Reproducibility in Parallel Computing. arXiv. https://doi.org/10.48550/arXiv.1511.04217
- MPI Benchmarking Revisited: Experimental Design and Reproducibility / Hunold, S., & Carpen-Amarie, A. (2015). MPI Benchmarking Revisited: Experimental Design and Reproducibility. arXiv. https://doi.org/10.48550/arXiv.1505.07734
- One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2015). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. Concurrency and Computation: Practice and Experience, 27(4), 1010–1026. http://hdl.handle.net/20.500.12708/150641
Project: ReproPC (2013–2016)
- Reproducibility in Parallel Computing / Hunold, S. (2015). Reproducibility in Parallel Computing. Session: Performance Reproducibility in HPC - Challenges and State-of-the-Art at the 27th International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2015), Austin, Texas, Non-EU. http://hdl.handle.net/20.500.12708/86091
Project: ReproPC (2013–2016)
- Accurately Measuring MPI Collectives with Synchronized Clocks / Hunold, S. (2015). Accurately Measuring MPI Collectives with Synchronized Clocks. Dagstuhl Seminar 15281: Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems, Schloss Dagstuhl, Wadern, Germany, EU. http://hdl.handle.net/20.500.12708/86057
- The Power of Structured Data in MPI / Träff, J. L. (2015). The Power of Structured Data in MPI. The University of Texas at Austin, Prof. Robert A. van de Geijn, Austin, Texas, Non-EU. http://hdl.handle.net/20.500.12708/86053
- One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2015). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. Wirtschaftswissenschaftliche Fakultät, Universität Augsburg, Augsburg, Germany, EU. http://hdl.handle.net/20.500.12708/86038
- MPI Datatype reconstruction (for vector and index types) / Träff, J. L. (2015). MPI Datatype reconstruction (for vector and index types). Compilers and Languages Group, Institute of Computer Languages, TU Wien, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/86126
- The Relative Power of Synchronization Primitives / Träff, J. L. (2015). The Relative Power of Synchronization Primitives. Computational Mathematics in Engineering Group - Prof. Dr. Joachim Schöberl, Institute for Analysis and Scientific Computing, TU Wien, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/86043
- Energy Characterization and Optimization of Parallel Prefix-Sums Kernels / Papatriantafyllou, A. (2015). Energy Characterization and Optimization of Parallel Prefix-Sums Kernels. In S. Hunold, A. Costan, D. Gimenez, A. Iosup, L. Ricci, M. E. Gomez Requena, V. Scarano, A. L. Varbanescu, S. L. Scott, S. Lankes, J. Weidendorfer, & M. Alexander (Eds.), Euro-Par 2015: Parallel Processing Workshops (pp. 685–696). Springer International Publishing. https://doi.org/10.1007/978-3-319-27308-2_55
- Specification Guideline Violations by MPI_Dims_create / Träff, J. L., & Lübbe, F. D. (2015). Specification Guideline Violations by MPI_Dims_create. In J. Dongarra, A. Denis, B. Goglin, E. Jeannot, & G. Mercier (Eds.), Proceedings of the 22nd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2802658.2802677
Project: MPI (2013–2018)
- On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives / Hunold, S., & Carpen-Amarie, A. (2015). On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives. In J. Dongarra, A. Denis, B. Goglin, E. Jeannot, & G. Mercier (Eds.), Proceedings of the 22nd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2802658.2802662
Projects: MPI (2013–2018) / ReproPC (2013–2016)
- Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations / Träff, J. L., Lübbe, F. D., Rougier, A., & Hunold, S. (2015). Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations. In J. Dongarra, A. Denis, B. Goglin, E. Jeannot, & G. Mercier (Eds.), Proceedings of the 22nd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2802658.2802663
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
- Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types / Kalany, M., & Träff, J. L. (2015). Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types. In J. Dongarra, A. Denis, B. Goglin, E. Jeannot, & G. Mercier (Eds.), Proceedings of the 22nd European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2802658.2802671
Project: EPiGRAM (2013–2016)
- The lock-free k-LSM relaxed priority queue / Wimmer, M., Gruber, J., Träff, J. L., & Tsigas, P. (2015). The lock-free k-LSM relaxed priority queue. In A. Cohen & D. Grove (Eds.), Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM. https://doi.org/10.1145/2688500.2688547
- Euro-Par 2015: Parallel Processing Workshops / Euro-Par 2015: Parallel Processing Workshops. (2015). In S. Hunold, A. Costan, D. Gimenez, A. Iosup, L. Ricci, M. E. Gomez Requena, V. Scarano, A. L. Varbanescu, S. L. Scott, S. Lankes, J. Weidendorfer, & M. Alexander (Eds.), Lecture Notes in Computer Science. Springer International Publishing. https://doi.org/10.1007/978-3-319-27308-2
- Euro-Par 2015: Parallel Processing / Euro-Par 2015: Parallel Processing. (2015). In J. L. Träff, S. Hunold, & F. Versaci (Eds.), Lecture Notes in Computer Science. Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-662-48096-0
2014
- Preface: Selected Papers from EuroMPI 2012 / Träff, J. L., & Benkner, S. (2014). Preface: Selected Papers from EuroMPI 2012. Computing, 96(4), 259–261. https://doi.org/10.1007/s00607-013-0335-z
- An improved, easily computable combinatorial lower bound for weighted graph bipartitioning / Träff, J. L., & Wimmer, M. (2014). An improved, easily computable combinatorial lower bound for weighted graph bipartitioning. arXiv. https://doi.org/10.48550/arXiv.1410.0462
- Stepping Stones to Reproducible Research: A Study of Current Practices in Parallel Computing / Carpen-Amarie, A., Rougier, A., & Lübbe, F. D. (2014). Stepping Stones to Reproducible Research: A Study of Current Practices in Parallel Computing. In L. Lopes, J. Zilinskas, A. Costan, R. G. Cascella, G. Kecskemeti, E. Jeannot, M. Cannataro, L. Ricci, S. Benkner, S. Petit, V. Scarano, J. Gracia, S. Hunold, S. L. Scott, S. Lankes, C. Lengauer, J. Carretero, J. Breitbart, & M. Alexander (Eds.), Euro-Par 2014: Parallel Processing Workshops Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part I (pp. 499–510). Springer International Publishing. https://doi.org/10.1007/978-3-319-14325-5_43
- Pheet meets C++11 / Pöter, M. (2014). Pheet meets C++11. arXiv. https://doi.org/10.48550/arXiv.1411.1951
- Perfectly Load-Balanced, Stable, Synchronization-Free Parallel Merge / Siebert, C., & Träff, J. L. (2014). Perfectly Load-Balanced, Stable, Synchronization-Free Parallel Merge. Parallel Processing Letters, 24(01), 1450005. https://doi.org/10.1142/s0129626414500054
- Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think / Hunold, S., Carpen-Amarie, A., & Träff, J. L. (2014). Reproducible MPI Micro-Benchmarking Isn’t As Easy As You Think. Research Group Theory and Applications of Algorithms, University of Vienna, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/85872
Projects: MPI (2013–2018) / ReproPC (2013–2016)
- One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2014). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. AIT Austrian Institute of Technology, Seibersdorf, Austria, Austria. http://hdl.handle.net/20.500.12708/85871
Project: ReproPC (2013–2016)
- The Power of Structured Data in MPI / Träff, J. L. (2014). The Power of Structured Data in MPI. Compiler Technology and Computer Architecture Group at the University of Hertfordshire, Hertfordshire, United Kingdom, EU. http://hdl.handle.net/20.500.12708/85832
- The Power of Structured Data in MPI / Träff, J. L. (2014). The Power of Structured Data in MPI. Research Group Theory and Applications of Algorithms and Research Group Scientific Computing, University of Vienna, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/85825
- Moldable Task Scheduling: Theory and Practice / Hunold, S. (2014). Moldable Task Scheduling: Theory and Practice. Workshop on New Challenges in Scheduling Theory, Aussois, France, EU. http://hdl.handle.net/20.500.12708/85817
- Reproducibility of Experiments: It's about the WHO and less the HOW / Hunold, S. (2014). Reproducibility of Experiments: It’s about the WHO and less the HOW. Panel on reproducible research methodologies and new publication models, 4th International Workshop on Adaptive Self-tuning Computing Systems (ADAPT 2014) co-located with HiPEAC 2014, Vienna, Austria, Austria. http://hdl.handle.net/20.500.12708/85814
Project: ReproPC (2013–2016)
- One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints / Hunold, S. (2014). One Step towards Bridging the Gap between Theory and Practice in Moldable Task Scheduling with Precedence Constraints. 9th Scheduling for Large Scale Systems Workshop, Lyon, France, EU. http://hdl.handle.net/20.500.12708/85812
- The Power of Structured Data in MPI / Träff, J. L. (2014). The Power of Structured Data in MPI. I3MS Seminar Series, Aachen GRS, RWTH Aachen, Aachen, Germany, EU. http://hdl.handle.net/20.500.12708/85805
Projects: EPiGRAM (2013–2016) / MPI (2013–2018)
- Datatypes in Exascale message-passing / Rougier, A. (2014). Datatypes in Exascale message-passing. 3rd Vienna Scientific Cluster User Workshop, Neusiedl am See, Austria. http://hdl.handle.net/20.500.12708/85788
- Multi-core prefix-sums / Papatriantafyllou, A. (2014). Multi-core prefix-sums. 3rd Vienna Scientific Cluster User Workshop, Neusiedl am See, Austria. http://hdl.handle.net/20.500.12708/85787
- Implementing a classic: zero-copy all-to-all communication with MPI datatypes / Träff, J. L. (2014). Implementing a classic: zero-copy all-to-all communication with MPI datatypes. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark, EU. http://hdl.handle.net/20.500.12708/85783
- Euro-Par 2014: Parallel Processing Workshops / Lopes, L., Zilinskas, J., Costan, A., Cascella, R. G., Kecskemeti, G., Jeannot, E., Cannataro, M., Ricci, L., Benkner, S., Petit, S., Scarano, V., Gracia, J., Hunold, S., Scott, S. L., Lankes, S., Lengauer, C., Carretero, J., Breitbart, J., & Alexander, M. (Eds.). (2014). Euro-Par 2014: Parallel Processing Workshops. Springer. https://doi.org/10.1007/978-3-319-14313-2
- Euro-Par 2014: Parallel Processing Workshops / Lopes, L., Zilinskas, J., Costan, A., Cascella, R. G., Kecskemeti, G., Jeannot, E., Cannataro, M., Ricci, L., Benkner, S., Petit, S., Scarano, V., Gracia, J., Hunold, S., Scott, S. L., Lankes, S., Lengauer, C., Carretero, J., Breitbart, J., & Alexander, M. (Eds.). (2014). Euro-Par 2014: Parallel Processing Workshops. Springer. https://doi.org/10.1007/978-3-319-14325-5
- Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think / Hunold, S., Carpen-Amarie, A., & Träff, J. L. (2014). Reproducible MPI Micro-Benchmarking Isn’t As Easy As You Think. In J. Dongarra, Y. Ishikawa, & A. Hori (Eds.), Proceedings of the 21st European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2642769.2642785
Projects: MPI (2013–2018) / ReproPC (2013–2016)
- Optimal MPI Datatype Normalization for Vector and Index-block Types / Träff, J. L. (2014). Optimal MPI Datatype Normalization for Vector and Index-block Types. In J. Dongarra, Y. Ishikawa, & A. Hori (Eds.), Proceedings of the 21st European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2642769.2642771
Project: EPiGRAM (2013–2016)
- Zero-copy, Hierarchical Gather is not possible with MPI Datatypes and Collectives / Träff, J. L., & Rougier, A. (2014). Zero-copy, Hierarchical Gather is not possible with MPI Datatypes and Collectives. In J. Dongarra, Y. Ishikawa, & A. Hori (Eds.), Proceedings of the 21st European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2642769.2642772
- MPI Collectives and Datatypes for Hierarchical All-to-all Communication / Träff, J. L., & Rougier, A. (2014). MPI Collectives and Datatypes for Hierarchical All-to-all Communication. In J. Dongarra, Y. Ishikawa, & A. Hori (Eds.), Proceedings of the 21st European MPI Users’ Group Meeting. ACM. https://doi.org/10.1145/2642769.2642770
- Scheduling Moldable Tasks with Precedence Constraints and Arbitrary Speedup Functions on Multiprocessors / Hunold, S. (2014). Scheduling Moldable Tasks with Precedence Constraints and Arbitrary Speedup Functions on Multiprocessors. In R. Wyrzykowski, J. Dongarra, K. Karczewski, & J. Wasniewski (Eds.), Parallel Processing and Applied Mathematics (pp. 13–25). Springer. https://doi.org/10.1007/978-3-642-55195-6_2
- Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop / Ibrahim, S., Moise, D., Chihoub, H.-E., Carpen-Amarie, A., Bougé, L., & Antoniu, G. (2014). Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop. In F. Pop & M. Potop-Butucaru (Eds.), Adaptive Resource Management and Scheduling for Cloud Computing (pp. 147–164). Springer International Publishing. https://doi.org/10.1007/978-3-319-13464-2_11
- Implementing a classic / Träff, J. L., Rougier, A., & Hunold, S. (2014). Implementing a classic. In M. Gerndt, P. Stenström, L. Rauchwerger, B. Miller, & M. Schulz (Eds.), Proceedings of the 28th ACM international conference on Supercomputing - ICS ’14. ACM. https://doi.org/10.1145/2597652.2597662
- Data structures for task-based priority scheduling / Wimmer, M., Versaci, F., Träff, J. L., Cederman, D., & Tsigas, P. (2014). Data structures for task-based priority scheduling. In Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP ’14. 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014, Orlando, Florida, USA, Non-EU. ACM. https://doi.org/10.1145/2555243.2555278
2013
- Fair scheduling of bag-of-tasks applications using distributed Lagrangian optimization / Bertin, R., Hunold, S., Legrand, A., & Touati, C. (2013). Fair scheduling of bag-of-tasks applications using distributed Lagrangian optimization. Journal of Parallel and Distributed Computing, 74(1), 1914–1929. https://doi.org/10.1016/j.jpdc.2013.08.011
- Perfectly load-balanced, optimal, stable, parallel merge / Siebert, C., & Träff, J. L. (2013). Perfectly load-balanced, optimal, stable, parallel merge. arXiv. https://doi.org/10.48550/arXiv.1303.4312
- Configurable Strategies for Work-stealing / Wimmer, M., Cederman, D., Träff, J. L., & Tsigas, P. (2013). Configurable Strategies for Work-stealing. arXiv. https://doi.org/10.48550/arXiv.1305.6474
Project: PEPPHER (2011–2012)
- A Note on (Parallel) Depth- and Breadth-First Search by Arc Elimination / Träff, J. L. (2013). A Note on (Parallel) Depth- and Breadth-First Search by Arc Elimination. arXiv. https://doi.org/10.48550/arXiv.1305.1222
- On the State and Importance of Reproducible Experimental Research in Parallel Computing / Hunold, S., & Träff, J. L. (2013). On the State and Importance of Reproducible Experimental Research in Parallel Computing. arXiv. https://doi.org/10.48550/arXiv.1308.3648
- Data Structures for Task-based Priority Scheduling / Wimmer, M., Cederman, D., Versaci, F., Träff, J. L., & Tsigas, P. (2013). Data Structures for Task-based Priority Scheduling. arXiv. https://doi.org/10.48550/arXiv.1312.2501
- BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism / Tzenakis, G., Papatriantafyllou, A., Vandierendonck, H., Pratikakis, P., & Nikolopoulos, D. S. (2013). BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism. In C. Wu & A. Cohen (Eds.), Advanced Parallel Processing Technologies (pp. 17–31). Springer. https://doi.org/10.1007/978-3-642-45293-2_2
- The Pheet Task-Scheduling Framework on the Intel® Xeon Phi Coprocessor and other Multicore Architectures / Wimmer, M., Pöter, M., & Träff, J. L. (2013). The Pheet Task-Scheduling Framework on the Intel® Xeon Phi Coprocessor and other Multicore Architectures. In 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum. Workshop on Multithreaded Architectures and Applications (MTAAP 2013) in conjunction with IPDPS 2013, Boston, Massachusetts, USA, Non-EU. IEEE Computer Society. https://doi.org/10.1109/ipdpsw.2013.22
  Project: PEPPHER (2011–2012)
- On the Scalability of Moldable Task Scheduling Algorithms / Hunold, S. (2013). On the Scalability of Moldable Task Scheduling Algorithms. Dagstuhl Seminar 13381: Algorithms and Scheduling Techniques for Exascale Systems, Schloss Dagstuhl, Wadern, Germany, EU. http://hdl.handle.net/20.500.12708/85623
- Large-scale message passing concepts in EPiGRAM / Träff, J. L. (2013). Large-scale message passing concepts in EPiGRAM. Workshop on Exascale MPI (ExaMPI 2013) at Supercomputing Conference 2013, Denver, Colorado, USA, Non-EU. http://hdl.handle.net/20.500.12708/85624
- Can I repeat your parallel computing experiment? Yes, you can't / Hunold, S. (2013). Can I repeat your parallel computing experiment? Yes, you can’t. Technische Universität Dresden, Zentrale für Informationsdienste und Hochleistungsrechnen (ZIH), Dresden, Deutschland, EU. http://hdl.handle.net/20.500.12708/85620
- History of MPI / Träff, J. L. (2013). History of MPI. UPMARC Summer School on Multicore Computing, Uppsala University, Uppsala, Sweden, EU. http://hdl.handle.net/20.500.12708/85614
- OutFlank routing: increasing throughput in toroidal interconnection networks / Versaci, F. (2013). OutFlank routing: increasing throughput in toroidal interconnection networks. Scalable Approaches to High Performance and High Productivity Computing, ScalPerf’13, Bertinoro, Italy, EU. http://hdl.handle.net/20.500.12708/85607
- Work-stealing with Configurable Scheduling Strategies / Wimmer, M., Cederman, D., Träff, J. L., & Tsigas, P. (2013). Work-stealing with Configurable Scheduling Strategies. MADALGO Summer School on DATA STRUCTURES, Aarhus University, Denmark, EU. http://hdl.handle.net/20.500.12708/85595
- Challenges in Message-Passing Interfaces for Large-Scale Parallel Systems / Träff, J. L. (2013). Challenges in Message-Passing Interfaces for Large-Scale Parallel Systems. UPMARC Summer School on Multicore Computing, Uppsala University, Uppsala, Sweden, EU. http://hdl.handle.net/20.500.12708/85569
- The Pheet Task-Scheduling Framework / Wimmer, M. (2013). The Pheet Task-Scheduling Framework. Massachusetts Institute of Technology - Group CSAIL, Cambridge, Massachusetts, USA, Non-EU. http://hdl.handle.net/20.500.12708/84680
  Project: PEPPHER (2011–2012)
- Unique Features of MPI: Collective Operations on Structured Data / Träff, J. L. (2013). Unique Features of MPI: Collective Operations on Structured Data. 20th European MPI Users’ Group Meeting, EuroMPI 2013, Madrid, Spain, EU. http://hdl.handle.net/20.500.12708/84678
- Wait-free Hyperobjects for Task-Parallel Programming Systems / Wimmer, M. (2013). Wait-free Hyperobjects for Task-Parallel Programming Systems. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS 2013), Boston, Massachusetts, USA, Non-EU. IEEE Computer Society. https://doi.org/10.1109/ipdps.2013.10
  Project: PEPPHER (2011–2012)
- Work-stealing with configurable scheduling strategies / Wimmer, M., Cederman, D., Träff, J. L., & Tsigas, P. (2013). Work-stealing with configurable scheduling strategies. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP ’13. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2013, Shenzhen, China, Non-EU. ACM. https://doi.org/10.1145/2442516.2442562
  Project: PEPPHER (2011–2012)
- OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks / Versaci, F. (2013). OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks. In 2013 International Conference on Parallel and Distributed Systems. 19th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2013, Seoul, Korea, Non-EU. IEEE Computer Society. https://doi.org/10.1109/icpads.2013.40
2012
- Simplified, stable parallel merging / Träff, J. L. (2012). Simplified, stable parallel merging. arXiv. https://doi.org/10.48550/arXiv.1202.6575
- Alternative, uniformly expressive and more scalable interfaces for collective communication in MPI / Träff, J. L. (2012). Alternative, uniformly expressive and more scalable interfaces for collective communication in MPI. Parallel Computing: Systems & Applications, 38(1–2), 26–36. https://doi.org/10.1016/j.parco.2011.10.009
- Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems / Kessler, C., Dastgeer, U., Thibault, S., Namyst, R., Richards, A., Dolinsky, U., Benkner, S., Träff, J. L., & Pllana, S. (2012). Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems. In Design, Automation & Test in Europe Conference & Exhibition (DATE 2012) Proceedings (pp. 1403–1408). EDAA. http://hdl.handle.net/20.500.12708/54301
  Project: PEPPHER (2011–2012)
- Poster: Leveraging PEPPHER Technology for Performance Portable Supercomputing / Kessler, C., Dastgeer, U., Majeed, M., Furmento, N., Thibault, S., Namyst, R., Benkner, S., Pllana, S., Träff, J. L., & Wimmer, M. (2012). Poster: Leveraging PEPPHER Technology for Performance Portable Supercomputing. In 2012 SC Companion: High Performance Computing, Networking Storage and Analysis. Supercomputing 2012 Conference, Salt Lake City, Utah, USA, Non-EU. IEEE Computer Society. https://doi.org/10.1109/sc.companion.2012.213
- Abstract: Leveraging PEPPHER Technology for Performance Portable Supercomputing / Kessler, C., Dastgeer, U., Majeed, M., Furmento, N., Thibault, S., Namyst, R., Benkner, S., Pllana, S., Träff, J. L., & Wimmer, M. (2012). Abstract: Leveraging PEPPHER Technology for Performance Portable Supercomputing. In 2012 SC Companion: High Performance Computing, Networking Storage and Analysis. IEEE Computer Society. https://doi.org/10.1109/sc.companion.2012.212
  Project: PEPPHER (2011–2012)
- Reproducibility and Data Provenance with VisTrails / Hunold, S. (2012). Reproducibility and Data Provenance with VisTrails. WP8 meeting, ANR SONGS project, INRIA, Paris, France, EU. http://hdl.handle.net/20.500.12708/85431
- History and development of the MPI standard / Träff, J. L. (2012). History and development of the MPI standard. AIT Austrian Institute of Technology, Seibersdorf, Austria, Austria. http://hdl.handle.net/20.500.12708/85424
- History and Development of MPI: the Message-Passing Interface / Träff, J. L. (2012). History and Development of MPI: the Message-Passing Interface. CSE Day KTH Stockholm, Stockholm, Sweden, EU. http://hdl.handle.net/20.500.12708/85393
- Evolutionary Scheduling of Parallel Tasks Graphs onto Homogeneous Clusters / Hunold, S., & Lepping, J. (2012). Evolutionary Scheduling of Parallel Tasks Graphs onto Homogeneous Clusters. New Challenges in Scheduling Theory, Centre CNRS, Frejus, France, EU. http://hdl.handle.net/20.500.12708/85392
- Stochastic optimization and memory management / Versaci, F., & Bilardi, G. (2012). Stochastic optimization and memory management. Scalable Approaches to High Performance and High Productivity Computing, ScalPerf’12, Bertinoro, Italy, EU. http://hdl.handle.net/20.500.12708/85391
- Paralleles Rechnen für wissenschaftliche Anwendungen / Wimmer, M. (2012). Paralleles Rechnen für wissenschaftliche Anwendungen. International Summer School Lower Austria, Waidhofen/Ybbs, Austria, Austria. http://hdl.handle.net/20.500.12708/85362
- MPI related experience / Levonyak, M. (2012). MPI related experience. NEC User Group: NUG XXIV, Potsdam, Germany, EU. http://hdl.handle.net/20.500.12708/85357
- Scalability, Expressivity and Performance Portability of Message-Passing Interface(s) / Träff, J. L. (2012). Scalability, Expressivity and Performance Portability of Message-Passing Interface(s). VSC Workshop Vienna Scientific Cluster, Neusiedl/See, Austria, Austria. http://hdl.handle.net/20.500.12708/85337
- Efficient MPI Implementation of a Parallel, Stable Merge Algorithm / Siebert, C., & Träff, J. L. (2012). Efficient MPI Implementation of a Parallel, Stable Merge Algorithm. In J. L. Träff, S. Benkner, & J. Dongarra (Eds.), Recent Advances in the Message Passing Interface Proceedings of the 19th European MPI Users’ Group Meeting, EuroMPI 2012 (pp. 204–213). Springer. http://hdl.handle.net/20.500.12708/54228
- mpicroscope: Towards an MPI Benchmark Tool for Performance Guideline Verification / Träff, J. L. (2012). mpicroscope: Towards an MPI Benchmark Tool for Performance Guideline Verification. In J. L. Träff, S. Benkner, & J. Dongarra (Eds.), Recent Advances in the Message Passing Interface Proceedings of the 19th European MPI Users’ Group Meeting, EuroMPI 2012 (pp. 100–109). Springer. http://hdl.handle.net/20.500.12708/54224
- Processor Allocation for Optimistic Parallelization of Irregular Programs / Versaci, F., & Pingali, K. (2012). Processor Allocation for Optimistic Parallelization of Irregular Programs. In B. Murgante, O. Gervasi, S. Misra, N. Nedjah, A. M. A. C. Rocha, D. Taniar, & B. O. Apduhan (Eds.), Computational Science and Its Applications – ICCSA 2012 (pp. 1–14). Springer. https://doi.org/10.1007/978-3-642-31125-3_1
- Recent Advances in the Message Passing Interface Proceedings of the 19th European MPI Users' Group Meeting, EuroMPI 2012, LNCS 7490 / Träff, J. L., Benkner, S., & Dongarra, J. (Eds.). (2012). Recent Advances in the Message Passing Interface Proceedings of the 19th European MPI Users’ Group Meeting, EuroMPI 2012, LNCS 7490. Springer. http://hdl.handle.net/20.500.12708/23532
2011
- PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems / Benkner, S., Pllana, S., Träff, J. L., Tsigas, P., Dolinsky, U., Augonnet, C., Bachmayer, B., Kessler, C., Moloney, D., & Osipov, V. (2011). PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems. IEEE Micro, 31(5), 28–41. https://doi.org/10.1109/mm.2011.67
  Project: PEPPHER (2011–2012)
- Performance Expectations and Guidelines for MPI Derived Datatypes / Gropp, W., Hoefler, T., Thakur, R., & Träff, J. L. (2011). Performance Expectations and Guidelines for MPI Derived Datatypes. EuroMPI 2011, Santorini, Greece, EU. http://hdl.handle.net/20.500.12708/85310
- Using MPI Derived Datatypes in Numerical Libraries / Bajrovic, E., & Träff, J. L. (2011). Using MPI Derived Datatypes in Numerical Libraries. EuroMPI 2011, Santorini, Greece, EU. http://hdl.handle.net/20.500.12708/85309
Theses
- pSTL-Bench: evaluating the capabilities of ISO C++ parallel STL implementations on modern parallel hardware using microbenchmarking / Krupitza, D. (2023). pSTL-Bench: evaluating the capabilities of ISO C++ parallel STL implementations on modern parallel hardware using microbenchmarking [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.113280
  Download: PDF (5.86 MB)
- Online algorithm selection of MPI collective communication operations / Steiner, S. (2023). Online algorithm selection of MPI collective communication operations [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.105821
  Download: PDF (948 KB)
- The Causes of run time variability in HPC, how to pin them down and how to handle them / Roth, N. (2021). The Causes of run time variability in HPC, how to pin them down and how to handle them [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.83802
  Download: PDF (4.94 MB)
- To Co-schedule or not to co-schedule? Efficiently utilizing large multicore machines / Sarközi, B. A. (2021). To Co-schedule or not to co-schedule? Efficiently utilizing large multicore machines [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.87732
  Download: PDF (8.69 MB)
- Efficient process mapping for cartesian topologies / Lehr, M. (2019). Efficient process mapping for cartesian topologies [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.65323
  Download: PDF (6.84 MB)
- Benchmarking and scheduling on parallel machines / Hunold, S. (2019). Benchmarking and scheduling on parallel machines [Professorial Dissertation, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/159450
- Providing transparent remote access to HPC resources for graphical desktop applications / Kainrad, T. (2018). Providing transparent remote access to HPC resources for graphical desktop applications [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.45383
  Download: PDF (6.71 MB)
- Mehr Parallelismus in Single-Source Shortest Path Algorithmen: Simulation und Implementierung / Kainer, M. (2018). Mehr Parallelismus in Single-Source Shortest Path Algorithmen: Simulation und Implementierung [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.49709
  Download: PDF (1.31 MB)
- Effective memory reclamation for lock-free data structures in C++ / Pöter, M. J. (2018). Effective memory reclamation for lock-free data structures in C++ [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.51367
  Download: PDF (1.88 MB)
- Untersuchung der Implementierbarkeit eines lock-freien binären Suchbaumes / Mayerhofer, N. (2016). Untersuchung der Implementierbarkeit eines lock-freien binären Suchbaumes [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2016.37683
  Download: PDF (3.2 MB)
- Erweiterung des Pheet Frameworks für Pipeline-Parallele Anwendungen / Redl, B. (2016). Erweiterung des Pheet Frameworks für Pipeline-Parallele Anwendungen [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2016.35987
  Download: PDF (1.77 MB)
- KLSM: a relaxed lock-free priority queue / Gruber, J. (2016). KLSM: a relaxed lock-free priority queue [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2016.33752
  Download: PDF (599 KB)
- Efficient construction of provably optimal MPI datatype representations / Kalany, M. (2015). Efficient construction of provably optimal MPI datatype representations [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.30389
  Download: PDF (1.05 MB)
- Adaptive work-stealing techniques / Haselsteiner, L. (2015). Adaptive work-stealing techniques [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.26564
  Download: PDF (1010 KB)
- Variations on task scheduling for shared memory systems / Wimmer, M. (2014). Variations on task scheduling for shared memory systems [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2014.24761
  Download: PDF (1.48 MB)
- Embedded real-time 3D stereo vision on multicore digital signal processors / Eisserer, C. (2013). Embedded real-time 3D stereo vision on multicore digital signal processors [Diploma Thesis, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/159718
Awards
- Sascha Hunold: Best Short Paper / PMBS@Supercomputing. 2022 / USA
- Sascha Hunold: Best Paper Award IEEE CLUSTER 2020. 2020 / Japan
- Jesper Larsson Träff: Innovation Radar: Innovation Title: PGAS-based MPI with interoperability; Innovation Category: Exploration; FP 7 project EPiGRAM. 2018 / Project
- Sascha Hunold: Best Paper Award EuroMPI/Asia. 2014 / Japan
- Jesper Larsson Träff: Best Paper Award: "Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think", S. Hunold, A. Carpen-Amarie, J. Träff, 21st European MPI Users' Group Meeting, EuroMPI/ASIA 2014, Kyoto, Japan, September 9-12, 2014. 2014 / Program Chairs of EuroMPI/ASIA 2014 / Japan
And more…
Soon, this page will include additional information such as reference projects, conferences, events, and other research activities.
Until then, please visit Parallel Computing’s research profile in TISS.