OpenMP 5.0 added support for reductions over explicit tasks. This expands the previous reduction support that was limited primarily to worksharing and parallel constructs. While the scope of a reduction operation in a worksharing construct is the scope of the construct itself, the scope of a task reduction can vary. This difference requires syntactical means to define the scope of reductions, e.g., the task_reduction clause, and to associate participating tasks, e.g., the in_reduction clause. Furthermore, the disassociation of the number of threads and the number of tasks creates space for different implementations in the OpenMP runtime. In this work, we provide insights into the behavior and performance of task reduction implementations in GCC/g++ and LLVM/Clang. Our results indicate that task reductions are well supported by both compilers, but their performance differs in some cases and is often determined by the efficiency of the underlying task management.

TYPE Conference Presentation YEAR 2022

Characterizing the Performance of Task Reductions in OpenMP 5.X Implementations

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Ciesko, Jan; Olivier, Stephen L.

OpenMP 5.0 added support for reductions over explicit tasks. This expands the previous reduction support that was limited primarily to worksharing and parallel constructs. While the scope of a reduction operation in a worksharing construct is the scope of the construct itself, the scope of a task reduction can vary. This difference requires syntactical means to define the scope of reductions, e.g., the task_reduction clause, and to associate participating tasks, e.g., the in_reduction clause. Furthermore, the disassociation of the number of threads and the number of tasks creates space for different implementations in the OpenMP runtime. In this work, we provide insights into the behavior and performance of task reduction implementations in GCC/g++ and LLVM/Clang. Our results indicate that task reductions are well supported by both compilers, but their performance differs in some cases and is often determined by the efficiency of the underlying task management.
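
The two clauses named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper; the function name `chunked_task_sum` and its chunking scheme are assumptions made purely for illustration. The `taskgroup` with `task_reduction` defines the scope of the reduction, and each explicit task joins that reduction through `in_reduction`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): sums a[0..n) using one
 * explicit task per chunk of `chunk` elements. */
long chunked_task_sum(const int *a, size_t n, size_t chunk)
{
    assert(chunk > 0);
    long sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* task_reduction defines the reduction scope: this taskgroup. */
        #pragma omp taskgroup task_reduction(+: sum)
        {
            for (size_t start = 0; start < n; start += chunk) {
                size_t end = (start + chunk < n) ? start + chunk : n;

                /* in_reduction associates this task with the reduction
                 * declared on the enclosing taskgroup. */
                #pragma omp task in_reduction(+: sum) firstprivate(start, end)
                {
                    for (size_t i = start; i < end; ++i)
                        sum += a[i];
                }
            }
        } /* end of taskgroup: all participating tasks have completed
             and their contributions are combined into sum */
    }
    return sum;
}
```

When built with an OpenMP-enabled compiler (for example, `-fopenmp` with GCC or Clang), the tasks may execute on different threads. Without OpenMP support the pragmas are ignored and the function computes the same sum serially.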

TYPE Conference Paper YEAR 2022

Kokkos Remote Spaces - Public Preview 3

Ciesko, Jan

Abstract not provided.

TYPE Presentation YEAR 2021

Kokkos 3: Programming Model Extensions for the Exascale Era

IEEE Transactions on Parallel and Distributed Systems

Trott, Christian R.; Lebrun-Grandie, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh Q.; Ellingwood, Nathan D.; Gayatri, Rahulkumar; Harvey, Evan C.; Hollman, Daisy S.; Ibanez-Granados, Daniel A.; Liber, Nevin; Madsen, Jonathan; Miles, Jeff S.; Poliakoff, David; Powell, Amy J.; Rajamanickam, Sivasankaran; Simberg, Mikael; Sunderland, Dan; Turcksin, Bruno; Wilke, Jeremiah

As the push towards exascale hardware has increased the diversity of system architectures, performance portability has become a critical aspect for scientific software. We describe the Kokkos Performance Portable Programming Model that allows developers to write single source applications for diverse high performance computing architectures. Kokkos provides key abstractions for both the compute and memory hierarchy of modern hardware. Here, we describe the novel abstractions that have been added to Kokkos recently such as hierarchical parallelism, containers, task graphs, and arbitrary-sized atomic operations. We demonstrate the performance of these new features with reproducible benchmarks on CPUs and GPUs.

TYPE Journal Article YEAR 2021

Comms - Summit Series-VIII

Ciesko, Jan

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Kokkos EcoSystem Update

Trott, Christian R.; Lebrun-Grandie, Damien; Gayatri, Rahul; Ciesko, Jan

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Building a VR motion simulator

Ciesko, Jan

Abstract not provided.

TYPE Conference Presentation YEAR 2021

Implementing Flexible Threading Support in Open MPI

Proceedings of ExaMPI 2020: Exascale MPI Workshop, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

Evans, Noah; Ciesko, Jan; Olivier, Stephen L.; Pritchard, Howard; Iwasaki, Shintaro; Raffenetti, Ken; Balaji, Pavan

Multithreaded MPI applications are gaining popularity in scientific and high-performance computing. While the combination of programming models is suited to support current parallel hardware, it moves threading models and their interaction with MPI into focus. With the advent of new threading libraries, the flexibility to select a threading implementation of choice is becoming an important usability feature. Open MPI has traditionally avoided componentizing its threading model, relying on code inlining and static initialization to minimize potential impacts on runtime fast paths and synchronization. This paper describes the implementation of generic threading runtime support in Open MPI using the Opal Modular Component Architecture. This architecture allows the programmer to select a threading library at compile time or run time, providing both static initialization of threading primitives and dynamic instantiation of threading objects. In this work, we present the implementation, define required interfaces, and discuss trade-offs of dynamic and static initialization.
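
The distinction between static initialization and dynamic instantiation mentioned in the abstract can be illustrated with plain POSIX threads. This is a sketch under stated assumptions and does not show Open MPI's actual interfaces; the names `bump_static`, `counter_t`, `counter_create`, `counter_bump`, and `counter_destroy` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Static initialization: the mutex is usable without any runtime
 * setup call, so no initialization code sits on the execution path. */
static pthread_mutex_t static_lock = PTHREAD_MUTEX_INITIALIZER;
static int static_counter = 0;

/* Hypothetical helper: increments the statically protected counter. */
int bump_static(void)
{
    pthread_mutex_lock(&static_lock);
    int value = ++static_counter;
    pthread_mutex_unlock(&static_lock);
    return value;
}

/* Dynamic instantiation: the object is created, initialized, and
 * destroyed at run time, which costs explicit setup and teardown. */
typedef struct {
    pthread_mutex_t lock;
    int value;
} counter_t;

counter_t *counter_create(void)
{
    counter_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    if (pthread_mutex_init(&c->lock, NULL) != 0) {
        free(c);
        return NULL;
    }
    c->value = 0;
    return c;
}

int counter_bump(counter_t *c)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    int value = ++c->value;
    pthread_mutex_unlock(&c->lock);
    return value;
}

void counter_destroy(counter_t *c)
{
    if (c == NULL)
        return;
    pthread_mutex_destroy(&c->lock);
    free(c);
}
```

The static form only works when the threading library is known at compile time, because the initializer macro belongs to that library. The dynamic form defers the choice to run time in exchange for explicit creation and destruction calls.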

TYPE Conference Paper YEAR 2020
