Publications

Results 2401–2500 of 9,998

Search results

Hardware MPI message matching: Insights into MPI matching behavior to inform design

Concurrency and Computation: Practice and Experience

Ferreira, Kurt; Grant, Ryan; Levenhagen, Michael; Levy, Scott L.N.; Groves, Taylor

This paper explores key differences in MPI match lists for several important United States Department of Energy (DOE) applications and proxy applications. This understanding is critical in determining the most promising hardware matching design for any given high-speed network. The results of MPI match list studies for the major open-source MPI implementations, MPICH and Open MPI, are presented, and we modify an MPI simulator, LogGOPSim, to provide match list statistics. These results are discussed in the context of several different potential design approaches to MPI matching–capable hardware. The data illustrate the requirements for different hardware designs in terms of performance and memory capacity. The paper's contributions are the collection and analysis of data to help inform hardware designers of common MPI requirements, and a demonstration of how difficult it is to determine these requirements by examining only a single MPI implementation.
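
For readers unfamiliar with match lists, the following is a minimal sketch of a posted-receive list searched in posting order, recording the search depth of each lookup. It is not taken from the paper, MPICH, Open MPI, or LogGOPSim; the names `PostedRecv`, `MatchList`, `ANY_SOURCE`, and `ANY_TAG` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical wildcard values standing in for MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class PostedRecv:
    source: int  # sending rank, or ANY_SOURCE
    tag: int     # message tag, or ANY_TAG
    comm: int    # communicator identifier (assumed to be a plain integer here)


class MatchList:
    """Posted-receive list searched linearly, oldest entry first."""

    def __init__(self) -> None:
        self.entries: List[PostedRecv] = []
        self.search_depths: List[int] = []  # entries inspected per lookup

    def post(self, recv: PostedRecv) -> None:
        self.entries.append(recv)

    def match(self, source: int, tag: int, comm: int) -> Optional[PostedRecv]:
        """Return and remove the oldest receive matching an incoming message."""
        for depth, recv in enumerate(self.entries, start=1):
            if (recv.comm == comm
                    and recv.source in (ANY_SOURCE, source)
                    and recv.tag in (ANY_TAG, tag)):
                self.search_depths.append(depth)
                del self.entries[depth - 1]
                return recv
        # No match: the whole list was traversed. A real implementation would
        # then place the message on an unexpected-message queue.
        self.search_depths.append(len(self.entries))
        return None


ml = MatchList()
ml.post(PostedRecv(source=0, tag=7, comm=0))
ml.post(PostedRecv(source=ANY_SOURCE, tag=9, comm=0))
ml.post(PostedRecv(source=2, tag=9, comm=0))
hit = ml.match(source=2, tag=9, comm=0)  # the wildcard receive is older, so it wins
```

The recorded search depths and list lengths are the kind of statistics that bear on the performance and memory-capacity requirements the abstract mentions.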

Compressed Optimization of Device Architectures for Semiconductor Quantum Devices

Physical Review Applied

Ward, Daniel R.; Frees, Adam; Gamble, John K.; Blume-Kohout, Robin; Eriksson, M.A.; Friesen, Mark; Coppersmith, S.N.

Recent advances in nanotechnology have enabled researchers to manipulate small collections of quantum-mechanical objects with unprecedented accuracy. In semiconductor quantum-dot qubits, this manipulation requires controlling the dot orbital energies, the tunnel couplings, and the electron occupations. These properties all depend on the voltages placed on the metallic electrodes that define the device, the positions of which are fixed once the device is fabricated. While there has been much success with small numbers of dots, as the number of dots grows, it will be increasingly useful to control these systems with as few electrode voltage changes as possible. Here, we introduce a protocol, which we call the "compressed optimization of device architectures" (CODA), in order both to efficiently identify sparse sets of voltage changes that control quantum systems and to introduce a metric that can be used to compare device designs. As an example of the former, we apply this method to simulated devices with up to 100 quantum dots and show that CODA automatically tunes devices more efficiently than other common nonlinear optimizers. To demonstrate the latter, we determine the optimal lateral scale for a triple quantum dot, yielding a simulated device that can be tuned with small voltage changes on a limited number of electrodes.

SST-GPU: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model

Khairy, Mahmoud; Zhang, Mengchi; Green, Roland; Hammond, Simon; Hoekstra, Robert J.; Rogers, Timothy; Hughes, Clayton

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel acceleration where the contexts for thousands of concurrent threads are resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. The design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. To address the need for a scalable, decentralized GPU model that can model large GPUs, chiplet-based GPUs, and multi-node GPUs, this report details the first steps in integrating the open-source, execution-driven GPGPU-Sim into the SST framework. The first stage of this project creates two elements: a kernel scheduler SST element that accepts work from SST CPU models and schedules it to an SM-collection element, which performs cycle-by-cycle timing using SST's Mem Hierarchy to model a flexible memory system.

Curvature Based Analysis to Identify and Categorize Trajectory Segments

Schrum Jr., Paul T.; Foulk, James W.; Newton, Benjamin D.

Since the attacks carried out against the United States on September 11, 2001, which involved the commandeering of commercial aircraft, interest has increased in performing trajectory analysis of vehicle types not constrained by roadways or railways, i.e., aircraft and watercraft. Anomalous trajectories need to be automatically identified along with other trajectories of interest to flag them for further investigation. There is also interest in analyzing trajectories without a focus on anomaly detection. Various approaches to analyzing these trajectories have been undertaken with useful results to date. In this research, we seek to augment trajectory analysis by carrying out analysis of the trajectory curvature along with other parameters, including distance and total deflection (change in direction). At each point triplet in the ordered sequence of points, these parameters are computed. Adjacent point triplets with similar values are grouped together to form a higher level of semantic categorization. These categorizations are then analyzed to form a yet higher level of categorization which has more specific semantic meaning. This top level of categorization is then summarized for all trajectories under study, allowing for fast identification of trajectories with various semantic characteristics.
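
As an illustration of the per-triplet computation described above, here is a minimal sketch for 2-D points. It assumes planar coordinates, uses the Menger (circumscribed-circle) curvature as the curvature measure, and groups adjacent triplets with a simple three-way label; the function names, the tolerance `straight_tol`, and the labels are hypothetical and are not the authors' categorization scheme.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def triplet_metrics(a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Return (signed curvature, path length, deflection) for one point triplet.

    Curvature is the Menger curvature 4*area / (|ab|*|bc|*|ca|), signed so that
    a left (counter-clockwise) turn is positive. Deflection is the signed change
    in heading from segment ab to segment bc, in radians.
    """
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - b[0], c[1] - b[1]
    cross = ux * vy - uy * vx  # twice the signed triangle area
    dot = ux * vx + uy * vy
    denom = ab * bc * ca
    curvature = 2.0 * cross / denom if denom > 0 else 0.0
    deflection = math.atan2(cross, dot)
    return curvature, ab + bc, deflection


def categorize(points: List[Point], straight_tol: float = 1e-3) -> List[Tuple[str, int]]:
    """Label each triplet, then merge adjacent triplets that share a label."""
    labels = []
    for a, b, c in zip(points, points[1:], points[2:]):
        k, _, _ = triplet_metrics(a, b, c)
        if abs(k) < straight_tol:
            labels.append("straight")
        else:
            labels.append("left" if k > 0 else "right")

    segments: List[Tuple[str, int]] = []  # (label, number of triplets)
    for label in labels:
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1] + 1)
        else:
            segments.append((label, 1))
    return segments


track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 1.0), (4.0, 2.0)]
segments = categorize(track)  # two straight triplets followed by two left-turning ones
```

The higher categorization levels mentioned in the abstract would operate on the resulting segment list rather than on raw points.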

An optimization-based framework to define the probabilistic design space of pharmaceutical processes with model uncertainty

Processes

Laky, Daniel; Xu, Shu; Rodriguez, Jose S.; Vaidyaraman, Shankar; Munoz, Salvador G.; Laird, Carl

To increase manufacturing flexibility and system understanding in pharmaceutical development, the FDA launched the quality by design (QbD) initiative. Within QbD, the design space is the multidimensional region (of the input variables and process parameters) where product quality is assured. Given the high cost of extensive experimentation, there is a need for computational methods to estimate the probabilistic design space that considers interactions between critical process parameters and critical quality attributes, as well as model uncertainty. In this paper, we propose two algorithms that extend the flexibility test and flexibility index formulations to replace simulation-based analysis and identify the probabilistic design space more efficiently. The effectiveness and computational efficiency of these approaches are shown on a small example and an industrial case study.

ASCR Workshop on In Situ Data Management

Peterka, Tom; Bard, Deborah; Bennett, Janine C.; Bethel, E.W.; Oldfield, Ron; Pouchard, Line; Sweeney, Christine; Wolf, Matthew

In January 2019, the U.S. Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions for in situ data management (ISDM). The workshop defined ISDM as the practices, capabilities, and procedures to control the organization of data and enable the coordination and communication among heterogeneous tasks, executing simultaneously in a high-performance computing system, cooperating toward a common objective. The workshop revealed two primary, interdependent motivations for processing and managing data in situ. The first motivation is that the in situ methodology enables scientific discovery from a broad range of data sources over a wide scale of computing platforms: leadership-class systems, clusters, clouds, workstations, and embedded devices at the edge. The successful development of ISDM capabilities will benefit real-time decision-making, design optimization, and data-driven scientific discovery. The second motivation is the need to decrease data volumes. ISDM can make critical contributions to managing large data volumes from computations and experiments to minimize data movement, save storage space, and boost resource efficiency, often while simultaneously increasing scientific precision.

Description and evaluation of the Community Ice Sheet Model (CISM) v2.1

Geoscientific Model Development

Lipscomb, William H.; Price, Stephen F.; Hoffman, Matthew J.; Leguy, Gunter R.; Bennett, Andrew R.; Bradley, Sarah L.; Evans, Katherine J.; Fyke, Jeremy G.; Kennedy, Joseph H.; Perego, Mauro; Ranken, Douglas M.; Sacks, William J.; Salinger, Andrew G.; Vargo, Lauren J.; Worley, Patrick H.

We describe and evaluate version 2.1 of the Community Ice Sheet Model (CISM). CISM is a parallel, 3-D thermomechanical model, written mainly in Fortran, that solves equations for the momentum balance and the thickness and temperature evolution of ice sheets. CISM's velocity solver incorporates a hierarchy of Stokes flow approximations, including shallow-shelf, depth-integrated higher order, and 3-D higher order. CISM also includes a suite of test cases, links to third-party solver libraries, and parameterizations of physical processes such as basal sliding, iceberg calving, and sub-ice-shelf melting. The model has been verified for standard test problems, including the Ice Sheet Model Intercomparison Project for Higher-Order Models (ISMIP-HOM) experiments, and has participated in the initMIP-Greenland initialization experiment. In multimillennial simulations with modern climate forcing on a 4 km grid, CISM reaches a steady state that is broadly consistent with observed flow patterns of the Greenland ice sheet. CISM has been integrated into version 2.0 of the Community Earth System Model, where it is being used for Greenland simulations under past, present, and future climates. The code is open-source with extensive documentation and remains under active development.

VideoSwarm: Analyzing video ensembles

IS&T International Symposium on Electronic Imaging Science and Technology

Martin, Shawn; Sielicki, Milosz; Gittinger, Jaxon M.; Letter, Matthew; Hunt, Warren L.; Crossno, Patricia J.

We present VideoSwarm, a system for visualizing video ensembles generated by numerical simulations. VideoSwarm is a web application in which linked views of the ensemble each represent the data using a different level of abstraction. VideoSwarm uses multidimensional scaling to reveal relationships between a set of simulations relative to a single moment in time, and to show the evolution of video similarities over a span of time. VideoSwarm is a plug-in for Slycat, a web-based visualization framework that provides a web server, database, and Python infrastructure. The Slycat framework provides support for managing multiple users, maintains access control, and requires only a Slycat-supported commodity browser (such as Firefox, Chrome, or Safari).

Understanding the Machine Learning Needs of ECP Applications

Ellis, John A.; Rajamanickam, Sivasankaran

To support the codesign needs of ECP applications on current and future hardware in the area of machine learning, the ExaLearn team at Sandia studied the machine learning use cases in three different ECP applications. This report is a summary of the needs of the three applications. The Sandia ExaLearn team will develop a proxy application representative of ECP application needs, specifically the ExaSky and EXAALT ECP projects. The proxy application will allow us to demonstrate performance-portable kernels within machine learning codes. Furthermore, current training scalability of machine learning networks in these applications is negatively affected by large batch sizes. Training throughput of the network will increase as batch size increases, but network accuracy and generalization worsen. The proxy application will contain hybrid model- and data-parallelism to improve training efficiency while maintaining network accuracy. The proxy application will also target optimizing 3D convolutional layers, specific to scientific machine learning, which have not been as thoroughly explored by industry.

ECP Milestone Memo WBS 2.3.4.13 ECP/VTK-m FY19Q1 [MS-19/01-03] ZFP / Release / Clip STDA05-17

Moreland, Kenneth D.

The STDA05-17 milestone comprises the following 3 deliverables.

VTK-m Release 2: We will provide a release of VTK-m software and associated documentation. The source code repository will be tagged at a stable state, and, at a minimum, tarball captures of the source code will be made available from the web site. A version of the VTK-m User's Guide documenting this release will also be made available.

Productionize zfp compression: The "ZFP: Compressed Floating-Point Arrays" project (WBS 1.3.4.13) is creating an implementation of ZFP compression in VTK-m. Their implementation will be focused on operating in CUDA. The VTK-m project will assist by generalizing the implementation to other devices (such as multi-core CPUs). We will also assist in productionizing the code such that it can be used by external projects and products.

Clip: Clip operations intersect meshes with implicit functions. Clipping is the foundation of spatial subsetting algorithms, such as "box," and of data-based subsetting, such as "isovolume." The algorithm requires considering thousands of possible cases and is thus quite difficult to implement. This milestone will implement clipping sufficient for VisIt's and ParaView's needs.

ECP Milestone Memo WBS 2.3.4.13 ECP/VTK-m FY18Q4 [MS-18/09-10] Dynamic Types / Rendering Topologies STDA05-16

Moreland, Kenneth D.

The STDA05-16 milestone comprises the following 3 distinct deliverables.

OpenMP: VTK-m currently supports three types of devices: serial CPU, TBB, and CUDA. To run algorithms on multicore CPU-type devices (such as Xeon and Xeon Phi), TBB is required. However, there are known issues with integrating a software product using TBB with another one using OpenMP. Therefore, we will add an OpenMP device to the VTK-m software. When engaged, this device will run parallel algorithms using OpenMP directives. This will integrate more cleanly with other code that also uses OpenMP.

Rendering Topological Entities: VTK-m currently supports surface rendering by tessellating data structures and rendering the resulting triangles. We will extend current functionality to include face, edge, and point rendering.

Better Dynamic Types Impl: For the best efficiency across all platforms, VTK-m algorithms use static typing with C++ templates. However, many libraries like VTK, ParaView, and VisIt use dynamic types with virtual functions because data types often cannot be determined at compile time. We have an interface in VTK-m to merge these two typing mechanisms by generating all possible combinations of static types when faced with a dynamic type. Although this mechanism works, it generates very large executables and takes a very long time to compile. As we move forward, it is clear that these problems will get worse and become infeasible at exascale. We will rectify the problem by introducing some level of virtual methods, which require only a single code path, within VTK-m algorithms. This first milestone produces a design document to propose an approach to the new system.

Determination of ballistic limit of skin-stringer panels using nonlinear, strain-rate dependent peridynamics

AIAA Scitech 2019 Forum

Cuenca, Fernando; Weckner, Olaf; Silling, Stewart; Rassaian, Mostafa

Significant testing is required to design and certify primary aircraft structures subject to High Energy Dynamic Impact (HEDI) events; current work under the NASA Advanced Composites Consortium (ACC) HEDI Project seeks to determine the state of the art of dynamic fracture simulations for composite structures in these events. This paper discusses one of three Progressive Damage Analysis (PDA) methods selected for the second phase of the NASA ACC project: peridynamics, through its implementation in EMU. A brief discussion of peridynamic theory is provided, including the effects of nonlinearity and strain-rate dependence of the matrix, followed by a blind prediction and test-analysis correlation for ballistic impact testing performed on configured skin-stringer panels.
