Julian Satran – Talk

Data Intensive High Performance Computing – Challenges for the Future

Panel Chair: Julian Satran, Distinguished Engineer, IBM Research Laboratory in Haifa, Israel

Abstract:
This panel has the unenviable task of attempting a glimpse into the future. There is a risk that we will look like fools in x years! Instead, let us attempt to line up the challenges and align them with what might happen. I would suggest that we limit our attention to the following categories:

  • basic storage technology
  • storage structures as they relate to various application areas
  • commodity component trend

Basic storage technology deals with two related aspects: devices and storage system structure. The storage devices in widespread use today are DRAM/SRAM, Flash (NAND and NOR), HDD, Optical, and Tape. On closer inspection, the density and price of all (or almost all) of these are determined by a single technology: lithography. It is argued that, as long as this holds, no new storage technology independent of lithography will appear in this landscape. On the other hand (as you have heard repeatedly), the need for storage increases exponentially. The "performance factors" of the components, however, do not increase in the same way. We risk ending up with disk farms in which no disk will be filled before it fails!
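The capacity-versus-performance argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a model of any real device: the function names, the effective per-device write rate, and the service life are all hypothetical parameters chosen for illustration, and the figures in the usage loop are made-up inputs rather than measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_fill(capacity_tb, effective_write_mb_per_s):
    """Years needed to write a device's full capacity once.

    Both arguments are illustrative assumptions: capacity in decimal
    terabytes, and the average write rate the device actually receives
    in the farm (not its peak rate), in megabytes per second.
    """
    if capacity_tb <= 0 or effective_write_mb_per_s <= 0:
        raise ValueError("capacity and write rate must be positive")
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return capacity_mb / effective_write_mb_per_s / SECONDS_PER_YEAR


def fills_before_failure(capacity_tb, effective_write_mb_per_s, service_life_years):
    """True if the device can be filled within its assumed service life."""
    return years_to_fill(capacity_tb, effective_write_mb_per_s) <= service_life_years


if __name__ == "__main__":
    # Hypothetical scenario: capacity doubles each generation while the
    # effective write rate per device and the service life stay fixed.
    rate_mb_per_s = 0.5      # assumed average ingest rate per device
    service_life_years = 5   # assumed service life
    capacity_tb = 10.0       # assumed starting capacity
    for generation in range(5):
        fill = years_to_fill(capacity_tb, rate_mb_per_s)
        ok = fills_before_failure(capacity_tb, rate_mb_per_s, service_life_years)
        print(f"gen {generation}: {capacity_tb:7.1f} TB, "
              f"{fill:6.2f} years to fill, fills in time: {ok}")
        capacity_tb *= 2
```

With the write rate held constant, the fill time scales linearly with capacity, so each doubling of capacity doubles the time needed to fill the device; once that time exceeds the assumed service life, the device fails before it is full.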

Storage structure design has assumed that whenever there is a performance gap we will be able to cover it by introducing a level of caching. The current cache hierarchy has DRAM, HDD, Optical, and Tape. But that holds for our "classical" storage consumers. What interests this (HPC) community is compute servers, and all commercial predictions indicate that most storage will be used outside this domain. Steve Hetzler, an IBM Fellow at the Almaden Research Center, has recently proposed another model, the Hetzler model. It shows concentric circles with performance increasing outward and arc segments indicating markets. A radial line indicates a system design for a segment.

That brings us to the last category: the commodity component trend. For the last 20 years High Performance Computing has enjoyed an enormous increase in performance and decrease in price based on the commoditization of components. However, the components were built mainly for computers and used the designs and knowledge from previous computer generations (before they became commoditized). The target market is changing and the accumulated IP (Intellectual Property) is wearing thin. Storage is also becoming highly important as the main repository of human knowledge. And that brings a plethora of new requirements which are not always aligned with High Performance Data Intensive computing.

And now the questions:

  1. For the next several years, assuming an increasing demand for storage capacity and speed, but no change in storage device availability and structure, what should be the areas for system research? Assuming that the "preservation" trend continues and its challenges can be met by a new function in storage, how could the HPC community benefit from it?
  2. Object Storage has for a long time been the "love-kid" of this community. As time passes and the component economics change, how should Object Storage evolve?
    To a completely programmable device? Or a partly programmable device? And then what? Can it satisfy other markets?
  3. Programming models have only very basic support for storage. The best example is that, despite the widespread use of search programming, data ingest is still completely tied to precisely defined data sets. Is this good enough? How can we bridge the gap between search and data usage by programs?
  4. As the storage usage volume moves away from computing, do we need to look again beyond commodity, to some forms of storage that are "specialized for HPC"? Should this be an industry effort? (Is there enough of a market?) Or should this be a government(s)-sponsored effort?