Prof. Walt Ligon – Talk

Defining Data Intensive Computing

Panel Chair: Prof. Walt Ligon, Clemson University, USA

What defines data intensive computing? Is a large volume of I/O enough, or is significant computation also required? High performance computing, and thus high performance I/O, has been dominated by scientific simulation. Applications such as airflow over a surface, weather or ocean circulation, galaxy formation, and molecular dynamics have been the driving force shaping the system architectures and software we develop. Today, however, advances in technology have put supercomputing systems into the hands of nearly any research group that wants one, and new applications are beginning to make heavy use of both their computational and I/O capabilities. Of particular interest is bioinformatics, a discipline that is "half databases, and half computational model." Still other applications have very little computational aspect at all. Data mining is heavily used by business, government, and in some cases, science. These applications bear more resemblance to the data processing plants of the 60’s and 70’s, moving massive amounts of information in order to establish connections and correlations in the data. Are these applications becoming a new driver for parallel I/O and parallel systems in general, or will the "traditional" supercomputing applications remain the main focus?

Questions:

  1. What do you see as the driving applications for parallel file systems and high performance I/O over the next 5 to 10 years? Does this represent a change over the last decade?
  2. What do these applications need that current parallel file systems don’t provide? Do we need a shift in thinking about high performance I/O?
  3. Should we expect the same file systems to support both applications with significant computation and I/O and applications whose demands are almost entirely I/O?
  4. The file systems from the 60’s and 70’s, designed for data processing, were quite different from those of the 80’s and 90’s, designed for interactive workstations. Have we lost any good ideas from those older systems that we should be reconsidering now?