Definitions

  • Algorithm design is the process of creating a mathematical model to solve a problem; algorithms are the building blocks of any computer program.

  • Artificial intelligence (AI) aims to synthesize goal-oriented processes, such as problem-solving, decision-making, environmental adaptation, learning, and communication, that are found in humans and animals.

  • Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.

  • Databases are organized collections of data consisting of schemas, tables, queries, reports, views, and other objects.

  • Data mining is the computational process of discovering patterns in large data sets using techniques from machine learning, statistics, and database systems in order to extract information from a data set and transform it into an understandable structure for further use.
    • Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. Typical text mining tasks include categorization, clustering, extracting concepts, developing finely detailed taxonomies, sentiment analysis, summarizing documents, and learning relationships between named entities.
  • Graph analysis/theory provides algorithms to analyze data represented as a graph. Graphs organize data in a way that represents connections or relationships between pairs of data objects and consist of nodes representing entities (people, businesses, systems, etc.) and edges representing relationships or connections. Graph analytics solves complex problems by uncovering relationships and finding hidden patterns in the data, as in the sketch below.
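
A minimal sketch of the idea, with a hypothetical set of entities and relationships: the graph is stored as an adjacency list, and a breadth-first search uncovers which entities are connected to a starting node, even when the connection is indirect.

```python
from collections import deque

# A tiny graph as an adjacency list: nodes are entities,
# edges are relationships between pairs of entities.
# (The entities and relationships below are hypothetical.)
edges = [
    ("alice", "acme_corp"),
    ("bob", "acme_corp"),
    ("bob", "carol"),
    ("dave", "eve"),
]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def reachable(graph, start):
    """Breadth-first search: return every node connected to `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Uncovers the indirect relationship alice -> acme_corp -> bob -> carol;
# dave and eve are in a separate component and are not reached.
print(reachable(graph, "alice"))
```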

  • High performance computing uses parallel processors to run advanced applications or analyses as efficiently, reliably, and quickly as possible.
  • Machine Learning explores the study and design of algorithms that can learn from and make predictions on data. These algorithms operate by first building a model from an example training set of input observations and then using that model to make data-driven predictions or decisions on new data.
    • Pattern recognition is a branch of machine learning that focuses on classifying input data into objects or classes based on key features. There are two classification methods in pattern recognition: supervised and unsupervised classification (a supervised example is sketched after this list).
  • Optimization is the selection of the best option or element from a set of available alternatives given certain predetermined constraints.
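
A minimal sketch of supervised classification, one of the two pattern-recognition settings named above: a model (here, one centroid per class) is built from a labeled training set and then used to predict labels for new observations. The feature values and class names below are hypothetical.

```python
# Supervised classification in miniature: fit class centroids on a
# labeled training set, then assign new points to the nearest centroid.
# (Training data is hypothetical.)

def fit_centroids(X, y):
    """Build the model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Classify a new observation by its closest class centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))

# Training set: (feature vector, class label)
X_train = [[1.0, 1.2], [0.8, 1.0], [4.9, 5.1], [5.2, 4.8]]
y_train = ["small", "small", "large", "large"]

model = fit_centroids(X_train, y_train)
print(predict(model, [1.1, 0.9]))   # -> "small"
print(predict(model, [5.0, 5.0]))   # -> "large"
```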

  • Predictive analytics is the branch of data mining concerned with the prediction of future probabilities and trends.

  • Scientific visualization is the graphical representation of data as a means of gaining understanding and insight into it; it enables scientists to illustrate scientific data and glean insight from it.
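
As a minimal sketch using matplotlib (the scalar field plotted here is made up for illustration), a filled contour plot is one common way to make the structure of a two-dimensional scientific data set visible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical scalar field, e.g. a quantity measured over a 2-D domain.
x = np.linspace(-2.0, 2.0, 200)
y = np.linspace(-2.0, 2.0, 200)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2)) * np.cos(3 * X)

fig, ax = plt.subplots()
contours = ax.contourf(X, Y, Z, levels=20, cmap="viridis")
fig.colorbar(contours, ax=ax, label="field value")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Filled contour plot of a scalar field")
plt.show()
```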

  • Streaming data analysis arises when there is too much data to store or transmit, requiring that the data be analyzed or sampled as it arrives. Applications for which streaming data analysis is critical include networks (cyber-security), sensor networks, and utility companies. Research in sampling methods is essential for sampling from data streams of unknown length and extracting the most useful information with regard to the question of concern; one such sampling method is sketched below.
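
Reservoir sampling is one classic technique for the problem mentioned above: it maintains a fixed-size uniform random sample of a stream whose length is not known in advance, examining each element only once. A minimal sketch (the stream contents are hypothetical):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)        # fill the reservoir first
        else:
            j = random.randrange(n)       # 0 <= j < n
            if j < k:                     # item replaces a current member with prob. k/n
                reservoir[j] = item
    return reservoir

# Works even when the stream is too large to store, e.g. a sensor feed;
# here a generator stands in for an arriving stream of readings.
readings = (i * 0.1 for i in range(1_000_000))
print(reservoir_sample(readings, k=5))
```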

  • Uncertainty quantification is the science of quantitative characterization and reduction of uncertainty in both computational and real-world applications. It seeks to determine how likely certain outcomes are even when the system is not completely defined.
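
As a minimal sketch of the idea (the model and the input distributions are made up for illustration), Monte Carlo sampling is one common way to propagate uncertainty in the inputs through a model and estimate how likely different outcomes are.

```python
import random
import statistics

# Hypothetical model whose inputs are not exactly known.
def model(x, y):
    return x ** 2 + 3 * y

# Describe the uncertain inputs as probability distributions and
# propagate that uncertainty through the model by repeated sampling.
samples = []
for _ in range(100_000):
    x = random.gauss(1.0, 0.1)     # x known only approximately
    y = random.uniform(0.5, 1.5)   # y known only to lie in a range
    samples.append(model(x, y))

mean = statistics.fmean(samples)
stdev = statistics.stdev(samples)
exceed = sum(s > 5.0 for s in samples) / len(samples)
print(f"mean={mean:.2f}  stdev={stdev:.2f}  P(output > 5.0)={exceed:.2%}")
```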