Publications

3 Results

Search results

Navigating nuclear science: Enhancing analysis through visualization

Irwin, N.H.

Data visualization is an emerging technology with high potential for addressing the information overload problem. This project extends the data visualization work of the Navigating Science project by coupling it with more traditional information retrieval methods. A citation-derived landscape was augmented with documents using a text-based similarity measure to show viability of extension into datasets where citation lists do not exist. Landscapes, showing hills where clusters of similar documents occur, can be navigated, manipulated and queried in this environment. The capabilities of this tool provide users with an intuitive explore-by-navigation method not currently available in today's retrieval systems.

Domain-independent information extraction in unstructured text

Irwin, N.H.

Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development Project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal, as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.

Extraction of information from unstructured text

Irwin, N.H.

Extracting information from unstructured text has become an emphasis in recent years due to the large amount of text now electronically available. This status report describes the findings and work done by the end of the first year of a two-year LDRD. Requirements of the approach included that it model the information in a domain-independent way. This means that it would differ from current systems by not relying on previously built domain knowledge and that it would do more than keyword identification. Three areas that are discussed and expected to contribute to a solution are (1) identifying key entities through document-level profiling and preprocessing, (2) identifying relationships between entities through sentence-level syntax, and (3) combining the first two with semantic knowledge about the terms.
