CODATA Euro-American Workshop
Visualization of Information and Data: Where We Are and Where Do We Go From Here?
24-25 June 1997


Toward a Unified Visual Representation of Documents and Concepts

John F. Greco
Cornell University
Ithaca, New York
jfg6@cornell.edu
Antonio Gonzalez-Walker
University of Puerto Rico
San Juan, Puerto Rico
tonyg@tc.cornell.edu

Introduction

The Uniform Commercial Code (UCC) is a complex legal document that governs most commercial transactions in the United States. It is approximately 1000 pages in length, including the official commentary and annotations. Legal practitioners and scholars may spend much of their careers understanding, applying, and even memorizing the UCC. In law school curricula, entire courses are dedicated to the study of a single article of the UCC. The outcome of a major commercial dispute frequently hinges on a single phrase of this dense text. Locating specific phrases and phrase combinations, and understanding how they relate to the text as a whole, is a significant problem.

In this case study, we apply data visualization techniques to the UCC to answer both theoretical and practical legal questions. Overall, the project involves three components: concepts and algorithms for 3D text visualization; a practical software implementation of these ideas; and the application of these techniques to the UCC.

Methods

The algorithms and techniques developed for this study use a three-dimensional approach in which large chunks of text are displayed as successive pages or sheets. Each page of text consists of true three-dimensional objects that can be viewed from any direction or orientation, allowing readers to interactively view hundreds of pages of text at once. These text pages can be displayed either as fully formed characters drawn using three-dimensional vectors or as more impressionistic glyphs, each of which can be set to represent a single character, word, sentence, paragraph or other unit of text. Furthermore, 3D characters may be mixed with glyphs to produce a hierarchical display that maximizes information content on the display surface.
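
The page-and-glyph layout described above can be illustrated with a short sketch. It is written in Python for brevity rather than in the C, C++ and OpenGL of our implementation, it computes positions only and does no rendering, and every name in it (Glyph, layout_pages, chars_per_line, lines_per_page, page_gap) is a hypothetical choice made for this example. It assumes one box-shaped glyph per word, one unit of width per character, and pages stacked at equal intervals along the z axis.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Glyph:
    """One box-shaped glyph standing in for a word (hypothetical structure)."""
    word: str
    page: int     # index of the page (sheet) the glyph sits on
    x: float      # left edge, in character-width units
    y: float      # row position; rows run downward from 0
    z: float      # depth of the page; later pages sit further back
    width: float  # one unit per character of the word


def layout_pages(text, chars_per_line=60, lines_per_page=40, page_gap=1.0):
    """Wrap text into lines, group lines into pages, place one glyph per word."""
    lines = textwrap.wrap(text, width=chars_per_line)
    glyphs = []
    for line_no, line in enumerate(lines):
        page, row = divmod(line_no, lines_per_page)
        column = 0
        for word in line.split():
            glyphs.append(Glyph(
                word=word,
                page=page,
                x=float(column),
                y=float(-row),
                z=-page * page_gap,
                width=float(len(word)),
            ))
            column += len(word) + 1  # leave one unit of space after each word
    return glyphs
```

Calling layout_pages on the full text returns a flat list of glyphs that a renderer could draw as boxes. Coarser units such as sentences or paragraphs could be handled the same way by merging the boxes of adjacent words, although that step is not shown here.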

Users can superimpose research results directly onto the three-dimensional text through color, intensity, shape and other visual cues to show the distribution of information within the context of the whole document. By searching the UCC for a specific phrase, a reader can locate clusters where the phrase appears frequently. Similarly, outliers, instances where the phrase appears in isolation, can be quickly spotted. Searches that combine several phrases or terms can be overlaid on the text display and interactively explored. Finally, relaxed or fuzzy searches that would typically produce an unwieldy number of hits can be displayed in a comprehensive format where interesting combinations and patterns can be isolated and evaluated. Selected areas of the text may be concurrently displayed in a pop-up text editor to allow easy integration into traditional word processing environments. This enables legal practitioners and scholars to use the UCC in innovative and potentially more effective ways.
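
As a rough illustration of how search hits might be sorted into clusters and outliers and then mapped to a per-page intensity, consider the following sketch. It is not the method used in our software. It is a Python example under stated assumptions: hits are exact, case-insensitive matches, a cluster is any run of at least min_cluster hits whose neighbours lie within max_gap characters of each other, and an outlier is a hit with no neighbour within that distance. The function names and thresholds are hypothetical.

```python
import re


def find_hits(text, phrase):
    """Return the character offset of every case-insensitive match of phrase."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


def group_hits(offsets, max_gap=2000):
    """Split offsets into runs whose neighbours lie within max_gap characters."""
    groups = []
    for offset in sorted(offsets):
        if groups and offset - groups[-1][-1] <= max_gap:
            groups[-1].append(offset)
        else:
            groups.append([offset])
    return groups


def clusters_and_outliers(offsets, max_gap=2000, min_cluster=3):
    """Label runs of min_cluster or more hits as clusters, lone hits as outliers."""
    groups = group_hits(offsets, max_gap)
    clusters = [group for group in groups if len(group) >= min_cluster]
    outliers = [group[0] for group in groups if len(group) == 1]
    return clusters, outliers


def page_intensity(offsets, chars_per_page, page_count):
    """Scale the hit count of each page to the range 0..1 for use as a visual cue."""
    counts = [0] * page_count
    for offset in offsets:
        counts[min(offset // chars_per_page, page_count - 1)] += 1
    peak = max(counts, default=0)
    return [count / peak if peak else 0.0 for count in counts]
```

Runs that are longer than one hit but shorter than min_cluster fall into neither category in this sketch. Several searches could be overlaid by computing one intensity list per phrase and assigning each list its own color.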

Results

Our software implementation, currently written in C and C++ using OpenGL on Silicon Graphics workstations and Cornell's IBM SP2 supercomputer, allows animation and queries on very large texts in real time. We are also exploring a projection-based, full-immersion virtual reality environment that lets readers 'get inside the text'.

Overall, our development environment is a flexible testbed for creating new algorithms and display methods to visualize large texts. The result is a unique and powerful tool for legal practitioners and scholars to explore complex texts such as the Uniform Commercial Code.


