Time Series Data Exploration

CODATA Euro-American Workshop
Visualization of Information and Data: Where We Are and Where Do We Go From Here?
24-25 June 1997

Nancy Grady, Raymond Flanery, Jr., June Donato, Jack Schryver
Oak Ridge National Laboratory
Oak Ridge, Tennessee, USA
{gradynw, flaneryrejr, donatojm, schryverjc}@ornl.gov

Introduction

Data mining techniques are becoming increasingly important in handling large data sets. While it is vital to develop automated techniques that can filter and categorize data, it is equally important to take the final step of communicating and understanding the process and results of these techniques through visual means. Data exploration has traditionally been considered in terms of the user interface issues for interactive browsing. As the quantity of data to be examined increases, however, it becomes even more important to blend the boundaries between the visual presentation of the data mining process and other data exploration.

Background

Time series data streams present three major challenge areas for generalized data exploration systems: filtering out data that is uninteresting, providing algorithms for the analysis and creation of metadata, and supporting interactive exploration of the regions of interest. Further operational constraints arise from the size of the dataset and the speed with which the analysis must take place.

In earlier work [1], time-series numeric data were explored by using statistical filters to select a data window around a region of interest and storing the data, along with the statistically determined metadata, in a Postgres database. A simple 2-D graphical representation of the metadata, built with the visualization package AVS, guides the user through the regions of interest; selecting a metadata point of interest retrieves a graph of the corresponding data region. This approach worked very well for one-dimensional streams of time series data within scientific data sets.
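
As a rough illustration of the windowing idea, the sketch below flags points that lie far from the series mean and returns the surrounding window together with a few summary values as metadata. The z-score test, the threshold, the window half-width, and the metadata fields are all assumptions made for this example; the filters actually used in [1] are not described here, and the database and AVS steps are omitted.

```python
from statistics import mean, pstdev


def find_regions(series, threshold=3.0, half_width=2):
    """Return (window, metadata) pairs around points far from the series mean.

    Hypothetical filter: a point is "interesting" when its z-score
    magnitude is at least `threshold`.
    """
    mu = mean(series)
    sigma = pstdev(series)
    regions = []
    if sigma == 0:
        return regions  # a flat series has no outlying points
    for i, value in enumerate(series):
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            start = max(0, i - half_width)
            stop = min(len(series), i + half_width + 1)
            window = series[start:stop]
            # Metadata that a 2-D overview plot could display per region.
            metadata = {
                "index": i,
                "start": start,
                "stop": stop,
                "z_score": z,
                "window_mean": mean(window),
                "window_max": max(window),
            }
            regions.append((window, metadata))
    return regions
```

Each metadata record could be stored as one database row, with `start` and `stop` serving as the key for retrieving the raw window on demand.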

ORNL has recently developed a data mining approach to predict bankruptcy in personal credit card accounts [2]. While the transaction patterns are time-series data streams, there are both numeric and character data attributes resulting in a number of more complex factors to be considered in communicating the knowledge content within this data. In the data mining phase of this work, decision trees have been used to partition the accounts within the database into groups of good, bankrupt, or delinquent based on attributes signifying their current status. The transactions for the accounts within each group are then used to train a partially recurrent neural network to recognize the behavior pattern in the running balance. When tested against a new set of accounts, the system does indicate a predictive power for bankruptcy.
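
To make the term "partially recurrent" concrete, the following minimal sketch runs a sequence of running-balance values through a tiny Elman-style network, in which a context layer holds a copy of the previous hidden state. The Elman form, the single scalar input, the layer sizes, and the weights are assumptions for illustration only; the architecture and training procedure used in [2] are not specified here, and training is omitted.

```python
import math


def elman_forward(balances, w_in, w_ctx, w_out, b_hidden, b_out):
    """Forward pass of a small Elman-style network over one account.

    balances : sequence of (scaled) running-balance values
    w_in     : input-to-hidden weights, one per hidden unit
    w_ctx    : context-to-hidden weights, n_hidden x n_hidden
    w_out    : hidden-to-output weights, one per hidden unit
    Returns the per-step output probabilities and the hidden-state trajectory.
    """
    n_hidden = len(w_in)
    context = [0.0] * n_hidden  # context layer starts empty
    outputs, trajectory = [], []
    for x in balances:
        hidden = []
        for j in range(n_hidden):
            s = w_in[j] * x + b_hidden[j]
            s += sum(w_ctx[j][k] * context[k] for k in range(n_hidden))
            hidden.append(math.tanh(s))
        z = sum(w_out[j] * hidden[j] for j in range(n_hidden)) + b_out
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # logistic output in (0, 1)
        context = hidden  # context layer copies the hidden layer
        trajectory.append(hidden)
    return outputs, trajectory
```

Because the context feeds back into the hidden layer, the same balance value can produce different hidden states at different steps, which is what lets such a network respond to a pattern over time.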

Work in Progress

Evaluation of the success of this approach requires the integration of visual informational representations, both for the transaction data in the database and for algorithm animation, in order to understand the behavioral indicators for bankruptcy. The separation of accounts through a decision tree provides its own natural representation. A simple enhancement over merely presenting the hierarchical structure of the tree uses the size and color of the nodes to indicate the predicted status of the represented accounts. The tree further provides an appropriate interface for accessing the grouped accounts' transaction data. The challenge lies in the representation of multiple transaction data streams of differing data types. These temporal sequences are multidimensional, they are noisy at the finest level of detail, and the accounts are not all at the same behavioral point at a given time (i.e., the remaining time until declaration of bankruptcy differs). The transaction period must therefore be processed to present varying degrees of detail that bring out the broad features in the continuous data, to provide dimensional reduction for the discrete data, and to allow some modification of the time origin in order to align the data to the future point in time of the declaration of bankruptcy.
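
The time-origin shift and the varying level of detail can be sketched as follows. Transactions are assumed here to be (day, amount) pairs and the bankruptcy declaration to be a known day number; both the record layout and the function names are hypothetical, chosen only to illustrate the alignment.

```python
def align_to_event(transactions, event_day):
    """Re-express (day, amount) records as (days_before_event, amount).

    Records dated after the event are dropped, so every account is
    measured on the same countdown axis ending at day 0.
    """
    return [
        (event_day - day, amount)
        for day, amount in transactions
        if day <= event_day
    ]


def bin_by_countdown(aligned, bin_days):
    """Sum amounts in bins of `bin_days`, counted back from the event.

    A larger `bin_days` gives a coarser view that shows only the broad
    features; a smaller one keeps more detail.
    """
    bins = {}
    for remaining, amount in aligned:
        key = remaining // bin_days
        bins[key] = bins.get(key, 0.0) + amount
    return bins
```

For example, aligning two accounts that declared bankruptcy on different calendar days and binning both with the same `bin_days` places their final months side by side, regardless of when each declaration occurred.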

The second visualization challenge is in the animation of the status of the neural network. Artificial neural networks are an excellent data mining tool for learning a pattern of behavior. As non-linear systems, however, they make human interpretation of the learned "features" quite difficult. Presentation of the knowledge contained in the artificial neural network will require connections between the predicted value of the network (probability of bankruptcy), the trajectory of the context layer in reduced dimensions, and the raw transaction patterns. Another useful representation would show the trajectory taken by each account as it moves through the high-dimensional hidden/context layer "space" toward bankrupt status.
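
One possible way to obtain a trajectory "in reduced dimensions" is to project the sequence of context-layer states onto its two directions of greatest variance. The sketch below does this with a principal-component projection computed by power iteration; the choice of this method is an assumption for illustration, since the text does not name a reduction technique, and the function names are hypothetical.

```python
import math


def _matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leading_direction(cov, iterations=200):
    """Power iteration: approximate the dominant eigenvector of `cov`."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-uniform start
    norm = math.sqrt(_dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _matvec(cov, v)
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:
            return [0.0] * d  # no variance left in any direction
        v = [x / norm for x in w]
    return v


def project_trajectory(states):
    """Map a list of hidden/context state vectors to 2-D points."""
    n, d = len(states), len(states[0])
    means = [sum(s[j] for s in states) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in states]
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    v1 = leading_direction(cov)
    lam1 = _dot(v1, _matvec(cov, v1))
    # Deflation: remove the first component, then find the next one.
    deflated = [
        [cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
        for i in range(d)
    ]
    v2 = leading_direction(deflated)
    return [(_dot(r, v1), _dot(r, v2)) for r in centered]
```

Plotting the returned points in order, one line per account, would give a 2-D picture of how each account's internal state evolves over its transaction history.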

Conclusion

The data mining project for predicting personal bankruptcy is a work in progress that will require unique solutions for data exploration and interpretation on a massive data set, including the comparison of a large number of sliding-origin temporal transaction patterns and the animation of the algorithms being used to analyze them.

References

[1] Flanery, R. E., Jr., and Donato, J. M., Visualization for the Large Scale Data Analysis Project, ORNL/TM-13227.

[2] Donato, J. M., Schryver, J. C., Grady, N. W., Schmoyer, R. L., and Hunkel, G. C., A Data Mining Approach to Personal Bankruptcy Prediction, (unpublished).
