CODATA Euro-American Workshop
Visualization of Information and Data: Where We Are and Where Do We Go From Here?
24-25 June 1997


Scalable Visualization

Gene Golovchinsky
FX Palo Alto Laboratory
3400 Hillview Ave., Bldg 4
Palo Alto, CA 94304 USA
gene@pal.xerox.com

Introduction

Visualization of retrieval results is an important aspect of information retrieval interface research. Visualizations are particularly important when viewers must understand relationships among large volumes of data. Furthermore, real-time, interactive visualizations are more effective in helping users make decisions about the documents they are viewing. This paper describes two visualization techniques suitable for large collections of documents. The two visualizations are designed to complement each other: one depicts the browsing history from the document's perspective, while the other shows a more global overview. Both techniques can accommodate arbitrarily large document collections.

Document-centric browsing history

During an interactive browsing session, users may retrieve the same document multiple times due to the similarity of queries. In practical browsing situations, users cannot examine the result sets exhaustively, and tend to sample documents from the head of a ranked list. If desired documents are not found relatively quickly, users may form new queries based on the experience gained from previous interactions. This task of interactive query formulation punctuated by browsing may be aided by visualizations that indicate whether or not a given document has been retrieved in the past, and if so, how important it was with respect to the queries that retrieved it. That a document had never been retrieved (or that it had been retrieved by every query so far) may be quite important.

One possible way of representing this information is to construct a histogram with one bar for each query. The height of each bar is a function of the document's rank for that query: the lower the rank (first is best), the higher the score. The left-to-right order of the bars corresponds to the temporal order of the queries made during the browsing session. Figure 1 illustrates several examples of such histograms. These images were generated by VOIR, a query-mediated hypertext information exploration interface [2]. The colors reflect the different types of links; the horizontal lines facilitate the comparison of bar heights.

[FIGURE NOT AVAILABLE]

Figure 1. Document retrieval history histograms.

Figure 1a shows a histogram that corresponds to the first time a document has been retrieved. The gap to the left of the bar indicates that the document was retrieved somewhat after the beginning of the browsing session. Figure 1b shows a histogram of another document, one that was ranked higher initially, then was not retrieved for a while, and finally reappeared with a slightly lower score. Finally, Figure 1c corresponds to a document that has been retrieved periodically, and was ranked quite high almost every time. In hypertext terms, such a document may be a landmark node.
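The histogram construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the VOIR implementation: the linear rank-to-height mapping, the `cutoff` parameter, and the names `bar_height`, `history_heights`, and `render_histogram` are all assumptions, since the paper states only that bar height is a function of rank.

```python
from typing import List, Optional, Sequence


def bar_height(rank: Optional[int], cutoff: int = 10) -> float:
    """Map a 1-based rank to a height in [0, 1]; rank 1 gives the tallest bar.

    Assumption: a linear mapping over the top `cutoff` ranks. A document that
    was not retrieved (rank None) or fell below the cutoff gets height 0.
    """
    if rank is None or rank > cutoff:
        return 0.0
    return (cutoff - rank + 1) / cutoff


def history_heights(doc_id: str,
                    result_lists: Sequence[Sequence[str]],
                    cutoff: int = 10) -> List[float]:
    """One bar per query, in the temporal order in which queries were issued."""
    heights = []
    for results in result_lists:
        rank = results.index(doc_id) + 1 if doc_id in results else None
        heights.append(bar_height(rank, cutoff))
    return heights


def render_histogram(heights: Sequence[float], levels: int = 5) -> str:
    """Draw the bars as text: '#' is a filled cell, '.' an empty one."""
    filled = [int(h * levels + 0.5) for h in heights]
    rows = []
    for level in range(levels, 0, -1):
        rows.append("".join("#" if f >= level else "." for f in filled))
    return "\n".join(rows)


# Hypothetical session: four queries, each returning a ranked list of ids.
session = [["a", "b"], ["c", "d", "a"], ["d"], ["a"]]
print(render_histogram(history_heights("a", session, cutoff=5)))
```

In this hypothetical session, document "a" is ranked first, then third, then not retrieved, then first again, so the empty third column plays the role of the gaps visible in Figure 1.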

Global overview

In conjunction with the document-centric visualization described above, it is possible to visualize cumulative browsing history by depicting documents that have been retrieved during some period of time. Clustering or MDS-based visualizations are limited by the high-order computational cost of their algorithms (e.g., [1]); furthermore, clustering the entire document space may introduce much information not of interest to the user, thereby obscuring information that is of interest.

One possible solution is to cluster documents based on the queries that retrieved them. Each query may be represented as a vector of documents; these vectors may be organized into nearly-disjoint groups, each representing some topic of interest to the user. The process of categorizing query vectors into groups is computationally efficient, and does not depend on computing the pair-wise document similarity matrix typical of clustering algorithms.
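The paper does not specify how query vectors are organized into groups. One plausible reading is a single greedy pass, sketched below under stated assumptions: each query is treated as the set of documents it retrieved, similarity is Jaccard overlap, and the `threshold` value and the names `jaccard` and `group_queries` are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two document sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_queries(query_results: Sequence[Sequence[str]],
                  threshold: float = 0.3) -> List[Dict]:
    """Greedily assign each query to the most similar existing group.

    Assumption: a query joins the best-matching group if the overlap reaches
    `threshold`; otherwise it starts a new group. Each query is compared only
    against the current groups, never against individual documents, so no
    pair-wise document similarity matrix is computed.
    """
    groups: List[Dict] = []  # each group: {"docs": set of ids, "queries": indices}
    for index, results in enumerate(query_results):
        docs = set(results)
        best, best_sim = None, 0.0
        for group in groups:
            sim = jaccard(docs, group["docs"])
            if sim > best_sim:
                best, best_sim = group, sim
        if best is not None and best_sim >= threshold:
            best["docs"] |= docs
            best["queries"].append(index)
        else:
            groups.append({"docs": docs, "queries": [index]})
    return groups


# Hypothetical session: the first two queries share documents, as do the last two.
queries = [["d1", "d2", "d3"], ["d2", "d3", "d4"], ["d9", "d10"], ["d10", "d11"]]
for group in group_queries(queries):
    print(group["queries"], sorted(group["docs"]))
```

The result of such a greedy pass depends on the order of the queries, which may be acceptable here because queries arrive in temporal order during a session.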

When visualized, document groups can provide a top-level view of all documents that the user had found potentially relevant during a browsing session, or across all browsing sessions. Each group may be represented as a circle with a radius proportional to the log of the number of documents in that group. Clicking on such a circle would allow the user to "drill down" into the set of retrieved documents. Drilling down could generate a clustering of just the documents in that group using traditional clustering techniques. New queries could be projected onto the existing groups to indicate where they fit into existing categories. If a query is sufficiently similar to some group, the corresponding circle could be highlighted by increasing its saturation. If a query overlaps more than one group, the degree of overlap may be reflected by changing the saturation of the affected groups in proportion to the degree of match. This visualization allows users to detect new themes that connect previously unrelated topics.
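The circle sizing and saturation rules above can be sketched as follows. The details are assumptions made for illustration: `log(1 + n)` is used instead of `log(n)` so that a single-document group still has a visible radius, the degree of match is taken as the fraction of the query's documents that fall in a group, and `scale`, `base`, `circle_radius`, and `group_saturations` are hypothetical names.

```python
import math
from typing import Iterable, List, Sequence, Set


def circle_radius(n_docs: int, scale: float = 10.0) -> float:
    """Radius proportional to the log of the group size.

    Assumption: log(1 + n) rather than log(n), so a one-document group
    does not collapse to radius 0.
    """
    return scale * math.log(1 + n_docs)


def group_saturations(query_docs: Iterable[str],
                      groups: Sequence[Set[str]],
                      base: float = 0.2) -> List[float]:
    """Saturation in [base, 1] for each group, rising with the degree of match.

    Assumption: the degree of match is the fraction of the query's documents
    that belong to the group; groups with no overlap stay at `base`.
    """
    query = set(query_docs)
    saturations = []
    for docs in groups:
        overlap = len(query & docs) / len(query) if query else 0.0
        saturations.append(base + (1.0 - base) * overlap)
    return saturations


# Hypothetical groups and a new query that touches both of them.
groups = [{"d1", "d2", "d3", "d4"}, {"d9", "d10", "d11"}]
new_query = ["d3", "d4", "d9", "d20"]
for docs, sat in zip(groups, group_saturations(new_query, groups)):
    print(f"radius={circle_radius(len(docs)):.1f} saturation={sat:.2f}")
```

A query that raises the saturation of two circles at once, as in this example, is the kind of signal that might point to a theme linking two previously separate topics.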

Conclusions

This paper described two complementary techniques for visualizing browsing histories in information exploration environments. The techniques are independent of the number of documents in the database, and instead reflect the retrieval patterns of individual users. These techniques do not foster a spatial metaphor for navigation and, as a result, may decrease the "lost in hypertext" disorientation typical of navigational metaphors.

References

[1] Chalmers, M. and Chitson, P. Bead: Explorations in Information Visualization. In Proceedings of SIGIR '92 (June 1992, Copenhagen, Denmark). ACM Press. 330-337.
[2] Golovchinsky, G. Queries? Links? Is there a Difference? In Proceedings of CHI '97 (March 1997, Atlanta, Georgia). ACM Press. 407-414.

