Visualizing Information-Space Associations
Steven Noel (a), Vijay Raghavan (b), C.-H. Henry Chu (b)
(a) Center for Secure Information Systems, George Mason University
(b) Center for Advanced Computer Studies, The University of Louisiana at Lafayette
We have proposed a new methodology for visualizing associations mined from information spaces [1]. Similarities among objects in a space are computed from the mined associations. These similarities reflect the stronger, higher-order associations yet retain a simple pairwise structure, even though the mined associations involve sets of higher cardinality. Clustering-based visualization tools, which expect pairwise distances, can therefore still be applied, while the distance information on which they are based is richer. Our approach applies to general association mining applications as well as to information spaces modeled by directed graphs, such as the Web. For collections of hypertext documents, the inter-document distances capture the information inherent in the collection's link structure, which makes the approach a form of link mining.
Figures 1 and 2 demonstrate our methodology with document sets extracted from the Science Citation Index [2]. Each point in a figure represents an individual document, and the points fall into clusters according to their citation associations. After similarities are computed from the mined associations, documents are assigned spatial positions through a combination of minimum spanning tree and spring embedder [3] algorithms. Spatial filtering is then applied to the positioned points, and the result is visualized as a 2-dimensional contour image (Figure 1) and as a 3-dimensional shaded surface with lighting (Figure 2).
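The positioning step can be illustrated with a minimal Python sketch. It rests on assumptions not stated above: that the minimum spanning tree of the inter-document distances supplies the edges, and that a basic force-directed layout in the spirit of [3] then positions that tree. The function names, the distance values, and the parameters (iteration count, initial temperature) are hypothetical, and the spatial filtering step is not shown.

```python
import math
import random


def minimum_spanning_tree(nodes, dist):
    """Prim's algorithm; dist maps frozenset({u, v}) to a distance."""
    nodes = list(nodes)
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Cheapest edge from a node inside the tree to a node outside it.
        u, v = min(
            ((a, b) for a in in_tree for b in nodes if b not in in_tree),
            key=lambda e: dist[frozenset(e)],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


def spring_layout(nodes, edges, iterations=200, seed=0):
    """Simple force-directed placement: all pairs repel, edges attract."""
    rng = random.Random(seed)
    nodes = list(nodes)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length in a unit square
    for step in range(iterations):
        temp = 0.1 * (1.0 - step / iterations)  # linear cooling (assumed)
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d
                disp[u][0] += dx / d * push
                disp[u][1] += dy / d * push
                disp[v][0] -= dx / d * push
                disp[v][1] -= dy / d * push
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / k
            disp[u][0] -= dx / d * pull
            disp[u][1] -= dy / d * pull
            disp[v][0] += dx / d * pull
            disp[v][1] += dy / d * pull
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, temp)  # cap each move by the temperature
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos


# Hypothetical distances among four documents (smaller means more similar).
docs = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 1.0, frozenset(("A", "C")): 2.0,
    frozenset(("B", "C")): 2.5, frozenset(("A", "D")): 9.0,
    frozenset(("B", "D")): 8.0, frozenset(("C", "D")): 3.0,
}
edges = minimum_spanning_tree(docs, dist)
pos = spring_layout(docs, edges)
```

Laying out only the tree edges keeps the attractive forces sparse, so strongly associated documents are pulled together while every pair still repels.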
The data for the document maps in the figures are generated as follows. First, the Science Citation Index is searched for documents published in 1999 that match the keyword wavelet*. For each document in the result set, a "basket" is constructed consisting of the documents that it cites. Only documents cited more than a specified number of times are retained, so that the map is restricted to the more frequently cited documents. Association mining is then applied to the resulting set of baskets. This yields, for each subset of the cited documents, a count of the number of baskets in which the subset appears (known as the "support" of the subset). Finally, the similarity between any pair of cited documents is determined as a function not only of the support of that pair, but also of the supports of higher-cardinality supersets of the pair.
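The basket construction, support counting, and pairwise similarity described above can be sketched in Python as follows. The text does not specify the similarity function, so this sketch assumes one simple choice: the similarity of a pair is the sum of the supports of every mined itemset that contains it. The function names, the `max_size` and `min_citations` parameters, and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def filter_frequent(baskets, min_citations):
    """Keep only documents cited more than min_citations times overall."""
    counts = Counter(doc for basket in baskets for doc in set(basket))
    return [{doc for doc in basket if counts[doc] > min_citations}
            for basket in baskets]


def itemset_supports(baskets, max_size=3):
    """Support of every itemset of size 2..max_size: baskets containing it."""
    supports = Counter()
    for basket in baskets:
        items = sorted(basket)
        for size in range(2, max_size + 1):
            for subset in combinations(items, size):
                supports[subset] += 1
    return supports


def pair_similarities(supports):
    """Assumed rule: sum the supports of all itemsets containing the pair."""
    sims = Counter()
    for itemset, count in supports.items():
        for pair in combinations(itemset, 2):
            sims[pair] += count
    return sims


# Hypothetical baskets: each set holds the documents cited by one paper.
baskets = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"C", "D"}]
filtered = filter_frequent(baskets, min_citations=1)  # drops D (cited once)
supports = itemset_supports(filtered)
sims = pair_similarities(supports)
# (A, B): support 3 as a pair, plus 2 via {A, B, C}, gives 5
```

Under this assumed rule, a pair that also co-occurs inside larger frequently cited groups scores higher than its pairwise support alone would suggest, while the result remains an ordinary pairwise similarity table.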

Figure 1: Two-dimensional visualization of document collection from Science Citation Index (SCI).

Figure 2: Landscape visualization of SCI document collection.
[1] S. Noel, Data Mining and Visualization of Reference Associations: Higher Order Citation Analysis, Ph.D. dissertation, University of Louisiana, Lafayette, Louisiana, Fall 2000.
[2] Science Citation Index® is part of Institute for Scientific Information's Web of Science®, available at http://www.isinet.com/isi/products/citation/wos/.
[3] T. Fruchterman, E. Reingold, Graph Drawing by Force-Directed Placement, Software – Practice and Experience, eds. D. Comer and A. Willings, 21, pp. 1129-1164, 1991.
Steven Noel
Center for Secure Information Systems
George Mason University
Mail Stop 4A4
Fairfax, VA 22030-4444
Phone: (703) 993-3946
Vijay Raghavan
Center for Advanced Computer Studies
The University of Louisiana at Lafayette
P.O. Box 44330
Lafayette, LA 70504-4330
Phone: (337) 482-6603
C.-H. Henry Chu
Center for Advanced Computer Studies
The University of Louisiana at Lafayette
P.O. Box 44330
Lafayette, LA 70504-4330
Phone: (337) 482-6309