Visualizing Information-Space Associations

 

Steven Noel (a), Vijay Raghavan (b), C.-H. Henry Chu (b)

 

(a) Center for Secure Information Systems, George Mason University

(b) Center for Advanced Computer Studies, The University of Louisiana at Lafayette

 

 

We have proposed a new methodology for visualizing associations mined from information spaces [1].  Similarities among objects in a space are computed from the mined associations.  These similarities remain consistent with the important, stronger associations yet retain a simple pairwise structure, even though the mined associations involve higher-cardinality sets.  Clustering visualization tools can therefore still be applied, while the distance information on which they are based is richer.  Our approach applies to general association mining as well as to information spaces modeled by directed graphs, e.g., the Web.  For collections of hypertext documents, the inter-document distances capture the information inherent in the collection's link structure, which makes the approach a form of link mining.

 

Figures 1 and 2 demonstrate our methodology with document sets extracted from the Science Citation Index [2].  Points within each figure represent individual documents, which fall into clusters based on their citation associations.  After similarities are computed from the mined associations, spatial positions are assigned to documents through a combination of minimum spanning tree and spring embedder [3] algorithms.  Spatial filtering is then applied to the positioned points, and the result is visualized as a 2-dimensional contour image (Figure 1) and as a 3-dimensional shaded surface with lighting (Figure 2).
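The spatial filtering step can be illustrated with a minimal sketch.  The text does not specify which filter is used, so the Gaussian low-pass kernel below is an assumption, as are the function name density_grid, the grid size, and the value of sigma.  Each positioned document contributes a smooth bump to a grid, so dense clusters show up as peaks that could then be drawn as contours or as a surface.

```python
import math

def density_grid(points, size=32, sigma=2.0):
    """Return a size x size grid of smoothed point density.

    points: iterable of (x, y) pairs, assumed to lie in the unit square.
    Each point adds a Gaussian bump centered at its grid position, which
    corresponds to low-pass filtering an image of unit impulses.
    The Gaussian kernel and sigma are illustrative choices, not taken
    from the source.
    """
    grid = [[0.0] * size for _ in range(size)]
    for x, y in points:
        # Map unit-square coordinates onto grid coordinates.
        cx, cy = x * (size - 1), y * (size - 1)
        for row in range(size):
            for col in range(size):
                d2 = (col - cx) ** 2 + (row - cy) ** 2
                grid[row][col] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grid

if __name__ == "__main__":
    # Three nearby documents and one isolated document (made-up positions).
    docs = [(0.2, 0.2), (0.22, 0.2), (0.2, 0.22), (0.8, 0.8)]
    grid = density_grid(docs)
    peak = max(max(row) for row in grid)
    print(f"highest density value: {peak:.2f}")
```

A larger sigma merges neighboring documents into broader hills, while a smaller one keeps individual documents visible; the appropriate setting depends on the layout scale.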

 

The data for the document maps in the figures are generated as follows.  First, the Science Citation Index is searched for documents published in 1999 that match the keyword wavelet*.  For each document in the result set, a "basket" is constructed consisting of the documents that it cites.  Only documents cited more than a specified number of times are retained, so that the map contains only the more frequently cited documents.  Association mining is then applied to the resulting set of baskets.  This yields, for each subset of the cited documents, a count of the number of baskets in which the subset appears (known as the "support" of the subset).  Finally, the similarity between any pair of cited documents is determined as a function not only of the support of that pair, but also of the supports of higher-cardinality supersets of the pair.
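The steps above can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the authors' implementation: the function names, the brute-force subset enumeration, the max_size cutoff, and in particular the similarity function (a plain sum of supports over all counted itemsets containing the pair) are hypothetical stand-ins, since the text does not give the exact function.

```python
from collections import Counter
from itertools import combinations

def build_baskets(citations, min_citations):
    """Build one basket per citing document.

    citations: dict mapping a citing document to the documents it cites.
    Only documents cited more than min_citations times are kept.
    """
    counts = Counter(c for cited in citations.values() for c in set(cited))
    frequent = {doc for doc, n in counts.items() if n > min_citations}
    return [frozenset(cited) & frequent for cited in citations.values()]

def itemset_supports(baskets, max_size=3):
    """Count, for each itemset of 2..max_size documents, the baskets containing it.

    Brute-force enumeration for clarity; max_size is an assumed cutoff.
    """
    supports = Counter()
    for basket in baskets:
        items = sorted(basket)
        for k in range(2, max_size + 1):
            for subset in combinations(items, k):
                supports[frozenset(subset)] += 1
    return supports

def pair_similarity(supports, a, b):
    """Illustrative similarity: sum of supports of every counted itemset
    that contains both a and b (the pair itself and its supersets).
    The actual function used by the authors is not given in the text.
    """
    pair = frozenset((a, b))
    return sum(s for itemset, s in supports.items() if pair <= itemset)

if __name__ == "__main__":
    # Made-up citation data: p1..p4 are citing documents, A..D are cited ones.
    citations = {
        "p1": ["A", "B", "C"],
        "p2": ["A", "B"],
        "p3": ["A", "B", "C"],
        "p4": ["C", "D"],
    }
    baskets = build_baskets(citations, min_citations=1)
    supports = itemset_supports(baskets)
    print(pair_similarity(supports, "A", "B"))
```

Enumerating every subset grows exponentially with basket size, so a realistic collection would likely need a frequent-itemset algorithm with a minimum support threshold rather than this exhaustive count.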

 

 

 

Figure 1: Two-dimensional visualization of document collection from Science Citation Index (SCI).

 

 

 

 

Figure 2: Landscape visualization of SCI document collection.

 

 

References

 

[1]  S. Noel, Data Mining and Visualization of Reference Associations: Higher Order Citation Analysis, Ph.D. dissertation, University of Louisiana, Lafayette, Louisiana, Fall 2000.

     

[2]  Science Citation Index® is part of Institute for Scientific Information's Web of Science®, available at http://www.isinet.com/isi/products/citation/wos/.

 

[3]  T. Fruchterman, E. Reingold, Graph Drawing by Force-Directed Placement, Software – Practice and Experience, eds. D. Comer and A. Wellings, 21, pp. 1129-1164, 1991.

 

 

Additional author information

Steven Noel

Center for Secure Information Systems

George Mason University

Mail Stop 4A4

Fairfax, VA 22030-4444

Phone: (703) 993-3946

snoel@gmu.edu

 

Vijay Raghavan

Center for Advanced Computer Studies

The University of Louisiana at Lafayette

P.O. Box 44330

Lafayette, LA  70504-4330

Phone: (337) 482-6603

vraghavan@cacs.louisiana.edu

 

C.-H. Henry Chu

Center for Advanced Computer Studies

The University of Louisiana at Lafayette

P.O. Box 44330

Lafayette, LA  70504-4330

Phone: (337) 482-6309

cice@cacs.louisiana.edu