Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/6436
Title: Hubness-based clustering of high-dimensional data
Authors: Tomašev, N.
Mladenić, D.
Radovanović, Milan
Ivanović, Mirjana
Issue Date: 1-Jan-2015
Journal: Partitional Clustering Algorithms
Abstract: © Springer International Publishing Switzerland 2015. Hubness has recently been established as a significant property of k-nearest neighbor (k-NN) graphs obtained from high-dimensional data using a distance measure, with traits and effects relevant to the cluster structure of data, as well as clustering algorithms. The hubness property is manifested with increasing (intrinsic) data dimensionality. The distribution of data point in-degrees, i.e. the number of times points appear among the k nearest neighbors of other points in the data, becomes highly skewed. This results in hub points that can have in-degrees multiple orders of magnitude higher than expected. In this chapter we review and refine existing work which explains the mechanisms of the phenomenon, establishes the location of hub points near central regions of clusters in the data, and shows how hubness can negatively affect existing clustering algorithms by virtue of hub points lowering between-cluster distance. Next, we review the newly proposed partitional clustering algorithms, based on K-means, which take advantage of hubness by employing hubs in the process of cluster prototype selection. These "soft" K-means extensions avoid premature convergence to suboptimal stable cluster configurations and are able to reach the global optima more often. The algorithms offer significant improvements over the K-means baseline in scenarios involving high-dimensional and noisy data. The improvements stem from a better placement of hub points into clusters, which helps in increasing the between-cluster distance. Finally, we introduce novel clustering algorithms as "kernelized" versions of the most successful hubness-based methods discussed above, that are able to more effectively handle arbitrarily shaped clusters.
URI: https://open.uns.ac.rs/handle/123456789/6436
ISBN: 9783319092591
DOI: 10.1007/978-3-319-09259-1_11
Appears in Collections: PMF Publikacije/Publications
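
The in-degree described in the abstract (the number of times a point appears among the k nearest neighbors of other points) can be illustrated with a minimal sketch. This is not code from the chapter: the function names `k_occurrences` and `sample_gaussian`, the use of squared Euclidean distance, and the standard-normal test data are all assumptions made for illustration only.

```python
import random


def k_occurrences(points, k):
    """Return, for each point, how often it appears in other points' k-NN lists.

    Hypothetical helper: brute-force O(n^2) search with squared Euclidean
    distance, which ranks neighbors the same way as Euclidean distance.
    """
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Distance from point i to every other point, paired with its index.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        # Each of the k closest points gains one in-degree.
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def sample_gaussian(n, dim, seed=0):
    """Draw n points from a standard normal distribution (assumed test data)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    k = 5
    for dim in (2, 100):
        counts = k_occurrences(sample_gaussian(300, dim), k)
        mean = sum(counts) / len(counts)
        print(f"dim={dim}: mean in-degree={mean:.1f}, max in-degree={max(counts)}")
```

The mean in-degree always equals k, because every point contributes exactly k neighbor links. The skew the abstract refers to would show up in the maximum and in the tail of the distribution, which are expected to grow as the dimensionality increases.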

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.