Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/6436
DC Field | Value | Language
dc.contributor.author | Tomašev N. | en
dc.contributor.author | Mladenić D. | en
dc.contributor.author | Radovanović, Milan | en
dc.contributor.author | Ivanović, Mirjana | en
dc.date.accessioned | 2019-09-30T08:55:01Z | -
dc.date.available | 2019-09-30T08:55:01Z | -
dc.date.issued | 2015-01-01 | en
dc.identifier.isbn | 9783319092591 | en
dc.identifier.uri | https://open.uns.ac.rs/handle/123456789/6436 | -
dc.description.abstract | © Springer International Publishing Switzerland 2015. Hubness has recently been established as a significant property of k-nearest neighbor (k-NN) graphs obtained from high-dimensional data using a distance measure, with traits and effects relevant to both the cluster structure of data and clustering algorithms. The hubness property manifests with increasing (intrinsic) data dimensionality: the distribution of data point in-degrees, i.e. the number of times points appear among the k nearest neighbors of other points in the data, becomes highly skewed. This results in hub points whose in-degrees can be several orders of magnitude higher than expected. In this chapter we review and refine existing work which explains the mechanisms of the phenomenon, establishes the location of hub points near central regions of clusters in the data, and shows how hubness can negatively affect existing clustering algorithms by virtue of hub points lowering between-cluster distance. Next, we review the newly proposed partitional clustering algorithms, based on K-means, which take advantage of hubness by employing hubs in the process of cluster prototype selection. These "soft" K-means extensions avoid premature convergence to suboptimal stable cluster configurations and reach the global optimum more often. The algorithms offer significant improvements over the K-means baseline in scenarios involving high-dimensional and noisy data; the improvements stem from a better placement of hub points into clusters, which helps increase the between-cluster distance. Finally, we introduce novel clustering algorithms, "kernelized" versions of the most successful hubness-based methods discussed above, that are able to handle arbitrarily shaped clusters more effectively. | en
dc.relation.ispartof | Partitional Clustering Algorithms | en
dc.title | Hubness-based clustering of high-dimensional data | en
dc.type | Book Chapter | en
dc.identifier.doi | 10.1007/978-3-319-09259-1_11 | en
dc.identifier.scopus | 2-s2.0-84938429403 | en
dc.identifier.url | https://api.elsevier.com/content/abstract/scopus_id/84938429403 | en
dc.relation.lastpage | 386 | en
dc.relation.firstpage | 353 | en
item.grantfulltext | none | -
item.fulltext | No Fulltext | -
crisitem.author.dept | Prirodno-matematički fakultet, Departman za matematiku i informatiku | -
crisitem.author.orcid | 0000-0003-1946-0384 | -
crisitem.author.parentorg | Prirodno-matematički fakultet | -
Appears in Collections:PMF Publikacije/Publications
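The abstract above describes hubness as a skew in the k-NN in-degree distribution that grows with data dimensionality. The following minimal sketch (not taken from the chapter; the function name, sample sizes, and skewness measure are illustrative choices) shows how one might observe this effect on synthetic Gaussian data using plain NumPy:

```python
import numpy as np

def knn_in_degrees(X, k=5):
    """Count how often each point appears among the k nearest neighbors of other points."""
    # Squared Euclidean distances via the expansion ||x-y||^2 = ||x||^2 + ||y||^2 - 2<x,y>
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of each point's k nearest neighbors
    return np.bincount(nn.ravel(), minlength=len(X))

rng = np.random.default_rng(0)
n, k = 400, 5
for dim in (3, 100):
    deg = knn_in_degrees(rng.standard_normal((n, dim)), k)
    # Standardized third moment as a simple skewness measure; the mean in-degree is always k
    z = (deg - deg.mean()) / deg.std()
    print(f"dim={dim}: max in-degree={deg.max()}, skewness={np.mean(z ** 3):.2f}")
```

On such data the in-degree distribution is roughly symmetric in low dimensions, while in high dimensions the skewness grows and a few hub points accumulate disproportionately many in-degree counts, which is the phenomenon the chapter's clustering algorithms exploit.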

Scopus™ Citations: 9 (checked on May 3, 2024)

Page view(s): 16 (last week: 3, last month: 0; checked on May 10, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.