CURE, AGNES VE K-MEANS ALGORİTMALARINDAKİ KÜMELEME YETENEKLERİNİN KARŞILAŞTIRILMASI

COMPARISON OF CLUSTERING CHARACTERISTICS OF CURE, AGNES AND K-MEANS ALGORITHMS

Journal Name:
İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi

Publication Year:
2005

Keywords (Original Language):

Abstract (2. Language): 
This study compares the results of applying two hierarchical clustering algorithms, CURE (Clustering Using REpresentatives) and AGNES (AGglomerative NEsting), and one partitioning-based clustering algorithm, k-means, to synthetic datasets. The experiments show that k-means successfully finds well-separated, compact clusters. It can find spherical clusters of similar size, but it splits large clusters into smaller partitions even when they are spherical. AGNES finds spherical clusters effectively but is very sensitive to outliers. CURE effectively finds clusters of different sizes and shapes, and its clustering is not affected by outliers; its results, however, are very sensitive to the values of its input parameters.
Abstract (Original Language): 
Bu çalışmada, hiyerarşik kümeleme algoritmalarından CURE (Clustering Using REpresentatives) ve AGNES (AGglomerative NEsting) ile bölümleyici kümeleme algoritmalarından çok sık kullanılan k-means’in sentetik veri setlerinde uygulanmasıyla elde edilen sonuçların karşılaştırması açıklanmaktadır. Gerçekleştirilen uygulamalarda, k-means algoritmasının ayrık ve sıkışık bulutlar halindeki kümeleri başarıyla bulduğu görülmüştür. Bu algoritma benzer büyüklükteki küresel kümeleri bulabilirken, çok büyük kümeleri küresel de olsa parçalara ayırmaktadır. AGNES algoritması uygulamaları bu algoritmanın küresel kümeleri etkili bir şekilde bulduğunu, ancak sıradışı noktalara karşı çok duyarlı olduğunu göstermiştir. CURE algoritması uygulamalarında bu algoritmanın farklı büyüklüklerde ve farklı şekillerdeki kümeleri sıradışı noktalardan etkilenmeden başarıyla bulduğu görülmüştür. Ancak, CURE algoritmasıyla elde edilen kümelerin giriş parametrelerinin değerlerinden etkilendiği saptanmıştır.
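
The abstract's observation that k-means can partition a large cluster even when it is spherical can be explored with a small experiment. The following is a minimal sketch only, not the authors' implementation or datasets: the helper names `blob`, `nearest` and `kmeans`, the cluster centers, spreads, sample counts and random seeds are all assumptions chosen for illustration, and the k-means shown is the plain iterative assign-and-update scheme with centroids initialized from randomly chosen data points.

```python
import random


def blob(center, spread, n, rng):
    """Draw n 2-D points from an isotropic Gaussian around center."""
    cx, cy = center
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for _ in range(n)]


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
    )


def kmeans(points, k, max_iter=100, seed=0):
    """Plain assign-and-update k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical synthetic data: one large, loose spherical cluster
# and two small, compact ones (all values are illustrative assumptions).
data_rng = random.Random(42)
points = (
    blob((0.0, 0.0), 3.0, 300, data_rng)
    + blob((10.0, 0.0), 0.5, 30, data_rng)
    + blob((10.0, 4.0), 0.5, 30, data_rng)
)

centroids, labels = kmeans(points, k=3)
sizes = [labels.count(j) for j in range(3)]
print("cluster sizes:", sizes)
print("centroids:", [(round(x, 2), round(y, 2)) for x, y in centroids])
```

If the reported sizes differ markedly from 300, 30 and 30, the large cluster has been partitioned. The outcome depends on the seeds and spreads assumed here, so varying them is part of the experiment.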
Pages: 1-18

Meral DEMİRALAY, A. Yılmaz ÇAMURCU
İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi Güz 2005/2
