Estimation and Selection in Regression Clustering

Abstract:
Regression clustering is an important model-based clustering tool with applications in a variety of disciplines. It discovers and reconstructs the hidden structure of a data set that is a random sample from a population comprising a fixed but unknown number of sub-populations, each characterized by a class-specific regression hyperplane. An essential objective, and a preliminary step, in most clustering techniques, including regression clustering, is to determine the underlying number of clusters in the data. In this paper, we briefly review regression clustering methods and discuss how to determine the underlying number of clusters using model selection techniques, in particular the information-based technique. A computing algorithm is developed for estimating the number of clusters and the other parameters in regression clustering. Simulation studies are provided to show the performance of the algorithm.
455-466
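
The abstract does not spell out the algorithm or the exact information criterion, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes a hard-assignment scheme that alternates between per-cluster least-squares fits and reassignment of each observation to its closest hyperplane, and it scores each candidate number of clusters with a generic BIC-style penalty. All function names, the penalty form, and the simulated data are hypothetical choices made for this example.

```python
import numpy as np


def fit_regression_clusters(X, y, k, n_starts=10, n_iter=100, seed=0):
    """Hard-assignment clusterwise least squares (illustrative sketch only).

    Alternates between (1) an ordinary least-squares fit inside each cluster
    and (2) moving every observation to the hyperplane with the smallest
    squared residual. Returns (labels, coefs, rss) for the best random start.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pooled = np.linalg.lstsq(X, y, rcond=None)[0]  # fallback for tiny clusters
    best = None
    for _ in range(n_starts):
        labels = rng.integers(k, size=n)  # random initial partition
        for _ in range(n_iter):
            coefs = np.empty((k, p))
            for j in range(k):
                mask = labels == j
                if mask.sum() >= p:
                    coefs[j] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
                else:
                    coefs[j] = pooled  # too few points to fit a hyperplane
            resid2 = (y[:, None] - X @ coefs.T) ** 2  # shape (n, k)
            new_labels = resid2.argmin(axis=1)
            converged = np.array_equal(new_labels, labels)
            labels = new_labels
            if converged:
                break
        rss = float(resid2.min(axis=1).sum())
        if best is None or rss < best[2]:
            best = (labels, coefs, rss)
    return best


def information_criterion(rss, n, k, p):
    """Assumed BIC-style score: fit term plus a penalty on the k*p coefficients."""
    return n * np.log(max(rss, 1e-12) / n) + k * p * np.log(n)


def select_number_of_clusters(X, y, candidates=(1, 2, 3, 4), seed=0):
    """Return the candidate k with the smallest criterion value, plus all scores."""
    n, p = X.shape
    scores = {}
    for k in candidates:
        _, _, rss = fit_regression_clusters(X, y, k, seed=seed)
        scores[k] = information_criterion(rss, n, k, p)
    return min(scores, key=scores.get), scores


def simulate_two_lines(n=200, noise=0.1, seed=1):
    """Hypothetical test data: two sub-populations, each with its own line."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])  # intercept and slope columns
    true_labels = rng.integers(2, size=n)
    betas = np.array([[10.0, 1.0], [-10.0, -1.0]])
    y = (X * betas[true_labels]).sum(axis=1) + noise * rng.normal(size=n)
    return X, y, true_labels
```

Calling `select_number_of_clusters(*simulate_two_lines()[:2])` returns the chosen number of clusters together with the score of every candidate. Because hard assignment always lowers the residual sum of squares as more clusters are allowed, the selected value is sensitive to the strength of the penalty; the `log(n)` weight used here is a placeholder and would need to be replaced by the penalty of whichever criterion is actually being studied.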

