
GENETİK ALGORİTMA İLE DOĞRUSAL REGRESYONDA TAHMİN AMAÇLI MODEL SEÇİMİ

PREDICTIVE MODEL SELECTION IN LINEAR REGRESSION BY GENETIC ALGORITHMS

Journal Name: Pamukkale Üniversitesi Sosyal Bilimler Enstitüsü Dergisi

Publication Year: 2017

DOI: 10.5505/pausbed.2017.82712
Abstract (Second Language):
A procedure based on Genetic Algorithms (GA), a heuristic approach, is proposed for selecting among regression models built from different numbers of independent variables. Instead of a binary representation, each chromosome is encoded as an integer array of user-defined size (p) that represents a variable subset. To rank the chromosomes, the GA uses an evaluation function defined as the average fitness (residual mean square error) of the regression model (chromosome) fitted to 20 bootstrap samples. The GA is run for different subset sizes in order to minimize the fitness function. The subsets found by the GA are then evaluated by leave-one-out cross-validation to decide on the best variable subset. The proposed GA is applied to the Communities and Crime dataset taken from the UCI Machine Learning Repository. The GA is used to select different numbers of variables, and the subset containing 30 variables (p = 30) is found to be the best according to its leave-one-out cross-validation score. The proposed procedure was compared with available feature selection methods and showed better performance.
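The evaluation function and the final cross-validation step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`bootstrap_fitness`, `loocv_mse`), the use of NumPy, the fixed seed, and the synthetic demo data are all assumptions made for illustration. The sketch also assumes that the model is refitted on each bootstrap sample and that the residual mean square is computed on that same sample, which the abstract does not spell out.

```python
import numpy as np


def _design(X, chromosome):
    """Design matrix: intercept column plus the columns listed in the chromosome."""
    cols = np.asarray(chromosome, dtype=int)
    return np.column_stack([np.ones(X.shape[0]), X[:, cols]])


def bootstrap_fitness(chromosome, X, y, n_boot=20, seed=0):
    """Average residual mean square over n_boot bootstrap samples (lower is better).

    Assumption: the model is refitted on every bootstrap sample. The fixed seed
    gives every chromosome the same resamples, so rankings are comparable.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # sample rows with replacement
        A, yb = _design(X[idx], chromosome), y[idx]
        beta, *_ = np.linalg.lstsq(A, yb, rcond=None)
        resid = yb - A @ beta
        dof = n - A.shape[1]                       # n - p - 1 residual degrees of freedom
        scores.append(resid @ resid / dof)
    return float(np.mean(scores))


def loocv_mse(chromosome, X, y):
    """Leave-one-out CV mean squared error for ordinary least squares.

    Uses the identity e_i / (1 - h_ii), so no refitting loop is needed.
    Assumes the design matrix has full column rank.
    """
    A = _design(X, chromosome)
    Q, _ = np.linalg.qr(A)                         # reduced QR; rows of Q give leverages
    h = np.sum(Q ** 2, axis=1)                     # diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(np.mean((resid / (1.0 - h)) ** 2))


# Hypothetical demo data: only columns 0 and 3 carry signal.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(80, 6))
y_demo = 1.5 * X_demo[:, 0] - 2.0 * X_demo[:, 3] + 0.2 * demo_rng.normal(size=80)

for subset in ([0, 3], [1, 2]):                    # two integer-encoded chromosomes, p = 2
    print(subset, bootstrap_fitness(subset, X_demo, y_demo), loocv_mse(subset, X_demo, y_demo))
```

In this sketch a chromosome is simply a list of column indices of length p, mirroring the integer-array encoding mentioned in the abstract. Selection, crossover, and mutation are left out because the abstract does not describe them.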
Abstract (Original Language): 
Farklı sayıda değişken içeren regresyon modellerinden seçim yapmak için Genetik Algoritmalar (GA) olarak adlandırılan sezgisel yaklaşıma dayanan bir prosedür önerilmektedir. GA’nın kromozomları ikili sayı dizisi yerine, uzunluğu (p) kullanıcı tarafından belirlenen ve değişken setlerini temsil eden tamsayı dizisi olarak kodlanmıştır. GA, kromozomları sıralamak için kromozomdaki değişkenlerle elde edilen regresyon modellerinin 20 tane Bootstrap örneklemindeki RMSE (tahmin hatalarının karelerinin ortalaması) değerlerinin ortalamasından oluşan bir değerlendirme fonksiyonu kullanmaktadır. GA, farklı değişken sayılarıyla değerlendirme fonksiyonunu en aza indirgemek için çalıştırılır. GA tarafından seçilen setler nihai olarak en iyi değişken alt setini belirlemek için tek gözlemli çapraz geçerlilik yöntemi ile değerlendirilmektedir. Önerilen GA, UCI veri deposundan alınan Topluluklar ve Suç veri setine uygulanmıştır. GA, farklı sayılarda (p) değişken seçmek için kullanılmış ve 30 değişken (p = 30) içeren alt set, tek gözlemli çapraz geçerlilik kriterine göre en iyi alt set olarak bulunmuştur. Önerilen prosedür mevcut değişken seçim yöntemleri ile karşılaştırılmış ve daha iyi performans göstermiştir.
Pages: 213-233

REFERENCES


Baker, J. (1985). “Adaptive Selection Methods for Genetic Algorithms”, Hillsdale, NJ, United States: L. Erlbaum Associates Inc. In Grefenstette, J. J. (Ed.), The First International Conference on Genetic Algorithms and Their Applications (pp. 101-111).
Chatterjee, S., Hadi, A. S. (2006). Regression Analysis by Example, 4th ed. New Jersey: Wiley Series.
Communities and Crime Data Set, UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime
De Jong, K. A. (1975). Analysis of the Behavior of a Class of Genetic Adaptive Systems, Ph.D. Thesis, Department of Computer and Communication Sciences: University of Michigan.
Field, P. (1995). A Multary Theory for Genetic Algorithms: Unifying Binary and Nonbinary Problem Representations, Ph.D. Thesis, Department of Computer Science. London: University of London.
Fogel, L. J. (1997). “A Retrospective View and Outlook on Evolutionary Algorithms”. Berlin, Germany: Springer-Verlag. In Reusch, B. (Ed.), Computational Intelligence: Theory and Applications, 5th Fuzzy Days (pp. 337-342).
Goldberg, D. E., & Lingle, R. (1985). “Alleles, Loci, and the Traveling Salesman Problem”, Hillsdale, New Jersey, United States: Lawrence Erlbaum. In Grefenstette, J. J. (Ed.), International Conference on Genetic Algorithms and Their Applications (pp. 154-159).
IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.
Jung, M., & Zscheischler, J. (2013). “A guided hybrid genetic algorithm for feature selection with expensive cost functions”. Procedia Computer Science, 18, 2337-2346.
Kabir, M. M., Shahjahan, M., & Murase, K. (2011). “A new local search based hybrid genetic algorithm for feature selection”. Neurocomputing, 74(17), 2914-2928.
Kewley, R., Embrechts, M. J., & Breneman, C. M. (1998). “Neural Network Analysis for Data Strip Mining Problems”, Intelligent Engineering Systems through Artificial Neural Networks, vol. 8, C. Dagli, Ed. Nashville - Missouri: ASME Press, pp. 391-396.
Leardi, R., Boggia, R., & Terrile, M. (1992). “Genetic algorithms as a strategy for feature selection”. Journal of Chemometrics, 6(5), 267-281.
Lumley, T. (2009). Leaps: regression subset selection using Fortran code by Alan Miller, R package version 2.9. http://CRAN.R-project.org/package=leaps
Mallows, C. L. (1973). “Some Comments on Cp”, Technometrics, vol. 15, pp. 661-675.
Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed., Springer-Verlag, New York, United States.
Miller, A. J. (1984). “Selection of Subsets of Regression Variables”, Journal of the Royal Statistical Society. Series A (General), Vol. 147, No. 3, 389-425.
Montgomery, D. C., Peck, E. A., Vining, G. G. (2012). Introduction to Linear Regression Analysis, 5th ed., John Wiley & Sons, Inc., New Jersey, United States.
Özdemir, M. (2011). “Genetik Algoritma Kullanılarak Portföy Seçimi”, İktisat İşletme ve Finans, Vol. 26, No. 299, pp. 67–89. DOI: 10.3848/iif.2011.299.2831
Paterlini, S., & Minerva, T. (2010). “Regression model selection using genetic algorithms”. In Proceedings of the 11th WSEAS international conference on neural networks and 11th WSEAS international conference on evolutionary computing and 11th WSEAS international conference on Fuzzy systems (pp. 19-27). World Scientific and Engineering Academy and Society (WSEAS).
Pamukkale Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, Issue 28, September 2017, M. Özdemir
R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
Ruengvirayudh, P., Brooks, G. P. (2016). “Comparing Stepwise Regression Models to the Best-Subsets Models, the Art of Stepwise”, General Linear Model Journal, Vol. 42(1), pp. 1-14.
Thompson, M. L. (1978). “Selection of Variables in Multiple Regression: Part I. A Review and Evaluation”, International Statistical Review, Vol. 46, No. 1, pp. 1-19.
Tsai, C., Eberle, W., & Chu, C. (2013). “Genetic algorithms in feature and instance selection”, Knowledge-Based Systems, 39, 240–247.
Van Rooij, A. J. F., Jain, L. C., & Johnson, R. P. (1996). “Neural Networks Training Using Genetic Algorithms”. Series in Machine Perception and Artificial Intelligence, Vol. 26, pp.130, Singapore: World Scientific.
Vose, M. D. (2010). The Simple Genetic Algorithm: Foundations and Theory. Cambridge, Massachusetts, United States: MIT Press.
Whitley, D. (1989). “The GENITOR Algorithm and Selection Pressure: Why Rank-based Allocation of Reproductive Trials is Best”, San Mateo, CA, United States: Morgan Kaufmann. In Schaffer, J. D. (Ed.), Third International Conference on Genetic Algorithms (pp. 116-121).
Yu, T. (2016). “Nonlinear variable selection with continuous outcome: a nonparametric incremental forward stagewise approach”. arXiv preprint arXiv:1601.05285.
