Regularized SVM Classification with a New Complexity-Driven Stochastic Optimizer

Abstract:
Given a multivariate dataset composed of data from different known sources or processes, how can we create a rule to separate the data and classify any future observations? Kernel discriminant analysis is one of many supervised learning techniques that address this problem. Kernel methods have recently gained popularity in this and other knowledge discovery problems. This is somewhat ironic, since another common theme in knowledge discovery is variable reduction, whereas kernel methods actually inflate dimensionality. The substantial benefits of processing "kernelized" data make this excusable: kernel methods frequently outperform traditional classification techniques on real data when the classes are not easily separable. In performing kernel discriminant analysis, there are two main issues that we address in this article. The first is that, in the literature, the kernel function is often selected subjectively a priori, or determined by cross-validation with the sole objective of maximizing classification performance. Secondly, after obtaining discriminant functions or support vectors to classify a dataset, how do we know which of the original variables are most responsible for, and important to, the classification? In this research, we develop a new regularized algorithm that simultaneously selects the kernel function and a subset of the original variables. Our algorithm, a hybrid of cross-validation and the genetic algorithm, does this by optimizing a fitness function that rewards correct classification while penalizing model complexity and misclassification. We report results on three real datasets, including data from a medical imaging study. For the latter, we obtained an impressively low misclassification rate of 0.3% while reducing the number of features from p = 20 to p∗ = 6.
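
To make the search concrete, below is a minimal sketch of the kind of hybrid genetic-algorithm/cross-validation loop the abstract describes. Everything in it is an assumption for illustration: scikit-learn's SVC stands in for the kernel classifier, the complexity penalty is a simple per-variable term with a hypothetical weight lam, the GA settings (population size, generations, mutation rate, truncation selection) are arbitrary, and load_wine is only a convenient example dataset. The paper's actual fitness criterion and GA operators may differ.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

KERNELS = ["linear", "poly", "rbf", "sigmoid"]  # candidate kernel functions
rng = np.random.default_rng(0)

def fitness(chrom, X, y, lam=0.02):
    # Reward cross-validated accuracy; penalize complexity via the number
    # of selected variables (lam is a hypothetical penalty weight).
    mask, kernel_idx = chrom
    if not mask.any():
        return -np.inf  # an empty variable subset is infeasible
    acc = cross_val_score(SVC(kernel=KERNELS[kernel_idx]),
                          X[:, mask], y, cv=5).mean()
    return acc - lam * mask.sum()

def evolve(X, y, pop_size=20, gens=15, mut=0.05):
    p = X.shape[1]
    # Each chromosome: a boolean inclusion mask over the p variables
    # plus an index into the list of candidate kernels.
    pop = [(rng.random(p) < 0.5, int(rng.integers(len(KERNELS))))
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda c: fitness(c, X, y), reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            (m1, k1), (m2, k2) = elite[i], elite[j]
            cut = int(rng.integers(1, p))                 # one-point crossover
            child = np.concatenate([m1[:cut], m2[cut:]])
            child ^= rng.random(p) < mut                  # bit-flip mutation
            children.append((child, k1 if rng.random() < 0.5 else k2))
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, X, y))

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
mask, k = evolve(X, y)
print("kernel:", KERNELS[k], "| variables kept:", int(mask.sum()), "of", X.shape[1])

Note that the abstract's penalty on "model complexity" is presumably more sophisticated than a raw count of selected variables; the sketch only mirrors the overall structure of the search, not the exact criterion.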

