Journal Name:
- European Journal of Pure and Applied Mathematics
Abstract:
Given a multivariate dataset composed of data from different known sources or processes,
how can we create a rule to separate the data, and classify any future data? Kernel discriminant analysis
is one of many supervised learning techniques that handle this problem. Recently, in this and other
knowledge discovery problems, kernel methods have gained popularity. This is somewhat ironic, as another
common theme in these problems is variable reduction, yet kernel methods actually inflate dimensionality.
The substantial benefits of working with "kernelized" data make this excusable: kernel methods frequently
outperform traditional classification techniques on real data when the classes are not easily separable.
In performing kernel discriminant analysis, there are two main issues that we address in this article.
The first is that, in the literature, the kernel function is often selected subjectively a priori, or
determined by cross-validation with the sole objective of maximizing classification performance.
Secondly, after obtaining discriminant functions or support vectors to classify a dataset,
how do we know which of our variables are most responsible for, and important to, the classification?
In this research, we develop a new regularized algorithm that simultaneously selects the kernel function
and a subset of the original variables. Our algorithm, a hybrid of cross-validation and the genetic algorithm,
does this by optimizing a function that rewards correct classification while penalizing model complexity
does this by optimizing a function that rewards correct classification while penalizing model complexity
and misclassification.
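
To make the selection criterion concrete, below is a minimal Python sketch of such a fitness function using scikit-learn. This is an illustration under assumptions of our own, not the paper's implementation: kernel discriminant analysis is approximated here by kernel PCA followed by linear discriminant analysis, and the penalty weight `lam`, the candidate pool, and the wine dataset are all placeholders.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fitness(X, y, mask, kernel, lam=0.02):
    """Score one candidate = (variable subset, kernel function).

    Rewards cross-validated accuracy and penalizes the number of
    retained variables; the weight `lam` is purely illustrative.
    """
    if not mask.any():
        return -np.inf  # an empty variable subset cannot classify anything
    # Stand-in for kernel discriminant analysis: kernel PCA followed
    # by linear discriminant analysis on the kernelized features.
    kda = make_pipeline(StandardScaler(),
                        KernelPCA(kernel=kernel, n_components=10),
                        LinearDiscriminantAnalysis())
    acc = cross_val_score(kda, X[:, mask], y, cv=5).mean()
    return acc - lam * mask.sum()


X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

# A genetic algorithm would evolve (mask, kernel) pairs via selection,
# crossover, and mutation; here we only score random candidates to
# show the objective such a search would maximize.
candidates = [(rng.random(X.shape[1]) < 0.5, k)
              for k in ("linear", "poly", "rbf") for _ in range(5)]
best_mask, best_kernel = max(candidates,
                             key=lambda c: fitness(X, y, c[0], c[1]))
print(f"best kernel: {best_kernel}, variables kept: {int(best_mask.sum())}")
```

In the hybrid algorithm described above, the genetic algorithm's selection, crossover, and mutation steps would take the place of this random sampling, iteratively improving the population of (variable subset, kernel) candidates.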
We report results on three real datasets, including data from a medical imaging study. For the latter,
we obtained an impressively low misclassification rate of 0.3%, while reducing the number of features
from p = 20 to p∗ = 6.