Detecting Spam by Weighting Message Words (Series C)
Journal Name:
- Çankaya University Journal of Arts and Sciences
Abstract:
The large volume of spam e-mail that users receive daily makes an automated spam filter necessary to detect and remove these unwanted messages. Most existing spam filters are based on naive Bayesian methods.
This paper introduces a new automated filter based on the naive Bayesian method. The basic idea is to give each word that appears in e-mails a weight based on its frequency in both spam and legitimate mail. This weight indicates how likely the word is to belong to spam or to legitimate mail. The proposed filter has a preprocessing component that removes all common words.
In the training phase, a set of 1300 e-mails (legitimate and spam) was used to assign weights to the non-common words.
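The abstract does not give the weighting formula, so the following is only a minimal sketch of how such a training phase might look. It assumes the weight is a smoothed log-ratio of a word's relative frequency in spam versus legitimate mail, in the spirit of naive Bayes. The names `COMMON_WORDS`, `tokenize`, and `train_weights`, the stop-word list, and the add-one smoothing are illustrative assumptions and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list of common words is not given.
COMMON_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in",
                "is", "for", "you", "your", "at", "on"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop common words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in COMMON_WORDS]


def train_weights(spam_mails, legit_mails):
    """Build a weight table {word: weight} from two lists of message strings.

    Assumed weighting: log of the smoothed frequency ratio, so a positive
    weight leans towards spam and a negative weight towards legitimate mail.
    """
    spam_counts = Counter(w for mail in spam_mails for w in tokenize(mail))
    legit_counts = Counter(w for mail in legit_mails for w in tokenize(mail))
    spam_total = sum(spam_counts.values())
    legit_total = sum(legit_counts.values())
    vocab = set(spam_counts) | set(legit_counts)

    weights = {}
    for word in vocab:
        # Add-one smoothing keeps the ratio finite for words seen in one class only.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_legit = (legit_counts[word] + 1) / (legit_total + len(vocab))
        weights[word] = math.log(p_spam / p_legit)
    return weights
```

With a toy corpus, a call such as `train_weights(["win free money now"], ["meeting at noon tomorrow"])` returns a positive weight for "free" and a negative weight for "meeting", while common words such as "at" never enter the table.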
The classifier uses the weight table generated in the training phase to classify a given e-mail as spam or legitimate. Testing used 400 e-mails, 200 spam and 200 legitimate, on which the proposed algorithm achieved an accuracy of 95%.
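As a rough illustration of the classification step, the sketch below sums the weights of a message's words and compares the total with a threshold. The decision rule, the zero threshold, the sample `WEIGHTS` table, and the function names `classify` and `accuracy` are all assumptions made for this example; the abstract states only that a weight table is used.

```python
import re

# Hypothetical weight table; in practice it would come from the training phase.
# Positive weights lean towards spam, negative weights towards legitimate mail.
WEIGHTS = {"free": 1.2, "prize": 0.9, "money": 0.8,
           "meeting": -1.1, "report": -0.7}


def classify(text, weights, threshold=0.0):
    """Label a message by the sum of its word weights (assumed decision rule)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the table (including removed common words) contribute 0.
    score = sum(weights.get(w, 0.0) for w in words)
    return "spam" if score > threshold else "legitimate"


def accuracy(labelled_mails, weights):
    """Fraction of (text, label) pairs that the classifier labels correctly."""
    correct = sum(classify(text, weights) == label
                  for text, label in labelled_mails)
    return correct / len(labelled_mails)
```

Here `classify("Claim your free prize money", WEIGHTS)` yields "spam" and `classify("Report for the meeting", WEIGHTS)` yields "legitimate". On a 400-message test set, an accuracy of 95% corresponds to about 380 correctly classified messages.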
- 1
1-14