Detecting Spam by Weighting Message Words (Series C)

Mousa Abdoh; Nael SALMAN; Mohammad Musa

Journal Name:

Çankaya University Journal of Arts and Sciences

Publication Year:

2009

Author Name	Country of Affiliation
Mousa Abdoh	Palestine
Nael SALMAN	Palestine
Mohammad Musa	Sudan

Abstract (English):

The huge number of spam e-mails arriving daily in users' accounts has made an automated spam filter necessary to detect and remove these unwanted messages. Most existing spam filters are based on naive Bayesian methods. This paper introduces a new automated filter, also based on the naive Bayesian method. The basic idea of the filter is to give each word that appears in e-mails a weight based on its frequency in both spam and legitimate mail; this weight indicates how likely the word is to belong to spam or to legitimate mail. The proposed filter has a preprocessing component that removes all common words. In the training phase, a set of 1300 e-mails (legitimate and spam) was used to assign weights to the non-common words. The classifier then uses the weight table generated in the training phase to classify a given e-mail as spam or legitimate. In testing, we used 400 e-mails, 200 spam and 200 legitimate, and the proposed algorithm achieved 95% accuracy.
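The abstract does not give the exact weight formula or decision rule, so the following is only a minimal Python sketch of the described pipeline (remove common words, build a word-weight table from spam and legitimate training mail, classify with that table). It assumes a per-word weight of p_s / (p_s + p_h), where p_s and p_h are Laplace-smoothed word frequencies in spam and legitimate mail, and a naive Bayes-style decision that sums per-word log-odds with equal class priors. The names `COMMON_WORDS`, `preprocess`, `train`, `classify`, and `alpha` are hypothetical, and the stop-word list is illustrative, not the authors' list.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list of common words is not given.
COMMON_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on", "you", "your"}

def preprocess(text):
    """Lower-case the text, split it into word tokens, and drop common words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in COMMON_WORDS]

def train(spam_mails, ham_mails, alpha=1.0):
    """Build a weight table mapping each word to an estimate of P(spam | word).

    Assumption: weight = p_s / (p_s + p_h), with p_s and p_h the Laplace-smoothed
    relative frequencies of the word in spam and in legitimate mail.
    """
    spam_counts = Counter(w for m in spam_mails for w in preprocess(m))
    ham_counts = Counter(w for m in ham_mails for w in preprocess(m))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + alpha * len(vocab)
    ham_total = sum(ham_counts.values()) + alpha * len(vocab)
    table = {}
    for w in vocab:
        p_s = (spam_counts[w] + alpha) / spam_total
        p_h = (ham_counts[w] + alpha) / ham_total
        table[w] = p_s / (p_s + p_h)  # strictly between 0 and 1 because alpha > 0
    return table

def classify(mail, table):
    """Label a mail by summing log(w / (1 - w)) over its known words.

    Words absent from the weight table are ignored; equal class priors are assumed.
    """
    score = 0.0
    for w in preprocess(mail):
        if w in table:
            p = table[w]
            score += math.log(p / (1.0 - p))
    return "spam" if score > 0 else "legitimate"

# Toy usage with made-up training mails.
spam = ["win a free prize now", "cheap pills free offer"]
ham = ["meeting agenda for monday", "please review the project report"]
table = train(spam, ham)
print(classify("free prize offer", table))        # spam
print(classify("project meeting monday", table))  # legitimate
```

Because p / (1 - p) equals p_s / p_h, each word contributes its log-likelihood ratio, which is what a naive Bayes classifier with equal priors sums. For scale, the reported 95% accuracy on the 400-mail test set corresponds to 380 correctly classified messages.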


1

1-14

Turkish

