
ARAMA MOTORLARINDA YENİ KONU TANILAMADA KARAKTER N-GRAM VE YAPAY SİNİR AĞLARI UYGULAMASI

Character N-gram and Neural Network Application for New Topic Identification in Search Engines


Abstract (2. Language): 
Estimating the behavior of web users has become important as the use of web search engines has grown. To date, many content-ignorant studies have addressed automatic new topic identification. Although some of these studies performed well, they often made mistakes when queries contained spelling differences. In this study, the character n-gram method, which is also content-ignorant, was used for new topic identification, with the aim of improving on previous content-ignorant studies. A review of previous studies showed that neural network applications gave better results than the other approaches. Therefore, the estimations of the neural network method were used in this study, and the character n-gram method was applied to eliminate wrong estimations caused by spelling errors.
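The idea in the abstract can be illustrated with a minimal sketch: when a classifier flags a query as a new topic, a character n-gram comparison with the previous query can catch cases where the two queries differ only in spelling. The abstract does not state the similarity measure, the n-gram length, or any threshold, so the Dice coefficient, `n=3`, `threshold=0.5`, and all function names below are assumptions chosen for illustration, not the authors' method.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized query."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        # Very short queries: fall back to the whole string as a single gram.
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets (an assumed measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def correct_prediction(predicted_new_topic, prev_query, query, threshold=0.5):
    """Override a 'new topic' prediction when the two queries are near-identical strings.

    `predicted_new_topic` stands in for the neural network's estimate;
    `threshold` is a hypothetical cut-off, not a value from the study.
    """
    if predicted_new_topic and ngram_similarity(prev_query, query) >= threshold:
        return False  # likely the same topic with a spelling difference
    return predicted_new_topic


# A misspelled repeat of the previous query is no longer treated as a new topic,
# while an unrelated query keeps the original prediction.
print(correct_prediction(True, "britney spears", "britny spears"))
print(correct_prediction(True, "britney spears", "cheap flights"))
```

Because the comparison works on character sequences rather than whole terms, it stays content-ignorant: it needs no dictionary or knowledge of what the query means.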
Abstract (Original Language): 
Günümüzde, arama motorlarının kullanımının artmasıyla beraber kullanıcı davranışlarının tahmini önem kazanmıştır. Bugüne kadar anlam bazlı olmayan pek çok yöntem yeni konu tanılamada kullanılmıştır. Bazı çalışmalardan iyi sonuçlar elde edilmesine rağmen, genelde çalışmaların yazım farklılığı içeren sorgularda hatalı tahminler yaptığı gözlenmiştir. Bu çalışmada, anlam bazlı olmayan, karakter n-gram yöntemi, yeni konu tanılamada kullanılmıştır. Bununla beraber karakter n-gram yöntemiyle önceki anlam bazlı olmayan çalışmaları iyileştirmek hedeflenmiştir. Önceki çalışmalar incelendiğinde yapay sinir ağları yönteminin diğerlerinden daha iyi sonuçlar verdiği gözlenmiştir. Bu yüzden, çalışmada yapay sinir ağları yönteminin tahminleri kullanılmış ve yazım yanlışlarından kaynaklanan hatalı tahminlerin giderilmesi için karakter n-gram yöntemi kullanılmıştır.
75-91

