FİLTRE FREKANS ÖLÇEĞİ DEĞİŞİMLERİNİN KONUŞMACI TANIMAYA ETKİSİ

Ömer ESKİDERE; Figen ERTAŞ

THE EFFECTS OF FILTER FREQUENCY SCALE VARIABILITY ON SPEAKER IDENTIFICATION PERFORMANCE

Journal Name:

Sigma Mühendislik ve Fen Bilimleri Dergisi

Publication Year:

2009


Author Name	University of Author	Faculty of Author
Ömer ESKİDERE	Uludağ Üniversitesi	Teknik Bilimler Meslek Yüksekokulu
Figen ERTAŞ	Uludağ Üniversitesi	Mühendislik Mimarlık Fakültesi

Abstract (2. Language):

Extracting discriminative feature vectors that carry speaker-specific information is of crucial importance in speaker identification. Although cepstral coefficients on the Mel frequency scale are commonly used as feature vectors, this paper demonstrates that the linear and ERB frequency scales give better results than the Mel scale. The ERB, Bark and linear scales are compared with the Mel scale on the TIMIT and NTIMIT databases. On the TIMIT database, an identification rate of 100% is obtained with the linear frequency scale when the filter bank is placed in the 0-8 kHz range, and a rate of 98.81% is obtained with the ERB scale using a 0-4 kHz filter-bank frequency range. On the NTIMIT database, an identification rate of 73.51% is achieved with the linear scale, a 2.97% improvement over the Mel scale.
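
To make the comparison of frequency scales concrete, the sketch below places filter-bank center frequencies at equal spacing on each of the four scales named in the abstract. The warping formulas are common textbook forms (O'Shaughnessy for Mel, Glasberg and Moore for the ERB rate, Traunmüller for Bark) and are assumptions here; the paper's own definitions, filter count and filter shapes are not given on this page. The names `SCALES` and `center_frequencies` and the choice of 8 filters are hypothetical and serve only as illustration.

```python
import math

# Forward/inverse warping pairs (Hz <-> warped units).
# Assumption: these are widely used textbook formulas, not necessarily
# the exact definitions used in the paper.
SCALES = {
    "linear": (lambda f: f,
               lambda w: w),
    "mel": (lambda f: 2595.0 * math.log10(1.0 + f / 700.0),
            lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)),
    "erb": (lambda f: 21.4 * math.log10(1.0 + 0.00437 * f),
            lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437),
    "bark": (lambda f: 26.81 * f / (1960.0 + f) - 0.53,
             lambda z: 1960.0 * (z + 0.53) / (26.28 - z)),
}


def center_frequencies(scale, f_low, f_high, n_filters):
    """Return n_filters center frequencies (Hz) equally spaced on `scale`.

    The band edges f_low and f_high are excluded, so the centers lie
    strictly inside the band.
    """
    forward, inverse = SCALES[scale]
    w_low, w_high = forward(f_low), forward(f_high)
    step = (w_high - w_low) / (n_filters + 1)
    return [inverse(w_low + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical setup: 8 filters over the 0-8 kHz band from the abstract.
    for name in SCALES:
        centers = center_frequencies(name, 0.0, 8000.0, 8)
        print(name, [round(c) for c in centers])
```

Under these assumed formulas, the Mel, ERB and Bark scales crowd the centers toward low frequencies, while the linear scale spreads them evenly across the band. That difference in resolution at higher frequencies is the property the paper's comparison turns on.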


Abstract (Original Language):

Kişileri birbirinden ayırt edici özellikleri taşıyan öznitelik vektörlerinin elde edilmesi, konuşmacı tanımanın en önemli kısmıdır. Öznitelik vektörü olarak her ne kadar Mel frekans ölçeğindeki kepstrum katsayıları yaygın olarak kullanılsa da, bu makalede görüleceği üzere doğrusal ve ERB frekans ölçekleri kullanılarak Mel ölçeğe kıyasla daha iyi sonuçlar elde edilmiştir. Bu makalede, TIMIT ve NTIMIT veritabanları için, Mel ölçeği ile ERB, Bark ve doğrusal ölçek karşılaştırılmıştır. TIMIT veritabanında süzgeç dizilerinin yerleştirildiği frekans bandı 0-8 kHz için doğrusal ölçekle %100, 0-4 kHz frekans bandı için ERB ölçekle %98.81 konuşmacı tanıma oranı elde edilmiştir. NTIMIT veritabanında doğrusal ölçekle %73.51 konuşmacı tanıma oranı elde edilip Mel ölçeğe kıyasla %2.97 tanıma artışı sağlanmıştır.

FULL TEXT (PDF):

arastrmx_845_27_pp_197-207.pdf

3

Pages:

197-207

Language:

Turkish


Key Words:

Speaker identification; Gaussian mixture model; filter frequency scales; TIMIT/NTIMIT database

Keywords (Original Language):

Konuşmacı tanıma; Gauss karışım modeli; süzgeç frekans ölçekleri; TIMIT/NTIMIT veritabanı
