
MEL FREKANSI KEPSTRUM KATSAYILARINDAKİ DEĞİŞİMLERİN KONUŞMACI TANIMAYA ETKİSİ

The Effects of Variabilities in Mel Frequency Cepstrum Coefficients on Speaker Recognition

Journal Name:

Publication Year:

Keywords (Original Language):

Abstract (2. Language): 
Extracting speaker-specific features that characterize the information needed to identify the correct speaker is of vital importance. In this work, the TIMIT and NTIMIT databases are used. The effect of changing the feature vector elements on speaker identification is analyzed, and the best-identifying elements are found. These feature vector elements may also be used in other speaker identification studies based on the same databases, so that future work using these databases may not need to optimize the feature vectors for identification again.
Abstract (Original Language, translated from Turkish):
Extracting features that characterize speaker-specific information is vital to the performance of a speaker recognition system. In this paper, using the TIMIT and NTIMIT databases, the effect of parameter changes at each stage of feature vector construction on speaker recognition is examined, and the parameter values that best improve recognition are found. Optimum feature values are determined that can serve as a reference for other speaker recognition studies using these databases. As a result, other researchers will not need to repeat experiments to find the best parameters.
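
The abstracts refer to parameters at each stage of building the mel frequency cepstrum feature vector without listing them here. As a rough illustration of which parameters such a pipeline typically exposes, the following is a minimal NumPy sketch of a common textbook MFCC computation. It is not the authors' implementation: the function names and every default value (25 ms frames, 10 ms step, 26 filters, 13 coefficients, 0.97 pre-emphasis, Hamming window) are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # A widely used mel-scale formula (one of several conventions).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, frame_step=160,
         n_fft=512, n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Illustrative MFCC pipeline; all defaults are assumed, not from the paper.

    The signal must contain at least frame_len samples.
    Returns an array of shape (n_frames, n_ceps).
    """
    signal = np.asarray(signal, dtype=float)
    # Stage 1: pre-emphasis (first-order high-pass).
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Stage 2: split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // frame_step
    idx = (np.arange(frame_len)[None, :]
           + frame_step * np.arange(n_frames)[:, None])
    frames = emphasized[idx] * np.hamming(frame_len)
    # Stage 3: power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Stage 4: mel filterbank energies, floored to avoid log(0).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Stage 5: DCT-II (unnormalized) of the log energies, keeping n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2.0 * n_filters))
    return log_energies @ basis.T
```

In a sketch like this, a parameter study of the kind the abstract describes would amount to re-running `mfcc` with different values of arguments such as `n_filters`, `n_ceps`, `frame_len`, or `pre_emphasis` and comparing the resulting identification rates on the same data.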
