Arkaplan Veri Süresinin Konuşmacı Doğrulama Performansına Etkisi

EFFECTS OF BACKGROUND DATA DURATION ON SPEAKER VERIFICATION PERFORMANCE

Abstract (Second Language):
Gaussian mixture models with a universal background model (GMM-UBM) and vector quantization with a universal background model (VQ-UBM) are two well-known classifiers for speaker verification. The UBM is generally trained on many hours of speech from a large pool of different speakers. In this study, we analyze how the duration of the data used to train the UBM affects text-independent speaker verification performance with the GMM-UBM and VQ-UBM modeling techniques. Experiments carried out on the NIST 2002 speaker recognition evaluation (SRE) corpus show that the duration of the background data used to train the UBM has only a small impact on recognition performance for both the GMM-UBM and VQ-UBM classifiers.
Abstract (Original Language, translated from Turkish):
The Gaussian mixture model with universal background model (GMM-UBM) and vector quantization with universal background model (VQ-UBM) are two methods frequently used in speaker verification. The UBM is generally trained on hours of speech signals selected from a set containing a large number of different speakers. This study examines the effect of the amount of data used to train the UBM on text-independent speaker verification performance. Experiments on the NIST 2002 speaker recognition evaluation database using the GMM-UBM and VQ-UBM methods show that the amount of data used to train the background model does not have much effect on speaker recognition performance.
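
For readers unfamiliar with the GMM-UBM approach named in the abstract, the following is a minimal sketch of how a verification score is commonly formed: the average log-likelihood of the test frames under a speaker model minus that under the UBM. The function names (`gmm_avg_loglik`, `llr_score`), the diagonal-covariance assumption, and the toy parameter values are illustrative assumptions and are not taken from the paper, whose exact configuration is not given on this page.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T x D) under a
    diagonal-covariance GMM with M components (assumed model form)."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)      # shape (M,)
    means = np.asarray(means, dtype=float)          # shape (M, D)
    variances = np.asarray(variances, dtype=float)  # shape (M, D)

    diff = X[:, None, :] - means[None, :, :]        # shape (T, M, D)
    # Log density of each frame under each Gaussian component.
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                               # shape (T, M)
    log_weighted = log_comp + np.log(weights)

    # Numerically stable log-sum-exp over the components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def llr_score(X, speaker_model, ubm):
    """Log-likelihood ratio: speaker model versus background model.
    Each model is a (weights, means, variances) tuple."""
    return gmm_avg_loglik(X, *speaker_model) - gmm_avg_loglik(X, *ubm)

# Toy 2-component models (hypothetical values). The speaker model differs
# from the UBM only in one mean, as if that mean had been adapted.
ubm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
speaker = ([0.5, 0.5], [[1.0, 1.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])

target_frames = np.array([[1.0, 1.0], [1.1, 0.9]])
background_frames = np.array([[0.0, 0.0], [-0.1, 0.1]])

target_score = llr_score(target_frames, speaker, ubm)
background_score = llr_score(background_frames, speaker, ubm)
```

In this sketch a positive score favors the claimed speaker and a negative score favors the background model. Studying the effect of background data duration would amount to re-estimating the `ubm` parameters from different amounts of speech and comparing the resulting scores.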

