
Mobil Metne Bağımlı Tek Cümle Konuşmacı Tanıma Uygulamasında Kayıttan Sahte Doğrulama

The vulnerability of mobile text-dependent single-utterance speaker verification to replay attacks

Abstract (2. Language): 
Adapting technologies to mobile platforms has become an important industry owing to the widespread use of mobile applications. With the significant increase in mobile applications, security has also become a major concern for mobile users. The aim of speaker recognition is to recognize the identity of a speaker from his or her voice, so it offers a good alternative for mobile security. Speaker recognition technology can be used to increase the overall security of applications that require high security. It can also add an extra layer of security by verifying the user's voice in addition to a typed password.

Speaker verification applications can be divided into two categories: text-dependent and text-independent. In text-dependent applications, the vocabulary is usually constrained to digit strings or pre-defined pass phrases. In text-independent applications, there is no such constraint, and the system tries to verify the identity of the speaker from his or her natural speech. In text-dependent single-utterance (TDSU) speaker verification, speakers repeat a fixed pass phrase in both the enrollment and the authentication sessions. Repeating a single utterance improves the overall recognition accuracy of the system, since the authentication utterance is included in the enrollment data as a whole. It also makes the system easier to use. For the same reason, however, TDSU applications are vulnerable to replay attacks: a pre-recorded copy of the pass phrase might be used to spoof the system.

In this study, we evaluate the robustness of mobile TDSU applications to replay attacks. To this end, we construct a new speaker recognition database. We choose the Turkish utterance “benim parolam ses kaydımdır (my voice is my password)” as the pass phrase in the TDSU task, since it contains 5 of the 8 vowels in the Turkish language. The database consists of 124 speakers, 62 female and 62 male. The recordings are taken in 2 separate sessions using 2 different smartphones. Using the database, a realistic simulation of replay attacks is performed by playing the recordings from one phone and recording them with the other. The replay recordings are used as impostor trials in the verification tests.

Until recently, Gaussian mixture models (GMMs) have been the dominant modeling approach for text-independent speaker verification. In the GMM approach, each speaker is modeled with a mixture of Gaussians. Speaker models are generally adapted from a speaker-independent universal background model (UBM), usually with maximum a posteriori (MAP) estimation. In text-dependent applications, hidden Markov model (HMM) based approaches are used, since they capture co-articulation information better. In a TDSU task, a single whole-phrase HMM might be constructed for the pass phrase. The sentence HMM topology might be preferred over the phone HMM in order to model co-articulation better and improve verification performance. Recently, very powerful channel compensation techniques such as joint factor analysis (JFA), i-vector and i-vector/probabilistic linear discriminant analysis (i-vector/PLDA) have been proposed. These methods achieve very good verification performance, especially in text-independent tasks. Their performance gain in text-dependent tasks is still under investigation.

In this study, we implement the GMM, sentence HMM and i-vector/PLDA methods for the TDSU speaker verification task and test them against replay spoofing attacks. The baseline equal error rates (EERs) of the three methods with zero-effort impostor trials are about 0.5-1%. The best performance in the baseline case is achieved with the sentence HMM method. The verification performance of all three methods decreases significantly when the zero-effort impostor trials are replaced with replay spoofing attacks: the EER increases from 0.5-1% to 10-25% with the replay trials. The i-vector/PLDA method gives the best performance in the spoofing experiment.
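The adaptation of a speaker model from a UBM with MAP estimation, mentioned in the abstract, can be illustrated with a minimal sketch. It assumes a diagonal-covariance UBM and adapts only the component means with a relevance factor, which is a common GMM-UBM recipe. The abstract does not say which parameters were adapted or which relevance factor was used, so the function name `map_adapt_means` and the default `relevance=16.0` are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, features, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM (illustrative).

    ubm_weights: (C,)   mixture weights of the UBM
    ubm_means:   (C, D) component means
    ubm_vars:    (C, D) diagonal variances
    features:    (T, D) enrollment feature vectors of one speaker
    relevance:   relevance factor (assumed value, not from the paper)
    """
    # Log-likelihood of every frame under every diagonal Gaussian: shape (T, C)
    diff = features[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1))

    # Posterior probability of each component given each frame
    log_post = np.log(ubm_weights) + log_gauss
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)

    # Zeroth- and first-order sufficient statistics
    n = post.sum(axis=0)                       # (C,)
    first = post.T @ features                  # (C, D)
    data_means = first / np.maximum(n, 1e-10)[:, None]

    # Data-dependent interpolation between the data mean and the UBM mean
    alpha = (n / (n + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * ubm_means

# Toy usage with a made-up 2-component, 1-dimensional UBM
weights = np.array([0.5, 0.5])
means = np.array([[-5.0], [5.0]])
variances = np.ones((2, 1))
enrollment = np.full((100, 1), 6.0)
speaker_means = map_adapt_means(weights, means, variances, enrollment)
```

Components that see many enrollment frames move toward the speaker's data, while components with little data stay close to the UBM, which is why the adapted model remains usable with a short pass phrase.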
Abstract (Original Language): 
In recent years, the rapid growth in the use of mobile devices such as smartphones has turned the implementation of different technologies on these platforms into an important industry. This increase in the number of mobile applications has also brought the security of these applications to the fore. Speaker recognition technology, which automatically determines a speaker's identity from his or her voice, can be used to close security gaps in mobile applications that require the protection of personal information. In a text-dependent single-utterance speaker recognition application, speakers repeat a common pass phrase during both training and recognition. Repeating the same text in training and recognition improves recognition performance and also makes the application easier to use. However, single-utterance applications are extremely vulnerable, particularly to replay spoofing attacks. In this study, the robustness of a text-dependent single-utterance application against replay spoofing attacks is tested. To test the robustness of a single-utterance application to be developed for mobile devices against such attacks, a new speaker recognition database was constructed. In this database, 124 speakers (62 female + 62 male) repeated the chosen pass phrase in 2 separate sessions. The recordings were made using 2 different smartphones. Replay spoofing attacks were simulated with this database. The Gaussian mixture model (GMM) is among the most frequently used methods in text-independent applications. Hidden Markov model (HMM) based methods, on the other hand, are preferred in text-dependent applications because they make better use of articulation information. Recently, the i-vector/PLDA method has been proposed to address the channel mismatch problem and has produced highly successful results, especially in text-independent applications.
In this study, the GMM, sentence HMM and i-vector/PLDA methods were tested against replay spoofing attacks in a mobile text-dependent single-utterance application. The experiments showed that all methods are significantly affected by the spoofing attacks. In our tests, the equal error rates were in the range of 0.5-1% with ordinary impostor trials and rose to the range of 10-25% with replay spoofing trials.
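The results above are reported as equal error rates. As a minimal sketch of how such a figure is obtained from verification scores, the function below sweeps a decision threshold over the observed scores and returns the point where the false acceptance rate and the false rejection rate are closest. The function name `equal_error_rate` and the example scores are made up for illustration and are not data from the study.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))

    # A trial is accepted when its score is at or above the threshold.
    # FRR: fraction of genuine trials rejected; FAR: fraction of impostor trials accepted.
    frr = np.array([(target < t).mean() for t in thresholds])
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Operating point where the two error rates are closest to each other
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Hypothetical scores: higher means "more likely the claimed speaker"
genuine = [2.0, 3.0, 4.0, 5.0]
zero_effort = [0.0, 1.0, 1.5, 2.5]
eer = equal_error_rate(genuine, zero_effort)
```

In a replay experiment of the kind described, the same function would be called with the scores of the replayed recordings in place of the zero-effort impostor scores. Because replayed recordings carry the target speaker's own voice, their scores overlap more with the genuine scores and the EER rises.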
Pages: 77-88

