Voice identification Using a Composite Haar Wavelets and Proper Orthogonal Decomposition

Publication Year: 2013

Abstract:
In today's business and consumer environments, a robust voice identification system is needed to reduce both false positives and false negatives. This work describes a modified voice identification system based on oversampled Haar wavelets followed by proper orthogonal decomposition (POD). The audio signal is first decomposed with oversampled Haar wavelets into several uncorrelated frequency bands, from which linear predictive cepstral coefficients (LPCCs) are computed to capture the characteristics of individual speakers. An adaptive threshold is applied to reduce noise interference, followed by a multi-layered vector quantization (VQ) technique that eliminates interference between the multiband coefficients. Finally, POD is used to extract distinctive features that capture finer detail of phoneme characteristics. The proposed algorithm was evaluated on the KING and MAT-400 databases, which were chosen because results from earlier feature-extraction methods were available for them. For the KING database, models were trained on three sentences and tested on two; for the MAT-400 database, models were trained on two seconds of randomly selected speech and tested on a different two seconds. Results were compared with those of vector quantization and Gaussian mixture models (GMMs). The proposed model performed consistently better on speech recorded through mouthpieces, but comparatively poorly on audio recorded over telephones. The improved performance comes at the cost of higher computation time.

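The abstract does not say how the oversampled Haar decomposition is constructed. One common reading, assumed here rather than taken from the paper, is the undecimated (stationary) Haar transform: at level j the averaging and differencing taps are spaced 2^j samples apart and no downsampling is performed, so every band keeps the full signal length. The function name `oversampled_haar` and the periodic edge handling are illustrative choices, not the authors' method.

```python
import numpy as np


def oversampled_haar(x, levels):
    """Undecimated (oversampled) Haar decomposition of a 1-D signal.

    Returns (details, approx): a list of `levels` detail bands and the final
    approximation band. No downsampling is performed, so every band has the
    same length as `x`. Edges wrap around periodically (an assumption).
    """
    a = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        step = 2 ** j                 # taps spread 2^j apart instead of decimating
        shifted = np.roll(a, -step)   # shifted[n] == a[(n + step) % len(a)]
        details.append((a - shifted) / 2.0)  # Haar difference (high-pass) band
        a = (a + shifted) / 2.0              # Haar average (low-pass) band
    return details, a


if __name__ == "__main__":
    signal = np.random.default_rng(0).standard_normal(1024)
    bands, approx = oversampled_haar(signal, levels=4)
    print([b.shape for b in bands], approx.shape)
    # approximation + all detail bands recovers the original signal
    print(np.allclose(approx + sum(bands), signal))
```

Because the average and difference bands at each level add back to the previous approximation, the final approximation plus all detail bands reconstructs the input exactly, which makes a quick sanity check on any implementation of this kind.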

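LPCCs are commonly obtained in two steps: solve for the linear-prediction coefficients from a frame's autocorrelation with the Levinson-Durbin recursion, then convert them to cepstral coefficients with the standard all-pole cepstrum recursion. The sketch below assumes a Hamming-windowed frame, prediction order 12 and 16 cepstral coefficients; these values and the helper names (`levinson_durbin`, `lpc_to_cepstrum`, `lpcc`) are hypothetical, since the abstract gives no parameters for the per-band feature extraction.

```python
import numpy as np


def levinson_durbin(r, order):
    """Predictor taps a[1..order] for x[n] ~ sum_k a[k] x[n-k], from autocorrelation r."""
    a = np.zeros(order + 1)   # a[0] is unused; a[1..order] are the taps
    err = r[0]                # prediction error energy (must be > 0)
    for i in range(1, order + 1):
        # reflection coefficient of stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstrum of the all-pole model 1 / (1 - sum_k a[k] z^-k).

    c[n] = a[n] + sum_{k=max(1,n-p)}^{n-1} (k/n) c[k] a[n-k], with a[n] = 0 for n > p.
    The gain term c[0] is omitted.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]   # a is 0-based: a[m-1] holds a_m
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=16):
    """LPCC vector for one frame of samples (e.g. one frame of one wavelet band)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1 : len(x) + order]   # autocorrelation lags 0..order
    return lpc_to_cepstrum(levinson_durbin(r, order), n_ceps)
```

For the first-order model a = [0.5], the recursion reproduces the closed form c[n] = 0.5**n / n, which is a convenient check of the indexing.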

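In its discrete form, proper orthogonal decomposition is equivalent to principal component analysis and can be computed from a singular value decomposition of the mean-centered feature matrix. The sketch below assumes that one speaker's feature vectors are stacked as rows (one per frame) and that the number of retained modes is chosen by a cumulative energy fraction; the 0.95 threshold and the name `pod_modes` are hypothetical, as the abstract does not say how many modes are kept.

```python
import numpy as np


def pod_modes(features, energy=0.95):
    """Proper orthogonal decomposition of an (n_frames, n_dims) feature matrix.

    Returns (mean, modes, coeffs): the mean feature vector, the leading
    orthonormal POD modes as columns of an (n_dims, r) array, and the modal
    coefficients of every frame, shape (n_frames, r). r is the smallest number
    of modes whose cumulative energy fraction reaches `energy`.
    """
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD: rows of vt are the POD modes, s**2 are their energies.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = min(int(np.searchsorted(frac, energy)) + 1, len(s))
    modes = vt[:r].T
    return mean, modes, Xc @ modes
```

The modal coefficients, or the modes themselves, could then serve as a speaker template; how the paper compares POD outputs across speakers is not stated in the abstract.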
ISSN: 2028-9324, Vol. 4, No. 2, Oct. 2013