Technical Procedure
In speaker recognition, a distinction is made between high-level and low-level information. High-level information includes characteristics such as dialect, accent, speaking style, and the speaker's subjective state. Low-level information comprises values such as pitch, period, rhythm, timbre, spectral magnitude, frequency, and bandwidth of the user's voice. Speaker verification systems use these low-level features for recognition, whereas high-level information is used mainly by people to recognize one another. Speaker recognition thus captures individual characteristics such as rhythm, pitch, and frequency. Speech is recorded with a microphone or an ordinary telephone, and a higher-quality input device (microphone, telephone) increases recognition accuracy.
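As an illustration of a low-level feature, the following minimal sketch estimates the pitch of a single voiced frame by locating the peak of its autocorrelation. The function name `estimate_pitch_autocorr`, the 60 to 400 Hz search range, and the use of plain autocorrelation are assumptions made for illustration; the text does not prescribe a particular pitch estimator.

```python
import numpy as np


def estimate_pitch_autocorr(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (fundamental frequency, Hz) of one voiced frame.

    Hypothetical helper: picks the autocorrelation peak among the lags that
    correspond to pitch values between f_min and f_max.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                      # remove any DC offset
    # Autocorrelation for non-negative lags 0 .. len(frame) - 1
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)                # shortest plausible period
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)  # longest period
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / best_lag                     # period in samples -> Hz
```

The lag found is the pitch period in samples, so period and pitch are two views of the same measurement. Practical systems typically add a voiced/unvoiced decision and smoothing across frames, which are omitted here.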
Variation in characteristics between different persons is called inter-speaker variance; it arises because different speakers have different characteristics. Intra-speaker variance occurs when the same speaker pronounces the same word or sentence several times but cannot repeat it with the same emphasis, the same tone of voice, or in the same way. Sources of intra-speaker variance include varying speaking rates, the speaker's emotional state, and ambient noise. Intra-speaker variance is the main reason for the weak performance of biometric speaker recognition systems. It is therefore desirable to choose parameters with low intra-speaker variance and high inter-speaker variance. In many speaker recognition applications, intra-speaker variance can be reduced by asking the user to repeat the same text or words that were used when the reference data set was enrolled. This is the case for text-dependent speaker verification methods.
There are a number of procedural approaches to speaker recognition, which fall into two categories: text-dependent methods (static-text or fixed-phrase systems) and text-independent methods. Text-dependent recognition is based on a phrase that the speaker enrolled beforehand and that is known to the system. Text-independent recognition, by contrast, is based on unconstrained speech that may differ from the phrase stored in the system. Text-independent methods usually require more training data than text-dependent methods. Text-dependent methods are generally more accurate than text-independent ones but require the user's cooperation.
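The goal of low intra-speaker and high inter-speaker variance can be quantified per feature. One common measure is Fisher's F-ratio: the variance of the per-speaker means divided by the average variance of each speaker's repeated measurements. The sketch below uses a simple unweighted form of this ratio (an assumption; weighted variants exist), and the function name `f_ratio` is hypothetical.

```python
from typing import Sequence

import numpy as np


def f_ratio(features_by_speaker: Sequence[np.ndarray]) -> np.ndarray:
    """Unweighted Fisher F-ratio for each feature dimension.

    features_by_speaker[i] has shape (n_repetitions_i, n_features) and holds
    repeated measurements from speaker i. Higher values mean the feature
    separates speakers well relative to how much each speaker varies.
    """
    speaker_means = np.array([f.mean(axis=0) for f in features_by_speaker])
    # Inter-speaker variance: spread of the per-speaker means
    inter = speaker_means.var(axis=0)
    # Intra-speaker variance: spread of repetitions around each speaker's mean,
    # averaged over all speakers
    intra = np.mean([f.var(axis=0) for f in features_by_speaker], axis=0)
    return inter / intra
```

A feature whose repetitions cluster tightly per speaker, while speaker means lie far apart, receives a high ratio and is a good candidate parameter.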
The literature describes the following approaches to speaker recognition:
- Dynamic Time Warping (DTW)
- Vector quantization (VQ)
- Neural Networks (NN)
- Hidden Markov Model (HMM)
- Gaussian mixture model (GMM), combined with maximum likelihood estimation (text-independent)
- Likelihood Normalization
- Multivariate autoregression models (MAR)
Since 1975, the Hidden Markov Model (HMM) method, named after the Russian mathematician A. A. Markov, has become very popular in text-independent speaker recognition. In this method, the statistical variance of the spectral characteristics is measured. From a wide variety of training utterances, a model is computed that can generate the same sequences of feature vectors as those found when analyzing the training references.
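How well an HMM "can produce" a given sequence is measured by its likelihood, which the forward algorithm computes. The sketch below assumes a discrete HMM whose observations are symbol indices (for example, codebook indices); speaker recognition systems more often use continuous Gaussian emission densities, so this is a simplification. The function name `hmm_log_likelihood` is hypothetical.

```python
import numpy as np


def hmm_log_likelihood(pi, A, B, observations):
    """Forward algorithm: log P(observations | model) for a discrete HMM.

    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = P(next state j | current state i)
    B:  (N, M) emissions, B[i, k] = P(symbol k | state i)
    observations: non-empty sequence of symbol indices in [0, M)
    """
    pi, A, B = (np.asarray(x, dtype=float) for x in (pi, A, B))
    # alpha[i] is proportional to P(observations so far, current state i)
    alpha = pi * B[:, observations[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for symbol in observations[1:]:
        alpha = (alpha @ A) * B[:, symbol]            # propagate, then emit
        scale = alpha.sum()
        log_lik += np.log(scale)                      # product of scales = P(O)
        alpha = alpha / scale                         # rescale to avoid underflow
    return float(log_lik)
```

To identify a speaker, one would evaluate the utterance against each enrolled speaker's model and choose the model with the highest log-likelihood. Estimating the parameters from training utterances (for example with the Baum-Welch algorithm) is not shown.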
Dynamic Time Warping (DTW), a dynamic time-normalization method, compares the features of a reference sample with those of a test utterance, where both are the same isolated spoken word. The two samples may differ in length. Within a prescribed path region, the algorithm searches for the optimal time-alignment path between the test and reference utterances. Along this path, the spectral differences between the analysis parameters of the reference and test signals are summed from beginning to end.
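The path search described above can be sketched with dynamic programming. This minimal version assumes Euclidean distance between feature vectors, the basic symmetric step pattern (horizontal, vertical, and diagonal steps with equal weight), and a Sakoe-Chiba band as the prescribed path region; other local distances and path constraints (such as the Itakura parallelogram) are equally possible. The function name `dtw_distance` is hypothetical.

```python
import numpy as np


def dtw_distance(ref, test, band=None):
    """Accumulated distance along the optimal DTW alignment path.

    ref: (n, d) or (n,) reference features; test: (m, d) or (m,) test features.
    band: optional Sakoe-Chiba half-width limiting |i - j| (the path region).
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    if ref.ndim == 1:
        ref = ref[:, None]
    if test.ndim == 1:
        test = test[:, None]
    n, m = len(ref), len(test)
    if band is None:
        band = max(n, m)
    band = max(band, abs(n - m))          # band must be wide enough to reach (n, m)
    # Local distance between every reference frame and every test frame
    local = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost; inf = unreachable
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j],      # vertical step
                                                  acc[i, j - 1],      # horizontal step
                                                  acc[i - 1, j - 1])  # diagonal step
    return float(acc[n, m])
```

For example, a reference `[1, 2, 3]` and a slower test utterance `[1, 2, 2, 3]` align with zero total distance, because the warping path simply repeats the middle reference frame.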
In vector quantization (VQ), the speech signal is interpreted as a set of feature vectors that represent the essential characteristics of the speaker. These feature vectors are encoded in a codebook, which is optimized by a training procedure.
Over the last few years, the Gaussian mixture model (GMM) method has become increasingly established among text-independent methods. The GMM method describes a generic model built from multivariate probability densities that can approximate arbitrary densities. This allows a general representation of speaker-dependent spectral shapes. Text-independent speaker verification remains an active field of research, because its low recognition accuracy significantly restricts the adoption of these systems.
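For the vector quantization approach, a speaker's codebook can be trained from that speaker's feature vectors, and a test utterance can then be scored by how well the codebook represents it (the average quantization distortion). This sketch uses plain k-means as a stand-in for codebook training algorithms such as LBG; the function names `train_codebook` and `distortion` are hypothetical.

```python
import numpy as np


def train_codebook(features, size, iterations=20, seed=0):
    """Train a VQ codebook of `size` codewords with plain k-means."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # Initialise codewords with randomly chosen training vectors
    codebook = features[rng.choice(len(features), size, replace=False)]
    for _ in range(iterations):
        # Assign every feature vector to its nearest codeword
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        # Move each codeword to the centroid of the vectors assigned to it
        for k in range(size):
            members = features[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook


def distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())
```

The speaker whose codebook yields the lowest distortion for a test utterance is the best match.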
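A GMM speaker model scores an utterance by the log-likelihood of its feature vectors under a weighted sum of Gaussian densities. This minimal sketch assumes diagonal covariance matrices (a common simplification) and already-trained parameters; training, typically by expectation maximization, is omitted, and the function name `gmm_log_likelihood` is hypothetical.

```python
import numpy as np


def gmm_log_likelihood(features, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    features: (T, d); weights: (K,) summing to 1; means, variances: (K, d).
    """
    x = np.asarray(features, dtype=float)[:, None, :]          # (T, 1, d)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # log N(x_t | mu_k, diag(var_k)) for every frame t and component k -> (T, K)
    log_gauss = -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)
                        + np.sum((x - means) ** 2 / variances, axis=2))
    # log sum_k w_k N(...), computed stably with the log-sum-exp trick
    weighted = log_gauss + np.log(weights)
    peak = weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))
    return float(per_frame.mean())
```

For verification, such a score is typically compared with a threshold, often after subtracting the score of a background model; this is one form of the likelihood normalization listed among the approaches.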