Speech carries information about the identity of the speaker. A speech signal also conveys the language being spoken, the presence and type of speech pathologies, and the physical and emotional state of the speaker. Humans can often extract the identity information when the speech comes from a speaker they are acquainted with.
Lawrence Kersta at Bell Labs took the first major step from speaker verification by humans towards speaker verification by computers in the early 1960s, when he introduced the term voiceprint for a spectrogram generated by a complicated electro-mechanical device. The voiceprint was matched with a verification algorithm based on visual comparison.
Recording the human voice for speaker recognition requires a person to say something. In other words, the person has to show some of his or her speaking behavior. Voice recognition therefore falls into the category of behavioral biometrics. A speech signal is a very complex function of the speaker and his or her environment, and it can be captured easily with a standard microphone. In contrast to a physical biometric technology such as fingerprint recognition, speaker recognition has no fixed, static, or physical characteristics to measure; the only information available depends on an act.
Voice verification is likely to serve as a complementary technique to, for example, finger-scan technology, since many people regard finger scanning as a stronger form of authentication. Voice authentication generally has a high equal error rate (EER) and is therefore not normally used for identification.
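To make the EER figure concrete, the following minimal sketch shows one common way to estimate it from a set of match scores. It assumes that higher scores mean a better match and that a sample is accepted when its score reaches the threshold; the function names and the example scores are hypothetical and do not come from any particular system.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted.

    FAR: fraction of impostor attempts wrongly accepted.
    FRR: fraction of genuine attempts wrongly rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

A lower EER indicates that the genuine and impostor score distributions overlap less. With only a handful of scores the estimate is coarse; a real evaluation would use far more trials.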
In speaker recognition we distinguish between low-level and high-level information. High-level information covers features such as dialect, accent, talking style, and subject matter or context. These features are currently recognized and analyzed only by humans. Low-level information covers features such as pitch period, rhythm, tone, spectral magnitude, frequencies, and bandwidths of an individual's voice. These features are used by speaker recognition systems.
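As an illustration of one low-level feature, the sketch below estimates the pitch period of a short frame with the autocorrelation method, a widely used textbook technique. It is not taken from any specific speaker recognition system. The function names, the 50-400 Hz search range, and the synthetic test tone are all assumptions made for this example.

```python
import math


def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at a given lag (in samples)."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def estimate_pitch_period(frame, sample_rate, min_hz=50.0, max_hz=400.0):
    """Return the estimated pitch period in seconds.

    The 50-400 Hz search range is an assumed range for voiced speech.
    """
    min_lag = int(sample_rate / max_hz)   # shortest period considered
    max_lag = int(sample_rate / min_hz)   # longest period considered
    if max_lag >= len(frame):
        raise ValueError("frame is too short for the requested pitch range")
    # The lag with the strongest self-similarity is taken as the period.
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return best_lag / sample_rate


# Synthetic stand-in for a voiced frame: a 100 Hz tone, 50 ms at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(400)]
period = estimate_pitch_period(frame, SAMPLE_RATE)
```

For the synthetic tone the estimate should come out close to 0.01 s, that is, a pitch of about 100 Hz. Real speech is noisier and only partly voiced, so practical systems typically add voicing detection and smoothing across frames.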
Voice verification works with a microphone or a regular telephone handset, although performance improves with higher-quality capture devices. Hardware costs are very low, because nearly every PC today includes a microphone or can easily have one connected. However, voice recognition has problems with people who are hoarse or who mimic another voice; in such cases the user may not be recognized by the system. The likelihood of recognition also decreases with poor-quality microphones and with background noise. Speech varies over time, so adaptive templates or methods are necessary.
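One simple way to picture an adaptive template is an exponential moving average: after each successful verification, the stored feature vector is nudged toward the newly observed one. The sketch below is only an illustration of that idea under assumed conditions. The function name, the representation of a template as a list of numbers, and the adaptation rate of 0.1 are hypothetical choices and do not describe any specific product.

```python
def update_template(template, features, rate=0.1):
    """Blend a stored template with features from a newly accepted sample.

    rate = 0 keeps the old template unchanged; rate = 1 replaces it
    entirely. The default of 0.1 is an arbitrary illustrative value.
    """
    if len(template) != len(features):
        raise ValueError("template and features must have the same length")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return [(1.0 - rate) * t + rate * f for t, f in zip(template, features)]


# Hypothetical two-dimensional feature vectors, for illustration only.
stored = [1.0, 2.0]
observed = [2.0, 4.0]
adapted = update_template(stored, observed, rate=0.5)
```

Updating only after an accepted attempt limits the risk of the template drifting toward an impostor's voice, at the cost of adapting more slowly to genuine changes.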