Evaluation of Hybrid Face and Voice Recognition Methods for Use as an Effective Identification Mechanism in Biometric Security Systems

Abstract – Biometric identification is an essential tool for securing digital information in industrial, government, commercial, and security applications, including airport security, the finance sector, and the armed forces. Face recognition is a unique problem, and to date there is no single solution applicable to all cases: face recognition is not effective at identifying individuals who are wearing glasses or a hat, or who have a beard. Alternative technologies such as iris and retina scanning need sophisticated equipment that is not financially viable for all applications. Fingerprint scanning and voice recognition methods have lower accuracy; voice recognition in particular is affected by changes in a person's voice, for example due to an illness such as a cold, which render absolute identification inaccurate. The objective of this paper is to propose a biometric method implementing multiple techniques, combining both face recognition and voice recognition into an effective identification tool. An identification process using a combination of two or more biometric methods would make a far more robust security system, greatly reducing the scope for error. A thorough review of the performance accuracy of the several algorithms implemented in face and voice recognition helps us select the best hybrid method. The proposed hybrid biometric identification method can be used in industries or areas that require high security for their data systems.
Keywords – Face Recognition, Voice Recognition, Principal Component Analysis, Eigenfaces, Eigenvectors, Feature Extraction, Feature Matching, Mel Frequency Cepstral Coefficient (MFCC), Dynamic Time Warping (DTW), Support Vector Machines (SVM), Neural Networks.
1. Introduction
Today, biometric identification is an essential tool for securing digital information in industrial applications including airport security, the finance sector, and the armed forces. Face recognition is one of the most common biometric tools used in security applications.

1.1. Face Recognition identifies an individual by comparing features obtained from a live image or photograph against a stored digital copy. The method has gained emphasis due to significant improvements in the required algorithms, the accessibility of facial image databases, and the ease of analyzing the performance of facial recognition algorithms. Widely used algorithms are Principal Component Analysis (PCA) [1][2] and Fisher Linear Discriminant Analysis (FLD), also known as Linear Discriminant Analysis (LDA) [3].

1.2. Voice Recognition, or Speaker Recognition, is a proven biometric method used to identify a person from their voice features and has been developed over the latter half of the 20th century. The method uses variation in acoustic factors to identify an individual's speech pattern, since no two individuals have the same one.
Every individual's speech pattern has unique acoustic factors that depend on physical anatomy, i.e., the shape and size of the individual's mouth and throat, and on communication patterns such as voice pitch, accent, and speaking style. These acoustic factors are the key to speaker identification with this method, and hence speaker recognition is a key behavioral biometric.
Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) are two important techniques used in the analysis of voice recognition algorithms. Linear Predictive Coding (LPC), Hidden Markov Model (HMM), Artificial Neural Network (ANN), and nonparametric algorithms use MFCC features, while the nonlinear sequence alignment algorithm depends on the DTW technique. The objective of this paper is to study the above-mentioned algorithms as biometric identification tools using face or voice recognition. The process of identifying an individual with the aforementioned biometric methods, and the best possible alternatives, are discussed in the following sections.
2. Biometric Identification Process Using Both Face and Voice Recognition
There are three main steps involved in identifying an individual using face or voice recognition methods. Figure 1 shows a flow chart of the biometric identification process.
Figure 1. Flow chart of the biometric identification process using both face and voice recognition.
2.1. Feature Extraction
Features for face recognition are extracted from a 2-dimensional image, based on shape and texture factors. Features for voice recognition are extracted from a 1-dimensional signal, based on factors such as frequency and amplitude. These features are later classified using the algorithms best suited to the differing dimensionality of the two data types. A few classification algorithms are common to both face and voice recognition; these are explained in the following sections.

2.2. Feature Selection
Feature selection is a sequential process used to identify the unique features of a sample individual's data in the face and voice recognition methods. In face recognition, it considers characteristics such as the size and shape of the nose, eyebrows, and lips; in voice recognition, it evaluates the tone, pitch, and frequency of the individual's sample voice.

3. Facial Identification Algorithms
3.1. Principal Component Analysis (PCA): The PCA algorithm uses statistical
methods to identify a new face by generating new parameters: the eigenfaces and eigenvectors of a set of faces. PCA analyzes a new face against the existing dataset of faces by generating a new set of distinct parameters. In human facial recognition, new images are analyzed using eigenfaces, which are collections of eigenvectors. This method of using eigenfaces and eigenvectors for facial identification was established by Sirovich and Kirby (1987) and integrated into face classification techniques by Turk and Pentland (1991). The PCA algorithm provides a practical solution for identifying new faces: it is simple to use and can be implemented quickly, with high learning potential when training the software for efficient performance. Based on the available statistical research, the PCA technique was 88.7% accurate for facial identification in a noisy environment and 94.5% accurate in a non-noisy environment [3].
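The eigenface approach described above can be sketched with scikit-learn's PCA. The random array below is only a stand-in for a real face dataset, and the nearest-neighbour match in eigenface space is a simplified version of the classification step:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a face dataset: 40 "images" of 32x32 pixels,
# each flattened into a 1024-dimensional vector.
rng = np.random.default_rng(0)
faces = rng.random((40, 32 * 32))

# The fitted components_ rows are the eigenfaces (eigenvectors of the
# data covariance matrix), ordered by explained variance.
pca = PCA(n_components=10)
projected = pca.fit_transform(faces)  # each face as 10 coefficients

# A new face is identified by projecting it into eigenface space and
# finding the nearest enrolled projection (Euclidean distance here).
probe = faces[0]  # pretend this probe image belongs to subject 0
coeffs = pca.transform(probe.reshape(1, -1))
distances = np.linalg.norm(projected - coeffs, axis=1)
match = int(np.argmin(distances))
print(match)  # → 0, the enrolled face the probe was taken from
```

A production system would replace the random array with aligned, normalized face images and typically threshold the distance to reject unknown faces.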
3.2. Fisher Linear Discriminant Analysis (FLD): Fisher Linear Discriminant Analysis, or Linear Discriminant Analysis (LDA), is a method applied in statistics, pattern recognition, and machine learning to find a linear combination of parameters that characterizes or separates two or more classes of data or events. The result of this analysis can be used as a linear classifier, or as a dimensionality-reduction step prior to feature classification. The LDA and PCA algorithms are related, as both are linear methods utilizing matrix multiplication and transformations for data recognition. In PCA, the transformation is determined by minimizing the mean-square error between the original data vectors and the data vectors reconstructed from the reduced dimensionality; no class information is used. In LDA, the transformation is determined by maximizing the ratio of between-class variance to within-class variance, in order to lower the data variation within each class and increase the separation between classes. Based on research studies, the performance accuracy of face recognition using the FLD method on a given dataset was determined to be 93.8% for noisy and 95.5% for non-noisy cases [3].

Algorithm   Accuracy (noisy)   Accuracy (non-noisy)
1. PCA      88.7%              94.5%
2. FLD      93.8%              95.8%
Table 01: Accuracy comparison for PCA and FLD.

Overall, it is observed that FLD analysis has a clear advantage over PCA for use in facial recognition classification.
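A minimal sketch of the LDA classification step, using scikit-learn on synthetic two-class data; the two classes stand in for two enrolled subjects, and the means and dimensions are illustrative assumptions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two well-separated synthetic classes of 5-dimensional feature vectors.
rng = np.random.default_rng(1)
class_a = rng.normal(0.0, 1.0, size=(50, 5))
class_b = rng.normal(3.0, 1.0, size=(50, 5))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# LDA chooses the projection that maximizes between-class variance
# relative to within-class variance, then classifies along that axis.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.score(X, y))  # training accuracy; high for separated classes
```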
4. Voice Identification Algorithms
Voice recognition technology depends on two critical factors: digital processing of the speech signal, and the voice recognition algorithms themselves. Voice identification is achieved by analyzing the various features present in the speech obtained for each data sample [5]. Human voice characteristics such as gender and emotional state are also essential to identifying the speaker [6].

Based on research data, voice recognition algorithms typically use the Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) techniques for processing digital voice signals. The steps involved in processing a digital voice signal to obtain voice features are:
a. the human voice is converted into digital signal form to produce digital data;
b. the digitized speech samples are then processed using MFCC to produce voice features;
c. the voice features are passed through DTW to select the pattern that best matches the database.
These two techniques are mainly used to solve the voice recognition problem. The algorithms involve two important phases, a training phase and an operation/testing phase, as researched and analyzed in [8].

Mel-Frequency Cepstral Coefficients (MFCC) Algorithm – This method is used to extract voice features by deriving MFCCs. In
sound processing, the mel-frequency cepstrum (MFC) represents the short-term power spectrum of a sound wave and is defined as a collection of MFC coefficients. These MFCCs are obtained from a type of cepstral representation of the audio clip (a nonlinear "spectrum of a spectrum"). The frequency bands in the MFC spectrum are equally spaced on the mel scale, which closely approximates the response of the human auditory system. This frequency warping enables enhanced identification of sound, for example in audio compression. The process of deriving MFCCs is explained in [4].
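The pipeline just described (framing, power spectrum, mel filterbank, log, DCT) can be sketched with NumPy and SciPy alone. The parameter choices below (26 filters, 13 coefficients, 25 ms frames with a 10 ms hop at 16 kHz) are common defaults, not values prescribed by the cited work:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_filters=26, n_coeffs=13, frame_len=400, hop=160):
    """Minimal MFCC extraction; returns an (n_frames, n_coeffs) array."""
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum of each frame.
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular filterbank with bands equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)

    # Log filterbank energies, then a DCT to decorrelate (the cepstrum);
    # the first n_coeffs coefficients are the MFCCs.
    energies = np.log(np.maximum(power @ fbank.T, 1e-10))
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_coeffs]

# One second of a 440 Hz tone as a toy input signal.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(feats.shape)  # → (98, 13)
```

Libraries such as librosa provide tuned implementations of the same pipeline; this sketch only makes the stages explicit.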
The features extracted using this algorithm are then analyzed using the DTW algorithm for voice matching (feature matching).

Dynamic Time Warping (DTW) – This algorithm is essential for identifying an input voice by matching its voice features against existing databases. The feature matching process is achieved by implementing dynamic programming [9].
In this method, two similar time series, which may vary in time or speed, are compared to identify their common factors. Using this comparison, the sample voice is matched against the existing voice dataset to give the result. The key principle of DTW is to compare two dynamic patterns and measure their similarity by calculating the minimum distance between them [10].
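The minimum-distance computation above can be sketched as the classic dynamic-programming recurrence. This is a minimal scalar-sequence version; a real system applies it frame-by-frame to MFCC vectors:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: cost[i][j] is the minimum cumulative distance
    aligning the first i points of a with the first j points of b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Two similar patterns, one stretched in time: DTW still aligns them
# and reports a small distance despite the differing lengths.
fast = [0.0, 1.0, 2.0, 1.0, 0.0]
slow = [0.0, 0.5, 1.0, 2.0, 2.0, 1.0, 0.0]
print(dtw_distance(fast, slow))  # → 0.5
```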
CLASSIFICATION ALGORITHMS:
a) Support Vector Machines (SVM): Support vector machines are supervised learning models used in machine learning for the analysis of data in classification and regression tasks. An SVM training algorithm builds a non-probabilistic binary linear classifier from a given set of sample data, each sample belonging to one of two classes. An SVM model represents the sample data as points in space, mapped so that the samples of the different classes are delineated by a clear gap.
b) Neural Networks:
Neural networks are an important area of artificial intelligence (AI), developed to mimic the neural responses of the human brain and to provide data structures and algorithms for the learning and classification of data. The day-to-day tasks humans perform naturally, such as recognizing a known person visually or vocally, are complex for a computer system using regular programming methods. When neural network techniques are implemented in a computer program, the system can be trained on examples to create an internal structure of rules for classifying different inputs, in this case facial recognition images.
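Both classifier families can be exercised on the same toy data with scikit-learn. The synthetic features, network size, and kernel choice below are illustrative assumptions, not values from the cited studies:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Toy two-class dataset standing in for extracted biometric features.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)),
               rng.normal(3.0, 1.0, (60, 4))])
y = np.array([0] * 60 + [1] * 60)

# SVM: fits a maximum-margin boundary (RBF kernel, i.e. the kernel trick).
svm = SVC(kernel='rbf').fit(X, y)

# ANN: a small multilayer perceptron trained by backpropagation.
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)

print(svm.score(X, y), ann.score(X, y))  # both near 1.0 on separable data
```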
c) Support Vector Machines (SVM) and Artificial Neural Networks (ANN): Comparison of Performance Accuracy for Facial and Voice Recognition Algorithms

Author                     Algorithm                  Purpose                                                             Accuracy
Uddin [1]                  Neural Networks            Facial expression recognition                                       88.5%
Jixin Li [2]               SVM (Gaussian kernel)      Empirical comparison between SVMs and ANNs for speech recognition   65%
Jixin Li [2]               Neural Networks            Empirical comparison between SVMs and ANNs for speech recognition   83%
RA Nugrahaeni [3]          Support Vector Machine     Comparative analysis for voice recognition                          90%
Jennifer Huang [4]         Support Vector Machine     Face recognition using component-based SVM classification           85%
Omaima N. A. AL-Allaf [5]  Neural Networks            Review of face detection systems based on ANN algorithms            94%
Table 02: Performance accuracy of the different algorithms reviewed in this paper, based on available research data.

SVM and ANN techniques are used in solving the image classification
problem. Initially, the image is divided into many sub-images based on their corresponding image features. The sub-images are classified into the corresponding class using the ANN, and SVM is then used to categorize the results obtained from the ANN [14]. Both Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are also used in intrusion detection, a key component of secure information systems. SVM is a key machine learning algorithm for intrusion detection due to its effective generalization characteristics and its ability to reduce dimensionality. SVM and ANN are essential to the process of finding the key input features when developing an intrusion detection system (IDS): removing redundant inputs simplifies the problem and helps generate an efficient detection process with successful results. SVM- and neural-network-based IDSs provide effective performance using a reduced number of features [11]. SVMs can also successfully achieve non-linear classification using a method called the kernel trick, which implicitly maps their inputs into high-dimensional feature spaces. Supervised learning is not possible when the available data are not labeled.
In this case, an unsupervised learning approach is needed: natural clusterings of the data are identified, and new data are then assigned to these groups. The clustering algorithm that analyzes such data groups and provides a backend to support vector machines is called support vector clustering. This technique is widely used in industrial applications where data are unlabeled, or only partially labeled during presorting for classification.
PROPOSED METHOD
• Using a multi-biometric technique that includes both face recognition and voice recognition would considerably improve the security of highly confidential information.
• The individual biometric techniques do not, on their own, reach the required accuracy.
• Used together, however, the two techniques compensate for each other's weaknesses.

CONCLUSION AND FUTURE WORK
This research is based on using the face and voice recognition techniques together, in what is known as a hybrid face and voice recognition system. For the approach to be practical, the strongest available technologies for both modalities should be used. Implemented together in real-world scenarios where security and confidentiality cannot otherwise be assured, the two techniques form the basis of a multi-biometric system.
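As a closing illustration, the hybrid identification proposed here could take the form of score-level fusion. The matcher scores, weights, and acceptance threshold below are purely hypothetical values for the sketch:

```python
def fuse_scores(face_score, voice_score, w_face=0.6, w_voice=0.4,
                threshold=0.7):
    """Weighted score-level fusion: accept the identity claim only if
    the combined similarity exceeds the threshold. The weights and
    threshold are illustrative, not tuned values."""
    combined = w_face * face_score + w_voice * voice_score
    return combined >= threshold, combined

# Face match is strong but the voice match is borderline (e.g. the user
# has a cold): fusion still accepts, while voice alone might not.
accepted, score = fuse_scores(face_score=0.9, voice_score=0.6)
print(accepted, round(score, 2))  # → True 0.78
```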