Introduction
- Designing a voice database covering five regional accents of Urdu and English spoken in Pakistan.
- Including accent variation in the database so that it can be used to build robust Speaker Recognition Systems (SRS).
- Designing the database specifically to support banking-sector applications in Pakistan.
- Basing the database on a script of questions and answers that bank representatives typically use to authenticate callers, for example when a credit or debit card is lost or stolen, or when a customer has any other query for a bank in Pakistan.
- Testing the KNN, RF, SVM, and ANN algorithms on the resulting dataset.
Related research
Database
A regional voice dataset collection with accent variation
Districts | Accent | Language | Male participants | Female participants | Total participants |
---|---|---|---|---|---|
Gilgit, Hunza, Nagar, Ghizer, Astore, Diamer, Skardu, Ghanche, Shigar, Kharmang | Shina | English and Urdu | 33 | 33 | 66 |
Skardu, Shigar, Kharmang, Ghanche | Balti | English and Urdu | 21 | 21 | 42 |
Gilgit, Hunza, Nagar, Ghizer | Burushiski | English and Urdu | 18 | 18 | 36 |
Hunza, Ghizer | Wakhi | English and Urdu | 9 | 9 | 18 |
Hunza, Ghizer | Khuwar | English and Urdu | 9 | 9 | 18 |
Total participants | | | | | 180 |
Age distribution of the speakers
Design of script for speakers
S. No | Authentication question asked by a phone banking officer | Recorded customer response |
---|---|---|
1 | Asalamualaikum | Asalamualaikum |
2 | What is your Name? | My name is: ——— |
3 | Where are you calling from? | I am calling from: ——— |
4 | What is your father’s name? | My father name is: ——— |
5 | What is your mother’s name? | My mother name is: ——— |
6 | What is your National Identity Card (NIC)? | My NIC number is: ——— |
7 | What is your postal address? | My postal address is: ——— |
8 | What is your mobile number? | My mobile number is: ——— |
9 | Is your mobile registered with this bank? | Yes/No: ——— |
10 | What is your current location? | My current location is: ——— |
11 | What is your account number? | My account number is: ——— |
12 | What is your debit card number? | My debit card number is: ——— |
13 | What is your credit card number? | My credit card number is: ——— |
14 | What is the expiry date of your debit card? | The expiry date of my debit card is: ——— |
15 | What is the expiry date of your credit card? | The expiry date of my credit card is: ——— |
16 | What is the secret code of your debit card? | The security code of my debit card is: ——— |
17 | What is the secret code of your credit card? | The security code of my credit card is: ——— |
18 | What is the expiry date of your NIC? | The expiry date of my NIC card is: ——— |
19 | What is your occupation? | My occupation is: ——— |
20 | What is your Date of Birth? | My Date of Birth is: ———— |
Voice data recording environment and device allocation
Recording sessions
Recording of voice samples
Methodology
Voice samples preprocessing
Features extraction
Classification models
Performance evaluation measurements
Results and discussion
ML model | Parameters |
---|---|
SVM | Kernel = polynomial, Seed = 1 |
RF | Trees = 100, Bag size = 100 |
KNN | K = 3, Distance = Euclidean |
ANN | Learning rate = 0.3, Momentum = 0.2, Epochs = 500, Input nodes = 19, Hidden layers = 1, Hidden units = 10 |
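
To make the KNN row of the parameter table concrete (K = 3, Euclidean distance), the following is a minimal sketch of a 3-nearest-neighbour classifier in plain Python. The function names, the two-dimensional toy vectors, and the labels `class_a` and `class_b` are illustrative assumptions and are not taken from the study; real feature vectors would be longer (the ANN row lists 19 input nodes), and the toolkit actually used by the authors is not stated here.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    # Rank every training sample by its distance to the query vector.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    # Count the labels of the k nearest samples and pick the most frequent one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters of 2-D feature vectors.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.95, 1.00)))
```

With an odd K and two classes a vote can never tie; with more classes, a tie-breaking rule (for example, preferring the nearest neighbour) would have to be chosen explicitly.
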
Performance measure | ANN | SVM | KNN | RF |
---|---|---|---|---|
Accuracy | 88.53% | 85.54% | 86.11% | 85.28% |
RMSE | 0.032 | 0.074 | 0.0384 | 0.0499 |
Precision | 0.889 | 0.845 | 0.869 | 0.841 |
Recall | 0.885 | 0.855 | 0.861 | 0.853 |
F-Measure | 0.886 | 0.835 | 0.862 | 0.861 |
Performance measure | ANN | SVM | KNN | RF |
---|---|---|---|---|
Accuracy | 86.58% | 81.75% | 83.03% | 81.12% |
RMSE | 0.0346 | 0.074 | 0.0424 | 0.0517 |
Precision | 0.869 | 0.829 | 0.840 | 0.800 |
Recall | 0.866 | 0.818 | 0.830 | 0.811 |
F-Measure | 0.865 | 0.817 | 0.821 | 0.813 |
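
The measures reported in the two results tables can be illustrated with a short sketch that derives accuracy, precision, recall, and F-measure from a confusion matrix. It assumes support-weighted averaging across classes; the source does not state the averaging scheme, although the Recall rows match the Accuracy rows, which is what support-weighted averaging produces. The RMSE helper assumes the error is taken between one-hot targets and predicted class probabilities, which is also an assumption. All function names and the toy numbers are hypothetical.

```python
import math


def weighted_scores(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.

    Returns (accuracy, precision, recall, f_measure), where the last three
    are averaged over classes with weights proportional to class support.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    precision = recall = f_measure = 0.0
    for c in range(n):
        tp = confusion[c][c]
        support = sum(confusion[c])                      # true samples of class c
        predicted = sum(confusion[r][c] for r in range(n))  # samples labelled c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return accuracy, precision, recall, f_measure


def rmse(targets, probabilities):
    """Root mean squared error between one-hot targets and predicted probabilities."""
    errors = [(p - t) ** 2
              for t_row, p_row in zip(targets, probabilities)
              for t, p in zip(t_row, p_row)]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical two-class example, not data from the study.
acc, prec, rec, f1 = weighted_scores([[8, 2], [1, 9]])
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f_measure={f1:.3f}")
print(f"rmse={rmse([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]]):.4f}")
```

Under this averaging scheme the weighted recall always equals the accuracy, so the two rows carry the same information and differ only in rounding.
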
Comparison with the other systems in literature
- Rizwan et al. [25] applied an SVM to the TIMIT dataset and achieved a recognition accuracy of 77.8%.
- Danao et al. [27] applied an MLP to their own Philippine dataset and achieved a recognition accuracy of 56.19%.
- Shah et al. [10] applied an MLP to their own Poshtu speakers' dataset and achieved a recognition accuracy of 87.5%.
- Liu et al. [55] developed an MFCC-based, text-independent speaker identification system for access control, using Gaussian Mixture Models (GMM) as the classifier on top of the MFCC features. Their system achieved an overall identification accuracy of 86.87%.