The often daunting task of collecting and manually labelling biometric databases can be a barrier to research. This is especially true for a new or non-established biometric such as footsteps. The availability of very large data sets often plays a role in the research of complex modelling and normalisation algorithms and so an automatic, semi-unsupervised approach to reduce the cost of manual labelling is potentially of immense value.
This paper proposes a novel, iterative and adaptive approach to the automatic labelling of what is thought to be the first large scale footstep database (more than 10,000 examples across 127 persons). The procedure involves the simultaneous collection of a spoken, speaker-dependent password which is used to label the footstep data automatically via a pre-trained speaker recognition system. Subsets of labels are manually checked by listening to the particular password utterance, or viewing the associated talking face; both are recorded with the same time stamp as the footstep sequence.
Experiments to assess the resulting label accuracy, based on manually labelled subsets, suggest that the accuracy of the automatic labelling is better than 0.1%, and thus sufficient to assess a biometric such as footsteps, which is anticipated to have a much higher error rate.