Open Access 2022 | OriginalPaper | Chapter

In-Air Handwriting Recognition Using Acoustic Impulse Signals

Authors: Kai Niu, Fusang Zhang, Xiaolai Fu, Beihong Jin

Published in: Participative Urban Health and Healthy Aging in the Age of AI

Publisher: Springer International Publishing


Abstract

This paper presents AcousticPAD, a contactless and robust handwriting recognition system that extends input and interaction beyond the touchscreen using acoustic signals, which is especially valuable under the impact of the COVID-19 pandemic. To achieve this, we carefully exploit acoustic pulse signals to obtain high-accuracy time of flight (ToF) measurements. We then employ a trilateration localization method to capture the trajectory of in-air handwriting. After that, we incorporate a data augmentation module to enhance handwriting recognition performance. Finally, we customize a back-propagation neural network that leverages the augmented image dataset to train a model and recognize the handwriting characters generated by the acoustic system. We implement an AcousticPAD prototype using cheap commodity acoustic sensors and conduct extensive real-environment experiments to evaluate its performance. The results validate the robustness of AcousticPAD and show that it recognizes 10 digits and 26 English letters with high accuracy.
Notes
Supported by the Project funded by China Postdoctoral Science Foundation (No. 2021TQ0048), the National Natural Science Foundation of China under Grants (No. 62172394, No. 61802373), Open Research Fund Program of Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, and the Youth Innovation Promotion Association, Chinese Academy of Sciences (No. 2020109).

1 Introduction

Nowadays, touchscreen technology [1] is widely used as a way to interact with computer systems. For example, we often use self-ordering screens at KFC and McDonald's, as shown in Fig. 1(a). At Automated Teller Machine (ATM) information kiosks, we use touchscreens to input passwords and withdraw cash. However, as the world is being affected by COVID-19, such contact-based interaction can increase the spread of the disease [2]. A contact-free interaction method would greatly reduce the risk of infection. In this paper, we aim to design a robust contact-free sensing system that leverages cheap acoustic sensors to achieve accurate and robust input through handwriting in the air.
Some existing contact-free handwriting systems require extra customized devices at high cost (e.g., FMCW radar [3, 4] and Lidar [5, 6]). Other work [7–10] utilizes WiFi signals, which have been shown to suffer from severe location-dependence issues. Recent work [11–13] utilizes acoustic signals embedded in smart devices to enable gesture tracking. FingerIO [11] can accurately track a waving hand by transmitting OFDM (Orthogonal Frequency Division Multiplexing) modulated acoustic signals. CAT [14] utilizes acoustic FMCW (Frequency Modulated Continuous Waveform) signals to develop fine-grained motion tracking systems. However, these systems must tackle complex system delays and sampling frequency offsets before use, and accurate distance measurements are difficult to obtain.
In this paper, we propose AcousticPAD, a contact-free handwriting recognition system based on cheap commodity acoustic sensors. In our solution, we place two acoustic sensors at the corners of a surface and transmit acoustic pulse signals (Fig. 1(b)). By simply setting the signal voltage, the system can accurately measure the echo pulse reflected from the user's hand and further estimate the flight time of the signal. Combining the ToF measurements from the two acoustic devices, we track the hand trajectory using the trilateration localization method. To recognize the content of the handwriting input, we need a classification method that addresses the challenges of labor-intensive data collection and of handwriting at different locations and orientations in the air. Therefore, we borrow an existing dataset (MNIST) [15] from the image recognition field and design a data augmentation technique to enhance the data. Such a well-designed recognition process not only reduces the time and effort required to manually collect data but also achieves location- and orientation-independent handwriting recognition. Experimental results demonstrate that our system is able to recognize handwriting of 10 digits and 26 English letters with high accuracy and robustness. Please find our demo video at: https://youtu.be/sCZvK2rUzEU.
The main contributions of the paper are summarized as follows:
  • We propose a novel contactless handwriting recognition approach, which enables surface-drawn interfaces using acoustic pulse signals. Compared with existing approaches employing FMCW [16] or OFDM [11] acoustic signals, the proposed pulse acoustic signals have the advantages of accurate positioning and low energy consumption.
  • We develop a series of signal processing techniques to realize the system. Integrating the existing MNIST dataset with the proposed data augmentation method, we are the first to demonstrate the possibility of using cross-domain training in a contactless sensing system.
  • We implement a prototype handwriting recognition system using cheap commodity acoustic devices and conduct evaluations. Evaluation results show that our system is robust against writing location and orientation, and the average recognition accuracy over 10 digits and 26 letters is greater than 90%.

2 System Design of AcousticPAD

2.1 System Overview

Figure 2 illustrates the design of the proposed AcousticPAD system, which leverages commercial acoustic sensors to transmit and receive pulse signals. AcousticPAD mainly consists of two modules: Real-Time Handwriting Acquisition and Position-Independent Handwriting Recognition. In the Real-Time Handwriting Acquisition module, AcousticPAD collects sound signals from the acoustic sensors and recovers the trajectory of handwriting. The Position-Independent Handwriting Recognition module then leverages the augmented MNIST/EMNIST dataset [15] to train a Back Propagation (BP) neural network and recognize handwritten digits and letters position-independently1.

2.2 Real-Time Handwriting Acquisition

In this subsection, we first introduce the generation of the acoustic pulse signal in our system. We then acquire the handwriting characters in a contactless manner through three steps: distance measurement, character segmentation, and hand tracking.
Trigger to Transmit and Receive Sound Pulse Signal. In AcousticPAD, two HC-SR04 acoustic sensors [17] are employed to transmit and receive the sound pulse signal. The sensor has four pins, i.e., VCC, TRIG, ECHO and GND, all of which are connected to a Raspberry Pi [18]. When AcousticPAD works, the Raspberry Pi supplies 5 V and 0 V to the VCC and GND pins of the sensor, respectively. Once a TTL (Transistor-Transistor Logic) pulse that lasts at least 10 \({\upmu }\text {s}\) is sent to the TRIG pin, the sensor automatically transmits a burst of 8 pulses at 40 kHz and raises the ECHO pin from low-level to high-level voltage. The sensor then monitors the echo-back signal. When the echo-back signal is received, the sensor switches the ECHO pin from high-level back to low-level voltage. Thus the duration of the high-level voltage on the ECHO pin is the time of flight (ToF) of the ultrasonic signal.
Distance Measurement. Suppose the ECHO pin voltage rises (low-level to high-level) at time \(T_s\) and falls (high-level to low-level) at time \(T_f\); the ToF of the ultrasonic signal is then the difference between \(T_f\) and \(T_s\). Thus the distance from the sensor to the user's hand can be denoted as:
$$\begin{aligned} d = \frac{c \cdot (T_f-T_s)}{2} \end{aligned}$$
(1)
where c is the speed of ultrasound signal in the air. For both acoustic sensors, the distances can be measured as the input of subsequent steps.
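As a concrete illustration of the trigger/echo cycle and Eq. (1), the following is a minimal Python sketch for the Raspberry Pi; the GPIO pin numbers and the speed-of-sound constant are our own illustrative assumptions, not values from the paper.

```python
import time
import RPi.GPIO as GPIO

TRIG_PIN, ECHO_PIN = 23, 24   # hypothetical BCM pin assignments
SPEED_OF_SOUND = 343.0        # m/s in air at ~20 C (assumed value of c)

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG_PIN, GPIO.OUT)
GPIO.setup(ECHO_PIN, GPIO.IN)

def measure_distance():
    """Run one HC-SR04 ranging cycle and return the distance in meters."""
    # Send a >= 10 us TTL pulse to TRIG; the sensor then emits
    # its 8-pulse burst at 40 kHz and raises ECHO.
    GPIO.output(TRIG_PIN, GPIO.HIGH)
    time.sleep(10e-6)
    GPIO.output(TRIG_PIN, GPIO.LOW)

    # T_s: time when ECHO rises; T_f: time when ECHO falls (echo received).
    t_s = t_f = time.time()
    while GPIO.input(ECHO_PIN) == 0:
        t_s = time.time()
    while GPIO.input(ECHO_PIN) == 1:
        t_f = time.time()

    tof = t_f - t_s                     # time of flight (round trip)
    return SPEED_OF_SOUND * tof / 2.0   # Eq. (1): d = c * (T_f - T_s) / 2
```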
Real-Time Character Segmentation. With accurate distance measurements, we segment consecutive handwriting inputs in real time. As shown in Fig. 3, when the user's hand appears in the sensing area, the distances measured by the two sensors, denoted \(d_1\) and \(d_2\), decrease. This is because, without a sensing target, the sensors measure long distances to objects in the environment outside the sensing area. After completing a character input, the measured distances increase once the user's hand leaves the sensing area. Thus we can set a threshold based on the sensing area to detect whether the user's hand is inside it, and thereby segment the handwriting in real time. When the distances from both acoustic sensors are less than the threshold, AcousticPAD starts to localize the user's hand and track the handwriting trajectory. Otherwise, the user has finished the input, and AcousticPAD outputs the tracking result and feeds it to the recognition module, as sketched below.
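The sketch below expresses this threshold test as a simple loop in Python; THRESHOLD and the per-sensor read functions are hypothetical placeholders, since the paper does not report a threshold value.

```python
THRESHOLD = 0.6  # meters; illustrative sensing-area bound, not from the paper

def segment_characters(read_d1, read_d2):
    """Yield one list of (d1, d2) distance samples per handwritten character.

    read_d1/read_d2: callables returning the latest distance from each sensor.
    """
    stroke = []
    while True:
        d1, d2 = read_d1(), read_d2()
        if d1 < THRESHOLD and d2 < THRESHOLD:
            stroke.append((d1, d2))   # hand inside the sensing area: keep tracking
        elif stroke:
            yield stroke              # hand left the area: one character finished
            stroke = []
```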
Hand Tracking. We design a trilateration localization approach to localize the user's hand and track the handwriting trajectory. As shown in Fig. 3, the two acoustic sensors are deployed at the corners of the input panel (e.g., a table surface). The distance s between the two sensors is set in advance. With the distance measurements from hand to sensors, the three edges of the triangle OAB formed by the sensors and the user's hand are known. According to the law of cosines [19], the angle \(\theta \) of the user's hand with respect to OA satisfies:
$$\begin{aligned} \theta = \arccos \frac{s^2+d_1^2-d_2^2}{2sd_1} \end{aligned}$$
(2)
Thus the position of the user's hand is:
$$\begin{aligned} \left\{ \begin{array}{lr} x=d_1 \cos \theta \\ y=d_1 \sin \theta \end{array} \right. \end{aligned}$$
(3)
For each distance measurement, AcousticPAD obtains one position of the user's hand. By successively connecting all the discrete positions recorded during input, we acquire the handwriting performed by the user. Before feeding the handwriting into the recognition module, we employ a Savitzky-Golay filter [20] to smooth it. The filter uses the linear least-squares method to fit successive subsets of adjacent positions with a polynomial. After filtering out the noise, we obtain the final handwriting trajectory as the input of the recognition module.
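Putting Eqs. (2) and (3) together with the smoothing step, a minimal Python sketch of the tracking stage might look as follows; the Savitzky-Golay window length and polynomial order are illustrative choices (the paper does not report them), and SciPy is assumed to be available.

```python
import numpy as np
from scipy.signal import savgol_filter

def track_hand(samples, s):
    """Convert (d1, d2) distance pairs into a smoothed (x, y) trajectory.

    samples: (d1, d2) pairs from one segmented character; s: sensor spacing.
    """
    xs, ys = [], []
    for d1, d2 in samples:
        # Eq. (2): angle of the hand w.r.t. OA, via the law of cosines.
        cos_theta = (s**2 + d1**2 - d2**2) / (2 * s * d1)
        theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # clip guards against noise
        # Eq. (3): convert the polar coordinates (d1, theta) to Cartesian.
        xs.append(d1 * np.cos(theta))
        ys.append(d1 * np.sin(theta))
    # Savitzky-Golay smoothing; window length 11 and cubic fit are assumed.
    x = savgol_filter(np.asarray(xs), window_length=11, polyorder=3)
    y = savgol_filter(np.asarray(ys), window_length=11, polyorder=3)
    return np.stack([x, y], axis=1)
```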

2.3 Position-Independent Handwriting Recognition

To achieve position-independent handwriting recognition, we divide the recognition module into an offline phase, which trains a cross-domain model with datasets from the existing image recognition field, and an online phase, which recognizes the handwriting generated by the contactless acoustic system.
Offline Phase: Different from existing work that must collect a large number of samples to build a dataset and train a classification model, AcousticPAD leverages existing handwriting datasets, i.e., MNIST and EMNIST [15], as the training data without using any data generated by the acoustic sensing system, which requires zero data collection effort. We notice that the data samples in MNIST have a specific style. If we directly use the original datasets to train a model, the model is position dependent and generalizes poorly. To solve this problem, we propose a data augmentation approach that enhances the original datasets for wide applicability. Specifically, we first transform the character image samples with different rotation angles. Assuming the orientation of the original image samples is \(0^\circ \), the images are duplicated and rotated from \(-90^\circ \) (clockwise) to \(90^\circ \) (anticlockwise) with a step size of \(15^\circ \). Secondly, we shift the image samples in both the vertical and horizontal directions with a step size of 14 pixels. After these operations, the augmented dataset is \(65\times \) larger than the original one and contains image samples at different positions.
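A minimal sketch of this augmentation is shown below. The 13 rotation angles follow directly from the text; we read the \(65\times \) factor as 13 rotations combined with 5 translations (the identity plus a \(\pm 14\)-pixel shift along each axis), which matches the stated factor but is our own interpretation.

```python
import numpy as np
from scipy.ndimage import rotate, shift

ANGLES = range(-90, 91, 15)  # 13 rotation angles: -90, -75, ..., 90 degrees
SHIFTS = [(0, 0), (14, 0), (-14, 0), (0, 14), (0, -14)]  # 5 assumed translations

def augment(image):
    """Return the 65 augmented copies of one 28x28 character image."""
    out = []
    for angle in ANGLES:
        rotated = rotate(image, angle, reshape=False, order=1)
        for dy, dx in SHIFTS:
            out.append(shift(rotated, (dy, dx), order=1, cval=0.0))
    return np.stack(out)  # shape: (65, 28, 28)
```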
As shown in Fig. 2, we use the augmented dataset to train a three-layer BP neural network, which includes an input layer, a hidden layer and an output layer. Since each image is \(28\times 28\) pixels, we flatten it into a \(1\times 784\) vector as input; thus the number of nodes in the input layer is 784. We set the number of nodes in the hidden layer to 500. The number of nodes in the output layer is 10 for digit recognition and 26 for letter recognition. During training, the image data are propagated forward from the input layer through the hidden layer to the output layer, and we calculate the error between the output value and the expected value at the output layer. The backward propagation of errors is then applied to update the connection weights with a learning rate of 0.1. In our case, all the samples in the augmented datasets are used to train the model, while the data collected by the AcousticPAD system are used for testing. Benefiting from the data augmentation, the trained model is robust against position changes and can achieve position-independent handwriting recognition.
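For concreteness, a minimal numpy sketch of such a 784-500-K back-propagation network is given below; the paper fixes the layer sizes and the 0.1 learning rate, while the sigmoid activations, squared-error loss and weight initialization are our assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNet:
    def __init__(self, n_in=784, n_hidden=500, n_out=10, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.05, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.05, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        """Forward pass for one flattened 1x784 sample."""
        self.h = sigmoid(x @ self.W1 + self.b1)       # hidden activations
        self.o = sigmoid(self.h @ self.W2 + self.b2)  # output activations
        return self.o

    def backward(self, x, target):
        """Propagate the output error backwards and update the weights."""
        delta_o = (self.o - target) * self.o * (1 - self.o)   # output-layer error
        delta_h = (delta_o @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= self.lr * np.outer(self.h, delta_o)
        self.b2 -= self.lr * delta_o
        self.W1 -= self.lr * np.outer(x, delta_h)
        self.b1 -= self.lr * delta_h

# Usage: net = BPNet(n_out=10); net.forward(x); net.backward(x, one_hot_label)
```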
Online Phase: In the online phase, AcousticPAD takes the handwriting trajectory generated by the acoustic sensing system as input and leverages the model trained in the offline phase to classify the handwritten characters. AcousticPAD employs a camera to record real-time video of the user's handwriting as ground truth. We develop a web-based front end: after the user performs handwriting, the corresponding recognition result is displayed on the web page. AcousticPAD automatically segments successive characters so that the user can continuously input numbers or letters with the system.
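The online step thus reduces to rasterizing a tracked trajectory into the 28x28 input format of the model; the sketch below shows one hypothetical way to do this, with the normalization and binary-pixel choices assumed rather than taken from the paper.

```python
import numpy as np

def rasterize(traj, size=28):
    """Map an (N, 2) trajectory onto a size x size binary image."""
    xy = traj - traj.min(axis=0)                 # move the trajectory to the origin
    xy = xy / max(xy.max(), 1e-9) * (size - 1)   # scale into the image grid
    img = np.zeros((size, size))
    for x, y in xy:
        img[size - 1 - int(round(y)), int(round(x))] = 1.0  # flip y for image rows
    return img

# Classification with the trained network from the offline phase:
# pred = np.argmax(net.forward(rasterize(trajectory).reshape(-1)))
```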

3 Evaluation

3.1 Experiment Setup

We have implemented AcousticPAD with two cheap commodity acoustic sensors (HC-SR04 [17], $5 per unit) and a Raspberry Pi 4B module [18]. The acoustic sensors transmit/receive acoustic pulse signals at 40 kHz. The operating voltage and current are DC 5 V and 15 mA, respectively. The trigger input signal is a 10 \({\upmu }\text {s}\) TTL pulse. We develop a web-based user interface to demonstrate the handwriting recognition in real time, as shown in Fig. 4. The demo video can be found at: https://youtu.be/sCZvK2rUzEU.

3.2 Performance Evaluation

Hand Tracking. We first evaluate the hand tracking results. A volunteer performs hand gestures for the 10 digits (0–9) and 26 letters (A–Z) in front of a table, with the two small acoustic devices placed at the table corners. Figure 5 shows examples of all the digit and letter trajectories captured by our contactless handwriting system. It can be seen that our system is able to track the hand positions in detail while writing all the digits and letters. Across different angles of straight-line strokes and curves, the obtained trajectories closely match the shapes of the characters.
Character Recognition Accuracy. We then evaluate the character recognition accuracy. We let 10 participants perform handwriting experiments. For each character, each participant repeats the input 3 times at 5 different positions and 5 orientations, which yields a set of 27,000 samples in total (10 participants and 36 characters). We use the augmented MNIST dataset to train the BP neural network, and all the data collected by the acoustic sensing system are used to test the recognition performance. Figure 6 shows the confusion matrix achieved by AcousticPAD for each character. The overall recognition accuracies for digits and letters are 92% and 90.3%, respectively. The accuracy of some characters, for example '3' and 'Z', is relatively low (about 87%); this might be because users perform '3' and 'Z' with irregular handwriting styles that make them easier to confuse with other characters.
Position Independent Recognition. To demonstrate the position-independent capability, we conduct handwriting of all characters at 5 different locations and 5 orientations, and compare the performance with a BP neural network model [21] trained without data augmentation. Figure 7 shows the accuracy comparison of the two methods. Our method achieves above 86% accuracy across all five locations and orientations. Compared with the model without data augmentation, our method improves accuracy by more than 32% on average, which further demonstrates the robustness of the AcousticPAD system.

4 Conclusion

In this paper, we propose AcousticPAD, a contactless handwriting recognition system that employs acoustic pulse signals to enable input interactions with cheap acoustic sensors. AcousticPAD captures acoustic signals as the hand moves over the table surface, and then tracks the handwriting trajectory to recognize the characters. We have implemented AcousticPAD as a real-time recognition system and conducted comprehensive experiments to validate its effectiveness and robustness. Our results demonstrate high input recognition accuracy across different users, with position-independent ability.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Footnotes
1
Position independent means that the user can perform the handwriting without dependence on location or orientation.
 
Literature
1. Polancos, R.V., Ruiz, J.M.B., Subang, E.A.I.: User experience study on touchscreen technology: a case study on automated payment machines. In: 2020 IEEE 7th International Conference on Industrial Engineering and Applications (ICIEA), pp. 710–714 (2020)
2. Yi, C., Yang, Q., Scoglio, C.: Understanding the effects of the direct contacts and the indirect contacts on the epidemic spreading among beef cattle farms in Southwest Kansas. BioRxiv (2020)
3. Wang, Y., Ren, A., Zhou, M., Wang, W., Yang, X.: A novel detection and recognition method for continuous hand gesture using FMCW radar. IEEE Access 8, 167264–167275 (2020)
4. Zhang, Z., Tian, Z., Zhou, M.: Latern: dynamic continuous hand gesture recognition using FMCW radar sensor. IEEE Sens. J. 18(8), 3278–3289 (2018)
5. Jiang, F., Zhang, S., Wu, S., Gao, Y., Zhao, D.: Multi-layered gesture recognition with Kinect. J. Mach. Learn. Res. 16(1), 227–254 (2015)
6. Zhang, L., et al.: BoMW: bag of manifold words for one-shot learning gesture recognition from Kinect. IEEE Trans. Circuits Syst. Video Technol. 28(10), 2562–2573 (2018)
7. Niu, K., et al.: WiMorse: a contactless Morse code text input system using ambient WiFi signals. IEEE Internet Things J. 6(6), 9993–10008 (2019)
8. Wu, D., et al.: FingerDraw: sub-wavelength level finger motion tracking with WiFi signals. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4(1), 1–27 (2020)
9. Niu, K., Zhang, F., Wang, X., Lv, Q., Luo, H., Zhang, D.: Understanding WiFi signal frequency features for position-independent gesture sensing. IEEE Trans. Mob. Comput. (2021)
10. Niu, K., Wang, X., Zhang, F., Zheng, R., Yao, Z., Zhang, D.: Rethinking Doppler effect for accurate velocity estimation with commodity WiFi devices. IEEE J. Sel. Areas Commun. (2022)
11. Nandakumar, R., Iyer, V., Tan, D., Gollakota, S.: FingerIO: using active sonar for fine-grained finger tracking. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 1515–1525 (2016)
12. Wang, W., Liu, A.X., Sun, K.: Device-free gesture tracking using acoustic signals. In: Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking (MobiCom 2016), pp. 82–94 (2016)
13. Wu, K., Yang, Q., Yuan, B., Zou, Y., Ruby, R., Li, M.: EchoWrite: an acoustic-based finger input system without training. IEEE Trans. Mob. Comput. 20(5), 1789–1803 (2021)
14. Mao, W., He, J., Qiu, L.: CAT: high-precision acoustic motion tracking. In: Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking, pp. 69–81 (2016)
15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
16. Cai, C., Pu, H., Hu, M., Zheng, R., Luo, J.: SST: software sonic thermometer on acoustic-enabled IoT devices. IEEE Trans. Mob. Comput. (2020)
18. Halfacree, G.: Raspberry Pi 4 now comes with 2 GB RAM minimum. MagPi 91, 6–8 (2020)
19. Pickover, C.A.: The Math Book: From Pythagoras to the 57th Dimension, 250 Milestones in the History of Mathematics. Sterling Publishing Company Inc (2009)
20. Schafer, R.: What is a Savitzky-Golay filter? [lecture notes]. IEEE Signal Process. Mag. 28(4), 111–117 (2011)
21. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2012)
DOI
https://doi.org/10.1007/978-3-031-09593-1_25
