1 Introduction
2 Related work on social touch recognition
2.1 Touch surface and sensors
Paper | Touch surface | Sensor(s) | Touch recognition of... | n | Classifier | Design | Accuracy |
---|---|---|---|---|---|---|---|
Altun and MacLean [1] | Haptic Creature | Force sensing resistors, accelerometer | 26 Gestures | 31 | Random forest | Between-subjects | \(33~\%\) |
Altun and MacLean [1] | Haptic Creature | Force sensing resistors, accelerometer | 9 Emotions | 31 | Random forest | Between-subjects | \(36~\%\) |
Altun and MacLean [1] | Haptic Creature | Force sensing resistors, accelerometer | 9 Emotions | 31 | Random forest | Within-subjects | \(48~\%\) |
Altun and MacLean [1] | Haptic Creature | Force sensing resistors, accelerometer | 9 Emotions using gesture recog. | 31 | Random forest | Between-subjects | \(36~\%\) |
Bailenson et al. [2] | Force-feedback joystick | 2d accelerometer | 7 Emotions | 16 | Classification by human | 1 Subject rates 1 other | \(33~\%\) |
Bailenson et al. [2] | Force-feedback joystick | 2d accelerometer | 7 Emotions | 16 | SVM\(^a\) RBF\(^b\) kernel | Between-subjects | \(36~\%\) |
Bailenson et al. [2] | Other subject’s hand | / | 7 Emotions | 16 | Classification by human | 1 Subject rates 1 other | \(51~\%\) |
Chang et al. [5] | Haptic Creature | Force sensing resistors | 4 Gestures | 1 | Custom recognition software | Real-time | Up to \(77~\%\) |
Cooney et al. [6] | Sponge (humanoid) robot | Accelerometer, gyro sensor | 13 Full-body gestures | 21 | SVM\(^a\) RBF\(^b\) kernel | Between-subjects | \(77~\%\) |
Cooney et al. [7] | Humanoid robot ‘mock-up’ | Photo-interrupters | 20 Full-body gestures | 17 | k-NN\(^c\) | Between-subjects | \(63~\%\) |
Cooney et al. [7] | Humanoid robot ‘mock-up’ | Photo-interrupters | 20 Full-body gestures | 17 | SVM\(^a\) RBF\(^b\) kernel | Between-subjects | \(72~\%\) |
Cooney et al. [7] | Humanoid robot ‘mock-up’ | Microsoft Kinect | 20 Full-body gestures | 17 | k-NN\(^c\) | Between-subjects | \(67~\%\) |
Cooney et al. [7] | Humanoid robot ‘mock-up’ | Microsoft Kinect | 20 Full-body gestures | 17 | SVM\(^a\) RBF\(^b\) kernel | Between-subjects | \(78~\%\) |
Cooney et al. [7] | Humanoid robot ‘mock-up’ | Photo-interrupters, Microsoft Kinect | 20 Full-body gestures | 17 | k-NN\(^c\) | Between-subjects | \(82~\%\) |
Cooney et al. [7] | Humanoid robot ‘mock-up’ | Photo-interrupters, Microsoft Kinect | 20 Full-body gestures | 17 | SVM\(^a\) RBF\(^b\) kernel | Between-subjects | \(91~\%\) |
Flagg et al. [9] | Furry lap pet | Conductive fur sensor, piezoresistive fabric pressure sensor | 9 Gestures | 16 | Neural network | Between-subjects | \(75~\%\) |
Flagg et al. [9] | Furry lap pet | Conductive fur sensor, piezoresistive fabric pressure sensor | 9 Gestures | 16 | Logistic regression | Between-subjects | \(72~\%\) |
Flagg et al. [9] | Furry lap pet | Conductive fur sensor, piezoresistive fabric pressure sensor | 9 Gestures | 16 | Bayes network | Between-subjects | \(68~\%\) |
Flagg et al. [9] | Furry lap pet | Conductive fur sensor, piezoresistive fabric pressure sensor | 9 Gestures | 16 | Random forest | Between-subjects | \(86~\%\) |
Flagg et al. [9] | Furry lap pet | Conductive fur sensor, piezoresistive fabric pressure sensor | 9 Gestures | 16 | Random forest | Within-subjects | \(94~\%\) |
Flagg et al. [10] | Fur sensor | Conductive fur sensor | 3 Gestures | 7 | Linear regression | Between-subjects | \(82~\%\) |
Ji et al. [20] | KASPAR (hand section) | Capacitive pressure sensors | 4 Gestures | 1 | SVM\(^a\) intersection kernel | Within-subject | Up to \(96~\%\) |
Ji et al. [20] | KASPAR (hand section) | Capacitive pressure sensors | 4 Gestures | 1 | SVM\(^a\) RBF\(^b\) kernel | Within-subject | Up to \(93~\%\) |
Jung [22] | Mannequin arm | Piezoresistive fabric pressure sensors | 14 Gestures | 31 | Bayesian classifier | Subject-independent | \(53~\%\) |
Jung [22] | Mannequin arm | Piezoresistive fabric pressure sensors | 14 Gestures | 31 | SVM\(^a\) linear kernel | Subject-independent | \(46~\%\) |
Jung et al. [23] | Mannequin arm | Piezoresistive fabric pressure sensors | 14 Rough gestures | 31 | Bayesian classifier | Subject-independent | \(54~\%\) |
Jung et al. [23] | Mannequin arm | Piezoresistive fabric pressure sensors | 14 Rough gestures | 31 | SVM\(^a\) linear kernel | Subject-independent | \(53~\%\) |
Kim et al. [26] | KaMERo | Charge-transfer touch sensors, accelerometer | 4 Gestures | 12 | Temporal decision tree | Real-time | \(83~\%\) |
Knight et al. [27] | Sensate bear | Electric field sensor, capacitive sensors | 4 Gestures | 11 | Bayesian networks + k-NN\(^c\) | Real-time | 20–\(100~\%\) |
Nakajima et al. [29] | Emoballoon | Barometric pressure sensor, microphone | 6 Gestures + ‘no touch’ | 9 | SVM\(^a\) RBF\(^b\) kernel | Between-subjects | \(75~\%\) |
Nakajima et al. [29] | Emoballoon | Barometric pressure sensor, microphone | 6 Gestures + ‘no touch’ | 9 | SVM\(^a\) RBF\(^b\) kernel | Within-subjects | \(84~\%\) |
Naya et al. [30] | Sensor sheet | Pressure-sensitive conductive ink | 5 Gestures | 11 | k-NN\(^c\) + Fisher’s linear discriminant | Between-subjects | \(87~\%\) |
Silvera-Tawil et al. [31] | Sensor sheet | Pressure sensing based on EIT\(^d\) | 6 Gestures | 1 | Logitboost algorithm | Within-subject | \(91~\%\) |
Silvera-Tawil et al. [31] | Sensor sheet | Pressure sensing based on EIT\(^d\) | 6 Gestures | 35 | Logitboost algorithm | Between-subjects | \(74~\%\) |
Silvera-Tawil et al. [31] | Experimenter’s back | / | 6 Gestures | 35 | Classification by human | Between-subjects | \(86~\%\) |
Silvera-Tawil et al. [32] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 8 Gestures + ‘no touch’ | 2 | Logitboost algorithm | Within-subjects | \(88~\%\) |
Silvera-Tawil et al. [32] | Experimenter’s arm | / | 8 Gestures | 2 | Classification by human | Within-subjects | \(75~\%\) |
Silvera-Tawil et al. [32] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 8 Gestures + ‘no touch’ | 40 | Logitboost algorithm | Subject-independent | \(71~\%\) |
Silvera-Tawil et al. [32] | Other subject’s arm | / | 8 Gestures | 40 | Classification by human | 1 Subject rates 1 other | \(90~\%\) |
Silvera-Tawil et al. [33] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 6 Emotions + ‘no touch’ | 2 | Logitboost algorithm | Within-subjects | \(88~\%\) |
Silvera-Tawil et al. [33] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 6 Social messages + ‘no touch’ | 2 | Logitboost algorithm | Within-subjects | \(84~\%\) |
Silvera-Tawil et al. [33] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 6 Emotions + ‘no touch’ | 2 | Logitboost algorithm | Between-subjects | \(32~\%\) |
Silvera-Tawil et al. [33] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 6 Social messages + ‘no touch’ | 2 | Logitboost algorithm | Between-subjects | \(51~\%\) |
Silvera-Tawil et al. [33] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 6 Emotions + ‘no touch’ | 42 | Logitboost algorithm | Subject-independent | \(47~\%\) |
Silvera-Tawil et al. [33] | Other subject’s arm | / | 6 Emotions | 42 | Classification by human | 1 Subject rates 1 other | \(52~\%\) |
Silvera-Tawil et al. [33] | Mannequin arm | Pressure sensing based on EIT\(^d\), force sensor | 6 Social messages + ‘no touch’ | 42 | Logitboost algorithm | Subject-independent | \(50~\%\) |
Silvera-Tawil et al. [33] | Other subject’s arm | / | 6 Social messages | 42 | Classification by human | 1 Subject rates 1 other | \(62~\%\) |
Stiehl et al. [35] | The Huggable (arm section) | Electric field sensor, force sensors, thermistors | 8 Gestures (disregarding ‘slap’) | 1 | Neural network | Within-subject | \(79~\%\) |
van Wingerden et al. [38] | Mannequin arm | Piezoresistive fabric pressure sensors | 14 Rough gestures | 31 | Neural network | Between-subjects | \(64~\%\) |

\(^a\) SVM support vector machine, \(^b\) RBF radial basis function, \(^c\) k-NN k-nearest neighbors, \(^d\) EIT electrical impedance tomography
|
2.2 Touch recognition
3 CoST: corpus of social touch
3.1 Touch gestures
Gesture label | Gesture definition |
---|---|
Grab | Grasp or seize the arm suddenly and roughly |
Hit | Deliver a forcible blow to the arm with either a closed fist or the side or back of your hand |
Massage | Rub or knead the arm with your hands |
Pat | Gently and quickly touch the arm with the flat of your hand |
Pinch | Tightly and sharply grip the arm between your fingers and thumb |
Poke | Jab or prod the arm with your finger |
Press | Exert a steady force on the arm with your flattened fingers or hand |
Rub | Move your hand repeatedly back and forth on the arm with firm pressure |
Scratch | Rub the arm with your fingernails |
Slap | Quickly and sharply strike the arm with your open hand |
Squeeze | Firmly press the arm between your fingers or both hands |
Stroke | Move your hand with gentle pressure over the arm, often repeatedly |
Tap | Strike the arm with a quick light blow or blows using one or more fingers |
Tickle | Touch the arm with light finger movements |
3.2 Pressure sensor grid
3.3 Data acquisition
3.3.1 Setup
3.3.2 Procedure
3.3.3 Participants
3.4 Data preprocessing
Variant | Gentle | Normal | Rough | All |
---|---|---|---|---|
Mean pressure (g/cm\(^2\)) | 115 (61) | 136 (82) | 189 (157) | 147 (112) |
Max pressure (g/cm\(^2\)) | 894 (511) | 1260 (629) | 1983 (813) | 1379 (802) |
Contact area (% of sensor) | .21 (.16) | .22 (.18) | .26 (.21) | .23 (.19) |
Duration (ms) | 1385 (1303) | 1377 (1257) | 1500 (1351) | 1421 (1305) |
Gesture | Grab | Hit | Massage | Pat | Pinch | Poke | Press | Rub | Scratch | Slap | Squeeze | Stroke | Tap | Tickle |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mean pressure (g/cm\(^2\)) | 349 (191) | 101 (32) | 172 (77) | 100 (33) | 126 (45) | 95 (27) | 188 (99) | 131 (45) | 106 (28) | 95 (30) | 286 (180) | 116 (35) | 92 (30) | 96 (26) |
Max pressure (g/cm\(^2\)) | 1774 (919) | 1643 (854) | 1621 (800) | 1057 (568) | 1701 (892) | 1258 (793) | 1660 (802) | 1282 (671) | 1064 (524) | 1165 (557) | 1980 (946) | 1135 (623) | 1055 (610) | 911 (497) |
Contact area (% of sensor) | .59 (.17) | .15 (.05) | .36 (.20) | .15 (.07) | .12 (.08) | .08 (.08) | .23 (.16) | .21 (.10) | .17 (.08) | .15 (.06) | .47 (.24) | .20 (.08) | .12 (.07) | .18 (.09) |
Duration (ms) | 1373 (715) | 337 (403) | 3538 (1898) | 709 (753) | 1132 (597) | 650 (502) | 1181 (608) | 2170 (1142) | 2205 (1268) | 321 (462) | 1502 (813) | 1722 (829) | 564 (486) | 2491 (1446) |
3.5 Descriptive statistics
3.6 Self reports
4 Recognition of social touch gestures
Variant | Gentle | Normal | Rough | All |
---|---|---|---|---|
Male | ||||
Mean pressure (g/cm\(^2\)) | 117 (63) | 137 (85) | 193 (163) | 149 (117) |
Max pressure (g/cm\(^2\)) | 885 (518) | 1245 (629) | 1981 (828) | 1370 (811) |
Contact area (% of sensor) | .21 (.16) | .22 (.18) | .27 (.22) | .23 (.19) |
Duration (ms) | 1358 (1296) | 1349 (1249) | 1491 (1357) | 1399 (1303) |
Female | ||||
Mean pressure (g/cm\(^2\)) | 112 (50) | 130 (72) | 175 (133) | 139 (96) |
Max pressure (g/cm\(^2\)) | 925 (485) | 1310 (624) | 1990 (763) | 1409 (772) |
Contact area (% of sensor) | .20 (.15) | .21 (.17) | .24 (.20) | .21 (.17) |
Duration (ms) | 1477 (1325) | 1476 (1281) | 1528 (1330) | 1494 (1312) |
4.1 Feature extraction
- Mean pressure is the mean over channels and time (1).
- Maximum pressure is the maximum value over channels and time (2).
- Pressure variability is the mean over time of the sum over channels of the absolute difference between two consecutive frames (3).
- Mean pressure per row is the mean over columns and time, resulting in one feature per row; the rows run along the mannequin arm’s length (from top to bottom, 4–11).
- Mean pressure per column is the mean over rows and time, resulting in one feature per column; the columns run along the mannequin arm’s width (from left to right, 12–19).
- Contact area per frame is the fraction of channels with a value above \(50~\%\) of the maximum value. Mean contact area is the mean over time of the contact area (20) and maximum pressure contact area is the contact area of the frame with the highest mean pressure over channels (21). The size of the contact area indicates whether the whole hand was used for a touch gesture, as would be expected for grab, or, for example, only one finger, as would be expected for a poke.
- Temporal peak count indicates how many times there was a significant increase in pressure level, that is, whether a touch gesture consisted of continuous touch contact, as would be expected for grab, or of alternating pressure levels, as would be expected for a tickle. One feature counts the number of frames for which the average pressure of a frame was larger than that of its neighboring frames (22). (This feature replaced the previous version of feature 22 from [22, 23].) The other feature was calculated as the number of positive crossings of a threshold, where the threshold was the mean over time of the pressure summed over all channels (23).
- Traveled distance (previously called ‘displacement’ in [22, 23]) indicates the amount of movement of the hand across the contact area. For example, for a squeeze less movement across the sensor grid would be expected than for a stroke. The center of mass (i.e., the average channel weighted by pressure) was used to calculate the movement on the contact surface in both the row and column direction. Two features were calculated in the row direction: the mean traveled distance of the center of mass over time (24) and the summed absolute difference of the center of mass over time (25). The same features were calculated for the column direction (26–27).
- Duration of the gesture, measured in frames (28).
- Pressure distribution (previously called ‘histogram-based features’ in [38]) is the normalized histogram over all channels and time of the pressure values. The histogram contains eight bins equally spaced between 0 and 1023 (29–36).
- Spatial peaks (previously called ‘motion-based features’ in [38]). A spatial peak in a frame is a local maximum with a value higher than 0.75 of the maximum pressure (see feature 2). The following features were derived from the spatial peaks: the mean (37) and variance (38) over time of the number of spatial peaks per frame; the mean over all spatial peaks and time of the distance of each spatial peak to the center of mass (39); and the mean over time and spatial peaks of the change in distance of each peak w.r.t. the center of mass (40).
- Derivatives were calculated as the mean absolute pressure differences within the rows and columns between frames. Features were derived from the mean over time and rows or columns of these values (41–42). The mean absolute pressure difference over all channels was also calculated; the last feature is the mean over time and channels (43).
- Variance over channels and time (44).
- Direction of movement indicates the angle in which the center of mass was moving between frames. These angle values were divided into quadrants of \(90{^{\circ }}\) each. For example, if the hand moves from the middle of the sensor grid to the upper right corner of the sensor grid, the center of mass moves at a \(45{^{\circ }}\) angle, which falls within the upper right quadrant (i.e., the first quadrant). To deal with vectors close to the edge of two quadrants, two points around the vector were evaluated, each weighted 0.5. A histogram represents the percentage of frames that fell into each quadrant (45–48).
- Magnitude of movement indicates the amount of movement of the center of mass. Statistics on the magnitude were calculated per gesture: the mean, standard deviation, sum, and range (49–52).
- Periodicity is the frequency with the highest amplitude in the frequency spectrum of the movement of the center of mass in the row and column direction, respectively (53–54).
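Several of the features above are simple reductions over a stack of pressure frames. The sketch below illustrates a handful of them with NumPy, assuming an 8×8 sensor grid with raw values in 0–1023 (matching the 8 row/column features and the 8-bin histogram range above); the function and variable names are ours, not part of the original implementation.

```python
import numpy as np

def extract_features(frames):
    """Illustrative subset of the CoST features.

    frames: ndarray of shape (T, 8, 8) -- T frames from the 8x8
    pressure sensor grid, raw values in 0..1023.
    Feature numbers in comments follow the list above.
    """
    feats = {}
    feats["mean_pressure"] = frames.mean()            # feature 1
    feats["max_pressure"] = frames.max()              # feature 2
    # Feature 3: mean over time of the summed absolute difference
    # between consecutive frames.
    feats["pressure_variability"] = np.abs(np.diff(frames, axis=0)).sum(axis=(1, 2)).mean()
    feats["mean_per_row"] = frames.mean(axis=(0, 2))  # features 4-11
    feats["mean_per_col"] = frames.mean(axis=(0, 1))  # features 12-19
    # Feature 20: mean fraction of channels above 50% of the maximum value.
    contact = (frames > 0.5 * frames.max()).mean(axis=(1, 2))
    feats["mean_contact_area"] = contact.mean()
    feats["duration"] = frames.shape[0]               # feature 28
    # Features 29-36: normalized 8-bin histogram of all pressure values.
    hist, _ = np.histogram(frames, bins=8, range=(0, 1023))
    feats["pressure_distribution"] = hist / hist.sum()
    return feats
```

The remaining features (center-of-mass trajectories, spatial peaks, periodicity) follow the same pattern of per-frame statistics reduced over time.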
4.2 Classification experiments
4.2.1 Bayesian classifier
4.2.2 Decision tree
4.2.3 Support vector machine
4.2.4 Neural network
4.3 Results
Classifier \ Variant | All | Normal | Gentle | Rough |
---|---|---|---|---|
Bayesian | .57 (.11) | .59 (.13) | .52 (.14) | .58 (.12) |
Decision tree | .48 (.10) | .49 (.13) | .43 (.10) | .52 (.10) |
SVM linear | .59 (.11) | .60 (.11) | .54 (.13) | .62 (.13) |
SVM RBF | .60 (.11) | .60 (.11) | .54 (.13) | .62 (.12) |
Neural network | .59 (.12) | .58 (.13) | .52 (.13) | .59 (.13) |
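The between-subjects design behind these per-classifier accuracies amounts to leave-one-subject-out cross-validation: train on all participants except one, test on the held-out participant, and repeat. A minimal sketch of that protocol, using a stand-in nearest-centroid classifier rather than the classifiers of Sect. 4.2:

```python
import numpy as np

def loso_accuracies(X, y, subjects):
    """Leave-one-subject-out evaluation (a 'between-subjects' design).

    X: (n_samples, n_features) feature matrix; y: class labels;
    subjects: subject id per sample. Returns accuracy per held-out subject.
    """
    accs = {}
    for s in np.unique(subjects):
        train, test = subjects != s, subjects == s
        # Fit a trivial nearest-centroid model on the training subjects only.
        classes = np.unique(y[train])
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        # Assign each test sample the class of its nearest centroid.
        d = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        pred = classes[d.argmin(axis=1)]
        accs[s] = (pred == y[test]).mean()
    return accs
```

Averaging the per-subject accuracies (and reporting their standard deviation, as in the parentheses above) then gives one figure per classifier and variant.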
5 Discussion
5.1 Classification results and touch gesture confusion
Participant | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bayesian | .42 | .58 | .78 | .70 | .71 | .45 | .58 | .66 | .56 | .75 | .66 | .54 | .60 | .37 | .47 | .41 | .65 | .58 | .50 | .51 | .56 | .58 | .34 | .66 | .41 | .69 | .60 | .54 | .55 | .58 | .58 |
Decision tree | .39 | .48 | .65 | .63 | .58 | .43 | .52 | .55 | .52 | .67 | .54 | .47 | .48 | .30 | .39 | .29 | .62 | .48 | .45 | .41 | .50 | .52 | .23 | .56 | .36 | .54 | .48 | .42 | .55 | .50 | .51 |
SVM linear | .52 | .58 | .83 | .73 | .76 | .52 | .60 | .65 | .58 | .78 | .71 | .63 | .59 | .41 | .48 | .44 | .58 | .65 | .57 | .52 | .56 | .67 | .33 | .65 | .44 | .70 | .62 | .44 | .55 | .63 | .63 |
SVM RBF | .51 | .58 | .82 | .72 | .76 | .50 | .61 | .68 | .56 | .80 | .72 | .63 | .62 | .42 | .42 | .47 | .65 | .65 | .60 | .52 | .56 | .66 | .37 | .65 | .44 | .72 | .65 | .46 | .58 | .65 | .65 |
Neural network | .48 | .59 | .79 | .74 | .73 | .47 | .63 | .66 | .59 | .82 | .71 | .61 | .62 | .44 | .40 | .47 | .61 | .63 | .57 | .49 | .56 | .65 | .32 | .67 | .46 | .74 | .64 | .49 | .54 | .63 | .62 |
Predicted class \ Actual class | Grab | Hit | Massage | Pat | Pinch | Poke | Press | Rub | Scratch | Slap | Squeeze | Stroke | Tap | Tickle |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Grab | 397 | 0 | 17 | 1 | 11 | 1 | 31 | 4 | 4 | 0 | 177 | 2 | 0 | 0 |
Hit | 1 | 317 | 0 | 45 | 1 | 15 | 1 | 0 | 0 | 77 | 3 | 1 | 45 | 0 |
Massage | 4 | 0 | 386 | 2 | 1 | 1 | 0 | 63 | 26 | 1 | 14 | 11 | 1 | 26 |
Pat | 8 | 58 | 1 | 268 | 1 | 2 | 4 | 1 | 22 | 59 | 0 | 12 | 149 | 18 |
Pinch | 3 | 4 | 6 | 1 | 398 | 27 | 25 | 8 | 8 | 0 | 66 | 1 | 6 | 3 |
Poke | 1 | 27 | 0 | 11 | 68 | 438 | 50 | 0 | 2 | 4 | 3 | 1 | 40 | 5 |
Press | 19 | 4 | 0 | 7 | 30 | 25 | 374 | 17 | 1 | 7 | 23 | 6 | 8 | 2 |
Rub | 0 | 0 | 78 | 2 | 1 | 0 | 8 | 239 | 56 | 0 | 2 | 98 | 0 | 40 |
Scratch | 6 | 0 | 7 | 5 | 0 | 0 | 2 | 50 | 274 | 0 | 0 | 12 | 0 | 92 |
Slap | 0 | 77 | 0 | 70 | 2 | 0 | 0 | 2 | 1 | 358 | 0 | 14 | 44 | 1 |
Squeeze | 117 | 0 | 15 | 0 | 38 | 0 | 50 | 1 | 2 | 0 | 268 | 1 | 0 | 0 |
Stroke | 0 | 1 | 28 | 8 | 1 | 0 | 2 | 125 | 34 | 3 | 0 | 383 | 4 | 15 |
Tap | 0 | 68 | 0 | 131 | 6 | 46 | 11 | 2 | 1 | 48 | 0 | 2 | 248 | 16 |
Tickle | 2 | 2 | 19 | 6 | 0 | 3 | 0 | 45 | 127 | 1 | 1 | 12 | 13 | 339 |
Sum | 558 | 558 | 557 | 557 | 558 | 558 | 558 | 557 | 558 | 558 | 557 | 556 | 558 | 557 |
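Per-gesture and overall accuracy follow directly from this confusion matrix: divide each diagonal (correct) count by its column total. A short sketch using the counts from the table above:

```python
import numpy as np

# Diagonal (correct) counts and column totals per actual class,
# copied from the confusion matrix above (columns = actual classes,
# in the order Grab, Hit, Massage, ..., Tickle).
correct = np.array([397, 317, 386, 268, 398, 438, 374, 239, 274, 358, 268, 383, 248, 339])
totals = np.array([558, 558, 557, 557, 558, 558, 558, 557, 558, 558, 557, 556, 558, 557])

per_class = correct / totals             # recognition rate per gesture
overall = correct.sum() / totals.sum()   # overall accuracy, ~= .60
```

Poke has the highest per-class rate here, while gestures with similar contact patterns (e.g., pat vs. tap) account for most off-diagonal mass.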
Feature (no.) | Frequency |
---|---|
Mean pressure of the 7th sensor row (10) | 31 |
Summed traveled distance in column direction (27) | 30 |
Average spatial peak distance to center of mass (39) | 30 |
Overall mean pressure difference between frames (43) | 30 |
Highest pressure contact area (21) | 27 |