Poses and gestures are an important part of nonverbal communication between humans. In recent years, many different methods for estimating poses and gestures have been developed in the field of human-machine interfaces. In this paper, we present for the first time an experimental comparison of several re-implemented neural-network-based approaches for a demanding visual instruction task on a mobile system. For the comparison, we applied several neural networks (Neural Gas, SOM, LLM, PSOM, and MLP) and a k-nearest-neighbour classifier to a common data set of images, which we recorded on our mobile robot
under real-world conditions. For feature extraction, we use Gabor jets and the features of a special histogram computed on the image. We also compare the results of the different approaches with those of human subjects who estimated the target point of a pointing pose. The results demonstrate that a cascade of MLPs is best suited to the task and achieves results equal to those of human subjects.