The ability to distinguish faces is natural and essential for humans. Within fractions of a second, we can judge whether a person is well-intentioned and assess their mood and emotional state. Face recognition is most commonly associated with applications in the optical domain, and in recent years facial analysis has become an active research area in machine vision. In security technology, facial recognition is a popular feature on computers, smartphones, and other devices. In social networks, it is used to tag faces in photos and to link images to the corresponding person. Neural networks trained on extracted facial features can also make accurate predictions about age, gender, and more. Further fields of application include emotion research, and facial analysis can even provide indications of human diseases. Common data sets on which networks can be trained for these tasks in the optical domain include 300-W [1], which provides 68 landmark annotations and bounding box initializations. Faces in the UTK dataset vary in pose and expression, and the age of the persons ranges from 0 to 116 [
2]. Also noteworthy is the Wider Facial Landmarks in-the-wild (WFLW) dataset with 10,000 faces, each annotated with 98 fully manually placed landmarks [3]. Labeling such data is a very time-consuming process, as it is usually done manually in a first step. The data sets typically contain RGB images, but face analysis in the optical domain can also be performed on grayscale images.

Several studies have attempted to transfer existing approaches from the optical spectrum to the infrared spectrum. For example, in 2008, Hizem et al. extracted facial contours in the near-infrared range (NIR) using classical edge detection methods [
4]. Kopaczka et al. worked on facial landmark detection and face tracking in long-wave infrared (LWIR) images. For this purpose, they created an annotated database of thermal face images with 68 facial landmarks; their face tracking method was based on an active appearance model (AAM) [5]. The same authors show in [6] that a number of established face detection algorithms from the visual spectrum can be trained to work in the thermal spectrum. In [7], they describe the manually annotated database used and extend it with emotion labels for LWIR images. Keong et al. describe a multi-spectral facial landmark detection approach based on a modified U-Net, in which the face boundary and 68 landmark points are trained simultaneously [
8]. Our approach differs from the existing methods in that a segmentation approach is chosen for pixel-precise detection.

Image segmentation is the process of analyzing each individual pixel to decide whether it belongs to a defined class; semantic segmentation accordingly combines homogeneous pixels into defined groups. The aim is to divide the image into subareas, i.e., to separate the image objects from the background and the image classes from each other. Given the predicted result image, the found class can be examined in more detail by overlaying it on the original image data. In the biomedical field, segmentation is used for the detection of tumors and tissue damage or for the segmentation of computed tomography images [9]. In the automotive sector, common tasks are the detection and segmentation of vehicles, lanes, and traffic signs. Classical methods are edge detection, e.g., via Canny or Sobel filtering, and the Hough transform for recognizing geometric figures such as circles, ellipses, and straight lines. By using artificial neural networks for these tasks, the failure probability can be reduced significantly and object recognition can be accelerated considerably.

Face segmentation means dividing a face into several categorical classes that can then be addressed individually. The separation of a face into individual subareas enables an area-related and pixel-accurate analysis. Especially in the context of the Covid-19 pandemic, this approach could serve as an alternative method for temperature extraction in relevant areas of the face (e.g., fever detection in the forehead area). Other approaches work with a region of interest (ROI) for automatic face recognition but average over the entire region and thus obtain an imprecise value for further temperature analysis. With landmark-based methods, only areas where landmarks are defined can be analyzed; to evaluate additional points later, the position of each new coordinate would have to be computed in a complex way for every analysis.

The neural approach for infrared analysis presented in this work is based on the image-to-image translation problem. The trained network works both on single images and in video sequence analysis. With appropriate computer hardware, inference runs in the microsecond range, so live image analysis is also possible. Defined subareas can be detected with high precision, and a reference to the facial surface temperature can be established.
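To make the classical baseline mentioned above concrete, the following sketch implements Sobel filtering in plain NumPy. This is an illustrative re-implementation for a toy image, not code from any of the cited works; library routines (e.g., in OpenCV or scikit-image) would normally be used instead.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude of a 2-D grayscale image via Sobel filtering."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    ky = kx.T                                                         # vertical gradient kernel
    pad = np.pad(img.astype(float), 1, mode="edge")                   # replicate border pixels
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]       # 3x3 neighborhood of pixel (i, j)
            gx[i, j] = np.sum(win * kx)
            gy[i, j] = np.sum(win * ky)
    return np.hypot(gx, gy)                   # gradient magnitude

# A vertical step edge produces a strong response at the boundary columns
# and no response in the flat regions.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edges = sobel_edges(img)
```

Thresholding the resulting magnitude map yields a binary edge image; this is essentially the contour-extraction step that learned segmentation networks replace with per-pixel class predictions.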
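The advantage of pixel-accurate segmentation over ROI averaging for temperature extraction can be sketched as follows. This is a hedged toy example on synthetic data: the class index `FOREHEAD`, the helper names, and the 6x6 thermal image are illustrative assumptions, not the paper's trained network or data.

```python
import numpy as np

# Hypothetical class index for the forehead region in a predicted label map.
FOREHEAD = 1

def region_temperature(thermal: np.ndarray, labels: np.ndarray, cls: int) -> float:
    """Mean temperature over exactly the pixels assigned to one segmentation class."""
    mask = labels == cls
    return float(thermal[mask].mean())

def roi_temperature(thermal: np.ndarray, box: tuple) -> float:
    """Mean temperature over a rectangular ROI given as (top, bottom, left, right)."""
    t, b, l, r = box
    return float(thermal[t:b, l:r].mean())

# Synthetic 6x6 thermal image: 30 °C background with a 2x3 "forehead" patch at 37 °C.
thermal = np.full((6, 6), 30.0)
thermal[1:3, 1:4] = 37.0

# In practice the label map would come from the network's per-pixel prediction,
# e.g. labels = class_scores.argmax(axis=0); here it is constructed by hand.
labels = np.zeros((6, 6), dtype=int)
labels[1:3, 1:4] = FOREHEAD

seg_temp = region_temperature(thermal, labels, FOREHEAD)  # pixel-accurate value
roi_temp = roi_temperature(thermal, (0, 4, 0, 5))         # diluted by background pixels
```

The mask-based value recovers the patch temperature exactly, while the rectangular ROI mixes in cooler background pixels, illustrating why averaging over a whole region yields an imprecise value for temperature analysis.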