
2022 | Book

Image Analysis and Processing. ICIAP 2022 Workshops

ICIAP International Workshops, Lecce, Italy, May 23–27, 2022, Revised Selected Papers, Part II

Edited by: Pier Luigi Mazzeo, Emanuele Frontoni, Prof. Stan Sclaroff, Cosimo Distante

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The two-volume set LNCS 13373 and 13374 constitutes the papers of several workshops which were held in conjunction with the 21st International Conference on Image Analysis and Processing, ICIAP 2022, held in Lecce, Italy, in May 2022.
The 96 revised full papers presented in the proceedings set were carefully reviewed and selected from 157 submissions.
ICIAP 2022 presents the following sixteen workshops:
Volume I:
GoodBrother workshop on visual intelligence for active and assisted living
Parts can worth like the Whole - PART 2022
Workshop on Fine Art Pattern Extraction and Recognition - FAPER
Workshop on Intelligent Systems in Human and Artificial Perception - ISHAPE 2022
Artificial Intelligence and Radiomics in Computer-Aided Diagnosis - AIRCAD
Deep-Learning and High Performance Computing to Boost Biomedical Applications - DeepHealth
Volume II:
Human Behaviour Analysis for Smart City Environment Safety - HBAxSCES
Binary is the new Black (and White): Recent Advances on Binary Image Processing
Artificial Intelligence for preterm infants’ healthCare - AI-care
Towards a Complete Analysis of People: From Face and Body to Clothes - T-CAP
Artificial Intelligence for Digital Humanities - AI4DH
Medical Transformers - MEDXF
Learning in Precision Livestock Farming - LPLF
Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques - WOSDETC
Medical Imaging Analysis For Covid-19 - MIACOVID 2022
Novel Benchmarks and Approaches for Real-World Continual Learning - CL4REAL



Human Behaviour Analysis for Smart City Environment Safety - HBAxSCES

A Framework for Forming Middle Distance Routes Based on Spatial Guidelines, Perceived Accessibility and Visual Cues in Smart City

This study is devoted to ways of forming routes that take into account external natural and artificial factors and the traveler’s perception of these factors. The route is built based on preliminary knowledge of an area and is updated with information obtained during travel. Orientation along the route relies on signs (waymarks, billboards) and other natural and artificial references that refine the preliminary knowledge of the area. The final and intermediate targets along the route are determined by spatial objects (“points of interest” or “points of attraction”), which are either chosen in advance or encountered unpredictably during movement along the route. At the same time, the accuracy, completeness, and relevance of the available local maps do not always provide the information travelers need. The route-creation interface acts as an intermediary between the preliminary idea of the route and the observable external environment. The interface can supplement incomplete or unavailable information, and it helps to search for appropriate objects based on given attributes. Currently, digital applications are often used as such interfaces. Objects along the route constantly change their properties over time, both according to a previously known schedule and as a result of random events. The appearance of unexpected obstacles and sudden changes in lighting and weather conditions force travelers to significantly change their routes and choose new route options. The framework can be used both for optimizing navigation and tourism services and for preparing project designs for landscaping and development of suburban terrains.

Margarita Zaleshina, Alexander Zaleshin
A Survey on Few-Shot Techniques in the Context of Computer Vision Applications Based on Deep Learning

This review of Few-Shot Learning (FSL) techniques focuses on Computer Vision applications based on Deep Convolutional Neural Networks. It gives a general discussion of FSL, featuring a context-constrained description, a short list of applications, a description of a couple of commonly used techniques, and a discussion of the most widely used benchmarks for FSL computer vision applications. In addition, the paper presents a few examples of recent publications in which FSL techniques are used to train models in the context of Human Behaviour Analysis and Smart City Environment Safety. These examples give some insight into the performance of state-of-the-art FSL algorithms: the metrics they achieve and how many samples they need to achieve them.

Miguel G. San-Emeterio
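
The survey above does not name the techniques it describes, so here is one illustration of a widely used metric-based FSL technique, prototypical networks: each class is represented by the mean embedding (prototype) of its few labelled support examples, and a query is assigned to the nearest prototype. This is a minimal sketch assuming embeddings have already been produced by a CNN backbone; the hand-written 2-D vectors and class names below are hypothetical stand-ins, not data from the survey.

```python
import math

def class_prototypes(support):
    """Mean embedding (prototype) for each class in the support set."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return prototypes

def classify(query, prototypes):
    """Assign the query embedding to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-way, 2-shot episode with 2-D embeddings
support = {
    "walking": [[0.0, 1.0], [0.2, 0.8]],
    "running": [[1.0, 0.0], [0.8, 0.2]],
}
prototypes = class_prototypes(support)
print(classify([0.1, 0.9], prototypes))  # walking
```

Averaging the support embeddings is what lets such a classifier work with only a handful of examples per class: no classifier weights have to be trained for new classes.
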
Decision-Support System for Safety and Security Assessment and Management in Smart Cities

Counter-terrorism and its preventive and response actions are crucial factors in security planning and in the protection of mass events, soft targets, and critical infrastructures in urban environments. This paper presents a comprehensive Decision Support System, developed under the umbrella of the S4AllCities project, that can be integrated with legacy systems deployed in Smart Cities. The system covers urban pedestrian and vehicular evacuation, considering ad hoc predictive models of the evolution of incendiary and mass shooting attacks in conjunction with a probabilistic model for threat assessment in the case of improvised explosive devices. The main objective of the system is to provide decision support to public or private security operators in the planning and real-time phases of prevention of, or intervention against, a possible attack. It provides information on evacuation strategies, the probability or expected impact of terrorist threats, and the state of the traffic network under normal or unusual conditions, allowing the emergency to be managed throughout its evolution.

Javier González-Villa, Arturo Cuesta, Marco Spagnolo, Marisa Zanotti, Luke Summers, Alexander Elms, Anay Dhaya, Karel Jedlička, Jan Martolos, Deniz Cetinkaya
Embedded Intelligence for Safety and Security Machine Vision Applications

Artificial intelligence (AI) has recently seen increasing use across a wide variety of domains, such as image processing for security applications. Deep learning, a subset of AI, is particularly useful for such image processing applications. Deep learning methods can achieve state-of-the-art results in computer vision for image classification, object detection, and face recognition. This makes it possible to automate video surveillance and reduce human intervention. At the same time, although deep learning is very demanding in terms of computing resources, hardware and software improvements have emerged that allow embedded systems to run sophisticated machine learning algorithms at the edge. Hardware manufacturers have developed powerful co-processors specifically designed to execute deep learning algorithms. In addition, new lightweight open-source middleware for resource-constrained devices, such as EdgeX Foundry, has emerged to facilitate the collection and processing of data at the sensor level, with communication capabilities to cloud enterprise applications. The aim of this work is to show and describe the development of Smart Camera Systems within the S4AllCities H2020 project, following the edge approach.

Panagiotis Lioupis, Aris Dadoukis, Evangelos Maltezos, Lazaros Karagiannidis, Angelos Amditis, Maite Gonzalez, Jon Martin, David Cantero, Mikel Larrañaga
Supporting Energy Digital Twins with Cloud Data Spaces: An Architectural Proposal

The concept of Digital Twins (DTs) offers the possibility of moving work from a physical environment to a virtual or digital one, and the ability to predict asset conditions, in the future or when this is physically undesirable, by exploiting the digital model. This in turn leads to significant reductions in the resources required to design, produce, and maintain assets and resources. In the field of energy management, DTs are also starting to be considered valuable analysis tools: a digital twin facilitates real-time synchronisation between a real-world model (physical model) and its virtual copy for improved energy monitoring, prediction, and efficiency, and can thus significantly reduce overall energy consumption. A typical problem of DTs is the management of the data to be fed from the physical twin to the DT (and possibly the other way around): one has to decide whether or not to store the data within the DT, and whether to use different (source-dependent) or unified data governance models. To this end, an energy data space is proposed to manage the necessary data in a way that is better suited to the DT concept.

Chiara Rucco, Antonella Longo, Marco Zappatore
High-Level Feature Extraction for Crowd Behaviour Analysis: A Computer Vision Approach

The advent of deep learning has brought disruptive techniques with unprecedented accuracy rates to many fields and scenarios. Tasks such as detecting regions of interest and extracting semantic features from images and video sequences are tackled quite effectively thanks to publicly available, adequately annotated datasets. This paper describes a use case scenario in which a stack of deep learning models is used for crowd behaviour analysis. It consists of two main modules preceded by a pre-processing step. The first deep learning module integrates YOLOv5 and DeepSORT to detect and track pedestrians in video sequences from CCTV cameras. The second module ingests each pedestrian’s spatial coordinates, velocity, and trajectory to cluster groups of people using the Coherent Neighbor Invariance technique. The method envisages acquiring video sequences from cameras overlooking pedestrian areas, such as public parks or squares, in order to check for any unusual crowd behaviour. By design, the system first checks whether anomalies are under way at the microscale level. Second, it returns clusters of people at the mesoscale level based on velocity and trajectories. This work is part of the physical behaviour detection module developed for the S4AllCities H2020 project.

Alessandro Bruno, Marouane Ferjani, Zoheir Sabeur, Banafshe Arbab-Zavar, Deniz Cetinkaya, Liam Johnstone, Muntadher Sallal, Djamel Benaouda

Binary is the New Black (and White): Recent Advances on Binary Image Processing

A Simple yet Effective Image Repairing Algorithm

A 2D binary image is well-composed if it does not contain $2\times 2$ blocks of two diagonal black and two diagonal white pixels, called critical configurations. Some image processing algorithms are simpler on well-composed images. The process of transforming an image into a well-composed one is called repairing. We propose a new topology-preserving approach that produces two well-composed images starting from an image $I$, depending on the chosen adjacency (vertex or edge adjacency), in the same original square grid space as $I$. The size of the repaired images depends on the number and distribution of the critical configurations. A well-composed image $I$ is not changed, while in the worst case the size at most doubles (or quadruples if the aspect ratio is to be preserved). The advantage of our approach lies in the small size of the repaired images, which has a positive impact on the execution time of processing tasks. We demonstrate this experimentally on two classical image processing tasks: contour extraction and shrinking.

Lidija Čomić, Paola Magillo
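
The definition of well-composedness above can be checked directly. This minimal sketch (an illustration of the definition only, not of the authors' repairing algorithm) scans every 2×2 block of a binary image, assumed to be a list of rows with 1 for black and 0 for white, and reports the critical configurations; an image is well-composed exactly when none are found.

```python
def critical_configurations(image):
    """Top-left (row, col) of every 2x2 block with two diagonal black and two diagonal white pixels.

    image: list of rows, 1 = black, 0 = white (assumed encoding).
    """
    found = []
    for r in range(len(image) - 1):
        for c in range(len(image[0]) - 1):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # Critical: main diagonal equal, anti-diagonal equal, and the two diagonals differ
            if a == e and b == d and a != b:
                found.append((r, c))
    return found

def is_well_composed(image):
    return not critical_configurations(image)

checker = [[1, 0], [0, 1]]
print(critical_configurations(checker))  # [(0, 0)]
```

A repairing method then has to eliminate every reported block while preserving topology; the number and spread of these blocks is what drives the size of the repaired image in the approach described above.
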
A Novel Method for Improving the Voxel-Pattern-Based Euler Number Computing Algorithm of 3D Binary Images

As an important topological property of a 3D binary image, the Euler number can be calculated by counting certain $2\times 2\times 2$ voxel patterns in the image. This paper presents a novel method for improving the voxel-pattern-based Euler number computing algorithm for 3D binary images. In the proposed method, by changing the order in which the voxels of $2\times 2\times 2$ voxel patterns are accessed and by combining the voxel patterns that provide the same Euler number increments for the given image, the average number of voxels to be accessed for processing a $2\times 2\times 2$ voxel pattern can be decreased from 8 to 4.25, leading to more efficient processing. Experimental results demonstrate that the proposed method is much more efficient than the conventional voxel-pattern-based Euler number computing algorithm.

Bin Yao, Dianzhi Han, Shiying Kang, Yuyan Chao, Lifeng He
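
For orientation, the sketch below shows the conventional voxel-pattern idea that the paper improves on, under stated assumptions: voxels are treated as closed unit cubes (26-adjacency for the foreground, 6-adjacency for the background), and each 2×2×2 window centred on a lattice vertex contributes a precomputed increment, so that the sum equals V − E + F − C of the voxel complex. Increments are scaled by 8 to stay integer. The lookup-table derivation is our own illustration, not the authors' table, and the sketch reads all 8 voxels of every window, i.e. the baseline cost of 8 accesses rather than the optimized 4.25.

```python
from itertools import product

# (x, y, z) offsets of the 8 voxels in a 2x2x2 window; bit i of a pattern = voxel CORNERS[i]
CORNERS = list(product((0, 1), repeat=3))

def pattern_increment(pattern):
    """8 x Euler increment of one window: 8*v - 4*e + 2*f - c for the cells at its centre vertex."""
    on = [c for i, c in enumerate(CORNERS) if pattern >> i & 1]
    if not on:
        return 0
    # Each of the 6 edges leaving the centre vertex borders the 4 voxels on one side of one axis
    edges = sum(any(c[a] == s for c in on) for a in range(3) for s in (0, 1))
    # Each of the 12 faces touching the centre vertex borders 2 voxels
    faces = sum(
        any(c[a] == sa and c[b] == sb for c in on)
        for a, b in ((0, 1), (0, 2), (1, 2))
        for sa in (0, 1)
        for sb in (0, 1)
    )
    # A vertex belongs to 1 window, an edge to 2, a face to 4, a voxel to 8
    return 8 - 4 * edges + 2 * faces - len(on)

TABLE = [pattern_increment(p) for p in range(256)]

def euler_number_3d(volume):
    """Euler number of a 3D binary volume given as nested lists indexed [x][y][z]."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])

    def voxel(x, y, z):
        inside = 0 <= x < nx and 0 <= y < ny and 0 <= z < nz
        return 1 if inside and volume[x][y][z] else 0

    total = 0
    # One window per lattice vertex, including the vertices on the outer boundary
    for i in range(nx + 1):
        for j in range(ny + 1):
            for k in range(nz + 1):
                pattern = 0
                for bit, (dx, dy, dz) in enumerate(CORNERS):
                    pattern |= voxel(i - 1 + dx, j - 1 + dy, k - 1 + dz) << bit
                total += TABLE[pattern]
    return total // 8

shell = [[[1] * 3 for _ in range(3)] for _ in range(3)]
shell[1][1][1] = 0  # hollow cube: 1 component + 1 cavity
print(euler_number_3d([[[1]]]), euler_number_3d(shell))  # 1 2
```

Because the increment depends only on the 8-bit pattern, the per-window work is a table lookup; the optimization described in the abstract reduces how many of the 8 voxels must actually be read before the table entry is determined.
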
Event-Based Object Detection and Tracking - A Traffic Monitoring Use Case -

Traffic monitoring is an important task in many scenarios: on urban roads to identify dangerous behavior and on highways to check for vehicles moving in the wrong direction. This task is usually performed using conventional cameras, but these sensors suffer from fast illumination changes, particularly at night, and from extreme weather conditions. This paper proposes a solution for object detection and tracking using event-based cameras. This new technology presents many advantages for addressing the limitations of traditional cameras; the most evident are its high dynamic range and temporal resolution. However, because of the different nature of the data provided, dedicated solutions must be implemented to process them efficiently. In this work, we propose two solutions for object detection, one based on standard geometrical approaches and one using a deep learning framework. We also release a novel dataset for this task and present a complete application for road monitoring using event cameras (Dataset available at: ).

Simone Mentasti, Abednego Wamuhindo Kambale, Matteo Matteucci
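
Event cameras output an asynchronous stream of events rather than frames. One common way to feed such data to frame-based processing, shown here as an assumption rather than the paper's pipeline, is to accumulate the events falling in a short time window into per-polarity count images. The bounding-box step below is a hypothetical stand-in for a geometric detector, and events are assumed to be (x, y, t, polarity) tuples.

```python
def accumulate_events(events, width, height, t_start, t_end):
    """Per-polarity event counts for events with t_start <= t < t_end.

    events: iterable of (x, y, t, polarity); polarity > 0 is ON, otherwise OFF.
    """
    on = [[0] * width for _ in range(height)]
    off = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        if t_start <= t < t_end:
            (on if polarity > 0 else off)[y][x] += 1
    return on, off

def active_bounding_box(frame, min_count=1):
    """Hypothetical geometric step: (x_min, y_min, x_max, y_max) of pixels with >= min_count events."""
    coords = [(x, y) for y, row in enumerate(frame) for x, v in enumerate(row) if v >= min_count]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical events (x, y, t in seconds, polarity) on a 4x3 sensor
events = [(1, 1, 0.001, 1), (2, 1, 0.002, 1), (2, 2, 0.004, -1), (0, 0, 0.020, 1)]
on, off = accumulate_events(events, width=4, height=3, t_start=0.0, t_end=0.010)
total = [[a + b for a, b in zip(r_on, r_off)] for r_on, r_off in zip(on, off)]
print(active_bounding_box(total))  # (1, 1, 2, 2)
```

Short windows preserve the high temporal resolution mentioned above; the window length is a tuning parameter that trades latency against how many events each frame contains.
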
Quest for Speed: The Epic Saga of Record-Breaking on OpenCV Connected Components Extraction

Connected Components Labeling (CCL) is an essential part of many Image Processing and Computer Vision pipelines. Given its relevance in the field, it has been part of most cutting-edge Computer Vision libraries. In this paper, all the algorithms included in OpenCV over the years are reviewed, from sequential to parallel/GPU-based implementations. Our goal is to provide a better understanding of what has changed and why one algorithm should be preferred over another, in terms of both memory usage and execution speed.

Federico Bolelli, Stefano Allegretti, Costantino Grana
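
For readers new to CCL, the sketch below is a textbook two-pass algorithm with union-find equivalence resolution on 8-connected foreground pixels. It illustrates the classic sequential scheme that the reviewed OpenCV algorithms build on and optimize; it is not OpenCV's implementation, and the function name is hypothetical.

```python
def two_pass_ccl(image):
    """Label 8-connected nonzero pixels; returns (labels, number_of_components)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # index 0 is reserved for the background

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: provisional labels from already-visited neighbours (W, NW, N, NE)
    for y in range(h):
        for x in range(w):
            if not image[y][x]:
                continue
            neighbours = [
                labels[y + dy][x + dx]
                for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1))
                if 0 <= y + dy < h and 0 <= x + dx < w and labels[y + dy][x + dx]
            ]
            if not neighbours:
                parent.append(len(parent))  # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(neighbours)
                labels[y][x] = smallest
                for n in neighbours:
                    union(smallest, n)  # record label equivalences
    # Second pass: replace each provisional label with a consecutive final label
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)

labels, count = two_pass_ccl([[1, 0, 1], [1, 1, 1]])
print(count)  # 1
```

The U-shaped example needs the equivalence table: its two arms receive different provisional labels in the first row and are only merged when the bottom row is scanned.
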
An Efficient Run-Based Connected Component Labeling Algorithm for Processing Holes

This article introduces a new connected component labeling and analysis algorithm framework that computes, in a single pass, the foreground and background labels as well as the adjacency tree. Features (bounding boxes, first statistical moments, Euler number) are computed on the fly. The transitive closure enables efficient hole processing: holes can be filled while their features are merged with the surrounding connected component, without the need to rescan the image. A comparison with the state of the art shows that this new algorithm performs all these computations faster than all existing algorithms that process foreground and background connected components or holes.

Florian Lemaitre, Nathan Maurice, Lionel Lacassagne
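
The run-based idea can be sketched as follows, under simplifying assumptions: each row is encoded as runs of equal pixels, runs in consecutive rows are merged with union-find when they overlap, and holes are counted as background components (4-connected, the dual of 8-connected foreground) that never touch the image border. Unlike the algorithm described above, this sketch labels foreground and background in two separate scans and computes neither features nor an adjacency tree; all names are hypothetical.

```python
def extract_runs(row, value):
    """(start, end) column intervals of consecutive pixels equal to value."""
    runs, start = [], None
    for x, px in enumerate(row):
        if px == value and start is None:
            start = x
        elif px != value and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs

def label_runs(image, value, connectivity):
    """Union-find over runs; returns {root_run_id: touches_image_border}."""
    h, w = len(image), len(image[0])
    parent, border = [], []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    slack = 1 if connectivity == 8 else 0  # 8-connectivity also joins diagonally touching runs
    prev = []
    for y, row in enumerate(image):
        cur = []
        for start, end in extract_runs(row, value):
            rid = len(parent)
            parent.append(rid)
            border.append(y in (0, h - 1) or start == 0 or end == w - 1)
            for p_start, p_end, p_id in prev:
                if start <= p_end + slack and p_start <= end + slack:
                    ra, rb = find(rid), find(p_id)
                    parent[max(ra, rb)] = min(ra, rb)
            cur.append((start, end, rid))
        prev = cur
    roots = {}
    for rid in range(len(parent)):
        root = find(rid)
        roots[root] = roots.get(root, False) or border[rid]
    return roots

def components_and_holes(image):
    """Number of 8-connected foreground components and of holes (enclosed background)."""
    foreground = label_runs(image, 1, connectivity=8)
    background = label_runs(image, 0, connectivity=4)
    holes = sum(1 for touches in background.values() if not touches)
    return len(foreground), holes

ring = [[0, 0, 0, 0, 0], [0, 1, 1, 1, 0], [0, 1, 0, 1, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0]]
print(components_and_holes(ring))  # (1, 1)
```

Working on runs instead of pixels reduces the number of union operations, which is the main reason run-based labelers are attractive for the foreground/background analysis described above.
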
LSL3D: A Run-Based Connected Component Labeling Algorithm for 3D Volumes

Connected Component Labeling (CCL) has been a fundamental operation in Computer Vision for decades. Most of the literature deals with 2D algorithms for applications such as video surveillance or autonomous driving. Nonetheless, the need for 3D algorithms is rising, notably in medical imaging. While 2D CCL algorithms already generate large numbers of memory accesses and comparisons, 3D ones are even worse. This is the curse of dimensionality, and an efficient algorithm should be designed to address it. This paper introduces a segment-based algorithm for 3D labeling that uses a new strategy to accelerate label-equivalence processing and thereby mitigate the impact of higher dimensions. We claim that this new algorithm outperforms state-of-the-art algorithms by a factor of ×1.5 to ×3.1 on usual medical datasets and random images.

Nathan Maurice, Florian Lemaitre, Julien Sopena, Lionel Lacassagne

Artificial Intelligence for Preterm Infants’ HealthCare - AI-Care

Deep-Learning Architectures for Placenta Vessel Segmentation in TTTS Fetoscopic Images

Twin-to-Twin Transfusion Syndrome (TTTS) is a rare pregnancy pathology affecting identical twins that share both the placenta and a network of blood vessels. Shared blood vessels imply an unbalanced supply of oxygen and nutrients between one twin (the donor) and the other (the recipient). Endoscopic laser ablation, a minimally invasive fetoscopic procedure, is performed to treat TTTS by restoring a physiological blood supply to both twins, thereby lowering mortality and morbidity rates. TTTS surgery is a challenging procedure in which surgeons have to recognize and ablate pathological vessels with a very limited view of the surgical site. To provide TTTS surgeons with context awareness, in this work we investigate the problem of automatic vessel segmentation in fetoscopic images. We evaluated different deep-learning models currently available in the literature, including U-Net, U-Net++, and Feature Pyramid Networks (FPN). We tested several backbones (i.e., ResNet, DenseNet, and DPN), for a total of 9 experiments. With a comprehensive evaluation on a novel dataset of 18 videos (1800 frames) from 18 different TTTS surgeries, we obtained a mean intersection-over-union of $0.63 \pm 0.19$ using the U-Net++ model with a DPN backbone. These results suggest that deep learning may be a valuable tool for supporting surgeons in vessel identification during TTTS surgery.

Alessandro Casella, Sara Moccia, Ilaria Anita Cintorrino, Gaia Romana De Paolis, Alexa Bicelli, Dario Paladini, Elena De Momi, Leonardo S. Mattos
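
Intersection over union compares a predicted mask P with the ground truth T as |P ∩ T| / |P ∪ T|, and a figure such as 0.63 ± 0.19 is a mean and a standard deviation over evaluated samples. This minimal sketch assumes flattened 0/1 masks, per-frame aggregation, and the population standard deviation; the paper's exact aggregation may differ, and the masks below are hypothetical.

```python
import statistics

def iou(pred, target):
    """Intersection over union of two flattened 0/1 masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # both masks empty: treated as a perfect match (assumption)

# Hypothetical per-frame (prediction, ground truth) pairs
frames = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),  # IoU = 1/3
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # IoU = 1
]
scores = [iou(p, t) for p, t in frames]
mean, std = statistics.mean(scores), statistics.pstdev(scores)
print(f"mIoU = {mean:.2f} ± {std:.2f}")
```

A large standard deviation relative to the mean, as in the reported result, indicates that segmentation quality varies considerably from frame to frame.
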
An Advanced Tool for Semi-automatic Annotation for Early Screening of Neurodevelopmental Disorders

Non-invasive solutions (neither sensors nor markers) appear to be the most appealing for assessing body movements and facial dynamics in order to predict neurodevelopmental disorders (NDD) even in the first days of life. To this end, recent advances in machine learning could be effectively applied to visual data framing the children, but they suffer from the scarcity of annotated data for training the algorithms. To fill this gap, this paper introduces a semi-automatic tool specifically designed for labelling videos of children in cribs. It consists of a Graphical User Interface that allows users to select (1) the videos, or static images, to be processed and (2) the desired annotation goal, which is achieved by state-of-the-art deep-learning-based neural architectures.

Giuseppe Massimo Bernava, Marco Leo, Pierluigi Carcagnì, Cosimo Distante
Some Ethical Remarks on Deep Learning-Based Movements Monitoring for Preterm Infants: Green AI or Red AI?

Monitoring the spontaneous movements of preterm infants is a valuable aid for the early recognition of neuro-motor impairments, which are especially common in infants born before term. Currently, highly specialized clinicians assess movement quality on the basis of subjective, discontinuous, and time-consuming observations. To support clinicians, automatic monitoring systems have been developed, among which Deep Learning algorithms (mainly Convolutional Neural Networks (CNNs)) are, to date, the most suitable and least invasive. Indeed, research in this field has devised highly reliable models but has tended to neglect their computational costs. These models usually require massive computations, which in turn require expensive hardware and are environmentally unsustainable. As a consequence, the costs of these models risk making their application to actual clinical practice a privilege. However, the ultimate goal of research, especially in healthcare, should be to design technologies that are fairly accessible to as many people as possible. In light of this, this work analyzes three CNNs for monitoring preterm infants’ movements on the basis of their computational requirements. The two best-performing networks achieve very similar accuracy (Dice Similarity Coefficient around 0.88), although one of them, which we designed following the principles of Green AI, requires roughly half as many Floating Point Operations ($47\times 10^9$ vs. $101\times 10^9$). Our research shows that it is possible to design high-performing and cost-efficient Convolutional Neural Networks for clinical applications.

Alessandro Cacciatore, Lucia Migliorelli, Daniele Berardini, Simona Tiribelli, Stefano Pigliapoco, Sara Moccia
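
The Dice Similarity Coefficient used above is 2|P ∩ T| / (|P| + |T|) for a predicted mask P and ground truth T. This minimal sketch computes it on hypothetical flattened 0/1 masks and also checks the FLOPs ratio quoted in the abstract (47 × 10^9 vs. 101 × 10^9).

```python
def dice(pred, target):
    """Dice Similarity Coefficient of two flattened 0/1 masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(pred) + sum(target)
    return 2 * inter / size if size else 1.0  # both masks empty: perfect match (assumption)

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5

# FLOPs figures quoted in the abstract
flops_green, flops_other = 47e9, 101e9
print(f"FLOPs ratio: {flops_green / flops_other:.2f}")  # FLOPs ratio: 0.47
```

The ratio of about 0.47 is what "roughly half as many" operations means in practice, obtained at essentially the same Dice score.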

Towards a Complete Analysis of People: From Face and Body to Clothes - T-CAP

Effect of Gender, Pose and Camera Distance on Human Body Dimensions Estimation

Human Body Dimensions Estimation (HBDE) is a task that an intelligent agent can perform to determine human body information from images (2D) or from point clouds or meshes (3D). More specifically, if we define the HBDE problem as inferring human body measurements from images, then HBDE is a difficult, inverse, multi-task regression problem that can be tackled with machine learning techniques, particularly convolutional neural networks (CNNs). Despite the community’s tremendous effort to advance human shape analysis, there is a lack of systematic experiments assessing CNN estimation of human body dimensions from images. Our contribution lies in assessing a CNN’s estimation performance in a series of controlled experiments. To that end, we augment our recently published neural anthropometer dataset by rendering images at different camera distances. We evaluate the network inference by the absolute and relative mean errors between the estimated and actual HBDs. We train and evaluate the CNN in four scenarios: (1) training with subjects of a specific gender, (2) in a specific pose, (3) with sparse camera distances, and (4) with dense camera distances. Not only do our experiments demonstrate that the network can perform the task successfully, but they also reveal a number of relevant facts that contribute to a better understanding of the task of HBDE.

Yansel González Tejeda, Helmut A. Mayer
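
The abstract above does not spell out its error definitions; a common choice, assumed here, is the mean absolute error and the mean relative error |estimate − actual| / |actual|, each averaged over all measured dimensions. The measurement values below are hypothetical.

```python
def mean_errors(estimated, actual):
    """Mean absolute error and mean relative error |e - a| / |a| over paired measurements."""
    abs_errors = [abs(e - a) for e, a in zip(estimated, actual)]
    rel_errors = [err / abs(a) for err, a in zip(abs_errors, actual)]
    return sum(abs_errors) / len(abs_errors), sum(rel_errors) / len(rel_errors)

# Hypothetical body dimensions in cm (e.g. two measurements of one subject)
mae, mre = mean_errors(estimated=[102.0, 48.0], actual=[100.0, 50.0])
print(f"MAE = {mae:.1f} cm, MRE = {mre:.1%}")
```

Reporting both is useful because the same absolute error (here 2 cm) weighs more on a short dimension than on a long one, which only the relative error reveals.
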
StyleTrendGAN: A Deep Learning Generative Framework for Fashion Bag Generation

Processing fashion multimedia big data with Artificial Intelligence (AI) algorithms has become an appealing challenge for computer scientists, since it can serve as inspiration for fashion designers and can also help predict the next trendy items in the fashion industry. Moreover, with the global spread of the COVID-19 pandemic, social media content has become an increasingly crucial factor in driving retail purchase decisions, so analysing social media pictures has become mandatory for fashion brands. In this light, this paper presents StyleTrendGAN, a novel custom deep learning framework able to generate fashion items. StyleTrendGAN combines a Dense Extreme Inception Network (DexiNed) for sketch extraction and Pix2Pix for transforming the input sketches into new handbag models. StyleTrendGAN increases the efficiency and accuracy of creating new fashion models compared to previous approaches and to the classic human approach; it aims to stimulate designers’ creativity and to visualize the results of a production process without actually carrying it out. The approach was applied to and tested on a newly collected dataset, “MADAME” (iMage fAshion Dataset sociAl MEdia), of images collected from Instagram. The experiments yield high accuracy, demonstrating the effectiveness and suitability of the proposed approach.

Laura Della Sciucca, Emanuele Balloni, Marco Mameli, Emanuele Frontoni, Primo Zingaretti, Marina Paolanti
Gender Recognition from 3D Shape Parameters

Gender recognition from images is generally approached by extracting the salient visual features of the observed subject, either focusing on facial appearance or analyzing the full body. In real-world scenarios, image-based gender recognition approaches tend to fail, providing unreliable results. Face-based methods are compromised by environmental conditions, occlusions (glasses, masks, hair), and poor resolution. A full-body perspective has other downsides: clothing and hairstyle may not be discriminative enough for classification, and background clutter can be problematic. We propose a novel approach for body-shape-based gender classification. Our contribution consists of introducing the Skinned Multi-Person Linear model (SMPL) as the 3D human mesh. The proposed solution is robust to poor image resolution, and the number of features used for classification is limited, making the recognition task computationally affordable, especially in the classification stage, where less complex learning architectures can be easily trained. The obtained information is fed to an SVM classifier, trained and tested on three different datasets: (i) FVG, containing videos of walking subjects; (ii) AMASS, collected by converting MOCAP data of people performing different activities into realistic 3D human meshes; and (iii) SURREAL, characterized by synthetic human body models. Additionally, we demonstrate that our approach leads to reliable results even when the parametric 3D mesh is extracted from a single image. Given the lack of benchmarks in this area, we trained and tested a pre-trained ResNet50 on the FVG dataset to compare our model-based method with an image-based approach.

Giulia Martinelli, Nicola Garau, Nicola Conci
Recognition of Complex Gestures for Real-Time Emoji Assignment

Gesture recognition allows humans to interface and interact naturally with machines. This paper presents analytical and algebraic methods to recognize specific combinations of facial expressions and hand gestures, including interactions between hands and face. The feature extraction methods for both faces and hands were implemented starting from landmarks identified in real time by the MediaPipe framework. To benchmark our approach, we selected a large set of emoji and designed a system capable of associating the chosen emoji with recognized facial expressions and/or hand gestures. Complex poses and gesture combinations have been selected and assigned to specific emoji to be recognized by the system. Furthermore, the Web Application we created demonstrates that our system can quickly recognize facial expressions and complex poses in a video sequence from a standard camera. The experimental results show that our proposed methods are generalizable and robust, and achieve an average recognition accuracy of 99.25%.

Rosa Zuccarà, Alessandro Ortis, Sebastiano Battiato
Generating High-Resolution 3D Faces Using VQ-VAE-2 with PixelSNAIL Networks

The realistic generation of synthetic 3D faces is an open challenge due to the complexity of the geometry and the lack of large and diverse publicly available datasets. Generative models based on convolutional neural networks (CNNs) have recently demonstrated a great ability to produce novel synthetic high-resolution images that an expert human observer cannot distinguish from the original pictures. However, applying them to non-grid-like data such as 3D meshes presents many challenges. In our work, we overcome these challenges by first reducing the face mesh to a regular 2D image representation and then exploiting a prominent state-of-the-art generative approach. The approach uses a Vector Quantized Variational Autoencoder (VQ-VAE-2) to learn a discrete latent representation of the 2D images. Then, 3D synthesis is achieved by fitting the latent space and sampling it with an autoregressive model, PixelSNAIL. Quantitative and qualitative evaluations demonstrate that synthetic faces generated with our method are statistically closer to the real faces than those obtained with a classical synthesis approach based on Principal Component Analysis (PCA).

Alessio Gallucci, Dmitry Znamenskiy, Nicola Pezzotti, Milan Petkovic
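
The discrete latent representation mentioned above comes from a vector-quantization bottleneck: each encoder output vector is replaced by its nearest entry in a learned codebook, and the resulting integer indices are what an autoregressive prior such as PixelSNAIL then models. This minimal sketch shows only that nearest-neighbour lookup with a hypothetical fixed codebook; codebook learning, the hierarchical levels of VQ-VAE-2, and the prior itself are omitted.

```python
import math

def quantize(latents, codebook):
    """Map each latent vector to the index and value of its nearest codebook entry."""
    indices = [
        min(range(len(codebook)), key=lambda k: math.dist(z, codebook[k]))
        for z in latents
    ]
    return indices, [codebook[k] for k in indices]

# Hypothetical 3-entry codebook of 2-D embeddings
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
indices, quantized = quantize([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]], codebook)
print(indices)  # [0, 1, 2]
```

Sampling new faces then amounts to sampling a grid of such indices from the prior and decoding the corresponding codebook vectors.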

Artificial Intelligence for Digital Humanities - AI4DH

The Morra Game: Developing an Automatic Gesture Recognition System to Interface Human and Artificial Players

Morra is an ancient hand game that is still played today. In its most popular variant, two players simultaneously extend one hand in front of the opponent to show a number of fingers, while calling out a number from 2 to 10. The player who correctly guesses the total number of fingers scores a point. Morra can be defined as a serious game, as it has the potential to positively affect cognition and to improve cognitive and perceptual skills. Moreover, because it involves many perceptual, cognitive, and motor skills, morra is ideal for testing several cognitive processes. This paper describes aspects of Gavina 2121, an artificial Morra player that successfully predicts the numbers of human opponents by taking advantage of the limited ability of humans to generate random sequences. This study focuses on automatic gesture recognition. We developed and tested a system that allows Gavina 2121 to detect and count, in real time, the number of extended fingers in a human hand. The system is based on the open-source MediaPipe Hands framework developed by Google. Our tests indicate that the system accurately recognizes the number of fingers extended by a human hand in real time, in both prone and supine positions. The system is still imprecise in the semi-naturalistic conditions of an actual morra game, where the fingers of two hands must be counted simultaneously. Our test, still in its pilot phase, shows promising results towards a flexible implementation of an artificial morra player that could considerably expand the educational, rehabilitation, and research applications of Morra.

Franco Delogu, Francesco De Bartolomeo, Sergio Solinas, Carla Meloni, Beniamina Mercante, Paolo Enrico, Rachele Fanari, Antonello Zizi
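
MediaPipe Hands returns 21 landmarks per hand (0 = wrist, 3 = thumb IP, 4 = thumb tip, 8/12/16/20 = the other fingertips, 6/10/14/18 = their PIP joints, 17 = pinky MCP). A simple, orientation-tolerant heuristic, assumed here and not necessarily the rule used in Gavina 2121, marks a finger as extended when its tip is farther from the wrist than its PIP joint, and the thumb as extended when its tip is farther from the pinky MCP than its IP joint. The scoring function follows the rule stated in the abstract; the synthetic landmarks are hypothetical test data.

```python
import math

WRIST, THUMB_IP, THUMB_TIP, PINKY_MCP = 0, 3, 4, 17
# (tip, PIP) landmark indices for the index, middle, ring and pinky fingers
FINGERS = ((8, 6), (12, 10), (16, 14), (20, 18))

def count_extended_fingers(lm):
    """Count extended fingers from 21 (x, y) landmarks in MediaPipe Hands order (heuristic)."""
    count = sum(
        math.dist(lm[tip], lm[WRIST]) > math.dist(lm[pip], lm[WRIST])
        for tip, pip in FINGERS
    )
    if math.dist(lm[THUMB_TIP], lm[PINKY_MCP]) > math.dist(lm[THUMB_IP], lm[PINKY_MCP]):
        count += 1
    return count

def morra_points(call_a, call_b, fingers_a, fingers_b):
    """A player scores a point if their call equals the total number of fingers shown."""
    total = fingers_a + fingers_b
    return int(call_a == total), int(call_b == total)

def synthetic_hand(extended):
    """Hypothetical 2-D landmarks for testing; extended = 5 booleans, thumb to pinky."""
    thumb_up, *others = extended
    lm = [(0.0, 0.0), (-1.0, -1.0), (-2.0, -2.0), (-3.0, -3.0),
          (-4.0, -4.0) if thumb_up else (1.0, -3.5)]
    for x, up in zip((0.0, 1.0, 2.0, 3.0), others):
        lm += [(x, -4.0), (x, -5.0), (x, -6.0), (x, -7.0) if up else (x, -3.5)]
    return lm

a = count_extended_fingers(synthetic_hand([True, True, True, False, False]))   # 3
b = count_extended_fingers(synthetic_hand([False, True, True, False, False]))  # 2
print(a, b, morra_points(5, 7, a, b))  # 3 2 (1, 0)
```

Comparing distances rather than raw y coordinates keeps the rule independent of whether the hand is held prone or supine, one of the conditions tested above; counting two hands at once, the harder game setting, would simply apply the function to each detected hand.
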
Integration of Point Clouds from 360° Videos and Deep Learning Techniques for Rapid Documentation and Classification in Historical City Centers

Digital metric documentation of historical city centers is challenging because of the complexity of the buildings and monuments, which feature different geometries, construction technologies, and materials. We propose a solution for the rapid documentation and classification of such complex spaces using 360° video cameras, which capture the entire scene and can be pointed in any direction, making data acquisition rapid and straightforward. The high frame rate during image acquisition allows users to capture overlapping images that can be used for photogrammetric applications. This paper aims to quickly capture 360° videos with low-cost cameras and then generate dense point clouds using a photogrammetric/structure-from-motion pipeline for 3D modeling. Point cloud classification is a prerequisite for such applications. Numerous deep learning (DL) methods have been developed to classify point clouds thanks to the expanding capabilities of artificial intelligence (AI). We aim to pave the way toward using convolutional neural networks (CNNs) to classify point clouds generated from 360° videos of historic cities. A preliminary case study in a historic city center demonstrates that our method achieves promising results in the generation and classification of point clouds, with an overall classification accuracy of 96% over the following categories: ground, buildings, poles, bollards, cars, and natural.

Yuwei Cao, Mattia Previtali, Luigi Barazzetti, Marco Scaioni
Towards the Creation of AI-powered Queries Using Transfer Learning on NLP Model - The THESPIAN-NER Experience

Tools for HEritage Science Processing, Integration, and ANalysis (THESPIAN) is a cloud system that offers multiple web services to the researchers of INFN-CHNet, from storing their raw data to reusing them according to the FAIR principles, establishing integration and interoperability among shared information. Data and metadata (the latter modelled on a CIDOC-based ontology called CRMhs [20]) are injected into the CHNet cloud database using the cloud service THESPIAN-Mask. THESPIAN-NER is a tool based on a deep neural network for Named Entity Recognition (NER) that will ease data extraction from the database, enabling users to upload .pdf or .txt files and obtain named entities and keywords to be fetched in the metadata entries of the database. The neural network on which THESPIAN-NER relies is based on a set of open-source NLP models; transfer learning was employed to customise the Named Entity Recognition output of the models to match the CRMhs ontology properties. The service is now available in alpha version to researchers on the CHNet cloud.

Alessandro Bombini, Lisa Castelli, Achille Felicetti, Franco Niccolucci, Anna Reccia, Francesco Taccetti
Detecting Fake News in MANET Messaging Using an Ensemble Based Computational Social System

Mobile Ad hoc Networks (MANETs) are used in a variety of mission-critical situations, so it is important to detect any fake news that circulates in such networks. This research proposes an Ensemble Based Computational Social System for fake news detection in MANET messaging. Specifically, it combines the power of Veracity, a unique computational social system, with that of Legitimacy, a dedicated ensemble learning technique. Veracity uses five algorithms, namely VerifyNews, CompareText, PredictCred, CredScore and EyeTruth, to capture, compute and analyse credibility and content data features using computational social intelligence. To validate Veracity, a dataset of publisher credibility-based and message content-based features is generated to predict fake news. These data features are analysed with Legitimacy, a unique ensemble learning prediction model, and the experimental results are examined with four analytical methodologies. The analysis reports satisfactory performance of the Veracity architecture combined with the Legitimacy model for fake news detection in MANET messaging.

Amit Neil Ramkissoon, Wayne Goodridge
PergaNet: A Deep Learning Framework for Automatic Appearance-Based Analysis of Ancient Parchment Collections

Archival institutions and programs worldwide work to ensure that the records of governments, organizations, communities and individuals are preserved for future generations as cultural heritage, as sources of rights, and to hold the past accountable. Ancient written documents made of parchment were an important means of communication for humankind, and their digitization is of invaluable historical value to our cultural heritage (CH). Automatic analysis of parchments has become an important research topic in the fields of image and pattern recognition. Moreover, Artificial Intelligence (AI) and its subset Deep Learning (DL) have been receiving increasing attention in pattern representation. Applying AI to the analysis of ancient image data is becoming essential, and scientists are increasingly using it as a powerful and complex tool for statistical inference. This paper proposes PergaNet, a lightweight DL-based system for the historical reconstruction of ancient parchments based on appearance-based approaches. The aim of PergaNet is the automatic analysis and processing of large numbers of scanned parchments. This problem has not yet been properly investigated by the computer vision community because parchment scanning technology is novel, and it is extremely important for effective data recovery from historical documents whose content is inaccessible due to the deterioration of the parchment. The proposed approach aims to reduce manual analysis while using manual annotation as a form of continuous learning. PergaNet comprises three phases: recto/verso classification of parchments, text detection, and detection and recognition of the “signum tabellionis” (the notarial sign). PergaNet addresses not only the recognition and classification of the objects in the images but also their localization.
The analysis is based on data from ordinary use and does not involve altering or manipulating techniques to generate data.

Marina Paolanti, Rocco Pietrini, Laura Della Sciucca, Emanuele Balloni, Benedetto Luigi Compagnoni, Antonella Cesarini, Luca Fois, Pierluigi Feliciati, Emanuele Frontoni
Transformers with YOLO Network for Damage Detection in Limestone Wall Images

Damage detection in cultural heritage buildings is of great significance for planning restoration operations. However, building analysis is generally performed by experts through on-site qualitative visual assessment, a highly time-consuming task that is hardly feasible at the scale of large historical buildings. This paper proposes a new neural network architecture for the automatic detection of spalling zones in limestone walls from color images. The architecture consists of the latest YOLO network enhanced with transformer encoder layers that provide more comprehensive features. The proposed network significantly outperforms the core YOLO network on our dataset of over 1000 high-resolution images of the Renaissance-style Château de Chaumont in the Loire Valley (France).

Koubouratou Idjaton, Xavier Desquesnes, Sylvie Treuillet, Xavier Brunetaud

Medical Transformers - MEDXF

On the Effectiveness of 3D Vision Transformers for the Prediction of Prostate Cancer Aggressiveness

Prostate cancer is the most frequent male neoplasm in European men. To date, the gold standard for determining the aggressiveness of this tumor is the biopsy, an invasive and uncomfortable procedure. Before the biopsy, physicians recommend an examination by multiparametric magnetic resonance imaging, which can give the radiologist an initial assessment of the tumor. This study investigates the role of Vision Transformers in predicting prostate cancer aggressiveness based only on imaging data. We designed a 3D Vision Transformer able to process volumetric scans and optimized it on the ProstateX-2 challenge dataset by training it from scratch. For comparison, we also designed a 3D Convolutional Neural Network and optimized it in a similar fashion. Our preliminary results show that Vision Transformers, even without extensive optimization and customization, can improve performance with respect to Convolutional Neural Networks and might be comparable with other, more fine-tuned solutions.

Eva Pachetti, Sara Colantonio, Maria Antonietta Pascali
Exploring a Transformer Approach for Pigment Signs Segmentation in Fundus Images

Over the past couple of years, Transformers have become increasingly popular within the deep learning community. Initially designed for Natural Language Processing tasks, Transformers were then adapted to the image analysis field. The self-attention mechanism behind Transformers immediately appeared to be a promising, although computationally expensive, learning approach. However, Transformers do not adapt as well to tasks involving large images or small datasets. This prompted the exploration of hybrid CNN-Transformer models, which seem to overcome those limitations, sparking increasing interest in the field of medical imaging as well. Here, a hybrid approach is investigated for Pigment Signs (PS) segmentation in Fundus Images of patients suffering from Retinitis Pigmentosa, an eye disorder that eventually leads to complete blindness. PS segmentation is a challenging task due to the high variability of PS size, shape and color, and to the difficulty of distinguishing PS from blood vessels, which often overlap and display similar colors. To address these issues, we use the Group Transformer U-Net, a hybrid CNN-Transformer. We investigate how using different loss functions and choosing an appropriate parameter tuning affect the learning process. We compare the obtained performance with that of the classical U-Net architecture. Interestingly, although the results leave margins for consistent improvement, they do not suggest a clear superiority of the hybrid architecture. This evidence raises several questions, which we address here but which also deserve further investigation, about how and when Transformers are really the best choice for medical imaging tasks.

Mara Sangiovanni, Maria Frucci, Daniel Riccio, Luigi Di Perna, Francesca Simonelli, Nadia Brancati
Transformer Based Generative Adversarial Network for Liver Segmentation

Automated liver segmentation from radiology scans (CT, MRI) can improve surgery and therapy planning and follow-up assessment, in addition to its conventional use for diagnosis and prognosis. Although convolutional neural networks (CNNs) have become the standard for image segmentation tasks, this has recently started to shift towards Transformer-based architectures, which exploit the so-called attention mechanism to capture long-range dependencies in signals. In this study, we propose a new segmentation algorithm using a hybrid approach that combines Transformers with the Generative Adversarial Network (GAN) approach. The premise behind this choice is that the self-attention mechanism of Transformers allows the network to aggregate high-dimensional features and provide global information modeling, which yields better segmentation performance than traditional methods. Furthermore, we embed this generator into a GAN-based architecture so that the discriminator network can judge the credibility of the generated segmentation masks against the real masks from human (expert) annotations. This allows us to extract high-dimensional topology information from the masks for biomedical image segmentation and provide more reliable segmentation results. Our model achieved a high Dice coefficient of 0.9433, recall of 0.9515, and precision of 0.9376, and outperformed other Transformer-based approaches.

Ugur Demir, Zheyuan Zhang, Bin Wang, Matthew Antalek, Elif Keles, Debesh Jha, Amir Borhani, Daniela Ladner, Ulas Bagci
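
The Dice coefficient, recall and precision reported for the liver segmentation model above follow the standard pixel-overlap definitions. As an illustration only, the Python sketch below computes them for two flattened binary masks; the function name overlap_metrics and the convention of returning 1.0 when a denominator is zero are assumptions of this sketch, not details taken from the paper's evaluation code.

```python
from typing import Sequence, Tuple


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float, float]:
    """Return (dice, recall, precision) for two flattened binary masks.

    Hypothetical helper: pixels are 0/1 values listed in the same order.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # predicted and annotated
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # predicted only
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)  # annotated only
    # Assumed convention: an empty denominator counts as a perfect score.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return dice, recall, precision


if __name__ == "__main__":
    pred = [1, 1, 1, 1, 0, 0]
    truth = [1, 1, 0, 0, 1, 0]
    print(overlap_metrics(pred, truth))  # dice = 4/7, recall = 2/3, precision = 1/2
```

Because Dice counts true positives twice, it equals the harmonic mean of pixel-level precision and recall (the F1 score), so a high Dice value implies that both of the other two scores are high as well.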

Learning in Precision Livestock Farming - LPLF

Suggestions for the Environmental Sustainability from Precision Livestock Farming and Replacement in Dairy Cows

The livestock sector, like other sectors, has a high environmental impact, and solutions must be found to reduce it and meet the requirements for a more sustainable production system in line with the European Green Deal. This paper presents a case study that evaluates the effect of Precision Livestock Farming (PLF) technology on the environmental impact of dairy cattle farming using Life Cycle Assessment (LCA) simulations. The case study involves the use of pedometers for improved detection of oestrus events, making livestock activities more efficient and reducing the related environmental impact. The results show that LCA can be a feasible approach for gaining insight into the significance of the environmental benefit of applying PLF tools on farms.

Daniela Lovarelli, Daniel Berckmans, Jacopo Bacenetti, Marcella Guarino
Intelligent Video Surveillance for Animal Behavior Monitoring

The behavior of animals reflects their internal state. Changes in behavior, such as a lack of sleep, can be detected as early warning signs of health issues. Zoologists often have to use video recordings to study animal activity. These videos are generally not sufficiently indexed, so the process is long and laborious, and the observation results may vary between observers. This study addresses the difficulty of measuring elephant sleep stages from night-time surveillance videos of the elephant barn. To assist zoologists, we propose using deep learning techniques to automatically locate elephants in each surveillance camera view and then map the detected elephants onto the barn plan. Instead of watching all of the videos, zoologists can examine the mapping history, allowing them to measure elephant sleep stages faster. Overall, our approach monitors elephants in their barn with a high degree of accuracy.

Souhaieb Aouayeb, Xavier Desquesnes, Bruno Emile, Baptiste Mulot, Sylvie Treuillet
Quick Quality Analysis on Cereals, Pulses and Grains Using Artificial Intelligence

Our purpose is to design a system for quick quality analysis of cereals, pulses, and grains using Artificial Intelligence, combining hardware and software technologies, that can swiftly analyze the type and quality of bulk grains without human intervention. We aim to assess grain quality through detection and image processing, overcoming the limitations of prior work in this field. This is an important problem because grains are also used as livestock feed, and feed quality has a great impact on animal health. In our methodology, the complete grain is analyzed to determine its quality by placing it in a controlled environment with a 13-megapixel 4K camera module, which performs the initial image-processing step before a neural network is applied. Overcoming the limitations of sample-based estimation, we demonstrate a full-grain scan technique, resulting in a unique hardware system that can estimate the quality of the entire grain more efficiently. Furthermore, our technology yields faster results because a precise, fast camera module captures the moving grains on the conveyor belt, and the images are analyzed by a fast NVIDIA Jetson Nano. We automated every step of this process by using vacuum tubes to collect the grains, a filter to align them, and a conveyor belt with a variable notch for optimum grain disposal. In summary, our study describes a system that provides proper analysis of bulk grains, cereals, and pulses and automates the whole process.

Prayuktha Bendadi, Vishali Mankina, Alessandro Distante, Rodolfo Guzzi
Label a Herd in Minutes: Individual Holstein-Friesian Cattle Identification

We describe a practically evaluated approach for training visual cattle ID systems for a whole farm that requires only ten minutes of labelling effort. In particular, for the task of automatically identifying individual Holstein-Friesians in real-world farm CCTV, we show that self-supervision, metric learning, cluster analysis, and active learning can complement each other to significantly reduce the annotation requirements usually needed to train cattle identification frameworks. We evaluate the approach on the test portion of the publicly available Cows2021 dataset; for training, we use 23,350 frames across 435 single-individual tracklets generated by automated oriented cattle detection and tracking in operational farm footage. Self-supervised metric learning is first employed to initialise a candidate identity space in which each tracklet is considered a distinct entity. Entities are then grouped into equivalence classes representing cattle identities by automated merging via cluster analysis and active learning. Critically, we identify the inflection point at which automated choices can no longer replicate the improvements gained from human intervention, in order to reduce annotation to a minimum. Experimental results show that cluster analysis and a few minutes of labelling after automated self-supervision can improve the test identification accuracy over 153 identities to 92.44% (ARI = 0.93), from the 74.9% (ARI = 0.754) obtained by self-supervision only. These promising results indicate that a tailored combination of human and machine reasoning in visual cattle ID pipelines can be highly effective while requiring only minimal labelling effort. We provide all key source code and network weights with this paper for easy result reproduction.

Jing Gao, Tilo Burghardt, Neill W. Campbell
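
The ARI values quoted above measure how well the recovered identity clusters agree with the ground-truth identities, corrected for chance agreement. A minimal pure-Python sketch of the standard adjusted Rand index is shown below for illustration; the function name is hypothetical, and in practice a library routine such as scikit-learn's adjusted_rand_score computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two clusterings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    # Pairs of items that share a cell of the contingency table.
    pair_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together by each labeling on its own.
    pair_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pair_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    expected = pair_true * pair_pred / total_pairs if total_pairs else 0.0
    max_index = (pair_true + pair_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in its own cluster.
        return 1.0
    return (pair_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, different names
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # about -0.5: worse than chance
```

Because the index depends only on which items are grouped together, renaming clusters does not change it, which is what identity clustering needs: the cluster labels produced by self-supervision carry no meaning of their own.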

Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques - WOSDETC

DroBoost: An Intelligent Score and Model Boosting Method for Drone Detection

Drone detection is a challenging object detection task: visibility conditions and image quality may be unfavorable, and detection can be difficult because of complex backgrounds, small visible objects, and objects that are hard to distinguish. Both achieving high confidence for drone detections and eliminating false detections require efficient algorithms and approaches. Our previous work used YOLOv5 with both real and synthetic data, together with a Kalman-based tracker that tracks detections and increases their confidence using temporal information. Our current work extends this approach with several improvements. We used a more diverse dataset that combines multiple sources, supplemented with synthetic samples selected from a large synthetic dataset based on an error analysis of the base model. To obtain more resilient confidence scores, we also introduced a classification component that discriminates whether an object is a drone or not. Finally, we developed a more advanced scoring algorithm for object tracking that we use to adjust localization confidence. The proposed technique won 1st place in the Drone vs. Bird Challenge (Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques at ICIAP 2021).

Ogulcan Eryuksel, Kamil Anil Ozfuttu, Fatih Cagatay Akyon, Kadir Sahin, Efe Buyukborekci, Devrim Cavusoglu, Sinan Altinuc
Drone-vs-Bird Detection Challenge at ICIAP 2021

This paper reports the results of the 5th edition of the “Drone-vs-Bird” detection challenge, organized within the 21st International Conference on Image Analysis and Processing (ICIAP). Taking as input video samples recorded by common cameras, the challenge aims to devise advanced approaches for detecting drones flying in the monitored area while limiting the number of false alarms raised when similar flying entities, such as birds, suddenly appear in the scene. To this end, important issues such as dynamic variations in the scene and background/foreground motion effects must be carefully considered, so that the proposed solutions identify drones only when they are actually present. The paper summarizes the novel algorithms proposed by the four participating teams that achieved satisfactory detection performance on the 2022 challenge dataset.

Angelo Coluccia, Alessio Fascista, Arne Schumann, Lars Sommer, Anastasios Dimou, Dimitrios Zarpalas, Nabin Sharma, Mrunalini Nalamati, Ogulcan Eryuksel, Kamil Anil Ozfuttu, Fatih Cagatay Akyon, Kadir Sahin, Efe Buyukborekci, Devrim Cavusoglu, Sinan Altinuc, Daitao Xing, Halil Utku Unlu, Nikolaos Evangeliou, Anthony Tzes, Abhijeet Nayak, Mondher Bouazizi, Tasweer Ahmad, Artur Gonçalves, Bastien Rigault, Raghvendra Jain, Yutaka Matsuo, Helmut Prendinger, Edmond Jajaga, Veton Rushiti, Blerant Ramadani, Daniel Pavleski
An Image-Based Classification Module for Data Fusion Anti-drone System

Means of air attack are pervasive in modern armed conflicts and terrorist actions. We present the results of a NATO-SPS project that aims to fuse data from a network of optical sensors and low-probability-of-intercept mini radars. The image-based module is required to differentiate first between birds and drones, then between different kinds of drones (copters and fixed wings), and finally to determine whether a payload is present. In this paper, we outline the experimental results of the deep learning model for differentiating drones from birds. Based on the trade-off between speed and accuracy, YOLOv4 was chosen. A dataset refinement process for YOLO-based approaches is proposed. The experimental results verify that this approach provides a reliable source of situational awareness for a data fusion platform. However, the analysis indicates the need to enrich the dataset with more images featuring complex backgrounds and different target sizes.

Edmond Jajaga, Veton Rushiti, Blerant Ramadani, Daniel Pavleski, Alessandro Cantelli-Forti, Biljana Stojkovska, Olivera Petrovska
Evaluation of Fully Convolutional One-Stage Object Detection for Drone Detection

In this paper, we present the approach for drone detection that we submitted to the Drone-vs-Bird Detection Challenge. We used the Fully Convolutional One-Stage Object Detection (FCOS) approach, tuned to detect drones. Throughout our experiments, we opted for a simple data augmentation technique to reduce the number of False Positives (FPs). Based on the results of our early experiments, our data augmentation technique adds extra training samples containing the objects that generated the most FPs, namely other flying objects, leaves and objects with sharp edges. With this newly introduced training data, we obtain drone detection AP scores on the validation set of 0.16, 0.34 and 0.65 for small, medium and large drones, respectively.

Abhijeet Nayak, Mondher Bouazizi, Tasweer Ahmad, Artur Gonçalves, Bastien Rigault, Raghvendra Jain, Yutaka Matsuo, Helmut Prendinger
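
The FCOS abstract above reports AP separately for small, medium and large drones but does not state the size thresholds. A common convention, used by the COCO evaluation protocol, splits objects by area at 32² and 96² pixels; the sketch below assumes those thresholds and bins bounding boxes accordingly. The constants and function names are illustrative, not taken from the challenge's evaluation code.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed COCO-style area thresholds in pixels; the abstract does not state them.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def size_bucket(width: float, height: float) -> str:
    """Assign a bounding box to the small / medium / large group by its area."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def count_by_size(boxes: Iterable[Tuple[float, float]]) -> Counter:
    """Count (width, height) boxes per size group, e.g. to check dataset balance."""
    return Counter(size_bucket(w, h) for w, h in boxes)


if __name__ == "__main__":
    drones = [(12, 8), (40, 30), (150, 90), (20, 20)]
    print(count_by_size(drones))
```

Per-size AP is then obtained by running the usual AP evaluation separately within each group, so the three scores above are not averages of one another and can differ widely.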
Drone Surveillance Using Detection, Tracking and Classification Techniques

In this work, we explore the design of a long-term drone surveillance system that fuses object detection, tracking and classification methods. Given a video stream from an RGB camera, a detection module based on YOLOv5 is trained to find drones within its field of view. Although the underlying complex architecture achieves high accuracy and robustness in drone detection, detection speed suffers on ultra-HD streams. To solve this problem, we integrate a highly efficient object tracker that updates target status while avoiding running detection on every frame. Benefiting from lightweight backbone networks with a powerful Transformer design, the object tracker achieves real-time speed on standalone CPU devices. Moreover, a drone classification model is applied to the output of the detection and tracking mechanisms to further distinguish drones from background distractors (birds, balloons). By leveraging inference optimization with TensorRT and ONNX, our system achieves very high inference speed on NVIDIA GPUs. A ROS package integrates these components and provides a flexible, end-to-end drone surveillance tool for real-time applications. Comprehensive experiments on both standard benchmarks and field tests demonstrate the effectiveness and stability of the proposed system.

Daitao Xing, Halil Utku Unlu, Nikolaos Evangeliou, Anthony Tzes
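
The speed-up described in the surveillance abstract above comes from running the expensive detector only occasionally and letting a cheaper tracker propagate boxes in between. The loop below is a minimal sketch of that idea under stated assumptions: detect and track are hypothetical callables standing in for the YOLOv5 detector and the Transformer tracker, and the fixed re-detection interval (plus re-detection whenever no target is being tracked) is an assumed policy, not the one documented by the authors.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def run_pipeline(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Box]],
    track: Callable[[Frame, Sequence[Box]], List[Box]],
    detect_every: int = 10,
) -> List[List[Box]]:
    """Run the detector every `detect_every` frames and the tracker otherwise."""
    if detect_every < 1:
        raise ValueError("detect_every must be at least 1")
    results: List[List[Box]] = []
    boxes: List[Box] = []
    for index, frame in enumerate(frames):
        if index % detect_every == 0 or not boxes:
            boxes = detect(frame)  # expensive: full detection on this frame
        else:
            boxes = track(frame, boxes)  # cheap: update the previously known targets
        results.append(boxes)
    return results
```

A larger detect_every saves more computation on ultra-HD streams, but the tracker must then bridge longer gaps, and a newly appearing drone is only picked up at the next detection frame.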

Medical Imaging Analysis for Covid-19 - MIACOVID 2022

ILC-Unet++ for Covid-19 Infection Segmentation

Since the appearance of the Covid-19 pandemic at the end of 2019, medical imaging has been widely used to analyze this disease. In fact, CT scans of the lung can help diagnose, detect and quantify Covid-19 infection. In this paper, we address the segmentation of Covid-19 infection from CT scans. More specifically, we propose a CNN-based segmentation architecture named ILC-Unet++, which is trained for both Covid-19 infection segmentation and lung segmentation. The proposed architecture was tested on three datasets under two scenarios (intra-dataset and cross-dataset). The experimental results show that the proposed architecture performs better than three baseline segmentation architectures (Unet, Unet++ and Attention-Unet) and two Covid-19 infection segmentation architectures (SCOATNet and nCoVSegNet).

Fares Bougourzi, Cosimo Distante, Fadi Dornaika, Abdelmalik Taleb-Ahmed, Abdenour Hadid
Revitalizing Regression Tasks Through Modern Training Procedures: Applications in Medical Image Analysis for Covid-19 Infection Percentage Estimation

Estimating the percentage of COVID-19-specific infection within the lung tissue can be an important tool for establishing the correct protocol for COVID-19 treatment. This article describes the approach we used to estimate the COVID-19 infection percentage on lung CT scan slices within the Covid-19-Infection-Percentage-Estimation-Challenge. Our method frames the regression problem as a multi-task learning process and is based on modern training pipelines and on architectures that correspond to state-of-the-art models for image classification tasks. It obtained the best score on the validation dataset and ranked third in the testing phase of the competition.

Radu Miron, Mihaela Elena Breaban
Res-Dense Net for 3D Covid Chest CT-Scan Classification

3D CT-scan analysis is one of the most debated areas of research in medical image processing. With the rapid spread of COVID-19, the role of CT scans in properly and swiftly diagnosing the disease has become critical, and this has a positive impact on infection prevention. CT-scan images support many diagnostic tasks, including COVID-19 diagnosis. In this paper, we propose a method that uses a stacking deep neural network to detect COVID-19 from series of 3D CT-scan images. In our method, we experiment with two backbones, DenseNet-121 and ResNet-101. This method achieves competitive performance on several evaluation metrics.

Quoc-Huy Trinh, Minh-Van Nguyen, Thien-Phuc Nguyen-Dinh
Deep Regression by Feature Regularization for COVID-19 Severity Prediction

During the worldwide COVID-19 pandemic, CT scans emerged as one of the most precise tools for identifying and diagnosing affected patients. With the increase in available medical imaging, AI-powered methods have arisen to aid the detection and classification of COVID-19 cases. In this work, we propose a methodology to automatically inspect CT scan slices and assess the related disease severity. We competed in the ICIAP2021 COVID-19 infection percentage estimation competition, and our method ranked in the top 5 in both the Validation phase, with MAE = 4.912%, and the Testing phase, with MAE = 5.020%.

Davide Tricarico, Hafiza Ayesha Hoor Chaudhry, Attilio Fiandrotti, Marco Grangetto
Mixup Data Augmentation for COVID-19 Infection Percentage Estimation

The outbreak of the COVID-19 pandemic considerably increased the workload in hospitals. In this context, the availability of proper diagnostic tools is very important in the fight against this virus, and scientific research is constantly contributing in this direction. Indeed, many scientific initiatives, including challenges, call for the development of deep learning algorithms that analyse X-ray or Computed Tomography (CT) images of the lungs. One such challenge concerns predicting the percentage of COVID-19 infection in chest CT images. In this paper, we present our contribution to the COVID-19 Infection Percentage Estimation Competition organised in conjunction with the ICIAP 2021 Conference. The proposed method employs architectures designed for classification problems, such as Inception-v3, together with the mixup data augmentation technique on COVID-19 images. Moreover, mixup is applied here for the first time to radiological images of lungs affected by COVID-19 infection, with the aim of inferring the degree of infection with slice-level precision. Our approach achieved promising results despite the specific constraints defined by the challenge rules, and our solution entered the final ranking.

Maria Ausilia Napoli Spatafora, Alessandro Ortis, Sebastiano Battiato
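
Mixup, the augmentation named in the abstract above, trains on convex combinations of pairs of examples: a weight λ is drawn from a Beta(α, α) distribution, and both the inputs and the targets are blended with it. The sketch below applies this to a scalar infection-percentage target and treats images as flat lists of pixel values to stay dependency-free; the α value, the function name and the regression-style target are assumptions, since the abstract does not say how the authors encoded labels for Inception-v3.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup(
    x1: Sequence[float],
    y1: float,
    x2: Sequence[float],
    y2: float,
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float, float]:
    """Blend two (image, target) pairs with a Beta(alpha, alpha) weight."""
    if len(x1) != len(x2):
        raise ValueError("images must have the same number of pixels")
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1.0 - lam) * y2  # e.g. infection percentage of the blended slice
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    image, target, lam = mixup([0.0, 1.0], 10.0, [1.0, 0.0], 50.0, alpha=0.4, rng=rng)
    print(round(lam, 3), image, target)
```

Small α values make Beta(α, α) U-shaped, so λ tends to lie near 0 or 1 and most mixed samples stay close to one of the two originals.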
Swin Transformer for COVID-19 Infection Percentage Estimation from CT-Scans

Coronavirus disease 2019 (COVID-19) is an infectious disease that has spread globally, disrupting health care systems and claiming millions of lives worldwide. Because of the high number of Covid-19 infections, it has been challenging for medical professionals to manage this crisis. Estimating the Covid-19 infection percentage can help medical staff categorize patients by severity and prioritize them accordingly. With this approach, the intensive care unit (ICU) can free up resuscitation beds for critical cases and provide other treatments for less severe cases, managing the healthcare system efficiently during a crisis. In this paper, we present a transformer-based method that estimates the covid-19 infection percentage from computed tomography scans (CT-scans) to monitor the evolution of the patient's state. We use a particular Transformer architecture, the Swin Transformer, as a backbone network to extract features from each CT slice and pass them through a multi-layer perceptron (MLP) to obtain the covid-19 infection percentage. We evaluated our approach on the covid-19 infection percentage estimation challenge dataset, annotated by two expert radiologists. The experimental results show that the proposed method achieves promising performance, with a mean absolute error (MAE) of 4.5042, a Pearson correlation coefficient (PC) of 0.9490 and a root mean square error (RMSE) of 8.0964 on the Val set leaderboard, and an MAE of 3.5569, a PC of 0.8547 and an RMSE of 7.5102 on the Test set leaderboard. These promising results demonstrate the high potential of the Swin Transformer architecture for the image regression task of covid-19 infection percentage estimation from CT-scans. The source code of this project can be found at: .

Suman Chaudhary, Wanting Yang, Yan Qiang
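
The three scores reported for the Swin Transformer model above (MAE, RMSE and Pearson correlation) can be computed from per-slice predictions as sketched below. This is an illustrative implementation of the standard definitions with a hypothetical function name; the challenge organizers' exact scoring script is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, Pearson correlation) for per-slice infection percentages."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson correlation is undefined for constant inputs")
    pc = cov / math.sqrt(var_t * var_p)
    return mae, rmse, pc


if __name__ == "__main__":
    print(regression_metrics([0, 10, 20, 30], [2, 8, 22, 28]))
```

RMSE is never smaller than MAE, and a large gap between them (as in the 8.0964 versus 4.5042 reported on the validation leaderboard) indicates that a minority of slices carry much larger errors than the rest.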
COVID-19 Infection Percentage Prediction via Boosted Hierarchical Vision Transformer

A better backbone network usually benefits the performance of various computer vision applications. This paper introduces an effective solution for estimating the COVID-19 infection percentage from computed tomography (CT) scans. We first adopt a state-of-the-art backbone, the Hierarchical Vision Transformer, to extract effective semantic feature representations from the CT scans. Then, non-linear classification and regression heads with the GELU activation function are proposed to estimate COVID-19 infection scores from the CT scans. We claim that multi-task learning is beneficial for learning better feature representations for infection score prediction. Moreover, a maximum-rectangle cropping strategy is proposed to obtain the region of interest (ROI), boosting the effectiveness of COVID-19 infection percentage estimation. The experiments demonstrate that the proposed method is effective and efficient.

Chih-Chung Hsu, Sheng-Jay Dai, Shao-Ning Chen
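
The GELU activation used in the prediction heads described above is defined as GELU(x) = x·Φ(x), where Φ is the standard normal cumulative distribution function. The short sketch below evaluates the exact form with math.erf and, for comparison, the widely used tanh approximation; it illustrates the activation only and is not the authors' head implementation.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"{value:+.1f}  exact={gelu(value):+.5f}  tanh={gelu_tanh(value):+.5f}")
```

Unlike ReLU, GELU lets small negative inputs through with a reduced weight, which gives a smooth non-linearity in the heads; the two forms above differ by well under 0.001 on typical input ranges.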

Novel Benchmarks and Approaches for Real-World Continual Learning - CL4REAL

Catastrophic Forgetting in Continual Concept Bottleneck Models

Almost all Deep Learning models are dramatically affected by Catastrophic Forgetting when learning over continual streams of data. Several Continual Learning strategies have been proposed to mitigate this problem, even though the extent of the forgetting is still unclear. In this paper, we analyze Concept Bottleneck (CB) models in the Continual Learning setting and investigate the effect of high-level feature supervision on Catastrophic Forgetting at the representation layer. To this end, we introduce two different metrics to evaluate the loss of information on the learned concepts as new experiences are encountered. We also show that the obtained saliency maps remain more stable with attribute supervision. The code is available at

Emanuele Marconato, Gianpaolo Bontempo, Stefano Teso, Elisa Ficarra, Simone Calderara, Andrea Passerini
Practical Recommendations for Replay-Based Continual Learning Methods

Continual Learning requires the model to learn from a stream of dynamic, non-stationary data without forgetting previous knowledge. Several approaches have been developed in the literature to tackle the Continual Learning challenge. Among them, Replay approaches have empirically proved to be the most effective [16]. Replay operates by saving some samples in memory, which are then used to rehearse knowledge during training on subsequent tasks. However, an extensive comparison and deeper understanding of different replay implementation subtleties is still missing in the literature. The aim of this work is to compare and analyze existing replay-based strategies and to provide practical recommendations for developing efficient, effective and generally applicable replay-based strategies. In particular, we investigate the role of memory size and of different weighting policies, and we discuss the impact of data augmentation, which allows better performance to be reached with smaller memory sizes.

Gabriele Merlin, Vincenzo Lomonaco, Andrea Cossu, Antonio Carta, Davide Bacciu
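
The replay mechanism described in the abstract above stores a bounded number of past samples and mixes them into later training batches. The abstract studies memory size and weighting policies but does not fix how samples enter the memory; the class below is a hedged sketch that assumes reservoir sampling, a common choice that keeps every sample seen so far in memory with equal probability. The class and method names are hypothetical.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Fixed-capacity replay memory filled by reservoir sampling (assumed policy)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.samples: List[T] = []
        self.seen = 0  # number of samples offered to the buffer so far
        self._rng = random.Random(seed)

    def add(self, sample: T) -> None:
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)  # memory not full yet: always store
        else:
            # Keep the new sample with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            slot = self._rng.randrange(self.seen)
            if slot < self.capacity:
                self.samples[slot] = sample

    def sample(self, batch_size: int) -> List[T]:
        """Draw up to batch_size stored samples without replacement for rehearsal."""
        return self._rng.sample(self.samples, min(batch_size, len(self.samples)))


if __name__ == "__main__":
    memory: ReservoirReplayBuffer[int] = ReservoirReplayBuffer(capacity=5, seed=0)
    for step in range(100):  # stand-in for a stream of training samples
        memory.add(step)
    print(memory.samples, memory.sample(3))
```

In a training loop, each incoming batch would be concatenated with a few replayed samples before the update; the weighting policies studied in the paper would then control how much those replayed samples contribute to the loss.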
Image Analysis and Processing. ICIAP 2022 Workshops
edited by
Pier Luigi Mazzeo
Emanuele Frontoni
Prof. Stan Sclaroff
Cosimo Distante