
About this book

This book constitutes the refereed proceedings of the 26th International Conference on Information and Software Technologies, ICIST 2020, held in Kaunas, Lithuania, in October 2020.

The 23 full papers and 7 short papers presented were carefully reviewed and selected from 78 submissions. The papers are organized in topical sections on business intelligence for information and software systems; software engineering; and information technology applications.

Table of Contents

Frontmatter

Business Intelligence for Information and Software Systems - Special Session on Intelligent Methods for Data Analysis and Computer Aided Software Engineering

Frontmatter

Survey of Open-Source Clouds Capabilities Extension

Abstract
In this paper, we present a survey that should be useful to anyone considering implementing a private or hybrid cloud solution, or implementing a custom scheduling optimization algorithm on popular cloud computing platforms. A cloud computing platform can be deployed as a private cloud on-premises or in dedicated data centre space, as a public service, or as a combination of both. In this survey, we review the similarities and differences of the most popular private cloud implementation platforms and their compatibility with public cloud solutions. The survey reveals the prevalence of resource scheduling and service deployment algorithms within some popular open-source clouds.
Rita Butkiene, Jaroslav Karpovic, Ricardas Sabaliauskas, Laurynas Sriupsa, Mindaugas Vaitkunas, Gytis Vilutis
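Several open-source clouds schedule workloads in two phases; OpenStack Nova's filter scheduler, for example, first filters out hosts that cannot satisfy a request and then weighs the remaining ones. A minimal sketch of that filter-then-weigh pattern follows. The host and request fields (`free_vcpus`, `free_ram_mb`) are hypothetical names, not any platform's API, and the code illustrates the pattern rather than anything from the survey:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int


@dataclass
class Request:
    vcpus: int
    ram_mb: int


def schedule(request: Request, hosts: List[Host]) -> Optional[Host]:
    """Filter-then-weigh placement (hypothetical, simplified)."""
    # Filtering phase: keep only hosts with enough free capacity.
    candidates = [h for h in hosts
                  if h.free_vcpus >= request.vcpus and h.free_ram_mb >= request.ram_mb]
    if not candidates:
        return None
    # Weighing phase: a simple "spread" weigher preferring the most RAM left over.
    return max(candidates, key=lambda h: h.free_ram_mb - request.ram_mb)
```

A custom scheduling optimization algorithm of the kind the survey targets would typically replace the weighing step.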

A Novel Model Driven Framework for Image Enhancement and Object Recognition

Abstract
Modern technological trends like the Internet of Things (IoT) require rapid development of software systems. To manage this, Model Driven Architecture (MDA) is frequently applied to the development of systems for industrial automation, medicine, surveillance, tracking, security, etc. Image processing is an integral part of such systems. In particular, image enhancement and classification operations are mandatory for effective object recognition. Currently, such critical image processing operations are not managed through MDA, and low-level implementations are carried out separately during system development. This severely delays system development due to integration issues. Furthermore, system testing becomes problematic, as only some system components are developed through MDA while image processing operations are implemented in isolation. This article introduces a novel framework, MIEORF (Model-driven Image Enhancement and Object Recognition Framework). In particular, a meta-model is proposed that allows modeling and visualization of complex image processing and object recognition tasks. Subsequently, an open-source customized tree editor (developed using the Eclipse Modeling Framework (EMF)) and a graphical modeling tool/workbench (developed using Sirius) have been developed, both distributable as Eclipse plugins. Consequently, the proposed framework allows modeling and graphical visualization of major image processing operations. Moreover, it provides strong grounds for model transformation operations, e.g. Model-to-Text (M2T) transformations using Acceleo to generate executable MATLAB code. Furthermore, it systematically combines MDA and image processing concepts in enough detail to be easily integrated into a wide variety of systems, such as industrial automation, medical, surveillance, security and biometrics systems.
The feasibility of the proposed framework is demonstrated via a real-world medical imagery case study. The results prove that the proposed framework provides a complete solution for modeling and visualization of image processing tasks and is highly effective for MDA-based systems development.
Yawar Rasheed, Muhammad Abbas, Muhammad Waseem Anwar, Wasi Haider Butt, Urooj Fatima
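MIEORF generates MATLAB code from models with Acceleo, a template-based model-to-text (M2T) language built on EMF. The Python sketch below only illustrates the M2T idea: a hypothetical list of model elements is mapped through hypothetical templates onto MATLAB statements (`imread`, `rgb2gray`, `histeq`, `medfilt2` and `imwrite` are real MATLAB/Image Processing Toolbox functions). It is neither the MIEORF meta-model nor its Acceleo templates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    """One node of a hypothetical image-processing model."""
    kind: str                                   # e.g. "read", "grayscale", "median"
    params: Dict[str, str] = field(default_factory=dict)


# Hypothetical templates mapping model element kinds to MATLAB statements.
TEMPLATES = {
    "read": "img = imread('{path}');",
    "grayscale": "img = rgb2gray(img);",
    "equalize": "img = histeq(img);",
    "median": "img = medfilt2(img, [{size} {size}]);",
    "write": "imwrite(img, '{path}');",
}


def generate_matlab(model: List[Operation]) -> str:
    """Model-to-text transformation: emit one MATLAB line per model element."""
    lines = []
    for op in model:
        if op.kind not in TEMPLATES:
            raise ValueError(f"no template for operation '{op.kind}'")
        lines.append(TEMPLATES[op.kind].format(**op.params))
    return "\n".join(lines)
```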

Knowledge-Based Generation of the UML Dynamic Models from the Enterprise Model Illustrated by the Ticket Buying Process Example

Abstract
The main scope of this paper is to introduce the knowledge-based Enterprise Model as sufficient data storage for generating different Unified Modelling Language (UML) models from all the collected data. UML models can be generated from the Enterprise Model using specific transformation algorithms presented in previous research. The generation process from the Enterprise Model is illustrated by a ticket buying example. The generated UML dynamic models (Use Case, Sequence, State and Activity) of the ticket buying process demonstrate the completeness of the information stored in the Enterprise Model.
Ilona Veitaite, Audrius Lopata

Standardised Questionnaires in Usability Evaluation. Applying Standardised Usability Questionnaires in Digital Products Evaluation

Abstract
Usability evaluation plays a crucial role in human-computer interaction. It is one of the basic means of verifying the quality of the user interface and of the system as a whole. The goals of usability testing vary by study, but usually include identifying problems in the design of a product or service, uncovering opportunities for improvement, and learning about the target users' behaviour and preferences. In this paper, we present an analysis of the most commonly used standardised questionnaires. Based on a comparison between them, we also present the results of an analysis carried out by a group of students, who were asked to compare the questionnaires and decide which one would be appropriate for the usability evaluation of a given project. The students were split into teams, each with a different project to analyse. Their activity is part of the “Interactivity and Usability” course of the Multimedia Technologies Master's Degree Program at Politehnica University of Timișoara. We also present a score for the choice of some of the questionnaires and highlight the pros and cons of the preferred ones.
Oana Alexandra Rotaru, Silviu Vert, Radu Vasiu, Diana Andone
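The abstract does not list which questionnaires were compared. As an example of how a standardised questionnaire is scored, the System Usability Scale (SUS), one of the most widely used ones, converts ten 1-5 Likert answers into a 0-100 score:

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one System Usability Scale (SUS) questionnaire.

    `responses` holds the ten answers in questionnaire order, each on a
    1-5 Likert scale (1 = strongly disagree, 5 = strongly agree).
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten answers in the range 1..5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    # Each item contributes 0..4, so the raw sum is 0..40; scale it to 0..100.
    return total * 2.5
```

For instance, answering 3 to every item yields 50.0.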

Research of Semi-automated Database Development Using Data Model Patterns

Abstract
The paper focuses on the idea of semi-automating relational database development. Various approaches to easing and automating conceptual data modeling are discussed. The method chosen to semi-automate conceptual data model development was a pattern-based approach. This paper introduces a data model patterns library and a CASE tool for using it. Furthermore, an experiment was conducted to test the abilities of the CASE tool. The purpose of the experiment was to test the coverage and time aspects of reproducing an actual database schema with the CASE tool. The experiment results showed that patterns cover a large portion of a conceptual data model, and that the new CASE tool reduces the time required to develop a conceptual data model compared with developing it by hand.
Vytautas Volungevičius, Rita Butkienė

Decision-Making Model at the Management of Hybrid Power Grid

Abstract
This paper is devoted to developing a decision-making model for the optimal control of Hybrid Grid mode parameters, taking into account forecast changes in the parameters of electricity generation and consumption in the Microgrid. The decision-making problem in the management of Hybrid Power Grids arises under conditions of uncertainty and incomplete input information. It cannot be considered an optimization problem, but should be evaluated as a multidimensional and multiscale problem. In this research, the information support components and arrays of alternative possible regimes have been formed based on expert evaluation, the decision selection criteria have been defined, and fuzzy production rules that reflect the operational logic of the hybrid grid have been formulated. The developed system of fuzzy production rules makes it possible to change the operating mode of the Hybrid Power Consumption System while it is running, in order to increase energy savings, the resource of the power consumption system components, and electricity quality.
Sergii Tymchuk, Sergii Shendryk, Vira Shendryk, Anton Panov, Anastasia Kazlauskaite, Tetiana Levytska
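The abstract does not give the actual input variables, fuzzy sets or rules. The sketch below only shows how fuzzy production rules of the form "IF load is HIGH AND generation is LOW THEN mode" can be evaluated, using min as the fuzzy AND and letting the strongest-firing rule select the mode. All variables, membership functions and modes are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def tri(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical fuzzy sets on inputs normalised to [0, 1].
LOW, MEDIUM, HIGH = tri(-0.5, 0.0, 0.5), tri(0.0, 0.5, 1.0), tri(0.5, 1.0, 1.5)

# Hypothetical rules: (load set, local generation set) -> operating mode.
RULES: List[Tuple[Callable[[float], float], Callable[[float], float], str]] = [
    (HIGH, LOW, "import from grid"),
    (HIGH, HIGH, "use local generation"),
    (LOW, HIGH, "charge storage"),
    (MEDIUM, MEDIUM, "keep current mode"),
]


def choose_mode(load: float, generation: float) -> Tuple[str, Dict[str, float]]:
    """Fire every rule (AND = min) and return the mode with the strongest support."""
    strengths: Dict[str, float] = {}
    for load_set, gen_set, mode in RULES:
        w = min(load_set(load), gen_set(generation))
        strengths[mode] = max(strengths.get(mode, 0.0), w)
    best = max(strengths, key=strengths.get)
    return best, strengths
```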

Mining Data with Many Missing Attribute Values Using Global and Saturated Probabilistic Approximations Based on Characteristic Sets

Abstract
In this paper, incomplete data sets have two types of missing attribute values: lost values and “do not care” conditions. Our data mining algorithm, based on rule induction, uses two types of probabilistic approximations, called global and saturated. Thus, we use four different ways of rule induction, combining the two types of missing attribute values with the two types of probabilistic approximations. We used ten-fold cross validation to estimate the error rate. Previous results, obtained on data sets with 35% of missing attribute values, show that there is no universally best way of rule induction. Therefore, in our current experiments, we use data sets with many missing attribute values. Our new results show that the best way of data mining should be selected by running experiments that take all four possibilities into account.
Patrick G. Clark, Jerzy W. Grzymala-Busse, Teresa Mroczek, Rafal Niemiec
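Characteristic sets for these two kinds of missing values are commonly defined as follows: for an attribute a whose value for case x is specified, K(x, a) contains the cases with the same value or a "do not care" value ("*"); when x's value is lost ("?") or "do not care", K(x, a) is the whole universe; and K_B(x) is the intersection over all attributes in B. The sketch below computes these sets and the simpler concept probabilistic approximation, the union of K_B(x) over cases x in concept X with Pr(X | K_B(x)) = |X ∩ K_B(x)| / |K_B(x)| at least alpha. The paper's global and saturated approximations add further selection rules that are not reproduced here.

```python
from typing import Dict, List, Set

LOST, DONT_CARE = "?", "*"


def characteristic_set(x: int, table: List[Dict[str, str]], attrs: List[str]) -> Set[int]:
    """K_B(x): intersection over attributes of the cases compatible with case x."""
    universe = set(range(len(table)))
    k = set(universe)
    for a in attrs:
        v = table[x][a]
        if v in (LOST, DONT_CARE):
            continue  # K(x, a) = U: a missing value of x restricts nothing
        # Cases with the same value, plus cases whose value is "do not care".
        k &= {y for y in universe if table[y][a] in (v, DONT_CARE)}
    return k


def concept_approximation(concept: Set[int], alpha: float,
                          table: List[Dict[str, str]], attrs: List[str]) -> Set[int]:
    """Union of K_B(x), x in the concept, with Pr(concept | K_B(x)) >= alpha."""
    result: Set[int] = set()
    for x in concept:
        k = characteristic_set(x, table, attrs)   # always contains x, so never empty
        if len(concept & k) / len(k) >= alpha:
            result |= k
    return result
```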

Analytical Model of Design Workflows Organization in the Automated Design of Complex Technical Products

Abstract
The authors have developed a new analytical model for organizing design workflows, including orchestration and choreography compositions of hybrid dynamic diagrammatic models of design workflows in computer-aided design (CAD) systems and computer-aided systems of production preparation (CAPP). The model covers the analysis, control, synthesis, transformation and interpretation of these workflow models in different graphic language bases, and is designed to increase the degree of interoperability of hybrid dynamic diagrammatic design workflow models in CAD and CAPP on the basis of the ensemble principle. The model differs from its analogues in that it provides definitions of the system functions and of the communication protocol, which increases their interconnection.
Nikolay Voit, Sergey Kirillov, Semen Bochkov, Irina Ionova

A Model-Driven Framework for Optimum Application Placement in Fog Computing Using a Machine Learning Based Approach

Abstract
The pervasiveness of ubiquitously connected smart devices is a main factor shaping computing. With the advent of the Internet of Things (IoT), massive amounts of data are being generated from different sources. The centralized architecture of the cloud has become inefficient for providing services to IoT-enabled applications. For better support and services, the fog layer is introduced to manage IoT application demands such as latency, responsiveness, deadlines, resource availability and access time of the fog nodes. However, some issues in the fog layer related to resource management and to allocating fog nodes to requesting applications based on user expectations still need to be addressed. In this paper, we propose a framework based on Model Driven Software Engineering (MDSE) that applies Machine Learning algorithms and places fog-enabled IoT applications on the most suitable fog node. MDSE develops software by addressing the problem at the domain model level. It is an abstract representation of knowledge that enhances productivity by maximizing compatibility between systems. The proposed framework is a meta-model that prioritizes application placement requests based on their required expectations and calculates the abilities of the fog nodes for different application placement requests. Rule-based machine learning methods are used to create rules based on user requirement metrics, and the results are then optimized to determine the placement of the requesting device in the fog layer. Finally, a case study uses fuzzy logic for application mapping and shows how the framework performs the actual application placement. The proposed meta-model reduces complexity and provides the flexibility to make further enhancements according to the user's requirements, using any of the Machine Learning approaches.
Madeha Arif, Farooque Azam, Muhammad Waseem Anwar, Yawar Rasheed
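The abstract describes prioritising placement requests by their requirements and matching them against fog-node capabilities. As a deliberately simple, rule-based illustration of that idea (not the paper's meta-model, machine learning rules or fuzzy mapping), requests with the tightest deadline are served first, and each goes to the lowest-latency node that meets its latency bound and still has enough free CPU. All fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FogNode:
    name: str
    latency_ms: float
    free_cpu: float          # hypothetical capacity units


@dataclass
class PlacementRequest:
    app: str
    max_latency_ms: float
    cpu: float
    deadline_s: float


def place(requests: List[PlacementRequest], nodes: List[FogNode]) -> Dict[str, Optional[str]]:
    """Serve tighter deadlines first; choose the lowest-latency feasible node."""
    placement: Dict[str, Optional[str]] = {}
    for req in sorted(requests, key=lambda r: r.deadline_s):
        fits = [n for n in nodes
                if n.latency_ms <= req.max_latency_ms and n.free_cpu >= req.cpu]
        best = min(fits, key=lambda n: n.latency_ms, default=None)
        if best is not None:
            best.free_cpu -= req.cpu          # reserve capacity on the chosen node
            placement[req.app] = best.name
        else:
            placement[req.app] = None         # no node satisfies this request
    return placement
```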

Diffusion of Knowledge in the Supply Chain over Thirty Years - Thematic Areas and Sources of Publications

Abstract
The subject of consideration is the diffusion of knowledge about supply chain management analyzed through the prism of countries, journals, scientific carriers of knowledge and detailed analysis of keywords.
The research aims to diagnose which thematic areas of the supply chain have dominated in the last three decades, i.e. up to 2019, and therefore in which direction the diffusion of knowledge has developed.
In total, almost 80,000 literature items were retrieved from Scopus. A program developed by the authors was used for some stages of the research.
As a result of the research, it was found, among other things, that in the initial stage of development of management sciences most works were published on inventory management; over time the focus shifted to the costs of supply chain management, and nowadays topics related to the sustainable supply chain dominate. The research also identified the topics that are constantly in the spotlight, as well as topics in which knowledge diffusion is growing rapidly.
In the future, by adopting a very short analysis time series, it will be possible to identify likely new dynamic research foci, such as supply chain 4.0.
Anna Maryniak, Yuliia Bulhakova, Włodzimierz Lewoniewski, Monika Bal
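The authors' program is not described in the abstract. A minimal sketch of one step such an analysis might include, counting keywords per decade to see which themes dominate, could look like this; the record format (publication year plus keyword list) is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_keywords_by_decade(records: Iterable[Tuple[int, List[str]]],
                           n: int = 3) -> Dict[int, List[Tuple[str, int]]]:
    """Group (year, keywords) records by decade and return the n most frequent keywords."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, keywords in records:
        decade = year - year % 10
        # Normalise case and surrounding whitespace so variants are counted together.
        counts[decade].update(k.strip().lower() for k in keywords)
    return {d: c.most_common(n) for d, c in sorted(counts.items())}
```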

Software Engineering - Special Session on Intelligent Systems and Software Engineering Advances

Frontmatter

Genetic Optimization Approach to Construct Schedule for Service Staff

Abstract
Rostering is a complex problem widely analyzed in the optimization area, with the aim of creating proper solutions within an acceptable time. After examining existing solutions, genetic optimization combined with a greedy approach to schedule construction was proposed for a real-life staff timetable-scheduling problem. The algorithm consists of two steps. In the first step, a greedy approach is used to create an initial schedule in time polynomial in the numbers of workers and tasks. In the second step, genetic optimization is performed starting from the initially created schedules. The proposed approach can take hard and soft requirements into account, such as staff overtime, preferable but optional tasks and free-time periods, as a weighted combination defined by assigning weights to the corresponding parameters of the evaluation function. Cascaded task assignments make it possible to handle hard constraints such as workers' holidays or short non-working periods, minimum break requirements, obligatory working periods and other constraints that appear in real life. A dataset of more than 2000 tasks and 50 flight service staff was used for testing. The analysis showed that the proposed algorithm can be easily parallelized and adapted to big datasets.
Dalia Čalnerytė, Andrius Kriščiūnas, Rimantas Barauskas
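The two-step scheme can be sketched under simplifying assumptions: tasks are fixed time intervals, a worker cannot hold overlapping tasks, and the evaluation function weighs only unassigned tasks and overtime (the weights are hypothetical). The greedy step produces an initial schedule in polynomial time; the genetic step, which would mutate and recombine such schedules to minimise `evaluate`, is omitted for brevity. This is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    name: str
    start: int   # e.g. minutes from the start of the planning period
    end: int


@dataclass
class Worker:
    name: str
    max_minutes: int
    busy: List[Tuple[int, int]] = field(default_factory=list)


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def greedy_schedule(tasks: List[Task], workers: List[Worker]) -> Dict[str, str]:
    """Assign tasks in start-time order to the least-loaded free worker.
    Polynomial: each task checks every worker's existing bookings (mutates `busy`)."""
    assignment: Dict[str, str] = {}
    load = {w.name: 0 for w in workers}
    for t in sorted(tasks, key=lambda t: t.start):
        free = [w for w in workers
                if not any(overlaps((t.start, t.end), b) for b in w.busy)]
        if not free:
            continue  # left unassigned; the evaluation function penalises it
        chosen = min(free, key=lambda w: load[w.name])
        chosen.busy.append((t.start, t.end))
        load[chosen.name] += t.end - t.start
        assignment[t.name] = chosen.name
    return assignment


def evaluate(assignment: Dict[str, str], tasks: List[Task], workers: List[Worker],
             w_unassigned: float = 100.0, w_overtime: float = 1.0) -> float:
    """Weighted penalty (lower is better) that a genetic algorithm could minimise."""
    minutes = {w.name: 0 for w in workers}
    for t in tasks:
        if t.name in assignment:
            minutes[assignment[t.name]] += t.end - t.start
    unassigned = sum(1 for t in tasks if t.name not in assignment)
    overtime = sum(max(0, minutes[w.name] - w.max_minutes) for w in workers)
    return w_unassigned * unassigned + w_overtime * overtime
```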

Sigma Key Agreement Protocol for e-Banking System

Abstract
In this paper, a solution for preventing an active adversary attack, namely the Man-in-the-Middle (MiM) attack, in an e-Banking system is presented. The vulnerable part of the communication between the user and the bank is the poor authentication level on the user's side. Therefore, the challenge is to provide users with modern means of authentication using, e.g., smartphones.
A conjunction of the Diffie-Hellman key agreement protocol and the Schnorr identification protocol is presented, obtained by transforming the Schnorr identification protocol into a Sigma protocol.
It is proved that the proposed protocol is secure against active adversary attacks, namely MiM attacks, under the discrete logarithm assumption.
Donatas Bartkus, Ausrys Kilciauskas, Eligijus Sakalauskas
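The abstract does not spell out the combined protocol, so the sketch below shows only its two standard ingredients separately: Schnorr identification as a three-move Sigma protocol (commitment t = g^r, challenge c, response s = r + cx mod q, check g^s = t·y^c mod p) and plain Diffie-Hellman key agreement. The group parameters are toy values chosen for readability and are insecure; unauthenticated Diffie-Hellman on its own is exactly what a MiM attack exploits, which is why the paper binds it to identification.

```python
import secrets
from typing import Tuple

# Toy group: p = 2q + 1 with p, q prime; g = 4 = 2^2 generates the subgroup of
# order q. Far too small for real use.
P, Q, G = 2039, 1019, 4


def keygen() -> Tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1        # secret key in [1, q-1]
    return x, pow(G, x, P)                   # (secret x, public y = g^x mod p)


# Schnorr identification as a Sigma protocol: commit, challenge, response.
def commit() -> Tuple[int, int]:
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(G, r, P)                   # prover keeps r, sends t = g^r mod p


def respond(x: int, r: int, c: int) -> int:
    return (r + c * x) % Q                   # s = r + c*x mod q


def verify(y: int, t: int, c: int, s: int) -> bool:
    return pow(G, s, P) == (t * pow(y, c, P)) % P   # g^s == t * y^c (mod p)


# Plain Diffie-Hellman: both parties derive g^(ab) mod p.
def dh_shared(own_secret: int, peer_public: int) -> int:
    return pow(peer_public, own_secret, P)
```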

Table Header Correction Algorithm Based on Heuristics for Improving Spreadsheet Data Extraction

Abstract
A spreadsheet is one of the most commonly used forms of representing datasets of similar type. Spreadsheets provide considerable flexibility in organising data structures. As a result of this flexibility, tables with very complex data structures can be created. In turn, such complexity makes automatic table processing and data extraction a challenging task. Therefore, a table preprocessing step is often required in the data extraction pipeline. This paper proposes a heuristic algorithm for correcting the table header in a spreadsheet. The aim of the proposed algorithm is to transform the machine-readable structure of the table header into its visual representation. The algorithm achieves this by iterating through table header cells and merging some of them according to the proposed heuristics. The transformed structure, in turn, improves the quality of spreadsheet understanding and data extraction further along the pipeline. The proposed algorithm was implemented in the TabbyXL toolset.
Viacheslav Paramonov, Alexey Shigarov, Varvara Vetrova
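The abstract does not state the heuristics themselves. To make the idea of merging header cells concrete, here is one hypothetical heuristic (not TabbyXL's): in a header row, an empty cell that sits above a non-empty sub-header cell is merged into the nearest labelled cell on its left, producing the visual span a human reader would see. The rectangular grid representation is an assumption.

```python
from typing import List, Tuple

Span = Tuple[int, int, int, str]   # (row, first_col, last_col, text)


def merge_header_row(header: List[List[str]], row: int) -> List[Span]:
    """Hypothetical heuristic: widen a labelled span over empty cells that sit
    above non-empty sub-header cells. Assumes all header rows have equal length."""
    cells = header[row]
    below = header[row + 1] if row + 1 < len(header) else None
    spans: List[Span] = []
    for col, text in enumerate(cells):
        can_extend = (bool(spans) and spans[-1][3].strip() != ""
                      and text.strip() == ""
                      and below is not None and below[col].strip() != "")
        if can_extend:
            r, first, _, label = spans[-1]
            spans[-1] = (r, first, col, label)   # widen the previous labelled span
        else:
            spans.append((row, col, col, text))
    return spans
```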

A Review of Self-balancing Robot Reinforcement Learning Algorithms

Abstract
We analyse reinforcement learning algorithms for the self-balancing robot problem, which relies on the inverted pendulum principle. Various algorithms and their training methods are briefly described, and a virtual robot is created in a simulation environment. The simulated robot seeks to maintain its balance using a variety of reinforcement learning methods based on model-free algorithms. The goal is for the robot to learn balancing strategies by itself and successfully keep its balance in a controlled position. We discuss how different algorithms learn to balance the robot and how the results depend on the learning strategy and the number of steps. We conclude that different algorithms result in different performance and different strategies for keeping the robot balanced. The results also depend on the model training policy. Some of the balancing methods can be difficult to implement in the real world.
Aistis Raudys, Aušra Šubonienė
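The abstract does not name the algorithms compared. Tabular Q-learning is one classic model-free algorithm of the kind described; its core update, Q(s, a) ← Q(s, a) + α(r + γ·max Q(s', a') − Q(s, a)), is sketched below, with states and actions left abstract (for a balancing robot they would be, hypothetically, discretised tilt angles and motor commands).

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.5, gamma: float = 0.9, done: bool = False) -> float:
    """One tabular Q-learning step (model-free: no transition model is used)."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]
```

For example, starting from an empty table (`defaultdict(float)`) with α = 0.5, a reward of 1 and an unvisited next state, the first update sets Q to 0.5.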

Exploring Web Service QoS Estimation for Web Service Composition

Abstract
Web development, machine ubiquity and the availability of communication networks have impacted device design, replacing the idea of an isolated personal computer with one of distributed, connected computers. A web service is a software component that provides specific functionality accessible over the Internet. Software development through the assembly of independent services follows the Service-Oriented Computing (SOC) paradigm. One key aspect of the SOC model is that third parties provide resources by exposing only external access interfaces. In this context, the analysis of quality of service (QoS) issues becomes crucial for several development activities related to web services, spanning service discovery, selection, composition and adaptation in client systems. As far as we know, little has been done to estimate unknown quality attribute levels when those attributes have high priority in client systems. In this study, a linear regression-based statistical approach is explored to evaluate the relationship between the quality attributes provided by web services and the metrics of their interfaces defined in WSDL. This issue is a cornerstone of web service composition, for verifying and ascertaining the levels of quality attributes provided by candidate services when QoS data is missing. Finally, we illustrate the approach by performing experiments with public QoS web service datasets and service interface metrics, explore its limitations and outline future steps.
Guillermo Rodríguez, Cristian Mateos, Sanjay Misra
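As a minimal sketch of the kind of regression involved, ordinary least squares for a single predictor is shown below, with x as some WSDL interface metric and y as an observed QoS attribute such as response time. Both choices are assumptions, and the study may use several metrics at once (multiple regression).

```python
from statistics import fmean
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ a + b*x; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    return my - b * mx, b      # (intercept, slope)
```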

Random Forests and Homogeneous Granulation

Abstract
This work continues our research on the application of the newly introduced homogeneous granulation technique. The method makes it possible to reduce the size of decision-making systems while maintaining their classification efficiency, without the need to estimate optimal approximation radii. The level of system approximation depends on the level of homogeneity of decision classes, that is, on how far objects can be modified while still being preserved in a given class. Motivated by the effectiveness of our recently developed ensemble model of Random Granular Reflections, in which the homogeneous granulation technique was used to select objects for individual learning iterations, we have checked the effectiveness of the Random Forest in the context of boosting classification on granular data. In the applied technique, an appropriate subset of attributes and objects is used in each learning iteration, which means that the training data is reduced in two ways. The results of experiments carried out on selected data from the UCI repository show reasonable efficiency on significantly reduced training systems.
Krzysztof Ropiak, Piotr Artiemjew
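The abstract says each learning iteration uses a subset of attributes and of objects, but not how they are chosen. The sketch below assumes the usual Random Forest conventions, a bootstrap sample of objects and about √m of the m attributes; the homogeneous granulation that produces the (already reduced) granular training data is not shown.

```python
import random
from typing import List, Sequence, Tuple


def sample_iteration(n_objects: int, attributes: Sequence[str], rng: random.Random,
                     object_fraction: float = 1.0) -> Tuple[List[int], List[str]]:
    """Draw the training subset for one ensemble member: a bootstrap sample of
    object indices and a random subset of attributes."""
    k_objects = max(1, round(object_fraction * n_objects))
    objects = [rng.randrange(n_objects) for _ in range(k_objects)]   # with replacement
    k_attrs = max(1, round(len(attributes) ** 0.5))                  # common sqrt(m) rule
    attrs = rng.sample(list(attributes), k_attrs)                    # without replacement
    return objects, attrs
```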

Orchestration Security Challenges in the Fog Computing

Abstract
Fog Computing is a new paradigm intended to solve new challenges in IoT, such as the wide geographical distribution and mobility of devices, multiple nodes, and heterogeneity of hardware capabilities and communication technologies. Fog Computing Orchestration enables the control of multiple devices connected to the Fog Computing network. It opens up new application areas such as smart homes, smart grids, smart vehicles and health data management. Since the security issues of both Fog Computing and Orchestration have not yet been fully explored, they pose various challenges. This review paper firstly aims to identify the security challenges of Fog Computing, as it is the environment for Orchestration, and also reviews some proposed Orchestration solutions. Secondly, it identifies the Orchestration challenges themselves by reviewing over 150 papers. The results suggest that security and privacy are among the top concerns.
Nerijus Šatkauskas, Algimantas Venčkauskas, Nerijus Morkevičius, Agnius Liutkevičius

A Novel Edge Detection Operator for Identifying Buildings in Augmented Reality Applications

Abstract
Augmented Reality (AR) is an environment-enhancing technology widely applied in many domains, such as tourism and culture. One of the major challenges in this field is the precise detection and extraction of building information through Computer Vision techniques. Edge detection is one of the building-block operations of many feature extraction solutions in Computer Vision. AR systems use edge detection to extract buildings or facade details of buildings. In this paper, we propose a novel filter operator for edge detection that aims to extract building contours or facade features better. The proposed filter gives more weight to vertical and horizontal edges, which are important features for this purpose.
Ciprian Orhei, Silviu Vert, Radu Vasiu
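The coefficients of the proposed operator are not given in the abstract. For orientation, the classic Sobel operator below is a common baseline with separate kernels for vertical and horizontal edges; it is not the authors' filter, and an operator that emphasises those orientations would change these kernel weights.

```python
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def convolve3x3(img: Image, kernel: List[List[int]]) -> Image:
    """Correlate a 3x3 kernel with the image interior (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    out: Image = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(kernel[di][dj] * img[i + di - 1][j + dj - 1]
                            for di in range(3) for dj in range(3))
    return out


def gradient_magnitude(img: Image) -> Image:
    """Combine both directional responses into the gradient magnitude."""
    gx, gy = convolve3x3(img, SOBEL_X), convolve3x3(img, SOBEL_Y)
    return [[(a * a + b * b) ** 0.5 for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]
```

For an image with a vertical step from 0 to 1, the horizontal-gradient kernel responds with 4 along the step while the vertical-gradient kernel returns 0.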

Military Vehicle Recognition with Different Image Machine Learning Techniques

Abstract
Different neural network training approaches are studied for the image recognition of military vehicles: transfer-learning models with a variable starting layer, and custom convolutional neural networks trained from scratch. Since openly available military recordings are limited, labeled social media images are used for training, and the image set is further expanded by random data transformations. Image augmentation is implemented as an internal loop that freezes all numerical parameters of the neural network training while continuously selecting a slightly larger section of the training set, including an incremental portion of artificial images added to the system. All models were trained for three vehicle and two situational environment classification cases. Transfer learning is based on two of the most widely used recognition networks, ResNet50 and Xception, with a variable number of trained final layers, up to a maximum of twenty. ResNet50 was successfully transfer-trained, reaching a validation accuracy of ≈88%. In contrast, Xception resulted in an over-fitted neural network with low validation accuracy and large loss values. Neither of the transferred schemes benefits from image augmentation. Moreover, in variable-architecture training of convolutional networks, it was confirmed that different configurations of layer numbers, layer types and neurons adapt differently. Thus, a tailor-fit neural network combined with a data augmentation strategy is the best approach, with a validation accuracy of ≈86.4%, comparable to the large transferred networks while using an architecture ≈40 times smaller and hence requiring fewer computational resources. Data augmentation increased validation accuracy by ≈9.2%, with the least accurate trained network gaining up to 20% in accuracy due to the inclusion of artificial images.
Daniel Legendre, Jouko Vankka
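One reading of the augmentation loop described above, sketched under assumptions: training hyperparameters stay fixed while each round uses a slightly larger share of the real images plus a growing share of artificial ones. The function only builds the growing subsets; the training call and the contents of the image lists are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def growing_training_sets(real: Sequence[T], synthetic: Sequence[T],
                          steps: int) -> Iterator[Tuple[int, List[T]]]:
    """Yield (step, training subset); every step uses a slightly larger share of
    the real and synthetic images, and the final step uses all of them."""
    for k in range(1, steps + 1):
        n_real = round(len(real) * k / steps)
        n_syn = round(len(synthetic) * k / steps)
        yield k, list(real[:n_real]) + list(synthetic[:n_syn])
```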

Run-Time Class Generation: Algorithm for Decomposition of Homogeneous Classes

Abstract
The ability to change internal structure and adapt behaviour to changes in the working environment, such as data heterogeneity, is an important feature of modern knowledge-based systems. One approach to achieving this is to develop tools for dynamic analysis, modification and generation of knowledge structures and program code as structural parts of intelligent systems. Therefore, the paper presents an analysis of class structure in object-oriented programming as well as in object-oriented knowledge representation. The main result of the paper is an algorithm for the dynamic creation of new classes of objects via decomposition of homogeneous classes of objects into subclasses. The algorithm decomposes homogeneous classes of objects, creating a set of their semantically correct subclasses by solving a corresponding constraint satisfaction problem. It can be adapted and integrated into a particular knowledge representation model or programming language.
Dmytro O. Terletskyi
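The paper derives subclasses by solving a constraint satisfaction problem, which the abstract does not detail. The sketch below only illustrates the run-time class generation mechanism in Python, using the three-argument `type(name, bases, namespace)` to create one subclass per distinct value of a single attribute; the split criterion and the `Item` class are hypothetical simplifications.

```python
from typing import Any, Dict, List, Type


class Item:
    """Hypothetical homogeneous class: every instance has the same attributes."""

    def __init__(self, **attrs: Any) -> None:
        self.__dict__.update(attrs)


def decompose(base: Type, objects: List[Any], attr: str) -> Dict[Any, Type]:
    """Create one subclass of `base` per distinct value of `attr` at run time
    and re-class each object accordingly."""
    subclasses: Dict[Any, Type] = {}
    for obj in objects:
        value = getattr(obj, attr)
        if value not in subclasses:
            name = f"{base.__name__}_{attr}_{value}"
            # type(name, bases, namespace) builds a new class object at run time.
            subclasses[value] = type(name, (base,), {attr: value})
        obj.__class__ = subclasses[value]
    return subclasses
```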

Twitter Based Classification for Personal and Non-personal Heart Disease Claims

Abstract
Twitter's popularity has created massive social interaction between users, generating large amounts of data that contain their opinions and feelings on different subjects, including their health conditions. These data contain important information that can be used for disease monitoring and detection; therefore, Twitter has attracted the attention of many researchers, as it has proven to be an important source of health information on the Internet.
In this work, we conducted a systematic literature review to discover state-of-the-art methods used to analyse health-related Twitter posts, and then proposed an approach based on machine learning, sentiment analysis methods and Big Data technologies to ensure optimal classification of the health status of a population with respect to cardiovascular diseases in a Twitter environment.
Ghita Amrani, Fadoua Khennou, Nour El Houda Chaoui
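The abstract does not specify the classifier. As a minimal baseline for separating personal from non-personal heart-disease tweets, a multinomial Naive Bayes with add-one smoothing can be sketched as follows; the tokenizer, labels and any training tweets are hypothetical, and the proposed approach additionally uses sentiment analysis and Big Data technologies not shown here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

PUNCT = ".,!?#@"


def tokenize(text: str) -> List[str]:
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_docs: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()
        for text, label in docs:
            self.class_docs[label] += 1
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_docs.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)          # log prior
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```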

Information Technology Applications - Special Session on Smart e-Learning Technologies and Applications

Frontmatter

Escape the Lab: Chemical Experiments in Virtual Reality

Abstract
Virtual Reality (VR) technology introduces new ways of teaching students STEM subjects. In virtual environments, students can experience things that would otherwise be dangerous to demonstrate. We have developed a virtual reality educational escape room game in which the player solves problems based on realistic chemical experiments to advance in the game. The game was showcased at a study fair where people of various ages and backgrounds had the opportunity to test and complete one of the game's levels. Based on observations made during the study fair, the overall conclusion is that VR technology can be a useful tool in education, bringing more entertainment and engagement into learning and teaching.
Airidas Janonis, Eligijus Kiudys, Martynas Girdžiūna, Tomas Blažauskas, Lukas Paulauskas, Aleksandras Andrejevas

The VOIL Digital Transformation Competence Framework. Evaluation and Design of Higher Education Curricula

Abstract
This paper presents a framework to evaluate and develop curricula for higher education in the context of digital transformation. Developing well-guided learning journeys for digital transformation is still a major challenge for educators. The proposed VOIL competence framework is grounded in dynamic capability theory. The VOIL competence framework has been developed by relating the DIGROW digital maturity framework to the European e-competence framework. The foundational architecture and rationale of the VOIL competence framework link learning objectives to the specific challenges of the digital transformation of small and medium businesses. The authors also discuss the application of the VOIL competence model for evaluating and designing self-directed and personalized learning journeys.
Klaus North, Andreas Hermann, Isabel Ramos, Nekane Aramburu, Daina Gudoniene

Gamified Evaluation in Game-Based Learning

Abstract
Gamification is the process of introducing game-specific elements into a non-game context. It allows the Game-based Learning approach to be applied in traditional educational contexts. This paper presents our efforts in gamifying student evaluation. The learning environment Meiro, used for demonstration and exploration in the domain of Computer Graphics, is extended with modules for student evaluation. The paper presents these modules and discusses the preliminary results of end-user tests.
Pavel Boytchev, Svetla Boytcheva

Hyperparameter Tuning Using Automated Methods to Improve Models for Predicting Student Success

Abstract
Predicting student failure is an important task for educators and a popular application in Educational Data Mining (EDM). However, building prediction models is not an easy task and requires time and expertise for feature engineering, model selection and hyperparameter tuning. In this paper, an automated machine learning (autoML) strategy is used, and its impact on the performance of prediction models is assessed. A previous experiment was modified to include hyperparameter tuning with an autoML method. The data cleaning, preprocessing, feature engineering and time segmentation parts of the experiment remained unchanged. With this approach, the actual impact of hyperparameter tuning on performance can be measured on models that were carefully built. The results show improved performance, especially for the Decision Tree, Extra Tree and Random Forest classifiers. This study shows that even carefully planned educational prediction models can benefit from the use of autoML methods, which could help non-expert users in the field of EDM achieve accurate results.
Bogdan Drăgulescu, Marian Bucos
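The autoML method used is not named in the abstract. Random search is a simple stand-in that illustrates automated hyperparameter tuning: sample combinations from a search space, score each one and keep the best. The hyperparameter names in the test mirror scikit-learn's `DecisionTreeClassifier` (`max_depth`, `min_samples_leaf`), but the objective there is a placeholder; in practice `objective` would train and cross-validate a classifier with the given parameters.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def random_search(objective: Callable[[Dict[str, Any]], float],
                  space: Dict[str, List[Any]], n_trials: int,
                  seed: int = 0) -> Tuple[Dict[str, Any], float]:
    """Sample hyperparameter combinations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)   # e.g. cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```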

A Case Study of Applying Gamification in Teaching Project Management

Abstract
The project management subject encompasses several project execution and control techniques used to ensure successful project delivery. One such technique is Earned Value Analysis (EVA). Teaching information systems engineering students the principles of Earned Value Analysis is quite challenging, as mastering it requires a thorough understanding of the metrics, repetitive calculations and application of the knowledge to various project situations. Therefore, gamification principles were applied and an Earned Value Analysis learning game was implemented. The EVA game is an online board game that also incorporates game elements such as rewards, a leaderboard, badges, points, levels and feedback. These game elements aim to stimulate competition among students, increase motivation and engagement, and make the learning process more interesting. Although the first experimental assessment of the EVA game involved a relatively small number of participants, it showed that students evaluate the introduction of gamification elements into the study process positively.
Kristina Magylaitė, Lina Čeponienė, Mantas Jurgelaitis, Tomas Danikauskas

Design of the Platform Solutions to Increase the Employability and E-Learning Opportunities for Low Skilled Women

Abstract
Personal behavioral skills combined with specific technical knowledge are a must in order to access the labor market in the twenty-first century. Individuals such as young women who are not employed and have not completed compulsory education or attended any training are the ones who urgently need to enter the labor market as fast as possible and avoid being excluded from it. The aim of this paper is to present the best profiling tool approach for improving women's employability by assessing an alternative and integrated approach. The presented platform is accessible at any time from any device and will direct users towards existing online and face-to-face training offers. It will match users' existing skills and competencies against identified digital job profiles to place each user in an employment matrix.
Danguole Rutkauskiene, Greta Volodzkaite

Information Technology Applications - Special Session on Language Technologies

Frontmatter

Cross-lingual Metaphor Paraphrase Detection – Experimental Corpus and Baselines

Abstract
Correct understanding of metaphors is an integral part of natural language understanding. It requires, among other things, the ability to decide whether a given pair of sentences, the first of which contains a metaphor, forms a paraphrase pair. Although this decision task is formally analogous to a “traditional paraphrase detection” task, it requires a different approach. Recently, the first monolingual corpus (in English) for metaphor paraphrasing was released together with several baselines. In this work, we shift this task to the cross-lingual level: we state a task of cross-lingual metaphor paraphrase detection, introduce a corresponding experimental cross-lingual corpus (English-Czech), present several approaches to this problem, and set baselines for it. This cross-lingual approach may allow us to deal with tasks such as multi-document summarization involving texts in different languages, as well as to improve information retrieval tools.
Martin Víta

Deep Learning-Based Part-of-Speech Tagging of the Tigrinya Language

Abstract
Deep Neural Networks have demonstrated great efficiency in many NLP tasks for various languages. Unfortunately, some resource-scarce languages, e.g., Tigrinya, still receive too little attention; therefore, many NLP applications such as part-of-speech tagging are in their early stages. Consequently, the main objective of this research is to offer effective part-of-speech tagging solutions for the Tigrinya language, which has a rather small training corpus.
In this paper, Deep Neural Network classifiers (i.e., Feed Forward Neural Network, Long Short-Term Memory, Bidirectional LSTM, and Convolutional Neural Network) are investigated by applying them on top of trained distributional neural word2vec embeddings. Seeking the most accurate solutions, the DNN models are optimized both manually and automatically. Although automatic hyper-parameter optimization demonstrates good performance with the Convolutional Neural Network, the manually tested Bidirectional Long Short-Term Memory method achieves the highest overall accuracy, equal to 0.91.
Senait Gebremichael Tesfagergish, Jurgita Kapociute-Dzikiene

Tag Me If You Can: Insights into the Challenges of Supporting Unrestricted P2P News Tagging

Abstract
Peer-to-Peer news portals allow Internet users to write news articles and make them available online to interested readers. Although authors are free in their choice of topics, an article must meet a number of quality characteristics before it is published. In addition to meaningful titles, comprehensibly written texts, and meaningful images, relevant tags are an important criterion for the quality of such news. In this case study, we discuss the challenges and common mistakes that Peer-to-Peer reporters face when tagging news, and how incorrect information can be corrected by orchestrating existing Natural Language Processing services. Lastly, we use this illustrative example to give insight into the challenges of dealing with bottom-up taxonomies.
Frederik S. Bäumer, Joschka Kersting, Bianca Buff, Michaela Geierhos

Backmatter
