In this paper, we present an overview of the MOVING platform, a user-driven approach that enables young researchers, decision makers, and public administrators to use machine learning and data mining tools to search, organize, and manage large-scale information sources on the web such as scientific publications, videos of research talks, and social media. In order to provide a concise overview of the platform, we focus on its front end, which is the MOVING web application. By presenting the main components of the web application, we illustrate what functionalities and capabilities the platform offers its end-users, rather than delving into the data analysis and machine learning technologies that make these functionalities possible.
1 Introduction
Scholars and professionals in various sectors of the economy, including public administrators, corporate compliance officers, and auditors, deal with an ever-increasing flow of information (new scientific publications, business documents and multimedia files, laws, etc.). They need sophisticated tools to evaluate all this information quickly and accurately and to visualize the analysis results. Specifically, this means that, on the one hand, they need tools that enable state-of-the-art search and semantic analysis of large digital contents, by providing: (i) access to an extensive source inventory, (ii) advanced search and visualization methods, and (iii) functionalities for generating new knowledge from these digital assets. On the other hand, these tools need to be reasonably easy for their users to understand and must support them through: (i) a detailed and scientifically proven help system (tutorials, guidance), (ii) individually configurable training programmes (learning modules, videos), and (iii) a lively community of people who have similar interests or problems to be solved. To face these challenges, the interdisciplinary trans-European project called MOVING (“TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation”) (Vagliano et al. 2018) has built an innovative training platform that enables users from various societal sectors to fundamentally improve their information literacy by training in how to choose, use, and evaluate data mining methods in their daily research and business tasks, and to become data-savvy information professionals.
2 Digitized Science
Initiatives by the European Union (which has long been pursuing a digital agenda) to support research in the field of digitized science illustrate the need to investigate related change processes (European Commission 2016). Obviously, empirical and theoretical justification is needed to develop the practice of science. The innovative approach dealt with here was developed in the MOVING project, which offers an innovative training platform to support scientists and other users from all areas of society to fundamentally improve their information literacy in research-oriented contexts.1 The project is about training users to select, apply, and evaluate technologies and data mining methods, so that the relevant research staff can develop into ‘data-savvy’ information professionals in their daily research routines (Scherp et al. 2016; Köhler et al. 2016a, b).
In terms of content, the research methodological changes in scientific action cannot easily be explained as domain-specific activities. This requires analyses of both current technological developments and the changes in how scientists use these technologies (or methods). The eScience Saxony research network provides statements on both perspectives (see, e.g., [Pscheida et al. 2013, 2014]). The network has observed the following:
there is great potential for the use of new digital tools in research;
preferred topics for development are scientist collaboration and the visualization of (often large or new) databases;
transitions between the subject areas of research and teaching can also be observed in technology development;
almost all scientists do most of their work using computer-based technologies and have access to appropriate infrastructures;
scientists sometimes find it difficult to adopt new media technologies in research and teaching (e.g. social media), although there are also subject-specific differences;
there is still uncertainty regarding the requirements, possibilities, and assumed risks of open-access publishing;
research methodology has not been fully systematically discussed and is often inadequately implemented;
there are no clear standards for high-quality research technology and no recognizable institutionalization to support open-access trends in science, so these still need to be worked out together;
digital change in science is comparatively rapid from an individual (scientist) perspective, and its outcome is not known, especially regarding location-determining infrastructures.
Indeed, this list largely matches the demands of the cases addressed by the MOVING project. Nevertheless, MOVING set its focus on two further characteristics. First, there was a serious interest in addressing research activity not only in academia but also in public administration and industry. Second, when developing the approach, the project consortium decided to focus directly on the related skill development as well, i.e. to include a serious effort on innovation in the educational dimension (the Online Literacy Training and Learning) that needs to go along with any new technology in every sector.
3 Overview of the MOVING Platform
An overview of the MOVING platform architecture is illustrated in Fig. 1, which shows the most important components and their relationships. The main component blocks are (i) data acquisition, (ii) data processing, (iii) back-end data storage, user tracking, search and recommendation, and (iv) the MOVING web application that includes the front-end search. In this section, we briefly describe the overall platform.
A block diagram depicts the framework of the moving web application. It consists of data acquisition, data processing, moving web applications with a front-end search, and adaptive training support.
Fig. 1
MOVING platform architecture
The MOVING web application is the core of the platform and the interface to the user. The main entry points to the web application are the community section, the learning environment, and the search interface. The search interface offers different visual representations of search results. These visualizations allow the user to explore the search results in various ways. For this purpose, four visualizations have been added to the MOVING platform, namely: (i) the Concept Graph, which displays the search results as an interactive network, (ii) uRank, a dynamic document ranking view, (iii) Top Properties, a bar chart visualization that aggregates the results based on their properties, and (iv) a Tag Cloud, showing the most frequently occurring keywords. Moreover, the Adaptive Training Support (ATS) widget supports users learning how to search and provides material suited to their needs (Fessl et al. 2018) and the Recommender System (RS) widget (bridging the front and back ends of the platform) points users to potentially relevant documents by evaluating their last search queries. Thanks to its responsive design, all the views adapt to different screen sizes, automatically changing the layout according to the capabilities of the device.
Private user data and public documents are stored in three separate databases: The web application database holds the data for the communities, the learning environment, and the ATS. The index holds the public documents and generated metadata information such as topics, authors, and extracted entities. The user-interaction tracking captures user interactions with the web application and stores them securely in a third database. User tracking provides additional data for both the ATS and the RS, which form the basis for user support by these two widgets.
The index used by the search interface is populated by various data acquisition components (e.g. web crawlers and a Bibliographic Metadata Injection service), to increase the amount of data accessible through the MOVING platform. To date, it hosts over 22 million documents and metadata records. These records include books, scientific articles, laws and regulations, documents about funding opportunities, videos (e.g. of lectures and tutorials), and social media posts. Data processing components have been incorporated into and applied to these records, to improve the quality of data and make it easier to search. Additional features, the Data Integration Service, Author Name Disambiguation, Deduplication, Named Entity Recognition and Linking, and Video Analysis, all refine and enrich the documents stored in the index.
Author name disambiguation addresses the problem that the same author name often belongs to several different real-world authors. To deal with this problem, a novel method (Backes 2018a, b) has been developed which applies, for a given author name, agglomerative clustering on features extracted from the documents containing the author mention in question, such as affiliation, co-authors, referenced authors, email addresses, keywords, and publication years. The disambiguation procedure calculates the probability that author mentions with the same name belong to the same person. Name mentions with a high probability of belonging to the same author are assigned a unique internal author ID. In this way, authors with the same name are distinguished if they refer to different real-world persons. As a result, users who click on the name of an author of a document in the search result list will only see documents from authors who have the same author ID as the selected author (instead of all documents authored by any person with that name). A modified version of this method has been applied for document deduplication.
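To make the clustering idea more concrete, the following minimal Python sketch groups the mentions of one author name by the overlap of their document features. It is not the method of Backes (2018a, b): the Jaccard similarity, the single-link merging, the threshold value, and the function names cluster_mentions and assign_author_ids are all assumptions chosen for illustration, standing in for the probability estimate used on the platform.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_mentions(mentions: dict, threshold: float = 0.2) -> list:
    """Single-link agglomerative clustering of mentions that share one name.

    `mentions` maps a document id to the set of features extracted from that
    document (co-authors, affiliation, keywords, ...). Clusters are merged
    as long as the most similar pair reaches `threshold` (an assumed value).
    """
    clusters = [{doc_id} for doc_id in mentions]
    while len(clusters) > 1:
        best_sim, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: similarity of the closest pair of members.
            sim = max(jaccard(mentions[a], mentions[b])
                      for a in clusters[i] for b in clusters[j])
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
        if best_pair is None or best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def assign_author_ids(name: str, clusters: list) -> dict:
    """Give every cluster its own (hypothetical) internal author ID."""
    return {doc_id: f"{name}#{n}"
            for n, cluster in enumerate(clusters, start=1)
            for doc_id in cluster}


# Hypothetical mentions of the name "J. Smith" in three documents.
mentions = {
    "doc1": {"coauthor:a. jones", "affil:uni-a", "kw:semantic web"},
    "doc2": {"coauthor:a. jones", "affil:uni-a", "kw:recommender"},
    "doc3": {"coauthor:b. lee", "affil:uni-b", "kw:tax law"},
}
author_ids = assign_author_ids("J. Smith", cluster_mentions(mentions))
```

In this toy input, doc1 and doc2 share a co-author and an affiliation and therefore end up under one author ID, while doc3 receives a separate one, which is the behaviour the result list relies on when a user clicks an author name.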
In the following, we present the front end of the MOVING platform in detail, in order to provide a concise summary of what a user can do with it. For details on how individual data processing, data acquisition, and other back-end components work, the interested reader is referred to the relevant publications, such as (Nishioka and Scherp 2016; Galanopoulos and Mezaris 2019; Tzelepis et al. 2018), as well as the documentation available on the MOVING project web site.2
4 The MOVING Web Application
4.1 Search
Search is a key functionality in the MOVING web application. At the back end, the MOVING search engine is based on Elasticsearch,3 configured with appropriate parameters and fine-tuned to efficiently index tens of millions of documents. At the front end, the user sees a search page (Fig. 2), with various search options and filters on the left, visualizations of the results in the centre of the window, and training functionalities such as ATS on the right. The search history of the current user can also be viewed, to support future searches.
An interface of the Moving platform depicts the filter options on the left, search results in the form of a tree diagram in the middle, and training functionalities on the right side of the page.
Fig. 2
MOVING search and results page
To enable platform users to view and replicate their previous searches, the search history view is connected with WevQuery (Apaolaza and Vigo 2017). WevQuery serves as an interface to the data generated by UCIVIT (Apaolaza et al. 2013), the tracking tool that logs user-interaction data. From WevQuery, we get information about the user’s previous searches, the time when each search query was performed, and the number of documents retrieved. This information is then utilized to build the search history view, an example of which is shown in Fig. 3.
An interface of the Moving platform depicts the result page for the recent search. The result displays a table with columns titled id, name, query, document, and date last run.
Fig. 3
Search history view
To present the results of a user query effectively, several visualizations have been implemented. Four characteristic ones are:
Concept Graph. For the discovery and exploration of relationships between documents and their properties.
uRank. A tool for the interest-driven exploration of search results.
Top Properties. A bar chart displaying aggregated information about the properties of the retrieved documents.
Tag Cloud. A visualization for the analysis of keyword frequency in the retrieved documents.
Concept Graph: an interactive network visualization. The Concept Graph (Fig. 4) visualizes direct and indirect connections between retrieved search results. For example, a single, disambiguated author of two different publications is visualized as a node in the graph connecting the corresponding publications. Further extracted and disambiguated entities are visualized in a way that lets users quickly grasp structures such as research networks. The initial graph visualization starts with a few collapsed nodes. These nodes can be expanded to visualize initially hidden nodes and to incrementally add more information to the graph. Thus, users are not overwhelmed with too much information when they start their search.
An interface depicts the result of the concept graph. It provides an option to filter by text, edges, node types, and year and displays the two different publications in the form of nodes.
Fig. 4
Concept Graph with opened filter menu
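The incremental expansion of the Concept Graph can be illustrated with a small Python sketch. It assumes an undirected graph in which publications and disambiguated entities are nodes, and in which only explicitly shown nodes are visible at first. The class name ConceptGraph, its methods, and the node labels are hypothetical and do not reflect the actual implementation.

```python
from collections import defaultdict


class ConceptGraph:
    """Undirected graph whose nodes are revealed step by step."""

    def __init__(self):
        self.neighbours = defaultdict(set)  # node -> adjacent nodes
        self.visible = set()                # nodes currently drawn

    def add_edge(self, a, b):
        """Connect two nodes, e.g. a publication and one of its authors."""
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)

    def show(self, node):
        """Make a node part of the initial, collapsed view."""
        self.visible.add(node)

    def expand(self, node):
        """Reveal the hidden neighbours of a visible node and return them."""
        if node not in self.visible:
            raise ValueError(f"{node!r} is not visible and cannot be expanded")
        revealed = self.neighbours[node] - self.visible
        self.visible |= revealed
        return revealed


# One disambiguated author connects two publications (hypothetical ids).
graph = ConceptGraph()
graph.add_edge("pub:A", "author:42")
graph.add_edge("pub:B", "author:42")
graph.show("pub:A")
first = graph.expand("pub:A")       # reveals the author node
second = graph.expand("author:42")  # reveals the second publication
```

Expanding one node at a time keeps the initial view small, which corresponds to the goal of not overwhelming users at the start of their search.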
uRank: interest-based result set exploration. Based on the search query the top 100 retrieved results are displayed as a ranked list. The keywords extracted from the results are presented in the Tag Cloud in the right sidebar of uRank (Fig. 5, point A). By selecting keywords of interest, the results in the list (Fig. 5, point C) are re-ranked in such a way that the results containing the selected keyword move to the top. The ranking view (Fig. 5, point D) provides visual feedback on the relevance of the result. It is possible to select multiple keywords and even fine-tune their importance by using the slider under the selected words (Fig. 5, point B). Clicking on a result opens a dialogue box, which presents additional information about the retrieved document. The user can export the current view of uRank, with the current search configuration, by clicking on the export button, which initiates the download of a zip file containing an image and a report text file.
An interface depicts the search result in the form of a ranked list. On the left side, it displays document titles and on the right side of the page, it depicts search keywords.
Fig. 5
uRank and its components—(A) tag cloud, (B) tag box, (C) result list, (D) ranking view
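The re-ranking behaviour of uRank can be sketched in a few lines of Python. The sketch assumes that every result carries a mapping from extracted keywords to their frequency in that document and that each selected keyword has a slider weight between 0 and 1. The weighted sum used as a score and the function name rerank are assumptions for illustration, not the scoring function of uRank itself.

```python
def rerank(results, weights):
    """Order results by the weighted frequency of the selected keywords.

    `results` is a list of dicts with a "title" and a "keywords" mapping
    (keyword -> frequency in the document); `weights` maps each selected
    keyword to its slider value. Python's sort is stable, so results with
    equal scores keep the order delivered by the search engine.
    """
    def score(doc):
        return sum(weight * doc["keywords"].get(keyword, 0)
                   for keyword, weight in weights.items())

    return sorted(results, key=score, reverse=True)


# Hypothetical result list in search-engine order.
results = [
    {"title": "A", "keywords": {"search": 1}},
    {"title": "B", "keywords": {"learning": 3}},
    {"title": "C", "keywords": {"learning": 1, "search": 2}},
]
by_learning = [d["title"] for d in rerank(results, {"learning": 1.0})]
both = [d["title"] for d in rerank(results, {"learning": 0.5, "search": 1.0})]
```

Selecting only "learning" moves result B to the top, whereas adding "search" with a higher weight moves result C ahead of it, mirroring the effect of the tag box sliders.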
Top Properties: the Top Properties visualization uses 100 of the most relevant results from the current search query. It shows a bar chart visualization presenting one of the following properties of the available results: Authors, Keywords, Concepts, Sources, and Year of Publication. The results are ordered according to the most frequent values of the selected property, as can be seen in Fig. 6. When the publication year is selected, the sorting order changes so that the years are displayed in chronological order to make it easier to identify year-on-year changes. Clicking on one of the bars shows the results associated with this property in a small dialogue box. The results in this dialogue are sorted in the order provided originally by the search engine. The Top Properties visualization also supports an export functionality, which exports the current view of the visualization with its search configuration.
An interface depicts the result for the occurrence of the selected property for the 100 most relevant results in the form of a horizontal bar chart.
Fig. 6
The Top Properties visualization with the dialogue box showing the result list for a bar of interest
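The aggregation behind such a bar chart can be sketched as follows. The sketch assumes that each result is a dictionary whose properties hold either a single value or a list of values. The property names, the limit parameter, and the function name top_properties are illustrative assumptions.

```python
from collections import Counter


def top_properties(results, prop, limit=10):
    """Count the values of one property across the given results.

    Most properties are returned by descending frequency (at most `limit`
    bars). The publication year is returned in chronological order instead,
    so that year-on-year changes are easier to see.
    """
    counts = Counter()
    for doc in results:
        value = doc.get(prop)
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    if prop == "year":
        return sorted(counts.items())  # chronological, all years
    return counts.most_common(limit)


# Hypothetical results; the platform would pass the 100 most relevant ones.
results = [
    {"authors": ["A", "B"], "year": 2018},
    {"authors": ["A"], "year": 2016},
    {"authors": ["C", "A"], "year": 2018},
]
author_bars = top_properties(results, "authors")
year_bars = top_properties(results, "year")
```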
Tag Cloud: the Tag Cloud visualization (Fig. 7) retrieves the 100 most relevant results from the search query and displays them by showing the most frequent keywords that occur in the corresponding titles and abstracts. The displayed keywords are initially sorted by their frequency and can be filtered by occurrence, year, or text. Clicking on one of the keywords shows the results associated with this property. The results are sorted in the order provided originally by the search engine.
An interface depicts the result for the tag cloud for the 100 most relevant results. The horizontal scroll bar at the bottom exhibits the year range from 1086 through 2018.
Fig. 7
Tag Cloud visualization with a dialogue box showing the result list for a keyword
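A keyword frequency count of the kind underlying a tag cloud can be sketched in Python as follows. The tokenization by a simple regular expression, the small stop-word list, and the function name keyword_frequencies are assumptions. The keyword extraction actually used by the platform is not described here and may differ, and only the filter by occurrence is covered.

```python
import re
from collections import Counter

# A deliberately tiny, assumed stop-word list for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def keyword_frequencies(results, min_count=1):
    """Count words in titles and abstracts, most frequent first.

    `min_count` mimics the filter by occurrence: keywords that occur
    fewer times are dropped from the cloud.
    """
    counts = Counter()
    for doc in results:
        text = f"{doc.get('title', '')} {doc.get('abstract', '')}".lower()
        counts.update(token for token in re.findall(r"[a-z]+", text)
                      if token not in STOPWORDS)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Hypothetical results with title and abstract fields.
results = [
    {"title": "Data mining for search", "abstract": "Mining the web"},
    {"title": "Search engines", "abstract": "Data and search"},
]
cloud = keyword_frequencies(results, min_count=2)
```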
4.2 Recommender System
The RS widget, depicted in Fig. 8, is part of the search page. It gives users additional suggestions for resources of which they may not be aware. The RS interacts with the search engine, user-interaction tracking, and dashboard (WevQuery), hence bridging the back and front ends of the MOVING platform. To build user profiles, it obtains the search history from the user data previously logged through UCIVIT and then retrieves the documents to suggest from the index, depending on the user’s profile. The MOVING RS is based on HCF-IDF (Nishioka and Scherp 2016), a novel semantic profiling approach that can exploit a thesaurus or ontology to provide better recommendations. Further information on the MOVING RS is available elsewhere (Vagliano and Nazir 2019).
An interface depicts the list of recommended documents. It displays the result for an audio file, a pdf file, and a javascript file.
Fig. 8
The Recommender System widget suggesting three new items to the user: a video, an article, and a web page (Vagliano and Nazir 2019)
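The general idea of profiling with the help of a thesaurus can be illustrated by the following simplified Python sketch. It is not HCF-IDF as defined by Nishioka and Scherp (2016): it merely passes a decayed share of each concept count to broader concepts and combines the result with an inverse document frequency. The thesaurus, the decay factor, the smoothing of the IDF term, and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def concept_weights(concepts, broader, decay=0.5):
    """Count concepts and pass a decayed share up to broader concepts."""
    weights = Counter()
    for concept in concepts:
        weight, current, seen = 1.0, concept, set()
        # Walk up the hierarchy; `seen` guards against cyclic thesauri.
        while current is not None and current not in seen:
            weights[current] += weight
            seen.add(current)
            current = broader.get(current)
            weight *= decay
    return weights


def recommend(profile_concepts, documents, broader, k=3):
    """Return up to k document ids ranked by similarity to the profile."""
    profile = concept_weights(profile_concepts, broader)
    doc_weights = {doc_id: concept_weights(concepts, broader)
                   for doc_id, concepts in documents.items()}
    # Document frequency of every concept, after propagation.
    df = Counter(c for weights in doc_weights.values() for c in weights)
    idf = {c: math.log(len(documents) / df[c]) + 1.0 for c in df}
    scores = {doc_id: sum(profile[c] * w * idf[c]
                          for c, w in weights.items() if c in profile)
              for doc_id, weights in doc_weights.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ranked if scores[doc_id] > 0][:k]


# Hypothetical thesaurus (concept -> broader concept) and document index.
BROADER = {"neural networks": "machine learning",
           "machine learning": "computer science"}
documents = {"d1": ["neural networks"],
             "d2": ["machine learning"],
             "d3": ["tax law"]}
# Concepts assumed to have been extracted from the user's search history.
suggestions = recommend(["neural networks"], documents, BROADER)
```

In this toy setting, document d2 is still suggested although it does not mention the concept from the search history, because it is connected to the profile through the broader concept. Document d3 shares no concept with the profile and is not suggested.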
4.3 Communities
Open collaboration and communication are the foundations of open innovation and open science. MOVING communities offer users a powerful tool to organize group collaboration and communities of practice on the MOVING platform (see Fig. 9). MOVING communities are part of the working environment of the platform and offer a range of social technologies for knowledge and information management, including wikis, forums, blog functions, and group news. MOVING communities build on the project management tools and technologies of the eScience platform on which the MOVING platform is based. The existing eScience modules, which enabled cooperation in closed teams of researchers, were adapted to the goals of the MOVING platform to provide an open innovation environment and foster open collaboration, communication, and knowledge exchange between its users.
An interface depicts the result for the community page of the platform. It displays the options for the net community, curriculum implementation, digital auditing, and eye tracking in visual analytics recommendations.
Fig. 9
MOVING communities
Registered users who want to create a new community are offered different options. First, users can create public communities that are visible to everyone in the MOVING platform and can be accessed and edited by anyone interested in the topic. Second, users who want to organize specific project teams or research groups can create private communities that users have to join before they can access and edit content. Private communities are not visible to other users but can be shared with collaborators via email.
The MOVING CK Editor4 enables the creation of formatted text and the integration of multimedia content in HTML pages that are created by users in the MOVING communities. Videos, pictures, GIFs or documents, and social media content from Twitter5 and YouTube6 can all be easily integrated. Features like the accordion and the option to include expandable items make it easy to structure content in the page. It is a WYSIWYG editor (What You See Is What You Get) so even users that are not familiar with HTML can use it easily to create and edit web-based content within MOVING communities.
The wiki module is useful for creating and collaboratively managing large knowledge repositories with a community. The forum module provides space for open communication and information exchange—a precondition for open innovation processes. The forum module contains a user rating functionality that allows the community to publicly rate the content of individual forum entries. Users can vote posts and replies up and down, based on the quality of the contribution. The highest-rated input is highlighted to help users find the best response in a thread, and the summarized score for all received votes is shown on each user profile. The ranking functionality helps communities self-organize and peer assess user-generated content. Community administrators can also choose to assign badges to reward users or motivate them to get actively engaged. Badges can be assigned automatically or manually.
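The rating mechanism of the forum can be summarized in a short Python sketch, assuming that every vote counts as +1 or -1 and that a user's score is the sum of the scores of their posts. The data layout and the function name summarize_votes are hypothetical.

```python
from collections import defaultdict


def summarize_votes(posts, votes):
    """Aggregate up- and down-votes for a forum thread.

    `posts` maps a post id to its author; `votes` is an iterable of
    (post id, value) pairs with value +1 (up) or -1 (down).
    Returns the score per post, the summed score per user, and the id of
    the highest-rated post (None for an empty thread).
    """
    post_scores = {post_id: 0 for post_id in posts}
    for post_id, value in votes:
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        post_scores[post_id] += value
    user_scores = defaultdict(int)
    for post_id, score in post_scores.items():
        user_scores[posts[post_id]] += score
    best_post = max(post_scores, key=post_scores.get) if post_scores else None
    return post_scores, dict(user_scores), best_post


# Hypothetical thread with three posts by two users.
posts = {"p1": "ann", "p2": "bob", "p3": "ann"}
votes = [("p1", 1), ("p2", 1), ("p2", 1), ("p3", -1)]
post_scores, user_scores, best_post = summarize_votes(posts, votes)
```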
The ease of user-generated content creation and integration, combined with the social features of MOVING communities, opens up a wide range of possible applications. Users can organize group work in small project teams, or create open communities around scientific or technical topics to discuss research or ask questions to an expert community. MOVING communities can be organized as an open innovation tool but also as a learning management system, as the following example shows.
One practical application of MOVING communities is the four-week MOVING MOOC (massive open online course) Science 2.0 and open research methods that was organized on the MOVING platform (see Fig. 10).7 The MOOC is organized on the platform as a private team community, so that participants have to register to gain access to the learning materials and the forums. For each week of the MOOC, we created a sub-community containing learning materials in different media formats as well as weekly assignments. The forums were used to organize group communication and allow users to share their assignment results. A wiki was created and contained additional information about the course, learning goals, and technical details about using the editor or the MOOC badges that users can earn on the course (Fig. 11). Badges are displayed on the user’s profile, My page, along with their personal and contact details (profile picture, science field, skills, hometown, institution, email, ORCID8).
An interface depicts the result for the Mooc community page. It displays the options to register or sign in in the top right corner. In the middle, it provides the option to enroll.
Fig. 10
MOVING MOOC community
An illustration depicts the badges of the moving platform for all the moving MOOC participants at the top. At the bottom, it displays a badge for open science aficionado.
Fig. 11
MOVING MOOC badges
4.4 Learning Environment
MOVING offers a unique combination of working and training features in one platform. The heart of the training programme is the MOVING learning environment. Here, all the learning content is organized and directly accessible to the users. The landing page (Fig. 12) gives an overview of the learning materials including the platform demo videos and video tutorials, the Learning Tracks for Information Literacy 2.0, and the MOVING MOOC that was discussed in the previous subsection, Science 2.0 and open research methods. The platform demos are videos hosted on videolectures.net and are embedded in the learning environment so that users can learn about the different platform features and technologies developed within the MOVING project. Users can improve their data and information literacy as well as digital competences through Learning Tracks for Information Literacy 2.0 (Fig. 13).
An interface reads welcome to the moving learning platform. It displays three blocks for moving learning tracks, platform demos, and moving Mooc science and open result methods.
Fig. 12
MOVING learning environment
A welcome page of the moving platform depicts search, communities, learning, contacts, Mooc, my page, and the sign-out option on the top. It displays several other options on the left.
Fig. 13
Start page of Learning Tracks for Information Literacy 2.0
4.5 Adaptive Training Support
The ATS (Fessl et al. 2018) comprises two widgets: one for learning how to search and one for curriculum reflection.
The Learning-how-to-search widget (Fig. 14) visualizes information about the use of features provided by the MOVING platform. The widget shows users, in a bar chart, how they have used the features of the platform, to motivate them to explore new features and reflect on their usage behaviour. More information about the widget and its evaluation can be found in Fessl et al. (2019).
An interface depicts the bar chart for comparison of the search results for the input interface and result presentation. At the bottom, it displays the button to submit an answer for experience using the result list feature.
Fig. 14
Learning-how-to-search widget: The tracked features are separated into features of the search input interface and search result presentation
The curriculum reflection widget (Fessl et al. 2019) consists of two parts: the curriculum learning and reflection and the overall progress. The first part consists of two main areas. The upper area either contains a learning prompt (suggesting that the user learn more about the next topic in the current sub-module) and a button which opens the respective learning unit in a new tab (Fig. 15 left), or it presents a reflective question that motivates the user to think about the current topic of their learning (Fig. 15 right). The user’s progress in the current sub-module is displayed at the bottom of the widget.
An interface depicts the widget to evaluate the information for learning on the left. On the right, the interface depicts the reasons to stop the progress of evaluating information. Both interfaces contain a progress % bar at the bottom.
Fig. 15
Curriculum reflection widget: curriculum learning (left) and reflection (right)
The overall progress part of the widget shows the user’s learning progress through the curriculum using a sunburst visualization. Figure 16 shows that the curriculum is divided into three modules. Each module is represented as a section in the inner circle of the visualization and divided into three sub-modules in the outer circle. Every time a user completes a new learning unit, the percentage in the respective section in the sunburst diagram is updated. Progress in each sub-module is encoded by colour. If the user has not completed any learning units in a sub-module (0%), the respective section will be red. Making progress in a sub-module will turn the section yellow (50%) and completing it will turn the section green (100%).
An interface depicts a doughnut chart for overall progress. the chart contains the information for content creation, information and data literacy, and communication and collaboration.
Fig. 16
Overall progress widget: The first module was completed and the second module is in progress
This is also explained by the legend below the visualization. Moreover, the sections in the sunburst diagram are ordered to mirror the structure of the curriculum. Starting from the top, the sub-modules are completed clockwise, gradually turning the visualization green.
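The colour encoding of the sunburst sections can be expressed as a small Python function. The sketch assumes that any partial progress between 0% and 100% is shown in yellow; the text only names the 50% case, so this generalization, like the data layout and the function names, is an assumption.

```python
def section_colour(completed_units, total_units):
    """Map the progress in one sub-module to the colour of its section."""
    if total_units <= 0:
        raise ValueError("a sub-module needs at least one learning unit")
    if completed_units <= 0:
        return "red"     # nothing completed yet (0%)
    if completed_units >= total_units:
        return "green"   # sub-module completed (100%)
    return "yellow"      # in progress (assumed for any partial progress)


def curriculum_progress(curriculum):
    """Compute (percentage, colour) for every sub-module of every module.

    `curriculum` maps a module name to a mapping of sub-module names to
    (completed units, total units) pairs.
    """
    return {
        module: {
            name: (round(100 * done / total), section_colour(done, total))
            for name, (done, total) in sub_modules.items()
        }
        for module, sub_modules in curriculum.items()
    }


# Hypothetical module with three sub-modules of four learning units each.
progress = curriculum_progress(
    {"Module 1": {"1.1": (4, 4), "1.2": (2, 4), "1.3": (0, 4)}}
)
```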
5 Conclusion
In this chapter, we presented the MOVING platform, focusing on the MOVING web application with its search interface and novel results visualizations, community features and learning environment, and components such as the Adaptive Training Support. These functionalities help users not only to search within and visualize a large multimedia collection using various advanced tools and functionalities, but also to explore the platform more easily, e.g. by showing statistics about their platform use or providing learning guidance. Productive use of the prototype platform in real educational environments, such as the MOVING MOOC, showed how its integrated training and working environment contributes to making information professionals data-savvy and improving users’ information literacy skills.
Acknowledgements
This work was supported by the EU’s Horizon 2020 programme under grant agreement H2020-693092 MOVING. The mentioned eScience Saxony research network has been supported by the Saxon State Ministry for Science and Art. The Know-Center is funded within the Austrian COMET Programme, Competence Centers for Excellent Technologies, under the auspices of the Austrian Federal Ministry of Transport, Innovation and Technology, the Austrian Federal Ministry of Economy, Family and Youth and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.