
2021 | Book

Artificial Intelligence in Music, Sound, Art and Design

10th International Conference, EvoMUSART 2021, Held as Part of EvoStar 2021, Virtual Event, April 7–9, 2021, Proceedings


About this book

This book constitutes the refereed proceedings of the 10th European Conference on Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2021, held as a virtual event in April 2021 as part of Evo* 2021, co-located with the other Evo* 2021 events: EvoCOP, EvoApplications, and EuroGP.
The 24 revised full papers and 7 short papers presented in this book were carefully reviewed and selected from 66 submissions. They cover a wide range of topics and application areas, including generative approaches to music and visual art, deep learning, and architecture.

Table of Contents

Frontmatter

Long Talks

Frontmatter
Sculpture Inspired Musical Composition: One Possible Approach

In this paper, we present an inspirational system that takes a 3D model of a sculpture as the starting point for composing music. We consider cross-domain mapping as one possible approach to modelling inspiration. Our approach does not interpret the sculpture but rather looks at it abstractly. The results were promising: the majority of the participants gave a rating of 4 out of 5 to the preferred interpretations of the compositions and related them to the respective sculpture. This is a step towards a possible model of inspiration.

Francisco Braga, H. Sofia Pinto
Network Bending: Expressive Manipulation of Deep Generative Models

We introduce network bending, a new framework for manipulating and interacting with deep generative models. We present a comprehensive set of deterministic transformations that can be inserted as distinct layers into the computational graph of a trained generative neural network and applied during inference. In addition, we present a novel algorithm for analysing the deep generative model and clustering features based on their spatial activation maps. This allows features to be grouped together by spatial similarity in an unsupervised fashion, enabling the meaningful manipulation of sets of features that correspond to semantically significant aspects of the generated images. We outline this framework, demonstrating our results on state-of-the-art deep generative models trained on several image datasets, and show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as for a broad range of expressive outcomes.

Terence Broad, Frederic Fol Leymarie, Mick Grierson
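
To make the mechanism concrete, the following is a minimal sketch, not the authors' implementation: a toy, untrained generator with a deterministic transformation inserted into its computational graph at inference time via a PyTorch forward hook. The network and the `bend` transformation are illustrative stand-ins.

```python
# Illustrative sketch only (not the authors' code): inserting a
# deterministic transformation into a generator's forward pass via a hook.
import torch
import torch.nn as nn

# Stand-in generator; in practice this would be a trained generative model.
generator = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
)

def bend(module, inputs, output):
    # Example deterministic transformation: scale the first half of the
    # feature maps, leaving the rest untouched.
    bent = output.clone()
    bent[:, : output.shape[1] // 2] *= 2.0
    return bent  # a non-None return value replaces the layer's output

# Insert the transformation after the first layer, applied at inference time.
handle = generator[0].register_forward_hook(bend)

z = torch.randn(1, 64, 8, 8)
with torch.no_grad():
    image = generator(z)  # activations are "bent" mid-network
handle.remove()
```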
SyVMO: Synchronous Variable Markov Oracle for Modeling and Predicting Multi-part Musical Structures

We present SyVMO, an algorithmic extension of the Variable Markov Oracle algorithm, to model and predict multi-part dependencies from symbolic music manifestations. Our model has been implemented in a software application named INCITe for computer-assisted algorithmic composition. It learns variable amounts of musical data from style-agnostic music represented as multiple viewpoints. To evaluate the SyVMO model within INCITe, we adopted the Creative Support Index survey and semi-structured interviews. Four expert composers participated in the evaluation using both personal and exogenous music corpora of variable size. The results suggest that INCITe shows great potential to support creative music tasks, namely in assisting the composition process. The use of SyVMO allowed the creation of polyphonic music suggestions from style-agnostic sources while maintaining a coherent melodic structure.

Nádia Carvalho, Gilberto Bernardes
Identification of Pure Painting Pigment Using Machine Learning Algorithms

This paper reports the application of machine learning techniques to the identification of pure painting pigments, using spectral data obtained both from the paint tubes used by the Portuguese artist Amadeo de Souza Cardoso and from the paintings he produced. It illustrates the rationale for, and advantages of, more accurate artificial mixing of the reference pigments by subtractive mixing, as well as the use of Root Mean Square Error (RMSE) to distinguish in particular the mixtures containing white and black, so that a more holistic machine learning approach can be applied; notably, a neural network experiment for discerning black and white pigments, which could later be applied to both pure and mixed pigment identification. Other machine learning techniques, such as Decision Trees and Support Vector Machines, are also explored and compared for the identification of pure pigments. In addition, this paper proposes a solution to the common problem of highly imbalanced and limited data in the field of historical artwork analysis.

Ailin Chen, Rui Jesus, Márcia Vilarigues
Evolving Neural Style Transfer Blends

Neural style transfer is an image filtering technique used in both digital art practice and commercial software. We investigate blending the styles afforded by neural models via interpolation and overlaying different stylisations. In order to produce preset stylisation filters for the development of a casual creator app, we experiment with various MAP-Elites quality/diversity approaches to evolving style transfer blends with particular properties, while maintaining diversity in the population.

Simon Colton
Evolving Image Enhancement Pipelines

Image enhancement is an image processing procedure in which the original information of the image is improved. It alters an image in several different ways, for instance, by highlighting a specific feature in order to ease post-processing analyses by a human or machine. In this work, we present our approach to image enhancement for digital real-estate marketing. The aesthetic quality of the images is critical in this domain, since it is the only input clients have when browsing for options; thus, improving and ensuring it is crucial for marketing success. The problem is that each set of images, even for the same real-estate item, is often taken under diverse conditions, making it hard to find one solution that fits all. State-of-the-art image enhancement pipelines apply sets of filters that solve specific issues, so it remains hard to find a general solution for all types of issues encountered. With this in mind, we propose a Genetic Programming approach for the evolution of image enhancement pipelines, based on image filters from the literature. We report a set of experiments on the enhancement of real-estate images and analyse the results. Overall, the results suggest that it is possible to attain suitable pipelines that enhance the image both visually and according to a set of image quality assessment metrics. The evolved pipelines show improvements across the validation metrics, demonstrating that image enhancement pipelines can be created automatically. Moreover, during the experiments, some of the evolved pipelines created non-photorealistic rendering effects in a moment of computational serendipity. We therefore further analyse the different evolved non-photorealistic solutions, showing the potential of applying the evolved pipelines to other types of images.

João Correia, Leonardo Vieira, Nereida Rodriguez-Fernandez, Juan Romero, Penousal Machado
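
As a rough illustration of the evolutionary loop, here is a minimal sketch under stated assumptions: the three filters and the fitness function below are toy stand-ins, not the literature filters or image quality assessment metrics the paper uses, and the search is a simple mutation-only GA rather than full Genetic Programming.

```python
# Toy sketch of evolving an image-enhancement pipeline (assumed filters
# and fitness).
import random
import numpy as np

def gamma(img):    return np.clip(img ** 0.8, 0, 1)
def contrast(img): return np.clip((img - 0.5) * 1.2 + 0.5, 0, 1)
def brighten(img): return np.clip(img + 0.1, 0, 1)

FILTERS = [gamma, contrast, brighten]

def fitness(img):
    # Stand-in quality score: reward contrast around mid exposure.
    return img.std() - abs(img.mean() - 0.5)

def evolve(image, pop_size=20, generations=30, max_len=4):
    # Each individual is a pipeline: an ordered list of filters.
    pop = [[random.choice(FILTERS) for _ in range(random.randint(1, max_len))]
           for _ in range(pop_size)]
    def apply(pipe):
        out = image
        for f in pipe:
            out = f(out)
        return out
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(apply(p)), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for p in survivors:
            child = list(p)
            child[random.randrange(len(child))] = random.choice(FILTERS)  # mutate
            children.append(child)
        pop = survivors + children
    return pop[0]

best_pipeline = evolve(np.random.rand(64, 64))  # toy grayscale image in [0, 1]
```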
Genre Recognition from Symbolic Music with CNNs

In this work we study the use of convolutional neural networks for genre recognition in symbolically represented music. Specifically, we explore the effects of changing network depth, width, and kernel sizes while keeping the number of trainable parameters and each block’s receptive field constant. We propose MuSeReNet (Multiple Sequence Resolution Network), an architecture for handling MIDI data which makes use of multiple resolutions of the input. Through our experiments we significantly outperform the state of the art for MIDI genre recognition on the topMAGD and MASD datasets.

Edmund Dervakos, Natalia Kotsani, Giorgos Stamou
Axial Generation: A Concretism-Inspired Method for Synthesizing Highly Varied Artworks

Automated computer generation of aesthetically pleasing artwork has been the subject of research for several decades. The unsolved problem of interest is how to automatically please any audience without too much involvement of said audience in the process of creation. Two-dimensional pictures have received a lot of attention; however, 3D artwork has remained relatively unexplored. This paper introduces the Axial Generation Process (AGP), a versatile generation algorithm that can be employed to create both 2D and 3D items within the Concretism art style. A range of items generated through the AGP were evaluated against a set of formal aesthetic measures. This evaluation shows that the process is capable of generating visually varied items which generally exhibit a diverse range of values across the measures used, in both two and three dimensions.

Edward Easton, Anikó Ekárt, Ulysses Bernardet
Interactive, Efficient and Creative Image Generation Using Compositional Pattern-Producing Networks

In contrast to most recent models, which generate an entire image at once, this paper introduces a new architecture for generating images one pixel at a time, using a Compositional Pattern-Producing Network (CPPN) as the generator in a Generative Adversarial Network (GAN). This allows for the effective generation of visually interesting images with artistic value at arbitrary resolutions, independent of the dimensions of the training data. The architecture, together with the accompanying (hyper)parameters for training CPPNs using recent GAN stabilisation techniques, is shown to generalise well across many standard datasets. Rather than relying only on a latent noise vector (which entangles various features with each other), mutual information maximisation is used to obtain disentangled representations, removing the requirement for labelled data and giving the user control over the generated images. A web application for interacting with pre-trained models was also created, unique in the level of interactivity it offers with an image-generating GAN.

Erlend Gjesteland Ekern, Björn Gambäck
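
The resolution independence follows from the CPPN computing every pixel independently from its coordinates. Below is a minimal sketch with a randomly weighted, untrained network; in the paper the CPPN is instead trained as a GAN generator.

```python
# Sketch of the core CPPN idea: each pixel is a function of its
# coordinates, so the same network renders at any resolution.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(16, 3))

def cppn_image(width, height, z=0.5):
    ys, xs = np.mgrid[0:height, 0:width]
    x = xs / width * 2 - 1             # coordinates normalised to [-1, 1]
    y = ys / height * 2 - 1
    r = np.sqrt(x ** 2 + y ** 2)       # radial input encourages symmetry
    inp = np.stack([x, y, r, np.full_like(x, z)], axis=-1)  # latent scalar z
    h = np.tanh(inp @ W1)
    h = np.tanh(h @ W2)
    return (np.tanh(h @ W3) + 1) / 2   # RGB values in [0, 1]

thumb = cppn_image(64, 64)        # the same function renders...
poster = cppn_image(2048, 2048)   # ...at arbitrary resolution
```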
Aesthetic Evaluation of Cellular Automata Configurations Using Spatial Complexity and Kolmogorov Complexity

This paper addresses the computational notion of aesthetics in the framework of multi-state two-dimensional cellular automata (2D CA). The measure of complexity is a core concept in computational approaches to aesthetics. Shannon’s information theory provided an objective measure of complexity, which led to the emergence of various informational theories of aesthetics. However, entropy fails to take into account the spatial characteristics of 2D patterns; these characteristics are fundamental to addressing the aesthetic problem in general, and that of CA-generated patterns in particular. We propose two empirically evaluated alternative measures of complexity that take into account the spatial characteristics of 2D patterns, along with experimental studies on human aesthetic perception in the visual domain. The first model, spatial complexity, is based on the probabilistic spatial distribution of neighbouring cells over the lattice of a multi-state 2D cellular automaton. The second is based on algorithmic information theory (Kolmogorov complexity), extended to estimate the complexity of 2D patterns. The spatial complexity measure offers a performance advantage over information-theoretic models, enabling more accurate measurement of complexity in relation to aesthetic evaluations of 2D patterns. The results of experimentation demonstrate a correlation between the models and aesthetic judgements of the experimental 2D patterns.

Mohammad Ali Javaheri Javid
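
In practice, Kolmogorov complexity is uncomputable and is commonly approximated by compressed size, which gives an upper-bound proxy. A minimal sketch of such an estimate for a 2D pattern follows; the paper's extended 2D estimator is more elaborate.

```python
# Compression-based proxy for the Kolmogorov complexity of a 2D pattern.
import zlib
import numpy as np

def complexity_estimate(grid):
    """Compressed size per cell of a 2D multi-state pattern."""
    raw = grid.astype(np.uint8).tobytes()
    return len(zlib.compress(raw, 9)) / grid.size

rng = np.random.default_rng(1)
uniform = np.zeros((64, 64), dtype=np.uint8)  # ordered: compresses well
noise = rng.integers(0, 4, size=(64, 64))     # random: nearly incompressible
print(complexity_estimate(uniform), complexity_estimate(noise))
```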
Auralization of Three-Dimensional Cellular Automata

An auralization tool for exploring three-dimensional cellular automata is presented. This proof of concept allows the creation of a sound field comprising individual sound events associated with each cell in a three-dimensional grid. Each sound event is spatialized depending on the orientation of the listener relative to the three-dimensional model. Users can listen to all cells simultaneously or in sequential slices at will. Conceived to be used as an immersive Virtual Reality (VR) scene, the software also works as a desktop application for environments lacking VR infrastructure. Subjective evaluations indicate that the proposed sonification increases the perceived quality and immersiveness of the system with respect to a visualization-only system. No subjective differences between the sequential and simultaneous presentations were found.

Yuta Kariyado, Camilo Arevalo, Julián Villegas
Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction

Natural language processing methods have been applied in a variety of music studies, drawing the connection between music and language. In this paper, we expand those approaches by investigating chord embeddings, which we apply in two case studies to address two key questions: (1) what musical information do chord embeddings capture?; and (2) how might musical applications benefit from them? In our analysis, we show that they capture similarities between chords that adhere to important relationships described in music theory. In the first case study, we demonstrate that using chord embeddings in a next chord prediction task yields predictions that more closely match those by experienced musicians. In the second case study, we show the potential benefits of using the representations in tasks related to musical stylometrics.

Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, Rada Mihalcea
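
For intuition, chord embeddings can be trained much like word embeddings by treating each piece as a "sentence" of chord symbols. A sketch using gensim's word2vec follows; this is an assumed toolchain for illustration, not necessarily the authors' setup.

```python
# Training chord embeddings with word2vec over chord-symbol sequences.
from gensim.models import Word2Vec

# Toy corpus: each piece is a sequence of chord symbols.
pieces = [
    ["C", "Am", "F", "G", "C"],
    ["C", "F", "G", "C"],
    ["Am", "F", "C", "G", "Am"],
]

model = Word2Vec(pieces, vector_size=16, window=2, min_count=1,
                 epochs=200, seed=0)

# Chords occurring in similar contexts end up close in the embedding
# space, echoing functional relationships from music theory.
print(model.wv.most_similar("C", topn=3))
```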
Convolutional Generative Adversarial Network, via Transfer Learning, for Traditional Scottish Music Generation

The concept of a Binary Multi-track Sequential Generative Adversarial Network (BinaryMuseGAN) for music generation has been applied and tested on various types of music. However, the concept has yet to be tested on more specific genres, such as traditional Scottish music, for which extensive collections are not readily available; exploring the capabilities of a Transfer Learning (TL) approach on such music is therefore an interesting challenge for the methodology. A curated set of Scottish MIDI melodies was preprocessed to obtain the same number of tracks used in the BinaryMuseGAN model, converted into piano-roll format, and then used as a training set to fine-tune a model pretrained on the Lakh MIDI dataset. The results are compared with those obtained by training the same GAN model from scratch on the Scottish music dataset alone. Results are presented in terms of variation and average performance achieved at different epochs for five performance metrics: three adopted from the Lakh dataset (qualified note rate, polyphonicity, tonal distance) and two custom-defined to highlight Scottish music characteristics (dotted rhythm and pentatonic note). From these results, the TL method proves more effective, converging stably and closely to the reference metric values of the original dataset in fewer epochs.

Francesco Marchetti, Callum Wilson, Cheyenne Powell, Edmondo Minisci, Annalisa Riccardi
The Enigma of Complexity

In this paper we examine the concept of complexity as it applies to generative art and design. Complexity has many different, discipline-specific definitions, such as complexity in physical systems (entropy), algorithmic measures of information complexity, and the field of “complex systems”. We apply a series of different complexity measures to three generative art datasets and look at the correlations between complexity and individual aesthetic judgement by the artist (in the case of two datasets) or the physically measured complexity of 3D forms. Our results show that the degree of correlation differs for each dataset and measure, indicating that there is no overall “better” measure. However, specific measures do perform well on individual datasets, indicating that careful choice can increase the value of using such measures. We conclude by discussing the value of direct measures in generative and evolutionary art, reinforcing recent findings from neuroimaging and psychology which suggest that human aesthetic judgement is informed by many extrinsic factors beyond the measurable properties of the object being judged.

Jon McCormack, Camilo Cruz Gambardella, Andy Lomas
SerumRNN: Step by Step Audio VST Effect Programming

Learning to program an audio production VST synthesizer is a time consuming process, usually obtained through inefficient trial and error and only mastered after years of experience. As an educational and creative tool for sound designers, we propose SerumRNN: a system that provides step-by-step instructions for applying audio effects to change a user’s input audio towards a desired sound. We apply our system to Xfer Records Serum: currently one of the most popular and complex VST synthesizers used by the audio production community. Our results indicate that SerumRNN is consistently able to provide useful feedback for a variety of different audio effects and synthesizer presets. We demonstrate the benefits of using an iterative system and show that SerumRNN learns to prioritize effects and can discover more efficient effect order sequences than a variety of baselines.

Christopher Mitcheltree, Hideki Koike
Parameter Tuning for Wavelet-Based Sound Event Detection Using Neural Networks

Wavelet-based audio processing is used for sound event detection. Low-level audio features (timbral or temporal) are effective at differentiating between sound events, which is why frequency-domain processing algorithms have become popular in recent times. Wavelet-based sound event detection is effective at detecting sudden onsets in audio signals because it offers unique advantages over traditional frequency-based sound event detection using machine learning approaches. In this work, the wavelet transform is applied to audio to extract features that can predict the occurrence of a sound event using a classical feedforward neural network. Additionally, this work attempts to identify the optimal wavelet parameters for enhancing classification performance: 3 window sizes, 6 wavelet families, 4 wavelet levels, 3 decomposition levels, and 2 classifier models are used for experimental analysis. The UrbanSound8k dataset is used, and a classification accuracy of up to 97% is obtained. Major observations regarding parameter estimation are as follows: the wavelet level and wavelet decomposition level should be low, and a large window is desirable; however, the window size is limited by the duration of the sound event, and a window longer than the event decreases classification performance. Most of the wavelet families can classify the sound events, but the Symlet, Daubechies, Reverse biorthogonal, and Biorthogonal families save computational resources (fewer epochs) because they yield better accuracy than Fejér-Korovkin and Coiflets. This work shows that wavelet-based sound event detection is promising and can be extended to detect most common sounds and sudden events occurring in various environments.

Pallav Raval, Jabez Christopher
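
A minimal sketch of the wavelet feature-extraction step follows, with a synthetic signal and simple per-band energy features; the parameter choices are illustrative assumptions, not the paper's exact pipeline.

```python
# Extracting wavelet-domain features for sound event detection.
import numpy as np
import pywt

def wavelet_features(signal, wavelet="db4", level=3):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # One energy value per decomposition band as a compact feature vector.
    return np.array([np.sum(c ** 2) for c in coeffs])

sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
event = np.sin(2 * np.pi * 600 * t) * np.exp(-3 * t)  # toy onset "event"
features = wavelet_features(event)
# `features` would then feed a feedforward neural network classifier.
print(features)
```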
Raga Recognition in Indian Classical Music Using Deep Learning

Raga is central to Indian Classical Music, in both Carnatic and Hindustani music. The benefits of identifying a raga from audio extend to, but are not limited to, Music Information Retrieval, content-based filtering, and teaching and learning. This work presents a deep learning and signal processing based approach to recognising the raga of an audio source from raw spectrograms. The proposed preprocessing steps and models achieve 98.98% testing accuracy on a subset of 10 ragas from the CompMusic dataset. A thorough study of the effects of various hyperparameters, sound source separation, and silent-part removal is carried out, and the results and findings are reported. A discussion of the predictions and behaviour of the deep learning models on audio outside the dataset is also included. This new approach yields promising results and gives real-time predictions from raw audio or recordings.

Devansh P. Shah, Nikhil M. Jagtap, Prathmesh T. Talekar, Kiran Gawande
The Simulated Emergence of Chord Function

In this paper, we propose autonomous, unsupervised learning of chord classification based on a neural Hidden Markov Model (HMM), and extend it to a Hidden Semi-Markov Model (HSMM) to integrate additional contexts such as the pitch-class histogram, the beat positions, and the preceding chord sequences. We train our models on a minimally pre-processed dataset mixing major and minor pieces, expecting them to learn chord clusters in accordance with the contexts, without any assignment of tonality. We then evaluate their performance by perplexity, and show that the added contexts considerably improve it. In addition, we show that the proposed model reflects the contexts of major and minor in its state transitions, even though it is trained on mixed tonalities.

Yui Uehara, Satoshi Tojo
Incremental Evolution of Stylized Images

This paper examines and showcases a framework for generating artworks using evolutionary algorithms. Based on the idea of an abstract artistic process, stylized images are generated from different input images without human supervision. After explaining the underlying concept, the solution space of different styles is explored and its properties are discussed. Given these insights into the framework, current shortcomings are evaluated and improvements are discussed.

Florian Uhde
Dissecting Neural Networks Filter Responses for Artistic Style Transfer

Current developments in the field of Artistic Style Transfer use the information encoded in pre-trained neural networks to extract properties from images in an unsupervised process. This neural style transfer works well with art and paintings but only produces limited results when dealing with highly structured data. Characteristics of the extracted information directly define the quality of the generated artifact and traditionally require the user to do manual fine-tuning. This paper uses current methods of deep learning to analyze the properties embedded in the network, group filter responses into semantic classes and extract an optimized layer set for artistic style transfer, to improve the artifact generation with a potentially unsupervised preprocessing step.

Florian Uhde, Sanaz Mostaghim
A Fusion of Deep and Shallow Learning to Predict Genres Based on Instrument and Timbre Features

Deep neural networks have recently received a lot of attention and have contributed very successfully to many music classification tasks. However, they also have drawbacks compared to traditional methods: a very high number of parameters, decreased performance on small training sets, lack of model interpretability, long training times, and hence a larger environmental impact with regard to computing resources. Therefore, it can still be a better choice to apply shallow classifiers in application scenarios with specific evaluation criteria, such as the size of the training set or a required interpretability of models. In this work, we propose an approach based on both deep and shallow classifiers for music genre classification: convolutional neural networks are trained once to predict instruments, and their outputs are used as features to predict music genres with a shallow classifier. The results show that the individual performance of such descriptors is comparable to other instrument-related features, and that they are even better for more than half of the 19 genre categories.

Igor Vatolkin, Benedikt Adrian, Jurij Kuzmic
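
The two-stage idea can be sketched as follows, with random stand-ins for the CNN instrument predictions and an assumed choice of shallow classifier.

```python
# Sketch: instrument probabilities (stand-ins for CNN outputs) used as a
# compact, interpretable feature vector for a shallow genre classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_tracks, n_instruments = 200, 10

# Stand-in for per-track instrument predictions from pretrained CNNs.
instrument_probs = rng.random((n_tracks, n_instruments))
genres = rng.integers(0, 4, size=n_tracks)  # toy genre labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(instrument_probs[:150], genres[:150])
print(clf.score(instrument_probs[150:], genres[150:]))
```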
A Multi-objective Evolutionary Approach to Identify Relevant Audio Features for Music Segmentation

The goal of automatic music segmentation is to calculate boundaries between musical parts or sections that are perceived as semantic entities. Such sections are often characterized by specific musical properties such as instrumentation, dynamics, tempo, or rhythm. Recent data-driven approaches often phrase music segmentation as a binary classification problem, where musical cues for identifying boundaries are learned implicitly. Complementary to such methods, we present in this paper an approach for identifying relevant audio features that explain the presence of musical boundaries. In particular, we describe a multi-objective evolutionary feature selection strategy, which simultaneously optimizes two objectives. In a first setting, we reduce the number of features while maximizing an F-measure. In a second setting, we jointly maximize precision and recall values. Furthermore, we present extensive experiments based on six different feature sets covering different musical aspects. We show that feature selection allows for reducing the overall dimensionality while increasing the segmentation quality compared to full feature sets, with timbre-related features performing best.

Igor Vatolkin, Marcel Koch, Meinard Müller
Exploring the Effect of Sampling Strategy on Movement Generation with Generative Neural Networks

When using generative deep neural networks for creative applications it is common to explore multiple sampling approaches. This sampling stage is a crucial step, as choosing suitable sampling parameters can make or break the realism and perceived creative merit of the output. The process of selecting the correct sampling parameters is often task-specific and under-reported in many publications, which can make the reproducibility of the results challenging. We explore some of the most common sampling techniques in the context of generating human body movement, specifically dance movement, and attempt to shine a light on their advantages and limitations. This work presents a Mixture Density Recurrent Neural Network (MDRNN) trained on a dataset of improvised dance motion capture data from which it is possible to generate novel movement sequences. We outline several common sampling strategies for MDRNNs and examine these strategies systematically to further understand the effects of sampling parameters on motion generation. This analysis provides evidence that the choice of sampling strategy significantly affects the output of the model and supports the use of this model in creative applications. Building an understanding of the relationship between sampling parameters and creative machine-learning outputs could aid when deciding between different approaches in generation of dance motion and other creative applications.

Benedikte Wallace, Charles P. Martin, Jim Tørresen, Kristian Nymoen
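
At each frame, an MDRNN samples from a Gaussian mixture whose weights and spreads can be reshaped by temperature-like parameters, which is where the sampling strategies examined here come in. A minimal sketch with illustrative parameter names follows.

```python
# Temperature-controlled sampling from a Gaussian mixture, the step an
# MDRNN performs at every generated frame.
import numpy as np

def sample_mdn(pi, mu, sigma, pi_temp=1.0, sigma_temp=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # Sharpen or flatten the mixture weights with a softmax temperature.
    logits = np.log(pi) / pi_temp
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    k = rng.choice(len(pi), p=weights)
    # Scale the chosen component's spread; low values give tamer motion.
    return rng.normal(mu[k], sigma[k] * sigma_temp)

pi = np.array([0.7, 0.2, 0.1])    # mixture weights
mu = np.array([0.0, 1.0, -1.0])   # component means (e.g. a joint angle)
sigma = np.array([0.1, 0.3, 0.2])
conservative = sample_mdn(pi, mu, sigma, pi_temp=0.5, sigma_temp=0.3)
adventurous = sample_mdn(pi, mu, sigma, pi_temp=2.0, sigma_temp=1.5)
```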
“A Good Algorithm Does Not Steal – It Imitates”: The Originality Report as a Means of Measuring When a Music Generation Algorithm Copies Too Much

Research on automatic music generation lacks consideration of the originality of musical outputs, creating risks of plagiarism and/or copyright infringement. We present the originality report – a set of analyses for measuring the extent to which an algorithm copies from the input music on which it is trained. First, a baseline is constructed, determining the extent to which human composers borrow from themselves and each other in an existing music corpus. Second, we apply a similar analysis to the musical outputs of runs of the MAIA Markov and Music Transformer generation algorithms, and compare the results to the baseline. Third, we investigate how originality varies as a function of the Transformer’s training epoch. Results from the second analysis indicate that the originality of the Transformer’s output is below the 95%-confidence interval of the baseline. Musicological interpretation of the analyses shows that the Transformer model obtained via the conventional stopping criteria produces single-note repetition patterns, resulting in outputs of low quality and originality, while in later training epochs the model tends to overfit, producing copies of excerpts of input pieces. We recommend the originality report as a new means of evaluating algorithm training processes and outputs in the future, and question the reported success of language-based deep learning models for music generation. Supporting materials (code, dataset) will be made available via https://osf.io/96emr/.

Zongyu Yin, Federico Reuben, Susan Stepney, Tom Collins
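
One simple measure in the spirit of the originality report is the fraction of generated note n-grams that already occur in the training corpus; a toy sketch of that idea follows (the paper's analyses are more refined).

```python
# Fraction of generated n-grams borrowed verbatim from the training corpus.
def ngrams(seq, n=4):
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def borrowed_fraction(generated, corpus, n=4):
    corpus_ngrams = set().union(*(ngrams(piece, n) for piece in corpus))
    gen = ngrams(generated, n)
    return len(gen & corpus_ngrams) / len(gen) if gen else 0.0

training_pieces = [[60, 62, 64, 65, 67, 69], [60, 64, 67, 72, 67, 64]]
output = [60, 62, 64, 65, 60, 64, 67, 71]  # MIDI pitches from a generator
print(borrowed_fraction(output, training_pieces))  # 1.0 = fully copied n-grams
```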

Short Talks

Frontmatter
From Music to Image: A Computational Creativity Approach

In this paper we propose a possible approach for cross-domain association between the musical and visual domains. We present a system that generates abstract images using music files as the inspiration and basis for the creative process. The system extracts the available features from a MIDI music file given as input and associates them with visual characteristics, generating three different outputs: the Random and Associated Images, which result from applying our approach with different shape distributions, and the Genetic Image, which results from applying a Genetic Algorithm that considers music and colour theory while searching for better results. The results of our evaluation, conducted through online surveys, demonstrate that our system is capable of generating abstract images from music: a majority of users consider the images to be abstract and to have a relation with the music that served as the basis for the association process. Moreover, the majority of participants ranked the Genetic Image highest.

Luís Aleixo, H. Sofia Pinto, Nuno Correia
“What Is Human?” A Turing Test for Artistic Creativity

This paper presents a study conducted in a naturalistic setting with data collected from an interactive art installation. The audience is challenged in a Turing Test for artistic creativity involving recognising human-made versus AI-generated drawing strokes. In most cases, people were able to differentiate human-made strokes above chance. An analysis of the images at the pixel level shows a significant difference between the symmetry of the AI-generated strokes and that of the human-made ones. However, we argue that this feature alone was not key to the differentiation. Further behavioural analysis indicates that people who judged more quickly were able to differentiate human-made strokes significantly better than slower judges. We point to theories of embodiment as a possible explanation of our results.

Antonio Daniele, Caroline Di Bernardi Luft, Nick Bryan-Kinns
Mixed-Initiative Level Design with RL Brush

This paper introduces RL Brush, a level-editing tool for tile-based games designed for mixed-initiative co-creation. The tool uses reinforcement-learning-based models to augment manual human level design with AI-generated suggestions. Here, we apply RL Brush to designing levels for the classic puzzle game Sokoban. We put the tool online and tested it in 39 different sessions. The results show that users who adopt the AI suggestions stay longer, and that their levels are, on average, more playable and more complex than those created without suggestions.

Omar Delarosa, Hang Dong, Mindy Ruan, Ahmed Khalifa, Julian Togelius
Creating a Digital Mirror of Creative Practice

This paper describes an ongoing project to create a “digital mirror” to my practice as a composer of contemporary classical music; that is, a system that takes descriptions (in code) of aspects of that practice, and reflects them back as computer-generated realisations. The paper describes the design process of this system, explains how it is implemented, and gives some examples of the material that it generates. The paper further discusses some broader issues about the technological approach to building creative systems, in particular the advantages and disadvantages of building bespoke algorithms for generating creative content vs. the use of optimisation or learning from examples.

Colin G. Johnson
An Application for Evolutionary Music Composition Using Autoencoders

This paper presents a new interactive application that can generate music according to a user’s preferences inspired by the process of biological evolution. The application composes sets of songs that the user can choose from as a basis for the algorithm to evolve new music. By selecting preferred songs over successive generations, the application allows the user to explore an evolutionary musical space. The system combines autoencoder neural networks and evolution with human feedback to produce music. The autoencoder component is used to capture the essence of musical structure from a known sample of songs in a lower-dimensional space. Evolution is then applied over this representation to create new pieces based upon previously generated songs the user enjoys. In this research, we introduce the application design and explore and analyse the autoencoder model. The songs produced by the application are also analysed to confirm that the underlying model has the ability to create a diverse range of music. The application can be used by composers working with dynamically generated music, such as for video games and interactive media.

Robert Neil McArthur, Charles Patrick Martin
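
The evolve-in-latent-space loop can be sketched as follows, with a placeholder `decode` standing in for the trained autoencoder's decoder and a faked user-selection step; both are assumptions for illustration.

```python
# Sketch of interactive evolution over an autoencoder's latent space.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, pop_size = 32, 8

def decode(z):
    # Placeholder: the real system maps latent vectors to songs.
    return z

population = rng.normal(size=(pop_size, latent_dim))
for generation in range(10):
    songs = [decode(z) for z in population]  # presented to the user
    # The user picks favourites; here we fake a choice of two parents.
    parents = population[rng.choice(pop_size, size=2, replace=False)]
    # Next generation: blend the chosen latents and add Gaussian mutation.
    alphas = rng.random((pop_size, 1))
    population = (alphas * parents[0] + (1 - alphas) * parents[1]
                  + rng.normal(scale=0.1, size=(pop_size, latent_dim)))
```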
A Swarm Grammar-Based Approach to Virtual World Generation

In this work we formulate and propose an extended version of the multi-agent Swarm Grammar (SG) model for the generation of virtual worlds. It unfolds a comparatively small database into a complex world featuring terrain, vegetation, and bodies of water. This approach allows generated assets to adapt to their environment, supports unbounded worlds, and makes their generation interactive. To evaluate the model, we conducted sensitivity analyses at the local interaction scale. In addition, at a global scale, we investigated two virtual environments, discussing notable interactions, recurring configuration patterns, and obstacles in working with SGs. These analyses showed that SGs can create visually interesting virtual worlds but require further work on ease of use. Lastly, we identify future extensions that might shrink the required database sizes.

Yasin Raies, Sebastian von Mammen
Co-creative Drawing with One-Shot Generative Models

This paper presents and evaluates co-creative drawing scenarios in which a user is asked to provide a small hand-drawn pattern which then is interactively extended with the support of a trained neural model. We show that it is possible to use one-shot trained Transformer Neural Networks to generate stroke-based images and that these trained models can successfully be used for design assisting tasks.

Sabine Wieluch, Friedhelm Schwenker
Backmatter
Metadata
Title
Artificial Intelligence in Music, Sound, Art and Design
Editors
Juan Romero
Tiago Martins
Nereida Rodríguez-Fernández
Copyright Year
2021
Electronic ISBN
978-3-030-72914-1
Print ISBN
978-3-030-72913-4
DOI
https://doi.org/10.1007/978-3-030-72914-1
