Artificial Intelligence in Music, Sound, Art and Design
12th International Conference, EvoMUSART 2023, Held as Part of EvoStar 2023, Brno, Czech Republic, April 12–14, 2023, Proceedings
- 2023
- Book
- Edited by
- Colin Johnson
- Nereida Rodríguez-Fernández
- Sérgio M. Rebelo
- Book series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Switzerland
About this Book
This book constitutes the refereed proceedings of the 12th European Conference on Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2023, held as part of Evo* 2023 in April 2023 and co-located with the other Evo* 2023 events: EvoCOP, EvoApplications, and EuroGP.
The 20 full papers and 7 short papers presented in this book were carefully reviewed and selected from 55 submissions. They cover a wide range of topics and application areas of artificial intelligence, including generative approaches to music and visual art, deep learning, and architecture.
Table of Contents
Frontmatter

Long Talks

Frontmatter
LooperGP: A Loopable Sequence Model for Live Coding Performance Using GuitarPro Tablature
Sara Adkins, Pedro Sarmento, Mathieu Barthet
Abstract: Despite their impressive offline results, deep learning models for symbolic music generation are not widely used in live performances due to a deficit of musically meaningful control parameters and a lack of structured musical form in their outputs. To address these issues we introduce LooperGP, a method for steering a Transformer-XL model towards generating loopable musical phrases of a specified number of bars and time signature, enabling a tool for live coding performances. We show that by training LooperGP on a dataset of 93,681 musical loops extracted from the DadaGP dataset [22], we are able to steer its generative output towards generating 3x as many loopable phrases as our baseline. In a subjective listening test conducted by 31 participants, LooperGP loops achieved positive median ratings in originality, musical coherence and loop smoothness, demonstrating its potential as a performance tool.
Chordal Embeddings Based on Topology of the Tonal Space
Anton Ayzenberg, Maxim Beketov, Aleksandra Burashnikova, German Magai, Anton Polevoi, Ivan Shanin, Konstantin Sorokin
Abstract: In the classical western musical tradition, the mutual simultaneous appearance of two tones in a melody is determined by harmony, i.e. the ratio of their frequencies. To perform NLP-based methods for MIDI file analysis, one needs to construct vector embeddings of chords, taking mutual harmonicity into account. Previous works utilising this idea were based on the notion of Euler’s Tonnetz. Being a beautiful topological model describing consonance relations in music, the classical Tonnetz has a certain disadvantage in that it forgets particular octaves. In this paper, we introduce a mathematical generalisation of the Tonnetz that takes octaves into account. Based on this model, we introduce several types of metrics on chords and use them to construct chordal embeddings. These embeddings are tested on two types of tasks: the chord estimation task, based on the Harmony Transformer model, and the music generation task, provided on the basis of TonicNet.
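The chord embeddings discussed above build on Tonnetz-style distances; a common octave-free variant (not the octave-aware generalisation the paper proposes) is the 6-D "tonal centroid", which places each pitch class on circles of fifths, minor thirds, and major thirds. The sketch below, with illustrative chords, only shows the underlying idea:

```python
import numpy as np

def tonal_centroid(pitch_classes):
    """6-D embedding of a pitch-class set: each pitch class (0-11) is placed
    on three circles -- fifths, minor thirds, major thirds -- whose distances
    approximate proximity on the Tonnetz; the chord is their mean."""
    points = [[
        np.cos(p * 7 * np.pi / 6), np.sin(p * 7 * np.pi / 6),  # circle of fifths
        np.cos(p * 3 * np.pi / 2), np.sin(p * 3 * np.pi / 2),  # minor thirds
        np.cos(p * 2 * np.pi / 3), np.sin(p * 2 * np.pi / 3),  # major thirds
    ] for p in pitch_classes]
    return np.mean(points, axis=0)

# Harmonically close chords land closer together: C major vs. A minor
# (two shared tones) vs. the distant F# major triad.
c_major, a_minor, fs_major = (tonal_centroid(c) for c in
                              ([0, 4, 7], [9, 0, 4], [6, 10, 1]))
assert (np.linalg.norm(c_major - a_minor)
        < np.linalg.norm(c_major - fs_major))
```

Distances in such a space can then replace one-hot chord encodings in NLP-style pipelines.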
Music Generation with Multiple Ant Colonies Interacting on Multilayer Graphs
Lluc Bono Rosselló, Hugues Bersini
Abstract: We propose a methodology for music generation that makes use of Ant Colony Optimization (ACO) algorithms on multilayer graphs. In our methodology, we first define a new multilayer graph model of music that represents musical works with several voices. Then, we adapt ACO algorithms to allow multiple ant colonies to generate solutions on each layer while interacting with each other. This methodology is illustrated with example configurations that show how music emerges as a result of the interaction of different simultaneous ant colony optimization instances.
Automatically Adding to Artistic Cultures
Simon Colton, Berker Banar
Abstract: We consider how generative AI systems could and should evolve from being used as co-creative tools for artefact generation to co-creative collaborators for enhancing cultural knowledge, via artefact generation. We argue that while generative deep learning techniques and ethos have many drawbacks in this respect, neuro-symbolic approaches and increased AI autonomy could improve matters so that AI systems may be able to add to cultural knowledge directly. To do this, we consider what cultural knowledge is, differences between scientific and artistic applications of generative AI, stakeholders and ethical issues. We also present a case study in decision foregrounding for generative music.
Extending Generative Neo-Riemannian Theory for Event-Based Soundtrack Production
Simon Colton, Sara Cardinale
Abstract: We present the GENRT music generation system, specifically designed for making soundtracks to fit given media such as video clips. It is based on Neo-Riemannian Theory (NRT), an analytical approach to describing chromatic chord progressions. We describe the implementation of GENRT in terms of a generative NRT formalism, which produces suitable chord sequences to fit the timing and atmosphere requirements of the media. We provide an illustrative example using GENRT to produce a soundtrack for a clip from the film A Beautiful Mind.
Is Beauty in the Age of the Beholder?
Edward Easton, Ulysses Bernardet, Anikó Ekárt
Abstract: Symmetry is a universal concept; its unique importance has made it a topic of research across many different fields. It is often treated as a constant: higher levels of symmetry are preferred in the judgement of faces, and even the initial state of the universe is thought to have been one of pure symmetry. The same is true in the judgement of auto-generated art, where symmetry is often used alongside complexity to generate aesthetically pleasing images. However, these are only two of the many aspects contributing to aesthetic judgement, and each aspect is in turn influenced by others, such as art expertise. These intricacies make it difficult to describe aesthetic preferences and to auto-generate artwork using a high number of these aspects. In this paper, a gamified approach is presented to elicit individuals' preferences for symmetry levels and to further understand how symmetry can be utilised within the context of automatically generating artwork. The gamified approach is implemented in an experiment with participants aged between 13 and 60, providing evidence that symmetry should be kept consistent within an evolutionary art context.
Extending the Visual Arts Experience: Sonifying Paintings with AI
Thomas Fink, Alkim Almila Akdag Salah
Abstract: Sonification of visual information is a relatively new research line that aims to create a new way to access and experience visual displays, especially for the visually impaired. When applied to artworks, sonification needs to translate the aesthetic experience as well. This has been attempted in a handful of studies in the literature, where most of the transformation and music generation is done manually, or only by using low-level visual features of artworks. In this paper, we present a sonification model that uses both low-level and high-level features such as color, edge information, saliency, and object and scene detection to create a pleasant and descriptive sonification of artworks with a fully automatic pipeline. The results of the model are tested via interviews with experts in music theory and generative music models. We found high agreement among experts in the evaluation of a small set of sonified paintings; the addition of high-level features, such as sounds extracted from the scene, played a big role in this. Among the challenges observed during the interviews was the need to add emotion and mood information, as well as semantic information, to the sonification in order to create more descriptive melodies and sounds. The complexity and ambiguity of the visual information generated the most disagreement among experts, both in their interpretation of the paintings and in their sonifications.
Application of Neural Architecture Search to Instrument Recognition in Polyphonic Audio
Leonard Fricke, Igor Vatolkin, Fabian Ostermann
Abstract: Instrument recognition in polyphonic audio signals is a very challenging classification task. It helps to improve related application scenarios, like music transcription and recommendation, organization of large music collections, or analysis of historical trends and properties of musical styles. Recently, classification performance could be improved by the integration of deep convolutional neural networks. However, in studies published to date, the network architectures and parameter settings were usually adopted from image recognition tasks and manually adjusted, without systematic optimization. In this paper, we show how two different neural architecture search strategies can be successfully applied to improve the prediction of nine instrument classes, significantly outperforming the classification performance of three fixed baseline architectures from previous works. Although high computing efforts are required for model optimization, the training of the final architecture is done only once for later prediction of instruments in a possibly unlimited number of musical tracks.
AI-rmonies of the Spheres
Adrián García Riber, Francisco Serradilla
Abstract: Thanks to the efforts and cooperation of the international community, it is nowadays possible to analyze astronomical data captured by the observatories and telescopes of major space agencies around the world from a personal computer. The development of virtual observatory (VO) technology, and the standardization of the formats it uses, allow professional and amateur astronomers to access astronomical data and images through the internet with relative ease. Immersed in this environment of global accessibility, this article presents an astronomical data-driven unsupervised music composition system based on Deep Learning, aimed at offering an automatic and objective review of the classical topic of the Harmonies of the Spheres. The system explores the MILES stellar library from the Spanish Virtual Observatory (SVO) using a variational autoencoder architecture to cross-match its stellar spectra, via Pitch-Class Set Theory, with a music score generated by an LSTM-with-attention neural network in the style of late-Renaissance music.
SUNMASK: Mask Enhanced Control in Step Unrolled Denoising Autoencoders
Kyle Kastner, Tim Cooijmans, Yusong Wu, Aaron Courville
Abstract: This paper introduces SUNMASK, an approach for generative sequence modeling based on masked unrolled denoising autoencoders. By explicitly incorporating a conditional masking variable, as well as using this mask information to modulate losses during training based on expected exemplar difficulty, SUNMASK models discrete sequences without direct ordering assumptions. The addition of masking terms allows for fine-grained control during generation, starting from random tokens and a mask over subset variables, then predicting tokens which are again combined with a subset mask for subsequent repetitions. This iterative process gradually improves token sequences toward a structured output, while guided by proposal masks. The broad framework for unrolled denoising autoencoders is largely independent of model type, and we utilize both transformer and convolution based architectures in this work. We demonstrate the efficacy of this approach both qualitatively and quantitatively, applying SUNMASK to generative modeling of symbolic polyphonic music, and language modeling for English text.
SketchSynth: Cross-Modal Control of Sound Synthesis
Sebastian Löbbers, Louise Thorpe, György Fazekas
Abstract: This paper introduces a prototype of SketchSynth, a system that enables users to graphically control synthesis using sketches of cross-modal associations between sound and shape. The development is motivated by finding alternatives to technical synthesiser controls to enable a more intuitive realisation of sound ideas. There is strong evidence that humans share cross-modal associations between sound and shape, and recent studies found similar patterns when humans represent sound graphically. Compared to similar cross-modal mapping architectures, this prototype uses a deep classifier that predicts the character of a sound rather than a specific sound. The prediction is then mapped onto a semantically annotated FM synthesiser dataset. This approach allows for a perceptual evaluation of the mapping model and allows it to be combined with various sound datasets. Two models based on architectures commonly used for sketch recognition were compared: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In an evaluation study, 62 participants created sketches from prompts and rated the predicted audio output. Both models were able to infer the sound characteristics on which they were trained with over 84% accuracy. Participant ratings were significantly higher than the baseline for some prompts, but revealed a potential weak point in the mapping between classifier output and FM synthesiser. The prototype provides the basis for further development that, in the next step, aims to make SketchSynth available online to be explored outside of a study environment.
Towards the Evolution of Prompts with MetaPrompter
Tiago Martins, João M. Cunha, João Correia, Penousal Machado
Abstract: The dissemination of open-source text-to-image generative models and the increasing quality of their output has led to a growth of interest in the field. The quality of the images greatly depends on the prompt used, i.e. a phrase that includes descriptive terms to be used as input to a text-to-image model. However, choosing the right prompt is a complex task, often relying on a trial-and-error approach. In this paper, we introduce an evolutionary approach to prompt generation where users begin by creating a blueprint for what might be a candidate prompt and then initiate an evolutionary process to interactively explore, according to their preferences, the space of prompts encoded by the initial blueprint. Our work is a step towards a more dynamic and interactive way to generate prompts that lead to a wide variety of visual outputs, with which users can easily obtain prompts that match their goals.
Is Writing Prompts Really Making Art?
Jon McCormack, Camilo Cruz Gambardella, Nina Rajcic, Stephen James Krol, Maria Teresa Llano, Meng Yang
Abstract: In recent years Generative Machine Learning systems have advanced significantly. A current wave of generative systems use text prompts to create complex imagery, video, even 3D datasets. The creators of these systems claim a revolution in bringing creativity and art to anyone who can type a prompt. In this position paper, we question the basis for these claims, dividing our analysis into three areas: the limitations of linguistic descriptions, implications of the dataset, and lastly, matters of materiality and embodiment. We conclude with an analysis of the creative possibilities enabled by prompt-based systems, asking if they can be considered a new artistic medium.
Using GPT-3 to Achieve Semantically Relevant Data Sonification for an Art Installation
Rodolfo Ocampo, Josh Andres, Adrian Schmidt, Caroline Pegram, Justin Shave, Charlton Hill, Brendan Wright, Oliver Bown
Abstract: Large Language Models such as GPT-3 exhibit generative language capabilities with multiple potential applications in creative practice. In this paper, we present a method for data sonification that employs the GPT-3 model to create semantically relevant mappings between artificial-intelligence-generated natural language descriptions of data and human-generated descriptions of sounds. We implemented this method in a public art installation to generate a soundscape based on data from different systems. While common sonification approaches rely on arbitrary mappings between data values and sonic values, our approach explores the use of language models to achieve a mapping not via values but via meaning. We find our approach is a useful tool for musification practice and demonstrates a new application of generative language models in creative new media arts practice. We show how different prompts influence data-to-sound mappings, and highlight that matching the embeddings of texts of different lengths produces undesired behavior.
Using Autoencoders to Generate Skeleton-Based Typography
Jéssica Parente, Luís Gonçalo, Tiago Martins, João Miguel Cunha, João Bicker, Penousal Machado
Abstract: Type design is a domain that has repeatedly profited from the emergence of new tools and technologies. The transformation of type from physical to digital, the dissemination of font design software, and the adoption of web typography have made type design better known and more accessible. The domain has received an even greater push with the increasing adoption of generative tools to create more diverse and experimental fonts. Nowadays, with the application of Machine Learning to various domains, typography has also been influenced by it. In this work, we produce a dataset by extracting letter skeletons from a collection of existing fonts. We then train a Variational Autoencoder and a Sketch Decoder to learn to create these skeletons, which allows new ones to be generated by exploring the latent space. This process also allows us to control the style of the resulting skeletons and to interpolate between different characters. Finally, we develop new glyphs by filling the generated skeletons based on the original letters' stroke width, and show some applications of the results.
Visual Representation of the Internet Consumption in the European Union
Telma Rodrigues, Catarina Maçãs, Ana Rodrigues
Abstract: The impact of internet usage on the environment is a contradictory topic. While it can help reduce carbon emissions, with smart grids or the automation of services and resources, it can also increase e-waste that ends up affecting the environment. To draw attention to the impact of energy consumption on the environment, we proposed and developed a computational artifact that unites the areas of Data Aesthetics and Interaction Design. The artifact, displayed in an interactive installation, is divided into three panels: (i) the left panel, which represents the countries from the European Union (EU) with the lowest energy consumption impact on the environment; (ii) the central panel, which uses swarming boids to represent the internet usage at the installation site and its impact; and (iii) the right panel, which represents the EU countries with the highest energy impact on the environment. The arrangement of the three panels in a single interactive installation aims to establish a visual connection between energy consumption in the EU and energy consumption at the installation site, and to promote awareness of its impact on the environment.
GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers
Pedro Sarmento, Adarsh Kumar, Yu-Hua Chen, CJ Carr, Zack Zukowski, Mathieu Barthet
Abstract: Recently, symbolic music generation with deep learning techniques has witnessed steady improvements. Most works on this topic focus on MIDI representations, but less attention has been paid to symbolic music generation using guitar tablatures (tabs), which can be used to encode multiple instruments. Tabs include information on expressive techniques and fingerings for fretted string instruments in addition to rhythm and pitch. In this work, we use the DadaGP dataset for guitar tab music generation, a corpus of over 26k songs in GuitarPro and token formats. We introduce methods to condition a Transformer-XL deep learning model to generate guitar tabs (GTR-CTRL) based on desired instrumentation (inst-CTRL) and genre (genre-CTRL). Special control tokens are appended at the beginning of each song in the training corpus. We assess the performance of the model with and without conditioning. We propose instrument presence metrics to assess the inst-CTRL model's response to a given instrumentation prompt. We trained a BERT model for downstream genre classification and used it to assess the results obtained with the genre-CTRL model. Statistical analyses evidence significant differences between the conditioned and unconditioned models. Overall, results indicate that the GTR-CTRL methods provide more flexibility and control for guitar-focused symbolic music generation than an unconditioned model.
Artistic Curve Steganography Carried by Musical Audio
Christopher J. Tralie
Abstract: In this work, we create artistic closed loop curves that trace out images and 3D shapes, which we then hide in musical audio as a form of steganography. We use traveling salesperson art to create artistic plane loops to trace out image contours, and we use Hamiltonian cycles on triangle meshes to create artistic space loops that fill out 3D surfaces. Our embedding scheme is designed to faithfully preserve the geometry of these loops after lossy compression, while keeping their presence undetectable to the audio listener. To accomplish this, we hide each dimension of the curve in a different frequency, and we perturb a sliding window sum of the magnitude of that frequency to best match the target curve at that dimension, while hiding scale information in that frequency's phase. In the process, we exploit geometric properties of the curves to help to more effectively hide and recover them. Our scheme is simple and encoding happens efficiently with a nonnegative least squares framework, while decoding is trivial. We validate our technique quantitatively on large datasets of images and audio, and we show results of a crowdsourced listening test that validate that the hidden information is indeed unobtrusive.
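The nonnegative-least-squares encoding mentioned in the abstract can be sketched in miniature: choose nonnegative per-frame magnitudes for one frequency bin so that their sliding-window sum traces a target curve. The window length, target curve, and matrix setup below are illustrative assumptions, not the paper's actual scheme (which also hides scale in the phase and hardens the loops against lossy compression):

```python
import numpy as np
from scipy.optimize import nnls

def window_matrix(n, win):
    """Matrix A with (A @ x)[i] = sum of x[i : i + win] (sliding-window sum)."""
    A = np.zeros((n, n + win - 1))
    for i in range(n):
        A[i, i:i + win] = 1.0
    return A

n, win = 64, 8
A = window_matrix(n, win)

# Toy target curve, built here as the windowed sum of a nonnegative envelope
# so that an exact nonnegative embedding is known to exist.
rng = np.random.default_rng(0)
target = A @ rng.random(n + win - 1)

# Encode: nonnegative magnitudes whose windowed sum matches the target.
# Decoding is then just the windowed sum itself.
magnitudes, residual = nnls(A, target)
assert residual < 1e-8
assert np.allclose(A @ magnitudes, target)
```

The same idea, applied per curve dimension to the magnitude of a dedicated frequency bin, is what makes decoding trivial: the receiver just sums a sliding window.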
LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation
Olga Vechtomova, Gaurav Sahu
Abstract: Electronic music artists and sound designers have unique workflow practices that necessitate specialized approaches for developing music information retrieval and creativity support tools. Furthermore, electronic music instruments, such as modular synthesizers, have near-infinite possibilities for sound creation and can be combined to create unique and complex audio paths. The process of discovering interesting sounds is often serendipitous and impossible to replicate. For this reason, many musicians in electronic genres record audio output at all times while they work in the studio. Subsequently, it is difficult for artists to rediscover audio segments that might be suitable for use in their compositions from thousands of hours of recordings. In this paper, we describe LyricJam Sonic, a creative tool for musicians to rediscover their previous recordings, re-contextualize them with other recordings, and create original live music compositions in real-time. A bi-modal AI-driven approach uses generated lyric lines to find compatible audio clips from the artist's past studio recordings, and uses them to generate new lyric lines, which in turn are used to find other clips, thus creating a continuous and evolving stream of music and lyrics. The intent is to keep the artists in a state of creative flow conducive to music creation rather than taking them into an analytical/critical state of deliberately searching for past audio segments. The system can run either in a fully autonomous mode without user input, or in a live performance mode, where the artist plays live music while the system "listens" and creates a continuous stream of music and lyrics in response (LyricJam Sonic: https://lyricjam.ai).
Searching for Human Bias Against AI-Composed Music
Dimiter Zlatkov, Jeff Ens, Philippe Pasquier
Abstract: With the popularization of musical AI in society comes the question of how it will be received by the public. We conducted an empirical study to investigate the hypothesis that human listeners hold a negative bias against computer-composed music. 163 participants were recruited from Amazon's MTurk to fill out a survey asking them to rank 5 computer-composed and 5 human-composed musical excerpts based on subjective musical preference. Participants were split into two groups, one informed of correct authorship, the other deceived. The hypothesis that those in the informed group would rank computer-composed excerpts lower than human-composed excerpts was not supported by significant results. We outline potential weaknesses in our design and present possible improvements for future work. A review of related studies on bias against AI-composed music and art is also included.
Short Talks

Frontmatter
Fabric Sketch Augmentation & Styling via Deep Learning & Image Synthesis
Omema Ahmed, Muhammad Salman Abid, Aiman Junaid, Syeda Saleha Raza
Abstract: This paper introduces a two-fold methodology for creating fabric designs and patterns, using both traditional object detection and Deep Learning methodologies. The proposed methodology first augments a given partial sketch, which is taken as input from the user. This sketch augmentation is performed through a combination of object detection, canvas quilting, and seamless tiling, to achieve a repeatable block of a pattern. This augmented pattern is then carried forward as input to our variation of the pix2pix GAN, which outputs a styled and colored pattern using the sketch as a baseline. This design pipeline is an overall overhaul of the creative process of a textile designer, and is intended to assist the design of modern textiles in the industry by reducing the time from sketch to finished pattern to under a minute.
Transposition of Simple Waveforms from Raw Audio with Deep Learning
Patrick J. Donnelly, Parker Carlson
Abstract: A system that is able to automatically transpose an audio recording would have many potential applications, from music production to hearing aid design. We present a deep learning approach to transpose an audio recording directly from the raw time-domain signal. We train recurrent neural networks with raw audio samples of simple waveforms (sine, square, triangle, sawtooth) covering the linear range of possible frequencies. We examine our generated transpositions for each musical semitone step size up to the octave and compare our results against two popular pitch-shifting algorithms. Although our approach is able to accurately transpose the frequencies in a signal, these signals suffer from a significant amount of added noise. This work represents exploratory steps towards the development of a general deep transposition model able to quickly transpose to any desired spectral mapping.
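A minimal generator for the four training waveforms the abstract names, plus the equal-temperament target frequency for a given semitone shift; the sample rate, duration, and frequencies here are illustrative, not the paper's settings:

```python
import numpy as np

def waveform(kind, freq, sr=16000, dur=0.5):
    """Generate one of the four simple waveforms at a given frequency."""
    t = np.arange(int(sr * dur)) / sr
    phase = (freq * t) % 1.0                 # normalized phase in [0, 1)
    if kind == "sine":
        return np.sin(2 * np.pi * freq * t)
    if kind == "square":
        return np.where(phase < 0.5, 1.0, -1.0)
    if kind == "triangle":
        return 4 * np.abs(phase - 0.5) - 1.0
    if kind == "sawtooth":
        return 2 * phase - 1.0
    raise ValueError(f"unknown waveform: {kind}")

def transposed_freq(freq, semitones):
    """Target frequency after an equal-tempered transposition."""
    return freq * 2 ** (semitones / 12)

# An octave (12 semitones) doubles the frequency.
assert np.isclose(transposed_freq(440.0, 12), 880.0)
source = waveform("sawtooth", 440.0)
target = waveform("sawtooth", transposed_freq(440.0, 7))  # up a fifth
assert source.shape == target.shape
```

Pairs like `(source, target)` over many frequencies and step sizes form the kind of supervised training data the abstract describes for its recurrent networks.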
AI-Aided Ceramic Sculptures: Bridging Deep Learning with Materiality
Varvara Guljajeva, Mar Canet Sola
Abstract: With the advent of neural networks as powerful tools for generating various forms of media, so-called ‘Deep Learning’ (DL) has entered the sphere of art production. The concept of creative artificial intelligence (AI) has become part of popular discourse around 2D digital image-making, but can AI exceed the limitations of 2D media and be applied creatively in more tactile 3D media such as sculpture? In this paper, we describe what happens when AI is applied in a real-life production line, from concept to physical object. The article presents a case study that explores DL's potential for creating a tactile sculpture guided only by a text prompt and a 3D model. In the production process, we mix several methods, including neural, digital, and traditional, to achieve the final results. In terms of methodology, this is an artistic study that explores existing DL tools for 3D object generation and subsequent manufacturing in 3D-printed ceramics. In the study, we use practice-based research methods to explore what happens when modern technology meets traditional ways of production, such as pottery. Further, we discuss reference art projects that have utilised AI, lessons learned, and the potential use of DL tools in art production. The aim of the paper is to explore new meanings and to open new avenues for investigation that emerge by bringing together creative AI with materiality.
OSC-Qasm: Interfacing Music Software with Quantum Computing
Omar Costa Hamido, Paulo Vitor Itaboraí
Abstract: OSC-Qasm is a cross-platform, Python-based, OSC interface for executing Qasm code. It serves as a simple way to connect creative programming environments like Max (with The QAC Toolkit) and Pure Data with real quantum hardware, using the Open Sound Control protocol. In this paper, the authors introduce the context and meaning of developing a tool like this, and what it can offer to creative artists.
EvoDesigner: Aiding the Exploration of Innovative Graphic Design Solutions
Daniel Lopes, João Correia, Penousal Machado
Abstract: Graphic Design (GD) artefacts, like posters on the streets or book covers on store shelves, often compete with each other to be seen, catch attention, and communicate effectively. Nevertheless, due to the democratisation of GD, and because finding disruptive aesthetics might be time-consuming, graphic designers often follow existing trends, lacking disruptive and catchy visual features. EvoDesigner aims to assist the exploration of distinctive GD aesthetics by employing a genetic algorithm to evolve content within two-dimensional pages. The system takes the form of an extension for Adobe InDesign, so both human designers and the machine can alternately collaborate in the creation process. In this paper, we propose a method to automatically evaluate the generated posters by assessing their dissimilarity to the output of an autoencoder trained on a set of posters posted at typographicposters.com by graphic designers worldwide. The results suggest that the evaluation method can recall large sets of images and can therefore be used to compute a degree of image dissimilarity. Furthermore, the proposed method could be used to evolve GD posters that can be deemed new when compared to the training set.
Improving Automatic Music Genre Classification Systems by Using Descriptive Statistical Features of Audio Signals
Ravindu Perera, Manjusri Wickramasinghe, Lakshman Jayaratne
Abstract: Automatic music genre classification systems are vital nowadays because the traditional music genre classification process is mostly carried out without following a universal taxonomy, and the traditional process for audio indexing is prone to error. Various techniques for implementing an automatic music genre classification system can be found in the literature, but the accuracy and efficiency of those systems are insufficient to make them useful for practical scenarios, such as identifying songs by music genre in radio broadcast monitoring systems. The main contribution of this research is to increase the accuracy and efficiency of current automatic music genre classification systems through a comprehensive analysis of the correlations between the descriptive statistical features of audio signals and the music genres of songs. A greedy approach to music genre identification is also introduced, to improve the accuracy and efficiency of music genre classification systems and to identify the music genre of complex songs that contain multiple music genres. The approach proposed in this paper reported 87.3% average accuracy for music genre classification on the GTZAN dataset over 10 music genres.
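Descriptive statistical features of the kind the abstract refers to can be as simple as summary statistics over the waveform and its short-time energy; the concrete feature set below is a generic illustration, not the paper's:

```python
import numpy as np

def descriptive_features(signal, frame=1024):
    """Summary statistics of a mono audio signal (illustrative feature set)."""
    usable = signal[: len(signal) // frame * frame]
    frames = usable.reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)            # short-time energy per frame
    crossings = np.abs(np.diff(np.sign(signal))) > 0
    return {
        "mean": float(signal.mean()),
        "std": float(signal.std()),
        "zero_crossing_rate": float(crossings.mean()),
        "energy_mean": float(energy.mean()),
        "energy_std": float(energy.std()),
    }

sr = 16000
t = np.arange(sr) / sr
feats = descriptive_features(np.sin(2 * np.pi * 440 * t))
assert abs(feats["mean"]) < 1e-3               # a sine wave is zero-mean
assert abs(feats["energy_mean"] - 0.5) < 1e-2  # mean power of a unit sine
```

Per-track feature vectors like this can then be fed to any shallow classifier for genre prediction.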
Musical Genre Recognition Based on Deep Descriptors of Harmony, Instrumentation, and Segments
Igor Vatolkin, Mark Gotham, Néstor Nápoles López, Fabian Ostermann
Abstract: Deep learning has recently established itself as the cluster of methods of choice for almost all classification tasks in music information retrieval. However, despite very good classification performance, it sometimes brings disadvantages, including long training times and higher energy costs, lower interpretability of classification models, and an increased risk of overfitting when applied to small training sets due to a very large number of trainable parameters. In this paper, we investigate the combination of deep and shallow algorithms for the recognition of musical genres using a transfer learning approach. We train deep classification models once to predict harmonic, instrumental, and segment properties from datasets with the respective annotations. Their predictions for another dataset with annotated genres are used as features for shallow classification methods. These can be trained over and again for different categories, and are particularly useful when training sets are small, as in a real-world scenario where listeners define various musical categories by selecting only a few prototype tracks. The experiments show the potential of the proposed approach for genre recognition. In particular, when combined with evolutionary feature selection, which identifies the most relevant deep feature dimensions, the classification errors became significantly lower in almost all cases compared to a baseline based on MFCCs or results reported in previous work.
Backmatter
- Title
- Artificial Intelligence in Music, Sound, Art and Design
- Edited by
- Colin Johnson
- Nereida Rodríguez-Fernández
- Sérgio M. Rebelo
- Copyright year
- 2023
- Publisher
- Springer Nature Switzerland
- Electronic ISBN
- 978-3-031-29956-8
- Print ISBN
- 978-3-031-29955-1
- DOI
- https://doi.org/10.1007/978-3-031-29956-8