1 Introduction
2 Experimental setup
2.1 Music database
2.2 Parametrization
No. of parameters | ID | Audio Feature Description | Comment |
---|---|---|---|
1 | TC | Temporal Centroid | |
2 | SC, SC_V | Spectral Centroid – average and its variance | |
34 | ASE 1-34 | Audio Spectrum Envelope (ASE) – average values in 34 frequency bands | 29 subbands as audio files are in .mp3 format |
1 | ASE_M | Mean ASE (for all frequency bands) | |
34 | ASEV 1-34 | ASE variance in 34 frequency bands | as above |
1 | ASE_MV | Mean ASE variance (for all frequency bands) | |
2 | ASC, ASC_V | Audio Spectrum Centroid (ASC) – average and its variance | |
2 | ASS, ASS_V | Audio Spectrum Spread (ASS) – average and its variance | |
24 | SFM 1-24 | Spectral Flatness Measure (SFM) – average values for 24 frequency bands | 20 subbands |
1 | SFM_M | Mean SFM (for all frequency bands) | |
24 | SFMV 1-24 | SFM variance (for 24 frequency bands) | 20 subbands |
1 | SFM_MV | Mean SFM variance (for all frequency bands) | |
20 | MFCC 1-20 | Mel Frequency Cepstral Coefficients (MFCC) – first 20 (mean values) | |
20 | MFCCV 1-20 | MFCC Variance – first 20 | |
3 | THR_[1,2,3]RMS_TOT | Number of samples higher than a single/double/triple RMS value | Dedicated parameters (24) in time domain based on the analysis of the distribution of the signal envelope in relation to the RMS value |
6 | THR_[1,2,3]RMS_10 FR_[MEAN,VAR] | Mean/variance of THR_[1,2,3]RMS_TOT for 10 time frames | |
1 | PEAK_RMS_TOT | Ratio of peak to RMS (Root Mean Square) | |
2 | PEAK_RMS10 FR_[MEAN,VAR] | Mean/variance of PEAK_RMS_TOT for 10 time frames | |
1 | ZCD | Number of transitions through the zero level | |
2 | ZCD_10 FR_[MEAN,VAR] | Mean/variance of ZCD for 10 time frames | |
3 | [1,2,3]RMS_TCD | Number of transitions through the single/double/triple RMS level | |
6 | [1,2,3]RMS_TCD_10 FR_[MEAN,VAR] | Mean/variance of [1,2,3]RMS_TCD for 10 time frames | |
TOTAL number of parameters | 173 |
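
The total of 173 can be checked against the per-feature counts in the table. The sketch below is only an arithmetic check; it assumes that the comment column means ASE/ASEV are reduced to 29 subbands and SFM/SFMV to 20 subbands for .mp3 input. The dictionary keys are shortened labels chosen for this illustration, not identifiers from the original feature extractor.

```python
# Nominal parameter counts per feature group, as listed in the table.
# Keys are shortened labels used only for this illustration.
nominal_counts = {
    "TC": 1, "SC": 2,
    "ASE": 34, "ASE_M": 1, "ASEV": 34, "ASE_MV": 1,
    "ASC": 2, "ASS": 2,
    "SFM": 24, "SFM_M": 1, "SFMV": 24, "SFM_MV": 1,
    "MFCC": 20, "MFCCV": 20,
    "THR_RMS_TOT": 3, "THR_RMS_10FR": 6,
    "PEAK_RMS_TOT": 1, "PEAK_RMS_10FR": 2,
    "ZCD": 1, "ZCD_10FR": 2,
    "RMS_TCD": 3, "RMS_TCD_10FR": 6,
}

# Assumption: the table comments give the number of subbands actually
# available for .mp3 files (29 for ASE/ASEV, 20 for SFM/SFMV).
mp3_overrides = {"ASE": 29, "ASEV": 29, "SFM": 20, "SFMV": 20}
effective_counts = {**nominal_counts, **mp3_overrides}

# The dedicated time-domain parameters are the last eight groups.
dedicated_groups = [
    "THR_RMS_TOT", "THR_RMS_10FR", "PEAK_RMS_TOT", "PEAK_RMS_10FR",
    "ZCD", "ZCD_10FR", "RMS_TCD", "RMS_TCD_10FR",
]

nominal_total = sum(nominal_counts.values())
effective_total = sum(effective_counts.values())
dedicated_total = sum(nominal_counts[g] for g in dedicated_groups)

print(nominal_total, effective_total, dedicated_total)
```

Under that assumption the nominal counts sum to 191, the reduced counts sum to the 173 given in the table, and the dedicated time-domain group accounts for 24 parameters.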
2.3 Music track separation
2.3.1 OpenBliSSART
2.3.2 Feature vectors built on separated music tracks
2.3.3 Normalization methods
2.4 Experimental setup
3 Classification process
3.1 Co-training method
3.2 Effectiveness measures
3.3 Feature vector optimization
4 Experiments
4.1 Reducing feature vector
4.1.1 Attribute subset selection
4.1.2 Adding parameters extracted from separated tracks
4.1.3 Results and discussion
O | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock | Sum | TPR [%] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alternative Rock | 23.67 | 1.33 | 0.00 | 2.33 | 1.67 | 2.33 | 0.33 | 0.67 | 1.67 | 11.33 | 1.67 | 0.33 | 21.67 | 69.00 | 34.30 |
Blues | 3.67 | 35.33 | 0.00 | 14.00 | 0.00 | 0.00 | 7.33 | 3.67 | 1.00 | 6.67 | 0.67 | 7.33 | 8.33 | 88.00 | 40.15 |
Classical | 1.33 | 0.33 | 289.00 | 0.67 | 0.00 | 0.00 | 4.00 | 0.33 | 18.00 | 2.67 | 0.00 | 0.00 | 1.33 | 317.67 | 90.98 |
Country | 3.33 | 8.00 | 0.33 | 269.33 | 0.00 | 0.67 | 7.67 | 6.67 | 1.67 | 20.67 | 1.00 | 6.67 | 20.33 | 346.33 | 77.76 |
DanceDJ | 4.33 | 0.33 | 0.00 | 0.00 | 63.33 | 1.00 | 0.00 | 0.67 | 1.00 | 3.67 | 9.33 | 2.00 | 0.67 | 86.33 | 73.36 |
Hard Rock & Metal | 3.67 | 0.33 | 0.00 | 1.33 | 1.33 | 176.67 | 0.00 | 0.00 | 0.67 | 1.33 | 0.00 | 0.00 | 15.33 | 200.67 | 88.04 |
Jazz | 1.00 | 2.67 | 5.00 | 8.00 | 0.33 | 0.00 | 133.00 | 2.00 | 15.00 | 14.33 | 0.00 | 6.33 | 1.67 | 189.33 | 70.25 |
Latin Music | 0.33 | 3.67 | 0.00 | 13.67 | 0.33 | 0.00 | 4.33 | 103.00 | 0.00 | 13.33 | 5.33 | 3.33 | 0.67 | 148.00 | 69.60 |
New Age | 1.33 | 0.67 | 24.00 | 0.33 | 2.67 | 2.67 | 10.67 | 0.00 | 140.00 | 2.00 | 0.33 | 2.67 | 1.67 | 189.00 | 74.07 |
Pop | 6.67 | 10.33 | 8.00 | 34.00 | 4.33 | 5.33 | 16.00 | 11.33 | 13.00 | 101.00 | 9.33 | 16.00 | 30.00 | 265.33 | 38.07 |
Rap & Hip Hop | 1.67 | 0.33 | 0.00 | 2.00 | 10.33 | 0.33 | 0.00 | 5.33 | 1.00 | 5.00 | 303.67 | 7.33 | 0.33 | 337.33 | 90.02 |
R&B | 0.33 | 6.00 | 0.33 | 6.00 | 2.67 | 0.00 | 12.33 | 5.33 | 2.67 | 24.00 | 11.00 | 128.00 | 4.67 | 203.33 | 62.95 |
Rock | 14.33 | 6.33 | 1.00 | 18.33 | 1.67 | 22.00 | 2.33 | 1.00 | 9.33 | 28.67 | 0.67 | 3.33 | 198.67 | 307.67 | 64.57 |
Precision [%] | 36.05 | 46.70 | 88.20 | 72.79 | 71.43 | 83.73 | 67.18 | 73.57 | 68.29 | 43.04 | 88.53 | 69.82 | 65.07 | | |
F1 [%] | 35.16 | 43.18 | 89.57 | 75.20 | 72.38 | 85.83 | 68.68 | 71.53 | 71.06 | 40.40 | 89.27 | 66.21 | 64.82 | | |
Accuracy [%] | 96.92 | 96.73 | 97.61 | 93.93 | 98.27 | 97.92 | 95.77 | 97.10 | 96.02 | 90.22 | 97.41 | 95.46 | 92.72 | | |
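
The TPR, Precision, F1 and Accuracy rows of such a table can be derived from the confusion matrix itself. The following is a minimal sketch using the standard one-vs-rest definitions, with rows taken as true classes and columns as predicted classes. The function name `per_class_metrics` and the toy matrix are hypothetical, and how the reported values were averaged across runs is not stated in this section, so small differences from the published figures are possible.

```python
def per_class_metrics(cm):
    """One-vs-rest metrics (in percent) for each class of a square
    confusion matrix, where cm[i][j] is the number of items of true
    class i that were assigned to class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of row k
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of column k
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + tpr
        f1 = 2 * precision * tpr / denom if denom else 0.0
        accuracy = (tp + tn) / total
        results.append({
            "TPR": 100 * tpr,
            "Precision": 100 * precision,
            "F1": 100 * f1,
            "Accuracy": 100 * accuracy,
        })
    return results


# Toy 3-class matrix (illustrative values only, not from the paper).
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
metrics = per_class_metrics(toy)
for k, m in enumerate(metrics):
    print(k, {name: round(value, 2) for name, value in m.items()})
```

As a check against the table, the Alternative Rock row gives TPR = 23.67 / 69.00 ≈ 34.30%, and dividing 23.67 by the Alternative Rock column sum (65.66) gives the reported Precision of 36.05%.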
OH | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock | Sum | TPR [%] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alternative Rock | 28 | 1.33 | 0 | 2 | 2 | 2 | 0.33 | 0.33 | 0.67 | 10 | 2 | 0.33 | 20 | 68.99 | 40.59 |
Blues | 2.33 | 39.33 | 0 | 11.67 | 0 | 0.33 | 6.67 | 2.67 | 0.67 | 8.67 | 1 | 6.67 | 8 | 88.01 | 44.69 |
Classical | 0.67 | 0.67 | 291.33 | 0.67 | 0 | 0 | 4.67 | 0 | 13.33 | 5 | 0 | 0.33 | 1 | 317.67 | 91.71 |
Country | 3 | 11.67 | 0.33 | 271.67 | 0.33 | 1 | 5 | 5 | 0 | 23.67 | 1.67 | 5.67 | 17.33 | 346.34 | 78.44 |
DanceDJ | 4 | 0.67 | 0 | 0.33 | 66 | 1 | 0 | 0 | 0.33 | 3.67 | 8.33 | 1.33 | 0.67 | 86.33 | 76.45 |
Hard Rock & Metal | 5 | 0.33 | 0 | 1.67 | 0.67 | 170.67 | 0 | 0 | 0.33 | 2 | 1.33 | 0 | 18.67 | 200.67 | 85.05 |
Jazz | 0.33 | 6.33 | 7 | 4.67 | 0.67 | 0 | 139.33 | 3.33 | 7 | 11.67 | 0 | 7 | 2 | 189.33 | 73.59 |
Latin Music | 0.33 | 4 | 0 | 11.67 | 0.67 | 0 | 3 | 107 | 0 | 13 | 3.67 | 3.67 | 1 | 148.01 | 72.29 |
New Age | 1 | 0.67 | 18 | 1 | 4 | 2.67 | 4.67 | 0 | 150.33 | 2.67 | 0 | 1 | 3 | 189.01 | 79.54 |
Pop | 9.67 | 9 | 8.33 | 32.67 | 3 | 5.33 | 17.33 | 10.33 | 7 | 109.33 | 7.33 | 15 | 31 | 265.32 | 41.21 |
Rap & Hip Hop | 1.67 | 0.67 | 0 | 1.33 | 11 | 0.67 | 0.33 | 7 | 0 | 9 | 293.33 | 11.67 | 0.67 | 337.34 | 86.95 |
R&B | 1 | 10 | 0 | 7 | 1.67 | 0 | 10.33 | 6 | 1.67 | 22.67 | 12.33 | 126 | 4.67 | 203.34 | 61.97 |
Rock | 20 | 9 | 1 | 17.67 | 1.67 | 20 | 3.33 | 1 | 4.67 | 33 | 0.67 | 3.33 | 192.33 | 307.67 | 62.51 |
Precision [%] | 36.36 | 41.99 | 89.37 | 74.63 | 71.99 | 83.80 | 71.45 | 75.00 | 80.82 | 42.98 | 88.44 | 69.23 | 64.04 | | |
F1 [%] | 38.36 | 43.30 | 90.52 | 76.49 | 74.15 | 84.42 | 72.51 | 73.62 | 80.17 | 42.08 | 87.69 | 65.40 | 63.27 | | |
Accuracy [%] | 96.83 | 96.39 | 97.83 | 94.27 | 98.35 | 97.76 | 96.30 | 97.29 | 97.37 | 90.13 | 97.09 | 95.37 | 92.48 | | |
OD p_90 | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal |
---|---|---|---|---|---|---|
Precision [%] | 35.65 | 42.37 | 88.75 | 73.13 | 74.35 | 84.86 |
Recall [%] | 39.61 | 42.05 | 91.08 | 77.29 | 77.21 | 86.55 |
F1 [%] | 37.53 | 42.21 | 89.90 | 75.15 | 75.75 | 85.70 |
Accuracy [%] | 96.79 | 96.44 | 97.69 | 93.95 | 98.47 | 97.93 |

OD p_90 | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock |
---|---|---|---|---|---|---|---|
Precision [%] | 69.95 | 73.30 | 75.31 | 42.31 | 90.80 | 68.54 | 64.44 |
Recall [%] | 70.07 | 70.50 | 75.85 | 40.07 | 88.74 | 63.94 | 63.81 |
F1 [%] | 70.01 | 71.87 | 75.58 | 41.16 | 89.76 | 66.16 | 64.12 |
Accuracy [%] | 96.03 | 97.11 | 96.74 | 90.04 | 97.57 | 95.38 | 92.60 |
OP p_61 | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal |
---|---|---|---|---|---|---|
Precision [%] | 37.36 | 49.36 | 88.89 | 72.92 | 71.74 | 84.87 |
Recall [%] | 32.85 | 43.56 | 91.50 | 77.48 | 76.45 | 88.54 |
F1 [%] | 34.96 | 46.28 | 90.18 | 75.13 | 74.02 | 86.67 |
Accuracy [%] | 97.02 | 96.86 | 97.75 | 93.93 | 98.34 | 98.05 |

OP p_61 | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock |
---|---|---|---|---|---|---|---|
Precision [%] | 68.07 | 74.88 | 72.73 | 41.86 | 88.99 | 69.14 | 63.86 |
Recall [%] | 71.30 | 69.14 | 77.60 | 37.81 | 89.43 | 64.26 | 64.14 |
F1 [%] | 69.65 | 71.90 | 75.09 | 39.74 | 89.21 | 66.61 | 64.00 |
Accuracy [%] | 95.89 | 97.17 | 96.58 | 90.03 | 97.41 | 95.45 | 92.53 |
OT p_60 | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal |
---|---|---|---|---|---|---|
Precision [%] | 36.02 | 48.18 | 88.35 | 72.47 | 71.48 | 84.36 |
Recall [%] | 32.37 | 40.15 | 90.66 | 78.06 | 76.44 | 86.88 |
F1 [%] | 34.10 | 43.80 | 89.49 | 75.16 | 73.88 | 85.60 |
Accuracy [%] | 96.95 | 96.81 | 97.60 | 93.90 | 98.33 | 97.91 |

OT p_60 | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock |
---|---|---|---|---|---|---|---|
Precision [%] | 67.45 | 74.39 | 69.40 | 43.00 | 88.75 | 69.54 | 64.54 |
Recall [%] | 70.42 | 69.37 | 74.43 | 37.81 | 89.62 | 64.75 | 65.65 |
F1 [%] | 68.91 | 71.79 | 71.83 | 40.24 | 89.18 | 67.06 | 65.09 |
Accuracy [%] | 95.80 | 97.15 | 96.14 | 90.22 | 97.40 | 95.50 | 92.69 |
OS p_60 | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal |
---|---|---|---|---|---|---|
Precision [%] | 36.90 | 45.42 | 87.86 | 72.20 | 70.50 | 85.28 |
Recall [%] | 33.33 | 41.29 | 91.08 | 77.77 | 75.67 | 88.54 |
F1 [%] | 35.03 | 43.26 | 89.44 | 74.88 | 72.99 | 86.88 |
Accuracy [%] | 96.99 | 96.65 | 97.57 | 93.83 | 98.27 | 98.08 |

OS p_60 | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock |
---|---|---|---|---|---|---|---|
Precision [%] | 67.92 | 73.85 | 71.66 | 42.67 | 89.11 | 70.02 | 64.49 |
Recall [%] | 70.07 | 69.37 | 75.84 | 37.31 | 89.72 | 63.93 | 65.11 |
F1 [%] | 68.98 | 71.54 | 73.69 | 39.81 | 89.41 | 66.84 | 64.80 |
Accuracy [%] | 95.84 | 97.11 | 96.41 | 90.18 | 97.46 | 95.52 | 92.66 |
2.201 | Alternative Rock | Blues | Classical | Country | DanceDJ | Hard Rock & Metal |
---|---|---|---|---|---|---|
O–OH p_59 | 8.791 | 3.392 | 3.415 | 1.709 | 4.101 | 3.808 |
O–OD p_90 | 7.575 | 3.785 | 3.501 | 1.732 | 4.071 | 3.72 |
O–OP p_61 | 7.477 | 3.782 | 3.456 | 1.748 | 4.053 | 3.764 |
O–OT p_60 | 6.088 | 4.082 | 3.564 | 1.769 | 4.096 | 3.825 |
O–OS p_60 | 6.735 | 4.01 | 3.538 | 1.774 | 4.187 | 3.784 |

2.201 | Jazz | Latin Music | New Age | Pop | Rap & Hip Hop | R&B | Rock |
---|---|---|---|---|---|---|---|
O–OH p_59 | 1.988 | 3.531 | 1.96 | 1.848 | 3.264 | 3.631 | 2.023 |
O–OD p_90 | 2.081 | 3.689 | 2.111 | 1.901 | 3.197 | 3.705 | 2.046 |
O–OP p_61 | 2.062 | 3.731 | 2.122 | 2.305 | 3.235 | 3.612 | 2.062 |
O–OT p_60 | 2.065 | 3.667 | 2.217 | 1.964 | 3.193 | 3.798 | 2.015 |
O–OS p_60 | 2.078 | 3.606 | 2.166 | 2.055 | 3.204 | 3.754 | 2.014 |
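
To read the table above quickly, the listed statistics can be compared with the value 2.201 shown in the header cell. The sketch below does this for the O–OH row only. It assumes that 2.201 is the critical value against which the statistics are meant to be compared; the test design is not described in this section, so the interpretation of "above" and "below" is left open. The variable names are chosen for this illustration.

```python
# Assumption: the header value 2.201 is the critical value for comparison.
CRITICAL_VALUE = 2.201

# Statistics from the O–OH p_59 row of the table.
o_oh = {
    "Alternative Rock": 8.791, "Blues": 3.392, "Classical": 3.415,
    "Country": 1.709, "DanceDJ": 4.101, "Hard Rock & Metal": 3.808,
    "Jazz": 1.988, "Latin Music": 3.531, "New Age": 1.96,
    "Pop": 1.848, "Rap & Hip Hop": 3.264, "R&B": 3.631, "Rock": 2.023,
}

# Split the genres by whether the statistic exceeds the critical value.
above = sorted(g for g, t in o_oh.items() if abs(t) > CRITICAL_VALUE)
below = sorted(g for g, t in o_oh.items() if abs(t) <= CRITICAL_VALUE)

print("above:", above)
print("below:", below)
```

For the O–OH row, eight genres exceed 2.201, while Country, Jazz, New Age, Pop and Rock stay below it.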
- For most of the signal mixtures, the effectiveness measures improved in comparison with the original signal.
- For each of the genres in which the harmonic part plays an important role (Classical, Latin Music, New Age and Pop), TPR improved for the OH signal. For three of these four genres (Classical, Latin Music and New Age, but not Pop), Precision improved as well; in particular, Precision for New Age increased by 12.53 percentage points. Jazz deserves special attention: its Precision rose by 4.27 percentage points and its TPR by 3.34 percentage points. For DanceDJ, TPR rose by about 3.1 percentage points. The improvement for Jazz is especially notable in the context of the lower rate of misclassification between Pop and Jazz. For Rock and Hard Rock & Metal, correctness decreased for OH, which confirms that the harmonic part does not play an important role in those genres. Surprisingly, the TPR of Alternative Rock improved by over 6 percentage points. Blues is also interesting: its TPR improved by over 4.5 percentage points, while its Precision decreased by almost 5 percentage points.
- For the OD signal, Recall (TPR) improved for Alternative Rock. Surprisingly, Precision for New Age (by over 7 percentage points) and Recall (TPR) for DanceDJ (by almost 4 percentage points) were also higher. A slight improvement in classification was also observed for Latin Music, New Age, Pop and Classical. This indicates that the absence of the drum element (the percussion signal was present in only 89.6% of the items in the input audio dataset) is itself a piece of information that is significant for the classifier during training.
- For the OP signal, the improvement was less visible for the genres in which the piano plays an important role. Precision improved for several genres (e.g. Classical, DanceDJ, Jazz and Latin Music, but also Blues and Alternative Rock, among others), along with Recall (TPR) values. For DanceDJ, an improvement of over 3 percentage points in Recall (TPR) was obtained.
- A slight improvement in Precision can be observed for OT (Hard Rock & Metal, Latin Music and New Age) and for OS (New Age, Rap & Hip Hop and R&B). The same holds for Recall (TPR) values (e.g. DanceDJ, R&B).
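
The percentage-point changes quoted for the OH signal can be recomputed directly from the O and OH tables. The sketch below does so for the genres singled out in the discussion; the values are copied from the TPR and Precision entries of the two tables, and the variable names are chosen for this illustration.

```python
# (TPR, Precision) in percent, copied from the O and OH tables.
original = {
    "Alternative Rock": (34.30, 36.05),
    "Blues": (40.15, 46.70),
    "DanceDJ": (73.36, 71.43),
    "Jazz": (70.25, 67.18),
    "New Age": (74.07, 68.29),
}
with_harmonic = {
    "Alternative Rock": (40.59, 36.36),
    "Blues": (44.69, 41.99),
    "DanceDJ": (76.45, 71.99),
    "Jazz": (73.59, 71.45),
    "New Age": (79.54, 80.82),
}

# Change in percentage points (OH minus O), rounded to two decimals.
deltas = {}
for genre, (tpr_o, prec_o) in original.items():
    tpr_oh, prec_oh = with_harmonic[genre]
    deltas[genre] = {
        "TPR": round(tpr_oh - tpr_o, 2),
        "Precision": round(prec_oh - prec_o, 2),
    }

for genre, change in deltas.items():
    print(f"{genre}: TPR {change['TPR']:+.2f} pp, "
          f"Precision {change['Precision']:+.2f} pp")
```

This reproduces the figures in the text: New Age gains 12.53 percentage points in Precision, Jazz gains 4.27 in Precision and 3.34 in TPR, DanceDJ gains 3.09 in TPR, Alternative Rock gains 6.29 in TPR, and Blues gains 4.54 in TPR while losing 4.71 in Precision.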