

4 OVERVIEW OF AUDIO SIGNAL PARAMETRIZATION

4.1 MUSIC MOOD RECOGNITION PARAMETRIZATION


Figure 4.1  Three layers of music interpretation and description

4.1.1 Music Features and Parameters Related to Mood of Music

In MIR, and especially in MER, studies are performed to determine the relationship between music features and their impact on the listener. Music Emotion Recognition is the area where these relationships are crucial and underlie the whole concept of mood recognition. In the subsequent section, relationships investigated by different researchers are cited and compared. On the other hand, it can be observed that composers commonly use these rules; a skilled and conscious composer uses particular elements of music to achieve the desired impact on the listener.

Several works related to Music Mood Recognition refer to different groups and combinations of music features and specific parameters. However, it should be remembered that these descriptors have their roots in research performed earlier within general Music Information Retrieval [50,232].

Eronen [79] analyzed several features with regard to recognition performance in a musical instrument recognition system. He took into consideration a wide set of features covering both spectral and temporal properties of sound.

A number of studies considered the relationship between music features and valence and arousal [91,92,177,327]. Valence and arousal are terms taken from Thayer's model of emotion, which is described in detail in Section 2.5. A summarized set of music features important in the prediction of valence and arousal is listed in Tab. 4.1 [43]. Definitions and descriptions of music features are included in Section 2.2.

Table 4.1  Features in the prediction of valence and arousal [43]

Valence                                                  Arousal
Chroma                                                   Slow tempo
Percussiveness variability across bands                  Loudness
Measure related to the ratio of fast and slow tempos     Chroma eccentricity
Modulation spectrum                                      Fast tempo
Harmonic strangeness                                     Spectral tilt
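
Several of the descriptors listed in Tab. 4.1 can be approximated with standard audio analysis tools. The sketch below is only a minimal illustration, assuming the librosa library and a hypothetical input file song.wav; it extracts chroma, a tempo estimate and an RMS-based loudness proxy, i.e. coarse counterparts of the valence- and arousal-related features above, and is not the implementation used in [43].

    import librosa
    import numpy as np

    # Assumption: song.wav is any locally available audio file.
    y, sr = librosa.load("song.wav", sr=22050, mono=True)

    # Chroma: 12-bin pitch-class energy distribution (valence-related in Tab. 4.1).
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)        # shape (12, n_frames)

    # Tempo estimate from the onset envelope (fast/slow tempo, arousal-related).
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, _ = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)

    # RMS energy as a simple loudness proxy (arousal-related).
    rms = librosa.feature.rms(y=y)

    print("mean chroma vector:", np.round(chroma.mean(axis=1), 3))
    print("estimated tempo [BPM]:", tempo)
    print("mean RMS level:", float(rms.mean()))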

Hevner summarized her findings related to the music features that create the emotional content of music [108]. They are schematically shown in Tab. 4.2, according to the eight clusters of adjectives included in Hevner's model of emotions (Fig. 2.20) [108].

The next step of the analysis is to determine the relationship between music features and particular parameters. Brinker [43] tested 79 features and proposed a schematic alignment, which is presented in Tab. 4.3.


Table 4.2  Musical characteristics related to emotion groups with weights proposed by Hevner [108]

No.   Adjectives                                                          Music characteristic   Weight
1     spiritual, lofty, awe-inspiring,
2     pathetic, doleful, sad, mournful, tragic, melancholy, frustrated

Table 4.3  Parameters related to musical features proposed by Brinker [43]

Feature class     Descriptor
Spectral          MFCC and modulations
Tonality          Chroma, key consonants, dissonants, harmonic strangeness, chroma eccentricity
Rhythm            Tempos (fast-slow), onsets, inter-onset intervals
Percussiveness    Characterization and classification of onsets per band

This set allowed Brinker to achieve valence and arousal prediction with a variance of 0.68 for Arousal and 0.50 for Valence [43].

As mentioned before, within the area of Music Emotion Recognition authors use different sets of parameters and algorithms. Panda [234] employed Marsyas and the MIR Toolbox for audio feature extraction. He fed the parameters into an SVM classification and regression system, reducing the number of features using forward feature selection (FFS).
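
A minimal sketch of such a pipeline is given below, assuming scikit-learn and a hypothetical, randomly generated feature matrix X with mood labels y; it combines forward feature selection with an SVM classifier and is analogous in spirit to, not a reproduction of, the FFS + SVM setup reported by Panda.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Hypothetical data: 200 excerpts, 30 audio features, 4 mood classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    y = rng.integers(0, 4, size=200)

    svm = SVC(kernel="rbf", C=1.0)

    # Forward feature selection: greedily add features that improve CV accuracy.
    selector = SequentialFeatureSelector(svm, n_features_to_select=8,
                                         direction="forward", cv=5)

    model = make_pipeline(StandardScaler(), selector, svm)
    scores = cross_val_score(model, X, y, cv=5)
    print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))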

Rauber and his collaborators [256] executed a 2-stage feature extraction based on psycho-acoustic models and used the resulting features in a SOM model (Fig. 4.2). They based their parameters on the basics of auditory perception, that is, loudness sensation and rhythm patterns per frequency band.

 

Figure 4.2  2-stage feature extraction proposed by Rauber [256]
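
A rough illustration of the second stage, mapping per-band rhythm descriptors onto a self-organizing map, is sketched below; it assumes the minisom package and a hypothetical matrix rhythm_patterns of flattened rhythm-pattern vectors, and stands in for, rather than reproduces, the psycho-acoustic front end of [256].

    import numpy as np
    from minisom import MiniSom  # assumption: the minisom package is installed

    # Hypothetical input: one flattened rhythm-pattern vector per music excerpt,
    # e.g. 24 critical bands x 60 modulation bins = 1440 values per excerpt.
    rng = np.random.default_rng(0)
    rhythm_patterns = rng.random((150, 1440))

    # 10x10 SOM trained on the rhythm patterns.
    som = MiniSom(10, 10, rhythm_patterns.shape[1], sigma=1.0,
                  learning_rate=0.5, random_seed=0)
    som.random_weights_init(rhythm_patterns)
    som.train_random(rhythm_patterns, num_iteration=5000)

    # Each excerpt is assigned to its best-matching unit on the map.
    positions = [som.winner(v) for v in rhythm_patterns]
    print(positions[:5])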

Baume [25] described 47 different types of audio features and evaluated them for the purpose of music mood recognition. He tested these sets with different types of regressors (Fig. 4.3) as well as different subsets for each SVR regressor (Tab. 4.4). Baume used different subset evaluation techniques that can be divided into three categories. He followed Liu's [184] categorization of feature selection techniques: the filter model, the wrapper model and the hybrid model [25]. The filter model relies on general characteristics of the data to evaluate feature subsets, whereas the wrapper model uses the performance of a predetermined algorithm (such as a support vector machine) as the evaluation criterion. The wrapper model gives superior performance, as it finds the features best suited to the chosen algorithm, but it is more computationally expensive and specific to that algorithm. The hybrid model attempts to combine the advantages of both. Results of his work are presented in Fig. 4.3 and Tab. 4.4.
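
The difference between the filter and wrapper models can be illustrated with a short sketch, given below under the assumption of scikit-learn and hypothetical data (a feature matrix X and continuous mood ratings y); the filter branch ranks features by a data statistic alone, while the wrapper branch scores subsets with the SVR itself.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.feature_selection import (SelectKBest, f_regression,
                                           SequentialFeatureSelector)

    # Hypothetical data: 200 excerpts, 30 audio features, continuous ratings.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 30))
    y = X[:, 0] * 0.8 + rng.normal(scale=0.5, size=200)

    # Filter model: rank features by a statistic computed from the data alone.
    filter_sel = SelectKBest(score_func=f_regression, k=6).fit(X, y)
    print("filter picks:", np.flatnonzero(filter_sel.get_support()))

    # Wrapper model: evaluate subsets with the target regressor itself (slower).
    wrapper_sel = SequentialFeatureSelector(SVR(kernel="rbf"),
                                            n_features_to_select=6,
                                            direction="forward", cv=3).fit(X, y)
    print("wrapper picks:", np.flatnonzero(wrapper_sel.get_support()))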

For the purpose of MIR, including genre classification and MER, Li [175] used MFCC, STFT, DWCH and lyrics-based feature sets. At the same time, Skowronek and her collaborators [287,288] employed mostly rhythm-based, key and chroma features in their experiments. Schmidt [273-275] tested several subgroups of features (i.e. MFCC, Chroma, and Statistical Spectrum Descriptors) for emotion recognition and time-varying emotion regression. Schmidt analyzed individual sets of features and determined the accuracy of 4-category classification for each of them (Tab. 4.5).

 

Figure 4.3  The absolute error of the best performing combinations for each of the five regressors

Table 4.4  Best feature combinations for each regressor [25]

Table 4.5  Results of 4-way mood classification for several groups of parameters [275]

MFCC & Spectral Contrast    50.18 ± 4.18%

As a result, Schmidt found different features appropriate for particular analyses [273-275]. Even within his research, the length and content of the feature vector varied.
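
Figures such as the 50.18±4.18% entry in Tab. 4.5 are conventionally obtained as the mean and standard deviation of accuracy over cross-validation folds; the snippet below illustrates this convention with a hypothetical 4-class setup and random data, not Schmidt's actual feature sets.

    import numpy as np
    from sklearn.model_selection import cross_val_score, StratifiedKFold
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = rng.normal(size=(400, 20))     # hypothetical MFCC + spectral contrast features
    y = rng.integers(0, 4, size=400)   # four mood categories

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(SVC(), X, y, cv=cv)

    # Report accuracy in the "mean +/- std" form used in Tab. 4.5.
    print("4-way accuracy: %.2f%% +/- %.2f%%"
          % (100 * scores.mean(), 100 * scores.std()))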

Panda and collaborators [235] recently proposed a unique feature set consisting of standard and melodic features extracted directly from audio. Their results show that melodic features perform better than standard audio features. They achieved a result of 64% F-measure with only 11 features (9 melodic and 2 standard).

In many studies, different layers of music parametrization are mixed. Parameters are commonly included in music feature sets and vice versa. For example, Brinker and collaborators [43] put chroma and the modulation spectrum on the same stage of analysis (Tab. 4.1). On the other hand, the systematics proposed by Thayer [308] relate only to music features and are very hard to describe without an expert involved in the process. The methodology proposed by the author of the presented dissertation is based on the 3-stage music analysis described before (Fig. 4.1). It involves an attempt to create time-based parameters that describe particular musical content with mathematical tools. The proposed parameters and the motivation behind them are presented in Section 4.6.

Each of the works presented in this section refers to different sets of features and parameters, even though all of them aim at music mood recognition. Moreover, even within one computational method, different settings may require other parameters. Therefore, it is difficult to determine a single valid set of features which would be suitable for any approach to mood description and recognition of music.

4.1.2 Preprocessing

Preprocessing is a very important step that occurs before almost any analysis. Its purpose is to prepare or adjust the data for a particular method or goal, extracting desired information and removing redundant content.

Usually, data values within a dataset may differ widely, which is one of the reasons for preprocessing. Normalization is often applied to bring various data to the same range of values. The procedure and different types of normalization are described in Section 5.1.
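
As a simple illustration (the full procedure and its variants are described in Section 5.1), the sketch below applies two common normalization schemes, min-max scaling and z-score standardization, to a small hypothetical feature matrix; the feature values are assumptions made only for this example.

    import numpy as np

    # Hypothetical feature matrix: rows = excerpts, columns = features with very
    # different ranges (e.g. tempo in BPM, RMS level, spectral centroid in Hz).
    X = np.array([[120.0, 0.03, 1800.0],
                  [ 68.0, 0.11, 3200.0],
                  [180.0, 0.07,  950.0]])

    # Min-max normalization: rescale each column to the [0, 1] range.
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    X_minmax = (X - x_min) / (x_max - x_min)

    # Z-score standardization: zero mean and unit variance per column.
    X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_minmax.round(3))
    print(X_zscore.round(3))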

Regardless of the type of the extracted feature, segmentation of the analyzed signal is applied first to set an appropriate time resolution for the particular analysis and recognition tasks. Segmentation of an audio piece is used to split it into structural components such as vowels, phrases, notes, bars, etc. [159]. It is also commonly used in the analysis of time-varying signals to obtain more detailed information in the time domain. During the parametrization process, segmentation is implemented, e.g., to avoid bias caused by fragments of silence, to observe differences between fragments, and to perform a proper averaging process. The lengths of segments, their overlap, etc. are adjusted to match the requirements of the specific feature. In MER, segmentation is used not only for smoothing and determining whether some values are constant, but also for mood tracking [188].
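
A minimal sketch of such segmentation is given below; the frame length, overlap and silence threshold are illustrative assumptions, not values used elsewhere in this dissertation.

    import numpy as np

    def segment_signal(samples, sr, frame_len_s=1.0, overlap=0.5, silence_rms=1e-3):
        """Split a signal into overlapping frames, dropping near-silent ones."""
        frame_len = int(frame_len_s * sr)
        hop = int(frame_len * (1.0 - overlap))
        frames = []
        for start in range(0, len(samples) - frame_len + 1, hop):
            frame = samples[start:start + frame_len]
            if np.sqrt(np.mean(frame ** 2)) >= silence_rms:   # skip silent frames
                frames.append(frame)
        return np.array(frames)

    # Hypothetical 10-second signal sampled at 22.05 kHz.
    sr = 22050
    signal = np.random.default_rng(3).normal(scale=0.1, size=10 * sr)
    segments = segment_signal(signal, sr, frame_len_s=1.0, overlap=0.5)
    print(segments.shape)   # (number of segments, samples per segment)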