2019 | OriginalPaper | Chapter

8. Mechanisms for Profiling

Author: Rita Singh

Published in: Profiling Humans from their Voice

Publisher: Springer Singapore


Abstract

So how is profiling actually done? Most of this book has been dedicated to developing the basic understanding needed for it. We have seen that knowledge of how a parameter affects the vocal production mechanism can help us identify the most relevant representations from which to extract the information needed for profiling. We have also seen how such knowledge can help us reason about why certain parameters may exert confusable influences on the voice signal. All of this knowledge can then help us design more targeted methods to discover features that are highly effective for profiling.
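As a concrete illustration of extracting a profiling-relevant representation from the voice signal, the sketch below estimates the fundamental frequency (F0) of a voiced frame using a simple autocorrelation method. This is an illustrative example only, not a method prescribed in this chapter; the function name, search range, and synthetic test signal are the author's (hypothetical) choices.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) of a voiced frame by
    locating the strongest autocorrelation peak within the
    plausible pitch-period range [1/fmax, 1/fmin]."""
    frame = frame - np.mean(frame)                 # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                   # keep non-negative lags
    lo = int(sr / fmax)                            # shortest plausible period
    hi = int(sr / fmin)                            # longest plausible period
    lag = lo + np.argmax(corr[lo:hi])              # best-matching period
    return sr / lag

# Synthetic voiced frame: 120 Hz fundamental plus two weaker harmonics.
sr = 16000
t = np.arange(0, 0.05, 1.0 / sr)
frame = (np.sin(2 * np.pi * 120 * t)
         + 0.5 * np.sin(2 * np.pi * 240 * t)
         + 0.25 * np.sin(2 * np.pi * 360 * t))

f0 = estimate_f0(frame, sr)    # close to 120 Hz
```

In a real profiling pipeline such a scalar would be one of many measurements (formant frequencies, jitter, shimmer, etc.) computed per frame and then aggregated before being fed to a predictor.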
Footnotes

1. The variables are more appropriately called explanatory variables, since they may not be independent of one another.
Metadata

Copyright Year
2019
DOI
https://doi.org/10.1007/978-981-13-8403-5_8