Neural Computing and Applications

28-08-2017 | S.I. : EANN 2016

Combining bag-of-words and sentiment features of annual reports to predict abnormal stock returns

Author: Petr Hájek

Published in: Neural Computing and Applications | Issue 7/2018

Abstract

Automated textual analysis of firm-related documents has become an important decision support tool for stock market investors. Previous studies tended to adopt either a dictionary-based or a machine learning approach. Nevertheless, little is known about their concurrent use. Here we combine financial indicators, readability, sentiment categories, and bag-of-words (BoW) features to increase prediction accuracy. This paper aims to extract both sentiment and BoW information from the annual reports of US firms. The sentiment analysis is based on two commonly used dictionaries, namely the general dictionary Diction 7.0 and the finance-specific dictionary proposed by Loughran and McDonald (J Finance 66:35–65, 2011. doi:10.1111/j.1540-6261.2010.01625.x). The BoW terms are selected according to their tf–idf weights. We combine these features with financial indicators to predict abnormal stock returns using a multilayer perceptron neural network with dropout regularization and rectified linear units. We show that this method performs similarly to naïve Bayes and outperforms other machine learning algorithms (support vector machine, C4.5 decision tree, and k-nearest neighbour classifier) in predicting positive/negative abnormal stock returns in terms of ROC. We also show that the quality of the prediction increases significantly when correlation-based feature selection is applied to the BoW. This prediction performance is robust to industry categorization and event window.
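
As a rough illustration of how BoW, sentiment, and financial features might be placed in one feature vector, the following Python sketch computes tf–idf weights, dictionary-based sentiment shares, and a combined row per report. It is only a sketch under stated assumptions: the word lists are hypothetical toy examples (not Diction 7.0 or the Loughran and McDonald dictionary), the tf–idf variant (relative term frequency times log(N/df)) is assumed rather than taken from the paper, and the function names and the `financials` input are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy word lists; NOT the Diction 7.0 or Loughran-McDonald dictionaries.
POSITIVE = {"growth", "gain", "improve"}
NEGATIVE = {"loss", "decline", "impairment"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def sentiment_shares(text):
    """Share of tokens found in the positive and negative word lists."""
    toks = tokenize(text)
    n = len(toks) or 1  # avoid division by zero for empty text
    pos = sum(t in POSITIVE for t in toks) / n
    neg = sum(t in NEGATIVE for t in toks) / n
    return pos, neg


def build_features(docs, financials, vocab_size=3):
    """Concatenate top tf-idf terms, sentiment shares, and financial indicators."""
    weights = tfidf(docs)
    # Rank terms by total tf-idf over the corpus (ties broken alphabetically).
    totals = {}
    for w in weights:
        for term, value in w.items():
            totals[term] = totals.get(term, 0.0) + value
    vocab = sorted(totals, key=lambda t: (-totals[t], t))[:vocab_size]
    rows = []
    for doc, w, fin in zip(docs, weights, financials):
        pos, neg = sentiment_shares(doc)
        rows.append([w.get(t, 0.0) for t in vocab] + [pos, neg] + list(fin))
    return vocab, rows
```

Calling `build_features(reports, indicators)` with a list of report texts and a parallel list of numeric indicator lists would give one row per report, which could then be passed to any classifier, such as a multilayer perceptron. A term that occurs in every document receives a weight of zero under this idf definition, which is one reason such terms tend to drop out of tf–idf-based selection.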

1. Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J Finance 66:35–65. doi:10.1111/j.1540-6261.2010.01625.x

2. Henry E (2008) Are investors influenced by how earnings press releases are written? J Bus Commun 45:363–407. doi:10.1177/0021943608319388

3. Tetlock PC, Saar-Tsechansky M, Macskassy S (2008) More than words: quantifying language to measure firms’ fundamentals. J Finance 63:1437–1467. doi:10.1111/j.1540-6261.2008.01362.x

4. Doran JS, Peterson DR, Price SM (2012) Earnings conference call content and stock price: the case of REITs. J Real Estate Finance Econ 45:402–434. doi:10.1007/s11146-010-9266-z

5. Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of Internet stock message boards. J Finance 59:1259–1294. doi:10.1111/j.1540-6261.2004.00662.x

6. Tetlock PC (2007) Giving content to investor sentiment: the role of media in the stock market. J Finance 62:1139–1168. doi:10.1111/j.1540-6261.2007.01232.x

7. Bodnaruk A, Loughran T, McDonald B (2015) Using 10-K text to gauge financial constraints. J Finance Quant Anal 50:623–646. doi:10.2139/ssrn.2331544

8. Myskova R, Hajek P (2016) The effect of managerial sentiment on market-to-book ratio. Transform Bus Econ 15:80–96

9. Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud: a comparative study of machine learning methods. Knowl Based Syst 128:139–152. doi:10.1016/j.knosys.2017.05.001

10. Hajek P, Olej V (2013) Evaluating sentiment in annual reports for financial distress prediction using neural networks and support vector machines. In: Iliadis L, Papadopoulos H, Jayne C (eds) Communications in computer and information science. Springer, Berlin, pp 1–10

11. Hajek P, Olej V, Myskova R (2014) Forecasting corporate financial performance using sentiment in annual reports for stakeholders’ decision-making. Technol Econ Dev Econ 20:721–738. doi:10.3846/20294913.2014.979456

12. Hajek P, Olej V (2016) Intuitionistic neuro-fuzzy network with evolutionary adaptation. Evol Syst. doi:10.1007/s12530-016-9157-5

13. Hagenau M, Liebmann M, Neumann D (2013) Automated news reading: stock price prediction based on financial news using context-capturing features. Decis Support Syst 55:685–697. doi:10.1016/j.dss.2013.02.006

14. Kearney C, Liu S (2014) Textual sentiment in finance: a survey of methods and models. Int Rev Finance Anal 33:171–185. doi:10.1016/j.irfa.2014.02.006

15. Khadjeh Nassirtoussi A, Aghabozorgi S, Ying Wah T, Ngo DCL (2014) Text mining for market prediction: a systematic review. Expert Syst Appl 41:7653–7670. doi:10.1016/j.eswa.2014.06.009

16. Loughran T, McDonald B (2016) Textual analysis in accounting and finance: a survey. J Account Res 54:1187–1230. doi:10.1111/1475-679X.12123

17. Huang AH, Zang AZ, Zheng R (2014) Evidence on the information content of text in analyst reports. Account Rev 89:2151–2180. doi:10.2308/accr-50833

18. Li F (2006) Do stock market investors understand the risk sentiment of corporate annual reports? Gene. doi:10.2139/ssrn.898181

19. Li F (2008) Annual report readability, current earnings, and earnings persistence. J Account Econ 45:221–247. doi:10.1016/j.jacceco.2008.02.003

20. Feldman R, Govindaraj S, Livnat J, Segal B (2010) Management’s tone change, post earnings announcement drift and accruals. Rev Account Stud 15:915–953. doi:10.1007/s11142-009-9111-x

21. Davis AK, Tama-Sweet I (2012) Managers’ use of language across alternative disclosure outlets: earnings press releases versus MD&A. Contemp Account Res 29:804–837. doi:10.1111/j.1911-3846.2011.01125.x

22. Balakrishnan R, Qiu XY, Srinivasan P (2010) On the predictive ability of narrative disclosures in annual reports. Eur J Oper Res 202:789–801. doi:10.1016/j.ejor.2009.06.023

23. Butler M, Kešelj V (2009) Financial forecasting using character n-gram analysis and readability scores of annual reports. In: Gao Y, Japkowicz N (eds) Lecture notes in computer science. Springer, Berlin, pp 39–51

24. Hart RP (2001) Redeveloping DICTION: theoretical considerations (new). In: West MD (ed) Theory, method, and practice in computer content analysis. Ablex, Westport, CT, pp 43–60

25. Short JC, Palmer TB (2008) The application of DICTION to content analysis research in strategic management. Organ Res Methods 11:727–752. doi:10.1177/1094428107304534

26. Price SM, Doran JS, Peterson DR, Bliss BA (2012) Earnings conference calls and stock returns: the incremental informativeness of textual tone. J Bank Finance 36:992–1011. doi:10.1016/j.jbankfin.2011.10.013

27. Hinton GE, Srivastava N, Krizhevsky A et al (2012) Improving neural networks by preventing co-adaptation of feature detectors, pp 1–18. arXiv e-prints: arXiv:1207.0580

28. Baharudin B, Lee LH, Khan K (2010) A review of machine learning algorithms for text-documents classification. J Adv Inf Technol 1:4–20. doi:10.4304/jait.1.1.4-20

29. Hajek P, Bohacova J (2016) Predicting abnormal bank stock returns using textual analysis of annual reports: a neural network approach. In: Jayne C, Iliadis L (eds) Communications in computer and information science. Springer, Aberdeen, pp 67–78

30. Demers E, Vega C (2014) Understanding the role of managerial optimism and uncertainty in the price formation process: evidence from the textual content of earnings announcements. doi:10.2139/ssrn.1152326

31. Li F (2010) The information content of forward-looking statements in corporate filings: a Naïve Bayesian machine learning approach. J Account Res 48:1049–1102. doi:10.1111/j.1475-679X.2010.00382.x

32. Demers E, Vega C (2010) Soft information in earnings announcements: news or noise? INSEAD Bus Sch World. doi:10.2139/ssrn.1153450

33. Huang X, Teoh SH, Zhang Y (2014) Tone management. Account Rev 89:1083–1113. doi:10.2308/accr-50684

34. Davis AK, Piger JM, Sedor LM (2012) Beyond the numbers: measuring the information content of earnings press release language. Contemp Account Res 29:845–868. doi:10.1111/j.1911-3846.2011.01130.x

35. Henry E, Leone AJ (2016) Measuring qualitative information in capital markets research: comparison of alternative methodologies to measure disclosure tone. Account Rev 91:153–178. doi:10.2308/accr-51161

36. Li X, Huang X, Deng X, Zhu S (2014) Enhancing quantitative intra-day stock return prediction by integrating both market news and stock prices information. Neurocomputing 142:228–238. doi:10.1016/j.neucom.2014.04.043

37. Schumaker RP, Chen H (2009) Textual analysis of stock market prediction using breaking financial news. ACM Trans Inf Syst 27:1–19. doi:10.1145/1462198.1462204

38. Geva T, Zahavi J (2014) Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news. Decis Support Syst 57:212–223. doi:10.1016/j.dss.2013.09.013

39. Engelberg JE, Reed AV, Ringgenberg MC (2012) How are shorts informed? Short sellers, news, and information processing. J Finance Econ 105:260–278. doi:10.1016/j.jfineco.2012.03.001

40. García D (2013) Sentiment during recessions. J Finance 68:1267–1300. doi:10.1111/jofi.12027

41. Li Q, Wang T, Li P et al (2014) The effect of news and public mood on stock movements. Inf Sci (Ny) 278:826–840. doi:10.1016/j.ins.2014.03.096

42. Schumaker RP, Zhang Y, Huang CN, Chen H (2012) Evaluating sentiment in financial news articles. Decis Support Syst 53:458–464. doi:10.1016/j.dss.2012.03.001

43. Li Q, Wang T, Gong Q et al (2014) Media-aware quantitative trading based on public Web information. Decis Support Syst 61:93–105. doi:10.1016/j.dss.2014.01.013

44. Yu Y, Duan W, Cao Q (2013) The impact of social and conventional media on firm equity value: a sentiment analysis approach. Decis Support Syst 55:919–926. doi:10.1016/j.dss.2012.12.028

45. Kothari SP, Li X, Short JE (2009) The effect of disclosures by management, analysts, and business press on cost of capital, return volatility, and analyst forecasts: a study using content analysis. Account Rev 84:1639–1670. doi:10.2308/accr.2009.84.5.1639

46. Hanley KW, Hoberg G (2010) The information content of IPO prospectuses. Rev Finance Stud 23:2821–2864. doi:10.1093/rfs/hhq024

47. Mayew WJ, Venkatachalam M (2012) The power of voice: managerial affective states and future firm performance. J Finance 67:1–44. doi:10.1111/j.1540-6261.2011.01705.x

48. Li X, Xie H, Chen L et al (2014) News impact on stock price return via sentiment analysis. Knowl Based Syst 69:14–23. doi:10.1016/j.knosys.2014.04.022

49. Wisniewski TP, Yekini LS (2015) Stock market returns and the content of annual report narratives. Account Forum 39:281–294. doi:10.1016/j.accfor.2015.09.001

50. Feuerriegel S, Ratku A (2016) Analysis of how underlying topics in financial news affect stock prices using latent Dirichlet allocation. In: Bui TX, Sprague RH (eds) 49th Hawaii international conference on system sciences. IEEE, Kauai, pp 1072–1081

51. Fama EF, French KR (1993) Common risk factors in the returns on stocks and bonds. J Finance Econ 33:3–56. doi:10.1016/0304-405X(93)90023-5

52. Loughran T, McDonald B (2014) Measuring readability in financial disclosures. J Finance 69:1643–1671. doi:10.1111/jofi.12162

53. De Franco G, Hope OK, Vyas D, Zhou Y (2015) Analyst report readability. Contemp Account Res 32:76–104. doi:10.1111/1911-3846.12062

54. Escalante H, Ponce-López V, Escalera S (2016) Evolving weighting schemes for the bag of visual words. Neural Comput Appl. doi:10.1007/s00521-016-2223-x

55. Dhillon IS, Mallela S, Kumar R (2003) A divisive information-theoretic feature clustering algorithm for text classification. J Mach Learn Res 3:1265–1287. doi:10.1162/153244303322753661

56. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:491–502. doi:10.1109/TKDE.2005.66

57. Hajek P, Michalak K (2013) Feature selection in corporate credit rating prediction. Knowl Based Syst 51:72–84. doi:10.1016/j.knosys.2013.07.008

58. Glezakos TJ, Tsiligiridis TA, Iliadis LS et al (2009) Feature extraction for time-series data: an artificial neural network evolutionary training model for the management of mountainous watersheds. Neurocomputing 73:49–59. doi:10.1016/j.neucom.2008.08.024

59. Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: Proceedings of the 14th international conference on machine learning, pp 412–420

60. Li Z, Lu W, Sun Z, Xing W (2016) A parallel feature selection method study for text classification. Neural Comput Appl. doi:10.1007/s00521-016-2351-3

61. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224. doi:10.1145/1014052.1014149

62. Crain SP, Zhou K, Yang S-H, Zha H (2012) Dimensionality reduction and topic modeling: from latent semantic indexing to latent Dirichlet allocation and beyond. In: Aggarwal CC, Zhai C (eds) Mining text data. Springer, New York, pp 129–161

63. Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans Inf Syst 29:1–34. doi:10.1145/1961209.1961211

64. Nam J, Kim J, Loza Mencía E et al (2014) Large-scale multi-label text classification: revisiting neural networks. In: Calders T, Esposito F, Hullermeier E, Meo R (eds) Lecture notes in computer science. Springer, Berlin, pp 437–452

65. Barrow E, Eastwood M, Jayne C (2016) Selective dropout for deep neural networks. In: Akira H, Seiichi O, Doya K et al (eds) International conference on neural information processing. Springer, Kyoto, pp 519–528

66. Srivastava N, Hinton G, Krizhevsky A et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958. doi:10.1214/12-AOS1000

67. Wu H, Gu X (2015) Towards dropout training for convolutional neural networks. Neural Netw 71:1–10. doi:10.1016/j.neunet.2015.07.007

68. Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Dasgupta S, McAllester D et al (eds) Proceedings of the 30th international conference on machine learning. JMLR, Atlanta, pp 1–6

69. Jaitly N, Hinton G (2011) Learning a better representation of speech soundwaves using restricted Boltzmann machines. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, Prague, pp 5884–5887

70. Chawla NV, Japkowicz N, Kolcz A (2004) Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explor Newsl 6:1–6. doi:10.1145/1007730.1007733

71. Taddy M (2013) Multinomial inverse regression for text analysis. J Am Stat Assoc 108:755–770. doi:10.1080/01621459.2012.734168

72. Taddy M (2015) Document classification by inversion of distributed language representations. In: Proceedings of the 53rd annual meeting of the association for computational linguistics, pp 45–49

73. Wong FMF, Liu Z, Chiang M (2014) Stock market prediction from WSJ: text mining via sparse matrix factorization. In: 2014 IEEE international conference on data mining. IEEE, pp 430–439

74. Sun A, Lachanski M, Fabozzi FJ (2016) Trade the tweet: social media text mining and sparse matrix factorization for stock market prediction. Int Rev Finance Anal 48:272–281. doi:10.1016/j.irfa.2016.10.009

75. Guay W, Samuels D, Taylor D (2016) Guiding through the fog: financial statement complexity and voluntary disclosure. J Account Econ 62:234–269. doi:10.1016/j.jacceco.2016.09.001

76. Fama EF, French KR (2012) Size, value, and momentum in international stock returns. J Finance Econ 105:457–472. doi:10.1016/j.jfineco.2012.05.011

77. Yin L, Ge Y, Xiao K et al (2013) Feature selection for high-dimensional imbalanced data. Neurocomputing 105:3–11. doi:10.1016/j.neucom.2012.04.039

78. Tang D, Wei F, Yang N et al (2014) Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of the 52nd annual meeting of the association for computational linguistics. Association for Computational Linguistics, Baltimore, pp 1555–1565

79. Wang P, Xu B, Xu J et al (2016) Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification. Neurocomputing 174:806–814. doi:10.1016/j.neucom.2015.09.096

80. Allee KD, DeAngelis MD (2015) The structure of voluntary disclosure narratives: evidence from tone dispersion. J Account Res 53:241–274. doi:10.1111/1475-679X.12072

81. Thenmozhi M, Sarath Chand G (2016) Forecasting stock returns based on information transmission across global markets using support vector machines. Neural Comput Appl. doi:10.1007/s00521-015-1897-9

Title: Combining bag-of-words and sentiment features of annual reports to predict abnormal stock returns
Author: Petr Hájek
Publication date: 28-08-2017
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 7/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-017-3194-2
