
2016 | OriginalPaper | Book chapter

KPCA-WT: An Efficient Framework for High Quality Microblog Extraction in Time-Frequency Domain

Written by: Min Peng, Xinyuan Dai, Kai Zhang, Guanyin Zeng, Jiahui Zhu, Shuang Ouyang, Qianqian Xie, Gang Tian

Published in: Web-Age Information Management

Publisher: Springer International Publishing


Abstract

Online social media generate a massive volume of messages related to social events, which makes filtering and screening them a great challenge. To obtain high-quality messages, we propose KPCA-WT, a high-quality information extraction framework based on kernel principal component analysis and the wavelet transform. First, building on multiple-feature fusion, we design an algorithm for extracting high-quality microblogs that transforms the features into the wavelet domain to capture the detailed differences between the feature signals. The feature weights are then estimated with the EM algorithm, and the features are fused into a single comprehensive value for each message. In addition, to reduce the effect of noisy features and speed up computation, the features are processed with kernel principal component analysis before being transformed into the wavelet domain. Experimental results show that the proposed framework extracts information of higher quality with less redundancy and greatly reduces time consumption.
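The abstract does not give the algorithmic details, so the following is only a minimal sketch of two of the steps it names: moving a feature signal into the wavelet domain and fusing per-feature values with weights. It assumes a one-level Haar wavelet (the abstract does not say which wavelet is used), and the function names `haar_dwt`, `detail_energy`, and `fuse` are hypothetical. The weights are passed in by hand here; in the framework they would come from the EM step, and the KPCA preprocessing is not shown.

```python
import math


def haar_dwt(signal):
    """One-level Haar transform (assumed wavelet, not confirmed by the abstract).

    Returns (approximation, detail) coefficient lists. The detail
    coefficients capture differences between neighbouring samples.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def detail_energy(signal):
    """Sum of squared detail coefficients of a feature signal."""
    _, detail = haar_dwt(signal)
    return sum(d * d for d in detail)


def fuse(feature_values, weights):
    """Weighted sum of per-feature values, with weights normalised to sum to 1.

    The weights are placeholders; the paper estimates them with EM.
    """
    if len(feature_values) != len(weights):
        raise ValueError("one weight per feature is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * v for w, v in zip(weights, feature_values))


if __name__ == "__main__":
    # Hypothetical feature signal: one feature observed over four messages.
    approx, detail = haar_dwt([4.0, 4.0, 2.0, 0.0])
    print("approximation:", approx)
    print("detail:", detail)
    # Fuse two hypothetical per-message feature values with weights 3:1.
    print("fused value:", fuse([1.0, 0.0], [3.0, 1.0]))
```

The Haar transform is orthonormal, so the energy of the signal is split exactly between the approximation and detail coefficients; a flat stretch of the signal contributes nothing to the detail part, which is why detail coefficients are a natural way to expose fine differences between feature signals.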


Metadata
Title
KPCA-WT: An Efficient Framework for High Quality Microblog Extraction in Time-Frequency Domain
Written by
Min Peng
Xinyuan Dai
Kai Zhang
Guanyin Zeng
Jiahui Zhu
Shuang Ouyang
Qianqian Xie
Gang Tian
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-39958-4_24
