Skip to main content
Top
Published in: World Wide Web 5/2017

05-01-2017

Top-k temporal keyword search over social media data

Authors: Fan Xia, Chengcheng Yu, Linhao Xu, Weining Qian, Aoying Zhou

Published in: World Wide Web | Issue 5/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Social media services have already become main sources for monitoring emerging topics and sensing real-life events. A social media platform manages social stream consisting of a huge volume of timestamped user generated data, including original data and repost data. However, previous research on keyword search over social media data mainly emphasizes on the recency of information. In this paper, we first propose a problem of top-k most significant temporal keyword query to enable more complex query analysis. It returns top-k most popular social items that contain the keywords in the given query time window. Then, we design a temporal inverted index with two-tiers posting list to index social time series and a segment store to compute the exact social significance of social items. Next, we implement a basic query algorithm based on our proposed index structure and give a detailed performance analysis on the query algorithm. From the analysis result, we further refine our query algorithm with a piecewise maximum approximation (PMA) sketch. Finally, extensive empirical studies on a real-life microblog dataset demonstrate the combination of two-tiers posting list and PMA sketch achieves remarkable performance improvement under different query settings.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
Literature
1.
go back to reference Anand, A., Bedathur, S.J., Berberich, K., Schenkel, R.: Efficient Temporal Keyword Search over Versioned Text. In: CIKM, pp. 699–708 (2010) Anand, A., Bedathur, S.J., Berberich, K., Schenkel, R.: Efficient Temporal Keyword Search over Versioned Text. In: CIKM, pp. 699–708 (2010)
3.
go back to reference Berberich, K., Bedathur, S., Neumann, T., Weikum, G.: A time machine for text search. SIGIR, 519 (2007) Berberich, K., Bedathur, S., Neumann, T., Weikum, G.: A time machine for text search. SIGIR, 519 (2007)
4.
go back to reference Busch, M., Gade, K., Larson, B., Lok, P., Luckenbill, S., Lin, J.: Earlybird: Real-Time Search at Twitter. In: ICDE, pp. 1360–1369 (2012) Busch, M., Gade, K., Larson, B., Lok, P., Luckenbill, S., Lin, J.: Earlybird: Real-Time Search at Twitter. In: ICDE, pp. 1360–1369 (2012)
5.
go back to reference Chakrabarti, K., Keogh, E.J., Mehrotra, S., Pazzani, M.J.: Locally adaptive dimensionality reduction for indexing large time series databases. ACM Trans. Database Syst. 27(2), 188–228 (2002)CrossRef Chakrabarti, K., Keogh, E.J., Mehrotra, S., Pazzani, M.J.: Locally adaptive dimensionality reduction for indexing large time series databases. ACM Trans. Database Syst. 27(2), 188–228 (2002)CrossRef
6.
go back to reference Chen, C., Li, F., Ooi, B.C., Wu, S.: Ti: an Efficient Indexing Mechanism for Real-Time Search on Tweets. In: SIGMOD Conference, pp. 649–660 (2011) Chen, C., Li, F., Ooi, B.C., Wu, S.: Ti: an Efficient Indexing Mechanism for Real-Time Search on Tweets. In: SIGMOD Conference, pp. 649–660 (2011)
7.
go back to reference Chen, Q., Chen, L., Lian, X., Liu, Y., Yu, J.X.: Indexable Pla for Efficient Similarity Search. In: VLDB, pp. 435–446 (2007) Chen, Q., Chen, L., Lian, X., Liu, Y., Yu, J.X.: Indexable Pla for Efficient Similarity Search. In: VLDB, pp. 435–446 (2007)
8.
go back to reference Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
9.
go back to reference Fuchs, E., Gruber, T., Nitschke, J., Sick, B.: Online segmentation of time series based on polynomial least-squares approximations. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2232–2245 (2010)CrossRef Fuchs, E., Gruber, T., Nitschke, J., Sick, B.: Online segmentation of time series based on polynomial least-squares approximations. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2232–2245 (2010)CrossRef
10.
go back to reference Gao, M., Jin, C., Qian, W., Gong, X.: Real-Time Search over a Microblogging System. In: CGC, pp. 352–359 (2012) Gao, M., Jin, C., Qian, W., Gong, X.: Real-Time Search over a Microblogging System. In: CGC, pp. 352–359 (2012)
11.
go back to reference He, J., Suel, T.: Faster temporal range queries over versioned text. SIGIR, 565 (2011) He, J., Suel, T.: Faster temporal range queries over versioned text. SIGIR, 565 (2011)
12.
go back to reference Hitt, M.A., Anderson, C.: The long tail: Why the future of business is selling less of more (2007) Hitt, M.A., Anderson, C.: The long tail: Why the future of business is selling less of more (2007)
13.
go back to reference Huang, X., Cheng, H., Li, R.H., Qin, L., Yu, J.X.: Top-k structural diversity search in large networks. Proceedings of the VLDB Endowment 6(13), 1618–1629 (2013)CrossRef Huang, X., Cheng, H., Li, R.H., Qin, L., Yu, J.X.: Top-k structural diversity search in large networks. Proceedings of the VLDB Endowment 6(13), 1618–1629 (2013)CrossRef
14.
go back to reference Huo, W., Tsotras, V.J.: A Comparison of Top-K Temporal Keyword Querying over Versioned Text Collections. In: Database and Expert Systems Applications, pp. 360–374. Springer (2012) Huo, W., Tsotras, V.J.: A Comparison of Top-K Temporal Keyword Querying over Versioned Text Collections. In: Database and Expert Systems Applications, pp. 360–374. Springer (2012)
15.
go back to reference Jestes, J., Phillips, J.M., Li, F., Tang, M.: Ranking large temporal data. PVLDB 5(11), 1412–1423 (2012) Jestes, J., Phillips, J.M., Li, F., Tang, M.: Ranking large temporal data. PVLDB 5(11), 1412–1423 (2012)
16.
go back to reference Keogh, E.J., Chakrabarti, K., Pazzani, M.J., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst. 3(3), 263–286 (2001)CrossRefMATH Keogh, E.J., Chakrabarti, K., Pazzani, M.J., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst. 3(3), 263–286 (2001)CrossRefMATH
17.
go back to reference Keogh, E.J., Chu, S., Hart, D.M., Pazzani, M.J.: An Online Algorithm for Segmenting Time Series. In: ICDM, pp. 289–296 (2001) Keogh, E.J., Chu, S., Hart, D.M., Pazzani, M.J.: An Online Algorithm for Segmenting Time Series. In: ICDM, pp. 289–296 (2001)
18.
go back to reference Lemire, D.: A Better Alternative to Piecewise Linear Time Series Segmentation. In: SDM, pp. 545–550 (2007) Lemire, D.: A Better Alternative to Piecewise Linear Time Series Segmentation. In: SDM, pp. 545–550 (2007)
19.
go back to reference Li, F., Yi, K., Le, W.: Top-k queries on temporal data. VLDB J. 19(5), 715–733 (2010)CrossRef Li, F., Yi, K., Le, W.: Top-k queries on temporal data. VLDB J. 19(5), 715–733 (2010)CrossRef
20.
go back to reference Li, J., Liu, C., Liu, B., Mao, R., Wang, Y., Chen, S., Yang, J.J., Pan, H., Wang, Q.: Diversity-aware retrieval of medical records. Comput. Ind. 69, 81–91 (2015)CrossRef Li, J., Liu, C., Liu, B., Mao, R., Wang, Y., Chen, S., Yang, J.J., Pan, H., Wang, Q.: Diversity-aware retrieval of medical records. Comput. Ind. 69, 81–91 (2015)CrossRef
21.
go back to reference Li, Y., Bao, Z., Li, G., Tan, K.L.: Real Time Personalized Search on Social Networks. In: ICDE, pp. 639–650. IEEE (2015) Li, Y., Bao, Z., Li, G., Tan, K.L.: Real Time Personalized Search on Social Networks. In: ICDE, pp. 639–650. IEEE (2015)
22.
go back to reference Ma, H., Qian, W., Xia, F., He, X., Xu, J., Zhou, A.: Towards modeling popularity of microblogs. Frontiers of Computer Science 7(2), 171–184 (2013)MathSciNetCrossRef Ma, H., Qian, W., Xia, F., He, X., Xu, J., Zhou, A.: Towards modeling popularity of microblogs. Frontiers of Computer Science 7(2), 171–184 (2013)MathSciNetCrossRef
23.
go back to reference O’Neil, P., Cheng, E., Gawlick, D., O’Neil, E.: The log-structured merge-tree (lsm-tree). Acta Informatica 33(4), 351–385 (1996)CrossRefMATH O’Neil, P., Cheng, E., Gawlick, D., O’Neil, E.: The log-structured merge-tree (lsm-tree). Acta Informatica 33(4), 351–385 (1996)CrossRefMATH
24.
go back to reference Teevan, J., Ramage, D., Morris, M.R.: #Twittersearch: a Comparison of Microblog Search and Web Search. In: WSDM, pp. 35–44 (2011) Teevan, J., Ramage, D., Morris, M.R.: #Twittersearch: a Comparison of Microblog Search and Web Search. In: WSDM, pp. 35–44 (2011)
26.
go back to reference Wang, J., Huang, J.Z., Guo, J., Lan, Y.: Recommending high-utility search engine queries via a queryrecommending model. Neurocomputing 167, 195–208 (2015)CrossRef Wang, J., Huang, J.Z., Guo, J., Lan, Y.: Recommending high-utility search engine queries via a queryrecommending model. Neurocomputing 167, 195–208 (2015)CrossRef
27.
go back to reference Wu, L., Lin, W., Xiao, X., Xu, Y.: Lsii: an Indexing Structure for Exact Real-Time Search on Microblogs. In: ICDE, pp. 482–493 (2013) Wu, L., Lin, W., Xiao, X., Xu, Y.: Lsii: an Indexing Structure for Exact Real-Time Search on Microblogs. In: ICDE, pp. 482–493 (2013)
28.
go back to reference Xia, F., Yu, C., Qian, W., Zhou, A.: Top-K Temporal Keyword Query over Social Media Data. In: Asia-Pacific Web Conference, pp. 183–195. Springer (2016) Xia, F., Yu, C., Qian, W., Zhou, A.: Top-K Temporal Keyword Query over Social Media Data. In: Asia-Pacific Web Conference, pp. 183–195. Springer (2016)
29.
go back to reference Xu, Z., Zhang, R., Ramamohanarao, K., Parampalli, U.: An Adaptive Algorithm for Online Time Series Segmentation with Error Bound Guarantee. In: EDBT, pp. 192–203 (2012) Xu, Z., Zhang, R., Ramamohanarao, K., Parampalli, U.: An Adaptive Algorithm for Online Time Series Segmentation with Error Bound Guarantee. In: EDBT, pp. 192–203 (2012)
Metadata
Title
Top-k temporal keyword search over social media data
Authors
Fan Xia
Chengcheng Yu
Linhao Xu
Weining Qian
Aoying Zhou
Publication date
05-01-2017
Publisher
Springer US
Published in
World Wide Web / Issue 5/2017
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-016-0430-0

Other articles of this Issue 5/2017

World Wide Web 5/2017 Go to the issue

Premium Partner