2020 | OriginalPaper | Book chapter

Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Written by: Asia J. Biega, Jana Schmidt, Rishiraj Saha Roy

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

Detailed query histories often contain a precise picture of a person’s life, including sensitive and personally identifiable information. Because sanitizing such logs remains an unsolved research problem, commercial Web search engines that hold large datasets of this kind refrain from releasing them to the wider research community. Ironically, studies examining privacy in search often require detailed search logs with user profiles. This paper builds on the observation that information needs are also expressed as questions in online Community Question Answering (CQA) communities. We take a step towards understanding how queries are formulated from questions, as a basis for automatically deriving search logs from CQA forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the StackExchange platform and conduct a large-scale conversion experiment in which crowdworkers submit the search queries they would use when looking for equivalent information. We also release a dataset of 7,000 question-query pairs from our study.

Metadata
Title
Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Written by
Asia J. Biega
Jana Schmidt
Rishiraj Saha Roy
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-45442-5_14
