10.1 Introduction
10.2 SogouQ and Related Data Collections
- AOL Query logs (2006/36M queries/English) include user ids and click data. The dataset was released intentionally for research purposes. However, the queries were not filtered, which led to considerable controversy over privacy.
- MSN Query logs (2006/100M queries/English) include session ids and click-through information, but not user ids (Craswell et al. 2009).
- Yandex Query logs (unknown time/210M queries/Russian) include user sessions extracted from Yandex logs, with user ids, queries, query terms, URLs, their domains, URL rankings, and clicks. However, the user data is fully anonymized.2
[Access time]\t[User ID]\t[Query]\t[Rank of the URL in the returned result]\t[The sequence number of user click]\t[URL that user clicked]\n

Here, the User ID is assigned automatically from the cookie information when a user accesses the search engine through a browser. Different queries entered from the same browser correspond to the same User ID.
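The record format above can be read with a few lines of Python. The following is a minimal sketch, assuming each record is exactly six tab-separated fields in the order listed; the names `ClickRecord`, `parse_record`, and `queries_by_user`, as well as the sample lines, are hypothetical and are not part of the SogouQ distribution. The access time is kept as a raw string because its format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ClickRecord:
    access_time: str      # raw string; the timestamp format is an open assumption
    user_id: str          # cookie-based ID, shared by queries from the same browser
    query: str
    url_rank: int         # rank of the URL in the returned result
    click_sequence: int   # sequence number of the user click
    clicked_url: str


def parse_record(line: str) -> ClickRecord:
    """Parse one tab-separated log line into a ClickRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    access_time, user_id, query, rank, click_seq, url = fields
    return ClickRecord(access_time, user_id, query, int(rank), int(click_seq), url)


def queries_by_user(records: Iterable[ClickRecord]) -> Dict[str, List[str]]:
    """Group queries by user ID, preserving the order in which they appear."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        grouped[record.user_id].append(record.query)
    return dict(grouped)


# Hypothetical records, invented for illustration only.
sample_lines = [
    "00:00:01\tu123\tharry potter\t2\t1\texample.com/hp\n",
    "00:00:09\tu123\tharry potter movie\t1\t1\texample.com/hp-movie\n",
    "00:00:12\tu456\tiphone 6\t3\t1\texample.com/iphone\n",
]
records = [parse_record(line) for line in sample_lines]
sessions = queries_by_user(records)
```

Grouping by `user_id` reflects the statement that queries entered from the same browser share one ID; it does not separate different people using the same browser.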
10.3 SogouQ and NTCIR Tasks
Assessors were asked to provide a label for each intent cluster in the form “<originalquery><additionalstring>”. Such a change provides valuable data to better understand a query from the perspective of two intent roles, i.e., kernel-object and modifier (Ren and Yu 2016; Yu and Ren 2012; Zheng et al. 2018). In contrast to the NTCIR-9 Intent task, where we had up to 24 intents for a single topic, the organizers of Intent-2 decided to select up to 9 intents per topic based on votes, because search result diversification is mainly about diversifying the first search result page, which can only accommodate around ten URLs.

A subtopic string of a given query is a query that specialises and/or disambiguates the search intent of the original query. If a string returned in response to the query does neither, it is considered incorrect.
e.g.
original query: “harry potter” (underspecified)
subtopic string: “harry potter philosophers stone movie”
incorrect: “harry potter hp” (does not specialise)
It is encouraged that participants submit subtopics of the form “<originalquery><additionalstring>”.
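The two mechanical parts of this description, the encouraged string form and the vote-based cut-off at 9 intents, can be sketched in Python. This is an illustration under stated assumptions, not the organizers' procedure: the function names are hypothetical, the form check is a simple case-insensitive prefix test, and breaking vote ties alphabetically is an arbitrary choice made here.

```python
from typing import Dict, List


def has_encouraged_form(original_query: str, subtopic: str) -> bool:
    """True if subtopic is the original query followed by a non-empty addition."""
    original = original_query.strip().lower()
    candidate = subtopic.strip().lower()
    return candidate.startswith(original) and len(candidate) > len(original)


def select_top_intents(votes: Dict[str, int], limit: int = 9) -> List[str]:
    """Keep the `limit` intent labels with the most votes.

    Ties are broken alphabetically by label (an assumption for this sketch).
    """
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:limit]]


# Hypothetical vote counts, invented for illustration only.
example_votes = {f"harry potter intent {i:02d}": i for i in range(1, 13)}
selected = select_top_intents(example_votes)
```

Note that `has_encouraged_form("harry potter", "harry potter hp")` also returns `True`: the prefix test only checks the string form, while deciding whether a string actually specialises or disambiguates the query remains a human judgment.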
The Vertical Incorporating subtask is also a successor of the Document Ranking subtask. The difference is that participants must decide whether the result list should contain vertical results. SogouQ remains a useful resource of user behavior data for the Chinese subtasks. Similarly, Yahoo! Japan provides participants in the Japanese subtasks with Web search query data generated from the query log of Yahoo! Japan Search from July 2009 to June 2013.4

[tid] [subtopic] [vertical] [score]
IMINE2-E-000 iPhone 6 apple.com Web 0.9
IMINE2-E-000 iPhone 6 sales News 0.90
IMINE2-E-000 iPHone 6 photo Image 0.88
IMINE2-E-000 iPhone 6 review Web 0.78
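Because the subtopic field may itself contain spaces, lines in the layout shown above cannot be split into four fields naively. The sketch below assumes that fields are whitespace-separated as displayed, that the topic ID is the first token, and that the vertical and score are always the last two single tokens; the actual delimiter of the official run format may differ. `RunLine` and `parse_run_line` are hypothetical names.

```python
from typing import NamedTuple


class RunLine(NamedTuple):
    tid: str
    subtopic: str
    vertical: str
    score: float


def parse_run_line(line: str) -> RunLine:
    """Split a run line into topic ID, subtopic, vertical, and score.

    The first token is the topic ID and the last two tokens are the
    vertical and the score; everything in between is the subtopic.
    """
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError(f"expected at least 4 tokens, got {len(tokens)}")
    subtopic = " ".join(tokens[1:-2])
    return RunLine(tokens[0], subtopic, tokens[-2], float(tokens[-1]))


# Lines taken from the example above.
run = [
    parse_run_line("IMINE2-E-000 iPhone 6 apple.com Web 0.9"),
    parse_run_line("IMINE2-E-000 iPhone 6 sales News 0.90"),
]
```

Parsing from both ends keeps multi-word subtopics such as "iPhone 6 sales" intact, at the cost of assuming that vertical names never contain spaces.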