1 Introduction
- SODA: movie Brad Pitt
- ATHENA: Show me all movies with the actor Brad Pitt.
- Ginseng: What are the movies with the actor Brad Pitt?
- We provide an overview of recent NLI systems, comparing and analyzing them based on their expressive power.
- Existing papers often use different data sets and evaluation metrics based on precision and recall, while others perform user studies to evaluate the system (see Sect. 5 for details). Given this heterogeneity of evaluations, it is very hard to directly compare these systems. Hence, we propose a set of sample questions of increasing complexity as well as an associated domain model aiming at testing the expressive power of NLIs.
- The paper serves as a guide for researchers and practitioners who want to give natural language access to their databases.
2 Foundation: a sample world
2.1 Database ontology
| # | Natural language question | Challenges |
|---|---|---|
| Q1 | Who is the director of ‘Inglourious Basterds’? | J, F(s) |
| Q2 | All movies with a rating higher than 9. | J, F(r) |
| Q3 | All movies starring Brad Pitt from 2000 until 2010. | J, F(d) |
| Q4 | Which movie has grossed most? | J, O |
| Q5 | Show me all drama and comedy movies. | J, U |
| Q6 | List all great movies. | C |
| Q7 | What was the best movie of each genre? | J, A |
| Q8 | List all non-Japanese horror movies. | J, F(n) |
| Q9 | All movies with rating higher than the rating of ‘Sin City’. | J, S |
| Q10 | All movies with the same genres as ‘Sin City’. | J, 2xS |

Here, J denotes a join, F a filter (on a string (s), a numerical range (r), a date (d) or with a negation (n)), O an ordering, U a union, C a concept, A an aggregation and S a subquery (2xS: two subqueries).
2.2 Input questions
Q1 is a join over different tables (Person, Director, Directing and Movie) with an ISA-relationship between the tables Person and Director. Moreover, the query has a filter on the attribute Movie.Title, which has to be equal to ‘Inglourious Basterds.’ Therefore, the system faces three different challenges: (a) identify the bridge table Directing to link the tables Director and Movie, (b) identify the hierarchical structure (ISA-relationship) between Director and Person and (c) identify ‘Inglourious Basterds’ as a filter phrase for Movie.Title.
Q2 is based on a single table (Movie) with a range filter. The challenge for the NLIs is to translate ‘higher than’ into the comparison operator ‘greater than.’

Q3 is a join over four tables (Person, Actor, Starring and Movie) and includes two filters: (a) a filter on the attributes Person.FirstName and Person.LastName and (b) a two-sided date range filter on the attribute Movie.ReleaseDate. The challenge in this query (compared to the previous ones) is the date range filter. The system needs to detect that ‘from 2000 until 2010’ refers to a range filter and that the numbers need to be translated into the dates 2000-01-01 and 2010-12-31.
Q4 is a join over two tables (Movie and Gross). In addition, an aggregation on the attribute Gross.Gross and a grouping on the attribute Movie.id, or an ordering of the result based on Gross.Gross, is needed. For both approaches, an aggregation to a single result (indicated by the keyword ‘most’) is requested.

Q5 is a join over two tables (Movie and Genre). The query can either be interpreted as ‘movies that have both genres’ (intersection) or ‘movies with at least one of those genres’ (union). The expected answer is based on the union interpretation, which can be solved with two filters that are concatenated with an OR on the attribute Genre.Genre.
Q6 needs the definition of concepts. In the sample world, the concept ‘great movie’ is defined as a movie with a rating greater than or equal to 8. If the system is capable of handling concepts, it needs to detect the concept and translate it according to the definition.

Q7 is a join over two tables (Movie and Genre) with an aggregation. The challenges are to (a) identify the grouping by the attribute Genre.Genre and (b) translate the token ‘best’ into a maximum aggregation on the attribute Movie.Rating.
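The following minimal sketch shows one possible SQL translation of Q7. The join column Genre.movieId is an assumption for this example; the sample world only names the attributes Movie.id, Movie.Rating and Genre.Genre.

```python
# A minimal sketch of a SQL translation for Q7, assuming the schema
# Movie(id, Title, Rating) and Genre(movieId, Genre); Genre.movieId is assumed.
Q7_SQL = """
SELECT g.Genre, MAX(m.Rating) AS BestRating   -- 'best' -> MAX(Movie.Rating)
FROM   Movie m
JOIN   Genre g ON g.movieId = m.id
GROUP  BY g.Genre;                            -- grouping by Genre.Genre
"""
```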
Q8 is a join over two tables (Movie and Genre) with a negation on the attribute Movie.OriginalLang and a filter on the attribute Genre.Genre. The challenge in this question is to identify the negation ‘non-Japanese.’ Another possible input question with a negation, over a larger movie database, would be ‘All actors without an Oscar.’ Here again, the challenge is to identify ‘without’ as a keyword for the negation.

Q9 is based on a single table (Movie) and includes a subquery. The challenge in this question is to divide it into two steps: first select the rating of the movie ‘Sin City’ and then use this SQL statement as a subquery to compare it with the rating of every other movie in the database.
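A corresponding sketch for Q9, assuming the sample-world attributes Movie.Title and Movie.Rating; the inner SELECT is the first step described above, reused as a subquery:

```python
# A minimal sketch of the subquery translation for Q9,
# assuming the table Movie(Title, Rating).
Q9_SQL = """
SELECT m.Title
FROM   Movie m
WHERE  m.Rating > (SELECT s.Rating           -- step 1: rating of 'Sin City'
                   FROM   Movie s
                   WHERE  s.Title = 'Sin City');
"""
```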
Q10 is a join over two tables (Movie and Genre). One possible solution includes two NOT EXISTS subqueries: the first one verifies, for each movie, that ‘Sin City’ has no genre which the movie does not have; the second one verifies, for each movie, that the movie has no genre which ‘Sin City’ does not have. For example, the movie ‘Sin City’ has the genre ‘Thriller,’ and the movie ‘Mission: Impossible’ has the genres ‘Thriller’ and ‘Action.’ The first NOT EXISTS will check whether ‘Mission: Impossible’ has the genre ‘Thriller’ from ‘Sin City,’ which is true. The second NOT EXISTS checks whether ‘Sin City’ has the genres ‘Thriller’ and ‘Action’ (from ‘Mission: Impossible’), which is false.
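The following self-contained sketch spells out this double negation for the example above. The schema (in particular the linking column Genre.movieId) is an assumption; run against a toy SQLite database, the query keeps ‘Sin City’ and excludes ‘Mission: Impossible’ exactly as described.

```python
import sqlite3

# A minimal, self-contained sketch of the 'two NOT EXISTS' solution for Q10,
# assuming the toy schema Movie(id, Title) and Genre(movieId, Genre).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Movie (id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Genre (movieId INTEGER, Genre TEXT);
INSERT INTO Movie VALUES (1, 'Sin City'), (2, 'Mission: Impossible');
INSERT INTO Genre VALUES (1, 'Thriller'), (2, 'Thriller'), (2, 'Action');
""")

Q10_SQL = """
SELECT m.Title
FROM   Movie m
WHERE NOT EXISTS (                      -- no genre of 'Sin City' that m lacks
        SELECT 1 FROM Genre g
        WHERE g.movieId = (SELECT id FROM Movie WHERE Title = 'Sin City')
          AND g.Genre NOT IN (SELECT g2.Genre FROM Genre g2
                              WHERE g2.movieId = m.id))
  AND NOT EXISTS (                      -- no genre of m that 'Sin City' lacks
        SELECT 1 FROM Genre g
        WHERE g.movieId = m.id
          AND g.Genre NOT IN (SELECT g2.Genre FROM Genre g2
                              WHERE g2.movieId = (SELECT id FROM Movie
                                                  WHERE Title = 'Sin City')))
"""
print(con.execute(Q10_SQL).fetchall())  # [('Sin City',)]
```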
2.3 Question analysis

We labeled the questions of the corpora with Q1 to Q10 based on what poses the challenge in answering them. A question mentioning ‘virginia,’ for instance, is labeled as Q1, because the challenge of this question is to identify the right filter (‘virginia’). The question ‘what is a good movie to go see this weekend?’ includes a time range as well as the concept ‘good movie’ and is therefore labeled as Q6. For the Yahoo! Corpus, we also made some assumptions; for example, the question ‘what is your favorite tom hanks movie?’ is interpreted as ‘give me the best ranked tom hanks movie’ and labeled with Q4. Furthermore, if a question could have multiple labels, the label of the more difficult (higher number) question is chosen. For example, the sample question ‘can anyone tell a good action movie to watch?’ is labeled with Q6 because it requires handling of a concept (‘good movie’) and not Q1 because it uses a filter (‘action movie’). If a question cannot be labeled with one of the input questions, we label it with x. For example, the question ‘i want to make a girl mine but she is more beautiful than me. what can i do now?’ has nothing to do with movies.

In the Yahoo! Corpus, the most common label is Q1. For example, the question ‘what movie had “wonderful world” by sam cooke at the beginning?’ has filters for the song ‘wonderful world’ and a join on movie. About 30% of the questions are labeled with x; these are off-topic questions. There are no questions labeled with Q2, which means that there are no questions with a numerical range. This can be explained by the composition of the corpus itself, which is a collection of questions from users to users. If users ask about the ranking of a movie, they ask something like ‘what is your favorite movie?’ and not something similar to Q2.
In the geography corpus, most questions are labeled with Q1 or Q4. There are three concepts (‘population density,’ ‘major city’ and ‘major river’) used in the corpus that occur in roughly 8% of the questions, 7% of which are labeled with Q6. There are no numerical range questions (Q2) and no date questions (Q3). The latter can be explained by the dataset not including any dates. There are also no unions (Q5) and no questions with multiple subqueries (Q10).
Bonifati et al. [7] analyzed a large corpus of SPARQL query logs. The majority of the logged queries are Select-queries and 40% have a Filter. Furthermore, they found huge differences between different domains. For example, the use of Filter ranges from 61% (LinkedGeoData) to 3% (OpenBioMed) or less. This implies that the distribution of the question types is domain-dependent. Nevertheless, our ten sample questions are fully covered in the query log analyzed by Bonifati et al. [7]. Questions that cannot be mapped to one of our sample questions (labeled with x in the Yahoo! Corpus) are not covered by our analysis.

3 Background: natural language processing technologies
3.1 Stop words
Stop words can carry important information. In the question ‘What was the best movie of each genre?’ (Q7), the stop words ‘of each’ imply an aggregation on the successive token ‘genre.’ On the other hand, stop words should not be used for lookups in the inverted indexes. In the question ‘Who is the director of “Inglourious Basterds”?’ (Q1), the stop word ‘of’ would return a partial match for a lot of movie titles which are not related to the movie ‘Inglourious Basterds.’ Therefore, NLIs should identify stop words, but not remove them, because they can be useful for certain computations.
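A minimal sketch of this ‘identify but keep’ strategy; the stop-word list and the trigger phrase (‘of’, ‘each’) are assumptions made for the example, not taken from any particular system:

```python
# A minimal sketch: stop words are skipped for index lookups but kept in the
# token sequence, so that trigger phrases such as 'of each' stay detectable.
STOP_WORDS = {"who", "is", "the", "of", "each", "was", "a", "all"}
AGGREGATION_TRIGGER = ("of", "each")          # assumed trigger phrase

def analyze(question):
    tokens = question.lower().rstrip("?.").split()
    lookup_tokens = [t for t in tokens if t not in STOP_WORDS]
    has_aggregation = any(
        tuple(tokens[i:i + len(AGGREGATION_TRIGGER)]) == AGGREGATION_TRIGGER
        for i in range(len(tokens)))
    return lookup_tokens, has_aggregation

print(analyze("What was the best movie of each genre?"))
# (['what', 'best', 'movie', 'genre'], True)
```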
3.2 Synonymy

The input question ‘All movies starring Brad Pitt from 2000 until 2010.’ (Q3) could also be phrased as ‘All movies playing Brad Pitt from 2000 until 2010.’ The answer should be the same, but because no element in the sample world is named ‘playing,’ a lookup would not find an answer. Therefore, it is necessary that the system takes synonyms into account. A possible solution is the use of a translation dictionary. Usually, such a dictionary is based on DBpedia [34] and/or WordNet [42].
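A minimal sketch of such a synonym lookup based on WordNet via the NLTK corpus reader (the function name expand_with_synonyms is ours; the WordNet data has to be downloaded once):

```python
# A minimal sketch of a WordNet-backed translation dictionary using NLTK.
# Requires: pip install nltk  and  nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def expand_with_synonyms(phrase):
    """Return the phrase together with its WordNet synonyms."""
    synonyms = {phrase.lower()}
    for synset in wn.synsets(phrase.replace(" ", "_")):
        for lemma in synset.lemma_names():
            synonyms.add(lemma.replace("_", " ").lower())
    return synonyms

print(expand_with_synonyms("movie"))   # includes e.g. 'film', 'picture'
```

A purely lexical resource will not cover every domain-specific paraphrase (e.g., ‘playing’ for ‘starring’), which is why such dictionaries are often complemented with handcrafted entries.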
3.3 Tokenization

All sample questions (Q1-Q10) end either with a question mark or a period. If the punctuation mark is not separated from the last word, the NLI would have to search for a match for the token ‘Basterds”?’ (Q1) instead of ‘Basterds.’ Without further processing, the NLI will not find any full matches. Depending on the task to solve, the tokenization process can either split on punctuation marks or delete them. Either way, there are some scenarios to think about. For example, decimals should neither be split on the punctuation mark nor should they be removed. Consider the example ‘All movies with a rating higher than 7.5’ (similar to Q2). If the dot between 7 and 5 is removed, the result would be completely different. Other NLP technologies, for example dependency parsing, can also depend on punctuation marks.
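A minimal regular-expression tokenizer along these lines; the pattern is an illustrative assumption, not the tokenizer of any surveyed system:

```python
import re

# Split on whitespace and punctuation, but keep decimals such as '7.5' intact
# and emit punctuation marks as separate tokens instead of deleting them.
TOKEN_PATTERN = re.compile(r"\d+\.\d+|\w+|[^\w\s]")

def tokenize(text):
    return TOKEN_PATTERN.findall(text)

print(tokenize("All movies with a rating higher than 7.5"))
# ['All', 'movies', 'with', 'a', 'rating', 'higher', 'than', '7.5']
print(tokenize('Who is the director of "Inglourious Basterds"?'))
# ['Who', 'is', 'the', 'director', 'of', '"', 'Inglourious', 'Basterds', '"', '?']
```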
3.4 Part of speech tagging
3.5 Stemming/lemmatization
A typical stemming algorithm contains, for example, a rule ‘ies \(\rightarrow \) i,’ which means that the suffix ‘ies’ will be reduced to ‘i.’ This is needed for words like ‘ponies,’ which are reduced to ‘poni.’ In addition, there is a rule ‘y \(\rightarrow \) i,’ which ensures that ‘pony’ is also reduced to ‘poni.’ In the sample world, stemming can be used to ensure that the words ‘directors,’ ‘director,’ ‘directing’ and ‘directed’ can all be used to find the table Director, because they are all reduced to the same stem ‘direct.’ The disadvantage of stemming is that a stem does not only cover words with a similar meaning. For example, the adjective ‘direct’ would be reduced to the same stem as ‘director,’ but the meaning differs. An example question could be ‘Which movie has a direct interaction scene between Brad Pitt and Jessica Alba?,’ where the word ‘direct’ has nothing to do with the director of the movie. In general, stemming increases recall but harms precision.
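The Porter stemmer shipped with NLTK implements rules of this kind; a short, runnable demonstration on the examples above:

```python
# Demonstrating the rules above with NLTK's Porter stemmer (pip install nltk).
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["ponies", "pony", "directing", "directed", "direct"]:
    print(word, "->", stemmer.stem(word))
# ponies -> poni, pony -> poni, directing -> direct, directed -> direct,
# direct -> direct   (the adjective collides with the verb stem)
```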
3.6 Parsing

4 Limitations
5 Recently developed NLIs
Keyword-based systems, for example, are not designed for complex questions like Q7 (‘What was the best movie of each genre?’); the main advantage of this approach is its simplicity and adaptability.

5.1 Keyword-based systems

For the question ‘What was the best movie of each genre?’ (Q7), the ‘keyword-only version’ would be something like ‘best movie genre,’ which is more likely to be interpreted as ‘the genre of the best movie.’ If the users wrote the question as ‘best movie by genre,’ a keyword-based NLI would either try to look up the token ‘by’ in the base and metadata or classify ‘by’ as a stop word and ignore it.

5.1.1 SODA (Search Over DAta warehouse)
SODA supports concepts defined in the metadata, so that users can ask questions like ‘all great movies’ (Q6) without having to specify what a great movie is. For our sample question (Q1), the input question for SODA could be: ‘director Inglourious Basterds.’
For Q1, this means that the keyword ‘director’ can be found in the inverted index of the metadata, matching either the table name Director or the attribute names Director.directorId and Directing.directorId (Fig. 7: red). The keyword ‘Inglourious Basterds’ is only found in the inverted index of the base data, as a value of the attribute Movie.Title (Fig. 7: green). This leads to three different solution sets for the next steps: {Directing.directorId, Movie.Title}, {Director.directorId, Movie.Title} and {Director, Movie.Title}.
The solution set {Director, Movie.Title} receives the highest score, because the table name Director is a full match and not only a fuzzy match as in directorId. Afterward, only the best n solutions are provided to the next step. Director and Movie correspond to the different entry points. An entry point is a node in the metadata graph. The table Director is a child of the table Person (ISA-relationship); therefore, SODA includes the table Person in the solution. To link the table Movie to the other two tables, it is necessary to add more tables to the solution. The closest link is through the table Directing (see Fig. 7), and therefore this table is included.
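The first lookup step can be pictured with two toy inverted indexes over the sample world. Everything below (index contents, function name) is an illustrative assumption, not SODA's actual implementation:

```python
# A toy sketch of the keyword lookup: each keyword is searched in inverted
# indexes over the metadata (table/attribute names) and the base data (values).
METADATA_INDEX = {
    "director": ["Director", "Director.directorId", "Directing.directorId"],
    "movie":    ["Movie", "Movie.Title"],
}
BASE_DATA_INDEX = {
    "inglourious basterds": ["Movie.Title"],
    "brad pitt":            ["Person.FirstName/LastName"],
}

def lookup(keywords):
    """Map every keyword to the schema elements it may refer to."""
    return {kw: METADATA_INDEX.get(kw.lower(), []) +
                BASE_DATA_INDEX.get(kw.lower(), [])
            for kw in keywords}

print(lookup(["director", "Inglourious Basterds"]))
# {'director': ['Director', 'Director.directorId', 'Directing.directorId'],
#  'Inglourious Basterds': ['Movie.Title']}
```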
A weakness of SODA is that comparison operators have to be written as symbols, for example ‘>’ instead of ‘higher than’ (Q2). Moreover, SODA uses a very strict syntax for aggregation operators. For example, to retrieve the number of movies per year, the input question needs to be written as ‘select count (movie) group by (year).’ These patterns are useful, but they are not natural language. Furthermore, there is no lemmatization, stemming or any other preprocessing of the input question, which can lead to problems with words used in the plural. For example, the input question ‘all movies’ would not detect the table Movie, but the input question ‘all movie’ would display the expected result.

5.1.2 NLP-Reduce
NLP-Reduce cannot answer questions with aggregations like ‘What was the best movie of each genre?’ (Q7), because it removes the token ‘of each’ as a stop word. Furthermore, stemming helps the user to formulate questions like ‘all movies,’ which is more natural than the ‘all movie’ required by SODA.

5.1.3 Précis
Précis supports keyword queries that can be combined with the Boolean operators AND, OR and NOT. For example, the input question ‘Show me all drama and comedy movies.’ (Q5) would be formulated as ‘“drama” OR “comedy”.’ The answer is an entire multi-relation database, which is a logical subset of the original database. The strength of Précis is that the users can combine keywords with AND, OR and NOT to define the input question. However, the weaknesses are that this again composes a logical query language, although a simpler one. Furthermore, it can only solve Boolean questions, and the input question can only consist of terms that are located in the base data and not in the metadata. For example, the input question ‘Who is the director of “Inglourious Basterds”?’ (Q1) cannot directly be solved, because ‘director’ is the name of a table and therefore part of the metadata. There is a mechanism that adds more information to the answer (e.g., the actors, directors, etc. of a movie), but then the user would have to search for the director in the answer.

5.1.4 QUICK (QUery Intent Constructor for Keywords)
5.1.5 QUEST (QUEry generator for STructured sources)
5.1.6 SINA
5.1.7 Aqqu
During entity identification, tokens tagged as nouns (NN) and proper nouns (NNP) are not allowed to be split (e.g., ‘Brad Pitt’). In the next step, Aqqu uses three different templates which define the general relationship between the keywords. Afterward, Aqqu tries to identify the corresponding relationship. This can either be done with the help of the input question (verbs and adjectives) or with the help of ML, which can, for example, identify abstract relationships like ‘born \(\rightarrow \) birth date.’ The last step is the ranking, which is solved with ML. The best result is achieved by using a binary random forest classifier.

5.2 Pattern-based systems
Pattern-based NLIs extend keyword-based systems with trigger words (patterns) in order to answer more complex questions, such as those involving concepts (Q6) or aggregations (Q7). For example, the question ‘What was the best movie of each genre?’ (Q7) cannot be formulated with keywords only. It needs at least some linking phrase between ‘best movie’ and ‘genre’ which indicates the aggregation. This could be done with the non-keyword token (trigger word) ‘by’ for the aggregation, which indicates that the right side includes the keywords for the group by clause and the left side the keywords for the select clause. The difficulty with trigger words is to find every possible synonym allowed by natural language. For example, an aggregation could be implied with the word ‘by’ but also with ‘of each’ (compare Q7).

5.2.1 NLQ/A
NLP technologies such as parse trees could help to answer complex questions (e.g., Q9), but if the parse tree is wrong, the system will fail to translate even simpler questions. Instead, NLQ/A lets the users resolve all ambiguity problems, also those which could be solved with PoS tagging or parse trees. To avoid needing too many interaction steps, NLQ/A provides an efficient greedy approach for the interaction process. For our sample question (Q1), the input question could be: ‘Who is the director of “Inglourious Basterds”?’ In the first step, all 1:n-grams of the question are generated as phrases; phrases starting with prepositions are discarded. After stop word removal, the input question Q1 would become ‘director of Inglourious Basterds.’ If n is set to 2, the extracted phrases would be: {‘director,’ ‘director of,’ ‘Inglourious,’ ‘Inglourious Basterds,’ ‘Basterds’}. Next, the phrases are extended according to a synonym dictionary. For example, the phrase ‘starring’ would be extended with the phrase ‘playing.’ Those extended phrases are mapped to the knowledge graph based on string similarity (edit distance). For one extended phrase, there can be multiple candidate mappings.
For Q1, if the users select ‘Director’ as the candidate in step 3, the system would find the path shown in Fig. 9. ‘Inglourious Basterds’ is also a candidate, but it is not selected by the users because there is no ambiguity to resolve.
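A minimal sketch of the phrase-generation step described above; the stop-word list, the preposition list and the removal of quotation marks are assumptions made for this example:

```python
# A minimal sketch of NLQ/A-style phrase extraction: generate all 1..n-grams
# after stop word removal and discard phrases that start with a preposition.
STOP_WORDS = {"who", "is", "the"}
PREPOSITIONS = {"of", "in", "from", "with", "by"}

def extract_phrases(question, n=2):
    cleaned = question.replace('"', "").replace("?", "")
    tokens = [t for t in cleaned.split() if t.lower() not in STOP_WORDS]
    phrases = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            gram = tokens[i:i + size]
            if gram[0].lower() in PREPOSITIONS:
                continue
            phrases.append(" ".join(gram))
    return phrases

print(extract_phrases('Who is the director of "Inglourious Basterds"?'))
# ['director', 'Inglourious', 'Basterds', 'director of', 'Inglourious Basterds']
```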
5.2.2 QuestIO (QUESTion-based Interface to Ontologies)
For example, the attribute Movie.ReleaseDate would be extracted as ‘Release Date,’ which is a human-understandable label. In contrast, the attribute Movie.OriginalLang would result in ‘Original Lang,’ where the token ‘Lang’ is a shortened version of ‘Language’ and is not human-understandable.

5.3 Parsing-based systems

Parsing-based NLIs make use of the grammatical structure of the input question. For example, nominal modifier dependencies (nmod) could be used to identify aggregations.

5.3.1 ATHENA
For our sample question (Q1), the input question for ATHENA could be: ‘Who is the director of “Inglourious Basterds”?’ ATHENA detects time ranges (e.g., ‘from 2000 until 2010’ in Q3) with the TIMEX annotator. Those time ranges are then matched to the ontology properties with the corresponding data type. In Q1, there is a dependency between the tokens ‘director’ and ‘Inglourious Basterds,’ indicated by the token ‘of.’ For Q1, the metadata annotation will detect three different matches for ‘director,’ namely the table name Director and the attribute names Director.directorId and Directing.directorId (Fig. 10: red). The translation index will find a match for the bi-gram ‘Inglourious Basterds,’ corresponding to the attribute Movie.Title (Fig. 10: green).
This leads to three different interpretations: {Directing.directorId, Movie.Title}, {Director.directorId, Movie.Title} and {Director, Movie.Title}. Each interpretation is represented by a set of interpretation trees. An interpretation tree (iTree) is a subtree of the ontology. Each iTree must satisfy several constraints:

- The ontology elements Director and Movie need to be connected, for example via the relation Directing. The attribute Title needs to be connected with the corresponding concept (in this case the table) Movie.
- Person is not allowed to inherit Role from Actor. The other direction is allowed, such that Actor inherits FirstName and LastName from Person.
- Tokens like ‘starring’ (Q3) imply a relationship constraint between the ontology elements Movie, Starring and Person. Those three ontology elements need to be connected. Accordingly, in this example, the ontology element Actor needs to be included.

For Q1, the best-ranked interpretation is {Director, Movie.Title}, which is extended with the ontology elements Directing and Movie. After this step, for each interpretation only one iTree is left.
The chosen interpretation is then translated into a query, clause by clause:

- from clause: Specifies all concepts found in the ontology along with their aliases. The aliases are needed, for example, if a concept occurs multiple times. For example, the input question ‘Show me all drama and comedy movies.’ (Q5) would point to Genre in the ontology twice: once for the token ‘drama’ and once for ‘comedy.’ Therefore, two aliases are needed to distinguish between them.
- group by clause: The group by clause is triggered by the word ‘by,’ and only tokens annotated with metadata in step 1.a are considered, for example in the input question ‘What was the best movie by genre?’ (modified Q7). To identify the dependencies between dependent and dependee (illustrated by the ‘by’), the Stanford Dependency Parser is used.
- select clause: There are two possible types: aggregation and display properties. The aggregation properties depend on the group by clause. The default aggregation function is sum. For the (modified) input question Q7, ATHENA would detect a group by clause, because ‘by genre’ needs an aggregation function. Assuming ATHENA can translate ‘best movie’ to mean ‘best ranked movie,’ it would apply the aggregation function max on Movie.Rating. If there are no aggregations, ATHENA uses the tokens annotated with metadata as display properties, which are shown to the user.
- order by clause: Properties used in the order by clause are indicated by tokens like ‘least,’ ‘most,’ ‘ordered by,’ ‘top’ and others. For example, the input question ‘Which movie has grossed most?’ (Q4) would trigger an order by clause for Movie.Gross because of the trigger word ‘most.’
- where clause: Tokens annotated with the translation index, time ranges or numerical expressions are used in the where clause to filter the result (e.g., the tokens ‘Inglourious Basterds’). If the filter is applied on an aggregation, a having clause is generated instead of the where clause.
The top n interpretations that ATHENA has found are translated back into full sentences in English for the users, so that the users can choose the best fitting one. However, neither negations (Q8) nor multiple elements in the group by clause (e.g., ‘What was the best movie by year and genre?’) are supported.

5.3.2 Querix
From the PoS-tagged input question, Querix extracts the sequence of the word categories noun (N), verb (V), preposition (P), wh-pronoun (Q, e.g., what, where, when, etc.) and conjunction (C). This sequence is called the query skeleton. The query skeleton is used to enrich nouns and verbs and to identify subject-property-object patterns in the query. (1) The query skeleton ‘Q-V-N-P-N’ is extracted from the input question (Q1) as ‘Who (Q) is (V) the director (N) of (P) “Inglourious Basterds” (N)?’ (2) All nouns and verbs are enriched with synonyms provided by WordNet.
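A minimal sketch of deriving the query skeleton from Penn Treebank PoS tags. The tags for Q1 are hard-coded here (as a standard tagger would produce them), and merging adjacent nouns into a single N is an assumption so that ‘Inglourious Basterds’ yields one noun:

```python
# A minimal sketch: reduce Penn Treebank PoS tags to the Querix categories
# N, V, P, Q, C and build the query skeleton for Q1.
TAG_MAP = {"WP": "Q", "WDT": "Q", "VB": "V", "VBZ": "V", "VBD": "V",
           "NN": "N", "NNS": "N", "NNP": "N", "IN": "P", "CC": "C"}

tagged_q1 = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("director", "NN"),
             ("of", "IN"), ("Inglourious", "NNP"), ("Basterds", "NNP"), ("?", ".")]

skeleton = []
for _, tag in tagged_q1:
    category = TAG_MAP.get(tag)
    if category is None:
        continue                              # determiners, punctuation, ...
    if category == "N" and skeleton and skeleton[-1] == "N":
        continue                              # merge 'Inglourious Basterds'
    skeleton.append(category)

print("-".join(skeleton))                     # Q-V-N-P-N
```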
The matching center then (1) identifies subject-property-object patterns in the input question, (2) searches for matches between nouns and verbs of the input question and the resources in the ontology (including synonyms), and (3) tries to match the results of the two previous steps. The query generator then composes SPARQL queries from the joined triplets delivered by the last step of the matching center. If there are several different solutions with the highest cost score, Querix consults the user by showing a menu from which the user can choose the intended meaning.

5.3.3 FREyA (Feedback, Refinement and Extended vocabularY Aggregation)
5.3.4 BELA
If an interpretation reaches a confidence score of 1 and the SPARQL query generated from it produces an answer with at least one result, the translation process is stopped and the answer is returned to the user. Only for ASK-questions (which have yes/no answers) does the process continue until the confidences of the interpretations start to differ; then a threshold of 0.9 is applied and an empty result (which equals a no-answer) is also accepted.

5.3.5 USI Answers
5.3.6 NaLIX (Natural Language Interface to XML)
5.3.7 NaLIR (Natural Language Interface for Relational databases)
5.3.8 BioSmart
For example, a simple query type consists of a verb (VB) followed by an object (NP). A more expressive and therefore more complex input question can be built by nesting simple query types arbitrarily.

5.4 Grammar-based systems
5.4.1 TR Discover
Consider our sample question (Q1). When the users start typing ‘p,’ TR Discover will not only suggest ‘person’ but also longer phrases like ‘person directing’ (autocomplete). After ‘person directing’ is selected (or typed), TR Discover will again suggest phrases, like ‘movies’ or even specific movies like ‘Inglourious Basterds’ (prediction). For input question Q1, the input could be ‘person directing Inglourious Basterds.’ The underlying grammar consists of grammar rules (G1-3) and lexical entries (L1-2). For the sample world (and the input question Q1), the following rules could be defined:

- G1: NP \(\rightarrow \) N
- G2: NP \(\rightarrow \) NP VP
- G3: VP \(\rightarrow \) V NP
- L1: N[TYPE=person, NUM=sg, SEM=<λx.person(x)>] \(\rightarrow \) person
- L2: V[TYPE=[person,movie,title], SEM=<λX.λx.X(λy.directMovie(y,x))>, TNS=presp] \(\rightarrow \) directing
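Ignoring the feature annotations (TYPE, SEM, TNS), the rules G1-G3 can be tried out as a small context-free grammar, for example with NLTK; the set of terminals below is an assumption for this sketch:

```python
# Parsing 'person directing Inglourious_Basterds' with the rules G1-G3,
# stripped of their feature annotations (a sketch using NLTK's CFG tools).
import nltk

grammar = nltk.CFG.fromstring("""
NP -> N | NP VP
VP -> V NP
N  -> 'person' | 'Inglourious_Basterds'
V  -> 'directing'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(["person", "directing", "Inglourious_Basterds"]):
    print(tree)
# (NP (NP (N person)) (VP (V directing) (NP (N Inglourious_Basterds))))
```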
While typing the input question (Q1), the lexical entries L1 and L2 are found and provided to the user. The token ‘person’ is parsed with the lexical entry L1 and the token ‘directing’ with the lexical entry L2. This leads to the FOL representation:

λx.person(x) \(\rightarrow \) directMovie(y,x) & type(y,Movie) & label(y, ‘Inglourious Basterds’)

How the ambiguity of the lexical entry L2 is resolved is not explained by Song et al. [52]. If there are multiple possibilities to parse the input question, the first one is chosen. The weaknesses of TR Discover are that aggregations (e.g., Q4: ‘grossed most’) cannot be used, synonyms are not properly handled, and negations only work for SPARQL.

5.4.2 Ginseng (Guided Input Natural language Search ENGine)
5.4.3 SQUALL (Semantic Query and Update High-Level Language)
For example, Q1 needs to be formulated as ‘Who is the director of Inglourious_Basterds?’

5.4.4 MEANS (MEdical question ANSwering)
MEANS handles questions with wh-pronouns as well as Boolean questions in a medical subfield, targeting the seven medical categories: problem, treatment, test, sign/symptom, drug, food and patient. For each wh-question, the Expected Answer Type (EAT) is identified and replaced with ‘ANSWER’ as a simplified form for the next step. For example, the EAT of the input question Q1 would be ‘director.’ In the next step, MEANS identifies medical entities using a Conditional Random Field (CRF) classifier and rules that map noun phrases to concepts. The following step identifies seven predefined semantic relations. The annotator is a hybrid approach based on a set of manually constructed patterns and a Support Vector Machine (SVM) classifier.

5.4.5 AskNow
For example, Q1 ‘Who is the director of “Inglourious Basterds”?’ would be matched to the NQS template [Wh][R1][D][R2][I], where [Wh] is the question word ‘Who,’ [R1] is the auxiliary relation ‘is,’ [D] is the query desire class ‘director,’ [R2] is the relation ‘of’ and [I] is the query input class ‘Inglourious Basterds.’ The type of the question is determined from the wh-type. In the next step, the query desire, query input and their relations are matched to the KB. As an example, Spotlight can be used for the matching to DBpedia. During the matching process, AskNow uses WordNet synonyms and a BOA pattern library (bootstrapping).

5.4.6 SPARKLIS
5.4.7 GFMed
6 Evaluation
6.1 Evaluation of 24 recently developed NLIs
If there is an example in the paper that is similar to one of our sample questions (e.g., Q1) or a clear statement written in the paper (e.g., ‘we can identify aggregations’ (Q7)), we label those questions for the system with a checkmark (✓) in Table 3. If the question needs to be asked in a strict syntactical way (e.g., SODA needs the symbol ‘>’ instead of ‘higher than’) or the answer is only partially correct (e.g., Q4 returns an ordered list of movies instead of only one), it is labeled with a triangle (▲). If there is a clear statement that something is not implemented (e.g., ATHENA does not support negations), we label it with ✗. If we were not able to conclude whether a system can or cannot answer a question based on the paper, we labeled it with a question mark in Table 3.
Keyword-based NLIs can mostly answer only simpler questions (e.g., Q1). This limitation is based on the approach of these keyword-based systems: they expect just keywords (which are mostly filters), and the systems identify relationships between them. Therefore, they do not expect any complex questions like Q4 or Q7. Pattern-based NLIs are an extension of keyword-based systems in such a way that they have a dictionary with trigger words to answer more complex questions like aggregations (Q7). However, they cannot answer questions of higher difficulty, including subqueries (Q9/Q10). For example, the difficulty with questions including subqueries is to identify which part of the input question belongs to which subquery. Trigger words are not sufficient to identify the range of each subquery.
Simple questions (e.g., Q1) can easily be asked with keywords. Both Waltinger et al. [59] (USI Answers) and Lawrence and Riezler [33] (NLmaps) describe this phenomenon and report that users prefer to ask questions with keywords if possible. They adapted their systems so that they can handle different forms of user input. Because of that, Waltinger et al. [59] (USI Answers) point out that parse trees should only be used with caution. This is similar to the approach of Zheng et al. [69] (NLQ/A), who remark that NLP technologies are not worth the risk, because wrong interpretations during processing lead to errors. Walter et al. [58] (BELA) propose a new approach of applying certain processing steps only if the question cannot be answered by using simpler mechanisms. This approach can be used to answer questions formulated as keywords or as complete sentences. Nevertheless, parse trees are useful to identify subqueries, but only in grammatically correct sentences (e.g., NaLIR). The identification of possible subqueries is necessary to answer questions like Q9 and Q10.

SQUALL, for example, requires the users to phrase questions with the vocabulary of the ontology, as in ‘movies starring Brad_Pitt.’ SPARKLIS can answer all questions (if concepts are defined) but is based on a strict user interface where the users have to ‘click their questions together’ and cannot ‘write freely.’ In contrast to those two systems, NaLIR and ATHENA are systems where the user can write without restrictions while phrasing the questions. However, NaLIR cannot handle concepts. Finally, ATHENA solves aggregations with trigger words which the users need to know. Moreover, ATHENA cannot solve multiple subqueries.

6.2 Evaluation of commercial systems
Q1 is answered directly in the featured snippet. In contrast, for question Q2 the featured snippet on top shows the top 250 drama movies, but the first result site contains the correct answer. Siri is able to answer select and filter questions, but has some trouble handling the year without a specific date in Q3. What is different from other systems is that if it cannot answer a question, Siri gives feedback to the user and tries to explain which types of questions can be answered. For most of the sample questions, we got the answer ‘Sorry, I can’t search what something is about. But I can search by title, actors or directors and categories like horror or action.’ Some of the evaluated systems cannot directly answer select questions like Q1 about the director of a given movie; however, the users can find the correct results by browsing the movie page.

7 Machine learning approaches for NLIs
8 Conclusions
Q9 and Q10 are two examples of questions composed of one and multiple subqueries, respectively. The most common NLP technology that is able to solve this problem is a parse tree. This can either be a general dependency or constituency parse tree provided, for example, by the Stanford Parser (e.g., NaLIR), or a parse tree self-learned with the rules of a grammar-based NLI (e.g., SQUALL). An alternative is the use of templates (e.g., AskNow). Li and Jagadish [35] (NaLIR) mention that the identification alone is not enough: after the identification of the subqueries, the necessary information needs to be propagated to each part of the subquery.