A. Argument annotation
A fundamental task in managing arguments is locating arguments within documents. Many supervised machine learning methods have been applied to this task; the common approach is to classify text spans as argument components or non-argument components.
Data from several sources, such as magazines, advertisements, parliamentary records, and judicial summaries, were collected and stored in a database [19]. As a continuation of this work, a software tool named Araucaria was built [20]. The software was used to analyze argumentation and represented the relations among arguments as diagrams. An initial analysis was conducted on the existing corpus [19] and continued with exploration in two areas: surface features of argumentation and the argumentation schemes employed [21].
A different investigation approached arguments in legal documents from the perspective of their rhetoric and visualization [7]. This research was based on feature extraction using 11 features; one of the feature sets comprised 286 words.
A different approach to detecting argument components combined a rule-based model with a probabilistic sequence model [9]. The goal was to identify high-level organizational elements of argumentative discourse, also known as shell language. The rule-based component was defined with 25 handwritten regular-expression patterns. Manual annotation, without a standard guideline, was performed on 170 essays by experts familiar with essay writing. The sequence model was a Conditional Random Field (CRF) using a number of general features based on lexical frequency. In the evaluation, the hybrid sequence model performed best on the task.
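The hybrid of handwritten rules and a probabilistic model can be sketched as follows. This is a minimal illustration, not the cited system: the patterns below are invented stand-ins for the 25 regular expressions, and the CRF is stubbed as a per-sentence probability score.

```python
import re

# Illustrative shell-language patterns, in the spirit of the handwritten
# regular expressions described above (the actual 25 patterns are not shown here).
SHELL_PATTERNS = [
    re.compile(r"\bin conclusion\b", re.IGNORECASE),
    re.compile(r"\bfirst(ly)?\b", re.IGNORECASE),
    re.compile(r"\bon the other hand\b", re.IGNORECASE),
    re.compile(r"\bi (strongly )?believe\b", re.IGNORECASE),
]

def rule_based_shell(sentence: str) -> bool:
    """Flag a sentence as organizational 'shell' language if any rule matches."""
    return any(p.search(sentence) for p in SHELL_PATTERNS)

def hybrid_label(sentence: str, crf_prob_shell: float, threshold: float = 0.5) -> str:
    """Combine the deterministic rules with a (stubbed) CRF probability:
    a rule match fires directly; otherwise defer to the sequence model's score."""
    if rule_based_shell(sentence):
        return "shell"
    return "shell" if crf_prob_shell >= threshold else "content"
```

In the actual work, the CRF labels token sequences with lexical-frequency features; the stub here only conveys how rule and model decisions are merged.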
Argument extraction was also applied to support public policy formulation [8]. The results were used to help policy makers observe how society reacted to a policy. Tense and mood were the main features used as argument indicators.
Using an ontology approach, 8 rules were defined to identify arguments in statements [10]. The rules were derived from the researchers' intuition and an informal examination of 9 essays. In other research, argumentation schemes were used for essay scoring [3]. That work was based on Walton's theory [2], with some adjustments, and focused on how annotation protocols for argumentative essays are designed. Annotation protocols were created for three argumentation schemes: the policy argument scheme, the causal argument scheme, and the argument-from-a-sample scheme.
From a different data perspective, researchers examined arguments in social media [22]. They started by separating the statements in the dataset into two classes, those containing an argument and those that do not, and then applied a Conditional Random Field (CRF).
Argument extraction from Greek news was also explored [23]. The technique used word embeddings extracted from a very large unannotated corpus. One of the interesting conclusions was that word embeddings contribute positively to extracting argumentative sentences.
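As a minimal illustration of how word embeddings can feed sentence-level argument detection: average the token vectors into a sentence feature and score it linearly. The two-dimensional embedding table and fixed weights below are toy assumptions; the cited work learns from embeddings trained on a large unannotated Greek corpus.

```python
import numpy as np

# Toy embedding table (illustrative values; real embeddings are learned
# from a large unannotated corpus and have hundreds of dimensions).
EMBEDDINGS = {
    "because": np.array([0.9, 0.1]),
    "therefore": np.array([0.8, 0.2]),
    "weather": np.array([0.1, 0.9]),
    "today": np.array([0.2, 0.8]),
}

def sentence_vector(tokens, dim=2):
    """Average the embeddings of known tokens: a common way to turn
    word embeddings into a fixed-size sentence feature."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def is_argumentative(tokens, weights=np.array([1.0, -1.0]), bias=0.0):
    """A linear scorer over the averaged embedding; a trained classifier
    would replace these hand-set weights."""
    return float(sentence_vector(tokens) @ weights + bias) > 0
```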
Websites contain unstructured and heterogeneous data, and argument extraction from websites has been attempted as well [24]. In that research, a gold-standard corpus of user-generated web discourse was built, along with direct evaluation using several machine learning algorithms.
Continuing from research that performed binary classification of argument components (argument or not), researchers attempted to formulate more specific categories of argumentative statements. Generally, two classes were defined: claim and premise. Beyond these, various other names and definitions exist.
A corpus labelled with claims and evidence was built by extracting argumentative statements from Wikipedia articles [25]. It has been made publicly available and tested with many approaches. One view holds that all leaves of the tree are arguments [26]: they are premises and conclusions, placed in relation to one another.
A new corpus of argumentative statements was built from persuasive essays [5]. It consisted of 90 essays labelled by 3 annotators and covered three argument components: major claim, claim, and premise. Statements not categorized as arguments were assigned to a fourth class, non-argumentative. To capture how argument components relate to one another, two relation classes were defined: support and attack.
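The annotation scheme just described can be captured in a small data model. This is our own sketch of the four component classes and two relation classes; the field and class names are illustrative, not taken from the corpus release.

```python
from dataclasses import dataclass

# The four component classes and two relation classes described above.
COMPONENT_CLASSES = {"major_claim", "claim", "premise", "non_argumentative"}
RELATION_CLASSES = {"support", "attack"}

@dataclass
class Component:
    text: str
    label: str
    def __post_init__(self):
        if self.label not in COMPONENT_CLASSES:
            raise ValueError(f"unknown component class: {self.label}")

@dataclass
class Relation:
    source: Component   # typically a premise
    target: Component   # typically a claim or the major claim
    label: str
    def __post_init__(self):
        if self.label not in RELATION_CLASSES:
            raise ValueError(f"unknown relation class: {self.label}")
```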
From the aforementioned corpus, features were then formulated so that the annotated argument components could be recognized automatically [4]. The proposed features were grouped into five categories of sub-features: structural, lexical, syntactic, indicator, and contextual. The approach achieved an accuracy of 77.3%. Other researchers took a closer look at the role of discourse markers, one of these features, in an argumentative corpus in German [27]. Across several experiments, discourse markers proved quite indicative for distinguishing claims from premises. Another study combined all previously proposed features [28]; the results were better, but the improvement was not significant.
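To make the sub-feature groups concrete, here is a toy extractor for three of the five groups (structural, lexical, indicator); syntactic and contextual features would additionally require a parser and the surrounding sentences. The specific features shown are illustrative simplifications, not the cited feature set.

```python
def extract_features(sentence, position_in_paragraph,
                     indicators=("because", "therefore", "thus")):
    """Toy instances of three of the five sub-feature groups:
    structural (position of the sentence), lexical (token count),
    and indicator (presence of a discourse-marker cue word)."""
    tokens = sentence.lower().rstrip(".!?").split()
    return {
        "structural_position": position_in_paragraph,               # structural
        "lexical_len": len(tokens),                                 # lexical
        "indicator_present": any(t in indicators for t in tokens),  # indicator
    }
```

Such a feature dictionary would then be fed to a standard classifier over the annotated components.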
Because a large and sparse feature space makes feature selection difficult, a more compact feature set was proposed [29]. Using the persuasive essay corpus, n-grams and syntactic rules could be replaced by features and constraints built from extracted argument words and domain words, significantly improving argument mining performance. After argument components were identified, post-processing with topic modelling, latent Dirichlet allocation (LDA), was used to extract argument words and domain words.
Analysis of argument categories was also enriched by contributions from fields such as debate technology and argumentation quality assessment. Given a context, automatic claim detection within a discourse was shown to be possible [30]. The technique was then developed further by applying negation detection to each detected claim [31]. Following this line of research, evidence detection in unstructured text was also conducted [32], using data from a specified context. Once claims and evidence were detected, several approaches to determining the stance of context-dependent claims were investigated [33].
Claims and evidence cannot be separated in forming arguments: a claim without evidence carries no weight. Political debates, for example, contain many claims followed by evidence supporting them. Motivated by the need for argumentation summarization, researchers built an automatic summarizer for argumentation in political debates [34]. Automatic summarization was also applied to online debate forums [35].
In addition, argument mining research was conducted on persuasive online discussions. A computational model handling both the micro and macro levels of argumentation was proposed [36]. Going further, argument generation was performed with a novel framework named CANDELA, which carries out retrieval, planning, and realization [37].
Table 1 summarizes the work on argument annotation done so far. To advance the state of the art in argument annotation research, we concentrate on deep learning methods for these argument annotation tasks.
Table 1
Current works in argument annotation
| No. | Reference | Data | Approach |
| --- | --- | --- | --- |
| 1 |  | Argumentative essays | Annotation protocols |
| 2 |  | Persuasive essays | 5 groups of sub-features |
| 3 |  | Legal documents | 11 feature sets |
| 4 |  | Greek language text | Tense and mood |
| 5 |  | Argumentative discourse | Combination of rule-based and probabilistic sequence model |
| 6 |  | 52 essays written by university students | Ontology: 8 rules |
| 7 |  | Social media | Conditional Random Fields (CRF) |
| 8 |  | Greek news | Word embeddings |
| 9 |  | Argumentative corpus in German | Discourse markers |
| 10 |  | Persuasive essays | 68 sub-features |
| 11 |  | Persuasive essays | Argument and domain words; LDA |
| 12 |  | Political debates | CDCD approach |
B. Argument analysis
To assess argument quality, not only extrinsic aspects but also intrinsic aspects must be observed. This differs from categorization, whose assessment can be done directly by observing the text (extrinsic aspects). Discourse markers, the main cue for distinguishing argumentative statements, are no longer valid for scoring argument quality; such keywords are not representative evaluators in this case.
A good argument is one that convinces the reader it is valid and strong. To address this, researchers began proposing approaches to measure argument validity. The persuasiveness of an argument can be estimated by extracting features from online forum discussions [12]: posting time and writer reputation were found to be useful metadata, while textual features performed worse than argumentation-based features. When the data are essays, argument quality can be assessed through the essay score; in addition to prompt adherence, coherence, and technical quality, argument strength can also be included when grading essays [38].
The huge number of online communities has led to debates on various issues in blogs and forums. A combination of textual entailment and argumentation theory was used to extract arguments from debates, along with their acceptability [39].
In other research, convincingness appeared as a new criterion for assessing argumentation quality [13]. Relations between arguments across a whole sequence of statements were assessed and classified to determine which argument was more convincing, yielding a list of arguments sorted by convincingness. A similar task assessed argument quality by judging whether a relation was sufficient or not [11]. Long Short-Term Memory (LSTM), a promising deep learning method for text, was modified into a Siamese network to recognize argumentation relations in persuasive essays [40]. Furthermore, a Hierarchical Attention Network (HAN) with XGBoost was applied to a similar task and indicated to be a promising method for hierarchical data [41].
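To make the convincingness-ranking step concrete, here is a minimal sketch (our own, not the cited system's model) that turns pairwise "A is more convincing than B" judgments into a sorted list by counting wins; in the cited work such judgments come from a trained classifier over argument pairs.

```python
from collections import Counter

def rank_by_convincingness(pairs):
    """Given pairwise judgments as (winner, loser) tuples, return the
    arguments sorted from most to least convincing by win count
    (ties broken alphabetically for determinism)."""
    wins = Counter()
    seen = set()
    for winner, loser in pairs:
        wins[winner] += 1
        seen.update((winner, loser))
    return sorted(seen, key=lambda a: (-wins[a], a))
```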
Table 2 summarizes the work on argument analysis done so far. Slightly differently from these works, we concentrate on deep learning methods for argument analysis tasks.
Table 2
Current works in argument analysis
| No. | Reference | Task | Approach |
| --- | --- | --- | --- |
| 1 |  | Argument sufficiency | Feature extraction |
| 2 |  | Persuasiveness level | Feature extraction |
| 3 |  | Convincingness level | Relation between arguments in one whole sequence |
| 4 |  | Argument quality | Textual features |
| 5 |  | Argument acceptability | Combination of textual entailment and argumentation theory |
| 6 |  | Argument relation | Siamese network |
| 7 |  | Argument relation | Hierarchical Attention Network (HAN) with XGBoost |