2.1 Project Athena
Athena is an EU FP7-funded project that aims to bring citizens, first responders and LEAs together to tackle crisis situations using social media and custom mobile applications. A key outcome will be a suite of prototype software tools that support citizens, first responders and LEAs in achieving these goals. These components comprise a data processing centre, a mobile application and a command and control intelligence dashboard. An initial consideration of how FCA may be applied within the Athena project was given in (Andrews et al. 2013), and here we explore that potential with a larger, concrete dataset.
The envisaged Athena system has the following workflow. The crisis information processing centre scans the social media landscape for relevant posts about the ongoing crisis; these are augmented by citizen and first responder reports made via the mobile application. The posts and reports are then filtered, analysed and aggregated, and passed to the dashboard, which is housed in the strategic command and control centre during a major crisis event. The dashboard will present the information extracted from social media through a real-time crisis map, whilst other parts of the interface will allow users to query and visualise the data to obtain better situational awareness of the current crisis. Completing the cycle from LEAs and first responders back to citizens, the dashboard will facilitate the updating of the mobile crisis map, making it accessible to citizens and first responders, who can then also track the crisis in real time.
In this paper, we restrict ourselves to presenting the workflow associated with FCA, namely the extraction of social media posts, categorisation and entity extraction, FCA and, finally, how the results can be visualised on a dashboard. By combining these components we demonstrate a novel mechanism to aggregate and display crisis-related information from social media.
Formal Concept Analysis (FCA) was proposed by Wille and Ganter in the 1990s (Ganter and Wille 1999; Wille 2005). FCA is a method for deriving a hierarchical classification of objects based on a set of binary attributes. The hierarchy begins with a concept containing the set of all objects and no attributes and filters down to a final concept which contains the set of all attributes and no objects (unless there is an object that possesses all attributes). In between there are a number of hierarchically structured groups of objects and attributes, each one known as a formal concept.
To compute a concept hierarchy, one must first have a data set consisting of objects and their attributes. Each of these attributes is described in a binary format, that is, either an object has the attribute or it does not. This means that each object corresponds to a row of a binary object-attribute matrix in which each column represents a single attribute.
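As a minimal sketch of the binary object-attribute matrix described above, the following Python snippet builds one from a small dictionary. The object names (`post_1` etc.), the attribute names and the helper `to_binary_matrix` are hypothetical and chosen only for illustration; they are not taken from the Athena data.

```python
# Hypothetical toy data: each object maps to the set of attributes it has.
objects = {
    "post_1": {"earthquake", "injury"},
    "post_2": {"earthquake"},
    "post_3": {"flood", "injury"},
}


def to_binary_matrix(objects):
    """Return (row_labels, column_labels, matrix) for a dict of object -> attribute set.

    matrix[r][c] is 1 if the object in row r has the attribute in column c, else 0.
    """
    rows = sorted(objects)
    cols = sorted(set().union(*objects.values()))
    matrix = [[1 if attr in objects[obj] else 0 for attr in cols] for obj in rows]
    return rows, cols, matrix


rows, cols, matrix = to_binary_matrix(objects)
for label, row in zip(rows, matrix):
    print(label, row)
```

Each row corresponds to one object and each column to one attribute, so the matrix is simply another encoding of the relation between objects and attributes.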
Formally, let G be a set of objects and M a set of attributes, related by a binary relation I ⊆ G × M. For a specific object g ∈ G and attribute m ∈ M, the relationship gIm means that object g has attribute m. If we then take a subset of objects A ⊆ G, the derivation operator ′ defines the set of attributes that are shared by all objects in A:
$$ A^{\prime}= \{ j \in M | \forall i \in A : iIj \}. $$
(1)
This means that, given the set of objects A, the set of attributes A′ contains all the attributes that are common to the objects in A. Similarly, for a set of attributes B ⊆ M, the derivation operator is used to define the set of objects possessing the attributes in B as follows,
$$ B^{\prime}= \{ i \in G | \forall j \in B : iIj \}. $$
(2)
That is, given the set of attributes B, the set of objects B′ is the set of all objects possessing all the attributes in B. The pair (A, B) is then considered a formal concept if A = B′ and B = A′. In this case we call A the extent and B the intent.
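The two derivation operators and the formal concept condition can be sketched in a few lines of Python. This is an illustrative sketch only: the toy context (`G`, `M`, `I`) and the function names `common_attributes`, `common_objects` and `is_formal_concept` are assumptions made for the example, with the relation I stored as a set of (object, attribute) pairs.

```python
# Hypothetical toy context: I is the set of (object, attribute) pairs with gIm.
G = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g1", "b"), ("g2", "a"), ("g3", "b"), ("g3", "c")}


def common_attributes(A):
    """A' as in Eq. (1): attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}


def common_objects(B):
    """B' as in Eq. (2): objects possessing every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}


def is_formal_concept(A, B):
    """(A, B) is a formal concept when A = B' and B = A'."""
    return common_objects(B) == set(A) and common_attributes(A) == set(B)


print(is_formal_concept({"g1", "g2"}, {"a"}))  # True
print(is_formal_concept({"g1"}, {"a"}))        # False: {"a"}' also contains g2
```

Note that for the empty set of objects the condition in Eq. (1) holds vacuously, so its derivation is the whole of M; this is what produces the bottom concept of the hierarchy.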
The computation of these concepts induces a hierarchy such that the further down the hierarchy one travels the more specialised each formal concept becomes due to the addition of further attributes to the concept. In this case the number of objects belonging to each concept will also decrease. The resulting list of concepts can be used to understand how attributes are grouped in the data and how common the appearance of that combination of attributes is.
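To illustrate how the concepts form a hierarchy, the sketch below enumerates every formal concept of a small context by closing each subset of attributes, then lists the concepts from the most general (largest extent) to the most specialised. This brute-force enumeration is exponential in the number of attributes and is given only as an illustration; it is not the algorithm used in this work, and the context and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy context: object -> set of attributes it possesses.
context = {
    "g1": {"a", "b"},
    "g2": {"a"},
    "g3": {"b", "c"},
}
attributes = sorted(set().union(*context.values()))


def extent(B):
    """Objects possessing every attribute in B."""
    return frozenset(g for g, attrs in context.items() if set(B) <= attrs)


def intent(A):
    """Attributes shared by every object in A (all attributes if A is empty)."""
    return frozenset(m for m in attributes if all(m in context[g] for g in A))


def all_concepts():
    """Enumerate formal concepts naively by closing every attribute subset."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for B in combinations(attributes, r):
            A = extent(B)
            concepts.add((A, intent(A)))
    # Most general concepts (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0])))


for A, B in all_concepts():
    print(sorted(A), sorted(B))
```

In the printed list, extents shrink as intents grow, which mirrors the specialisation that occurs when moving down the hierarchy.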
In the context of our crisis application, G would represent the set of all tweets and M would represent the set of all attributes (as will be defined by our taxonomy). These attributes include the set of all categories as well as the entities we will extract. For example, B could be the attributes earthquake and avalanche, and B′ would then be all of the tweets that contain both of those attributes.
FCA has been applied in a number of different application areas. One of its original applications was to software maintenance and identifying classes in object-oriented software (Tilley et al. 2005), but it has also been used to identify software faults (Cellier et al. 2008) and to identify which developer should be assigned to fix a software bug (Wermelinger et al. 2009).
Text processing is another typical application of FCA, including analysis of text corpora in tourism, finance and real estate adverts (Cimiano et al. 2005; Cole and Eklund 2001) as well as knowledge discovery from databases, document and email repositories (Poelmans et al. 2010; Stumme et al. 1998; Cole et al. 2003), content-based retrieval (Jay et al. 2008), exploration of social and web communities (Rome and Haralick 2005; Jay et al. 2008), uses in the semantic web domain (d’Aquin and Motta 2011; Beydoun 2009) and as part of a collaborative recommendation system (du Boucher-Ryan and Bridge 2006).
FCA has also been used to explore biological data. These applications include the analysis of gene expression data (Kaytoue et al. 2009, 2011), the identification of biomarkers for breast cancer (Motameny et al. 2008), enzyme classification (Coste et al. 2014) and the tracking of ecological traits within particular species (Bertaux et al. 2009). FCA has also been used in the detection of organised crime threats (Brewster et al. 2014), incidents of domestic violence (Poelmans et al. 2009) and terrorist threats (Elzinga et al. 2010).
Here we will use FCA to classify and interpret crisis-related data. The next section looks at the construction of two crisis taxonomies, which provide the basis of the attributes used in our FCA application.