Open Access 10.04.2024

A Perceived Risk Index Leveraging Social Media Data: Assessing Severity of Fire on Microblogging

Authors: Carmen De Maio, Giuseppe Fenza, Mariacristina Gallo, Vincenzo Loia, Alberto Volpe

Published in: Cognitive Computation | Issue 5/2024


Abstract

Fires represent a significant threat to the environment, infrastructure, and human safety, often spreading rapidly with wide-ranging consequences such as economic losses and life risks. Early detection and swift response to fire outbreaks are crucial to mitigating their impact. While satellite-based monitoring is effective, it may miss brief or indoor fires. This paper introduces a novel Perceived Risk Index (PRI) that, complementing satellite data, leverages social media data to provide insights into the severity of fire events. In light of the results of statistical analysis, the PRI incorporates the number of fire-related tweets and the associated emotional expressions to gauge the perceived risk. The index’s evaluation involves the development of a comprehensive system that collects, classifies, annotates, and correlates social media posts with satellite data, presenting the findings in an interactive dashboard. Experimental results using diverse datasets of real-fire tweets demonstrate an average best correlation of 77% between PRI and the brightness values of fires detected by satellites. This correlation extends to the real intensity of the corresponding fires, showcasing the potential of social media platforms in furnishing information for emergency response and decision-making. The proposed PRI proves to be a valuable tool for ongoing monitoring efforts, having the potential to capture data on fires missed by satellites. This contributes to the development of more effective strategies for mitigating the environmental, infrastructural, and safety impacts of fire events.
Notes
Giuseppe Fenza, Mariacristina Gallo, Vincenzo Loia, and Alberto Volpe contributed equally to this work.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

The latest reports published by the Congressional Research Service [1] highlight that, since 2000, the United States area monitored by the National Interagency Coordination Center (NICC) has been impacted by an average of 70,025 wildfires a year, which have damaged about 7 million acres annually. The report highlights that although there were more fires per year on average in the 1990s, these fires were generally smaller, and the amount of land burned was half the current annual average. Beyond fire events, recent history demonstrates a significant increase in environmental catastrophes due to illicit activities, climate change, or fatalities. Regardless of their causes, catastrophic events pose a huge threat to the environment, people, and infrastructure [2]. Detecting them as early as possible can therefore limit damage [3].
In this context, social media such as Twitter are continuously updated and can paint a detailed picture of past and current events in the location of interest; therefore, they are considered a powerful source of information [4]. These and other characteristics of Web 2.0 and open sources are the key pillars behind the second generation of Open-Source Intelligence (OSINT) [5] and the reasons behind the diffusion of OSINT in both public and private sectors such as defense, marketing, and due diligence.
Existing literature shows how the OSINT cycle can be applied to collect and analyze text and multimedia content from the web and social media [6] to manage dangerous situations [7, 8]. In this sense, monitoring open sources and extracting the related actionable and valuable knowledge turns classical, vertical processes into new ones that can support the expert in a critical context. In the environmental domain, open sources such as social media and micro-blogging could constitute valuable input to the fire monitoring process [9, 10]. In particular, identifying and reporting posts related to ongoing and still unknown fire events could increase monitoring coverage, which is usually human-based or, depending on both financial availability and current weather conditions, satellite-based.
As for satellite-based monitoring, both private and publicly available platforms allow the domain expert to locate active emergencies all over the world. However, these platforms share the inherent weakness of any satellite-based service: they provide information only when and where the satellite passes overhead, and only under suitable weather conditions, leaving many emergencies without the information needed to make high-impact and high-risk decisions. Satellite monitoring does not ensure comprehensive detection of all fires or real-time access to their details. Fires that start and are extinguished between two satellite observations can go unrecorded. Factors such as cloud cover, dense smoke, or the presence of a tree canopy can obscure a fire entirely, making it undetectable. Furthermore, fires that are too small or insufficiently hot may elude registration through satellite-based monitoring. Therefore, it is important to use a multi-domain approach that leverages the strengths of both OSINT and GEOINT (Geospatial Intelligence), for monitoring open sources and satellite imagery respectively, and that cross-relates information from different sources to achieve reliable detection of fire events [11, 12].
This paper proposes a fire indicator, the Perceived Risk Index (PRI), which leverages the cognitive process behind people’s risk perception [13] on social media [14] to detect, locate, and monitor active fires. In particular, PRI considers the number of alerts in the same geographical zone and the intensity of expressed emotion to give monitoring experts a measure of the perceived intensity of an ongoing fire. The index is assessed through a system that collects, classifies, annotates, and correlates posts with satellite data. The system gathers information from social media posts, classifies it to filter out irrelevant posts, and applies Natural Language Processing (NLP) techniques to extract knowledge in terms of contents, time, and geographical localization. Additionally, other types of information are collected by cross-relating open-source data, particularly satellite observations. The obtained fire reports are stored in a full-text index implemented with Apache Solr and visually exposed through a dashboard to help experts detect and monitor ongoing fire events.
The main contributions of the work can be summarized as follows:
  • Definition of a Perceived Risk Index (PRI) measuring risk related to ongoing fires leveraging microblogging contents and expressed emotions.
  • Assessing PRI reliability by correlating its distribution during real fires with satellite fire intensity distributions on different datasets.
  • A monitoring framework able to collect, classify, annotate, correlate posts with satellite data, and summarize detected alerts in an interactive dashboard.
The rest of the paper is organized as follows: the “Related Work” section analyzes the state-of-the-art on fire detection and techniques involved in the process; the “Overall Process” section describes the proposed approach. Experimentation is described in the “Experimentation” section; and some work limitations are described in the “Limitations” section. The “Conclusions” section concludes the work.

Related Work

Existing scientific literature has amply explored the relationship between open sources and key domains like economy and finance [15, 16], public safety [17, 18], and environmental monitoring [19]. In the domain of interest, in particular, proposed methodologies cover all relevant categories, such as air pollution, floods, and fires. For example, in 2019, Gurajala et al. [20] collected two years of tweets from Paris, London, and New Delhi to analyze the societal response to air quality. In particular, leveraging Natural Language Processing, topic modeling, and three different text classifiers, they demonstrated that the number of tweets related to concerns about air quality degradation is highly correlated with PM values.
The literature also highlights that locating environmental events via social activity without other information is feasible, although it requires further refinement to achieve optimal results [21]. Therefore, an assessment phase is required to validate social data with classical insights and vice-versa. The main assessment processes leverage data from multiple sources and remote sensing by fusing them to compute a final index useful to the expert. This information can subsequently be utilized to direct remote sensing data collection (e.g., via satellites) for a more comprehensive analysis while managing the crisis. Studies have demonstrated that utilizing multiple modes or cross-modal learning can significantly improve the results compared to using only one data type [22, 23]. In particular, Pramanik et al. [24] state that online social media ubiquity can be considered a “sensor” and can be used to extend coverage where there are no Continuous Ambient Air Quality Monitoring Stations (CAAQMSs). Nevertheless, they fuse influential user tweets with CAAQMS data to create a crowd-sensed air quality measurement framework required to raise awareness and support the activities of boards. Through these sensors, the authors define the “influential users” who express their reactions, sentiments, and opinions towards pollution levels and, by tracking their tweets, estimate air quality in urban areas of developing regions. Kumbalaparambi et al. [25] relate tweets and their enrichment to the PM2.5 concentration. In particular, starting from a word cloud of expressed emotions for each season, they identify the main tokens used when talking about air pollution issues. Then, the authors use a self-attention mechanism to categorize tweets that discuss air quality issues into three air pollution classes (poor, good, and noise-neutral); finally, they define a BiLSTM to estimate the PM2.5 concentration. The BiLSTM is trained on collected tweets and the related signals produced by CAAQMSs.
Sadiq et al. [26] highlight how, during flood events, damaged infrastructure cannot always be detected using remote sensing, and they leverage social sensing, such as microblogging activity, as a source of useful information. They also fuse remote and social sensing data to derive informed flood extent maps. Khan et al. [27] studied the quality of social media data to understand whether it constitutes a reliable alternative in the absence of authoritative and official data related to flooding scenarios, focusing on media content like images and video. Social media data is fused with official rainfall data to assess the validity of tweet statements and identify the following three types of signals: (i) confirmatory signals, which imply a high level of confidence that a region is flooded; (ii) complementary signals, which provide contextual information such as needs and requests, disaster impact, or damage; and (iii) novel signals, when the two data sources do not overlap and provide a unique set of data points. Liu et al. [28] designed and tested six different computational and spatiotemporal analytical approaches to assess the relevance of risk information extracted from tweets and applied them to the 2013 Colorado flood event.
In a fire monitoring scenario, the goal is to detect a fire event as early as possible in order to activate the disaster management process that can save lives, protect the environment, and assess damages. In this sense, social media are adopted for estimating a level of fire risk from the point of view of population and ecological system vulnerability. Loureiro et al. [9], leveraging Natural Language Processing (NLP) and sentiment analysis, related social media posts about wildfire with political, economic, and welfare perceptions. The approach defines a hedonometer estimating how sentiments about wildfires vary with exposure, measured via Euclidean distance between the event of interest and air quality. Yue et al. [10] propose a proof of concept that uses geo-tagged social media-derived data to evaluate wildfire hazard and social-ecological vulnerability, with the final goal of identifying the most vulnerable area. The researchers conclude that (i) geo-tagged social media data are useful for disaster risk studies and (ii) massive and vulnerable populations might result in a significant increase in wildfire risk perception. The CASPER (Category and Sentiment-based Problem Finder) system [29] detects wildfires by tracking the sentiment expressed in social media posts.
Some studies expand text comprehension through transformer models [30]. This is the case for the work in [31], where a BERT-based classifier recognizes fire-related tweets obtained via a query-based crawler and signals the alarm only in case of a true positive. Although Ningsih and Hadiana [32] note that it is not always apparent whether a person’s words announce a catastrophe, and that detecting disasters in tweets is often difficult due to the uncertain structure of tweet language, a vast number of recent methodologies leverage a classifier for disaster tweets to support disaster management, rescue, and emergency responders in spreading information during disasters and situations of need [33–35].
This paper proposes a Perceived Risk Index based on tweet information that intends to give an overall idea of the seriousness of an ongoing fire event when more official data are unavailable. The proposed index leverages recent developments in Natural Language Processing (i.e., Transformer-based classifiers) to identify relevant tweets and analyze them in terms of geo-location and expressed emotions.

Overall Process

The proposed solution constructs a monitoring framework that gives experts awareness of ongoing fires by cross-relating open-source information coming from satellite and social media. The system acquires and extracts information and processes data to provide experts, through an interactive dashboard, with a summary of potentially dangerous situations regarding fire events. Finally, an assessment of a Perceived Risk Index, derived from the posted tweets, offers experts an indicator of event severity.
The process, outlined in Fig. 1, consists of the following steps:
1. Twitter Crawler: Employing the Twitter API, a crawler retrieves tweets based on a user-specified search query (e.g., “fire”).
2. Fire Tweet Classifier: In this phase, incoming tweets are classified to filter relevant ones (i.e., tweets actually reporting a fire) through an ad-hoc model fine-tuned during the experimentation of the proposal.
3. Tweet Annotation: During this phase, relevant tweets are automatically processed to extract additional information regarding the place and date of the fire and the emotions expressed in the text. In particular, state-of-the-art neural networks are exploited for Named Entity Recognition, a fine-tuned RoBERTa model extracts emojis and their scores from tweet content, and GeoNames is adopted to geolocate the warning in the tweet.
4. Fire Detection: The system constantly retrieves additional information from distributors of active fire data, exploiting satellite sensors.
5. Fire Reports Database: All collected tweets and their metadata are stored within the Fire Reports database.
6. Perceived Risk Index: This stage, involving geographic data and specific time intervals, exploits matching tweets to evaluate a Perceived Risk Index. It measures people’s perceptions about active fires, giving experts an idea of the seriousness of fire from the users’ point of view.
7. Interactive Fire Dashboard: This component allows experts to monitor specific areas or at-risk situations in a user-friendly, interactive interface.

Twitter Crawler

The crawler accesses the Twitter API, which enables programmatic access to Twitter, through the Python library Tweepy and uses it to request tweets in real time. The API also returns additional data such as retweets, replies, likes, and special contents (e.g., images) for any tweet the query finds.

Fire Tweet Classifier

For the Fire Tweet Classifier construction, a language model is fine-tuned on a fire dataset. The objective is to construct a classifier that distinguishes between generic tweets and tweets reporting fires. A bert-base-uncased model [36] is trained and tested on three wildfire datasets from the “Disaster Tweet Corpus 2020”. In particular, training and test sets were constructed (70% and 30% of the data, respectively) by randomly selecting tweets contained in the following datasets:
  • Wildfire-australia-2013
  • Wildfire-california-2014
  • Wildfire-colorado-2012
The Disaster Tweet Corpus 2020 dataset consists of tweets collected during 48 disasters over 10 disaster types, with human annotations denoting if a tweet is related to such disaster or not [37]. In particular, datasets contain 6440 tweets, of which one-half refers to fires and the other does not.
Training used the following hyperparameters: batch size of 16; learning rate of \(5 \times 10^{-5}\); AdamW optimizer; 2 epochs.

Tweet Annotation

Tweet Annotation aims to extract useful information from collected tweets, such as geo-localization, time, and emotions. The date is taken from the creation timestamp of the tweet. The location can be obtained in two ways: (i) via the geo-tag (attached coordinates or place identifier), when available, and (ii) by searching for places mentioned in the text through a NER (Named Entity Recognition) process. The locations mentioned in the text undergo geocoding to obtain the corresponding coordinates. If no place is found in a tweet, it is discarded.
As mentioned, the Tweet Annotation subphase passes through a Natural Language Processing pipeline that extracts the location mentioned within the tweet content and the associated expressed emotions. For the first goal, the adopted Python library is Stanza, a collection of tools for the linguistic analysis of many human languages [38]. Named entities of interest for the proposed system are locations (e.g., addresses, cities, counties) and GeoPolitical Entities (GPE) such as States. Whenever the NER extracts more than one geographical entity, the corresponding tweet is discarded. Moreover, for each extracted entity, a call to the GeoNames Webservice (through the GeoPy library) returns its coordinates. Given a location name, GeoNames searches for it and returns a detailed data structure containing geographical information (e.g., latitude and longitude). If GeoNames returns multiple locations for a query, only the first result (the most likely) is considered.
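The location rules described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `resolve_location`, the entity labels `"LOC"` and `"GPE"`, and the `geocode` callable (a stand-in for the GeoNames lookup) are assumptions introduced only for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

# Assumed label names for locations and geopolitical entities.
GEO_LABELS = {"LOC", "GPE"}


def resolve_location(
    geo_tag: Optional[Coordinates],
    entities: Sequence[Tuple[str, str]],
    geocode: Callable[[str], List[Coordinates]],
) -> Optional[Coordinates]:
    """Return coordinates for a tweet, or None if the tweet must be discarded.

    geo_tag  : coordinates attached to the tweet, if any.
    entities : (text, label) pairs produced by a NER step.
    geocode  : hypothetical lookup returning candidate coordinates,
               ordered from most to least likely.
    """
    # (i) Prefer the geo-tag when the tweet carries one.
    if geo_tag is not None:
        return geo_tag

    # (ii) Otherwise rely on places mentioned in the text.
    places = [text for text, label in entities if label in GEO_LABELS]

    # No place, or more than one geographical entity: discard the tweet.
    if len(places) != 1:
        return None

    # Keep only the first (most likely) geocoding result.
    candidates = geocode(places[0])
    return candidates[0] if candidates else None


def fake_geocode(name: str) -> List[Coordinates]:
    """Toy gazetteer used only to exercise the sketch."""
    table = {"Arizona": [(34.5, -111.5), (31.0, -92.7)]}
    return table.get(name, [])


print(resolve_location(None, [("Arizona", "GPE")], fake_geocode))
```

In a real pipeline, `entities` would come from the NER step and `geocode` from the geocoding service; the toy gazetteer only makes the sketch self-contained.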
The extraction of emotions from tweets exploits the emoji prediction implemented by the twitter-roberta-base-emoji transformer model. The model predicts 20 emojis and their scores (in the range \([0, 1]\)) [39]. Predicted emojis and their scores, particularly that of the fire emoji, are added to stored reports. The framework uses the fire emotional score to measure the user’s perceived emotion associated with the fire event.

Fire Detection

The Fire Information for Resource Management System (FIRMS) collects active fire data. FIRMS distributes Near Real-Time (NRT) active fire data from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Aqua and Terra satellites and the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard S-NPP and NOAA 20 (formerly known as JPSS-1). Globally, these data are available within 3 h of satellite observation, but active fire detection is available in real time for the US and Canada. Data collected by FIRMS, available to download in a structured format, contain information such as latitude, longitude, brightness, satellite, instrument, and acquisition date.
Requests to FIRMS are made through the official API. Decoupled from the tweet retrieval, the system repeats the satellite data collection every 30 min. Then, new satellite observations are matched with stored fire reports (not yet validated) by place and time. The geographical match exploits Eq. 1, while the time match compares days. When a match is found, the corresponding fire report (i.e., the set of tweets referencing the same fire event) is considered “validated”, and the level of intensity of the fire (i.e., the reported brightness) is associated with it. In particular, items corresponding to newly validated tweets are updated, and the corresponding brightness is stored.
The matching between places is made by evaluating the distance between their coordinates using the Haversine formula [40]. It is a mathematical formula used to calculate the distance between two points on the surface of a sphere, given their latitudes and longitudes. It is commonly used in navigation and geolocation applications, especially to calculate distances on the Earth. Its formal definition follows:
$$\begin{aligned} hav(\Theta ) &= hav(\phi _2 - \phi _1) + \cos (\phi _1)\cos (\phi _2)\,hav(\lambda _2- \lambda _1)\\ hav(\Theta ) &= \sin ^2\left( \frac{\Theta }{2}\right) = \frac{1-\cos (\Theta )}{2} \end{aligned}$$
(1)
where
  • \(\phi _1\) and \(\phi _2\) are latitudes of the first and second point, respectively;
  • \(\lambda _1\) and \(\lambda _2\) are longitudes of the first and second point, respectively;
  • \(\Theta\) is the central angle formed by the two points and the center of the Earth.
Regarding time, tweets and satellite observations match when the reference date is the same, regardless of time.
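As an illustration of the matching step, the following Python sketch computes the Haversine distance of Eq. 1 and combines it with the same-day rule. The Earth radius of 6371 km, the function names, and the default 150 km limit (borrowed from the worked example in the “Datasets” section) are assumptions, not values confirmed as the system's configuration.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # hav(Theta) = hav(phi2 - phi1) + cos(phi1) cos(phi2) hav(lambda2 - lambda1)
    hav_theta = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )

    # Invert hav(Theta) = sin^2(Theta / 2) to recover the central angle.
    theta = 2 * math.asin(math.sqrt(hav_theta))
    return EARTH_RADIUS_KM * theta


def is_match(report_coords, report_day: date, fire_coords, fire_day: date,
             max_km: float = 150.0) -> bool:
    """A report matches a satellite detection on the same day within max_km."""
    same_day = report_day == fire_day
    close_enough = haversine_km(*report_coords, *fire_coords) <= max_km
    return same_day and close_enough


# Coordinates taken from the Arizona example in the "Datasets" section.
distance = haversine_km(34.5, -111.5, 33.3, -111.6)
matched = is_match((34.5, -111.5), date(2022, 6, 14),
                   (33.3, -111.6), date(2022, 6, 14))
print(round(distance, 1), matched)
```

With these inputs the two points fall within the assumed 150 km limit, so the report would be marked as validated.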

Fire Reports Database

Every tweet and its extracted fire information are stored in an Apache Solr index. Apache Solr is an open-source enterprise-search platform with a REST-like API. Solr stores documents in structures called cores. Every core has its own schema, which defines the data type of every field as well as indexing and querying functionalities. The core used for the proposed system stores documents consisting of the following fields:
  • id: Solr identifier for the specific document (representing a tweet); it is unique and automatically generated at document creation;
  • \(id\_tweet\): identifier of the tweet associated with the document;
  • text: text content of the tweet;
  • user: author of tweet;
  • \(retweet\_count\): number of retweets for the given tweet;
  • \(favorite\_count\): number of likes expressed for the given tweet;
  • \(retweeted\_tweet\): identifier of the retweeted tweet (if the given tweet is a retweet);
  • entities: entities extracted by the NLP from the tweet content;
  • emotions: type of emotions and their scores extracted from the tweet text and expressed as follows: \(emotionType_1: score_1\), \(emotionType_2: score_2\), \(\dots\);
  • coordinates: geolocalization of the tweet (extracted from the tweet metadata or its text), expressed in the “Latitude, Longitude” format;
  • date: tweet creation timestamp;
  • bright: level of brightness (i.e., intensity) of fire detected by the satellite on the same date and place;
  • firms: boolean value stating if a satellite detection has validated the report (i.e., the satellite has also detected the fire event).
The store component containing fire reports has been implemented through the Python library pysolr.
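To make the document layout concrete, the sketch below builds a dictionary with the fields listed above. It deliberately avoids any Solr client call; the helper names, the ISO 8601 date string, and the choice to leave out `bright` until a satellite match exists are assumptions made for illustration only.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


def build_report_document(
    id_tweet: str,
    text: str,
    user: str,
    coordinates: Tuple[float, float],
    created_at: datetime,
    emotions: Dict[str, float],
    entities: List[str],
    retweet_count: int = 0,
    favorite_count: int = 0,
    retweeted_tweet: Optional[str] = None,
) -> dict:
    """Assemble one fire-report document (the Solr `id` is left to the index)."""
    lat, lon = coordinates
    doc = {
        "id_tweet": id_tweet,
        "text": text,
        "user": user,
        "retweet_count": retweet_count,
        "favorite_count": favorite_count,
        "entities": entities,
        # "emotionType_1: score_1, emotionType_2: score_2, ..."
        "emotions": ", ".join(f"{name}: {score}" for name, score in emotions.items()),
        # "Latitude, Longitude"
        "coordinates": f"{lat}, {lon}",
        # UTC timestamp in ISO 8601 form (assumed storage format).
        "date": created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Not validated until a satellite detection matches.
        "firms": False,
    }
    if retweeted_tweet is not None:
        doc["retweeted_tweet"] = retweeted_tweet
    return doc


def mark_validated(doc: dict, brightness: float) -> dict:
    """Return a copy of the document flagged as validated by a satellite."""
    return {**doc, "firms": True, "bright": brightness}


doc = build_report_document(
    id_tweet="123",
    text="Huge fire near the highway in Arizona",
    user="example_user",
    coordinates=(34.5, -111.5),
    created_at=datetime(2022, 6, 14, 18, 30, tzinfo=timezone.utc),
    emotions={"fire": 0.37},
    entities=["Arizona"],
)
validated = mark_validated(doc, 367.0)
print(validated["coordinates"], validated["firms"], validated["bright"])
```

A dictionary of this shape is the kind of object a Solr client would typically receive for indexing; the actual schema types remain those defined in the core.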

Perceived Risk Index Evaluation

The definition of the Perceived Risk Index (PRI) derives from a preliminary analysis of the influence of tweet features. In particular, a Multiple Linear Regression (MLR) was performed to determine the dependency between the brightness value (i.e., the fire strength or intensity) detected by the satellite and information from Twitter. The emotional fire score, the number of retweets and likes, and the total number of tweets on the same day are associated with each detected tweet. The level of brightness is the dependent variable; the objective is to understand which tweet features (as independent variables) are related to the fire intensity. The adopted dataset is the same used for Fire Tweet Classifier training (see the “Fire Tweet Classifier” section). The results of the linear regression, shown in Table 1, register an \(R^2\) of 0.72. They indicate that, at a significance level of \(\alpha =0.05\), the emotional fire score and the total number of tweets are relevant features for predicting the intensity level of a fire event. The same cannot be said for the number of retweets and likes. It follows that the Perceived Risk Index should exploit the number of tweets and the emotional fire score. So, by defining a decay function \(\delta _t\) as follows:
$$\begin{aligned} \delta _t = 2^{-\lambda (t-t_{last})} \end{aligned}$$
(2)
where:
  • \(\lambda\) is a decay factor in \([0, 1]\);
  • t is the current instant;
  • \(t_{last}\) is the instant of the last valid tweet.
The PRI for the g geographical area, at time t, is assessed through the following equation:
$$\begin{aligned} {Perceived\_Risk\_Index_{gt} = {\sum _{tw \in T} s_{tw}} * |T| * \delta _t} \end{aligned}$$
(3)
where:
  • T is the tweet set for a given geographic area and date, posted by different users;
  • \(s_{tw}\) is the emotional fire score of tweet tw;
  • |T| is the cardinality of T.
The decay function is needed to align the PRI with the real evolution of the fire. In particular, it guarantees a gradual PRI decrease when the tweet stream slows down.
After an empirical analysis of results on the evaluated datasets, the Perceived Risk Index value can be interpreted as follows:
  • \(Perceived\_Risk\_Index < 50\) corresponds with low risk;
  • \(50 \le Perceived\_Risk\_Index < 500\) corresponds with a moderate risk;
  • \(Perceived\_Risk\_Index \ge 500\) corresponds with a high risk.
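Equations 2 and 3 and the interpretation thresholds can be sketched in a few lines of Python. This is a hedged illustration rather than the system's code: the function names are hypothetical, the time unit of \(t - t_{last}\) is not stated in the text and is assumed here to be hours, the input scores are assumed to be already restricted to tweets from different users, and the boundary value 50 is treated as moderate risk.

```python
from typing import Sequence


def decay(lam: float, t: float, t_last: float) -> float:
    """Eq. 2: delta_t = 2^(-lambda * (t - t_last)); time unit assumed to be hours."""
    return 2.0 ** (-lam * (t - t_last))


def perceived_risk_index(fire_scores: Sequence[float], lam: float,
                         t: float, t_last: float) -> float:
    """Eq. 3: (sum of emotional fire scores) * |T| * delta_t for one area and date."""
    if not fire_scores:
        return 0.0
    return sum(fire_scores) * len(fire_scores) * decay(lam, t, t_last)


def risk_level(pri: float) -> str:
    """Map a PRI value to the low / moderate / high bands."""
    if pri < 50:
        return "low"
    if pri < 500:
        return "moderate"
    return "high"


# Ten tweets, each with a fire score of 0.5 (illustrative values only).
scores = [0.5] * 10

# Evaluated at the instant of the last tweet: no decay applied.
pri_now = perceived_risk_index(scores, lam=0.1, t=0.0, t_last=0.0)

# Evaluated 10 time units later with lambda = 0.1: the index halves.
pri_later = perceived_risk_index(scores, lam=0.1, t=10.0, t_last=0.0)

print(pri_now, risk_level(pri_now))
print(pri_later, risk_level(pri_later))
```

Because the index multiplies the summed scores by the number of tweets, it grows roughly quadratically with tweet volume when scores are similar, while the decay term halves it every \(1/\lambda\) time units without new tweets.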
Table 1
Regression results
| Feature | coef | std err | t | \(P>\vert t\vert\) | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Emotional fire score | 11.07 | 5.24 | 2.11 | 0.04 | 0.77 | 21.37 |
| # Tweets | 1.35 | 0.05 | 26.17 | 0.00 | 1.25 | 1.45 |
| # Retweet | 0.04 | 0.02 | 1.92 | 0.06 | -0.01 | 0.07 |
| # Like | -2.3 | 1.88 | -1.22 | 0.22 | -5.99 | 1.39 |

Interactive Fire Dashboard

Data contained in Solr is available through a dashboard built with Banana, a Solr plugin. From the dashboard, it is possible to filter reports and retrieve all information about them, such as the tweets generating the report and the intensity score. The dashboard also includes a map that shows reports geographically based on their coordinates. In particular, the user can specify a query and a time period to search for reports (Fig. 2). Results are presented in tabular and graphical form and on a map, where each report is highlighted by a marker at its position. Markers change based on the number of detected fires and their intensity, as depicted in the example in Fig. 3.

Experimentation

The experimentation of the proposed approach consists of analyzing the existence of a significant correlation between the intensity of the fire (detected by the satellite) and the proposed Perceived Risk Index. The objective is to demonstrate its validity as an index for fire detection and monitoring.

Datasets

The implemented system has been tested by collecting tweets from June 1 to June 20, 2022, leveraging the Twitter API. A total of 6245 English tweets have been found through the following query:
Among all collected tweets, 4923 have been classified as relevant (i.e., fire-related) and have undergone the process described in the “Overall Process” section and exemplified in Fig. 4. At the end of the annotation process, 3804 tweets are retained. Information about the size of the datasets is reported in Table 2.
Table 2
Size of adopted datasets
| Dataset | # Fire-related Tweets |
|---|---|
| Our | 3804 |
| Australia wildfire | 865 |
| Colorado wildfire | 901 |
| California wildfire | 1454 |
The example in Fig. 4 shows that, in classifying a set of three tweets, one is considered fire-related and undergoes the annotation process. The annotation process, particularly NER, recognizes “Arizona” as a geographical entity from the text content and “Jun 14, 2022” as a time reference from the tweet metadata. The Transformer model predicts the fire emoji for the tweet content with an intensity (i.e., probability) of about 0.37. Then, through the GeoNames Webservice, the coordinates of “Arizona” are extracted (i.e., latitude: 34.5, longitude: \(-\)111.5). Given the information from the annotation process, a match with satellite data considering date and geographical area can be attempted. In particular, let us assume the satellite has detected a fire at the coordinates latitude: 33.3 and longitude: \(-\)111.6. The match succeeds if we assume a maximum distance limit of 150 km from the extracted geographical entity (i.e., Arizona). So, the fire report can be validated, and the brightness (severity/intensity) level of the detected fire can be determined. In this instance, the corresponding satellite detection indicates a brightness of 367 K.
In addition, three public datasets of wildfire (described in the “Fire Tweet Classifier” section) are adopted for the correlation evaluation.

Fire Tweet Classifier Performance Evaluation

The fine-tuned model for the classification of relevant tweets achieves the following performance: Accuracy: \(99\%\), Average F1-score: \(98\%\). Performance has also been compared with recent approaches, as shown in Table 3.
Table 3
Fire Tweet Classifier: performance comparison
| Approach | Average F1-score |
|---|---|
| DNN [41] | 0.949 |
| BERT [37] | 0.929 |
| Fine-tuned bert-base-uncased (our) | 0.979 |

Results

The validity of the proposed index in measuring the seriousness of an ongoing fire is evaluated by computing Pearson’s correlation coefficient. In particular, for each considered dataset, two distributions are compared:
  • PRIs for each considered time interval;
  • The corresponding brightness values for each considered time interval.
The two distributions are paired by shared time interval and geographical area. Moreover, for each dataset, we assess the decay function \(\delta\) by setting three values for the decay factor \(\lambda\): 0.01, 0.1, and 0.9.
After checking the normality of the distributions, we computed Pearson’s correlation coefficients and their significance (represented by the p_value).
Results, summarized in Table 4, show an average best correlation of 0.77 among all adopted datasets with a valid significance. In particular, for the adopted datasets, the value 0.1 is the most suitable decay factor. Such correlation demonstrates the validity of the proposed Perceived Risk Index as a measure for early alerting in detecting and monitoring fire events.
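For reference, Pearson’s coefficient between paired PRI and brightness series can be computed as in the following Python sketch. The function name and the sample values are hypothetical and serve only to show the calculation; they are not taken from the evaluated datasets, and the significance test (p_value) is not reproduced here.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two paired series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Covariance and standard deviations (the common 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))

    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Hypothetical paired values, one pair per time interval and area.
pri_series = [12.0, 48.0, 130.0, 310.0, 95.0]
brightness_series = [305.0, 318.0, 334.0, 360.0, 322.0]

r = pearson(pri_series, brightness_series)
print(round(r, 2))
```

Each position in the two lists stands for the same time interval and geographical area, which is the pairing the evaluation relies on.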
Table 4
Correlation results
| Dataset | \(\lambda\) | Pearson’s coefficient | Significance (p_value) |
|---|---|---|---|
| Our | 0.1 | 0.77 | 0.009 |
| | 0.01 | 0.76 | 0.009 |
| | 0.9 | 0.56 | 0.01 |
| Australia wildfire | 0.1 | 0.77 | 0.006 |
| | 0.01 | 0.75 | 0.007 |
| | 0.9 | 0.63 | 0.02 |
| Colorado wildfire | 0.1 | 0.76 | 0.008 |
| | 0.01 | 0.69 | 0.008 |
| | 0.9 | 0.59 | 0.01 |
| California wildfire | 0.1 | 0.78 | 0.006 |
| | 0.01 | 0.78 | 0.007 |
| | 0.9 | 0.60 | 0.04 |
Values in bold identify combinations with the best correlation for each dataset

Limitations

The evaluation of the proposed Perceived Risk Index (PRI) showcases the potential of utilizing social media information to derive a risk assessment associated with fire. Nevertheless, this research has some limitations, which will be the focus of future developments and improvements:
  • The current analysis is limited to posts in English. Future research should explore methods to extend the analysis to a multilingual context because analyzing only English content may lead to the omission of crucial data for preventing and managing emergency situations, as relevant information could be expressed in various languages.
  • The analysis does not incorporate images and videos associated with fire scenes, a factor that could enhance the credibility of user-contributed posts. Future developments should focus on integrating multimedia content to provide a more comprehensive understanding of incident reports.
  • The methodology faces challenges in mitigating false alerts generated by disinformation campaigns. Future extensions should improve filters to identify and exclude misleading information effectively. This includes addressing the dissemination of inaccurate and deceptive information by malicious actors.
  • The current implementation relies on Twitter as the social media source. This introduces a dependency on the availability of the Twitter API service and related updates. Future enhancements should explore the inclusion of multiple social media platforms to diversify data sources and improve reliability.
  • The methodology relies on the acquisition of satellite data through the FIRMS API, which could introduce a delay between the time data are collected and the time they become accessible. Additional acquisition techniques could therefore be evaluated in the future.

Conclusions

This paper adopts an information fusion approach to propose a Perceived Risk Index for fire events. First, a Twitter crawler collects tweets, and a classifier filters them by relevance; then, the content of each tweet is processed (NLP, geo-location, and emotion extraction). Finally, fire information is cross-related with satellite information to construct the Perceived Risk Index. This indicator leverages the number of relevant daily tweets for a specific geographic area and the emotional fire score extracted from them. Collected reports and additional information populate a Solr core, whose content is available to experts through an interactive dashboard. Despite some technical limitations, due for example to the availability of APIs or the accessibility of social media posts, the analysis of the correlation between fire brightness and the proposed indicator (also on real datasets) indicates the validity of the index for assisting experts in detecting and monitoring fire events.
In the future, the proposed indicator could be extended by evaluating the contribution of the following:
  • Analysis in a multilingual context;
  • Images (and their contents) attached to tweets;
  • Links in the tweets and, possibly, the reliability of the corresponding sites;
  • Posts from additional social media, like Instagram and Facebook, to enrich incoming data;
  • A corroboration methodology to weaken disinformation attempts.

Declarations

Ethical Approval

This article does not contain any studies with animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Footnotes

1. Twitter is now called X, but we refer to Twitter because the prototype and the experimental results leverage data collected before the change.
2. Available at: https://developer.twitter.com/en/docs/twitter-api (last verified: 16 October 2023).
3. Available at: https://www.tweepy.org/ (last verified: 16 October 2023).
4. Available at: https://huggingface.co/bert-base-uncased (last verified: 16 October 2023).
5. Available at: https://zenodo.org/record/3713920#.Y_4zE3bMJ9M (last verified: 16 October 2023).
6. Available at: https://stanfordnlp.github.io/stanza/index.html (last verified: 16 October 2023).
7. Available at: http://www.geonames.org/export/web-services.html (last verified: 16 October 2023).
8. Available at: https://geopy.readthedocs.io/en/stable/ (last verified: 16 October 2023).
9. Available at: https://huggingface.co/cardiffnlp/twitter-roberta-base-emoji (last verified: 16 October 2023).
10. Available at: https://firms.modaps.eosdis.nasa.gov/ (last verified: 16 October 2023).
12. FIRMS measures fire intensity through a fire pixel brightness temperature (in Kelvin).
13. Available at: https://solr.apache.org/ (last verified: 16 October 2023).
14. Available at: https://pypi.org/project/pysolr/ (last verified: 16 October 2023).
15. Available at: https://github.com/LucidWorks/banana (last verified: 16 October 2023).
Literatur
2.
Zurück zum Zitat Naderpour M, Rizeei HM, Khakzad N, Pradhan B. Forest fire induced Natech risk assessment: a survey of geospatial technologies. Reliab Eng Syst Safety. 2019;191:106558.CrossRef Naderpour M, Rizeei HM, Khakzad N, Pradhan B. Forest fire induced Natech risk assessment: a survey of geospatial technologies. Reliab Eng Syst Safety. 2019;191:106558.CrossRef
3.
Zurück zum Zitat Cui F. Deployment and integration of smart sensors with IoT devices detecting fire disasters in huge forest environment. Comput Commun. 2020;150:818–27.CrossRef Cui F. Deployment and integration of smart sensors with IoT devices detecting fire disasters in huge forest environment. Comput Commun. 2020;150:818–27.CrossRef
4.
Zurück zum Zitat Rahimizadeh P, Shayegan MJ. Event detection in twitter by weighting tweet’s features. In: 2022 8th International Conference on Web Research (ICWR). IEEE; 2022. pp. 203–9. Rahimizadeh P, Shayegan MJ. Event detection in twitter by weighting tweet’s features. In: 2022 8th International Conference on Web Research (ICWR). IEEE; 2022. pp. 203–9.
5.
Zurück zum Zitat Williams HJ, Blum I. Defining second generation open source intelligence (OSINT) for the defense enterprise. 2018. Williams HJ, Blum I. Defining second generation open source intelligence (OSINT) for the defense enterprise. 2018.
6.
Zurück zum Zitat Fenza G, Gallo M, Loia V, Volpe A. Cognitive name-face association through context-aware graph neural network. Neural Comput Appl. 2021;34:10279–93.CrossRef Fenza G, Gallo M, Loia V, Volpe A. Cognitive name-face association through context-aware graph neural network. Neural Comput Appl. 2021;34:10279–93.CrossRef
7.
Zurück zum Zitat Young B. Application of the intelligence cycle to prevent impacts of disastrous wildland fires. Technical Report, Naval Postgraduate School, Center for Homeland Defense and Security. 2018. Young B. Application of the intelligence cycle to prevent impacts of disastrous wildland fires. Technical Report, Naval Postgraduate School, Center for Homeland Defense and Security. 2018.
9.
Zurück zum Zitat Loureiro ML, Alló M, Coello P. Hot in twitter: assessing the emotional impacts of wildfires with sentiment analysis. Ecol Econ. 2022;200:107502.CrossRef Loureiro ML, Alló M, Coello P. Hot in twitter: assessing the emotional impacts of wildfires with sentiment analysis. Ecol Econ. 2022;200:107502.CrossRef
10.
Zurück zum Zitat Yue Y, Dong K, Zhao X, Ye X. Assessing wild fire risk in the united states using social media data. J Risk Res. 2021;24(8):972–86.CrossRef Yue Y, Dong K, Zhao X, Ye X. Assessing wild fire risk in the united states using social media data. J Risk Res. 2021;24(8):972–86.CrossRef
11.
Zurück zum Zitat De Maio C, Fenza G, Gallo M, Loia V, Volpe A. Cross-relating heterogeneous text streams for credibility assessment. In: 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS). IEEE; 2020. pp. 1–8. De Maio C, Fenza G, Gallo M, Loia V, Volpe A. Cross-relating heterogeneous text streams for credibility assessment. In: 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS). IEEE; 2020. pp. 1–8.
12.
Zurück zum Zitat Mantsis DF, Bakratsas M, Andreadis S, Karsisto P, Moumtzidou A, Gialampoukidis I, Karppinen A, Vrochidis S, Kompatsiaris I. Multimodal fusion of sentinel 1 images and social media data for snow depth estimation. IEEE Geosci Remote Sens Lett. 2020;19:1–5.CrossRef Mantsis DF, Bakratsas M, Andreadis S, Karsisto P, Moumtzidou A, Gialampoukidis I, Karppinen A, Vrochidis S, Kompatsiaris I. Multimodal fusion of sentinel 1 images and social media data for snow depth estimation. IEEE Geosci Remote Sens Lett. 2020;19:1–5.CrossRef
14.
Zurück zum Zitat Li F, Zhou T. Effects of objective and subjective environmental pollution on well-being in urban china: a structural equation model approach. Soc Sci Med. 2020;249:112859.CrossRef Li F, Zhou T. Effects of objective and subjective environmental pollution on well-being in urban china: a structural equation model approach. Soc Sci Med. 2020;249:112859.CrossRef
15.
Zurück zum Zitat Fekrazad A, Harun SM, Sardar N. Social media sentiment and the stock market. J Econ Fin. 2022;46(2):397–419.CrossRef Fekrazad A, Harun SM, Sardar N. Social media sentiment and the stock market. J Econ Fin. 2022;46(2):397–419.CrossRef
16.
Zurück zum Zitat Ao S. Sentiment analysis based on financial tweets and market information. In: 2018 International Conference on Audio, Language and Image Processing (ICALIP). IEEE; 2018. pp. 321–6. Ao S. Sentiment analysis based on financial tweets and market information. In: 2018 International Conference on Audio, Language and Image Processing (ICALIP). IEEE; 2018. pp. 321–6.
17.
Zurück zum Zitat Vohra A, Garg R. Deep learning based sentiment analysis of public perception of working from home through tweets. J Intell Inf Syst. 2023;60(1):255–74.CrossRef Vohra A, Garg R. Deep learning based sentiment analysis of public perception of working from home through tweets. J Intell Inf Syst. 2023;60(1):255–74.CrossRef
18.
Zurück zum Zitat Theng CP, Othman NF, Abdullah RS, Anawar S, Ayop Z, Ramli SN. Cyberbullying detection in twitter using sentiment analysis. Int J Comput Sci Netw Secur. 2021;21(11):1–10. Theng CP, Othman NF, Abdullah RS, Anawar S, Ayop Z, Ramli SN. Cyberbullying detection in twitter using sentiment analysis. Int J Comput Sci Netw Secur. 2021;21(11):1–10.
19.
Zurück zum Zitat Lam NS, Meyer M, Reams M, Yang S, Lee K, Zou L, Mihunov V, Wang K, Kirby R, Cai H. Improving social media use for disaster resilience: challenges and strategies. Int J Digital Earth. 2023;16(1):3023–44.CrossRef Lam NS, Meyer M, Reams M, Yang S, Lee K, Zou L, Mihunov V, Wang K, Kirby R, Cai H. Improving social media use for disaster resilience: challenges and strategies. Int J Digital Earth. 2023;16(1):3023–44.CrossRef
20.
Zurück zum Zitat Gurajala S, Dhaniyala S, Matthews JN. Understanding public response to air quality using tweet analysis. Soc Med+ Society. 2019;5(3):2056305119867656. Gurajala S, Dhaniyala S, Matthews JN. Understanding public response to air quality using tweet analysis. Soc Med+ Society. 2019;5(3):2056305119867656.
21.
Zurück zum Zitat Feng Y, Huang X, Sester M. Extraction and analysis of natural disaster-related VGI from social media: review, opportunities and challenges. Int J Geogr Inf Sci. 2022;36(7):1275–316.CrossRef Feng Y, Huang X, Sester M. Extraction and analysis of natural disaster-related VGI from social media: review, opportunities and challenges. Int J Geogr Inf Sci. 2022;36(7):1275–316.CrossRef
22.
Zurück zum Zitat Hong D, Yokoya N, Xia G-S, Chanussot J, Zhu XX. X-modalnet: a semi-supervised deep cross-modal network for classification of remote sensing data. ISPRS J Photogramm Remote Sens. 2020;167:12–23.CrossRef Hong D, Yokoya N, Xia G-S, Chanussot J, Zhu XX. X-modalnet: a semi-supervised deep cross-modal network for classification of remote sensing data. ISPRS J Photogramm Remote Sens. 2020;167:12–23.CrossRef
23.
Zurück zum Zitat Hong D, Gao L, Yokoya N, Yao J, Chanussot J, Du Q, Zhang B. More diverse means better: multimodal deep learning meets remote-sensing imagery classification. IEEE Trans Geosci Remote Sens. 2020;59(5):4340–54.CrossRef Hong D, Gao L, Yokoya N, Yao J, Chanussot J, Du Q, Zhang B. More diverse means better: multimodal deep learning meets remote-sensing imagery classification. IEEE Trans Geosci Remote Sens. 2020;59(5):4340–54.CrossRef
24.
Zurück zum Zitat Pramanik P, Mondal T, Nandi S, Saha M. Aircalypse: can twitter help in urban air quality measurement and who are the influential users? In: Companion Proceedings of the Web Conference 2020. 2020. pp. 540–5. Pramanik P, Mondal T, Nandi S, Saha M. Aircalypse: can twitter help in urban air quality measurement and who are the influential users? In: Companion Proceedings of the Web Conference 2020. 2020. pp. 540–5.
25.
Zurück zum Zitat Kumbalaparambi TS, Menon R, Radhakrishnan VP, Nair VP. Assessment of urban air quality from twitter communication using self-attention network and a multilayer classification model. Environ Sci Pollut Res. 2023;30(4):10414–25.CrossRef Kumbalaparambi TS, Menon R, Radhakrishnan VP, Nair VP. Assessment of urban air quality from twitter communication using self-attention network and a multilayer classification model. Environ Sci Pollut Res. 2023;30(4):10414–25.CrossRef
26.
Zurück zum Zitat Sadiq R, Akhtar Z, Imran M, Ofli F. Integrating remote sensing and social sensing for flood mapping. Remote Sens Appl: Soc Environ. 2022;25:100697. Sadiq R, Akhtar Z, Imran M, Ofli F. Integrating remote sensing and social sensing for flood mapping. Remote Sens Appl: Soc Environ. 2022;25:100697.
27.
Zurück zum Zitat Khan Q, Kalbus E, Zaki N, Mohamed MM. Utilization of social media in floods assessment using data mining techniques. PLoS One. 2022;17(4):0267079.CrossRef Khan Q, Kalbus E, Zaki N, Mohamed MM. Utilization of social media in floods assessment using data mining techniques. PLoS One. 2022;17(4):0267079.CrossRef
28.
Zurück zum Zitat Liu X, Kar B, Zhang C, Cochran DM. Assessing relevance of tweets for risk communication. Int J Digital earth. 2019;12(7):781–801.CrossRef Liu X, Kar B, Zhang C, Cochran DM. Assessing relevance of tweets for risk communication. Int J Digital earth. 2019;12(7):781–801.CrossRef
29.
Zurück zum Zitat Periñán-Pascual C, Arcas-Túnez F. The analysis of tweets to detect natural hazards. Intell Environ. 2018;2018(23):87–96. Periñán-Pascual C, Arcas-Túnez F. The analysis of tweets to detect natural hazards. Intell Environ. 2018;2018(23):87–96.
30.
Zurück zum Zitat Arroni S, Galán Y, Guzmán-Guzmán X, Núñez-Valdez ER, Gómez A. Sentiment analysis and classification of hotel opinions in twitter with the transformer architecture. Int J Interactive Multimedia Artificial Intelligence. 2023. Arroni S, Galán Y, Guzmán-Guzmán X, Núñez-Valdez ER, Gómez A. Sentiment analysis and classification of hotel opinions in twitter with the transformer architecture. Int J Interactive Multimedia Artificial Intelligence. 2023.
31.
Zurück zum Zitat Mingua J, Padilla D, Celino EJ. Classification of fire related tweets on twitter using bidirectional encoder representations from transformers (BERT). In: 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM). IEEE; 2021. pp. 1–6. Mingua J, Padilla D, Celino EJ. Classification of fire related tweets on twitter using bidirectional encoder representations from transformers (BERT). In: 2021 IEEE 13th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM). IEEE; 2021. pp. 1–6.
32.
Zurück zum Zitat Ningsih A, Hadiana A. Disaster tweets classification in disaster response using bidirectional encoder representations from transformer (BERT). In: IOP Conference Series: Materials Science and Engineering (vol. 1115). IOP Publishing; 2021. p. 012032. Ningsih A, Hadiana A. Disaster tweets classification in disaster response using bidirectional encoder representations from transformer (BERT). In: IOP Conference Series: Materials Science and Engineering (vol. 1115). IOP Publishing; 2021. p. 012032.
33.
Zurück zum Zitat Yuan F, Liu R. Mining social media data for rapid damage assessment during hurricane Matthew: feasibility study. J Comput Civ Eng. 2020;34(3):05020001.CrossRef Yuan F, Liu R. Mining social media data for rapid damage assessment during hurricane Matthew: feasibility study. J Comput Civ Eng. 2020;34(3):05020001.CrossRef
34.
Zurück zum Zitat Macêdo JB, Chagas Moura M, Aichele D, Lins ID. Identification of risk features using text mining and Bert-based models: application to an oil refinery. Process Saf Environ Prot. 2022;158:382–99.CrossRef Macêdo JB, Chagas Moura M, Aichele D, Lins ID. Identification of risk features using text mining and Bert-based models: application to an oil refinery. Process Saf Environ Prot. 2022;158:382–99.CrossRef
35.
Zurück zum Zitat Zhou B, Zou L, Mostafavi A, Lin B, Yang M, Gharaibeh N, Cai H, Abedin J, Mandal D. Victimfinder: harvesting rescue requests in disaster response from social media with Bert. Comput Environ Urban Syst. 2022;95:101824.CrossRef Zhou B, Zou L, Mostafavi A, Lin B, Yang M, Gharaibeh N, Cai H, Abedin J, Mandal D. Victimfinder: harvesting rescue requests in disaster response from social media with Bert. Comput Environ Urban Syst. 2022;95:101824.CrossRef
36.
Zurück zum Zitat Kenton JDM-WC, Toutanova LK. Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT. 2019. pp. 4171–86. Kenton JDM-WC, Toutanova LK. Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT. 2019. pp. 4171–86.
37.
Zurück zum Zitat Wiegmann M, Kersten J, Klan F, Potthast M, Stein B. Analysis of detection models for disaster-related tweets. In: Proceedings of the 17th ISCRA. 2020. pp. 872–80. Wiegmann M, Kersten J, Klan F, Potthast M, Stein B. Analysis of detection models for disaster-related tweets. In: Proceedings of the 17th ISCRA. 2020. pp. 872–80.
38.
Zurück zum Zitat Qi P, Zhang Y, Zhang Y, Bolton J, Manning CD. Stanza: a python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2020. pp. 101–8. Qi P, Zhang Y, Zhang Y, Bolton J, Manning CD. Stanza: a python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2020. pp. 101–8.
39.
Zurück zum Zitat Barbieri F, Camacho-Collados J, Ronzano F, Anke LE, Ballesteros M, Basile V, Patti V, Saggion H. Semeval 2018 task 2: multilingual emoji prediction. In: Proceedings of the 12th International Workshop on Semantic Evaluation. 2018. pp. 24–33. Barbieri F, Camacho-Collados J, Ronzano F, Anke LE, Ballesteros M, Basile V, Patti V, Saggion H. Semeval 2018 task 2: multilingual emoji prediction. In: Proceedings of the 12th International Workshop on Semantic Evaluation. 2018. pp. 24–33.
40.
Zurück zum Zitat Winarno E, Hadikurniawati W, Rosso RN. Location based service for presence system using haversine method. In: 2017 International Conference on Innovative and Creative Information Technology (ICITech). IEEE; 2017. pp. 1–4. Winarno E, Hadikurniawati W, Rosso RN. Location based service for presence system using haversine method. In: 2017 International Conference on Innovative and Creative Information Technology (ICITech). IEEE; 2017. pp. 1–4.
41.
Zurück zum Zitat Kersten J, Bongard J, Klan F. Gaussian processes for one-class and binary classification of crisis-related tweets. ISCRAM. 2022:664–73. Kersten J, Bongard J, Klan F. Gaussian processes for one-class and binary classification of crisis-related tweets. ISCRAM. 2022:664–73.

Metadata

Title: A Perceived Risk Index Leveraging Social Media Data: Assessing Severity of Fire on Microblogging
Authors: Carmen De Maio, Giuseppe Fenza, Mariacristina Gallo, Vincenzo Loia, Alberto Volpe
Publication date: 10 April 2024
Publisher: Springer US
Published in: Cognitive Computation, Issue 5/2024
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-024-10266-4