Published in: Social Network Analysis and Mining 1/2024

Open Access 01.12.2024 | Original Article

Twitter’s pulse on hydrogen energy in 280 characters: a data perspective

Written by: Deepak Uniyal, Richi Nayak


Abstract

Uncovering the public discourse on hydrogen energy is essential for understanding public behaviour and the evolving nature of conversations over time and across different regions. This paper presents a comprehensive analysis of a large multilingual dataset pertaining to hydrogen energy collected from Twitter spanning a decade (2013–2022) using selected keywords. The analysis aims to explore various aspects, including the temporal and spatial dimensions of the discourse, factors influencing Twitter engagement, user engagement patterns, and the interpretation of conversations through hashtags and ngrams. By delving into these aspects, this study offers valuable insights into the dynamics of public discourse surrounding hydrogen energy and the perceptions of social media users.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The surge in global energy demand, resulting from population growth, economic expansion, and urbanization, has led to heavy reliance on carbon-based fossil fuels. This in turn has contributed to detrimental effects such as high carbon emissions and global warming (Dawood et al. 2020). To address the urgent need for reducing greenhouse gas (GHG) emissions and combating climate change, significant technological advancements in energy generation and consumption systems are required (Dehler-Holland et al. 2022). Countries worldwide have acknowledged the significance of adopting sustainable solutions to decarbonize their economies and participate in the green energy revolution (Kar et al. 2022; Milani et al. 2020).
With its potential for near-zero GHG emissions, hydrogen is gaining global attention as an alternative energy source, particularly for industrial purposes (Lozano et al. 2022). Scaling up hydrogen energy production on a global scale has the potential to facilitate a transition from fossil fuel consumption to clean energy alternatives (Panchenko et al. 2022). Countries worldwide have developed their national strategies for hydrogen energy that have a significant impact on various aspects of society and people’s lives (Japan: First nation to form national hydrogen strategy 2017; Us national clean hydrogen strategy and roadmap 2021; The ten point plan for a green industrial revolution. 2020; Hydrogen strategy for scotland 2022; Uk hydrogen strategy 2021; Hydrogen strategy for Canada 2020; Eu’s hydrogen strategy 2020; The national hydrogen strategy—Germany. 2020; The national hydrogen strategy—Netherlands. 2020; The government’s strategy for power-to-x—Denmark. 2021; The Norwegian government’s hydrogen strategy 2020; Australia’s national hydrogen strategy 2019; National green hydrogen mission 2020). The decisions and policies related to hydrogen energy can influence energy markets, environmental sustainability, and overall economic development.
To achieve net-zero emissions, it is crucial not only to focus on the production, storage, and consumption of hydrogen energy, but also to address the potential obstacles that may impact progress. Governments, public and private industries, and various stakeholders must understand the factors that could hinder the acceptance and adoption of hydrogen energy and take proactive measures to mitigate them. Understanding the social discourse surrounding this emerging technology is essential for policy-making and successful adoption. The standard practice for gauging public perception of an emerging technology is to use traditional methods such as surveys (Ingaldi and Klimecka-Tatar 2020; Itaoka et al. 2017; Iribarren et al. 2016), opinion polls (Lozano et al. 2022; Han et al. 2022), and interviews and focus groups (Jaramillo et al. 2019). However, these methods can be time-consuming, costly, and limited in scope, potentially lacking a global perspective (Corbett and Savarimuthu 2022).
To overcome these limitations and obtain a broader understanding, harnessing the power of widely used social media platforms like Twitter presents a modern and viable solution. Rapid technological advancements, widespread adoption of smart devices, and increased internet connectivity have fuelled the growth of online activities, including those on Twitter (Agarwal et al. 2021). In the digital realm, individuals find it easier to express their attitudes, opinions, and emotions globally. As per a recent study conducted in 2023 (The rise of social media 2023), about 4.9 billion people, constituting approximately 61% of the world population, are actively engaged in diverse social media platforms such as Twitter, Facebook, YouTube, Instagram, Reddit, Snapchat, and Pinterest. These platforms seamlessly gather vast amounts of data from individuals worldwide, spanning multiple languages, thereby generating multilingual citizen data (Xu et al. 2022) to understand people’s perceptions.
Twitter has emerged as a prominent platform for discussions on growing energy demands and climate change, attracting diverse participants including politicians, industrialists, scientists, journalists, celebrities, and the public. Researchers have leveraged Twitter to gain valuable insights into social perception and acceptance of energy policies (Corbett and Savarimuthu 2022; Pilar et al. 2019; Ibar-Alonso et al. 2022), identify the increasing polarization in social media discussions (Falkenberg et al. 2022), and examine how climate actions are shaping public perceptions worldwide (Debnath et al. 2022). These studies highlight the significance of Twitter as a source of information for understanding social discourse on energy and climate-related issues.
A handful of studies aim to understand the public perception of hydrogen energy or related technologies by primarily using traditional surveys, questionnaires, and interviews (Jaramillo et al. 2019). These studies often focus on specific regions or countries, such as Australia (Lozano et al. 2022), Eastern Europe (Ingaldi and Klimecka-Tatar 2020), Japan (Itaoka et al. 2017), and South Korea (Han et al. 2022). While valuable insights have been obtained from these studies, they are limited in their scope and may not capture the broader trends across different regions, languages, and timeframes.
To address these limitations and gain a more comprehensive understanding, this paper presents an in-depth analysis of user behaviour, content patterns, and their evolution over time on the topic of hydrogen energy. We propose using Twitter to understand the public discourse and present a comprehensive analysis of multilingual Twitter data pertaining to hydrogen energy. We gathered the data using the academic and research Twitter API spanning a decade (2013–2022). We aim to answer the following questions:
1. RP1: (Temporal Analysis) How has the public discourse on hydrogen energy evolved over recent years, and when did this topic witness significant surges in user engagement?
2. RP2: (Spatial Analysis) To what extent do different regions or countries contribute to this topic, and what factors explain the variations in their levels of participation?
3. RP3: (Metrics Impacting Twitter Engagement) How do the public metrics of tweets, such as impression count, like count, retweet count, replies, and quote count, relate to the overall engagement and reach of the tweets on the topic?
4. RP4: (User Engagement Analysis) How do public metrics of users, such as followers, following, tweet count, listed count, mention count and other relevant indicators, vary among user discussions? Can we identify any patterns or indicators of influential users based on these metrics?
5. RP5: (Deciphering Twitter Conversations: Hashtags and Bigrams Analysis) What insights can be derived from the hashtag and bigram analysis for the evolving trends and emerging topics related to hydrogen energy?
 
To the best of our knowledge, this study presents the first comprehensive analysis of hydrogen energy discussions on Twitter. We present a data-driven approach to understanding the discourse on a topic on social media platforms.
The subsequent sections of this paper are organized as follows. Section 2 provides a review of the related research, while Sect. 3 outlines the methodology employed, including subsections for each part of the method. The analysis of the results is presented in Sect. 4. The paper concludes with Sect. 5.

2 Literature review

The study of hydrogen energy is intricately linked to the broader topic of climate change. As the world grapples with increasing environmental challenges, understanding the public discourse on hydrogen energy becomes crucial for successful implementation and policy formulation. Numerous studies have employed traditional survey methods in various localized regions, such as Australia, Japan, Spain, and South Korea (Lozano et al. 2022; Itaoka et al. 2017; Iribarren et al. 2016; Han et al. 2022).
A nationwide survey was conducted in Australia to investigate public perception and social acceptance of hydrogen energy in domestic applications (Lozano et al. 2022). This study examined factors such as cost, reduction in air pollution, health benefits, and support for the hydrogen export industry. Similarly, a study in Japan explored the perception of hydrogen infrastructure and fuel cell vehicles, comparing current survey responses to those from previous years (Itaoka et al. 2017). In South Korea, a survey-based study explored the public acceptance of building hydrogen fuelling stations near residential areas (Han et al. 2022). Although the study found a slightly higher approval rate, a significant portion of respondents remained neutral. To gain insights into public acceptance of hydrogen energy in Spain, a study focused on assessing the social acceptance of hydrogen as a key energy resource for transportation (Iribarren et al. 2016). The study highlighted respondents’ willingness to accept the technology while identifying current obstacles that hinder its success. A study conducted interviews with 12 Hydrogen Fuel Cell Vehicle (HFCV) drivers in Los Angeles, examining factors such as lifetime cost, comparison shopping, and evaluation of refuelling infrastructure adequacy (Jaramillo et al. 2019). The study revealed that environmental concerns were a significant motivator in considering HFCV adoption, as long as it made financial sense.
These traditional survey methods provide valuable insights but are time-consuming, costly, limited in scope and do not capture the global perspective. To overcome these limitations and gain a more comprehensive understanding, researchers have turned to modern approaches such as social media data analysis (Loureiro and Alló 2020). Twitter has been extensively studied and used across various domains, including user behaviour analysis, sentiment analysis (Dehler-Holland et al. 2022; Ibar-Alonso et al. 2022; Kušen and Strembeck 2018; Bashar et al. 2022), trend analysis (Agarwal et al. 2015), event detection and crisis management (Arazzi et al. 2023), identifying misinformation (Andreadis et al. 2021; Saxena et al. 2023), and hate speech detection (Balasubramaniam et al. 2021; Cruz et al. 2022). Twitter has been established as a significant data source, providing an extensive dataset of real-time events across the globe, encompassing various languages.
A handful of studies in the field of sustainability or climate change employ modern methodologies such as newspaper analysis (Dehler-Holland et al. 2022), Twitter data analysis (Loureiro and Alló 2020), and studies on individual activists (Bashar et al. 2021; Jung et al. 2020) to gain a comprehensive understanding of public discourse and perceptions. A study on sustainability focused on the acceptance and viability of wind energy technology in Germany, analysing four prominent national newspapers over a span of nine years to identify the challenges faced by wind power (Dehler-Holland et al. 2022). Researchers have leveraged Twitter data to explore global perspectives on sustainability and climate change, offering invaluable insights into discussions taking place worldwide (Pilar et al. 2019; Arce-García et al. 2023). Researchers have also used hashtags for various purposes, such as building hashtag networks to analyse behaviour towards green energy, climate change, and carbon emissions, and understanding how the discussion around low emissions is being shaped by various global climate actions (Debnath et al. 2022). Twitter-based studies have also delved into the discourse surrounding prominent activists such as Greta Thunberg, a young environmental activist (Bashar et al. 2021; Jung et al. 2020). The findings highlight the existence of divergent opinions on social media. Moreover, these studies acknowledge the influential role of celebrities in amplifying the impact of such activism.
While limited in number, studies analysing Twitter data specifically related to hydrogen energy are emerging. A study (Ibar-Alonso et al. 2022) examined opinions extracted from Twitter data using the keyword green energy during the initial phase of the Russia-Ukraine conflict, spanning from 16 February 2022 to 3 March 2022. This study has certain limitations: the analysis is restricted to a short period of fewer than 20 days and relies on a single English keyword.
Distinct from prior works, we conduct an in-depth study of hydrogen energy discussions on Twitter. We introduce a data-driven multi-step method to comprehend the discourse surrounding a topic on social media.

3 Methodology

Figure 1 illustrates the methodology used to collect and analyse a multilingual Twitter dataset, gathered using keywords in four languages (English, Japanese, Korean, and Hindi), to understand people’s perception. The Twitter data used for this analysis include the text content of tweets, hashtags, mentions, tweets’ and users’ public metrics, and tweets’ and users’ locations. The multi-step analysis process involves identifying relevant keywords, collecting and preprocessing the data, and conducting various types of analysis. These analyses include language-specific analysis, spatiotemporal analysis, tweet engagement analysis, user interaction analysis, and analysis of hashtags and ngrams. The subsequent sections explain the analysis process in detail.

3.1 Keyword identification

Table 1 shows the list of keywords from multiple languages that are used to extract the data from Twitter. These keywords are selected to comprehend public discourse on hydrogen energy by delving into the viewpoints and global perspectives. To ensure comprehensive coverage, specific keywords were selected in English and Japanese, as these two languages encompass most of such tweets. According to the study (Moernaut et al. 2022) analysing 118 billion tweets between 2009 and 2019, English is the primary language on Twitter, representing 35.6% of the tweets, followed by Japanese with 17.8% of the tweets.
Table 1
Multilingual keyword list for Twitter data retrieval
https://static-content.springer.com/image/art%3A10.1007%2Fs13278-023-01194-6/MediaObjects/13278_2023_1194_Tab1_HTML.png
Besides Japanese being the second most used language on Twitter, Japan has topped the competitiveness ranking for hydrogen technologies since 2011, as per Astamuse’s ranking. This is attributed to Japan’s superior fuel cell patents, powering hydrogen-led applications in factories, homes, and automobiles (Alshaabi et al. 2021). Japan’s distinction as the first nation to create a national hydrogen strategy in 2017 (Japan: First nation to form national hydrogen strategy 2017) further justifies its inclusion in our study. China’s rapid advancements in hydrogen-related technology pose a challenge to this lead; however, we could not include Chinese keywords because Twitter is unavailable in China.
South Korea’s significant contributions to hydrogen technology, including a third of the world’s installed utility-scale fuel cells and pioneering commercial fuel cell vehicles, highlight its commitment to hydrogen advancements. Including the Korean language in our study is essential for a comprehensive analysis (The hydrogen economy South Korea: Market intelligence report 2021).
We have also included Hindi in our study. India has the third-highest number of Twitter users (23.6 million) (Leading countries based on number of twitter users as of January 2022), and Hindi is the most spoken language in India, with over 584 million speakers. Additionally, India is a major player in the global hydrogen energy market with the Indian government launching the National Green Hydrogen Mission (National green hydrogen mission 2020) and Indian companies heavily investing in hydrogen energy (A fully integrated renewable energy ecosystem by reliance. 2023).

3.2 Twitter data collection

We utilized the Academic and Research Twitter API v2 (Twitter api v2. 2023) to download about 30 million (30,758,600) tweets from January 2013 to December 2022, using keywords listed in Table 1. This ten-year period provides a comprehensive dataset to analyse hydrogen-energy-related discussions, capturing key milestones and the evolving landscape of the topic. Including data from 2013 (the launch year of the world’s first commercial fuel cell vehicle (The hydrogen economy South Korea: Market intelligence report 2021)) allows us to track the early stages of hydrogen technology development, while extending the analysis until 2022 ensures our study reflects up-to-date information and recent advancements.
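As a rough illustration of the collection step, the sketch below builds the query parameters for one page of a full-archive search request against Twitter API v2. It only assembles the request and does not contact the API. The keyword shown is a hypothetical placeholder (the actual keywords are listed in Table 1), and the selection of fields and expansions is an assumption chosen to match the metadata described in the following sections, not the authors' confirmed configuration.

```python
from datetime import datetime, timezone

# Full-archive search endpoint of Twitter API v2 (available to academic access).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"


def build_search_params(keyword, start, end, next_token=None):
    """Build the query parameters for one page of a full-archive search."""
    params = {
        "query": keyword,
        "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "end_time": end.strftime("%Y-%m-%dT%H:%M:%SZ"),  # exclusive upper bound
        "max_results": 500,
        # Assumed field selection covering text, language, metrics and location.
        "tweet.fields": "id,text,lang,created_at,author_id,public_metrics,entities,geo",
        "user.fields": "location,verified,public_metrics,created_at",
        "expansions": "author_id,geo.place_id,attachments.media_keys",
    }
    if next_token:
        # Pagination token returned in the metadata of the previous page.
        params["next_token"] = next_token
    return params


params = build_search_params(
    "hydrogen energy",  # hypothetical keyword, see Table 1 for the real list
    datetime(2013, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 1, tzinfo=timezone.utc),
)
```

The resulting dictionary could be passed to any HTTP client together with a bearer token; repeating the request with each returned pagination token would walk through the full result set.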

3.3 Data preprocessing

The downloaded Twitter data was processed using a combination of Python’s standard library and user-defined functions to remove noise and convert it into a representation suitable for analysis. The following preprocessing steps were performed.
1. Data extraction and mapping of tweet metadata: The response downloaded from the Twitter API (Twitter api v2. 2023) contains an includes object that holds metadata such as an array of user objects representing mentioned or referenced users, with details such as their profiles and follower counts. Additionally, if the tweets contain media attachments, this object contains an array of media objects with metadata such as media type, URLs, and dimensions. We extracted this information and mapped it to each corresponding tweet.
2. Removing duplicate id and duplicate text instances: We used various combinations of hydrogen-related keywords to collect the data, so one tweet may have been downloaded multiple times. To avoid redundancy in the data, tweets with identical tweet ids were retained only once, and duplicate tweets with identical text content were removed. After this step, UICount (the count of tweets with unique ids) and UTCount (the count of tweets with unique text) are 21,811,047 and 9,947,254, respectively.
3. Omitting usernames and hashtag symbols: Usernames starting with @ and the hashtag symbol # (with the hashtag words themselves left intact) were removed from the text as they do not contribute meaningfully to the analysis.
4. Eliminating hyperlinks from tweets: To streamline the analysis process, hyperlinks or URLs were eliminated from the tweets.
5. Sanitizing tweets by removing special characters and trimming extra spaces: Special characters such as ”!$%)(}{][> < ?’*&:,̃̂ + _-̃|/\@amp and leading, trailing, and interstitial white spaces were eliminated from the tweets. The goal is to simplify tokenization, reduce noise, and enable efficient feature extraction, potentially leading to meaningful outcomes.
6. Converting tweets to lowercase format: To ensure uniformity and consistency in the text, all the tweets were converted to lowercase format. This conversion does not affect Japanese, Korean, and Hindi content, as these scripts do not distinguish letter case.
7. Stopword removal and tokenization: To enhance the quality of text for ngram analysis and minimize unnecessary noise, stopwords were eliminated from the tweets using the stopwords list from the NLTK library for English and a list of customized stopwords (Stopwords collections. 2022) for other languages. After conducting the ngram analysis, we compiled an additional list of stopwords to refine the analysis. Tokenization involves splitting text into individual tokens that help in performing ngram analysis. While tokenizing English, Korean, and Hindi text is as simple as splitting on spaces, Japanese poses a unique challenge because it does not separate words with spaces. We employed the MeCab Python library (Mecab: Text segmentation library for Japanese text. 2007), specifically designed for Japanese text segmentation and tokenization.
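The deduplication, cleaning, and tokenization steps above can be sketched with the standard library alone. This is a minimal illustration rather than the authors' code: the function names and the tiny stopword set are hypothetical, text deduplication is assumed to operate on the raw text, and only ASCII punctuation is stripped. Whitespace tokenization is shown; Japanese text would need a segmenter such as MeCab instead.

```python
import html
import re
import string

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Map every ASCII punctuation character (including '#') to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})

# Hypothetical stopword list used only for this illustration.
STOPWORDS = {"is", "the", "this"}


def deduplicate(tweets):
    """Return (unique-id tweets, unique-text tweets), keeping first occurrences."""
    seen_ids, seen_texts = set(), set()
    unique_id, unique_text = [], []
    for tweet in tweets:
        if tweet["id"] in seen_ids:
            continue
        seen_ids.add(tweet["id"])
        unique_id.append(tweet)
        if tweet["text"] not in seen_texts:  # assumption: compare raw text
            seen_texts.add(tweet["text"])
            unique_text.append(tweet)
    return unique_id, unique_text


def clean_text(text):
    text = html.unescape(text)          # "&amp;" becomes "&"
    text = URL_RE.sub(" ", text)        # step 4: drop hyperlinks
    text = MENTION_RE.sub(" ", text)    # step 3: drop @usernames
    text = text.translate(PUNCT_TABLE)  # steps 3 and 5: drop '#' and special characters
    text = " ".join(text.split())       # step 5: trim and collapse white space
    return text.lower()                 # step 6: no effect on caseless scripts


def tokenize(text, stopwords=STOPWORDS):
    """Step 7 for space-delimited languages: split on spaces, drop stopwords."""
    return [tok for tok in clean_text(text).split() if tok not in stopwords]


sample = "Check this @user: #Hydrogen is the FUTURE! https://t.co/abc"
tweets = [
    {"id": "1", "text": sample},
    {"id": "1", "text": sample},  # same tweet downloaded twice
    {"id": "2", "text": sample},  # different id, identical text
    {"id": "3", "text": "Fuel cells &amp; green hydrogen"},
]
unique_id, unique_text = deduplicate(tweets)
tokens = tokenize(sample)
```

Keeping the unique-id and unique-text sets separate mirrors the two counts (UICount and UTCount) reported in step 2.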
 

3.4 Language identification

The downloaded Twitter data includes a lang attribute indicating the tweet’s language code identified by Twitter. By utilizing keywords from four languages, we increase the likelihood of capturing tweets in those languages as well as others. A tweet in any language with English hashtags can be identified even without specific language keywords. For example, the Hindi tweet भारत में आई पहली हाइड्रोजन कार, परिवहन मंत्री ने की संसद तक की सवारी (i.e. India got its first hydrogen car, transport minister rides till Parliament) contains several English hashtags (#hydrogen #hydrogencar #hydrogenfuel #hydrogenwater #NitinGadkari #toyotahydrogencar #toyotacars #hydrogenenergy). To analyse language distribution in the unique id dataset, we examined two approaches: (1) tweets downloaded using only English keywords, and (2) tweets downloaded using keywords from all four languages (English, Japanese, Korean, and Hindi).
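The language distribution described here amounts to counting the lang codes and converting the counts to percentages. A minimal sketch follows, assuming each tweet is a dictionary carrying the lang attribute; the function name is hypothetical, and falling back to "und" (Twitter's code for an undetermined language) when the attribute is missing is an assumption.

```python
from collections import Counter


def language_shares(tweets):
    """Return (language code, percentage) pairs, most frequent first."""
    counts = Counter(tweet.get("lang", "und") for tweet in tweets)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(lang, round(100 * n / total, 2)) for lang, n in counts.most_common()]


shares = language_shares(
    [{"lang": "ja"}, {"lang": "ja"}, {"lang": "ja"}, {"lang": "en"}]
)
```

Running the same function on the English-only download and on the four-language download gives the two distributions that are compared.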

3.5 Location identification and geographic plotting

In order to perform spatial analysis, it is essential to identify tweet locations and plot them geographically. The spatial analysis involves examining the geographic locations of unique id tweets and associated metadata to understand various phenomena, including events and trends that occur in specific locations (Shah and Dunn 2019). By combining spatial locations with a temporal dimension, we conduct spatiotemporal analysis.
The Twitter API offers two different methods to obtain user locations.
1. Tweet Location: A tweet can disclose its location in two ways: (1) by providing Point coordinates (latitude/longitude) from GPS-enabled devices and (2) by indicating a Twitter Place, defined by a boundary of four coordinates, which may include additional information such as city and country. Notably, Twitter Places are only attached to original tweets, not to retweets (Twitter locations 2023).
2. Profile Location: Users sometimes provide their location information in their profiles, but it may not necessarily reflect real-time tweet location. This profile location is a general indication or connection to a particular location.
 
Using the GeoPy Python library (Twitter locations. 2023; Python geopy library 2023), we mapped most of the user locations from their profiles to corresponding coordinates (latitude/longitude). However, the presence of a value does not guarantee accurate location information, as some values may be noisy or invalid. In cases of invalid locations, the GeoPy library either provides an empty value or occasionally maps to a false location with valid coordinates.
Since only a minimal percentage (0.58%) of tweets contain exact coordinates (i.e. GPS-enabled location), we expanded our coverage by leveraging profile locations. Approximately 61.58% of tweets had some form of value in their profile location field. The GeoPy Python library identified coordinates for approximately 42% (9,189,714) of tweets. Notably, most of these locations are expressed as a country name in the user profile, for which GeoPy provides the approximate centre of the country as a default coordinate. For example, the coordinates 36.57, 139.24 correspond to Kuroganecho for Japan; -24.78, 134.75 correspond to Ghan NT for Australia; 22.35, 78.67 correspond to Madhya Pradesh for India; and 39.78, -100.44 correspond to Oberlin for the USA.
Using the geospatial coordinates and their corresponding frequencies, we used the Geopandas Python library (Geopandas python library 2023) along with Matplotlib to plot the geospatial data on world maps for each year. Each world map is a scatter plot in which a marker represents each data point on a two-dimensional coordinate system (i.e. longitude and latitude of a location).
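The mapping from profile locations to coordinate frequencies can be sketched as follows. To keep the example runnable offline, the geocoder is passed in as a callable and replaced here by a stand-in lookup; with GeoPy one would presumably pass the geocode method of a geocoder such as Nominatim, which returns an object with latitude and longitude attributes or None. The function name, the caching, and the rounding to two decimals are assumptions.

```python
from collections import Counter
from types import SimpleNamespace


def geocode_profiles(profile_locations, geocode):
    """Count tweets per (latitude, longitude), skipping empty or unresolved values."""
    cache = {}
    frequencies = Counter()
    for raw in profile_locations:
        key = raw.strip().lower() if raw else ""
        if not key:
            continue  # no profile location provided
        if key not in cache:
            hit = geocode(key)  # may return None for noisy or invalid values
            cache[key] = (
                (round(hit.latitude, 2), round(hit.longitude, 2)) if hit else None
            )
        if cache[key] is not None:
            frequencies[cache[key]] += 1
    return frequencies


# Stand-in geocoder for illustration; the coordinate is the country centre
# quoted in the text for Japan.
KNOWN = {"japan": SimpleNamespace(latitude=36.57, longitude=139.24)}


def fake_geocode(query):
    return KNOWN.get(query)


freq = geocode_profiles(["Japan", " japan ", "", None, "the moon"], fake_geocode)
```

Each key of the resulting counter is one marker position for the yearly scatter plots, and its count can drive the marker size or colour. Caching matters in practice because many profiles share the same location string.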

3.6 Deriving tweet-specific metrics

The tweet object retrieved from the Twitter API includes essential metadata about individual tweets, such as tweet id, creation date, text content, author information, and associated public metrics (e.g. impression count, like count, retweet count, replies, and quote count). The impression count denotes the number of times the tweet has been viewed; the like count indicates the number of users who liked the tweet; the retweet count represents the number of times the tweet has been retweeted; the reply count signifies the number of replies received by the tweet, and the quote count indicates the number of times the tweet has been quoted (i.e. retweeted with a comment). These public metrics are particularly interesting as they provide insights into the reach, engagement, and popularity of tweets within the Twittersphere, shedding light on how well tweets resonate with the Twitter audience. We extracted tweet-level public metrics for each original tweet in our dataset of unique id and unique text tweets.
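A minimal sketch of this extraction step is shown below, assuming tweets are dictionaries shaped like Twitter API v2 tweet objects. The helper names are hypothetical. Treating a missing metric as 0 and treating only retweets as non-original (so that quotes and replies count as original) are both assumptions, not choices confirmed by the text.

```python
METRIC_KEYS = (
    "impression_count",
    "like_count",
    "retweet_count",
    "reply_count",
    "quote_count",
)


def is_original(tweet):
    """Assumption: a tweet is original unless it references a retweeted tweet."""
    references = tweet.get("referenced_tweets", [])
    return all(ref.get("type") != "retweeted" for ref in references)


def tweet_metrics(tweet):
    """Flatten the public metrics of one tweet into a single row."""
    metrics = tweet.get("public_metrics", {})
    row = {"id": tweet["id"]}
    for key in METRIC_KEYS:
        row[key] = metrics.get(key, 0)  # assumption: missing metric counts as 0
    return row


tweets = [
    {"id": "1", "public_metrics": {"like_count": 3, "retweet_count": 1,
                                   "reply_count": 0, "quote_count": 2}},
    {"id": "2", "referenced_tweets": [{"type": "retweeted", "id": "1"}]},
]
rows = [tweet_metrics(t) for t in tweets if is_original(t)]
```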

3.7 Deriving user-specific metrics

The user object retrieved from the Twitter API provides valuable metadata about user accounts, including user id, account creation date, user location, user description, verification status, and public metrics like follower count, following count, tweet count, and listed count since the creation of the account. We are particularly interested in the verification status and public metrics, which offer insights into user behaviour and influence on Twitter (2023). To facilitate user-specific analysis, we eliminated duplicate users from both the unique id and unique text tweets and examined the publicly available metrics associated with each user. This analysis involved grouping users into bins based on ranges of these metrics. For example, the bin [0,500] for follower count includes users with followers below 500, including those with no followers. The bin ranges begin from zero and go up to the maximum number observed for each criterion, with the highest follower count recorded in our dataset being 133,996,064.
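The binning described above can be sketched with the standard bisect module. Only the first bin, [0,500], comes from the text; the remaining bin edges, the label format, and the treatment of a value that falls exactly on an edge (assigned to the lower bin here) are assumptions for illustration. Users are assumed to be dictionaries shaped like Twitter API v2 user objects.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper edges; only the first bin [0,500] is taken from the text.
EDGES = [500, 1_000, 10_000, 100_000, 1_000_000]


def bin_label(value, edges=EDGES):
    """Return the label of the bin containing value (upper edges inclusive)."""
    i = bisect_left(edges, value)
    if i == 0:
        return f"[0,{edges[0]}]"
    if i == len(edges):
        return f"({edges[-1]},max]"
    return f"({edges[i - 1]},{edges[i]}]"


def bin_users(users, metric="followers_count"):
    """Count distinct users per bin for one public metric."""
    seen = set()
    counts = Counter()
    for user in users:
        if user["id"] in seen:
            continue  # the same account may appear under many tweets
        seen.add(user["id"])
        counts[bin_label(user["public_metrics"][metric])] += 1
    return counts


users = [
    {"id": "a", "public_metrics": {"followers_count": 0}},
    {"id": "a", "public_metrics": {"followers_count": 0}},
    {"id": "b", "public_metrics": {"followers_count": 750}},
    {"id": "c", "public_metrics": {"followers_count": 133_996_064}},
]
follower_bins = bin_users(users)
```

The same function applies to following_count, tweet_count, and listed_count by changing the metric argument, with edges chosen per metric.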

3.8 Hashtag and mention extraction from tweets

Given the structure of Twitter’s API v2, hashtags and mentions are not only found within the tweet content, but are also stored in the same entities field under the hashtags attribute and the mentions attribute of a tweet (Twitter api v2. 2023). Considering their shared placement within the tweet object, we processed both hashtags and mentions simultaneously. We processed the unique id tweets dataset by dividing it into yearly segments. We identified the most frequently occurring hashtags and mentions for each year and visualized them on a heat map. We extracted the top 200 hashtags and mentions to gain further insights and visualized them using a word map. Non-English hashtags were translated into English equivalents, distinguished with the suffix _x.
Hashtag analysis reveals trending topics and themes, reflecting the dynamics of public discourse. Mention analysis identifies influential users or organizations and their relationships, offering insights into user engagement and active participation in discussions. The combined analysis of hashtags and mentions provides an understanding of their prominence and occurrence over time.
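The yearly hashtag and mention counts can be sketched as below, reading both from the entities field in a single pass. Tweets are assumed to be dictionaries shaped like Twitter API v2 tweet objects with an ISO 8601 created_at timestamp. The function name is hypothetical, and lowercasing tags and usernames before counting is an assumption.

```python
from collections import Counter, defaultdict


def yearly_entity_counts(tweets):
    """Return per-year counters of hashtags and mentions."""
    hashtags = defaultdict(Counter)
    mentions = defaultdict(Counter)
    for tweet in tweets:
        year = int(tweet["created_at"][:4])  # e.g. "2021-04-22T10:00:00.000Z"
        entities = tweet.get("entities", {})
        for item in entities.get("hashtags", []):
            hashtags[year][item["tag"].lower()] += 1
        for item in entities.get("mentions", []):
            mentions[year][item["username"].lower()] += 1
    return hashtags, mentions


tweets = [
    {"created_at": "2021-04-22T10:00:00.000Z",
     "entities": {"hashtags": [{"tag": "Hydrogen"}, {"tag": "EarthDay"}],
                  "mentions": [{"username": "Hyundai"}]}},
    {"created_at": "2021-04-23T09:00:00.000Z",
     "entities": {"hashtags": [{"tag": "hydrogen"}]}},
    {"created_at": "2018-09-17T08:00:00.000Z"},  # tweet without entities
]
hashtags, mentions = yearly_entity_counts(tweets)
top_2021 = hashtags[2021].most_common(1)
```

Calling most_common on each yearly counter yields the ranked lists that feed the heat map, and summing the counters across years gives the overall top 200.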

3.9 Ngram extraction from tweets

In our dataset, tweets in a total of 72 languages appear. Because each language requires its own preprocessing techniques, it is challenging to preprocess all of them. In this analysis, we concentrate on four languages, English, Japanese, Korean, and Hindi, and process these tweets to extract unigrams and bigrams. Afterwards, we identified the top 20 most commonly occurring bigrams for each year and presented them on a heatmap. We translated non-English unigrams and bigrams to English and distinguished them with the suffix _x for visualization purposes. In the bigram heatmap, each row and column represent an individual bigram and a year, respectively. The frequencies are normalized row-wise, allowing us to assess each bigram uniformly.
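The bigram heatmap data can be sketched as follows, assuming the tweets have already been tokenized and grouped by year. The text does not state which row-wise normalization was used; dividing each row by its maximum is an assumption made here, and dividing by the row sum would be an equally plausible reading. Function names are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs of one tokenized tweet."""
    return list(zip(tokens, tokens[1:]))


def yearly_bigram_matrix(tokens_by_year, top_k=20):
    """Return (sorted years, {bigram: row of frequencies scaled to the row maximum})."""
    counts = {
        year: Counter(bg for tokens in docs for bg in bigrams(tokens))
        for year, docs in tokens_by_year.items()
    }
    # Rows are the union of each year's top_k bigrams.
    rows = set()
    for counter in counts.values():
        rows.update(bg for bg, _ in counter.most_common(top_k))
    years = sorted(counts)
    matrix = {}
    for bg in rows:
        raw = [counts[year][bg] for year in years]
        peak = max(raw)  # at least 1, since the bigram occurs in some year
        matrix[bg] = [n / peak for n in raw]
    return years, matrix


tokens_by_year = {
    2020: [["green", "hydrogen", "energy"]],
    2021: [["green", "hydrogen"], ["green", "hydrogen", "fuel"]],
}
years, matrix = yearly_bigram_matrix(tokens_by_year)
```

Scaling each row independently makes the temporal pattern of a rare bigram as visible as that of a frequent one, which matches the stated aim of assessing each bigram uniformly.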
We also extracted the top 200 unigrams and bigrams and visualized them using separate word maps. For unigrams, the term hydrogen (English only) appeared prominently and was removed to improve the visibility of other words. For bigrams, we observed noise from symbols or characters (▽,σ,♨,♀,☀), especially due to the inclusion of Japanese text. Thus, we removed these bigrams from the visualization. Additionally, three specific bigrams (水素 水 (hydrogen water), フッ化 水素 (hydrogen fluoride), and 水素 音 (hydrogen sound)) dominated the frequencies and were removed.

4 Results and discussion: discourse surrounding hydrogen energy on Twitter

We present findings on the spatiotemporal evolution of discussions, identifying significant surges in user engagement and understanding the variations in participation across different regions or countries. We also present the relationships obtained between tweet metrics and overall engagement, as well as user engagement patterns and indicators of influential users. Additionally, we present results of the hashtag and ngram analysis that uncover evolving trends related to hydrogen energy discussions on Twitter.

4.1 Language representation analysis results

The analysis of the lang attribute on unique id and unique text tweets reveals that the dataset encompasses a wide range of 72 language codes, including a small proportion that do not correspond to actual spoken languages. These language codes, namely qam, qct, qht, qme, qst and zxx, denote tweets that contain only mentions, cashtags, hashtags, media links, very short texts, and media or Twitter card, respectively.
To assess the impact of including keywords from different languages on tweet volumes, we compared the distribution of tweets per language downloaded with English-language keywords only against that of tweets downloaded with keywords from all four languages. As shown in Fig. 2, most tweets were posted in English and Japanese in both cases. As seen in Fig. 2a, based on the data collected using English-only keywords, approximately 83.5% of the tweets were in English. Japanese accounted for the second largest portion, comprising 5–6% of the tweets, and other languages like Korean, German, French, and Spanish made up around 4%. Hindi ranked 14th with 0.19% of tweets. As seen in Fig. 2b, among the unique id tweets gathered using keywords from four languages, the top five languages are Japanese (55.24%), English (37.99%), Korean (2.31%), German (0.60%), and Hindi (0.38%), together accounting for 96.53% of hydrogen-related tweets. However, when considering the unique text tweets, English becomes the top language with 47.94% of tweets, followed by Japanese with 43.87%.
Discussion: It is interesting to note that using language-specific keywords for data extraction does not necessarily restrict the retrieved tweets to that language, as code-mixing can occur where keywords are used alongside words from different languages (Rijhwani et al. 2017). While the data collected using English-only keywords contains predominantly English tweets, Japanese emerged as the predominant language when using keywords in four languages. A closer examination of tweets revealed that some Japanese tweets with different ids shared the same content, potentially indicating the usage of bots for campaigns, advertisements, propaganda, or other purposes (Keller and Klinger 2019). It is important to acknowledge the potential bias that repeated content introduces into the data. The lower number of tweets in Hindi may be attributed to a significant portion of Indian tweets being posted in English and other Indian languages (Indian tweets in English 2019). Furthermore, the unexpected numbers of German, French, and Spanish tweets warrant further investigation.
Relying solely on English keywords limits the representation of non-English perspectives and fails to capture the nuances specific to different language groups fully. Using multiple language keywords ensures a more inclusive dataset for diverse and global insights on hydrogen energy.
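The difference between the unique id and unique text language shares can be illustrated with a minimal sketch. The record layout (`id`, `text`, `lang` keys) and the helper name `language_shares` are assumptions made for illustration, not the authors' pipeline, and the records are toy data.

```python
from collections import Counter

def language_shares(tweets, key):
    """Return {lang: percent} after de-duplicating tweets on `key`."""
    seen = set()
    counts = Counter()
    for tweet in tweets:
        value = tweet[key]
        if value in seen:
            continue  # duplicate under this notion of uniqueness
        seen.add(value)
        counts[tweet["lang"]] += 1
    total = sum(counts.values())
    return {lang: round(100 * n / total, 2) for lang, n in counts.items()}

# Toy records (hypothetical, not taken from the dataset).
tweets = [
    {"id": 1, "text": "水素ステーション", "lang": "ja"},
    {"id": 2, "text": "水素ステーション", "lang": "ja"},  # same text, new id
    {"id": 3, "text": "水素ステーション", "lang": "ja"},
    {"id": 4, "text": "green hydrogen now", "lang": "en"},
    {"id": 5, "text": "hydrogen fuel cell", "lang": "en"},
]

by_id = language_shares(tweets, "id")
by_text = language_shares(tweets, "text")
print(by_id)
print(by_text)
```

In this toy example, repeated Japanese content inflates the Japanese share when tweets are de-duplicated by id but not when they are de-duplicated by text, which mirrors the shift in ranking described above.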

4.2 Spatiotemporal analysis

This analysis presents spatial and temporal patterns in tweets and associated metadata. The outcome illustrates the varying levels of tweet activity over time and regions and offers valuable insights into the dynamics of online conversations.
Temporal analysis: Figure 3 depicts the count of tweets and media content over a ten-year period. The heat maps of hashtags, mentions and bigrams, shown in Figures 7 and 9, further reveal how the discussions related to hydrogen and related topics have evolved over time.
The Paris Agreement (COP21, December 2015) likely increased global environmental awareness (Paris agreement 2015), and its impact can be seen in the increased Twitter activity in Fig. 3. The tweet count steadily increased over time, with a notable spike in May 2016. This spike was primarily driven by Japanese tweets (94.28%), while English tweets accounted for only 4.52%. During this period, the most frequently used hashtags included #Yahooニュース (#Yahoonews), #水素 (#hydrogen), #niconews, #hydrogenwater, and #fuelcell, among others. This suggests widespread sharing of Yahoo News articles, with particular emphasis on hydrogen-related topics such as hydrogen water and fuel cells. Further spikes occurred in September 2017, April 2018, September 2018, July 2019, and April 2021. Japanese tweets accounted for 34.79, 78, 58.54, 96.49, and 42.87% of these spikes, respectively, while English tweets constituted 54.08, 16.59, 35.29, 3.19, and 53.01%. Tweets in other languages were minimal, representing only 11.12, 5.37, 6.15, 0.32, and 4.11%, respectively.
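The per-spike language breakdown above amounts to grouping tweets by month and computing language shares within each group. The following is a minimal sketch under assumed field names (`created_at` as an ISO 8601 string, `lang` as a language code); the records are toy data.

```python
from collections import Counter, defaultdict

def monthly_language_shares(tweets):
    """Return {"YYYY-MM": {lang: percent}} for the given tweets."""
    per_month = defaultdict(Counter)
    for tweet in tweets:
        month = tweet["created_at"][:7]  # "YYYY-MM" prefix of the timestamp
        per_month[month][tweet["lang"]] += 1
    shares = {}
    for month, counts in per_month.items():
        total = sum(counts.values())
        shares[month] = {
            lang: round(100 * n / total, 2) for lang, n in counts.items()
        }
    return shares

# Toy records (hypothetical).
tweets = [
    {"created_at": "2016-05-03T10:00:00Z", "lang": "ja"},
    {"created_at": "2016-05-04T11:00:00Z", "lang": "ja"},
    {"created_at": "2016-05-09T12:00:00Z", "lang": "ja"},
    {"created_at": "2016-05-20T13:00:00Z", "lang": "en"},
    {"created_at": "2017-09-03T09:00:00Z", "lang": "en"},
    {"created_at": "2017-09-04T09:30:00Z", "lang": "ja"},
]

shares = monthly_language_shares(tweets)
print(shares)
```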
In September 2017, the trending hashtags included #hydrogen, #northkorea, #hydrogenbomb, and #fuelcell. Global concerns and discussions on Twitter were sparked by North Korea’s missile tests and their claim of a successful hydrogen bomb test (North Korean nuclear test 2017). Likewise, in April 2018, popular hashtags included #hydrogen, #fuelcell, #水素の音 (the sound of hydrogen), #超会議2018 (#Chokaigi2018) and #超会議コスプレ (#super conference cosplay). In September 2018, hashtags such as #水素 (#hydrogen), #fuelcell, #germany, #energy, #environment, #水素水 (#hydrogen water), #transport, #missionh24, #健康 (#health), #fuelcells, #renewableenergy, and #hydrogentrain were prominent. This indicates a growing interest and discussion surrounding hydrogen energy and related topics. The launch of the world’s first hydrogen-powered train (Coradia iLint) in Germany during that time, built by French TGV-maker Alstom, further highlights the advancements and recognition of hydrogen as a viable and sustainable energy solution (Germany launched world’s first hydrogen-powered train 2018). In July 2019, hashtags such as #fuelcell, #climatechange, #zeroemission, #greenhydrogen appeared. In April 2021, several hashtags related to hydrogen, environmental initiatives, and collaborations between Hyundai and BTS were trending on social media. The popularity of the #hydrogen hashtag could be attributed to increased awareness and discussions around hydrogen as a clean and sustainable energy source. The #hyundaixbts, #hyundai, #bts, #nexo, #btsxhyundai, #nexoxbts, and #ioniqxbts hashtags likely gained traction due to a collaboration or promotional campaign between #Hyundai, a car manufacturer, and BTS, a popular South Korean music group. Additionally, hashtags like #earthday, #fortomorrow, #wewontwait, and #energy were likely trending in connection with Earth Day and discussions about renewable energy and sustainability. 
The presence of the hashtags #トヨタ (#toyota), #greenhydrogen, and #fuelcell suggests Toyota’s involvement.
We also identify and show the count of media files (e.g. images, animated gifs, or videos) posted by users in Fig. 3. To simplify representation, tweets with multiple media files, like images, videos, and animated gifs, are counted as distinct tweets. On average, tweets with media files typically have around 1.41 images, 1.3 videos, and 1.07 animated gifs per tweet. However, when considering all the unique id tweets, the average number of media files decreases significantly to 0.34 images, 0.04 videos, and 0.006 animated gifs per tweet.
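The two kinds of averages reported above differ only in their denominator: tweets that carry a given media type versus all tweets. A minimal sketch follows, assuming each tweet carries a `media` list of type strings such as `photo`, `video`, and `animated_gif`; this layout and the helper name are illustrative assumptions.

```python
def media_averages(tweets, media_type):
    """Return (average among tweets with this media type, average over all tweets)."""
    counts = [tweet["media"].count(media_type) for tweet in tweets]
    with_media = [c for c in counts if c > 0]
    avg_with = sum(with_media) / len(with_media) if with_media else 0.0
    avg_all = sum(counts) / len(counts) if counts else 0.0
    return avg_with, avg_all

# Toy records (hypothetical).
tweets = [
    {"media": ["photo", "photo"]},
    {"media": ["photo"]},
    {"media": []},
    {"media": []},
    {"media": ["video"]},
    {"media": []},
]

photo_avg_with, photo_avg_all = media_averages(tweets, "photo")
print(photo_avg_with, photo_avg_all)
```

Because most tweets carry no media at all, the average over all tweets is much lower than the average over media-bearing tweets, as in the figures quoted above.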
Discussion: The temporal analysis (RP1) provided insights into how the public discourse on hydrogen energy evolved over recent years and when this topic witnessed significant surges in user engagement. Analysing this data reveals patterns, trends, and dynamics of online conversations over time, including peak periods of engagement and significant event detection. The tweet count steadily increased over time with several spikes primarily driven by Japanese tweets. Many of these spikes were caused by major hydrogen energy-related events and were identified by trending hashtags. While certain trending hashtags indicated relevant content, others did not, prompting a deeper examination of content relevance and irrelevance. This analysis also revealed a significant presence of media files in tweets. Studying the images along with the accompanying text could provide a more comprehensive understanding of the content being shared and discussed on the platform.
Spatial or geospatial analysis: Figure 4 shows the top 10 countries, which together make up 80% of tweets. The tweet distribution across the top countries is about 33% Japan, 19.3% USA, 9.4% UK, 3.9% India, 3.6% Canada, 3.5% China and 2.9% Australia. The leadership of these top countries in hydrogen energy is evident through their partnerships and initiatives such as the International Partnership for Hydrogen and Fuel Cells in the Economy (2003), the Hydrogen Council (2017), Hydrogen Europe (2000), and Twitter activities. Note that these countries have a sizable active Twitter user base (January 2022) (Leading countries based on number of twitter users as of January 2022). Surprisingly, despite the use of Korean keywords and the presence of Korean in the top five languages (Fig. 2(a)), South Korea ranks 11th. This suggests that Japanese and English-speaking communities are more active in hydrogen energy discussions on Twitter compared to the Korean-speaking community.
Global engagement of various nations in hydrogen energy is evident from the world map in Fig. 5. Japan’s highest tweet count (74,639) in 2019 serves as the (maximum) reference for the right-side colourbar. Japan’s sustained dominance from 2013–2022 is noticeable due to its clean energy commitment, pioneering hydrogen strategy in 2017 (Japan: First nation to form national hydrogen strategy 2017), and technology leadership evident by its numerous fuel cell patents (Alshaabi et al. 2021). Including the Japanese language in our dataset may also introduce bias towards Japan.
Figure 5 also reveals that the USA and the UK show increasing tweet activity over time. The USA’s high tweet counts stem from English language inclusion in our dataset and its prominent global hydrogen energy leadership. The USA has an ambitious hydrogen production goal: 10 million metric tons (MMT) of clean hydrogen by 2030, 20 MMT by 2040, and 50 MMT by 2050 (Us national clean hydrogen strategy and roadmap 2021). The USA drives adoption through initiatives like H2USA (H2usa 2013) and California Fuel Cell Partnership (California fuel cell partnership (cafcp) 2013), emphasizing hydrogen technology advancement and commercialization. Similarly, the UK’s global leadership is displayed through government plans like Ten Point Plan (The ten point plan for a green industrial revolution 2020), Hydrogen Strategy for Scotland (Hydrogen strategy for Scotland 2022) and the UK Hydrogen Strategy (Uk hydrogen strategy 2021), promoting hydrogen’s role in sustainable energy vision.
Canada’s notable presence is evident with its 2020 hydrogen strategy (Hydrogen strategy for Canada 2020), aiming for low-carbon hydrogen using abundant natural resources. Canada’s participation in international collaborations such as IPHE (2003) reflects its proactive approach. Europe’s prominence is shown by EU’s Hydrogen Strategy (2020), REPowerEU (2022) plan, targeting 6 GW of hydrogen electrolysers by 2024, 40 GW by 2030, emphasizing cross-border collaboration and international partnerships. Countries like Germany (The national hydrogen strategy—Germany. 2020), the Netherlands (The national hydrogen strategy—Netherlands. 2020), Denmark (The government’s strategy for power-to-x—Denmark. 2021) and Norway (The Norwegian government’s hydrogen strategy 2020) are also actively investing in hydrogen energy.
India and Australia’s growing interest in hydrogen energy can be seen on the world map of tweets. Minimal tweet activities in 2013, but a notable rise from 2016, reflect increased involvement in hydrogen-related discussions. This trend aligns with events such as the Paris Agreement signing in 2015 (Paris agreement 2015) pushing clean energy solutions. Australia’s 2019 National Hydrogen Energy Strategy (Australia’s national hydrogen strategy 2019) spurred interest and increased Twitter activities (Fig. 5g–j). Similarly, India’s 2020 National Green Hydrogen Mission (2020; Iea 2019) boosted Twitter presence (Fig. 5h–j). Figure 5d–j show rising activities, particularly in Japan and the USA (greenish dots). The trend intensified in 2019, with the United Kingdom, India, Australia, and various European countries showing increased engagement (greenish dots and larger blue dots).
Discussion: The spatial analysis (RP2) provided information on the extent to which different regions or countries contribute to this topic, and what factors explain the variations in their levels of participation. The spatial analysis of hydrogen-related tweets reveals that ten countries dominate the conversation on Twitter, accounting for roughly 80% of tweets. Japan emerges as the most active country followed by the USA, the United Kingdom, India, Canada, China, and Australia. This indicates the prominence of Japanese and English-speaking communities in hydrogen energy discussions on Twitter, with a noticeable contrast in activity levels for the Korean-speaking community, even using Korean keywords for data collection. While acknowledging the bias towards English and Japanese-speaking nations due to keyword choices, this bias does not extend to Korean and Hindi. Despite India ranking fourth, the contribution of Hindi tweets, India’s most widely spoken language, is notably low in the dataset. Moreover, Korea does not appear in the top ten, and the Korean language exhibits minimal contribution to language-specific analysis, suggesting the potential use of alternative social media platforms for such discussions.

4.3 Decoding online interactions: user mentions and engagement analysis

To perform the tweets and user-related engagement analysis, various public metrics from the data are discussed in this section to understand the impact, engagement, and interaction levels of tweets.
Metrics impacting Twitter engagement: It is notable that a large portion of the unique id dataset consists of original tweets: 44.7% (9,750,520) were original, 41.24% (8,994,728) retweets, 12.5% (2,718,330) replies, and 1.6% (349,282) quoted content. In the unique text subset, 58.9% (5,859,413) were original, 10.9% (1,082,852) retweets, 26.8% (2,663,529) replies, and 3.45% (343,190) quoted content.
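The percentages for the unique id dataset follow directly from the counts given in the text. The short sketch below recomputes them; the dictionary keys and the function name are illustrative choices.

```python
# Counts reported in the text for the unique id dataset.
unique_id_counts = {
    "original": 9_750_520,
    "retweet": 8_994_728,
    "reply": 2_718_330,
    "quote": 349_282,
}

def percentage_shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}

shares = percentage_shares(unique_id_counts)
for kind, share in shares.items():
    print(f"{kind}: {share:.2f}%")
```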
We analyse only original tweets for insights into unique content. Examination of public metrics of original tweets reveals that each tweet garnered an average of 499 impressions, 173.5 likes, 39 retweets, 20 replies and 7.25 quotes, indicating its potential reach. Most tweets received few responses: 99.45% had ≤ 10 impressions, 94.68% ≤ 10 likes, 97.13% ≤ 10 retweets, 98.96% ≤ 10 quotes, and 98.24% ≤ 10 replies. This follows the power law, where few tweets have high engagement, and the majority have minimal (Chen and Nayak 2011).
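A high mean combined with a very large share of low-engagement tweets is the signature of a heavy-tailed distribution. The sketch below computes both quantities for one metric; the threshold of 10 follows the text, while the data and the function name are hypothetical.

```python
def engagement_summary(values, threshold=10):
    """Return (mean, percent of values <= threshold) for one public metric."""
    n = len(values)
    mean = sum(values) / n
    share_low = 100 * sum(1 for v in values if v <= threshold) / n
    return mean, share_low

# Toy like counts (hypothetical): one viral tweet dominates the mean.
likes = [0, 0, 1, 2, 0, 3, 0, 1, 0, 993]

mean_likes, share_low = engagement_summary(likes)
print(mean_likes, share_low)
```

Here a single highly liked tweet pulls the mean far above what a typical tweet receives, which is why the mean and the share of tweets at or below the threshold are best read together.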
User engagement analysis: User engagement analysis in Fig. 6 provides insights into public user metrics like follower count, following count, tweet count, and listed count, revealing diversity in participants. Most users (94.4% for unique id and 92.5% for unique text) have < 5000 followers, while a small fraction (0.07% for unique id and 0.12% for unique text) have > 1 million (Fig. 6a), following the power law distribution. These users with a high number of followers, often referred to as influencers, tend to play a significant role in shaping discussions and exerting influence within the social media landscape.
When examining the following counts, 2.2% of unique id and 2.6% of unique text tweet users follow ≥ 5000 accounts, while the remaining users follow fewer accounts as shown in Fig. 6b. Analysing tweet counts, 48%-50% have posted < 10,000 tweets, while the remaining users tweet aggressively (10,000 to 115 million tweets) as shown in Fig. 6c. Although higher tweet counts may be expected for long-term active users, a count of 115 million tweets is still exceptionally high for any account.
Further analysis of the top 10 user accounts with the highest tweet counts reveals that all of them are Japanese accounts. Some have been active since as early as 2009, while others joined Twitter more recently in 2019–2020. These accounts primarily belong to various organizations, such as retail stores or food chains, where they identify themselves as official accounts for user interaction or promotional campaigns. Few (0.03%-0.05% unique id and unique text) users have > 1 million tweets, rising to 7.43% and 8.09% with counts > 0.1 million.
Listed counts showcase influence, reputation, and reach. Only 0.06% of unique id and 0.12% of unique text users have > 5000 listed counts as shown in Fig. 6d. Distribution of verified and non-verified users reveals that only 1.4% (64,906) users were verified, while a significant 98.6% (4,574,333) were non-verified among unique id users. Similarly, for unique text users, 2.06% (55,406) were verified users and 97.9% (2,632,874) were non-verified users. Verification status does not solely indicate the credibility or quality of the content posted by users. Verification is primarily a means for Twitter to confirm the identity of accounts that are of public interest.
Mention analysis: Another useful aspect of Twitter data analysis is exploring the mention network as it reveals the frequency of a user being mentioned by others in their tweets and reflects the user’s ability to engage in conversations (Cha et al. 2010). Having covered follower and tweet statistics, our focus now shifts to user mentions, offering significant insights into key influencers within the hydrogen energy sector.
Figure 7a presents the mention heat map, showcasing the top 10 mentioned users each year, resulting in a total of 83 mentions over the span of 10 years. Mentions vary in frequency: for example, hyundai_global is the most used mention with a frequency of 207,028, while bts_jp_official appears among the top mentions with a frequency of 23,714 in 2020. The heat map displays equal strength for both due to row-wise normalization. This allows each mention to be analysed independently and prevents less frequent mentions from disappearing, despite their high position on the list in a given year.
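Row-wise normalization can be sketched as follows. The text does not state the exact scheme, so dividing each row by its own maximum is an assumption; the two peak values are the frequencies quoted above, while the remaining yearly values are made up for illustration.

```python
def normalize_rows(matrix):
    """Scale each row to [0, 1] by dividing by that row's maximum."""
    normalized = {}
    for label, row in matrix.items():
        peak = max(row)
        normalized[label] = [v / peak if peak else 0.0 for v in row]
    return normalized

# Rows are mentions, columns are consecutive years (non-peak values are hypothetical).
counts = {
    "hyundai_global": [0, 103_514, 207_028],
    "bts_jp_official": [0, 23_714, 11_857],
}

heat = normalize_rows(counts)
print(heat)
```

After scaling, both rows peak at 1.0 even though their raw frequencies differ by almost an order of magnitude, so each row shows its own evolution over time rather than its size relative to other rows.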
Additionally, Fig. 8a depicts the top 200 mentioned users during the same period. Notably, several users consistently appeared recently and ranked among the top 200 mentioned users of all time across various categories. These categories include industries and industrialists (elonmusk, youtube, hyundai_global, hyundai_japan, toyota, mliebreich, cafcp, uberfacts), politicians (whitehouse, nitin_gadkari, narendramodi), news and information portals (fuelcellsworks, cnn, h2_view, livedoornews), science and scientists (gatapi21, aldiallaboratoy), artists (bts_twt, kugatsu_main) and economists (daitojimari).
Discussion: Metrics Impacting Twitter Engagement (RP3) provided information on how public metrics of tweets, such as impression count, like count, retweet count, replies, and quote count, relate to the overall engagement and reach of the tweets on the topic. The presence of a high number of original tweets reflects diverse opinions, perspectives, and information. Twitter engagement analysis of the tweets shows that only a small fraction received high engagement while the majority garnered minimal attention. This indicates the importance of careful crafting of tweets for meaningful engagement.
Discussion: The User Engagement Analysis (RP4) provided insights into how public metrics of users, such as followers, following, tweet count, listed count, mention count and other relevant indicators, vary among user discussions. It helped us to identify patterns or indicators of influential users based on these metrics. On analysing user engagement, we notice that only about 0.1% of users engaged in conversations have a substantial number of followers, suggesting the potential influence of only a limited number of users within the network. Additionally, most users follow very few accounts, while less than 3% of users follow a sizeable number of accounts (5000 or more). These users following a large number of accounts may have broad interests or influence, engage in extensive networking, or potentially be associated with bots or automated accounts, but further analysis is required for definitive conclusions. Notably, less than 0.05% of users exhibit aggressive tweeting behaviour, suggesting that only a small group, possibly power users, influencers, or bots, contributes to tweet volume. In an examination of the top 10 user profiles with the highest lifetime tweet counts, it is observed that these users are Japanese accounts associated with retail stores or food chains, underscoring the notable influence of Japanese users. Notably, their prolific Twitter activity extends beyond the current discourse, encompassing user interactions, customer service, and promotional campaigns across diverse topics throughout their online presence. Further detailed analysis is needed to explore the specific content posted by these users that led to their presence in hydrogen-related discussions. Interestingly, most users in this dataset have lower visibility and recognition, as reflected by the listed count. Further scrutiny of tweet mentions identifies top users falling into categories such as industries, industrialists, politicians, news portals, scientists, artists, and economists.

4.4 Unveiling digital language trends: hashtag and NGram exploration

As discussed in Sect. 3, we employed multiple subsets of data for this analysis: (1) unique id tweets for hashtag analysis, (2) a subset of unique text tweets containing English, Japanese, Korean, and Hindi tweets for ngram (unigram and bigram) analysis, and (3) an English-only subset of unique text tweets for additional ngram (unigram and bigram) analysis.
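Bigram counting over a set of tweets can be sketched as below. The tokenizer (lowercasing plus a simple word regex) is an assumption that only suits whitespace-delimited languages such as English; Japanese text would need a dedicated word segmenter, and the preprocessing actually used in the study is not specified here.

```python
from collections import Counter
import re

def bigram_counts(texts):
    """Count adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Toy English tweets (hypothetical).
texts = [
    "Green hydrogen fuel cell",
    "hydrogen fuel cell cars",
    "green hydrogen production",
]

counts = bigram_counts(texts)
print(counts.most_common(3))
```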
The heatmap focuses on showing the top 15 hashtags per year (Fig. 7b), but there are a total of 78 hashtags over the decade, indicating a dynamic and diverse topic landscape. Despite selecting 10 mentions per year, the overall count of hashtags and mentions has remained the same over the decade. For the top 20 bigrams in English-only and all four languages tweets (Fig. 9), we see 83 and 88 bigrams, respectively. This may be because hashtags and bigrams are often reused over time, unlike user mentions associated with a specific hashtag globally.
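The number of rows in such a heat map is the size of the union of the yearly top-k lists, which is at most k times the number of years and shrinks as items recur across years. A minimal sketch with toy counts follows; the function name and the data are hypothetical.

```python
from collections import Counter

def top_k_union(yearly_counts, k):
    """Return the set of items appearing in any year's top-k list."""
    union = set()
    for counts in yearly_counts.values():
        union.update(item for item, _ in Counter(counts).most_common(k))
    return union

# Toy yearly hashtag counts (hypothetical).
yearly_counts = {
    2020: {"#hydrogen": 5, "#fuelcell": 3, "#bts": 2},
    2021: {"#hydrogen": 9, "#greenhydrogen": 4, "#fuelcell": 1},
}

rows = top_k_union(yearly_counts, k=2)
print(sorted(rows))
```

With two years and k = 2, the upper bound is four rows, but the recurring #hydrogen hashtag reduces the union to three.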
Non-English words are translated into English in the visualization for clarity. In Figs. 7 and 9, each row represents a word’s yearly evolution. Despite varying frequencies, e.g. #hydrogen (the most used hashtag with a frequency of 267,728 in 2021) and #水素 (#hydrogen) (a Japanese hashtag with a frequency of 12,882 in 2021), both display similar strength in the heat map as shown in Fig. 7b. Similarly, the bigram hydrogen fuel appeared in English tweets in 2022 with a frequency of 45,632, while fossil fuel appeared 9,055 times in the same year as shown in Fig. 9b. The row-wise normalization allows us to analyse each word independently.
Some main findings are as follows:
1. Popular hashtags such as #mycleannature, #cleanenergy, #netzero, #energytransition, #renewableenergy, #hydrogennow, #hydrogennews, #hydrogeneconomy, #greenhydrogen, #hyundai, and #fuelcell (Figs. 7b and 8b) indicate climate change discussions, efforts towards net-zero emissions and energy transition via renewable and green hydrogen sources.
The bigrams heat map (Fig. 9) and word clouds for top 200 unigrams (Fig. 10) and bigrams (Fig. 11) reveal a trend in recent years. Bigrams from English-only tweets such as hydrogen powered, climate change, fuel cell, hydrogen production, toyota mirai, green hydrogen, renewable energy, and hydrogen cars (Fig. 9b) highlight the evolving conversation and the focus on various aspects of hydrogen energy and its implications for addressing climate change and sustainable energy solutions. In the dataset of English, Japanese, Hindi, and Korean tweets, bigrams such as 自動車 (car), 燃料 電池 (fuel cell), hydrogen fuel, ニッケル 水素 (nickel hydrogen), 水素ステーション (hydrogen station), 水素爆発 (hydrogen explosion), hydrogen production, 水素 エンジン (hydrogen engine), and 水素 電池 (hydrogen battery) (Fig. 9a) can be seen.
Language-specific patterns are crucial for grasping nuances in cross-lingual discussions. Researchers and policymakers can comprehensively understand discussed topics and develop targeted strategies for effective communication and policy development within specific language communities.
 
2. Certain hashtags and bigrams exhibit consistent and increasing usage over time. Hashtags such as #fuelcell, #greenhydrogen, #hydrogen, #renewableenergy, #hydrogennow, #h2, #cleanenergy, #solar, #netzero, and #hydrogeneconomy reflect a sustained emphasis on achieving a net-zero economy through the utilization of green hydrogen. Similarly, bigrams such as fuel cells, hydrogen energy, natural gas, hydrogen cars, hydrogen economy, 水素 製造 (hydrogen production), renewable hydrogen, hydrogen fuel, green hydrogen, hydrogen powered, fossil fuel, climate change, and cell vehicles reflect a sustained focus on hydrogen energy’s diverse aspects: production, fuel cells, vehicles, climate change and transitioning from fossil fuels.
 
3. Although not visible in heat maps based on top 15 or top 20 (Figs. 7b and 9), there exist several highly utilized hashtags and bigrams present in the top 200 list, as shown in Figs. 8b, 11a, and b. Hashtags such as #Japan, #Australia, #Russia, #China, #Germany, #UK, and #Europe signify the global interest and involvement of these countries in hydrogen energy initiatives. Hashtags like #solarenergy, #energytransition, and #decarbonization highlight the broader context of transitioning to a sustainable and low-carbon economy with hydrogen technologies. Additionally, hashtags like #EV, #transportation, and #electricvehicles underscore the integration of hydrogen in the transportation sector. Relevant bigrams such as energy storage, carbon capture, and hydrogen supply further emphasize hydrogen as an energy carrier and its potential applications in addressing climate change and achieving zero emissions.
 
4. Initially, hashtags like #bts, #jungkook, #becauseofyou, and #wewontwait may seem to be noise: they relate to the Korean boy band BTS and one of its members, Jungkook, and could have appeared because the keyword hydrogen is used in some of their popular songs. However, upon closer analysis, these hashtags were found to be related to hydrogen energy discussions. More specifically, hashtags such as #hyundaixbts, #btsxhyundai, #nexoxbts, #ioniqxbts, #worldenvironmentday, #sustainability, #fortomorrow, #cleanmobility, #earthday, and #earthday2020 were related to the Earth Day celebration by Hyundai Motor in 2020 with BTS in a new global hydrogen campaign film to promote their commitment towards a sustainable future.
 
5. The hashtag #auspol has been widely used to discuss Australian politics for over a decade. A study (McKinnon et al. 2016) found that climate change was one of the most important science-based topics during the 2013 elections, possibly explaining its prominent appearance in the word cloud (Fig. 8b).
 
6. As reported in Sect. 3.4, the unique Japanese tweets greatly outnumber English, Korean and Hindi tweets. Despite this, English tweets have more hashtags, as seen in Fig. 7b. This can be attributed to lower hashtag usage in Japanese tweets and fewer Korean tweets. Moreover, Hindi tweets often use English hashtags, boosting their count.
 

4.5 Discussion

The hashtag and bigram analysis (RP5) provided insights into the evolving trends and emerging topics related to hydrogen energy. The analysis of hashtags, unigrams, and bigrams on Twitter data unveils the evolving trends in climate-related discussions spanning a decade. Notably, predominant ngrams centre around discussions on transportation systems and associated technologies aimed at achieving net-zero targets. Key topics include fuel cells, hydrogen stations, hydrogen batteries, and hydrogen-powered cars. The majority of hashtags and bigrams appear in the English and Japanese languages. The limited presence of Korean and Hindi tweets, as revealed in language-specific analysis, hinders their representation in top hashtags and bigrams. This may be attributed to a lack of awareness among Korean and Hindi speakers regarding hydrogen energy or to the use of alternative platforms or languages for such discussions. This trend is exemplified in India, where 50% of tweets are in English, and the rest are in various Indian languages.

5 Conclusion and future work

In this study, we conducted an in-depth analysis of public discourse on hydrogen energy using Twitter data, offering unique insights. Our findings unveiled the evolving nature of public discourse on hydrogen energy, revealing trends, patterns, and variations in discussions across time and regions. Notably, user engagement surged at certain periods, emphasizing the significance of temporal dynamics. Spatial analysis highlighted differing participation levels across regions, suggesting the need for tailored approaches based on geography.
Language-wise, Japanese and English dominated, comprising 91% of the dataset collected with four language keywords. Solely using English keywords resulted in 83.5% English tweets. Incorporating diverse languages is vital for comprehensive insights as relying solely on English keywords could limit diverse perspectives, narrowing representation. Engagement metrics analysis highlighted factors driving interactions and reach such as impressions, likes, retweets, replies, and quotes. User engagement analysis uncovered discourse variations based on user traits and engagement levels. Identifying influential users aids in understanding key voices and targeting specific groups for effective communication. These findings will benefit policymakers, researchers, and stakeholders in hydrogen energy and related technologies, aiding informed decisions and strategy formation.
This study is one of the first studies on this large multi-faceted data, and there remain many directions to be further explored. Future work involves developing and applying advanced techniques to classify tweets by relevance, refining analysis for more precise insights, and sharing our dataset with researchers. While analysing tweets, it is imperative to acknowledge the existence of irrelevant ones that necessitate filtering in future work. Establishing criteria for distinguishing between relevant and irrelevant content is imperative to ensure the precision of findings in future analyses. Further research will also explore the role and impact of media content and its interplay with textual information. An intriguing avenue for future research is the investigation of bots and their impact on shaping discourse.
Some in-depth analysis can also be performed to find the traits of users prone to posting aggressively, the attributes defining users with a substantial follower count or frequent mentions, and the distinguishing features of tweets that attain high engagement. Exploring these aspects can enhance our understanding of Twitter user engagement, inform effective awareness campaign strategies, and promote meaningful interactions on the platform. Additional areas for exploration will involve assessing awareness levels across diverse regions and languages on various platforms, along with strategies to enhance participation in regions with lower engagement. Exploring alternative platforms commonly used for similar discussions is crucial to understanding the broader landscape. This includes examining factors that might influence language preferences and the dynamics of participation in climate-related discussions.

Acknowledgements

The authors express their gratitude to FEnEx CRC for partial funding support (Project ID: 6467) for this work. The views presented herein are those of the authors and not necessarily reflective of the organizations involved. Special thanks to Dr. Sangeetha Kutty and Dr. Ellen Tyquin for their assistance in finalizing keywords for data extraction. Profound appreciation goes to Prof. Amisha Mehta and Prof. Cameron Newton for their valuable feedback, which significantly contributed to shaping the final version of this paper.

Declarations

Competing interests

The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Footnotes
1 Recently changed its name to X. In this paper, we refer to this platform as Twitter.
2 Available for tweets originating after June 14, 2022 (Twitter language unknown codes, https://twittercommunity.com/t/unkown-language-code-qht-returned-by-api/172819, 2019; accessed 2023-10-18).
References
Agarwal B, Mittal N, Bansal P, Garg S (2015) Sentiment analysis using common-sense and context information. Comput Intell Neurosci 30–30:2015
Agarwal A, Uniyal D, Toshniwal D, Deb D (2021) Dense vector embedding based approach to identify prominent disseminators from Twitter data amid COVID-19 outbreak. IEEE Trans Emerg Top Comput Intell 5(3):308–320
Alshaabi T, Dewhurst DR, Minot JR, Arnold MV, Adams JL, Danforth CM, Dodds PS (2021) The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020. EPJ Data Sci 10(1):15
Arazzi M, Ferretti M, Nicolazzo S, Nocera A (2023) The role of social media on the evolution of companies: a Twitter analysis of streaming service providers. Online Soc Netw Media 36:100251
Andreadis S, Antzoulatos G, Mavropoulos T, Giannakeris P, Tzionis G, Pantelidis N, Ioannidis K, Karakostas A, Gialampoukidis I, Vrochidis S et al (2021) A social media analytics platform visualising the spread of COVID-19 in Italy via exploitation of automatically geotagged tweets. Online Soc Netw Media 23:100134
Arce-García S, Díaz-Campo J, Cambronero-Saiz B (2023) Online hate speech and emotions on Twitter: a case study of Greta Thunberg at the UN climate change conference COP25 in 2019. Soc Netw Anal Min 13(1):48
Balasubramaniam T, Nayak R, Luong K, Bashar MA (2021) Identifying COVID-19 misinformation tweets and learning their spatiotemporal topic dynamics using nonnegative coupled matrix tensor factorization. Soc Netw Anal Min 11(1):57
Bashar MA, Nayak R, Luong K, Balasubramaniam T (2021) Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts. Soc Netw Anal Min 11:1–18
Bashar MA, Nayak R, Balasubramaniam T (2022) Deep learning based topic and sentiment analysis: COVID19 information seeking on social media. Soc Netw Anal Min 12(1):90
Cha M, Haddadi H, Benevenuto F, Gummadi K (2010) Measuring user influence in Twitter: the million follower fallacy. Proc Int AAAI Conf Web Social Media 4(1):10–17
Chen L, Nayak R (2011) Social network analysis of an online dating network. In: Proceedings of the 5th international conference on communities and technologies, pp 41–49
Corbett J, Savarimuthu BTR (2022) From tweets to insights: a social media analysis of the emotion discourse of sustainable energy in the United States. Energy Res Soc Sci 89:102515
Cruz RMO, de Sousa WV, Cavalcanti GDC (2022) Selecting and combining complementary feature representations and classifiers for hate speech detection. Online Soc Netw Media 28:100194
Dawood F, Anda M, Shafiullah GM (2020) Hydrogen production for energy: an overview. Int J Hydrogen Energy 45(7):3847–3869
Debnath R, Bardhan R, Shah DU, Mohaddes K, Ramage MH, Alvarez RM, Sovacool BK (2022) Social media enables people-centric climate action in the hard-to-decarbonise building sector. Sci Rep 12(1):19017
Dehler-Holland J, Okoh M, Keles D (2022) Assessing technology legitimacy with topic models and sentiment analysis–the case of wind power in Germany. Technol Forecast Soc Chang 175:121354
Falkenberg M, Galeazzi A, Torricelli M, Di Marco N, Larosa F, Sas M, Mekacher A, Pearce W, Zollo F, Quattrociocchi W et al (2022) Growing polarization around climate change on social media. Nat Clim Change 11:1–8
Han S-M, Kim J-H, Yoo S-H (2022) The public’s acceptance toward building a hydrogen fueling station near their residences: the case of South Korea. Int J Hydrogen Energy 47(7):4284–4293
Hydrogen strategy for Canada (2020). https://natural-resources.canada.ca/sites/nrcan/files/environment/hydrogen/NRCan_Hydrogen-Strategy-Canada-na-en-v3.pdf. Accessed: 18-10-2023
Ibar-Alonso R, Quiroga-García R, Arenas-Parra M (2022) Opinion mining of green energy sentiment: a Russia-Ukraine conflict analysis. Mathematics 10(14):2532
Iribarren D, Martín-Gamboa M, Manzano J, Dufour J (2016) Assessing the social acceptance of hydrogen for transportation in Spain: an unintentional focus on target population for a potential hydrogen economy. Int J Hydrogen Energy 41(10):5203–5208
Ingaldi M, Klimecka-Tatar D (2020) People’s attitude to energy from hydrogen–from the point of view of modern energy technologies and social responsibility. Energies 13(24):6495
Itaoka K, Saito A, Sasaki K (2017) Public perception on hydrogen infrastructure in Japan: influence of rollout of commercial fuel cell vehicles. Int J Hydrogen Energy 42(11):7290–7296
Jaramillo OL, Stotts R, Kelley S, Kuby M (2019) Content analysis of interviews with hydrogen fuel cell vehicle drivers in Los Angeles. Transp Res Record 2673(9):377–388
Jung J, Petkanic P, Nan D, Kim JH (2020) When a girl awakened the world: a user and social message analysis of Greta Thunberg. Sustainability 12(7):2707
Kar SK, Sinha ASK, Bansal R, Shabani B, Harichandan S (2022) Overview of hydrogen economy in Australia. Wiley Interdiscip Rev Energy Environ, p e457
Keller TR, Klinger U (2019) Social bots in election campaigns: theoretical, empirical, and methodological implications. Polit Commun 36(1):171–189
Kušen E, Strembeck M (2018) Politics, sentiments, and misinformation: an analysis of the Twitter discussion on the 2016 Austrian presidential elections. Online Soc Netw Media 5:37–50
Lozano LL, Bharadwaj B, de Sales A, Kambo A, Ashworth P (2022) Societal acceptance of hydrogen for domestic and export applications in Australia. Int J Hydrogen Energy 47(67):28806–28818
McKinnon M, Semmens D, Moon B, Amarasekara I, Bolliet L (2016) Science, Twitter and election campaigns: tracking #auspol in the Australian federal elections. J Sci Commun 15(6):2016
Milani D, Kiani A, McNaughton R (2020) Renewable-powered hydrogen economy from Australia’s perspective. Int J Hydrogen Energy 45(46):24125–24145
Moernaut R, Mast J, Temmerman M, Broersma M (2022) Hot weather, hot topic. Polarization and sceptical framing in the climate debate on Twitter. Inf Commun Soc 25(8):1047–1066
Panchenko VA, Daus YV, Kovalev AA, Yudaev IV, Litti YV (2022) Prospects for the production of green hydrogen: review of countries with high potential. Int J Hydrogen Energy
Pilař L, Stanislavská LK, Pitrová J, Krejčí I, Tichá I, Chalupová M (2019) Twitter analysis of global communication in the field of sustainability. Sustainability 11(24):6958
REPowerEU (2022). https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/european-green-deal/repowereu-affordable-secure-and-sustainable-energy-europe_en. Accessed: 18-10-2023
Rijhwani S, Sequiera R, Choudhury M, Bali K, Maddila CS (2017) Estimating code-switching on Twitter with a novel generalized word-level language detection technique. In: Proceedings of the 55th annual meeting of the association for computational linguistics (vol 1: long papers), pp 1971–1982
Saxena N, Sinha A, Bansal T, Wadhwa A (2023) A statistical approach for reducing misinformation propagation on Twitter social media. Inf Process Manage 60(4):103360
Shah Z, Dunn AG (2019) Event detection on Twitter by mapping unexpected changes in streaming data into a spatiotemporal lattice. IEEE Trans Big Data 8(2):508–522
The ten point plan for a green industrial revolution (2020). https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/936567/10_POINT_PLAN_BOOKLET.pdf. Accessed: 18-10-2023
UK hydrogen strategy (2021). https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1011283/UK-Hydrogen-Strategy_web.pdf. Accessed: 18-10-2023
Xu Y, Cao H, Du W, Wang W (2022) A survey of cross-lingual sentiment analysis: methodologies, models and evaluations. Data Sci Eng, pp 1–21
Metadata
Title
Twitter’s pulse on hydrogen energy in 280 characters: a data perspective
Authors
Deepak Uniyal
Richi Nayak
Publication date
1 December 2024
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2024
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-023-01194-6
