1 Introduction
1.1 Methodology
- Literature sources: all corpora included in the MEDLINE/PubMed database
- Time frame: 2007 to early 2018 (covering all eligible literature from the last decade)
- Geographic coverage: all-inclusive
- Literature selection: the literature search (covering a time window of the last 10 years) used two groups of keywords. The first group comprised the following terms, treated as approximate synonyms for social data: social media, social networking, forum, Twitter, Facebook, search log and social data. The second group referred to adverse drug reaction (ADR) detection and included the terms adverse drug reaction, side effect and pharmacovigilance. The search query for article selection therefore had the logical form: (social data OR equivalent terms) AND (adverse drug reaction detection OR equivalent terms).
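The two-group search logic can be written out as a single Boolean query string. The sketch below is illustrative only and is not the authors' actual search string: it assumes plain PubMed-style Boolean syntax (synonyms joined with OR inside a group, groups joined with AND, multi-word phrases quoted), omits field tags and date limits, and the function names are hypothetical.

```python
# Keyword groups as listed in the methodology above.
SOCIAL_DATA_TERMS = [
    "social media", "social networking", "forum", "Twitter",
    "Facebook", "search log", "social data",
]
ADR_TERMS = ["adverse drug reaction", "side effect", "pharmacovigilance"]


def or_group(terms):
    """Join approximate synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Combine the OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(SOCIAL_DATA_TERMS, ADR_TERMS)
print(query)
```

Keeping the synonyms in lists rather than in a hand-typed string makes it easy to add or drop a term and regenerate the query consistently.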
1.2 Pharmacovigilance background
- Early detection of unknown adverse reactions and interactions;
- Detection of increases in frequency of known adverse reactions;
- Identification of risk factors and possible mechanisms;
- Estimation of quantitative aspects of benefit/risk analysis and dissemination of information needed to improve medicine prescribing and regulation.
1.3 Social networking sites and their relevance to pharmacovigilance
- Large public SNS platforms, such as Facebook, Twitter, Flickr and Tumblr, which host a plethora of health-related communities and groups and also contain large volumes of posts by individual users on health issues.
- Generic health-centred SNS (generic networking sites on general health topics and disease support, usually requiring user profiles), such as PatientsLikeMe (www.patientslikeme.com), DailyStrength (www.dailystrength.org), MedHelp (www.medhelp.org), WebMD (https://exchanges.webmd.com/) and CureTogether (http://curetogether.com/), where users discuss their health-related experiences, including use of prescription drugs, side effects and treatments.
- Medicine-focused sharing platforms (patient forums), such as Ask a Patient (http://www.askapatient.com) and Medications.com (http://www.medications.com/), which allow patients to share and compare medication experiences.
- Disease-specific online health forums, e.g. the TalkStroke forum (https://www.stroke.org.uk/forum) for stroke survivors and caregivers, hosted by the UK’s Stroke Association [30], and Australia’s ReachOut.com forum (https://au.reachout.com/forums) for mental health support.
- Collected (back-office) data: data collected by the service provider, which usually include profile and network data explicitly provided by the user, as well as click history, which is provided implicitly. Users can assume that everything they do and upload in the browser tab of the service is collected, unless the service’s privacy policy states otherwise.
- Front-end data: data that users knowingly share. These include: (i) public/disclosed data, i.e. data published openly, such as a full name or e-mail address, which can be useful for users trying to contact other users; and (ii) social data, i.e. data shared only with the user’s trusted contacts, which people outside this circle of trust cannot access.
2 An overview of applications of social data in pharmacovigilance
- Data harvesting: collection of raw data;
- Translation: standardisation of drug names and vernacular symptom/event descriptions;
- Filtering: identification of relevant, informative posts and data cleaning (removal of duplicates and noise);
- De-identification: removal of personally identifying information;
- Supplementation: addition of other data sources (e.g. product label, sales data) to facilitate the review process and contextualise the results, in order to assist interpretation.
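The five processing steps can be chained as a simple pipeline. The following is a minimal sketch under stated assumptions, not a method taken from the reviewed studies: harvesting is reduced to wrapping already-collected texts (no platform API is called); translation uses tiny hypothetical lexicons with placeholder drug names (real systems would map terms to standard vocabularies such as MedDRA); filtering drops duplicates and posts with no drug mention; de-identification only masks e-mail addresses and @handles; and supplementation compares extracted events against a hypothetical product-label lookup. All names (`Post`, `translate`, `LABELLED_EVENTS`, etc.) are illustrative.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicons mapping brand/vernacular terms to standard terms.
DRUG_LEXICON = {"brandex": "drug_a", "drug_a": "drug_a", "otherex": "drug_b"}
EVENT_LEXICON = {
    "headache": "headache",
    "head is pounding": "headache",
    "throwing up": "vomiting",
    "puking": "vomiting",
}
# Hypothetical product-label data, used only in the supplementation step.
LABELLED_EVENTS = {"drug_a": {"nausea"}}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
HANDLE = re.compile(r"@\w+")


@dataclass
class Post:
    text: str
    drugs: set = field(default_factory=set)
    events: set = field(default_factory=set)
    events_not_on_label: set = field(default_factory=set)


def harvest(raw_texts):
    """Data harvesting: wrap already-collected texts (no platform API is called)."""
    return [Post(text) for text in raw_texts]


def _match(lexicon, text):
    """Return standard terms whose source phrase occurs as whole words in the text."""
    lowered = text.lower()
    return {std for phrase, std in lexicon.items()
            if re.search(rf"\b{re.escape(phrase)}\b", lowered)}


def translate(posts):
    """Translation: normalise drug names and vernacular event descriptions."""
    for post in posts:
        post.drugs = _match(DRUG_LEXICON, post.text)
        post.events = _match(EVENT_LEXICON, post.text)
    return posts


def filter_posts(posts):
    """Filtering: drop duplicates (ignoring case/whitespace) and posts with no drug."""
    seen, kept = set(), []
    for post in posts:
        key = " ".join(post.text.lower().split())
        if post.drugs and key not in seen:
            seen.add(key)
            kept.append(post)
    return kept


def deidentify(posts):
    """De-identification: mask e-mail addresses first, then remaining @handles."""
    for post in posts:
        post.text = HANDLE.sub("[USER]", EMAIL.sub("[EMAIL]", post.text))
    return posts


def supplement(posts):
    """Supplementation: flag extracted events absent from the hypothetical label."""
    for post in posts:
        on_label = set()
        for drug in post.drugs:
            on_label |= LABELLED_EVENTS.get(drug, set())
        post.events_not_on_label = post.events - on_label
    return posts


raw = [
    "Took Brandex last night and now I keep throwing up @jane_doe",
    "Took Brandex last night and now I keep throwing up @jane_doe",
    "Great weather today!",
    "otherex gave me a headache, email me at pat@example.com",
]
processed = supplement(deidentify(filter_posts(translate(harvest(raw)))))
for post in processed:
    print(post.text, post.drugs, post.events_not_on_label)
```

E-mail addresses are masked before handles because the handle pattern would otherwise match the domain part of an address and leave the local part exposed.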
2.1 Specialised healthcare social networks and forums
2.2 Generic SNS
2.3 Search logs
| Attribute | Findings: advantages and limitations | Relevant references |
|---|---|---|
| Population coverage | Advantages: | |
| | Social media penetration | |
| | Size and growth of social media data | |
| | Coverage of a large and diverse population and large geographic areas | |
| | Limitations: | |
| | Social media bias and user representativeness concerns | |
| | Social media user bases are skewed in terms of age, gender, ethnicity and physical location | |
| | Under-represented segments of the population of drug users exist (e.g. non-social media users) | |
| | Lack of generalisability | |
| Usefulness | Advantages: | |
| | Efficient collection of patient perspectives: a direct source of users’ personal experiences, including their preferences and the effectiveness of interventions | |
| | Social media data are sensitive to underlying changes in patients’ functional status | |
| | New knowledge about the safety profile of drugs can be generated | |
| | Rare diseases can be targeted | |
| | Possible insight into patients who are reluctant to report or share their adverse drug experiences on social media (search logs) | |
| | Limitations: | |
| | Representation of specific types of adverse events in social media differs from that in traditional sources | |
| | Gender and cultural background may affect the quality and frequency of disclosures | |
| | Eponymity (posting under one’s real name) may inhibit disclosures (e.g. on Facebook) | |
| Timeliness | Advantages: | |
| | High velocity of data generation | |
| | Data recency makes the information and insights obtained via social media timely (near-instantaneous information) | |
| Accessibility | Advantages: | |
| | Open social media | |
| | Research partnerships with thematic (specialised healthcare) SNS through which patient-contributed data can be shared | |
| | Limitations: | |
| | Privacy and security issues: | |
| | Privacy settings of social media: inaccessibility of some social media channels (e.g. closed Facebook groups), restricted access to user posts | |
| | Possibility of unanticipated changes in availability due to personal user settings and platform modifications | |
| Quality | Limitations: | |
| | Cross-channel information diffusion: multiple postings (duplicates) | |
| | Use of colloquial language: misspellings, non-medical terms and slang | |
| | Non-experiential reporting | |
| | Content not validated by medical experts: users may report an ADR by mistake, or may be mistaken in their perception of the ADR | |
| | Data authenticity cannot be verified | |
| | Partial data (e.g. on Twitter, due to character length limitations) | |
| | Incomplete data: important information is missing | |
| | Low signal-to-noise ratio: only a small proportion of the drug-associated data collected from social media contain information associated with ADRs | |
| Processability | Advantages: | |
| | Advances in big data analytics | |
| | Availability of advanced NLP and machine learning techniques | |
| | Advances in the data processing capabilities of machines | |
| | Trend analysis techniques and tools | |
| | Limitations: | |
| | Big data analysis issues (e.g. high volume of data) | |
| | Lack of standards and uniformity across different SNS | |
| | Other structural barriers that impede access and analysis | |
| | Barriers to analysing data across languages (lack of relevant lexical resources) | |
Attribute | Specialised healthcare social networks and forums | Generic SNS | Search logs |
---|---|---|---|
Population coverage | Drug user population and specific cohorts | Large and diverse populations and geographic areas | Large and diverse populations and geographic areas |
Data usefulness | Smaller volumes of data but greater salience of drug-specific information. More consistent and complete information on demographics, exposure, outcome and possible contextual factors | Incomplete information. Large volumes of data. High noise-to-signal ratio | Search frequency as an early indication of a potential drug–symptom relationship |
Data timeliness | Near real time | Near real time | Near real time |
Data accessibility | Research partnerships | Open social media and/or sources with conditioned access; privacy concerns | Available |
Data quality | Adequate. Structured data may be available. Often lengthier postings with greater potential for analysis | Important quality concerns. Lack of quality control. Unstructured data | Adequate for detecting events and relating them to drugs. Inadequate for more rigorous ADR analysis; indicative only. Possibility of false positives |
Data processability | Text and Data mining, NLP, machine learning methods | Text mining, NLP, machine learning methods | Search analytics |
No. | Challenge | Type (dimension) | Solution timescale (estimated) |
---|---|---|---|
1 | Value/utility of social media | Conceptual | Short term |
2 | Information extraction from social media | Technical | Long term |
3 | Analysis of social data | Technical | Long term |
4 | Data privacy | Environmental | Short term |
5 | Regulatory framework | Environmental | Short term |
3 Current challenges and the way forward
- Conceptual: challenges that relate to the purpose and value of social media use in pharmacovigilance (value/utility of social media as knowledge sources for pharmacovigilance);
- Technical: challenges that relate to the feasibility of the process (information extraction from social media and analysis of social data); and
- Environmental: challenges that relate to compliance concerns and affect the acceptability of any newly proposed pharmacovigilance process (data privacy and regulatory framework).
3.1 Value/utility of social media as knowledge sources for pharmacovigilance
3.2 Information extraction from social media
3.3 Analysis of social data
3.4 Data privacy
3.5 Regulatory framework
- What is the limit of the industry’s responsibility in collecting and reviewing social media data?
- How can pharmacovigilance teams confirm the identifiability of the reporter and patient in safety data obtained via social media, and establish safeguards against faulty adverse event reporting?
- What practices will be acceptable for following up on potential signals while respecting data privacy?
- What are the protocols for big data integration, analysis and interpretation, and for reporting follow-up results?
| Attribute | Challenges and research areas | References |
|---|---|---|
| Data sourcing | Challenge: (near) real-time social listening | |
| | Value/utility concerns of social data (quality and credibility of individual social data sources, data redundancy and correlation, etc.) | |
| | Inherent challenges of big data: large volumes of data of high variety and velocity | |
| | Principal research area: | |
| | In-depth investigation and comparative analysis of individual social data sources (i.e. analysis of the number, type and quality of disclosures made on each social media platform, employing natural language processing (NLP), computational linguistics and/or text mining techniques) | |
| | Complementary research areas: | |
| | Health and medicine knowledge and engagement of citizens, in order to improve the quality and increase the number of solicited/unsolicited reports | |
| Information extraction | Challenge: concept/relation extraction and ADR classification | |
| | Concept and relation extraction (identification of drugs and symptoms (named entity recognition) and of drug/event relationships) | |
| | Principal research area: | |
| | Development of automated or semi-automated drug term discovery systems and algorithms. Investigated solutions build on text mining, computational linguistics, NLP techniques and/or other rule-based approaches (e.g. lexicon-based solutions), and/or on machine learning (e.g. machine learning classifiers) and artificial intelligence | |
| | Complementary research areas: | |
| | ADR detection in longitudinal databases and other sources | |
| | Automated mining of demographic information | |
| | Sentiment analysis | |
| | Harnessing behavioural data (e.g. from Internet search logs) | |
| | Adherence to medicine therapy | |
| | Development of algorithm training corpora | |
| | Linguistic aspects | |
| Data analysis | Challenge: signal detection and causality hypothesis | |
| | Identification of previously unsuspected safety signals: quantitative signal detection techniques applied to social data in aggregate (pattern discovery) | |
| | Detection of valid individual case safety reports (ICSRs) within social data: most social media reports that mention use of a drug and a potential adverse event do not meet the basic regulatory definition of an individual case safety report | |
| | Principal research area: | |
| | In-depth investigation and comparative analysis of individual social data sources (e.g. analysis of the type and quality of adverse events reported on each SNS) | |
| | Development of best practices and methods for determining what constitutes a safety signal in social media | |
| | Development of methods for the joint analysis of multiple data sources (multimodal signal detection or signal fusion) | |
| Privacy | Challenge: balancing the interests of patient data protection and medication safety monitoring | |
| | Principal research area: | |
| | Effective verification of adverse reaction allegations, respecting data privacy and ensuring responsible use of data | |
| Regulation | Challenge: regulatory acceptance of social data | |
| | New regulatory paradigms for the incorporation of social data into pharmacovigilance systems, for de novo signal detection or to complement and enrich primary data sources | |
| | Principal research areas: | |
| | Understanding of the strengths and limitations of social media in post-market safety surveillance and establishment of best practices | |
| | Investigation of complementarities between social data and traditional data sources | |
| | Development of regulatory guidance and initiatives for social data incorporation across the entire pharmacovigilance value chain | |
| | International harmonisation and regulatory enforcement | |
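The quantitative signal detection techniques applied to social data in aggregate are not specified here. As one illustration, the proportional reporting ratio (PRR), a disproportionality measure widely used with spontaneous reporting databases, can be computed over drug and event mentions extracted from posts. This is a minimal sketch under stated assumptions: each post has already been reduced to sets of normalised drug and event terms, posts have already been filtered to drug-related ones (so posts mentioning other drugs act as the comparator), and the `Post` schema, function names and example data are hypothetical. The screening rule (PRR ≥ 2 with at least 3 co-mentioning posts) is a simplified version of a commonly used criterion that also includes a chi-squared test, and a co-mention is not evidence of causality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    """A drug-related post reduced to normalised terms (hypothetical schema)."""
    drugs: frozenset
    events: frozenset


def contingency(posts, drug, event):
    """Return 2x2 counts (a, b, c, d) for one drug-event pair.

    a: drug and event; b: drug without event;
    c: event without drug; d: neither the drug nor the event.
    """
    a = b = c = d = 0
    for post in posts:
        has_drug = drug in post.drugs
        has_event = event in post.events
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prr(a, b, c, d) -> Optional[float]:
    """PRR = [a / (a + b)] / [c / (c + d)]; None when undefined or unbounded."""
    if a + b == 0 or c + d == 0 or c == 0:
        return None
    return (a / (a + b)) / (c / (c + d))


def is_signal(a, b, c, d, min_prr=2.0, min_cases=3) -> bool:
    """Simplified screen: PRR >= min_prr and a >= min_cases (no chi-squared test)."""
    value = prr(a, b, c, d)
    return value is not None and value >= min_prr and a >= min_cases


posts = [
    Post(frozenset({"drug_x"}), frozenset({"headache"})),
    Post(frozenset({"drug_x"}), frozenset({"headache"})),
    Post(frozenset({"drug_x"}), frozenset({"headache", "nausea"})),
    Post(frozenset({"drug_x"}), frozenset()),
    Post(frozenset({"drug_y"}), frozenset({"headache"})),
    Post(frozenset({"drug_y"}), frozenset()),
    Post(frozenset({"drug_y"}), frozenset({"nausea"})),
    Post(frozenset({"drug_z"}), frozenset()),
]
counts = contingency(posts, "drug_x", "headache")
print(counts, prr(*counts), is_signal(*counts))  # (3, 1, 1, 3) 3.0 True
```

Working from per-post term sets keeps the statistics independent of how extraction was done, so the same counting step could sit behind a lexicon-based or a machine learning extractor.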