Introduction
Related works
Methods
The proposed prediction system
- Effective communication with users, who receive relevant information about their current status.
- Integration and management of a huge amount of data.
- Use of real-time data: the system receives real-time information such as location, activity, SMS, voice calls, etc.
- Use of an advanced big data management tool.
- Prediction of patterns using both personal data and real-time data.
Reality mining for gathering data
Gathering process overview
- Users register personal information, which can be updated whenever a change is required.
- Mobile phones collect real-time data about users, such as activity, location, voice calls, etc.
- Wearable healthcare sensors generate various data about the behavior of the user.
IoT systems: sensors integration and implementation
Sensor | Attributes |
---|---|
Pulse sensor | Heart rate variability, blood pressure and emotion |
Temperature sensor | Temperature variation |
Acceleration sensor | Activity degree |
Prediction process
Data conceptual modeling and implementation
Big data management tool
NOSQL database conception
Data mining model
System implementation
Results and discussion
Miscarriage risk factors definition
- Data from sensors:
  - Heart rate variability [number of beats per minute (BPM)]: a strong marker of stress and health. An elevated heart rate is associated with an increased risk of hypertension and elevated blood pressure [25].
  - Stress and blood pressure: both are deduced from the heart rate value. A maximum heart rate (HRmax) is calculated as $HR_{max} = 208 - 0.7 \times \text{age}$ [26]. Depending on the value of HRmax, stress emotions and blood pressure values are defined.
  - Temperature variation: as with any systemic or viral infection in pregnant women, a raised body temperature carries an increased risk of spontaneous miscarriage or premature delivery. In previous epidemics, miscarriages related to increased body temperature and flu have been reported [27].
  - Physical activity: as mentioned in the “Introduction” section, strenuous physical exercise increases the risk of miscarriage.
- Data from mobile phone:
  - BMI (body mass index): obese and underweight women have an increased risk of miscarriage. BMI is derived from the height (H, in m) and weight (W, in kg) entered by the mobile application's user: $\mathrm{BMI} = W / H^{2}$ [28].
  - Number of previous miscarriages: a woman who has had a previous miscarriage is more likely to have another one. This value is collected from the user profile in the mobile application.
  - Maternal age: the probability of miscarriage increases with maternal age. This value is collected from the user profile in the mobile application.
  - Location: all places visited by the user are identified through GPS and the Google Places service. This parameter helps to assess eating habits. Knowing the location of a pregnant woman lets us infer her activity: for example, if she stays in a restaurant for a long time, she is probably eating there. The frequency of her visits to a place likewise indicates her food quality: if the frequency of being in such a place tends to 1, her food quality is considered poor, since pregnant women are advised to eat at home for hygiene reasons.
  - Current activity: using the accelerometer and vibration sensor of the smartphone, we can recognize activities such as staying still, driving, walking, running or biking, among others.
Experiment
Experiment environment
- Database server: Couchbase Database Server, installed on a networked machine with a public IP address.
- Big data platform: Databricks Spark (Spark 2.1), with Scala 2.11 as the programming language.
- Mobile tools: Android Studio 2.2.2 for coding the mobile application, and smartphones for running it.
- Sensor managers: Arduino UNO and Raspberry Pi 3.
System dataset
No. | Attribute | Type | Description |
---|---|---|---|
1 | ID | Integer | The key of JSON document |
2 | Activity | Integer | The level of the activity of the woman during the day |
3 | Location | Integer | Location where the woman spends her time |
4 | BMI | Double | Body mass index: it is an attempt to quantify the amount of tissue mass (muscle, fat, and bone) in an individual, and then categorize him/her |
5 | nMisc | Integer | The number of previous miscarriages of the woman during her pregnancies |
6 | Age | Double | The maternal age of the woman |
7 | Weight | Double | The weight of the woman: the quantity of heaviness or mass. It is used in BMI calculation |
8 | Height | Double | The height of the woman. It is used in BMI calculation |
9 | Temp | Double | Body temperature of the woman |
10 | BPM | Long | Heart rate variability (HRV) per minute |
11 | Stress | Long | Stress emotions |
12 | BP | Long | Blood Pressure indicator |
13 | Time | String | The time to save the file in the database server |
14 | User_email | String | The ID of the woman to whom the current document belongs. It is used to retrieve the right data for each woman |
15 | Type | String | The type of document. It is used to differentiate between authentication documents and documents that contain prediction attributes |
Age (years) | Heart range | HR state | Stress state | Blood pressure state |
---|---|---|---|---|
20 | 100 < HR < HRmax | HR | Normal | Normal |
 | | HR+ | Low | High |
 | | HR− | High | Low |
30 | 95 < HR < HRmax | HR | Normal | Normal |
 | | HR+ | Low | High |
 | | HR− | High | Low |
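The mapping in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table does not define HR+ and HR−, so the sketch assumes HR+ means a heart rate at or above HRmax and HR− a heart rate at or below the lower bound of the range. The names `hr_max`, `LOWER_BOUND` and `classify_heart_rate` are hypothetical, and only the two ages listed in the table are supported.

```python
def hr_max(age):
    """Maximum heart rate from the formula quoted in the text: 208 - 0.7 * age."""
    return 208 - 0.7 * age


# Lower bounds of the normal range, copied from the two table rows.
LOWER_BOUND = {20: 100, 30: 95}


def classify_heart_rate(age, bpm):
    """Return the HR, stress and blood pressure states for one reading.

    Assumption: HR+ is a reading at or above HRmax and HR- a reading at or
    below the lower bound; the table itself does not spell this out.
    """
    low = LOWER_BOUND[age]  # raises KeyError for ages the table does not list
    high = hr_max(age)
    if bpm >= high:
        return {"hr": "HR+", "stress": "Low", "bp": "High"}
    if bpm <= low:
        return {"hr": "HR-", "stress": "High", "bp": "Low"}
    return {"hr": "HR", "stress": "Normal", "bp": "Normal"}


reading = classify_heart_rate(20, 150)  # inside 100 < HR < HRmax for age 20
```

Keeping the lower bounds in a lookup table rather than in the branching logic makes it straightforward to add rows for other ages once their ranges are known.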
Value (kg/m²) | Meaning |
---|---|
18.5–24.9 | Normal |
25–29.9 | Overweight |
30–39.9 | Obese |
> 40 | Morbidly obese |
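As a worked illustration of the BMI categories above, the following sketch computes BMI from weight and height and maps it to a category. The function names are hypothetical. Two points are assumptions rather than part of the table: values below 18.5 are labeled "Underweight" (the text mentions underweight women, but the table has no such row), and the boundaries are treated as half-open intervals so that values such as 24.95 or exactly 40 still receive a category.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to the categories of the table above."""
    if value < 18.5:
        return "Underweight"  # assumption: this row is not in the table
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    if value < 40:
        return "Obese"
    return "Morbidly obese"  # assumption: 40 itself is counted here


example = bmi_category(bmi(70, 1.75))  # 70 kg and 1.75 m are made-up inputs
```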
Value of activity | Meaning |
---|---|
1 | Low |
2 | Medium |
3 | High |
4 | Very high |
Value of location | Meaning |
---|---|
0 | Others (unknown) |
1 | Restaurant |
2 | Café |
3 | Snack |
4 | Pharmacy |
5 | Bank |
6 | Bakery/pastry |
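The location encoding and the frequency-of-visit idea described earlier could be sketched as follows. This is a hedged illustration only: the string keys in `LOCATION_CODES` are hypothetical labels, not the values actually returned by the location service, and `visit_frequency` assumes that "frequency" means the share of recorded location samples that fall in a given category.

```python
from collections import Counter

# Integer codes from the table above; the string keys are illustrative labels.
LOCATION_CODES = {
    "restaurant": 1,
    "cafe": 2,
    "snack": 3,
    "pharmacy": 4,
    "bank": 5,
    "bakery": 6,
}


def encode_location(place_type):
    """Return the integer code for a place type, or 0 for others/unknown."""
    return LOCATION_CODES.get(place_type.lower(), 0)


def visit_frequency(codes, code):
    """Share of location samples equal to `code`, between 0 and 1."""
    if not codes:
        return 0.0
    return Counter(codes)[code] / len(codes)


samples = [encode_location(p) for p in ["Restaurant", "restaurant", "park", "pharmacy"]]
restaurant_share = visit_frequency(samples, 1)
```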
Algorithm metrics
Cluster | Value | Prediction result |
---|---|---|
Cluster 1 | 0 | Miscarriage |
Cluster 2 | 1 | No miscarriage |
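To make the clustering step concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) with two clusters. It is not the Spark and Scala implementation used in the experiment; the function `kmeans`, its parameters and the toy points are all illustrative assumptions. Note also that cluster indices produced by K-means are arbitrary, so mapping an index to "Miscarriage" or "No miscarriage" is a separate interpretation step that this sketch does not perform.

```python
import math
import random


def kmeans(points, k, max_iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialise the centers with k points drawn without replacement.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centers, labels


# Toy 2-D points forming two well-separated groups (not data from the study).
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(toy, k=2)
```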
Experiment results
Performance and efficiency
Parameters | Value (s) |
---|---|
Time to build model | 0.51 |
Centers definition | 0.30 |
Cluster distribution | 0.60 |
Model evaluation | 0.19 |
Clusters definition and distribution
K-means scatter plot
- The pregnant woman can do many types of activities during the day;
- She can go to different places during the day;
- Her body temperature can increase or decrease;
- HRV can increase or decrease, and so can the stress and blood pressure values.
- The body of a pregnant woman changes during pregnancy; she can lose or gain weight. The BMI indicator therefore changes as well. Although it is not a representative feature in this case, it showed its effectiveness in a previous experiment.
Summary | Feature 0 | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Feature 6 | Feature 7 | Feature 8 |
---|---|---|---|---|---|---|---|---|---|
Mean | 25 | 19.84 | 2.0 | 2.51 | 1.44 | 36.0 | 128.04 | 1.21 | 1.51 |
Stddev | 0.0 | 0.0 | 0.0 | 1.11 | 1.07 | 1.25 | 58.08 | 1.14 | 1.11 |
Minimum | 25 | 19.84 | 2 | 1 | 0 | 28.0 | 43 | 1 | 1 |
Maximum | 25 | 19.84 | 2 | 4 | 4 | 40.0 | 238 | 3 | 3 |
Features | Feature 0 | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Feature 6 | Feature 7 | Feature 8 |
---|---|---|---|---|---|---|---|---|---|
Value of | Age | BMI | nMisc | Activity | Location | Temp | BPM | Stress | BP |
Evaluation and validation
Evaluation using a random K
K | WSSSE |
---|---|
1 | 2000.05 |
2 | 978.24 |
3 | 546.35 |
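WSSSE (within-set sum of squared errors) is the sum, over all points, of the squared distance between a point and its nearest cluster center. The sketch below shows that computation in plain Python; the function name `wssse` and the toy points are assumptions for illustration, and the values in the table above come from the experiment, not from this code.

```python
def wssse(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total


# Toy check: one center in the middle versus one center on each point.
toy = [(0, 0), (2, 0)]
cost_k1 = wssse(toy, [(1, 0)])          # each point is 1 away from the center
cost_k2 = wssse(toy, [(0, 0), (2, 0)])  # every point sits on its own center
```

Because adding centers can only bring points closer to their nearest center, this cost tends to shrink as K grows, which is why it is usually read together with a separation measure rather than minimized on its own.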
Clustering validation metrics
- How close are the objects within the same cluster? Low within-cluster variation indicates good compactness and a good clustering.
- How well is a cluster separated from the other clusters?
- $S_i$ is close to 1: observations are very well clustered (K = 2, $S_i$ = 0.95).
- $S_i$ is around 0: the clustering configuration may have too many or too few clusters (K = 3, $S_i$ = 0.36), or objects are not very well matched to their own cluster (K = 1, $S_i$ = 0.51).
- $S_i$ is negative: observations are probably placed in the wrong cluster.
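The interpretation above matches the usual silhouette coefficient, which for each observation compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b) as (b − a) / max(a, b). The following is a minimal sketch of that standard definition in plain Python, assuming Euclidean distance; the function name and toy points are illustrative, and the sketch requires at least two clusters because the coefficient is not defined for a single one.

```python
import math


def silhouette_scores(points, labels):
    """Per-observation silhouette coefficients for a labeled set of points."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, hence the len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


toy = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(toy, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_scores(toy, [0, 1, 0, 1])   # points grouped with the far side
```

Averaging the per-observation values gives a single score for a clustering, which can then be compared across candidate values of K.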