Introduction
Related work
Methodology
Data collection phase
Dataset pre-processing
- Let \(P_{i}\) be the probability that a feature instance \(f\) from the set of k features \(F = \left\{ {f_{1} ,f_{2} , \ldots ,f_{k} } \right\}\) belongs to the ith customer review \(R_{i}\), where i varies from 1 to N.
- Let N denote the total number of customer reviews.
- Let \(O_{R}\) denote the polarity of the opinions extracted from review R.
- Let \(S_{R}\) denote the product rating scale of review R.
No | Customer reviewed features | No | Customer reviewed features |
---|---|---|---|
1 | Author | 17 | RAM |
2 | Title | 18 | Sim type |
3 | ReviewID | 19 | Product category |
4 | Content | 20 | Thickness |
5 | Product brand | 21 | Weight of mobile phone |
6 | Ratings | 22 | Height |
7 | Battery life | 23 | Product type |
8 | Price | 24 | Product rating |
9 | Feature information gain | 25 | Front camera |
10 | Review type | 26 | Back camera |
11 | Product display | 27 | Opinion of review |
12 | Processor | 28 | Multi-band |
13 | Operating system | 29 | Network support |
14 | Water proof | 30 | Quick charging |
15 | Rear camera | 31 | Finger sensor |
16 | Applications inbuilt | 32 | Internal storage |
Resilient Distributed Dataset
- reduce(β): Combines all the elements of the dataset using the function β.
- first(): Returns the first element of the dataset.
- takeOrdered(n): Returns the first n elements of the RDD in sorted order.
- saveAsSequenceFile(path): Writes the elements of the dataset as a Hadoop SequenceFile to the given path in the local file system, HDFS, or any other Hadoop-supported file system.
- map(β): Returns a new dataset formed by passing each element of the input through the function β.
- filter(β): Returns a new dataset containing only the elements for which the function β returns true.
- groupByKey(): When called on a dataset of (key, value) pairs, returns a dataset of (key, iterable of values) pairs.
- reduceByKey(β): When called on a dataset of (key, value) pairs, returns a dataset of (key, value) pairs in which the values of each key are combined using the given reduce function β.
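The behaviour of the operations listed above can be illustrated without a Spark cluster. The following is a minimal sketch that mimics their semantics on a plain in-memory list. The `MiniRDD` class, its snake_case method names, and the sample (brand, rating) pairs are hypothetical stand-ins for illustration only; they are neither Spark's API nor the authors' data.

```python
from collections import defaultdict
from functools import reduce as fold


class MiniRDD:
    """Hypothetical in-memory stand-in for an RDD (this is not Spark)."""

    def __init__(self, items):
        self.items = list(items)

    # Transformations: each returns a new dataset.
    def map(self, fn):
        return MiniRDD(fn(x) for x in self.items)

    def filter(self, fn):
        return MiniRDD(x for x in self.items if fn(x))

    def group_by_key(self):
        # (key, value) pairs -> (key, list of values) pairs
        groups = defaultdict(list)
        for key, value in self.items:
            groups[key].append(value)
        return MiniRDD(groups.items())

    def reduce_by_key(self, fn):
        # Combine the values of each key with the reduce function fn.
        grouped = self.group_by_key().items
        return MiniRDD((key, fold(fn, values)) for key, values in grouped)

    # Actions: each returns a plain value.
    def reduce(self, fn):
        return fold(fn, self.items)

    def first(self):
        return self.items[0]

    def take_ordered(self, n):
        return sorted(self.items)[:n]


# Hypothetical (brand, rating) pairs standing in for parsed reviews.
reviews = MiniRDD([("BrandA", 5), ("BrandB", 2), ("BrandA", 4), ("BrandB", 3)])

rating_sum = dict(reviews.reduce_by_key(lambda a, b: a + b).items)
well_rated = reviews.filter(lambda kv: kv[1] >= 4).map(lambda kv: kv[0]).items
total = reviews.map(lambda kv: kv[1]).reduce(lambda a, b: a + b)
lowest_two = reviews.take_ordered(2)
```

In this sketch every call is evaluated eagerly, which keeps the example short. Spark itself evaluates transformations lazily and only runs them when an action is invoked.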
- Let the list of n customers be represented as \(C = \left\{ {c_{1} ,c_{2} ,c_{3} \ldots ,c_{n} } \right\}\).
- Let the list of N reviews be represented as \(R = \left\{ {r_{1} ,r_{2} ,r_{3} \ldots ,r_{N} } \right\}\).
- Let \(x\) significant features be identified from the feature set \(F\), represented as \(F_{x} \subset F\).
- An active customer is characterized by significant features whose information gain value is denoted by \(\Delta_{G}\).
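As an illustration of how an information gain value can separate significant features from the rest, the following minimal sketch uses the standard entropy-based definition of information gain. It is an assumption that \(\Delta_{G}\) is computed this way; the feature names, values, class labels, and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def select_significant(feature_columns, labels, threshold):
    """Return the names of features whose gain exceeds a (hypothetical) threshold."""
    return {
        name
        for name, column in feature_columns.items()
        if information_gain(column, labels) > threshold
    }


# Hypothetical toy data: four reviews, two features, one outcome label per review.
columns = {
    "battery_life": ["good", "good", "poor", "poor"],
    "sim_type": ["dual", "single", "dual", "single"],
}
outcome = ["success", "success", "failure", "failure"]

significant = select_significant(columns, outcome, threshold=0.5)
```

On this toy data `battery_life` separates the two outcomes perfectly while `sim_type` carries no information about them, so only the former is selected.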
Prediction classifiers
Logistic regression (LR)
- Let \(p\) be the value of the prediction variable, which takes 0 for failure and 1 for success.
- \(p_{0}\) is the constant value.
- \(b\) is the logarithmic base value.
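The symbols above can be tied together in a minimal sketch, assuming the usual log-odds form of logistic regression, in which the base-\(b\) logarithm of \(p/(1-p)\) equals the constant \(p_{0}\) plus a weighted sum of the features. The `weights` argument, the 0.5 decision threshold, and the function names are hypothetical and do not come from the source.

```python
import math


def predict_probability(features, weights, p0, b=math.e):
    """Probability of success under the assumed model
    log_b(p / (1 - p)) = p0 + sum(w_j * f_j), with base b > 1."""
    score = p0 + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + b ** (-score))


def predict_class(features, weights, p0, b=math.e, threshold=0.5):
    """Map the probability to 1 (success) or 0 (failure)."""
    return 1 if predict_probability(features, weights, p0, b) >= threshold else 0


# Hypothetical two-feature example.
prob = predict_probability([1.0, 2.0], [0.8, -0.1], p0=-0.2)
label = predict_class([1.0, 2.0], [0.8, -0.1], p0=-0.2)
```

A score of zero corresponds to a probability of exactly 0.5 for any base, so the sign of the score alone decides the predicted class at the default threshold.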
Support Vector Machine (SVM)
- Let \(T\) be a set of \(t\) training feature vectors \(z_{i} \in R^{D}\), where i = 1 to t.
- Let \(y_{i} \in \left\{ { + 1, - 1} \right\}\), where +1 denotes the product success class and -1 denotes the product failure class.
- The data are separated over the real numbers, denoted \(X\), in the D-dimensional input space.
- Let \(w\) be the normal vector of the hyperplane, where \(w \in X^{D}\).
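The role of the normal vector \(w\) can be shown with a minimal sketch of a linear SVM decision rule: a feature vector \(z\) is assigned to the success or failure class according to the side of the hyperplane on which it falls. The `bias` offset, the function names, and the numeric values are hypothetical, and training of \(w\) is not shown.

```python
def decision_value(w, z, bias=0.0):
    """Signed score of z relative to the hyperplane w . z + bias = 0."""
    return sum(wi * zi for wi, zi in zip(w, z)) + bias


def svm_label(w, z, bias=0.0):
    """+1 (product success) on the non-negative side, -1 (product failure) otherwise."""
    return 1 if decision_value(w, z, bias) >= 0 else -1


def satisfies_margin(w, z, y, bias=0.0):
    """Check the standard hard-margin condition y * (w . z + bias) >= 1."""
    return y * decision_value(w, z, bias) >= 1


# Hypothetical two-dimensional example (D = 2).
w = [1.0, -1.0]
label_a = svm_label(w, [2.0, 1.0])   # score  1.0
label_b = svm_label(w, [0.0, 3.0])   # score -3.0
```

Points with a score between -1 and 1 lie inside the margin, which is why `satisfies_margin` is stricter than a simple sign check.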
Experimental setup
Results and discussions
Classifier | Support vector machine | ||
---|---|---|---|
Method used | P@R (precision) | R@R (recall) | PA % (prediction accuracy) |
DMRDF | 0.941 | 0.92 | 95.4 |
LSA-based | 0.894 | 0.79 | 87.5 |
Gini-index | 0.66 | 0.567 | 83.2 |
Classifier | Logistic regression | ||
---|---|---|---|
Method used | P@R | R@R % | PA % |
DMRDF | 0.915 | 0.849 | 93.5 |
LSA-based | 0.839 | 0.753 | 83 |
Gini-index | 0.62 | 0.52 | 79.8 |