Introduction
Related work
Proposed method
Notation | Details |
---|---|
\(D\) | Product review document |
\(S_{k}\) | Review sentence within \(D\) with index \(k\) |
\(w_{i}\) | Word with index \(i\) |
\(ws_{i}^{j}\) | Sense of word \(i\) with index \(j\) |
\(spos_{i}^{j}\) | Raw positive sentiment value of \(ws_{i}^{j}\) taken from SentiWordNet |
\(sneg_{i}^{j}\) | Raw negative sentiment value of \(ws_{i}^{j}\) taken from SentiWordNet |
\(sneu_{i}^{j}\) | Raw neutral sentiment value of \(ws_{i}^{j}\) taken from SentiWordNet |
\(sim_{ab}^{cd}\) | Similarity value between \(ws_{a}^{c}\) and \(ws_{b}^{d}\), calculated using one of the WordNet similarity algorithms |
\(deg\left( {ws_{i}^{j} } \right)\) | Indegree score of \(ws_{i}^{j}\) |
\(cspos_{i}\) | Contextual positive sentiment value of \(w_{i}\) |
\(csneg_{i}\) | Contextual negative sentiment value of \(w_{i}\) |
\(csneu_{i}\) | Contextual neutral sentiment value of \(w_{i}\) |
\(fposS_{k}\) | Positive feature value of \(S_{k}\) |
\(fnegS_{k}\) | Negative feature value of \(S_{k}\) |
\(fneuS_{k}\) | Neutral feature value of \(S_{k}\) |
\(fposD\) | Positive feature value of \(D\) |
\(fnegD\) | Negative feature value of \(D\) |
\(fneuD\) | Neutral feature value of \(D\) |
\(wd\) | Domain word of the product review dataset |
\(pw_{k}\) | Pivot word of review sentence \(S_{k}\) |
\(\theta_{i}\) | Angle representing the semantic orientation adjustment of \(w_{i}\) |
\(r_{i}\) | Degree of correlation between \(pw_{k}\) and \(w_{i}\) |
\(cts_{i}\) | Prior sentiment value of \(w_{i}\) determined using Rule (11) |
\(x_{i}\) | \(x\)-coordinate of the SentiCircle representation in Cartesian coordinates |
\(y_{i}\) | \(y\)-coordinate of the SentiCircle representation in Cartesian coordinates |
\(fxS_{k}\) | Feature value of \(S_{k}\) calculated using the \(x\) coordinate of the SentiCircle |
\(fyS_{k}\) | Feature value of \(S_{k}\) calculated using the \(y\) coordinate of the SentiCircle |
\(fxD\) | Feature value of \(D\) calculated using the \(x\) coordinate of the SentiCircle |
\(fyD\) | Feature value of \(D\) calculated using the \(y\) coordinate of the SentiCircle |
Extracting sentence-level features (SLF)
Word | Sentence | Sense | Sentiment |
---|---|---|---|
enjoy | I enjoy using the camera of this smartphone | Get pleasure from | Positive |
enjoy | The vendor enjoys the new regulation issued by the authority | Possess and benefit from | Neutral |
Senses of \(w_{1}\) | \(ws_{2}^{1}\) (sense of \(w_{2}\)) | \(ws_{2}^{2}\) (sense of \(w_{2}\)) |
---|---|---|
\(ws_{1}^{1}\) | \(sim_{12}^{11}\) | \(sim_{12}^{12}\) |
\(ws_{1}^{2}\) | \(sim_{12}^{21}\) | \(sim_{12}^{22}\) |
\(ws_{1}^{3}\) | \(sim_{12}^{31}\) | \(sim_{12}^{32}\) |
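Each cell above is a WordNet similarity score between one sense of \(w_{1}\) and one sense of \(w_{2}\). As an illustration of how one such measure works, here is a minimal Wu and Palmer similarity over a toy hypernym taxonomy; the taxonomy and sense names are invented for the example, whereas the actual method reads these relations from WordNet:

```python
# Toy hypernym taxonomy (child -> parent), invented for illustration only.
parent = {
    "screen.n.1": "surface.n.1",
    "screen.n.2": "covering.n.1",
    "surface.n.1": "artifact.n.1",
    "covering.n.1": "artifact.n.1",
    "artifact.n.1": "entity.n.1",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def depth(node):
    """Depth counted from the root (the root has depth 1)."""
    return len(path_to_root(node))

def wup_similarity(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)  # least common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup_similarity("screen.n.1", "screen.n.2"))  # 0.5 (LCS is artifact.n.1)
```

The other measures in the paper (Jiang–Conrath, Leacock–Chodorow, Resnik, Lin) plug different functions of depth and information content into the same sense-pair grid.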
Word | Sense | Details | Sentiment scores |
---|---|---|---|
Screen (\(w_{1}\)) | \(ws_{1}^{1}\) | A white or silvered surface where pictures can be projected for viewing | \(spos_{1}^{1}\), \(sneg_{1}^{1}\), \(sneu_{1}^{1}\) |
 | \(ws_{1}^{2}\) | A protective covering that keeps things out or hinders sight | \(spos_{1}^{2}\), \(sneg_{1}^{2}\), \(sneu_{1}^{2}\) |
 | \(ws_{1}^{3}\) | The personnel of the film industry | \(spos_{1}^{3}\), \(sneg_{1}^{3}\), \(sneu_{1}^{3}\) |
Great (\(w_{2}\)) | \(ws_{2}^{1}\) | Relatively large in size or number or extent | \(spos_{2}^{1}\), \(sneg_{2}^{1}\), \(sneu_{2}^{1}\) |
 | \(ws_{2}^{2}\) | Of major significance or importance | \(spos_{2}^{2}\), \(sneg_{2}^{2}\), \(sneu_{2}^{2}\) |
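One plausible reading of how the notation fits together: the contextual sentiment of a word is obtained by weighting each sense's raw SentiWordNet scores by that sense's indegree in the sense-similarity graph, so the sense best supported by the context dominates. The weighting scheme below is my assumption from the notation table, not a quote of the paper's equations:

```python
def contextual_sentiment(senses):
    """senses: list of (deg, spos, sneg, sneu) tuples for one word, where deg is
    the sense's indegree score and the s* values come from SentiWordNet.
    Returns indegree-weighted (cspos, csneg, csneu)."""
    total_deg = sum(s[0] for s in senses)
    if total_deg == 0:  # no graph support: fall back to a plain average over senses
        n = len(senses)
        return tuple(sum(s[k] for s in senses) / n for k in (1, 2, 3))
    return tuple(sum(s[0] * s[k] for s in senses) / total_deg for k in (1, 2, 3))

# Two hypothetical senses of "great": the high-indegree sense dominates
print(contextual_sentiment([(3.0, 0.75, 0.0, 0.25), (1.0, 0.25, 0.0, 0.75)]))
# (0.625, 0.0, 0.375)
```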
Capturing domain-sensitive features (DSF)
Variable | Electronics | Automobile |
---|---|---|
Review sentence | I mounted a shelf above the TV to get the cable box out of the way and avoid having to run a long HDMI cable through the wall | …but they are built solid, nice tough big hard clamps and love having a long cable so I never have to move cars around or anything if needed |
\(pw_{k}\) | 230 | 495 |
\(f\left( {pw_{k} ,w_{i} } \right)\) | 85 | 145 |
\(N\) | 130,765 | 170,873 |
\(Nw_{i}\) | 186 | 208 |
\(r\) | 241 | 422 |
\(cts_{i}\) | 0.32 | 0.25 |
\(\theta_{i}\) | 57.6° | 45° |
\(x_{i}\) | 0.29 | 0.66 |
\(y_{i}\) | 0.45 | 0.66 |
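The \(r\) and \(\theta_{i}\) rows follow the SentiCircle construction: the radius is the degree of correlation between the pivot word and \(w_{i}\), a TDOC-style term \(f(pw_{k}, w_{i}) \cdot \log_{10}(N / Nw_{i})\), and the angle scales the prior sentiment onto the half circle. A minimal sketch reproducing the Electronics column under those assumptions (the normalization the table applies before reporting \(x_{i}\), \(y_{i}\) is not shown, so only \(r\) and \(\theta\) are computed here):

```python
import math

def senticircle_polar(f_pw_w, N, N_w, cts):
    """Polar SentiCircle coordinates of word w_i relative to pivot pw_k.
    f_pw_w: co-occurrence count f(pw_k, w_i); N: corpus size;
    N_w: count of w_i in the corpus; cts: prior sentiment of w_i."""
    r = f_pw_w * math.log10(N / N_w)  # degree of correlation (TDOC)
    theta = cts * 180.0               # prior sentiment mapped onto the half circle, in degrees
    return r, theta

r, theta = senticircle_polar(85, 130765, 186, 0.32)
print(round(r), round(theta, 1))  # ~242 (the table reports 241) and 57.6 degrees
```

The Cartesian form would then be \(x = r\cos\theta\), \(y = r\sin\theta\) on a normalized radius; the small gap between 242 here and 241 in the table is presumably rounding.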
Feature | Notation | Details | Type |
---|---|---|---|
F1 | \(fposD\left( {wup} \right)\) | Average positive value of the review document using the Wu and Palmer similarity algorithm | Sentence-level features (SLF) |
F2 | \(fnegD\left( {wup} \right)\) | Average negative value of the review document using the Wu and Palmer similarity algorithm | |
F3 | \(fneuD\left( {wup} \right)\) | Average neutral value of the review document using the Wu and Palmer similarity algorithm | |
F4 | \(fposD\left( {jcn} \right)\) | Average positive value of the review document using the Jiang and Conrath similarity algorithm | |
F5 | \(fnegD\left( {jcn} \right)\) | Average negative value of the review document using the Jiang and Conrath similarity algorithm | |
F6 | \(fneuD\left( {jcn} \right)\) | Average neutral value of the review document using the Jiang and Conrath similarity algorithm | |
F7 | \(fposD\left( {lch} \right)\) | Average positive value of the review document using the Leacock and Chodorow similarity algorithm | |
F8 | \(fnegD\left( {lch} \right)\) | Average negative value of the review document using the Leacock and Chodorow similarity algorithm | |
F9 | \(fneuD\left( {lch} \right)\) | Average neutral value of the review document using the Leacock and Chodorow similarity algorithm | |
F10 | \(fposD\left( {res} \right)\) | Average positive value of the review document using the Resnik similarity algorithm | |
F11 | \(fnegD\left( {res} \right)\) | Average negative value of the review document using the Resnik similarity algorithm | |
F12 | \(fneuD\left( {res} \right)\) | Average neutral value of the review document using the Resnik similarity algorithm | |
F13 | \(fposD\left( {lin} \right)\) | Average positive value of the review document using the Lin similarity algorithm | |
F14 | \(fnegD\left( {lin} \right)\) | Average negative value of the review document using the Lin similarity algorithm | |
F15 | \(fneuD\left( {lin} \right)\) | Average neutral value of the review document using the Lin similarity algorithm | |
F16 | \(fxD\left( {wup} \right)\) | Average \(x\) value of the review document using the Wu and Palmer similarity algorithm | Domain-sensitive features (DSF) |
F17 | \(fyD\left( {wup} \right)\) | Average \(y\) value of the review document using the Wu and Palmer similarity algorithm | |
F18 | \(fxD\left( {jcn} \right)\) | Average \(x\) value of the review document using the Jiang and Conrath similarity algorithm | |
F19 | \(fyD\left( {jcn} \right)\) | Average \(y\) value of the review document using the Jiang and Conrath similarity algorithm | |
F20 | \(fxD\left( {lch} \right)\) | Average \(x\) value of the review document using the Leacock and Chodorow similarity algorithm | |
F21 | \(fyD\left( {lch} \right)\) | Average \(y\) value of the review document using the Leacock and Chodorow similarity algorithm | |
F22 | \(fxD\left( {res} \right)\) | Average \(x\) value of the review document using the Resnik similarity algorithm | |
F23 | \(fyD\left( {res} \right)\) | Average \(y\) value of the review document using the Resnik similarity algorithm | |
F24 | \(fxD\left( {lin} \right)\) | Average \(x\) value of the review document using the Lin similarity algorithm | |
F25 | \(fyD\left( {lin} \right)\) | Average \(y\) value of the review document using the Lin similarity algorithm | |
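Taken together, F1–F25 form a 25-dimensional document vector: three polarity averages per similarity algorithm (SLF) plus two SentiCircle coordinates per algorithm (DSF). A minimal sketch of the assembly, assuming the per-algorithm values have already been computed upstream:

```python
SIM_ALGOS = ["wup", "jcn", "lch", "res", "lin"]

def assemble_features(slf, dsf):
    """slf: {algo: (fposD, fnegD, fneuD)}; dsf: {algo: (fxD, fyD)}.
    Returns the 25 feature values in the order F1..F25."""
    vec = []
    for algo in SIM_ALGOS:   # F1-F15: sentence-level features
        vec.extend(slf[algo])
    for algo in SIM_ALGOS:   # F16-F25: domain-sensitive features
        vec.extend(dsf[algo])
    return vec

# Dummy values just to show the shape of the vector
slf = {a: (0.4, 0.3, 0.3) for a in SIM_ALGOS}
dsf = {a: (0.29, 0.45) for a in SIM_ALGOS}
print(len(assemble_features(slf, dsf)))  # 25
```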
Experimental results and discussion
Experimental setup
Dataset description
Data | Details |
---|---|
reviewerID | ID of the reviewer |
asin | ID of the product |
reviewerName | Name of the reviewer |
helpfulness | Helpfulness rating of the review |
reviewText | Text of the review |
overall | Rating of the product |
summary | Summary of the review |
unixReviewTime | Time of the review (Unix time) |
reviewTime | Time of the review (raw time) |
Results and discussion
Feature selection method | ML algorithm | BF: Prec | BF: Rec | BF: F-meas | SLF: Prec | SLF: Rec | SLF: F-meas | SLF + DSF: Prec | SLF + DSF: Rec | SLF + DSF: F-meas |
---|---|---|---|---|---|---|---|---|---|---|
None | Bayes Net | 0.632 | 0.481 | 0.517 | 0.674 | 0.519 | 0.570 | 0.733 | 0.752 | 0.741 |
 | Naïve Bayes | 0.595 | 0.595 | 0.595 | 0.674 | 0.519 | 0.570 | 0.678 | 0.495 | 0.549 |
 | Logistic | 0.544 | 0.544 | 0.544 | 0.641 | 0.740 | 0.687 | 0.628 | 0.657 | 0.642 |
 | MLP | 0.701 | 0.722 | 0.710 | 0.707 | 0.750 | 0.724 | 0.712 | 0.733 | 0.722 |
 | J48 | 0.607 | 0.658 | 0.629 | 0.651 | 0.798 | 0.717 | 0.743 | 0.733 | 0.738 |
 | Random Forest | 0.660 | 0.747 | 0.670 | *0.792 | *0.817 | *0.758 | 0.823 | 0.646 | 0.752 |
 | Random Tree | 0.689 | 0.684 | 0.686 | 0.730 | 0.750 | 0.739 | 0.757 | 0.762 | *0.760 |
CA | Bayes Net | 0.632 | 0.481 | 0.517 | 0.674 | 0.519 | 0.570 | 0.733 | 0.752 | 0.741 |
 | Naïve Bayes | 0.595 | 0.595 | 0.595 | 0.674 | 0.519 | 0.570 | 0.678 | 0.495 | 0.549 |
 | Logistic | 0.544 | 0.544 | 0.544 | 0.641 | 0.740 | 0.687 | 0.628 | 0.657 | 0.642 |
 | MLP | 0.701 | 0.722 | 0.710 | 0.707 | 0.750 | 0.724 | 0.712 | 0.733 | 0.722 |
 | J48 | 0.607 | 0.658 | 0.629 | 0.651 | 0.798 | 0.717 | 0.743 | 0.733 | 0.738 |
 | Random Forest | 0.660 | 0.747 | 0.670 | *0.792 | *0.817 | *0.758 | 0.646 | 0.752 | 0.695 |
 | Random Tree | 0.689 | 0.684 | 0.686 | 0.730 | 0.750 | 0.739 | 0.757 | 0.762 | *0.760 |
GR | Bayes Net | 0.632 | 0.481 | 0.517 | 0.674 | 0.519 | 0.570 | 0.733 | 0.752 | 0.741 |
 | Naïve Bayes | 0.595 | 0.595 | 0.595 | 0.674 | 0.519 | 0.570 | 0.678 | 0.495 | 0.549 |
 | Logistic | 0.544 | 0.544 | 0.544 | 0.641 | 0.740 | 0.687 | 0.628 | 0.657 | 0.642 |
 | MLP | 0.701 | 0.722 | 0.710 | 0.707 | 0.750 | 0.724 | 0.712 | 0.733 | 0.722 |
 | J48 | 0.607 | 0.658 | 0.629 | 0.651 | 0.798 | 0.717 | 0.743 | 0.733 | 0.738 |
 | Random Forest | 0.660 | 0.747 | 0.670 | 0.792 | 0.817 | 0.758 | 0.646 | 0.752 | 0.695 |
 | Random Tree | 0.689 | 0.684 | 0.686 | 0.730 | 0.750 | 0.739 | 0.757 | 0.762 | *0.760 |
IG | Bayes Net | 0.632 | 0.481 | 0.517 | 0.674 | 0.519 | 0.570 | 0.733 | 0.752 | 0.741 |
 | Naïve Bayes | 0.595 | 0.595 | 0.595 | 0.674 | 0.519 | 0.570 | 0.678 | 0.495 | 0.549 |
 | Logistic | 0.544 | 0.544 | 0.544 | 0.641 | 0.740 | 0.687 | 0.628 | 0.657 | 0.642 |
 | MLP | 0.701 | 0.722 | 0.710 | 0.707 | 0.750 | 0.724 | 0.712 | 0.733 | 0.722 |
 | J48 | 0.607 | 0.658 | 0.629 | 0.651 | 0.798 | 0.717 | 0.743 | 0.733 | 0.738 |
 | Random Forest | 0.660 | 0.747 | 0.670 | *0.792 | *0.817 | *0.758 | 0.646 | 0.752 | 0.695 |
 | Random Tree | 0.689 | 0.684 | 0.686 | 0.730 | 0.750 | 0.739 | 0.757 | 0.762 | *0.760 |
OneR | Bayes Net | 0.632 | 0.481 | 0.517 | 0.674 | 0.519 | 0.570 | 0.707 | 0.771 | 0.730 |
 | Naïve Bayes | 0.595 | 0.595 | 0.595 | 0.674 | 0.519 | 0.570 | 0.652 | 0.790 | 0.715 |
 | Logistic | 0.544 | 0.544 | 0.544 | 0.641 | 0.740 | 0.687 | 0.654 | *0.800 | 0.720 |
 | MLP | 0.701 | 0.722 | 0.710 | 0.707 | 0.750 | 0.724 | 0.712 | 0.733 | 0.722 |
 | J48 | 0.607 | 0.658 | 0.629 | 0.651 | 0.798 | 0.717 | 0.743 | 0.733 | 0.738 |
 | Random Forest | 0.660 | 0.747 | 0.670 | 0.792 | 0.817 | 0.758 | 0.714 | 0.714 | 0.714 |
 | Random Tree | 0.689 | 0.684 | 0.686 | 0.730 | 0.750 | 0.739 | 0.692 | 0.686 | 0.689 |
PCA | Bayes Net | 0.560 | 0.544 | 0.552 | 0.648 | 0.683 | 0.665 | 0.733 | 0.752 | 0.741 |
 | Naïve Bayes | 0.590 | 0.620 | 0.604 | 0.648 | 0.683 | 0.665 | 0.679 | 0.619 | 0.645 |
 | Logistic | 0.550 | 0.633 | 0.589 | 0.648 | 0.779 | 0.707 | 0.644 | 0.743 | 0.690 |
 | MLP | 0.569 | 0.532 | 0.549 | 0.648 | 0.779 | 0.707 | 0.621 | 0.629 | 0.625 |
 | J48 | 0.633 | 0.620 | 0.627 | 0.651 | 0.798 | 0.717 | 0.652 | 0.790 | 0.715 |
 | Random Forest | 0.564 | 0.696 | 0.623 | 0.649 | 0.788 | 0.712 | 0.648 | 0.762 | 0.700 |
 | Random Tree | 0.653 | 0.684 | 0.666 | 0.720 | 0.731 | 0.725 | 0.683 | 0.629 | 0.652 |
Feature selection method | ML algorithm | BF: Prec | BF: Rec | BF: F-meas | SLF: Prec | SLF: Rec | SLF: F-meas | SLF + DSF: Prec | SLF + DSF: Rec | SLF + DSF: F-meas |
---|---|---|---|---|---|---|---|---|---|---|
None | Bayes Net | 0.664 | 0.664 | 0.664 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Naïve Bayes | 0.701 | 0.770 | 0.735 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Logistic | 0.700 | 0.752 | 0.724 | 0.751 | 0.796 | 0.772 | 0.779 | 0.847 | 0.801 |
 | MLP | 0.739 | 0.796 | 0.761 | 0.782 | 0.810 | 0.795 | 0.770 | 0.810 | 0.788 |
 | J48 | 0.681 | 0.761 | 0.719 | *0.796 | *0.847 | *0.811 | 0.779 | 0.847 | 0.801 |
 | Random Forest | 0.689 | 0.814 | 0.747 | 0.740 | 0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | Random Tree | 0.736 | 0.708 | 0.721 | 0.773 | 0.788 | 0.781 | 0.776 | 0.766 | 0.771 |
CA | Bayes Net | 0.707 | 0.770 | 0.735 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Naïve Bayes | 0.664 | 0.664 | 0.664 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Logistic | 0.700 | 0.752 | 0.724 | 0.751 | 0.796 | 0.772 | 0.779 | 0.847 | 0.801 |
 | MLP | 0.739 | 0.796 | 0.761 | 0.782 | 0.810 | 0.795 | 0.770 | 0.810 | 0.788 |
 | J48 | 0.681 | 0.761 | 0.719 | *0.796 | *0.847 | 0.811 | 0.779 | 0.847 | 0.801 |
 | Random Forest | 0.689 | 0.814 | 0.747 | 0.740 | *0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | Random Tree | 0.736 | 0.708 | 0.721 | 0.773 | 0.788 | 0.781 | 0.776 | 0.766 | 0.771 |
GR | Bayes Net | 0.707 | 0.770 | 0.735 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Naïve Bayes | 0.664 | 0.664 | 0.664 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Logistic | 0.700 | 0.752 | 0.724 | 0.751 | 0.796 | 0.772 | 0.779 | 0.847 | 0.801 |
 | MLP | 0.739 | 0.796 | 0.761 | 0.782 | 0.810 | 0.795 | 0.770 | 0.810 | 0.788 |
 | J48 | 0.681 | 0.761 | 0.719 | *0.796 | *0.847 | *0.811 | 0.779 | 0.847 | 0.801 |
 | Random Forest | 0.689 | 0.814 | 0.747 | 0.740 | *0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | Random Tree | 0.736 | 0.708 | 0.721 | 0.773 | 0.788 | 0.781 | 0.776 | 0.766 | 0.771 |
IG | Bayes Net | 0.707 | 0.770 | 0.735 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Naïve Bayes | 0.664 | 0.664 | 0.664 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Logistic | 0.700 | 0.752 | 0.724 | 0.751 | 0.796 | 0.772 | 0.779 | 0.847 | 0.801 |
 | MLP | 0.739 | 0.796 | 0.761 | 0.782 | 0.810 | 0.795 | 0.770 | 0.810 | 0.788 |
 | J48 | 0.681 | 0.761 | 0.719 | *0.796 | *0.847 | *0.811 | 0.779 | 0.847 | 0.801 |
 | Random Forest | 0.689 | 0.814 | 0.747 | 0.740 | *0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | Random Tree | 0.736 | 0.708 | 0.721 | 0.773 | 0.788 | 0.781 | 0.776 | 0.766 | 0.771 |
OneR | Bayes Net | 0.664 | 0.664 | 0.664 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Naïve Bayes | 0.728 | 0.717 | 0.722 | 0.764 | 0.759 | 0.762 | 0.786 | 0.818 | 0.800 |
 | Logistic | 0.700 | 0.752 | 0.724 | 0.751 | 0.796 | 0.772 | 0.779 | 0.847 | 0.801 |
 | MLP | 0.739 | 0.796 | 0.761 | 0.782 | 0.810 | 0.795 | 0.770 | 0.810 | 0.788 |
 | J48 | 0.681 | 0.761 | 0.719 | *0.796 | *0.847 | *0.811 | 0.779 | 0.847 | 0.801 |
 | Random Forest | 0.689 | 0.814 | 0.747 | 0.740 | *0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | Random Tree | 0.736 | 0.708 | 0.721 | 0.773 | 0.788 | 0.781 | 0.776 | 0.766 | 0.771 |
PCA | Bayes Net | 0.681 | 0.761 | 0.719 | 0.770 | 0.810 | 0.788 | 0.806 | *0.854 | 0.816 |
 | Naïve Bayes | 0.749 | 0.743 | 0.746 | 0.770 | 0.810 | 0.788 | 0.806 | *0.854 | 0.816 |
 | Logistic | 0.688 | 0.805 | 0.742 | 0.740 | 0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | MLP | 0.700 | 0.752 | 0.724 | 0.742 | 0.759 | 0.750 | 0.782 | 0.832 | 0.802 |
 | J48 | 0.691 | 0.823 | 0.751 | 0.738 | 0.832 | 0.782 | 0.740 | 0.847 | 0.790 |
 | Random Forest | 0.688 | 0.805 | 0.742 | 0.740 | *0.847 | 0.790 | 0.740 | 0.847 | 0.790 |
 | Random Tree | 0.741 | 0.752 | 0.747 | 0.766 | 0.766 | 0.766 | *0.825 | 0.839 | *0.831 |
Study | Rule for determining pivot word |
---|---|
SentiCircle [37] | Simply picks the word with POS tag NN in the tweet |
This study | NN POS tag + similarity algorithm |
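The difference between the two rules can be sketched as follows. The tagged-sentence format and the `similarity` callable are assumptions for illustration; `similarity` stands in for any of the WordNet measures listed earlier, and which noun the original SentiCircle rule keeps when several NN words appear is ambiguous, so first-match is assumed here:

```python
def pivot_senticircle(tagged):
    """SentiCircle-style rule: simply take an NN-tagged word (first match assumed)."""
    return next((w for w, tag in tagged if tag == "NN"), None)

def pivot_this_study(tagged, domain_word, similarity):
    """This study's rule: the NN-tagged word most similar to the domain word."""
    nouns = [w for w, tag in tagged if tag == "NN"]
    return max(nouns, key=lambda w: similarity(w, domain_word)) if nouns else None

# Toy tagged sentence and a hypothetical precomputed similarity table
tagged = [("long", "JJ"), ("HDMI", "NN"), ("cable", "NN")]
sim = {("HDMI", "tv"): 0.3, ("cable", "tv"): 0.7}
print(pivot_senticircle(tagged))                              # HDMI
print(pivot_this_study(tagged, "tv", lambda a, b: sim[(a, b)]))  # cable
```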