1 Introduction
- We present a BERT-based Learning-To-Rank (LTR) approach, trained on GUI relevance data harvested through crowdsourcing, that significantly outperforms all traditional ranking methods.
- We create the first comprehensive gold standard for NL-based GUI retrieval using crowdsourcing, which enables a systematic evaluation of NL-based GUI ranking models, and we make it publicly available to foster further research.
- We conduct the first in-depth analysis and evaluation of traditional Information Retrieval (IR) models, Automatic Query Expansion (AQE) techniques, and trained BERT-based Learning-To-Rank models for NL-based GUI ranking, using the newly proposed gold standard.
- We propose a comprehensive and general evaluation methodology, including multiple metrics for measuring GUI prototyping productivity, and conduct an extensive user study to assess the usefulness of the GUI prototyping approach in a practical rapid prototyping environment.
2 Approach: RaWi
2.1 GUI repository and text extraction
2.1.1 GUI filtering
2.1.2 GUI text extraction
2.2 GUI retrieval and text preprocessing
2.2.1 Baseline ranking models
2.2.2 Automatic query expansion
2.2.3 BERT-based learning-to-rank models
2.3 Deriving editable GUI screens
2.4 Prototype implementation
2.4.1 GUI search
2.4.2 GUI prototyping editor
2.4.3 GUI prototype preview
3 Experimental evaluation
- \({\textbf {RQ}}_{{1}}\): Which retrieval method performs best for GUI retrieval on the basis of NL search queries?
- \({\textbf {RQ}}_{{2}}\): Does RaWi increase GUI prototyping productivity compared to a traditional prototyping tool?
- \({\textbf {RQ}}_{{3}}\): Do users perceive RaWi as useful for rapid high-fidelity GUI prototyping?
3.1 \(\hbox {RQ}_1\): GUI retrieval performance
3.1.1 Gold standard
3.1.2 Ranking model parameters
3.1.3 Evaluation metrics
3.2 \(\hbox {RQ}_2\): productivity of rapid prototyping
3.2.1 User study design
3.2.2 User study tasks
3.2.3 Experimental procedure
3.2.4 Evaluation metrics
3.3 \(\hbox {RQ}_3\): perceived usefulness
4 Results and discussion
4.1 \(\hbox {RQ}_1\): GUI retrieval performance
Precision@k (P@k) and NDCG@k (N@k) per ranking model:

Model | P@3 | P@5 | P@7 | P@10 | N@3 | N@5 | N@10 | N@15 |
---|---|---|---|---|---|---|---|---|
TF-IDF | 0.223 | 0.204 | 0.181 | 0.175 | 0.329 | 0.339 | 0.395 | 0.480 |
BM25 | 0.303 | 0.276 | 0.246 | 0.226 | 0.426 | 0.441 | 0.515 | 0.579 |
nBOW | 0.270 | 0.234 | 0.220 | 0.193 | 0.395 | 0.370 | 0.374 | 0.398 |
BM25 | 0.303 | 0.276 | 0.246 | 0.226 | 0.426 | 0.441 | 0.515 | 0.579 |
+PRF | 0.313 | 0.266 | 0.259 | 0.236 | 0.432 | 0.443 | 0.520 | 0.584 |
+PRF (s) | 0.317 | 0.280 | 0.260 | 0.235 | 0.441 | 0.462 | 0.536 | 0.604 |
+PRF (w) | 0.320 | 0.276 | 0.244 | 0.231 | 0.439 | 0.417 | 0.421 | 0.446 |
+PRF (sw) | 0.317 | 0.280 | 0.256 | 0.237 | 0.429 | 0.416 | 0.426 | 0.456 |
Sentence-BERT | 0.343 | 0.314 | 0.294 | 0.267 | 0.481 | 0.511 | 0.611 | 0.667 |
BERT-LTR (1) | 0.377 | 0.350 | 0.307 | 0.269 | 0.530 | 0.560 | 0.634 | 0.697 |
BERT-LTR (2) | 0.400 | 0.340 | 0.304 | 0.281 | 0.543 | 0.556 | 0.636 | 0.701 |
BERT-LTR (3) | 0.363 | 0.354 | 0.317 | 0.287 | 0.517 | 0.554 | 0.646 | 0.694 |

Average precision (AveP), MRR, and HITS@k (H@k) per ranking model:

Model | AveP | MRR | H@1 | H@3 | H@5 | H@7 | H@10 | H@15 |
---|---|---|---|---|---|---|---|---|
TF-IDF | 0.331 | 0.451 | 0.320 | 0.460 | 0.580 | 0.650 | 0.760 | 0.910 |
BM25 | 0.413 | 0.520 | 0.370 | 0.600 | 0.690 | 0.760 | 0.860 | 0.930 |
nBOW | 0.281 | 0.490 | 0.340 | 0.540 | 0.630 | 0.750 | 0.840 | 0.980 |
BM25 | 0.413 | 0.520 | 0.370 | 0.600 | 0.690 | 0.760 | 0.860 | 0.930 |
+PRF | 0.419 | 0.505 | 0.370 | 0.580 | 0.680 | 0.780 | 0.850 | 0.930 |
+PRF (s) | 0.427 | 0.532 | 0.380 | 0.610 | 0.700 | 0.780 | 0.880 | 0.960 |
+PRF (w) | 0.325 | 0.523 | 0.380 | 0.580 | 0.700 | 0.720 | 0.860 | 0.930 |
+PRF (sw) | 0.333 | 0.533 | 0.390 | 0.590 | 0.700 | 0.770 | 0.870 | 0.930 |
Sentence-BERT | 0.454 | 0.560 | 0.370 | 0.680 | 0.760 | 0.880 | 0.960 | 0.990 |
BERT-LTR (1) | 0.486 | 0.618 | 0.460 | 0.710 | 0.860 | 0.920 | 0.980 | 1.00 |
BERT-LTR (2) | 0.501 | 0.631 | 0.440 | 0.750 | 0.910 | 0.960 | 0.980 | 1.00 |
BERT-LTR (3) | 0.499 | 0.626 | 0.450 | 0.730 | 0.860 | 0.940 | 1.00 | 1.00 |
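
For readers less familiar with the reported metrics, the following minimal Python sketch shows how P@k, HITS@k, reciprocal rank, average precision, and NDCG@k are commonly computed for a single query from a ranked list of relevance labels. It is an illustration under stated assumptions, not the evaluation code behind the tables: the function names are hypothetical, relevance is treated as binary for P@k, HITS@k, reciprocal rank, and average precision, NDCG uses a linear gain with a log2 discount, and the ideal ranking is derived from the labels of the same result list. The table values would then correspond to averages of such per-query scores over all queries (for example, MRR is the mean of the reciprocal ranks).

```python
from math import log2
from typing import Sequence


def precision_at_k(rels: Sequence[float], k: int) -> float:
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def hits_at_k(rels: Sequence[float], k: int) -> float:
    """1.0 if at least one relevant result appears in the top k, else 0.0."""
    return 1.0 if any(r > 0 for r in rels[:k]) else 0.0


def reciprocal_rank(rels: Sequence[float]) -> float:
    """1 / rank of the first relevant result; 0.0 if none is relevant."""
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def average_precision(rels: Sequence[float]) -> float:
    """Mean of precision values at the ranks of relevant results.

    Assumption: every relevant item for the query is in `rels`, so we
    normalize by the number of relevant items found in the list.
    """
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(r / log2(rank + 1) for rank, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideally sorted label list."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranked list for one query: 1 = relevant GUI, 0 = not relevant.
example = [0, 1, 0, 1, 0]
print(precision_at_k(example, 3), hits_at_k(example, 1), reciprocal_rank(example))
```

Graded relevance labels can be passed to `ndcg_at_k` unchanged; the other functions only distinguish between zero and non-zero labels.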