Skip to main content

2017 | OriginalPaper | Buchkapitel

On Making Skyline Queries Resistant to Outliers

verfasst von : Hélène Jaudoin, Pierre Nerzic, Olivier Pivert, Daniel Rocacher

Erschienen in: Advances in Knowledge Discovery and Management

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper deals with the issue of retrieving the most preferred objects (in the sense of Skyline queries, i.e., of Pareto ordering) from a collection involving outliers. Indeed, many real-world datasets, for instance from ad sales websites, contain odd data and it is important to limit the impact of such odd data (outliers) on the result of skyline queries, and prevent them from hiding more interesting points. The approach we propose relies on the notion of fuzzy typicality and makes it possible to compute a graded skyline where each answer is associated with both a degree of membership to the skyline and a typicality degree. A GPU-based parallel implementation of the algorithm is described and experimental results are presented, which show the scalability of the approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Bartolini, I., Ciaccia, P., & Patella, M. (2008). Efficient sort-based skyline evaluation. ACM Transactions on Database Systems, 33(4), 1–49.CrossRef Bartolini, I., Ciaccia, P., & Patella, M. (2008). Efficient sort-based skyline evaluation. ACM Transactions on Database Systems, 33(4), 1–49.CrossRef
Zurück zum Zitat Bøgh, K. S., Assent, I., & Magnani, M. (2013). Efficient GPU-based skyline computation. In Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN 2013 (pp. 5:1–5:6). New York: ACM. Bøgh, K. S., Assent, I., & Magnani, M. (2013). Efficient GPU-based skyline computation. In Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN 2013 (pp. 5:1–5:6). New York: ACM.
Zurück zum Zitat Börzsönyi, S., Kossmann, D., & Stocker, K. (2001). The skyline operator. In Proceedings of ICDE 2001 (pp. 421–430). Börzsönyi, S., Kossmann, D., & Stocker, K. (2001). The skyline operator. In Proceedings of ICDE 2001 (pp. 421–430).
Zurück zum Zitat Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006a). Finding k-dominant skylines in high dimensional space. In Proceedings of SIGMOD 2006 (pp. 503–514). Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006a). Finding k-dominant skylines in high dimensional space. In Proceedings of SIGMOD 2006 (pp. 503–514).
Zurück zum Zitat Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006b). On high dimensional skylines. In Proceedings of EDBT 2006. LNCS (Vol. 3896, pp. 478–495). Chan, C., Jagadish, H., Tan, K., Tung, A., & Zhang, Z. (2006b). On high dimensional skylines. In Proceedings of EDBT 2006. LNCS (Vol. 3896, pp. 478–495).
Zurück zum Zitat Choi, W., Liu, L., & Yu, B. (2012). Multi-criteria decision making with skyline computation. In 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI) (pp. 316–323). Choi, W., Liu, L., & Yu, B. (2012). Multi-criteria decision making with skyline computation. In 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI) (pp. 316–323).
Zurück zum Zitat Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2003). Skyline with presorting. In Proceedings of ICDE 2003 (pp. 717–719). Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2003). Skyline with presorting. In Proceedings of ICDE 2003 (pp. 717–719).
Zurück zum Zitat Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2005). Skyline with presorting: Theory and optimizations. In Proceedings of IIS 2005 (pp. 595–604). Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2005). Skyline with presorting: Theory and optimizations. In Proceedings of IIS 2005 (pp. 595–604).
Zurück zum Zitat Cole, M. (1991). Algorithmic skeletons: Structured management of parallel computation. Cambridge: MIT Press.MATH Cole, M. (1991). Algorithmic skeletons: Structured management of parallel computation. Cambridge: MIT Press.MATH
Zurück zum Zitat Dubois, D. & Prade, H. (1984). On data summarization with fuzzy sets. In Proceedings of IFSA 1993 (pp. 465–468). Dubois, D. & Prade, H. (1984). On data summarization with fuzzy sets. In Proceedings of IFSA 1993 (pp. 465–468).
Zurück zum Zitat Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), Portland, Oregon, USA (pp. 226–231). AAAI Press. Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), Portland, Oregon, USA (pp. 226–231). AAAI Press.
Zurück zum Zitat Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.CrossRef Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.CrossRef
Zurück zum Zitat Friedman, M., Ming, M., & Kandel, A. (1995). On the theory of typicality. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 3(2), 127–142.MathSciNetCrossRefMATH Friedman, M., Ming, M., & Kandel, A. (1995). On the theory of typicality. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 3(2), 127–142.MathSciNetCrossRefMATH
Zurück zum Zitat Gabel, M., Keren, D., & Schuster, A. (2013). Communication-efficient outlier detection for scale-out systems. In Proceedings of the First International Workshop on Big Dynamic Distributed Data, Riva del Garda, Italy, 30 August 2013 (pp. 19–24). Gabel, M., Keren, D., & Schuster, A. (2013). Communication-efficient outlier detection for scale-out systems. In Proceedings of the First International Workshop on Big Dynamic Distributed Data, Riva del Garda, Italy, 30 August 2013 (pp. 19–24).
Zurück zum Zitat Goncalves, M., & Tineo, L. (2007). Fuzzy dominance skyline queries. In R. Wagner, N. Revell & G. Pernul (Eds.), Proceedings 18th International Conference on Database and Expert Systems Applications, DEXA 2007, Regensburg, Germany, 3–7 September 2007 (Vol. 4653, pp. 469–478). Lecture notes in computer science. Heidelberg: Springer. Goncalves, M., & Tineo, L. (2007). Fuzzy dominance skyline queries. In R. Wagner, N. Revell & G. Pernul (Eds.), Proceedings 18th International Conference on Database and Expert Systems Applications, DEXA 2007, Regensburg, Germany, 3–7 September 2007 (Vol. 4653, pp. 469–478). Lecture notes in computer science. Heidelberg: Springer.
Zurück zum Zitat Gupta, M., Gao, J., and Han, J. (2013). Community distribution outlier detection in heterogeneous information networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (eds.), Machine learning and knowledge discovery in databases (Vol. 8188, pp. 557–573). Lecture notes in computer science. Berlin: Springer. Gupta, M., Gao, J., and Han, J. (2013). Community distribution outlier detection in heterogeneous information networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (eds.), Machine learning and knowledge discovery in databases (Vol. 8188, pp. 557–573). Lecture notes in computer science. Berlin: Springer.
Zurück zum Zitat Hadjali, A., Pivert, O., & Prade, H. (2011). On different types of fuzzy skylines. In Proceedings of ISMIS 2011 (pp. 581–591). Hadjali, A., Pivert, O., & Prade, H. (2011). On different types of fuzzy skylines. In Proceedings of ISMIS 2011 (pp. 581–591).
Zurück zum Zitat Hampton, J. (1988). Overextension of conjunctive concepts: evidence for a unitary model of concept typicality and class inclusion. Journal of Experimental Psychology: Learning Memory and Cognition, 14(1), 12–32. Hampton, J. (1988). Overextension of conjunctive concepts: evidence for a unitary model of concept typicality and class inclusion. Journal of Experimental Psychology: Learning Memory and Cognition, 14(1), 12–32.
Zurück zum Zitat Harris, M. (2007). Optimizing parallel reduction in cuda. Technical report, nVidia. Harris, M. (2007). Optimizing parallel reduction in cuda. Technical report, nVidia.
Zurück zum Zitat Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.CrossRefMATH Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85–126.CrossRefMATH
Zurück zum Zitat Ji, T., Yang, D., & Gao, J. (2013). Incremental local evolutionary outlier detection for dynamic social networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (Eds.), Machine learning and knowledge discovery in databases (Vol. 8189, pp. 1–15). Lecture notes in computer science. Berlin: Springer. Ji, T., Yang, D., & Gao, J. (2013). Incremental local evolutionary outlier detection for dynamic social networks. In H. Blockeel, K. Kersting, S. Nijssen & F. Z̆elezný (Eds.), Machine learning and knowledge discovery in databases (Vol. 8189, pp. 1–15). Lecture notes in computer science. Berlin: Springer.
Zurück zum Zitat Lee, K. C. K., Zheng, B., Li, H., & Lee, W.-C. (2007). Approaching the skyline in z order. In VLDB (pp. 279–290). Lee, K. C. K., Zheng, B., Li, H., & Lee, W.-C. (2007). Approaching the skyline in z order. In VLDB (pp. 279–290).
Zurück zum Zitat Lesot, M. (2006). Typicality-based clustering. International Journal of Information technology and Intelligent Computing, 12, 279–292. Lesot, M. (2006). Typicality-based clustering. International Journal of Information technology and Intelligent Computing, 12, 279–292.
Zurück zum Zitat Lin, X., Yuan, Y., Zhang, Q., & Zhang, Y. (2007). Selecting stars: The k most representative skyline operator. In Proceedings of the ICDE 2007 (pp. 86–95). Lin, X., Yuan, Y., Zhang, Q., & Zhang, Y. (2007). Selecting stars: The k most representative skyline operator. In Proceedings of the ICDE 2007 (pp. 86–95).
Zurück zum Zitat Niu, Z., Shi, S., Sun, J., & He, X. (2011). A survey of outlier detection methodologies and their applications. In Proceedings of Third International Conference Artificial Intelligence and Computational Intelligence, Part I, AICI 2011, Taiyuan, China, September 24–25, 2011 (pp. 380–387). Niu, Z., Shi, S., Sun, J., & He, X. (2011). A survey of outlier detection methodologies and their applications. In Proceedings of Third International Conference Artificial Intelligence and Computational Intelligence, Part I, AICI 2011, Taiyuan, China, September 24–25, 2011 (pp. 380–387).
Zurück zum Zitat Osherson, D., & Smith, E. (1997). On typicality and vagueness. Cognition, 64, 189–206.CrossRef Osherson, D., & Smith, E. (1997). On typicality and vagueness. Cognition, 64, 189–206.CrossRef
Zurück zum Zitat Papadias, D., Tao, Y., Fu, G., & Seeger, B. (2005). Progressive skyline computation in database systems. ACM Transactions on Database Systems, 30(1), 2005.CrossRef Papadias, D., Tao, Y., Fu, G., & Seeger, B. (2005). Progressive skyline computation in database systems. ACM Transactions on Database Systems, 30(1), 2005.CrossRef
Zurück zum Zitat Park, S., Kim, T., Park, J., Kim, J., & Im, H. (2009). Parallel skyline computation on multicore architectures. In ICDE (pp. 760–771). Park, S., Kim, T., Park, J., Kim, J., & Im, H. (2009). Parallel skyline computation on multicore architectures. In ICDE (pp. 760–771).
Zurück zum Zitat Pivert, O., & Bosc, P. (2012). Fuzzy preference queries to relational databases. London: Imperial College Press.CrossRefMATH Pivert, O., & Bosc, P. (2012). Fuzzy preference queries to relational databases. London: Imperial College Press.CrossRefMATH
Zurück zum Zitat Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B., and Lepailleur, A. (2014). Mining (soft-) skypatterns using dynamic CSP. In H. Simonis (ed.), Proceedings of 11th International Conference on Integration of AI and OR Techniques in Constraint Programming, CPAIOR 2014, Cork, Ireland, 19–23 May 2014 (Vol. 8451, pp. 71–87). Lecture notes in computer science. Cham: Springer. Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B., and Lepailleur, A. (2014). Mining (soft-) skypatterns using dynamic CSP. In H. Simonis (ed.), Proceedings of 11th International Conference on Integration of AI and OR Techniques in Constraint Programming, CPAIOR 2014, Cork, Ireland, 19–23 May 2014 (Vol. 8451, pp. 71–87). Lecture notes in computer science. Cham: Springer.
Zurück zum Zitat Rosch, E., & Mervis, C. (1975). Family resemblance: Studies in the internal structure of categories. Cogn. Psychol., 7, 573–605.CrossRef Rosch, E., & Mervis, C. (1975). Family resemblance: Studies in the internal structure of categories. Cogn. Psychol., 7, 573–605.CrossRef
Zurück zum Zitat Tan, K.-L., Eng, P.-K., & Ooi, B. C. (2001). Efficient progressive skyline computation. In Proceedings of VLDB 2001 (pp. 301–310). Tan, K.-L., Eng, P.-K., & Ooi, B. C. (2001). Efficient progressive skyline computation. In Proceedings of VLDB 2001 (pp. 301–310).
Zurück zum Zitat Tao, Y., Ding, L., Lin, X., & Pei, J. (2009). Distance-based representative skyline. In Y. E. Ioannidis, D. L. Lee & R. T. Ng (Eds.), ICDE (pp. 892–903). IEEE. Tao, Y., Ding, L., Lin, X., & Pei, J. (2009). Distance-based representative skyline. In Y. E. Ioannidis, D. L. Lee & R. T. Ng (Eds.), ICDE (pp. 892–903). IEEE.
Zurück zum Zitat Yager, R. (1997). A note on a fuzzy measure of typicality. International Journal of Intelligent Systems, 12(3), 233–249.CrossRefMATH Yager, R. (1997). A note on a fuzzy measure of typicality. International Journal of Intelligent Systems, 12(3), 233–249.CrossRefMATH
Zurück zum Zitat Zadeh, L. A. (1982). A note on prototype theory and fuzzy sets. Cognition, 12, 291–298. Zadeh, L. A. (1982). A note on prototype theory and fuzzy sets. Cognition, 12, 291–298.
Zurück zum Zitat Zadeh, L. A. (1984). A computational theory of dispositions. In Y. Wilks (Ed.), COLING (pp. 312–318). ACL. Zadeh, L. A. (1984). A computational theory of dispositions. In Y. Wilks (Ed.), COLING (pp. 312–318). ACL.
Zurück zum Zitat Zadeh, L. (1987). A computational theory of dispositions. International Journal of Intelligent Systems, 2, 39–63.MATH Zadeh, L. (1987). A computational theory of dispositions. International Journal of Intelligent Systems, 2, 39–63.MATH
Zurück zum Zitat Zhang, J. (2013). Advancements of outlier detection: A survey. EAI Endorsed Transactions on Scalable Information Systems, 1, e2.CrossRef Zhang, J. (2013). Advancements of outlier detection: A survey. EAI Endorsed Transactions on Scalable Information Systems, 1, e2.CrossRef
Zurück zum Zitat Zimek, A., Gaudet, M., Campello, R. J. G. B., & Sander, J. (2013). Subsampling for efficient and effective unsupervised outlier detection ensembles. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013 (pp. 428–436). Zimek, A., Gaudet, M., Campello, R. J. G. B., & Sander, J. (2013). Subsampling for efficient and effective unsupervised outlier detection ensembles. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013 (pp. 428–436).
Zurück zum Zitat Zimek, A., Schubert, E., & Kriegel, H. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5), 363–387.MathSciNetCrossRef Zimek, A., Schubert, E., & Kriegel, H. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining, 5(5), 363–387.MathSciNetCrossRef
Metadaten
Titel
On Making Skyline Queries Resistant to Outliers
verfasst von
Hélène Jaudoin
Pierre Nerzic
Olivier Pivert
Daniel Rocacher
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-45763-5_2