Skip to main content

2019 | OriginalPaper | Buchkapitel

A Large-Scale Repository of Deterministic Regular Expression Patterns and Its Applications

verfasst von : Haiming Chen, Yeting Li, Chunmei Dong, Xinyu Chu, Xiaoying Mou, Weidong Min

Erschienen in: Advances in Knowledge Discovery and Data Mining

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Deterministic regular expressions (DREs) have been used in a myriad of areas in data management. However, to the best of our knowledge, presently there has been no large-scale repository of DREs in the literature. In this paper, based on a large corpus of data that we harvested from the Web, we build a large-scale repository of DREs by first collecting a repository after analyzing determinism of the real data; and then further processing the data by using normalized DREs to construct a compact repository of DREs, called DRE pattern set. At last we use our DRE patterns as benchmark datasets in several algorithms that have lacked experiments on real DRE data before. Experimental results demonstrate the usefulness of the repository.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The number of total is not equal to the sum of DTD, XSD, RNG and RegExLib, because there exist duplicate DREs among the different types of files.
 
Literatur
4.
Zurück zum Zitat Abiteboul, S., Milo, T., Benjelloun, O.: Regular rewriting of active XML and unambiguity. In: PODS 2005, pp. 295–303. ACM (2005) Abiteboul, S., Milo, T., Benjelloun, O.: Regular rewriting of active XML and unambiguity. In: PODS 2005, pp. 295–303. ACM (2005)
5.
Zurück zum Zitat Barbosa, D., Mignet, L., Veltri, P.: Studying the XML Web: gathering statistics from an XML sample. World Wide Web 9(2), 187–212 (2006)CrossRef Barbosa, D., Mignet, L., Veltri, P.: Studying the XML Web: gathering statistics from an XML sample. World Wide Web 9(2), 187–212 (2006)CrossRef
6.
Zurück zum Zitat Bex, G.J., Martens, W., Neven, F., Schwentick, T.: Expressiveness of XSDs: from practice to theory, there and back again. In: WWW 2005, pp. 712–721. ACM (2005) Bex, G.J., Martens, W., Neven, F., Schwentick, T.: Expressiveness of XSDs: from practice to theory, there and back again. In: WWW 2005, pp. 712–721. ACM (2005)
7.
Zurück zum Zitat Bex, G.J., Neven, F., Van den Bussche, J.: DTDs versus XML schema: a practical study. In: WebDB 2004, pp. 79–84. ACM (2004) Bex, G.J., Neven, F., Van den Bussche, J.: DTDs versus XML schema: a practical study. In: WebDB 2004, pp. 79–84. ACM (2004)
8.
Zurück zum Zitat Bex, G.J., Neven, F., Schwentick, T., Tuyls, K.: Inference of concise DTDs from XML data. In: VLDB 2006, pp. 115–126. VLDB Endowment (2006) Bex, G.J., Neven, F., Schwentick, T., Tuyls, K.: Inference of concise DTDs from XML data. In: VLDB 2006, pp. 115–126. VLDB Endowment (2006)
9.
Zurück zum Zitat Bex, G.J., Neven, F., Vansummeren, S.: Inferring XML schema definitions from XML data. In: VLDB 2007, pp. 998–1009 (2007) Bex, G.J., Neven, F., Vansummeren, S.: Inferring XML schema definitions from XML data. In: VLDB 2007, pp. 998–1009 (2007)
10.
Zurück zum Zitat Björklund, H., Martens, W., Timm, T.: Efficient incremental evaluation of succinct regular expressions. In: CIKM 2015, pp. 1541–1550. ACM (2015) Björklund, H., Martens, W., Timm, T.: Efficient incremental evaluation of succinct regular expressions. In: CIKM 2015, pp. 1541–1550. ACM (2015)
14.
Zurück zum Zitat Choi, B.: What are real DTDs like. Technical reports (CIS), p. 17 (2002) Choi, B.: What are real DTDs like. Technical reports (CIS), p. 17 (2002)
16.
Zurück zum Zitat Colazzo, D., Ghelli, G., Pardini, L., Sartiani, C.: Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking. Theor. Comput. Sci. 492(2013), 88–116 (2013)MathSciNetCrossRefMATH Colazzo, D., Ghelli, G., Pardini, L., Sartiani, C.: Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking. Theor. Comput. Sci. 492(2013), 88–116 (2013)MathSciNetCrossRefMATH
17.
Zurück zum Zitat Colazzo, D., Ghelli, G., Sartiani, C.: Linear time membership in a class of regular expressions with counting, interleaving, and unordered concatenation. ACM Trans. Database Syst. (TODS) 42(4), 24 (2017)MathSciNetCrossRef Colazzo, D., Ghelli, G., Sartiani, C.: Linear time membership in a class of regular expressions with counting, interleaving, and unordered concatenation. ACM Trans. Database Syst. (TODS) 42(4), 24 (2017)MathSciNetCrossRef
18.
Zurück zum Zitat Freydenberger, D.D., Kötzing, T.: Fast learning of restricted regular expressions and DTDs. Theory Comput. Syst. 57, 1114–1158 (2015)MathSciNetCrossRefMATH Freydenberger, D.D., Kötzing, T.: Fast learning of restricted regular expressions and DTDs. Theory Comput. Syst. 57, 1114–1158 (2015)MathSciNetCrossRefMATH
19.
Zurück zum Zitat Grijzenhout, S., Marx, M.: The quality of the XML web. In: CIKM 2011, pp. 1719–1724 (2011) Grijzenhout, S., Marx, M.: The quality of the XML web. In: CIKM 2011, pp. 1719–1724 (2011)
20.
Zurück zum Zitat Huang, X., Bao, Z., Davidson, S.B., Milo, T., Yuan, X.: Answering regular path queries on workflow provenance, pp. 375–386. IEEE (2015) Huang, X., Bao, Z., Davidson, S.B., Milo, T., Yuan, X.: Answering regular path queries on workflow provenance, pp. 375–386. IEEE (2015)
21.
Zurück zum Zitat Boneva, I., Ciucanu, R., Staworko, S.: Simple schemas for unordered XML. In: WebDB 2013, pp. 13–18 (2013) Boneva, I., Ciucanu, R., Staworko, S.: Simple schemas for unordered XML. In: WebDB 2013, pp. 13–18 (2013)
22.
Zurück zum Zitat Kilpeläinen, P.: Checking determinism of XML Schema content models in optimal time. Inf. Syst. 36(3), 596–617 (2011)CrossRef Kilpeläinen, P.: Checking determinism of XML Schema content models in optimal time. Inf. Syst. 36(3), 596–617 (2011)CrossRef
23.
Zurück zum Zitat Laender, A.H., Moro, M.M., Nascimento, C., Martins, P.: An X-ray on web-available XML schemas. ACM SIGMOD Rec. 38(1), 37–42 (2009)CrossRef Laender, A.H., Moro, M.M., Nascimento, C., Martins, P.: An X-ray on web-available XML schemas. ACM SIGMOD Rec. 38(1), 37–42 (2009)CrossRef
24.
Zurück zum Zitat Li, Y., Chu, X., Mou, X., Dong, C., Chen, H.: Practical study of deterministic regular expressions from large-scale XML and schema files. In: IDEAS 2018, pp. 45–53. ACM (2018) Li, Y., Chu, X., Mou, X., Dong, C., Chen, H.: Practical study of deterministic regular expressions from large-scale XML and schema files. In: IDEAS 2018, pp. 45–53. ACM (2018)
26.
Zurück zum Zitat Losemann, K., Martens, W.: The complexity of regular expressions and property paths in SPARQL. ACM Trans. Database Syst. 38(4), 24:1–24:39 (2013)MathSciNetCrossRefMATH Losemann, K., Martens, W.: The complexity of regular expressions and property paths in SPARQL. ACM Trans. Database Syst. 38(4), 24:1–24:39 (2013)MathSciNetCrossRefMATH
30.
Zurück zum Zitat Thompson, H.S., Beech, D., Maloney, M., Mendelsohn, N.: XML Schema part 1: structures second edition. W3C Recommendation (2004) Thompson, H.S., Beech, D., Maloney, M., Mendelsohn, N.: XML Schema part 1: structures second edition. W3C Recommendation (2004)
Metadaten
Titel
A Large-Scale Repository of Deterministic Regular Expression Patterns and Its Applications
verfasst von
Haiming Chen
Yeting Li
Chunmei Dong
Xinyu Chu
Xiaoying Mou
Weidong Min
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-16142-2_20