
2016 | Original Paper | Book Chapter

Practical Study of Subclasses of Regular Expressions in DTD and XML Schema

Authors: Yeting Li, Xiaolan Zhang, Feifei Peng, Haiming Chen

Published in: Web Technologies and Applications

Publisher: Springer International Publishing

Abstract

DTD and XML Schema (XSD) are two popular schema languages widely used for XML documents. Most content models in DTDs and XSDs essentially belong to restricted subclasses of regular expressions. However, the existing subclasses of content models are all defined on standard regular expressions and do not consider counting or interleaving. Based on an investigation of real-world data, this paper introduces a new subclass of regular expressions with counting and interleaving. We then present a practical study of this new subclass and of five previously known subclasses of content models. A distinguishing feature of this paper is that its data set is substantially larger than those used in previous related work, which makes our results more accurate. In addition, using this large data set, we analyze the different features of regular expressions used in practice. We are also the first to inspect the usage of all five subclasses simultaneously and to analyze the different reasons why content models fail to satisfy the corresponding definitions. Furthermore, since the W3C standard requires content models to be deterministic, our validation tools also test the determinism of the content models.
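The determinism requirement mentioned above can be made concrete for standard regular expressions (no counting, no interleaving) with the classical Glushkov-automaton characterization: an expression is deterministic exactly when no two distinct symbol positions carrying the same element name appear together in the first set or in any follow set. The Python sketch below illustrates that check under these assumptions. The AST classes (`Sym`, `Cat`, `Alt`, `Star`, `Plus`, `Opt`) and the functions `glushkov` and `is_deterministic` are hypothetical names chosen for illustration; this is not the validation tool used in the paper, and expressions with counting or interleaving would need extended checks not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


# Hypothetical AST for standard regular expressions over element names
# (assumption: no counting and no interleaving operators).
@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Cat:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:
    body: "Regex"

@dataclass(frozen=True)
class Plus:
    body: "Regex"

@dataclass(frozen=True)
class Opt:
    body: "Regex"

Regex = Union[Sym, Cat, Alt, Star, Plus, Opt]


def glushkov(expr: Regex) -> Tuple[List[str], Set[int], Dict[int, Set[int]]]:
    """Linearize expr into positions and compute its first and follow sets."""
    symbols: List[str] = []            # symbols[p] = element name at position p
    follow: Dict[int, Set[int]] = {}   # follow[p] = positions that may follow p

    def walk(e: Regex) -> Tuple[bool, Set[int], Set[int]]:
        """Return (nullable, first, last); fills `follow` as a side effect."""
        if isinstance(e, Sym):
            p = len(symbols)
            symbols.append(e.name)
            follow[p] = set()
            return False, {p}, {p}
        if isinstance(e, Cat):
            n1, f1, l1 = walk(e.left)
            n2, f2, l2 = walk(e.right)
            for p in l1:               # last of left is followed by first of right
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if isinstance(e, Alt):
            n1, f1, l1 = walk(e.left)
            n2, f2, l2 = walk(e.right)
            return n1 or n2, f1 | f2, l1 | l2
        if isinstance(e, (Star, Plus)):
            n, f, l = walk(e.body)
            for p in l:                # iteration: last loops back to first
                follow[p] |= f
            return isinstance(e, Star) or n, f, l
        if isinstance(e, Opt):
            _, f, l = walk(e.body)
            return True, f, l
        raise TypeError(f"unknown node: {e!r}")

    _, first, _ = walk(expr)
    return symbols, first, follow


def has_clash(positions: Set[int], symbols: List[str]) -> bool:
    """True if two distinct positions in the set carry the same name."""
    seen: Set[str] = set()
    for p in positions:
        if symbols[p] in seen:
            return True
        seen.add(symbols[p])
    return False


def is_deterministic(expr: Regex) -> bool:
    symbols, first, follow = glushkov(expr)
    if has_clash(first, symbols):
        return False
    return not any(has_clash(fs, symbols) for fs in follow.values())
```

For example, the DTD-style content model `(title, author+, year?)` is accepted, whereas `((a, b) | (a, c))` is rejected: after reading `a`, a validator cannot tell without lookahead which of the two `a` positions it is in.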

Metadata
Title
Practical Study of Subclasses of Regular Expressions in DTD and XML Schema
Authors
Yeting Li
Xiaolan Zhang
Feifei Peng
Haiming Chen
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-45817-5_29