Practical Study of Subclasses of Regular Expressions in DTD and XML Schema

2016 
DTD and XSD are two popular schema languages widely used for XML documents. Most content models used in DTD and XSD are essentially restricted subclasses of regular expressions. However, existing subclasses of content models are all defined on standard regular expressions, without considering counting and interleaving. Based on an investigation of real-world data, this paper introduces a new subclass of regular expressions with counting and interleaving. We then give a practical study of this new subclass and of five already known subclasses of content models. One distinguishing feature of this paper is that the data set is considerably larger than those of previous relevant work, so our results are more accurate. In addition, based on this large data set, we analyze the different features of regular expressions used in practice. We are also the first to inspect the usage of the five subclasses simultaneously and to analyze the different reasons why content models fail to satisfy the corresponding definitions. Furthermore, since the W3C standards require content models to be deterministic, the determinism of content models is also tested by our validation tools.
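To make the determinism requirement concrete, the following is a minimal sketch of a determinism (one-unambiguity) check for standard regular expressions, based on the usual position-automaton (Glushkov) construction. It is not the validation tool used in the paper, and it does not cover counting or interleaving. The tuple-based expression encoding and the names `glushkov` and `is_deterministic` are assumptions made for illustration only.

```python
from itertools import count

# Hypothetical encoding: ("sym", "a"), ("cat", e1, e2), ("alt", e1, e2),
# ("star", e), ("opt", e). Counting and interleaving are not supported.

def glushkov(node, sym, follow, ids):
    """Return (nullable, first, last) for node; fill in sym and follow."""
    kind = node[0]
    if kind == "sym":
        p = next(ids)              # a fresh position for this occurrence
        sym[p] = node[1]
        follow[p] = set()
        return False, {p}, {p}
    if kind == "cat":
        n1, f1, l1 = glushkov(node[1], sym, follow, ids)
        n2, f2, l2 = glushkov(node[2], sym, follow, ids)
        for p in l1:               # the end of e1 can be followed by the start of e2
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "alt":
        n1, f1, l1 = glushkov(node[1], sym, follow, ids)
        n2, f2, l2 = glushkov(node[2], sym, follow, ids)
        return n1 or n2, f1 | f2, l1 | l2
    if kind in ("star", "opt"):
        n, f, l = glushkov(node[1], sym, follow, ids)
        if kind == "star":
            for p in l:            # loop back from the end to the start
                follow[p] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")

def is_deterministic(expr):
    """True if no two competing positions carry the same symbol."""
    sym, follow = {}, {}
    _, first, _ = glushkov(expr, sym, follow, count())
    for group in [first, *follow.values()]:
        labels = [sym[p] for p in group]
        if len(labels) != len(set(labels)):
            return False
    return True

a, b = ("sym", "a"), ("sym", "b")
# (a|b)*a : after reading "a" it is unclear which occurrence of a was matched.
ambiguous = ("cat", ("star", ("alt", a, b)), a)
# b*a(b*a)* : an expression for the same language that passes the check.
unambiguous = ("cat", ("cat", ("star", b), a), ("star", ("cat", ("star", b), a)))
```

Under these assumptions, `is_deterministic(ambiguous)` returns `False` and `is_deterministic(unambiguous)` returns `True`, although both expressions describe the same language. This is the kind of distinction a determinism test on content models has to draw.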