Converting to schema: the TEI and Relax NG

2002 
The Text Encoding Initiative Guidelines are the product of an ambitious international research project dating from the early 1990s, the goal of which was to provide generic but detailed recommendations for the markup of electronic documents, in particular texts from the literary and linguistic domains. The project was one of the first large-scale attempts to apply then-emerging markup technologies to traditional scholarly and research concerns, and has had considerable impact, both within academia and beyond. The original TEI project concluded with a detailed publication, presented at SGML Europe in 1994, and subsequently revised in 1996. In 2001, the TEI was reorganized as a membership Consortium, and began work on a new XML-based release of its work, due for publication in April 2002. This release (P4) will be the last to retain compatibility with the original SGML version of the Guidelines. Future releases of the Guidelines, it seems probable, will be expressed using some form of schema-based markup rather than XML DTDs. In this paper we report on preliminary work on the feasibility of using the existing abstractions underlying the Guidelines to generate XML schemas, conventional DTDs, or Relax NG schemas, according to need. We describe in some detail the literate programming system used by the TEI, in which a single XML file contains both the text of the Guidelines and structured information from which DTDs (or other formalizations) are created. We also outline how the modularity of the TEI scheme is implemented, and how users follow the ‘Chicago pizza model’ to generate an instance DTD. Finally, we describe how the current TEI Guidelines document, which was converted from SGML to XML in 2001, can now be transformed using Relax NG, and how DTDs and schemas can be derived from it.
We provide examples of how the resulting system can be used, and discuss how schema features such as namespaces and datatyping can be used to make the TEI both more modular and more rigorous. The TEI is probably the most detailed and best documented public markup system yet devised. This paper reports on a further stage in its evolution.