1   Introduction

An XML schema provides much information that can be helpful in processing the XML instance documents that it describes. This can be especially true of applications that are generic in the sense that they process XML documents of more than one type (described by different schemas) and do not know the structure of the instance document in advance.

Examples of "generic" applications and scripts that this kind of schema information would make easier to write:

  • An application that provides a query interface for a specified XML instance document given an XML schema that describes it. The application should be able to describe the allowable structured queries that it supports. These queries and the responses to them might be implemented on top of GraphQL (http://graphql.org/learn/queries/).
  • An application that supports exploration of the shape and structure of an XML instance document given an XML schema that describes it. Given a node/element of a given type, the application should be able to present a picture of all the possible children, grandchildren, and further descendants of that node.
  • A Python script that produces a report of all the xs:complexType and xs:simpleType definitions in an XML schema.
  • A Python script that generates a models.py module for the Django ORM.
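
As a rough illustration of the third item, here is a minimal sketch that lists the named top level type definitions in a schema using only the Python standard library. It does not use schemalib; the function name report_types and the inline sample schema are made up for this example.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A small schema used only for this illustration.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="artist" type="artistType"/>
    <xs:complexType name="artistType">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
    <xs:simpleType name="artistListType">
        <xs:list itemType="xs:IDREF"/>
    </xs:simpleType>
</xs:schema>
"""


def report_types(xsd_text):
    """Return (kind, name) pairs for the top level type definitions."""
    root = ET.fromstring(xsd_text)
    report = []
    # Iterating over the root visits only its direct children, so
    # anonymous types nested inside elements are not reported.
    for child in root:
        for kind in ("complexType", "simpleType"):
            if child.tag == "{%s}%s" % (XS, kind):
                report.append((kind, child.get("name")))
    return report


if __name__ == "__main__":
    for kind, name in report_types(SAMPLE_XSD):
        print("{}: {}".format(kind, name))
```

A script built on schemalib would presumably read the same names from the parsed schema object instead of walking the XML tree directly.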

2   schemalib

schemalib is a Python script/module that extracts information from an XML schema and stores that information in a structure of Python objects where it can be accessed and used by client applications.

You can find the Python source for schemalib here: schemalib.py

3   Using schemalib.py

This section provides several examples of how schemalib.py might be put to use.

3.1   Printing information from an XML schema

Here is an example of the use of schemalib.py:

#!/usr/bin/env python

from __future__ import print_function
import sys
import schemalib

def test(infilename):
    schemalib.set_global_opts(verbose=False)      # See note 1
    schema = schemalib.XsdSchemaClass()           # See note 2
    schema.parse(infilename)                      # See note 3
    schema.build_schema()
    for element in schema.elements:               # See note 4
        print('name: {}'.format(element.name))
    for complex_type in schema.complex_types:     # See note 5
        print('complex type name: {}'.format(complex_type.name))
        for member in complex_type.sequence:
            print('    member name: {}  type: {}'.format(
                member.name, member.type_name))

def main():
    args = sys.argv[1:]
    infilename = args[0]
    test(infilename)

if __name__ == '__main__':
    main()

Notes:

  1. We call set_global_opts to set a global variable inside schemalib.py that contains options that would be set if schemalib.py were run as a script.
  2. We create an instance of the XsdSchemaClass. This instance contains helper functions for parsing and building itself. It contains instance variables that give us access to representations of the contents of the XML schema.
  3. We use the instance of XsdSchemaClass to parse and build the representation of the schema.
  4. We print the names of the elements declared at the top level in the schema.
  5. We print the names of the xs:complexType definitions that are at the top level of the schema, and we print information about each member defined in those definitions of xs:complexType.

Here is a sample of the output:

name: album-collection
name: artist-collection
complex type name: album-collectionType
    member name: album  type: alternativeAlbumType
    member name: artist  type: artistType
complex type name: artist-collectionType
    member name: artist  type: artistType
complex type name: artistType
    member name: name  type: xs:string
    member name: instrument  type: xs:string
complex type name: albumType
    member name: title  type: xs:string
    member name: genre  type: xs:string
complex type name: alternativeAlbumType
    member name: title  type: xs:string
    member name: genre  type: xs:string
    member name: artist  type: artistCollectionType
complex type name: artistCollectionType
    member name: artist-ref  type: xs:IDREF
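
The same attributes can be used to explore nested structure, that is, the children and grandchildren of a given type. The sketch below is written against the attributes documented for schemalib (complex_type_map, sequence, name, type_name), but it runs on stand-in objects built with types.SimpleNamespace so that it does not need a schema file. It assumes that the keys of complex_type_map are the type names exactly as they appear in type_name, and that sequence is always iterable; the helper names walk_members and _type are hypothetical.

```python
from types import SimpleNamespace


def walk_members(type_name, type_map, indent="", seen=()):
    """Return indented lines describing the members under a type."""
    lines = []
    complex_type = type_map.get(type_name)
    if complex_type is None or type_name in seen:
        # Built-in type (e.g. xs:string), unknown type, or recursion.
        return lines
    for member in complex_type.sequence:
        lines.append("{}{}: {}".format(indent, member.name, member.type_name))
        lines.extend(walk_members(
            member.type_name, type_map, indent + "    ", seen + (type_name,)))
    return lines


def _type(name, *members):
    # Stand-in for an XsdComplexTypeClass instance (illustration only).
    seq = [SimpleNamespace(name=n, type_name=t) for n, t in members]
    return SimpleNamespace(name=name, sequence=seq)


# Stand-in for schema.complex_type_map, using part of the sample above.
TYPE_MAP = {t.name: t for t in [
    _type("alternativeAlbumType",
          ("title", "xs:string"), ("genre", "xs:string"),
          ("artist", "artistCollectionType")),
    _type("artistCollectionType", ("artist-ref", "xs:IDREF")),
]}

if __name__ == "__main__":
    print("\n".join(walk_members("alternativeAlbumType", TYPE_MAP)))
```

With a real schema, the call would presumably be walk_members(name, schema.complex_type_map) after schema.build_schema() has run.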

3.2   Generating a Django DB model from an XML schema

This example shows that it's quite easy to use schemalib.py to help write out a models.py file containing model definitions for the Django ORM (object relational mapper).

Caution: This example is not complete. It is intended as a demonstration of what can be done with the help of schemalib.py and to give some idea of how easy (or hard) a task like this might be. If you want to generate models for use in a real Django application, you will need to either (1) enhance and fill out this example or (2) use a similar capability provided by generateDS.py: http://www.davekuhlman.org/pages/daves-other-stuff.html#generateds-py.

Here is the source code for gen_django.py: gen_django.py

Here is a sample XML schema and the output (models) that are generated from it.

album_musician.xsd:

<?xml version='1.0' encoding='ASCII'?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.music.com/namespaces/album_namespace"
    targetNamespace="http://www.music.com/namespaces/album_namespace"
    elementFormDefault="qualified"
    >
    <xs:element name="album-collection" type="album-collectionType"/>
    <xs:complexType name="album-collectionType">
        <xs:sequence>
            <xs:element name="album" type="alternativeAlbumType"
                minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="artist" type="artistType"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <xs:element name="artist-collection" type="artist-collectionType"/>
    <xs:complexType name="artist-collectionType">
        <xs:sequence>
            <xs:element name="artist" type="artistType"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="artistType">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="instrument" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="artist_id" type="xs:ID"/>
    </xs:complexType>
    <xs:complexType name="albumType">
        <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="genre" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="artist" type="artistListType"/>
    </xs:complexType>
    <xs:simpleType name="artistListType">
        <xs:list itemType="xs:IDREF"/>
    </xs:simpleType>
    <xs:complexType name="alternativeAlbumType">
        <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="genre" type="xs:string"/>
            <xs:element name="artist" type="artistCollectionType"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="artistCollectionType">
        <xs:sequence>
            <xs:element name="artist-ref" type="xs:IDREF"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

models.py:

from django.db import models

class album_collectionType_model(models.Model):
    album = models.ForeignKey(
        "alternativeAlbumType_model",
        related_name="album-collectionType_album_alternativeAlbumType",
    )
    artist = models.ForeignKey(
        "artistType_model",
        related_name="album-collectionType_artist_artistType",
    )
    def __unicode__(self):
        return "id: %s" % (self.id, )

class artist_collectionType_model(models.Model):
    artist = models.ForeignKey(
        "artistType_model",
        related_name="artist-collectionType_artist_artistType",
    )
    def __unicode__(self):
        return "id: %s" % (self.id, )

class artistType_model(models.Model):
    artist_id = models.CharField(max_length=1000, )
    name = models.CharField(max_length=1000, )
    instrument = models.CharField(max_length=1000, )
    def __unicode__(self):
        return "id: %s" % (self.id, )

class albumType_model(models.Model):
    artist = models.CharField(max_length=1000, )
    title = models.CharField(max_length=1000, )
    genre = models.CharField(max_length=1000, )
    def __unicode__(self):
        return "id: %s" % (self.id, )

class alternativeAlbumType_model(models.Model):
    title = models.CharField(max_length=1000, )
    genre = models.CharField(max_length=1000, )
    artist = models.ForeignKey(
        "artistCollectionType_model",
        related_name="alternativeAlbumType_artist_artistCollectionType",
    )
    def __unicode__(self):
        return "id: %s" % (self.id, )

class artistCollectionType_model(models.Model):
    artist_ref = models.CharField(max_length=1000, )

    def __unicode__(self):
        return "id: %s" % (self.id, )

Notes on gen_django.py:

  1. In function schemalib we parse and build the XsdSchemaClass instance as in the previous example.
  2. We create an instance of a Writer class to be used to write out content. This gives us a bit of indirection and flexibility for future code modifications.
  3. We call function write_model, which writes out a model class (a subclass of Django models.Model) for each xs:complexType in the schema. For each xs:complexType, it iterates over each attribute and member in that type definition to create each member (class variable) in the model class.
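
The idea behind step 3 can be sketched as follows. This is not the code in gen_django.py; it is a simplified stand-in that takes plain (name, type) pairs instead of schemalib objects. The helper clean_name, the signature of write_model, and the rule that any type starting with "xs:" becomes a CharField are assumptions made for this illustration. It also omits the related_name arguments and __unicode__ methods shown in the sample output.

```python
import io


def clean_name(name):
    """Make an XML name usable as a Python identifier (hyphens only)."""
    return name.replace("-", "_")


def write_model(type_name, members, write):
    """Write one Django model class for a complex type.

    members is a list of (name, type_name) pairs; write is any callable
    that accepts a string, such as sys.stdout.write.
    """
    write("class {}_model(models.Model):\n".format(clean_name(type_name)))
    for name, member_type in members:
        if member_type.startswith("xs:"):
            # Assumption: built-in simple types all map to CharField.
            write("    {} = models.CharField(max_length=1000, )\n".format(
                clean_name(name)))
        else:
            # Anything else is treated as a reference to another model.
            write("    {} = models.ForeignKey(\n".format(clean_name(name)))
            write('        "{}_model",\n'.format(clean_name(member_type)))
            write("    )\n")
    write("\n")


if __name__ == "__main__":
    out = io.StringIO()
    write_model("artistCollectionType", [("artist-ref", "xs:IDREF")], out.write)
    print(out.getvalue(), end="")
```

Passing the write callable in, rather than printing directly, mirrors the indirection that the Writer class is described as providing. Note that Django 2.0 and later require an on_delete argument on ForeignKey, which neither this sketch nor the sample output above supplies.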

4   schemalib API reference

Here is a cheat sheet on the classes and functions in schemalib.

Note that much of the information extracted by schemalib from the XML schema is collected with the use of Lxml xpath. If you need additional information from the schema, you might consider looking at the use of xpath in schemalib.py.
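
To give a feel for this style of query, here is a small sketch of path expressions applied to a schema. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so that it runs without extra dependencies. The expressions and the sample schema are illustrative and are not taken from schemalib.py.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# A small schema used only for this illustration.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="artist-collection" type="artist-collectionType"/>
    <xs:complexType name="artistType">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="instrument" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="artist_id" type="xs:ID"/>
    </xs:complexType>
</xs:schema>
"""

root = ET.fromstring(SAMPLE_XSD)

# Direct children of the root only: the top level complex types.
top_level_types = [
    node.get("name") for node in root.findall("xs:complexType", NS)]

# Elements declared inside the sequence of one named complex type.
members = [
    (node.get("name"), node.get("type"))
    for node in root.findall(
        "xs:complexType[@name='artistType']/xs:sequence/xs:element", NS)]

# Attributes declared anywhere below the root.
attribute_names = [
    node.get("name") for node in root.findall(".//xs:attribute", NS)]

if __name__ == "__main__":
    print(top_level_types)
    print(members)
    print(attribute_names)
```

With Lxml, the first query would be written along the lines of root.xpath("./xs:complexType", namespaces=NS), and full XPath 1.0 expressions become available.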

4.1   class XsdSchemaClass

This class represents the schema. It is a container for the definitions in the XML schema.

Attributes (member variables):

  • self.root -- The Lxml root element of the schema.
  • self.complexType_nodes -- A list of the top level Lxml xs:complexType elements in the schema.
  • self.simpleType_nodes -- A list of the top level Lxml xs:simpleType elements in the schema.
  • self.element_nodes -- A list of the top level Lxml xs:element elements in the schema.
  • self.attributeGroup_nodes -- A list of the top level Lxml xs:attributeGroup elements in the schema.
  • self.complex_types -- A list of the top level xs:complexType definitions in the schema. These are instances of class XsdComplexTypeClass.
  • self.simple_types -- A list of the top level xs:simpleType definitions in the schema. These are instances of class XsdSimpleTypeClass.
  • self.elements -- A list of the top level element declarations in the schema. These are instances of class XsdElementClass.
  • self.attribute_groups -- A list of the top level xs:attributeGroup declarations in the schema. These are instances of class XsdAttributeGroupClass.
  • self.complex_type_map -- A mapping (dictionary) from the names of xs:complexType definitions to their definitions (instances of XsdComplexTypeClass).
  • self.simple_type_map -- A mapping (dictionary) from the names of xs:simpleType definitions to their definitions (instances of XsdSimpleTypeClass).
  • self.element_map -- A mapping (dictionary) from the names of top level xs:element declarations to their declarations (instances of XsdElementClass).
  • self.attribute_group_map -- A mapping (dictionary) from the names of xs:attributeGroup definitions to their definitions (instances of XsdAttributeGroupClass).
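
One possible use of the lists and maps together is a consistency check: finding member types that the schema refers to but never defines. The sketch below relies only on the attributes listed above plus sequence and type_name, and runs on a stand-in object rather than a real XsdSchemaClass instance. It assumes that the map keys are unprefixed type names, that built-in types carry the "xs:" prefix, and that sequence is always iterable; the function name unresolved_type_names is hypothetical.

```python
from types import SimpleNamespace


def unresolved_type_names(schema, builtin_prefix="xs:"):
    """Return sorted member type names that the schema does not define."""
    missing = set()
    for complex_type in schema.complex_types:
        for member in complex_type.sequence:
            type_name = member.type_name
            if type_name.startswith(builtin_prefix):
                continue  # Built-in XML Schema type.
            if (type_name not in schema.complex_type_map and
                    type_name not in schema.simple_type_map):
                missing.add(type_name)
    return sorted(missing)


def _member(name, type_name):
    # Stand-in for an XsdMemberClass instance (illustration only).
    return SimpleNamespace(name=name, type_name=type_name)


# Stand-in for a built XsdSchemaClass instance (illustration only).
_artist = SimpleNamespace(name="artistType", sequence=[
    _member("name", "xs:string"),
    _member("label", "labelType"),      # labelType is never defined
])
SCHEMA = SimpleNamespace(
    complex_types=[_artist],
    complex_type_map={"artistType": _artist},
    simple_type_map={},
)

if __name__ == "__main__":
    print(unresolved_type_names(SCHEMA))
```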

Methods:

  • __init__(self) -- Constructor. Initialize the member variables in a schema representation.
  • parse(self, ref) -- Parse a schema; return (doc,root).
  • build_schema(self) -- Retrieve and build top-level element, complexType, and simpleType.
  • print_schema(self, outfile) -- Display (print or write) the schema and the objects in it.

4.2   class XsdElementClass

An instance of this class represents a top level xs:element declaration in the XML schema.

Attributes (member variables):

  • self.node -- The Lxml element (node) from which this declaration was derived.
  • self.name -- The name of the element.
  • self.type_name -- The name of the type (if the type is not anonymous).
  • self.type_obj -- If the type of this element is anonymous and is nested inside the Lxml element, then the value of this attribute will be that xs:complexType node.

Methods:

  • __init__(self, node, nsmap=None) -- Constructor. Build an XsdElementClass instance given an Lxml node.
  • show(self, wrt=sys.stdout.write, indent="") -- Display (print out) an Element.

4.3   class XsdComplexTypeClass

An instance of this class represents a top level (named) xs:complexType definition in the XML schema.

Attributes (member variables):

  • self.node -- The Lxml element (node) from which this definition was derived.
  • self.name -- The name of this type.
  • self.sequence -- If this type is defined as a sequence of members, then the value of this attribute will contain those member definitions.
  • self.extension_base -- If this type extends another type definition, then the value of this attribute will be that other definition.
  • self.complex_content_sequence -- If this type is defined by an xs:complexContent, then the value of this attribute will be that complex content.
  • self.attributes -- A list of the attributes defined by this type.
  • self.attribute_groups -- A list of attribute groups referenced by this type.

Methods:

  • __init__(self, node, nsmap=None) -- Constructor. Build an XsdComplexTypeClass instance given an Lxml node.
  • show(self, wrt=sys.stdout.write, indent="") -- Display (print out) a complex type.

4.4   class XsdSimpleTypeClass

An instance of this class represents a top level (named) xs:simpleType definition in the XML schema.

Attributes (member variables):

  • self.node -- The Lxml element (node) from which this definition was derived.
  • self.name -- The name of this type.
  • self.restrictions -- If this definition is a restriction on another simple type, then the value of this attribute will be that type.
  • self.lists --
  • self.unions --

Methods:

  • __init__(self, node, nsmap=None) -- Constructor. Build an XsdSimpleTypeClass instance given an Lxml node.
  • show(self, wrt=sys.stdout.write, indent="") -- Display (print out) a simple type.

4.5   class XsdMemberClass

An instance of this class represents a member (child) of a complex type definition.

Attributes (member variables):

  • self.node -- The Lxml element (node) that defines this member.
  • self.name -- The name of this member.
  • self.type_name -- The name of the type of this member.

Methods:

  • __init__(self, node, nsmap=None) -- Constructor. Build an XsdMemberClass instance given an Lxml node.
  • show(self, wrt=sys.stdout.write, indent="") -- Display (print out) a Member/child.

4.6   class XsdAttributeGroupClass

An instance of this class represents an xs:attributeGroup.

Attributes (member variables):

  • self.node -- The Lxml element (node) that defines this attribute group.
  • self.name -- The name of this attribute group.
  • self.type_name --
  • self.attributes -- A list of the attributes in this attribute group.

Methods:

  • __init__(self, node, nsmap=None) -- Constructor. Build an XsdAttributeGroupClass instance given an Lxml node.
  • show(self, wrt=sys.stdout.write, indent="") -- Display (print out) an attribute group.

4.7   class XsdAttributeClass

An instance of this class represents an xs:attribute in an xs:complexType definition.

Attributes (member variables):

  • self.node -- The Lxml element (node) that declares this attribute.
  • self.name -- The name of the attribute.
  • self.type_name -- The name of the type of this attribute.

Methods:

  • __init__(self, node, nsmap=None) -- Constructor. Build an XsdAttributeClass instance given an Lxml node.
  • show(self, wrt=sys.stdout.write, indent="") -- Display (print out) an attribute.

- Dave Kuhlman