1   Introduction

1.1   More information about Lua

This article is about the Lua scripting language. You can find lots of information about Lua here:

1.2   More information about LuaExpat

This example uses the LuaExpat module to parse XML. You can learn more about it here: https://matthewwild.co.uk/projects/luaexpat/index.html

You can install LuaExpat with the following:

$ luarocks install luaexpat

2   The example

2.1   The example code

The example code is here: xmltest05.lua

2.2   What to look for and learn

There are several things to look for and learn from this example:

  • How to use LuaExpat.
  • How to create a tree of objects that represents the tree of XML elements.
  • How to walk the tree of objects that we've created and process each node. This example performs several operations on each node; for example, it prints each node's tag, text (character content), and attributes.

2.3   Notes and explanation about the sample code

In function main we create an instance of our own parser class: XmlParserClass. That class implements methods to handle the LuaExpat events:

  • StartElement -- method start_element
  • EndElement -- method end_element
  • CharacterData -- method characters

The LuaExpat callback table -- Next, we create a "callback" table that maps these LuaExpat events to short wrapper functions, each of which calls the corresponding method on our XmlParserClass instance. Wrapping the method calls this way lets a single XmlParserClass instance hold state across the successive calls that LuaExpat makes to our callback functions.
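A minimal sketch of the callback-table pattern, assuming a simplified XmlParserClass that only records the events it receives. The method names follow the article; the helper make_callbacks is hypothetical. Each wrapper closes over one handler instance and discards the parser object that LuaExpat passes as the first argument. If LuaExpat is not installed, the sketch calls the wrappers by hand in the order LuaExpat would, so it still runs.

```lua
-- Simplified stand-in for XmlParserClass: it only records events.
local XmlParserClass = {}
XmlParserClass.__index = XmlParserClass

function XmlParserClass.new()
  return setmetatable({ events = {} }, XmlParserClass)
end

function XmlParserClass:start_element(name, attrs)
  table.insert(self.events, "start " .. name)
end

function XmlParserClass:end_element(name)
  table.insert(self.events, "end " .. name)
end

function XmlParserClass:characters(text)
  table.insert(self.events, "text " .. text)
end

-- Hypothetical helper: build a LuaExpat callback table whose wrappers
-- close over one handler instance. LuaExpat passes the parser object
-- as the first argument; the wrappers ignore it.
local function make_callbacks(handler)
  return {
    StartElement = function(_, name, attrs) handler:start_element(name, attrs) end,
    EndElement = function(_, name) handler:end_element(name) end,
    CharacterData = function(_, text) handler:characters(text) end,
  }
end

local handler = XmlParserClass.new()
local callbacks = make_callbacks(handler)

local ok, lxp = pcall(require, "lxp")
if ok then
  local parser = lxp.new(callbacks)
  assert(parser:parse("<root><item>hi</item></root>"))
  assert(parser:parse())  -- no argument: signal the end of the document
  parser:close()
else
  -- LuaExpat not installed: invoke the wrappers by hand in the same order.
  callbacks.StartElement(nil, "root", {})
  callbacks.StartElement(nil, "item", {})
  callbacks.CharacterData(nil, "hi")
  callbacks.EndElement(nil, "item")
  callbacks.EndElement(nil, "root")
end

for _, event in ipairs(handler.events) do print(event) end
```

Because the state lives in handler rather than in global variables, you could parse several documents independently, each with its own XmlParserClass instance and callback table.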

The XmlParserClass methods -- This is where we construct the element tree. Specifically, in our example, these methods do the following:

  • Method start_element -- Create an instance of class XmlElementClass containing the tag (element name) and the attributes. Push this instance on a stack of elements.
  • Method end_element -- Pop the top element off the stack of elements. Add it as a child of the parent element (that is, the new top of the stack).
  • Method characters -- Add any text to the element on top of the stack.
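The tree-building steps can be sketched as follows. This is a minimal sketch, not the article's xmltest05.lua: the field names (tag, attrs, text, children), the add_child method, and the sentinel "#document" node at the bottom of the stack are assumptions. To keep it self-contained, the methods are called directly with the arguments LuaExpat would pass for a small document instead of being driven by a real parser.

```lua
-- Assumed element class: tag, attributes, accumulated text, child elements.
local XmlElementClass = {}
XmlElementClass.__index = XmlElementClass

function XmlElementClass.new(tag, attrs)
  return setmetatable({ tag = tag, attrs = attrs or {}, text = "", children = {} },
                      XmlElementClass)
end

function XmlElementClass:add_child(child)
  table.insert(self.children, child)
end

local XmlParserClass = {}
XmlParserClass.__index = XmlParserClass

function XmlParserClass.new()
  -- Assumption: a sentinel "#document" node sits at the bottom of the stack
  -- so that the outermost element also has a parent to be added to.
  local document = XmlElementClass.new("#document")
  return setmetatable({ document = document, stack = { document } }, XmlParserClass)
end

function XmlParserClass:start_element(tag, attrs)
  -- Create the element and push it on the stack.
  table.insert(self.stack, XmlElementClass.new(tag, attrs))
end

function XmlParserClass:end_element(tag)
  -- Pop the finished element and attach it to the new top of the stack.
  local element = table.remove(self.stack)
  assert(element.tag == tag, "mismatched end tag: " .. tag)
  self.stack[#self.stack]:add_child(element)
end

function XmlParserClass:characters(text)
  -- Append text to the element currently on top of the stack.
  local top = self.stack[#self.stack]
  top.text = top.text .. text
end

-- Replay the calls LuaExpat would make for:
--   <people><person id="1">Alice</person></people>
local handler = XmlParserClass.new()
handler:start_element("people", {})
handler:start_element("person", { id = "1" })
handler:characters("Alice")
handler:end_element("person")
handler:end_element("people")

local root = handler.document.children[1]
print(root.tag, #root.children)
print(root.children[1].attrs.id, root.children[1].text)
```

When parsing finishes, only the sentinel remains on the stack, and the real root element is its single child.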

Warning -- Our implementation of the characters callback will not work correctly for XML instance documents that contain elements with mixed content, that is, elements that contain both child elements and (child) character content. Because the characters method simply adds text to the element on top of the stack, the position of each piece of text relative to the child elements is lost. Here is an example of an element containing mixed content. Notice that the p element contains both text ("A very " and " sentence.") and a child element (b):

<p>A very <b>simple</b> sentence.</p>
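To see the problem concretely, the sketch below replays the events for the p element above through two hypothetical tree builders. The first appends all text to a single text field, which is how the characters method is assumed to behave here; the two text runs end up merged and the position of b is lost. The second, one possible fix and not part of the original example, stores each text run as a string entry in children, so text and elements stay in document order.

```lua
-- Events LuaExpat would deliver for: <p>A very <b>simple</b> sentence.</p>
local events = {
  { "start", "p" }, { "text", "A very " }, { "start", "b" }, { "text", "simple" },
  { "end", "b" }, { "text", " sentence." }, { "end", "p" },
}

-- Hypothetical builder. keep_order = false: append text to a single `text`
-- field. keep_order = true: store text runs as strings in `children`,
-- interleaved with child elements.
local function build(keep_order)
  local doc = { tag = "#document", text = "", children = {} }
  local stack = { doc }
  for _, ev in ipairs(events) do
    local kind, value = ev[1], ev[2]
    local top = stack[#stack]
    if kind == "start" then
      table.insert(stack, { tag = value, text = "", children = {} })
    elseif kind == "end" then
      local element = table.remove(stack)
      table.insert(stack[#stack].children, element)
    elseif keep_order then
      table.insert(top.children, value)
    else
      top.text = top.text .. value
    end
  end
  return doc.children[1]
end

local flat = build(false)
print(flat.text)  -- both text runs merged; where <b> was is no longer known

local ordered = build(true)
for _, child in ipairs(ordered.children) do
  print(type(child) == "string" and string.format("%q", child) or ("<" .. child.tag .. ">"))
end
```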

Processing the element tree -- In this example, we actually "process" the element tree three times. Our processing does the following:

  1. Walk the tree and display information on each node.
  2. Walk the element tree and convert the text (character content of each node) to upper case.
  3. Walk the tree again and display information on each node.
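The three passes can be sketched with one generic walker that visits every node depth-first and calls a function on it. The tree is built by hand here; the names walk, show, and upcase are hypothetical, and the node fields (tag, attrs, text, children) are assumed to match the element class. show skips non-string attribute keys because LuaExpat's attribute tables also list the attribute names under integer keys.

```lua
-- Hand-built tree with the assumed node shape.
local function node(tag, attrs, text, children)
  return { tag = tag, attrs = attrs or {}, text = text or "", children = children or {} }
end

local root = node("people", {}, "", {
  node("person", { id = "1" }, "Alice"),
  node("person", { id = "2" }, "Bob"),
})

-- Depth-first, pre-order walk: call visit(n, depth) on every node.
local function walk(n, visit, depth)
  depth = depth or 0
  visit(n, depth)
  for _, child in ipairs(n.children) do
    walk(child, visit, depth + 1)
  end
end

-- Display tag, text, and attributes (sorted, so output is deterministic).
local function show(n, depth)
  local names = {}
  for k in pairs(n.attrs) do
    if type(k) == "string" then table.insert(names, k) end
  end
  table.sort(names)
  local parts = {}
  for _, k in ipairs(names) do table.insert(parts, k .. "=" .. n.attrs[k]) end
  print(string.rep("  ", depth) .. n.tag .. " text=" .. string.format("%q", n.text)
        .. " attrs={" .. table.concat(parts, ", ") .. "}")
end

-- Convert a node's character content to upper case.
local function upcase(n)
  n.text = string.upper(n.text)
end

walk(root, show)    -- 1. display
walk(root, upcase)  -- 2. convert text to upper case
walk(root, show)    -- 3. display again
```

Keeping the traversal separate from the per-node action means each of the three passes is a single call to walk.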

3   Other topics

3.1   Namespaces and Namespace declarations

If you need to handle namespaces with LuaExpat, pass the optional separator argument to lxp.new when you create your parser. The parser then reports the name of each namespaced element to the StartElement and EndElement callbacks as the namespace URI, followed by the separator character, followed by the local name. For example, you might try either of these:

local xmlparser = lxp.new(callbacks, " ")
local xmlparser = lxp.new(callbacks, "|")
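Since the namespace URI and the local name arrive joined into one string, a callback usually needs to split them again. The sketch below assumes "|" as the separator; split_name is a hypothetical helper that splits at the last separator, since an XML local name cannot contain "|" or a space. If LuaExpat is installed, the sketch parses a small namespaced document; otherwise it falls back to the names LuaExpat would report for it.

```lua
-- Hypothetical helper: split "uri<sep>local" at the LAST separator.
-- Names with no separator (no namespace) return nil plus the name.
local function split_name(name, sep)
  local last, start = nil, 1
  while true do
    local i = name:find(sep, start, true)  -- plain find, no pattern magic
    if not i then break end
    last, start = i, i + 1
  end
  if not last then return nil, name end
  return name:sub(1, last - 1), name:sub(last + #sep)
end

local seen = {}
local callbacks = {
  StartElement = function(_, name) table.insert(seen, name) end,
}
local xml = '<r xmlns="urn:example:a" xmlns:b="urn:example:b"><b:item/></r>'

local ok, lxp = pcall(require, "lxp")
if ok then
  local parser = lxp.new(callbacks, "|")
  assert(parser:parse(xml))
  assert(parser:parse())
  parser:close()
else
  -- Without LuaExpat, use the names it would report for this document.
  seen = { "urn:example:a|r", "urn:example:b|item" }
end

for _, name in ipairs(seen) do
  local uri, localname = split_name(name, "|")
  print(uri, localname)
end
```

With " " as the separator the same helper works unchanged; a space is a convenient choice because it cannot appear in a well-formed URI.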