Notes on xmerl and processing XML in Erlang

Contents

1 Introduction
2 In the erl shell
3 Constructing an element/node tree
4 Walking the element tree
5 Exporting or serializing the element tree

1 Introduction

xmerl and its associated libraries enable us to parse XML documents, walk the document/element tree, extract information from the element tree, serialize the document tree to text, search a document using XPath expressions, validate an XML document against an XML Schema, and more.
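As a small taste of the parsing and XPath features listed above, here is a minimal sketch. The module name xmerl_intro and the sample document are made up for illustration; xmerl_scan:string/1 and xmerl_xpath:string/2 are the xmerl functions being exercised.

```erlang
-module(xmerl_intro).
-include_lib("xmerl/include/xmerl.hrl").
-export([names/0]).

%% Parse a small in-memory document (hypothetical sample data) and
%% return the text of every <person> element, selected with XPath.
names() ->
    Xml = "<people><person id=\"001\">Albert</person>"
          "<person id=\"002\">Becky</person></people>",
    {Doc, _Rest} = xmerl_scan:string(Xml),
    TextNodes = xmerl_xpath:string("//person/text()", Doc),
    %% Keep only the string value of each matched #xmlText node.
    [Value || #xmlText{value = Value} <- TextNodes].
```

Calling xmerl_intro:names() should give back the two names as strings.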

For more information on xmerl, see:

xmerl user guide -- http://erlang.org/doc/apps/xmerl/users_guide.html
xmerl reference manual -- http://erlang.org/doc/apps/xmerl/index.html

2 In the erl shell

Often when you are working with xmerl, you will want access to the record definitions that xmerl defines. Suppose you have an Erlang module that includes those definitions, that is, it contains this line:

-include_lib("xmerl/include/xmerl.hrl").

Then, supposing that module is named test01, you can compile it and load those record definitions into the shell with rr/1:

35> c(test01).
{ok,test01}
36> rr(test01).
[xmerl_event,xmerl_fun_states,xmerl_scanner,xmlAttribute,
 xmlComment,xmlContext,xmlDecl,xmlDocument,xmlElement,
 xmlNamespace,xmlNode,xmlNsNode,xmlObj,xmlPI,xmlText]

Call help() in the Erlang shell to learn about more record-related built-in functions. Also see: http://erlang.org/doc/man/shell.html
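If you do not have a module at hand, the shell can also read the record definitions straight from the header file, since rr/1 accepts a file name as well as a module name. The following shell session is a sketch; the prompt numbers and the sample value "Albert" are only for illustration.

```erlang
1> rr(code:lib_dir(xmerl) ++ "/include/xmerl.hrl").
2> rl(xmlText).
3> T = #xmlText{value = "Albert"}.
4> T#xmlText.value.
```

Here rl/1 prints the definition of the named record, and the last two lines show that record syntax is usable in the shell once the definitions are loaded.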

3 Constructing an element/node tree

We can construct an element tree by creating instances of #xmlElement to represent the lowest level elements, and then using those to create instances of #xmlElement to represent the next higher level. Here is an example:

create_tree() ->
    % Element and attribute names are atoms, as in trees built by
    % xmerl_scan; attribute values and text are strings.
    A1 = #xmlAttribute{name=id, value="001"},
    A2 = #xmlAttribute{name=hobby, value="birding"},
    T1 = #xmlText{value="Albert"},
    N1 = #xmlElement{name=person, content=[T1], attributes=[A1, A2]},
    A3 = #xmlAttribute{name=id, value="002"},
    A4 = #xmlAttribute{name=hobby, value="swimming"},
    T2 = #xmlText{value="Becky"},
    N2 = #xmlElement{name=person, content=[T2], attributes=[A3, A4]},
    N3 = #xmlElement{name=people, content=[N1, N2]},
    N3.

Notes:

Each element is an instance of the #xmlElement record.
The content of an element is a list containing instances of #xmlElement and #xmlText.
The attributes of an element are a list of #xmlAttribute instances, each of which has a name and a value.
The text is made up of instances of #xmlText, each of which has a value.
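To make the notes above concrete, here is a minimal sketch that builds a single element and then reads its fields back. The module name tree_fields, the function names, and the sample data are hypothetical. Element and attribute names are written as atoms here, which is what xmerl_scan produces.

```erlang
-module(tree_fields).
-include_lib("xmerl/include/xmerl.hrl").
-export([sample_person/0, child_texts/1, attribute_value/2]).

%% Build one <person> element by hand (hypothetical sample data).
sample_person() ->
    Id = #xmlAttribute{name = id, value = "001"},
    Text = #xmlText{value = "Albert"},
    #xmlElement{name = person, attributes = [Id], content = [Text]}.

%% Return the values of the text nodes directly inside an element.
child_texts(#xmlElement{content = Content}) ->
    [Value || #xmlText{value = Value} <- Content].

%% Look up an attribute by name; returns {ok, Value} or error.
attribute_value(Name, #xmlElement{attributes = Attributes}) ->
    %% #xmlAttribute.name is the tuple position of the name field.
    case lists:keyfind(Name, #xmlAttribute.name, Attributes) of
        #xmlAttribute{value = Value} -> {ok, Value};
        false -> error
    end.
```

Because records are tuples underneath, lists:keyfind/3 can search the attribute list by the position of the name field.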

4 Walking the element tree

In order to walk the tree, we need to both recurse into the content (children) of each node and iterate over each immediate child. Here is an example that reads and parses an XML document and then prints out some of the information (tags, attributes, text) from that element tree:

-module(test01).
-include_lib("xmerl/include/xmerl.hrl").

-export([
         show/1,
         show_node/2,
         create_tree/0   % defined in section 3; add it to this module
        ]).

show(Infilename) ->
    {Doc, _Misc} = xmerl_scan:file(Infilename),
    %io:format("Doc: ~p~n", [Doc]),
    show_node(0, Doc),
    ok.

%
% Show a node/element and then the children of that node.
show_node(Level, Node) ->
    case Node of
        #xmlElement{name=Name, attributes=Attributes, content=Content} ->
            show_indent(Level),
            io:format("name: ~s~n", [Name]),
            show_attributes(Level + 1, Attributes),
            show_children(Level + 1, Content);
        #xmlText{value=Value} ->
            % Skip text that starts with a newline; this is usually
            % just the whitespace between elements.
            if
                hd(Value) =/= $\n ->
                    show_indent(Level),
                    io:format("Text: ~p~n", [Value]);
                true ->
                    ok
            end;
        _ ->
            ok
    end.

%
% Show all the immediate children of a node.
show_children(_Level, []) ->
    ok;
show_children(Level, [Node | MoreNodes]) ->
    show_node(Level, Node),
    show_children(Level, MoreNodes).

%
% Show the attributes of a node.
show_attributes(_Level, []) ->
    ok;
show_attributes(Level, [Attribute | MoreAttributes]) ->
    #xmlAttribute{name=Name, value=Value} = Attribute,
    show_indent(Level),
    io:format("Attribute -- ~s: ~s~n", [Name, Value]),
    show_attributes(Level, MoreAttributes).

show_indent(Level) ->
    Seq = lists:seq(1, Level),
    F = fun (_) -> io:format("    ") end,
    lists:foreach(F, Seq).
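
The same recursion can return a value instead of printing. The following is a minimal sketch, not part of the test01 module above; the module name tree_collect and its function names are made up for illustration. It collects the name of every element in document order.

```erlang
-module(tree_collect).
-include_lib("xmerl/include/xmerl.hrl").
-export([element_names/1, names_in_string/1]).

%% Return the name of Node followed by the names of all of its
%% descendant elements, in document order.
element_names(#xmlElement{name = Name, content = Content}) ->
    [Name | lists:flatmap(fun element_names/1, Content)];
element_names(_Other) ->
    %% Text, comments, and other non-element nodes contribute nothing.
    [].

%% Convenience wrapper: parse an XML string, then collect the names.
names_in_string(Xml) ->
    {Doc, _Rest} = xmerl_scan:string(Xml),
    element_names(Doc).
```

The first clause handles both parts of the walk: lists:flatmap/2 iterates over the immediate children, and the recursive call descends into each one.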

5 Exporting or serializing the element tree

If you want to convert the element tree back into text, you can try something like the following:

2> {Tree, _} = xmerl_scan:file("test01.xml").
3> io:format("~ts~n", [lists:flatten(xmerl:export([Tree], xmerl_xml))]).
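
To save the serialized text rather than print it, something like the following sketch may help. The module name tree_export and its function names are hypothetical, and writing the result as UTF-8 is an assumption made for this example.

```erlang
-module(tree_export).
-export([to_string/1, to_file/2, roundtrip/1]).

%% Serialize an xmerl element tree to a flat string.
to_string(Tree) ->
    lists:flatten(xmerl:export([Tree], xmerl_xml)).

%% Write the serialized tree to OutFilename, encoded as UTF-8
%% (an assumption of this sketch). Returns ok or {error, Reason}.
to_file(Tree, OutFilename) ->
    Bin = unicode:characters_to_binary(to_string(Tree)),
    file:write_file(OutFilename, Bin).

%% Parse an XML string and serialize it again.
roundtrip(Xml) ->
    {Tree, _Rest} = xmerl_scan:string(Xml),
    to_string(Tree).
```

For example, after {Tree, _} = xmerl_scan:file("test01.xml"), calling tree_export:to_file(Tree, "out.xml") would write the document to a file named out.xml (a made-up file name).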
