AdaXml API

Let's discuss a SAX API for Ada.

A short description of SAX parsing: the user program registers its callbacks (a handler) with the SAX parser and starts parsing by providing a document URL. The parser calls the handler at the appropriate points of the parsing process. For instance, when the parser encounters a new tag, the handler receives the tag name and the tag's attribute list.
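The callback flow described above can be sketched in Ada roughly as follows. This is only an illustration: the callback types, the Handler record and the Parse procedure are hypothetical names, and Parse does not read a document at all. It replays the events a real parser would report for one small element. Attribute lists are left out to keep the sketch short.

```ada
with Ada.Text_IO;

procedure Sax_Sketch is

   --  Hypothetical callback profiles; they do not come from any real library.
   type Start_Element_Callback is access procedure (Name : String);
   type Characters_Callback    is access procedure (Text : String);
   type End_Element_Callback   is access procedure (Name : String);

   --  The handler is the set of callbacks the user registers with the parser.
   type Handler is record
      On_Start : Start_Element_Callback;
      On_Text  : Characters_Callback;
      On_End   : End_Element_Callback;
   end record;

   --  User-side callbacks.
   procedure Print_Start (Name : String) is
   begin
      Ada.Text_IO.Put_Line ("start: " & Name);
   end Print_Start;

   procedure Print_Text (Text : String) is
   begin
      Ada.Text_IO.Put_Line ("text:  " & Text);
   end Print_Text;

   procedure Print_End (Name : String) is
   begin
      Ada.Text_IO.Put_Line ("end:   " & Name);
   end Print_End;

   --  Stand-in for the parser: it reports fixed slices of a fixed buffer.
   procedure Parse (H : Handler) is
      Buffer : constant String := "<greeting>hello</greeting>";
   begin
      H.On_Start (Buffer (2 .. 9));    --  "greeting"
      H.On_Text  (Buffer (11 .. 15));  --  "hello"
      H.On_End   (Buffer (18 .. 25));  --  "greeting"
   end Parse;

   H : constant Handler :=
     (On_Start => Print_Start'Access,
      On_Text  => Print_Text'Access,
      On_End   => Print_End'Access);

begin
   Parse (H);
end Sax_Sketch;
```

The user code never drives the parsing loop itself; it only supplies the three procedures and waits to be called.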

Let's look at the general structure of an XML parser written in Ada.

XML document, presented as an Ada.Streams stream. Data type: Ada.Streams.Stream_Element_Array
Encoding support engine
Parser's buffer. Data type: Character, Wide_Character
Handler arguments. Data type: access String, access Wide_String
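As an illustration of the first two stages, here is a minimal sketch of an encoding step that turns a Stream_Element_Array into a String. It handles Latin-1 only, where every stream element maps to exactly one Character, and it assumes that Stream_Element'Size is 8, which the language does not guarantee. The function name To_Latin_1 is made up for this example.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure Decode_Sketch is

   --  Latin-1 only: one stream element becomes one Character.
   --  Raises Constraint_Error if an element does not fit into Character.
   function To_Latin_1 (Data : Stream_Element_Array) return String is
      Result : String (1 .. Data'Length);
      Pos    : Natural := 0;
   begin
      for I in Data'Range loop
         Pos := Pos + 1;
         Result (Pos) := Character'Val (Data (I));
      end loop;
      return Result;
   end To_Latin_1;

   --  The four bytes of the document "<a/>".
   Raw : constant Stream_Element_Array := (16#3C#, 16#61#, 16#2F#, 16#3E#);

begin
   --  Assumption of this sketch: stream elements are 8 bits wide.
   if Stream_Element'Size /= 8 then
      raise Program_Error;
   end if;

   Ada.Text_IO.Put_Line (To_Latin_1 (Raw));
end Decode_Sketch;
```

A real engine would also have to fill a Wide_String buffer and support multi-byte encodings; this sketch only shows where the implementation-dependent element size enters.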

We need efficient data delivery through this pipeline to make the parser fast. This requires:

  • The encoding support engine should be able to transform a Stream_Element_Array into a String or Wide_String. This part of the parser is implementation-dependent because Stream_Element'Size may vary between implementations.
  • The parser should be able to pass data to the handler efficiently. The tag name and character data can be passed like this: callUserHandler( buffer(10..20) ). This avoids copying the data, because the compiler normally chooses to pass such an argument by reference. The main question is how to pass the list of attributes. Let's examine the following choices:
    • function getAttrValue(al:AttrList;index:Positive) return String; It requires copying the result into and out of the secondary stack.
    • function getAttrValue(al:AttrList;index:Positive) return access String; It requires dynamic memory allocation because buffer(10..20)'Access is illegal. In addition, the handler and the parser need an agreement about deallocating the memory to avoid dangling references and memory leaks.
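The first of the two choices can be sketched as follows. The layout of AttrList is an assumption made for this example: it stores only the positions of the attribute values inside the parser's buffer, so the only copy happens when getAttrValue returns its result. The Bounds and Bounds_Array types, the Start_Element callback and the sample buffer are likewise hypothetical.

```ada
with Ada.Text_IO;

procedure Attr_Sketch is

   --  Stand-in for the parser's buffer.
   Buffer : constant String := "<img src=""a.png"" alt=""logo"">";

   type Bounds is record
      First, Last : Positive;
   end record;

   type Bounds_Array is array (Positive range <>) of Bounds;

   --  Hypothetical attribute list: positions inside Buffer, no text.
   type AttrList (Count : Natural) is record
      Values : Bounds_Array (1 .. Count);
   end record;

   --  First choice: return String. The slice is copied into the result.
   function getAttrValue (al : AttrList; index : Positive) return String is
   begin
      return Buffer (al.Values (index).First .. al.Values (index).Last);
   end getAttrValue;

   --  Handler callback. The slice given as Name needs no explicit copy;
   --  compilers normally pass such an argument by reference.
   procedure Start_Element (Name : String; Attrs : AttrList) is
   begin
      Ada.Text_IO.Put_Line ("tag: " & Name);
      for I in 1 .. Attrs.Count loop
         Ada.Text_IO.Put_Line ("  value: " & getAttrValue (Attrs, I));
      end loop;
   end Start_Element;

   --  Positions of "a.png" and "logo" inside Buffer.
   Attrs : constant AttrList :=
     (Count => 2, Values => ((11, 15), (23, 26)));

begin
   Start_Element (Buffer (2 .. 4), Attrs);
   --  Not legal Ada: Buffer (2 .. 4)'Access, because a slice is not
   --  an aliased object. This is what rules out the second choice
   --  without dynamic allocation.
end Attr_Sketch;
```

The sketch prints the tag name img followed by the values a.png and logo. It does not settle the question of which choice is better; it only shows where the copy occurs.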

The second question is the choice of character types. The XML specification requires that a parser accept XML documents in at least the UTF-8 and UTF-16 encodings (support for other encodings, such as Latin-1, is optional). We can use the Wide_Character type to store characters from all of these encodings, but sometimes it is more convenient to use Character. One possible way out is to pass UTF-8 text in a String argument. This conforms to the specification's requirements but makes coding more difficult, because a single UTF-8 character may occupy several Characters in the String.
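The difficulty with UTF-8 in a String can be seen in a small sketch. It assumes an Ada 2012 compiler, since it relies on the standard package Ada.Strings.UTF_Encoding.Wide_Strings, which earlier language versions do not provide. The sample text is made up for the example.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Utf8_Sketch is
   use Ada.Strings.UTF_Encoding;

   --  The word "n" + e-acute + "e" encoded as UTF-8.
   --  The accented letter takes two Characters: 16#C3# and 16#A9#.
   Encoded : constant UTF_8_String :=
     'n' & Character'Val (16#C3#) & Character'Val (16#A9#) & 'e';

   --  The same text with one Wide_Character per letter.
   Decoded : constant Wide_String := Wide_Strings.Decode (Encoded);

begin
   Ada.Text_IO.Put_Line
     ("Length as UTF-8 String:" & Integer'Image (Encoded'Length));
   Ada.Text_IO.Put_Line
     ("Length as Wide_String: " & Integer'Image (Decoded'Length));
end Utf8_Sketch;
```

The String holds four Characters while the Wide_String holds three, so a handler that receives UTF-8 in a String cannot index or count letters directly.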

If we make the right decisions on these two questions, the XML API will perform well.

I propose an example of a SAX parser in Ada as a basis for further work and discussion.

Please forgive my poor English. I hope the sense of this article is clear enough.