Tall John

Friday, 3 October 2008

Top Tips for Coding With Currency

As anyone who's ever made an E-Commerce system knows, money is everything. No, really: if you don't get your financial sums right then you can't hope to build a successful online business (and you may find yourself in legal trouble with the tax man).

So here's a rundown of the top tips I can give for making your financial calculations that bit easier.

Always Work In Minor Units

I can't stress enough how much this helps in terms of accuracy, rounding and speed. Working in major units may look better to you as you don't have to reformat the numbers to display them but I hope I can make the case here for minor units.

Integer arithmetic is faster than floating point arithmetic. Remember that even a single decimal place makes a number a float as far as your computer is concerned, and all the processor overheads that go along with floats suddenly arrive. I know it's not a lot slower, but in a complex financial system it all adds up, believe me.

Floating point arithmetic can get its sums wrong. Don't believe me? Then pull up a Ruby console and try this:

a = "69.54".to_f
b = a * 100
b.ceil

Gives 6955 instead of 6954. This is because 69.54 has no exact binary floating point representation, so the multiplication comes out a tiny fraction above 6954 and 'ceil' rounds it up to the next whole number. I spent a good 4 hours chasing this bug, which manifested itself as a 1p discrepancy.

Trailing zeros can cause problems for major units. Think of trying to pass round £10.00 or £1.10 in major units. Storing it as a float you would keep losing the trailing zeros and would find yourself having to sprintf all over the place. I've seen plenty of systems in my time that store prices as decimal strings to get round these issues! There are of course various decimal formats that can be used (decimal is a data type in MySQL, and Rails maps decimal columns to Ruby's BigDecimal) but when it comes down to it, these are heavier than plain integers and, to my mind, sub-optimal for the other reasons given.
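The display side of working in minor units is cheap to solve. Here is a minimal sketch of a formatting helper, assuming pence and a '£' symbol; 'format_minor_units' is a made-up name for illustration, not something from Rails:

```ruby
# Hypothetical display helper: turns integer minor units into a major unit
# string using integer arithmetic only, so trailing zeros are never lost.
def format_minor_units(amount, symbol = "£")
  sign = amount < 0 ? "-" : ""
  major, minor = amount.abs.divmod(100)
  format("%s%s%d.%02d", sign, symbol, major, minor)
end

puts format_minor_units(1000)
puts format_minor_units(110)
```

Because the formatting only happens at the very edge of the system, everything behind it can stay as plain integers.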

Freeze the Exchange Rate

If your business works in pounds but you allow payments to be made in Euros then with every payment you need to store the current exchange rate with it. Exchange rates change by the day and if you don't know exactly what rate you get for your transactions then you can kiss goodbye to any sort of accurate profit calculations.
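As a rough sketch of what "storing the rate with the payment" can look like, here is a plain Ruby value object. The 'Payment' struct, its field names and the rate used are all assumptions for illustration; in a real system these would be columns on your payments table:

```ruby
# Hypothetical payment record that freezes the exchange rate at payment time.
# The rate is held as a Rational so no floating point is involved.
Payment = Struct.new(:amount_minor, :currency, :rate_to_base) do
  # Value in the business's own minor units (e.g. pence), rounded down.
  def base_minor
    (amount_minor * rate_to_base).floor
  end
end

# 19.99 EUR taken at an assumed rate of 0.79 GBP per EUR
payment = Payment.new(1999, "EUR", Rational(79, 100))
puts payment.base_minor
```

Whatever the market does tomorrow, this payment will always convert at the rate it was actually taken at.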

Rounding - Pick a Direction and Stick With It

Often you will need to apply discounts, add markup etc and have to perform percentage calculations. If you are working in minor units this should be the only time (in normal day to day operations) that you ever have to handle fractions of pence. You will make your life so much simpler if, for all these calculations you decide the direction to round and stick with it. Do you want to keep the extra for yourself or be a nice guy and let the customer keep it? That's what the decision comes down to.

If you don't have consistency in this you really will find yourself spending days chasing 1p discrepancies.

From a coding point of view I tend to round down as a preference because (as I demonstrated above) floating point arithmetic can get it wrong sometimes and, as it just wipes out everything after the decimal point, a 'floor' function is much more reliable than a 'ceil'.
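One way to keep the rounding direction in a single place is to do the percentage sum in integers and make the direction an explicit argument. This is only a sketch under the assumption that percentages are whole numbers; 'apply_percentage' is a hypothetical helper name:

```ruby
# Hypothetical helper: percentage of an integer minor unit amount, with the
# rounding direction stated explicitly. Uses integer arithmetic throughout.
def apply_percentage(amount_minor, percent, direction = :down)
  quotient, remainder = (amount_minor * percent).divmod(100)
  (direction == :up && remainder > 0) ? quotient + 1 : quotient
end

puts apply_percentage(999, 15)       # 15% of 9.99, rounded down
puts apply_percentage(999, 15, :up)  # the same sum, rounded up
```

Pick one direction as the default for the whole system and only ever override it deliberately.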

Use a Pre-Filter on your Data Submissions

Of course your customers are always going to want to work in major units - no-one wants to see prices in pence splashed all over your website and it's much more intuitive to type major units into form fields.

What I like to do is put a pre-filter on all input coming into my back end system (so in Rails you would run the filter on 'params' or in PHP you would run it on '$_REQUEST') which pattern matches any string monetary amount (remember, all form submission values come through as strings) in a major unit and converts it to an integer minor unit.

In Rails it's in the application controller and looks like this:


def filter_units(input)
 case input
 when Hash
  #recurse through the hash values (HashWithIndifferentAccess is a Hash too)
  input.each { |key, value| input[key] = filter_units(value) }
 when Array
  #recurse through the array elements
  input.map! { |value| filter_units(value) }
 when /\A(\d+)\.(\d\d)\z/
  #matched the string format for a major unit, so build the minor unit
  #integer from the digits directly and avoid floating point altogether
  $1.to_i * 100 + $2.to_i
 else
  #return the value unchanged (nil, integers and other strings)
  input
 end
end

This also has the added benefit of validating monetary amounts - if a monetary field doesn't hit your back end as an integer then you know it has failed validation.

Friday, 29 August 2008

Rails: 'Has_many through' association across databases

So recently I had the challenge of creating a 'has_many through' relationship across two databases.
"Why would you do this?" you may ask. Well, quite simply, I am in a team building a new data management system to sit on top of a legacy system with its legacy database. All the new code is new, shiny and streamlined and the old code is... well... crap, but we have to keep both systems running concurrently so we have various tables in the legacy database we need to access from the new system. As it happens we need to access the legacy users table in a 'has_many through' from the new 'orders' table.

Set Up Your Secondary Database Connection

You can quite happily set up a model to connect to a database other than the default by setting the connection up in your 'config/database.yml' as follows:


legacy:
 adapter: <adapter>
 database: <database>
 username: <username>
 password: <password>

And then in any model you want to use with your secondary database:


<model>.establish_connection configurations['legacy']

Create Your 'Through' Model

Now a normal 'has_many through' just plain won't work between models attached to two different databases but a normal has_many will. So we can create 'has_many through' functionality in the following way:

Set up your 'through' model on either database (it really doesn't matter which) and set it to 'belongs_to' your two main models.
Set both main models to 'has_many' of your 'through' model.

Create Your 'Through' Relationship

Use the following code in each of your main models to mimic the 'has_many through' association. In this example I'm using 'orders' and 'users' and my 'through' table is 'order_users':

In 'order'


def users
 user = []
 order_users.each do |ou|
  user << ou.user
 end
 user
end

In 'user'


def orders
 order = []
 order_users.each do |ou|
  order << ou.order
 end
 order
end

And you're done. Now the relationship will work just like any other 'has_many through'.
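To see the pattern without a database in the way, here is a small plain Ruby sketch in which Structs stand in for the three ActiveRecord models. The Struct definitions and sample data are stand-ins made up for illustration; only the shape of the 'users' and 'orders' methods mirrors the real thing:

```ruby
# Plain Ruby stand-ins for the models: each main model holds its join
# records, and each join record points at one order and one user.
OrderUser = Struct.new(:order, :user)

User = Struct.new(:id, :name, :order_users) do
  # Walk the join records to reach the orders
  def orders
    order_users.map(&:order)
  end
end

Order = Struct.new(:id, :order_users) do
  # Walk the join records to reach the users
  def users
    order_users.map(&:user)
  end
end

alice = User.new(1, "Alice", [])
order = Order.new(10, [])
link = OrderUser.new(order, alice)
order.order_users << link
alice.order_users << link

puts order.users.map(&:name).inspect
puts alice.orders.map(&:id).inspect
```

In the real models 'order_users' comes from the ordinary 'has_many', which is why the trick works across two databases: each hop is a simple association on its own connection.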

Sunday, 15 June 2008

Getting a Tablet PC Touchscreen working under Ubuntu

These are instructions I figured out for a Fujitsu-Siemens Lifebook P1510 but they should be applicable to most tablet PCs. I will note the values to change for different touchscreen resolutions.
This tutorial will also cover getting the touchscreen to map correctly when the screen is rotated for tablet mode.

My screen rotation script uses information found here with additions for correctly killing a previous instance (the touchscreen gets rather crazy when there are multiple copies of the script running for different screen rotations and they are all trying to pull the cursor in different directions).
My touchscreen script is based on one found here but with modifications for rotation.

First you need to install Perl and the required libraries:

sudo apt-get install wacom-tools
sudo apt-get install libx11-dev
sudo apt-get install libxtst-dev
sudo apt-get install x11-common
sudo apt-get install libxtest-dev

Download X11::GuiTest, unzip it and install it with:

perl Makefile.PL
make
make test
make install

Now you need two scripts:
One to kill any previous instance of the touchscreen script, rotate the screen and fire a new instance of the touchscreen script
One to control the touchscreen - a modification of 'tablet5.en.pl' from here

I named my first script 'rotate.sh' and used this code:

#!/bin/bash

#Get the requested rotation angle from the args
angle=$1

#Kill off any previous instance of the touchscreen script
prs=`ps aux | grep tablet.pl | grep -v grep`

if [ "$prs" != "" ] ; then
    read -r -a Words <<< $prs
    kill -9 ${Words[1]}
fi

#Rotate the screen and fire off the appropriate Perl touchscreen script
case $angle in
    normal | 0 )
        xrandr -o normal
        perl /drivers/tablet.pl &
        ;;
    right | 90 )
        xrandr -o right
        perl /drivers/tablet.pl 90 &
        ;;
    inverted | 180 )
        xrandr -o inverted
        perl /drivers/tablet.pl 180 &
        ;;
    left | 270 )
        xrandr -o left
        perl /drivers/tablet.pl 270 &
        ;;
esac

You can replace '/drivers/tablet.pl' with the location of your touchscreen script.

Now take 'tablet5.en.pl' from here and modify it as follows:

Add the following code after the line 'use constant DIGITIZER=>(30.78,30.06);'

my $angle = '0';

if (defined $ARGV[0]) {
    $angle = $ARGV[0];
}

Replace the body of 'sub movemouse(@)' with:

    (my $x,my $y)=@_;
    if (($x ne $prevx)||($y ne $prevy)) {

        #Remember the raw coordinates so the comparison above still works after rotation
        $prevx=$x;
        $prevy=$y;

        if ($angle eq '90' || $angle eq 'right') {
            my $tmpx = $x;
            $x = $y;
            $y = (SCREEN)[0] - $tmpx;
        }
        elsif ($angle eq '180' || $angle eq 'inverted') {
            $x = (SCREEN)[0] - $x;
            $y = (SCREEN)[1] - $y;
        }
        elsif ($angle eq '270' || $angle eq 'left') {
            my $tmpx = $x;
            $x = (SCREEN)[1] - $y;
            $y = $tmpx;
        }

        MoveMouseAbs($x,$y);
    }
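If the coordinate juggling in that sub is hard to follow, the same mapping can be written as a small standalone function. This is a Ruby sketch of the identical arithmetic, not part of the Perl script; 'rotate_point' is a made-up name and the 1024x600 screen size is only an example value:

```ruby
# The rotation mapping from movemouse as a pure function.
# width and height correspond to (SCREEN)[0] and (SCREEN)[1] in the script.
def rotate_point(x, y, angle, width, height)
  case angle.to_s
  when "90", "right"     then [y, width - x]
  when "180", "inverted" then [width - x, height - y]
  when "270", "left"     then [height - y, x]
  else [x, y] # normal orientation: leave the point alone
  end
end

puts rotate_point(100, 50, 90, 1024, 600).inspect
puts rotate_point(100, 50, "inverted", 1024, 600).inspect
```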

And you're done!

Now to get the touchscreen working at startup you need to add the line 'perl <path to touchscreen script> &' at the end of the file '/etc/gdm/Init/Default' (but before the exit command).

I would recommend putting links to the different screen rotations on your taskbar (there are some nice icons here) and also one to 'onboard', the very good onscreen keyboard that comes with Ubuntu.

Thursday, 12 June 2008

Rails: Refreshing Multiple Partials in a Page

So you have a page with a load of partials in it, all of which have forms that update their own partials by AJAX calls (remote_form_for, submit_to_remote etc.) - no worries so far. But then what if you update a field in one partial that has a knock-on effect on other partials, e.g. a price in an order booking partial could change the total price in an order summary partial? It seems like a fairly normal thing to want to do but Rails' support for this sort of stuff is rubbish!

A controller action will automatically render the template with the same name as itself unless it is given an alternative render command. The problem is that ActionController will not allow more than one render per action.

Using RJS's 'page.replace_html' call it is possible to do multiple 'innerHTML' type javascript replacements. However you must first render the partials to strings using the 'render_to_string' method as shown:


   charges_partial = render_to_string(:partial => 'transactions/charges_and_payments', :object => @transaction)
   order_partial = render_to_string(:partial => 'transactions/order_totals', :object => @transaction)
   
   render :update do |page|
     page.replace_html 'charges_and_payments', charges_partial
     page.replace_html 'order_totals', order_partial
   end

You cannot call the render_to_string method from within the render block as they lose their scope and you get 'method not found' errors.

Remember though that including this method will mean that your default partial (the one with the same name as the controller method) will not render, you will have to add it manually to the call above.

But what if you have a set of partials that will need to be updated by several different calls? You don't want to repeat the same code in every method, but you also can't do a redirect to a master function (as they aren't allowed after a render) and you can't include a separate renderer method, as Rails counts that as being multiple renders in the one call, which it doesn't allow. Bugger.

Here's the way that I got round it.

First I made a master 'refresh partials' method:


class TransactionsController < ApplicationController

 def refresh_partials
   @transaction = Transaction.find_by_id(params[:id])
   pending_partial = render_to_string(:partial => 'transactions/pending_order', :object => @transaction)
   order_partial = render_to_string(:partial => 'transactions/order_totals', :object => @transaction)
   charges_partial = render_to_string(:partial => 'transactions/charges_and_payments', :object => @transaction)
   comments_partial = render_to_string(:partial => 'transactions/comments', :object => @transaction)
   
   render :update do |page|    
     page.replace_html 'pending_order', pending_partial
     page.replace_html 'order_view_totals', order_partial
     page.replace_html 'charges_and_payments', charges_partial
     page.replace_html 'transaction_comments', comments_partial
   end
 end

end

Then (remember that I want these partials to be refreshed when a form in another is submitted) I added the following code to the partials that were being directly updated by their own forms. This will now call the refresh partials function whenever the parent partial is refreshed and the 'do_not_refresh_totals' variable is not set.


<% if not defined?(do_not_refresh_totals) or do_not_refresh_totals != true %>
 <script type="text/javascript">
   <%= remote_function(:url => { :controller => 'transactions', :action => 'refresh_partials', :id => transaction.id }) %>
 </script>
<% end %>

The point of the 'do_not_refresh_totals' variable is that when the page is loaded initially I don't want a load of refresh calls to be immediately fired off - there would be no point as no data is being submitted so no changes are being made. So in the parent view where the partials are being initially rendered from you simply need something like the following:


<%= render :partial => "flight_booking", :locals => { :flight_booking => flight_booking, :do_not_refresh_totals => true } %>

Enjoy!

Note: Actually I've found a more intuitive way of doing this using separate RJS files. For any method in any controller, if you put an RJS file with the name of that method in that controller's view folder then its code will be executed immediately after the controller method has run and the code will have access to any instance variables you created in that method. Put your 'replace_html' syntax in your RJS file and you're good to go. Easy!
This however does totally spam your view folders with RJS files...

Monday, 9 June 2008

Cloning an EEEPC Drive

There are various tutorials on cloning disks, configuring grub, recompiling disk images etc but I've yet to see one that goes through step by step how to clone an EEEPC disk.
A recent project I was given for Radio Lollipop required me to configure an EEEPC and then clone the disk to an SD card and make it bootable. This allowed us to insert the card into any EEEPC and have it boot with our software and configuration.

Now the EEEPC is shipped with two partitions:
sda1 is the root partition, which is only ever mounted read-only (ro) and contains the base install.
sda2 is the user partition and contains all the user files, additional apps and modifications.
When the machine is turned on, unionfs is used to combine these partitions into one bootable drive.
When you do a factory reset on an EEEPC it basically just blanks the user partition, returning you to a base install.

So I had two options available to me:

Clone only the user partition and make the EEEPC use the memory card as the user partition whilst booting from the existing root partition
Advantages:
Can be done on a smaller memory card
Drawbacks:
Would have to create a new bootloader.
Susceptible to incompatibility with newer releases of the EEEPC.

Clone everything from both partitions and combine them correctly onto one memory card.
Advantages:
Grub and the boot image would already exist and would just need to be configured.
Copying all the data means that it would be immune to incompatibility issues on newer releases of the EEEPC.
Drawbacks:
Would have to merge the partitions somehow
Requires a much bigger memory card

I'll begin with my progress on cloning only the user partition.

First I downloaded a copy of DamnSmallLinux and burnt it to a CD. Then by booting from the CD I could access both drive partitions quite happily. DamnSmallLinux views the root partition as 'hdc1' and the user partition as 'hdc2' (you can check this by simply running the command 'mount').

I ran this command to copy the user partition to my SD card (I got a 2GB SD card which did the job fine - a 1GB may even do):

dd if=/dev/hdc2 of=/dev/sdb2

This took a while but once it was done I had my clone.
Then I copied the boot folder from the root partition (hdc1) to the SD card as well.

I then setup the Grub bootloader on the SD card:

sudo grub
> find /boot/grub/stage1
##Pick the last hd it finds e.g. (hd3,0)
> root (hd3,0)
> setup (hd3)
> quit

Now toggle the SD card's 'bootable' flag using the following commands:

sudo fdisk /dev/sdb1
option 2
option a
partition 1

And finally edit your drive's /boot/grub/device.map to add

(hd1)   /dev/sdb
(hd2)   /dev/sdc

Now you have yourself a bootable SD card

I then followed what is said here about setting up menu.lst and the initramfs image (go from the heading 'Changing the SD card so that the EeePC boots and uses the SD card for it's disk'). However I had to make a couple of changes to the instructions:

Where it says to copy the device files for sda1 and sda2 into your initramfs dev folder instead run these commands.

sudo mknod sdb1 b 8 17
sudo mknod sdb2 b 8 18
sudo mknod sdc1 b 8 33
sudo mknod sdc2 b 8 34

This is because, firstly, the copy command just plain doesn't work and, secondly, this will also give you access to sdc, which is the USB.
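The numbers in those mknod commands aren't magic: SCSI-style disks use block major number 8, and each disk gets a run of 16 minor numbers (the whole disk first, then its partitions). A small Ruby sketch of the sum, with 'sd_minor' being a name I've made up for illustration:

```ruby
# Works out the minor device number for an sdX or sdXN device name under
# block major 8, which covers sda to sdp with 16 minors per disk.
def sd_minor(device)
  match = device.match(/\Asd([a-p])(\d+)?\z/)
  raise ArgumentError, "not an sd device: #{device}" unless match
  disk_index = match[1].ord - "a".ord
  partition = match[2].to_i # nil.to_i is 0, meaning the whole disk
  raise ArgumentError, "partition out of range: #{device}" if partition > 15
  disk_index * 16 + partition
end

%w[sdb1 sdb2 sdc1 sdc2].each do |name|
  puts "mknod #{name} b 8 #{sd_minor(name)}"
end
```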

Also where it tells you to enter the sleep command into the init file you need to enter 25 or 30 in order to give it time to load the device drivers, otherwise when you boot it will be unable to mount the SD card or USB.

After you've zipped your initramfs image back up and copied it to the SD card you should be good to go. All you then need to do is press F2 at boot and change your boot order to default to your SD card and away you go!

Tuesday, 4 March 2008

Building an NLP Robot to Live on my Forum

This is starting out really just as a discussion of the technologies involved in order to create a natural language processing agent that would be able to join in conversations (probably in a rather basic way) on my Forum. As I see it the stages required are as follows:

Read input
Use dictionary database to find word types (probably implementing fuzzy matching to deal with typos). Gcide looks like a good one
Build parse tree - verb phrases, noun phrases etc
Calculate input semantic category (e.g. question, statement)
Semantic processing
Judge output semantic category
Use semantic rules to create output 'theme'
Use syntactic parse tree templates to turn output into natural language
Write output
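To make the first few stages a bit more concrete, here is a deliberately naive Ruby sketch of "read input" and "calculate input semantic category". It is a toy under big assumptions: no dictionary lookup, no parse tree, and the question word list is just something I picked for illustration:

```ruby
# Toy versions of two pipeline stages. All names and rules are illustrative.
QUESTION_WORDS = %w[who what when where why how].freeze

# Read input: lower-case the text and split it into word tokens
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Input semantic category: a question ends in '?' or opens with a question word
def semantic_category(text)
  words = tokenize(text)
  if text.strip.end_with?("?") || QUESTION_WORDS.include?(words.first)
    :question
  else
    :statement
  end
end

puts semantic_category("Where is the forum?")
puts semantic_category("I like Ruby.")
```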

Now of course some of these stages are much more complicated than others

TBC...

Thursday, 31 January 2008

Creating PDF Documents from XML Using Apache FOP, PHP Javabridge and PHP 5

Welcome to my first professional blog post!

I have recently been tasked with the project of creating a versatile engine to create PDF format reports from XML markup and separate data arrays.

After some research I found Apache FOP which is a very good Java XSL FO (Extensible Stylesheet Language Formatting Objects) renderer. To get this to work with PHP 5 I also needed to use the PHP/Java Bridge.

In order to get from static XML Markup to a rendered PDF document I needed to follow the steps:

Parse XML to a multidimensional array with a PHP XMLReader
Parse array into valid XSL FO markup
Pass XSL FO markup through the PHP/Java Bridge to an Apache FOP renderer

However the implementation we needed had to be versatile enough to take a data array and an XML template then knit them together to allow dynamic document generation.

First, the Java FOP Implementation

The PHP/Java Bridge installs as a set of Java libraries so it's easy to implement in a singleton 'wrapper' class. The wrapper also needed to utilise the JAXP libraries to handle the XML transformation:

// Java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;

//JAXP
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.sax.SAXResult;

// FOP
import org.apache.fop.apps.FOUserAgent;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FOPException;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class FopWrapper
{
    protected String[] renderTypes = {MimeConstants.MIME_PDF, MimeConstants.MIME_POSTSCRIPT, MimeConstants.MIME_RTF, MimeConstants.MIME_PNG, MimeConstants.MIME_GIF, MimeConstants.MIME_JPEG};
    protected FopFactory fopFactory;
    protected static FopWrapper fopWrapper;

    protected FopWrapper()
    {
        fopFactory = FopFactory.newInstance();
    }

    public static FopWrapper getInstance()
    {
        if(FopWrapper.fopWrapper == null) FopWrapper.fopWrapper = new FopWrapper();
        return FopWrapper.fopWrapper;
    }

    public void render(String xmlInput, String outputFile, int renderType)
    {
        try
        {
            FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
            OutputStream output = new BufferedOutputStream(new FileOutputStream(outputFile));
            Fop fop = fopFactory.newFop(this.renderTypes[renderType], foUserAgent, output);
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer();
            Source src = new StreamSource(new StringReader(" "+xmlInput));
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
            output.close();
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
        }
    }
}

This wrapper can then be instantiated in PHP by the following code:

java_require($path_to_file);
$class = new JavaClass("FopWrapper");
$this->reporter = $class->getInstance();

And it can be made to output rendered XSL FO markup by the simple command

$this->reporter->render($xsl_markup, $outputFile, $renderType);

where:
$xsl_markup is a string of valid XSL FO markup
$outputFile is the output file (including path) to create
$renderType is an integer giving the type of file to render to (the index into the wrapper's renderTypes array)

The Java wrapper needs to be packaged into a JAR file to be utilised by the PHP/Java Bridge.
And that's it! The rest of the work is done purely in PHP.

Pages, Blocks and Formatting Objects

W3Schools has a great XSL FO tutorial, detailing the elements needed to build a page in XSL FO, the tags and their arguments.

Effectively every valid XSL FO document must have a <fo:root> node containing a <fo:layout-master-set> and one or more <fo:page-sequence> nodes (the 'pages' in what follows), each of which holds a <fo:flow> containing one or more <fo:block> nodes, which in turn may contain any number of formatting objects (plain text, tables, lists, images etc). These can be very nicely represented in PHP5 Objects.

The way I constructed the objects was to have them all implement an interface that simply defined an init() and a render() function which would always output the data as valid XSL FO.
Page objects can be added to Root objects, Block objects can be added to Page objects, FO objects can be added to Block objects and as the render function is called in one, it then calls the render function in all the objects it holds, recursing through the object tree.

I won't go too much into the implementation of these objects. My implementation (within the APLC Repository) uses init functions to set default attributes (in this case I'm using 'attribute' to mean the arguments within the XML tags) and also has setter functions for each, which are named by upper-casing the first letter of the attribute name and prefixing it with 'set'. The 'method_exists' function is useful here to check if there is a setter method defined for an attribute.
I pass the data and parameters to the formatting object or renderer's 'init' function as a single array (actually as an instance of Aplc_Registry, which is a feature-rich wrapper around a multidimensional array, but for the purposes of this blog it's unnecessary complication).
If you are looking to create large multi-page reports it's useful to render each Page to a temporary file as it's added to the Root object.
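The recursive render idea is easier to see in a few lines of code than in prose. The following is a stripped-down sketch in Ruby rather than my PHP classes; the single generic 'Node' class is an assumption for illustration, and it does no XML escaping:

```ruby
# A generic node that renders itself and then recurses into its children.
class Node
  def initialize(tag, attributes = {})
    @tag = tag
    @attributes = attributes
    @children = []
  end

  # Children can be other nodes or plain text; returns self so calls chain
  def add(child)
    @children << child
    self
  end

  def render
    attrs = @attributes.map { |name, value| %( #{name}="#{value}") }.join
    inner = @children.map { |c| c.respond_to?(:render) ? c.render : c.to_s }.join
    "<#{@tag}#{attrs}>#{inner}</#{@tag}>"
  end
end

block = Node.new("fo:block", "text-align" => "right").add("Letter Template")
flow = Node.new("fo:flow").add(block)
puts flow.render
```

Calling render on the outermost object is all the caller ever has to do; each level only knows how to wrap whatever its children return.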

So now we have a set of objects that can be created, updated, combined and rendered to produce an output document.
It is a fairly intuitive task then to create a document 'template' as a multidimensional array and parse it like so:

$renderedDocument = new Xslfo_Document();

$parsedHeader = new Xslfo_Header();
$parsedHeader->init($document['header']);
$renderedDocument->addHeader($parsedHeader);

$parsedFooter = new Xslfo_Footer();
$parsedFooter->init($document['footer']);
$renderedDocument->addFooter($parsedFooter);

foreach($document['pages'] as $page)
{
    $renderedPage = new Xslfo_Page();
    $renderedPage->init($page['attributes']);

    foreach($page['blocks'] as $block)
    {
        $renderedBlock = new Xslfo_Block();
        $renderedBlock->init($block['attributes']);

        foreach($block['formatting_objects'] as $fo)
        {
            $renderedFo = new Xslfo_Fo();
            $renderedFo->init($fo);
            $renderedBlock->addFo($renderedFo);
        }

        $renderedPage->addBlock($renderedBlock);
    }

    $renderedDocument->addPage($renderedPage);
}

$xsl = $renderedDocument->render();

This assumes that $document is the multidimensional array and 'header', 'footer' and 'formatting_objects' elements contain both formatting 'attributes' and data.

Bringing in the XML

I started by creating my own DTD to define my XML schema for these documents.
The DTD is not complicated: it simply defines what formatting attributes are allowed to be included in document, header, footer, page and block tags and also defines 'fo' (formatting object) and 'renderer' tags.
'Fo' and 'Renderer' tags contain the actual data to be put in the document or references to elements within the data array. They also name the class that will be used to process them and give class-specific parameters.
In simple terms the difference between an 'Fo' and a 'Renderer' is that an 'Fo' produces a standard element with the data it is passed from the template or the data array. A renderer will perform more advanced operations such as extracting and filtering data from the database according to the defined parameters. A renderer will create and return a formatting object.

$data = array(array('renderFile' => '/tmp/leave.pdf', 'renderType' => 0, 'sender' => 8543, 'recipient' => 7365, 'senderaddress' => 65465, 'recipientaddress' => 34566, 'body' =>'Please make an appointment to see me regarding your son\'s behaviour. Frankly I didn\'t know you could do that with a melon and a block of soft cheese.', 'image1' => 'http://www.thedaddy.org/images/banner.png'));

<document>
<header align="right" fontsize="12" fontfamily="verdana" fontweight="bold">Letter Template</header>
<page>
    <block align="left">
        <fo type="Aplc_Report_Fo_Image">
            <data>image1</data>
        </fo>
    </block>
    <block align="right">
        <renderer type="Iris_Renderer_Site_Id">
            <data>senderaddress</data>
            <parameter name="newline">true</parameter>
            <parameter name="field">name</parameter>
            <parameter name="field">postcode</parameter>
        </renderer>
        <fo type="Aplc_Report_Fo_Break"></fo>
        <fo type="Aplc_Report_Fo_Date"></fo>
    </block>
    <block align="left">
        <renderer type="Iris_Renderer_Site_Id">
            <data>recipientaddress</data>
            <parameter name="newline">true</parameter>
            <parameter name="field">name</parameter>
            <parameter name="field">postcode</parameter>
        </renderer>
        <fo type="Aplc_Report_Fo_Break"></fo>
    </block>
    <block align="left" linefeedtreatment="none">
        Dear
        <renderer type="Iris_Renderer_User_Id">
            <data>recipient</data>
            <parameter name="field">fullname</parameter>
        </renderer>
    </block>
    <block align="justify">
        <fo type="Aplc_Report_Fo_Text">
            <data>body</data>
        </fo>
    </block>
    <block>
        <fo type="Aplc_Report_Fo_Break"></fo>
        Sincerely
        <fo type="Aplc_Report_Fo_Break"></fo>
        <renderer type="Iris_Renderer_User_Id">
            <data>sender</data>
            <parameter name="field">fullname</parameter>
        </renderer>
    </block>
</page>
</document>

This XML is parsed by a simple wrapper that we built in APLC around the PHP XMLReader class, which validates it against the DTD and then turns it into a multidimensional array.
This array is iterated over, creating the node objects, passing the data and parameters and adding them to their parent nodes. For the Renderers and Formatting objects I simply took the name and checked if it was valid using class_exists. If not I ignored that whole node.

Renderers and Formatting Objects

How these are implemented really just comes down to personal style. Formatting objects are quite simple, so long as it's ensured that valid XSL FO is returned by the render function. You'll see that I created some that take data to produce tables, lists and plain text and others that will simply render a line break or print the date.

The way I made the renderers is to use singleton 'data objects' which connect to the database, pull all the specified data into a 2 dimensional array and then serialize it into a cache file. Then the renderers pull the rows and columns asked for in the template or data. This greatly lowers processor and memory requirements.

Mostly the data you pass to the parser will be from the data array but every now and then, particularly for plain text, you will want to embed it directly in the template. For this I used a simple trick - when the parser encounters a data tag it checks to see if the contents of the tag is a key in the data array. If so then the data from the key is rendered, if not then the contents of the data tag itself is used. This also helps greatly with debugging as it will print to the document any key that it couldn't find in the data array.
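The lookup trick boils down to "use the tag contents as a key, and fall back to the contents themselves". A minimal Ruby sketch of that rule, where 'resolve_data' and the sample hash are made up for illustration:

```ruby
# Returns the value stored under the tag contents if there is one,
# otherwise the tag contents themselves (literal text in the template).
def resolve_data(tag_contents, data)
  data.fetch(tag_contents, tag_contents)
end

data = { "body" => "Please make an appointment to see me." }

puts resolve_data("body", data)           # a key, so the stored text is used
puts resolve_data("Yours in haste", data) # not a key, so it is used as-is
```

A mistyped key simply shows up verbatim in the rendered document, which is where the debugging benefit comes from.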

Gotchas

When blocks are added to a page in FOP they are always added vertically. There is no native way to specify that you want to align a block horizontally with another. In order to do this I made another class to handle 'horizontal' tags, which simply creates a block containing an XSL FO table; any blocks placed within it are set in table cells within a single row.
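For anyone wanting to try the same workaround, this is roughly the markup the 'horizontal' handler needs to emit. It is a bare Ruby sketch rather than my actual class: 'horizontal' is a hypothetical function name, the input is assumed to be already-rendered block strings, and a real FOP document may also want column definitions or widths on the table:

```ruby
# Wraps already-rendered blocks in a one-row table so they sit side by side.
def horizontal(rendered_blocks)
  cells = rendered_blocks.map { |b| "<fo:table-cell>#{b}</fo:table-cell>" }.join
  "<fo:block><fo:table><fo:table-body><fo:table-row>" \
    "#{cells}" \
    "</fo:table-row></fo:table-body></fo:table></fo:block>"
end

puts horizontal(["<fo:block>Left</fo:block>", "<fo:block>Right</fo:block>"])
```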