We recently came across a requirement to consume large volumes of data from a system that published a text feed of individually formatted XML “records”. It’s an odd integration mechanism, involving a text based feed of header blocks and individual XML:
-- Header attributes
<xml>Description of what's to come</xml>
-- Record 1 Header
-- Record 1 Attributes
-- Record n Header
-- Record n Attributes
There could be in excess of 10 million records published in this way – crazy, I know, but in the Enterprise world you do come across some legacy mechanisms that were built for some purpose or other, often long forgotten.
Anyway, my goal was to consume this data en mass into Siebel.
Looking at the format of what we’re consuming, my first thought was how to get at the structured data. Header lines in text files fill me with dread – as does the use of arbitrary delimiters to separate records. XML is a wonderfully descriptive language, ideally suited to describe payloads and define relationships. A quick Perl script later and I have a list of records in a format that I’m comfortable with.
Working with XML? Got to be EAI, right? Well, not when you’re dealing with millions of records it isn’t. We all know that EIM is the mechanism of choice for bulk loading data, but how do we get from structured XML to EIM staging tables? The answer that I found lies in Microsoft’s Integration Services technology.
SQL Server Integration Services (SSIS) is an integration technology that allows you to extract, transform and load from one data format to another. It used to be called Data Transformation Services (DTS), if you’re old enough to remember. As it is, SSIS allows the specification of an XML file, backed up by an XSD, as input; we can then define either an Oracle schema as output (mapping directly to EIM, if you want to get straight to the point) or to automatically generated MS SQL Server staging tables. I went for the latter, as it gives you an ability to do some transformation and / or reconciliation before loading in to EIM.
Creating a process flow to extract from XML to a structures table schema is surprisingly straightforward. Here’s how I achieved it using MS SQL Server 2014:
- Create a new Execute SQL task in your Control Flow. Call this “Truncate Tables” and leave it empty for now
- Create a new Data Flow object in the Control Flow. Name this “Load Staging from XML” and double click for more detail
- Drag an XML source into the flow
- Double click this and select an XSD to determine the data types, size and cardinality for the input source. SSIS will ensure that data flowing through the component is validated against the XSD.
- Select an XML file for input
- Now drag an OLE DB Destination component into the flow. Left click and give it a sensible name, based on the XML node data that it will consume
- Drag an output from the XML input into the OLE DB Destination – this will conveniently inform the component of the XSD schema for the input data. Double click, click “New” for the “Name of table or the view”
- SISS will automatically create a table schema based on the XSD that is being consumed – great stuff:
- Click the “Mappings” item and SISS will automatically map the XSD schema to your new table
- Repeat the above for all node levels of your XSD:
- Now revisit your Control flow and add some SQL to truncate the tables created as part of your mapping
The beauty of this process is that SISS will maintain the referential integrity of your XML as it maps it out to the table schema. So, if you’ve got “ParentNode” with one or more “ChildNode” records, the process will generate a primary key for the parent and use this as a foreign key in the child table, via a column called “ParentNodeId”. This can then be used when mapping your staging tables out to EIM.
SISS is an excellent ETL tool, if you’re licensed to use it. Mapping large XML data sources out to a structured schema is the first step in getting data into EIM – you can even bypass the staging altogether and map XML directly.
Do you have experience with other ETL tools? Post below and share your experience!