Maclochlainn's Weblog

Michael McLaughlin's Old Technical Blog

How to convert XML to CSV and upload into Oracle

with 6 comments

Quick update, I moved the current blog because WordPress didn’t let me format the code segments or configure plug-ins. Unfortunately, there’s no way to redirect you from WordPress. You can find the updated and copiable code here.

There are several XSLT Processors that you can download. I like the Apache XML Project – Xalan. You can read an overview or download it. You’ll need to have the Java 5 SDK loaded on your machine because this utility is written in Java and called by it (in this example).

The command line instructions are a bit out of date. After you download the software, you’ll need to source your Java %PATH% (or $PATH) and %CLASSPATH% (or $CLASSPATH) environment variables. You can set your environment with the following references.

set PATH=C:\Sun\SDK\jdk\bin;%PATH%
set CLASSPATH=C:\JavaDev\Java5\ojdbc5.jar;.

After sourcing the local SDK, you need to source the XML Processor utility with these Java Archives:

set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xalan.jar;.
set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\serializer.jar
set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xercesImpl.jar
set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xml-apis.jar
set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xsltc.jar

If you mistype the *.jar files in your %CLASSPATH% (or $CLASSPATH) variable, you’ll raise the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xalan/xslt/Process

Now that the environment is configured, you can do the following steps.

1. Create an XML file like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<character>
<name>
<role>Indiana Jones</role>
<actor>Harrison Ford</actor>
<part>protagonist</part>
<film>Indiana Jones and Raiders of the Lost Ark</film>
<film>Indiana Jones and the Temple of Doom</film>
<film>Indiana Jones and the Last Crusade</film>
<film>Indiana Jones and the Kingdom of the Crystal Skull</film>
</name>
<name>
<role>Wilhelmina Scott</role>
<actor>Kate Capshaw</actor>
<part>support</part>
<film>Indiana Jones and the Temple of Doom</film>
</name>
<name>
<role>Marion Ravenwood</role>
<actor>Karen Allen</actor>
<part>support</part>
<film>Indiana Jones and Raiders of the Lost Ark</film>
<film>Indiana Jones and the Kingdom of the Crystal Skull</film>
</name>
<name>
<role>Elsa Schneider</role>
<actor>Alison Doody</actor>
<part>support</part>
<film>Indiana Jones and the Last Crusade</film>
</name>
<name>
<role>Short Round</role>
<actor>Jonathan Ke Quan</actor>
<part>support</part>
<film>Indiana Jones and the Temple of Doom</film>
</name>
<name>
<role>Sallah</role>
<actor>Jonn Rhys-Davies</actor>
<part>support</part>
<film>Indiana Jones and Raiders of the Lost Ark</film>
<film>Indiana Jones and the Last Crusade</film>
</name>
<name>
<role>Professor Henry Jones</role>
<actor>Sean Connery</actor>
<part>support</part>
<film>Indiana Jones and the Last Crusade</film>
</name>
<name>
<role>Henry "Mutt" Williams</role>
<actor>Shia LaBeouf</actor>
<part>support</part>
<film>Indiana Jones and the Kingdom of the Crystal Skull</film>
</name>
<name>
<role>Marcus Brody</role>
<actor>Denholm Elliott</actor>
<part>support</part>
<film>Indiana Jones and Raiders of the Lost Ark</film>
<film>Indiana Jones and the Last Crusade</film>
</name>
<name>
<role>Amrish Puri</role>
<actor>Mola Ram</actor>
<part>antagonist</part>
<film>Indiana Jones and the Temple of Doom</film>
</name>
<name>
<role>Rene Belloq</role>
<actor>Belloq</actor>
<part>antagonist</part>
<film>Indiana Jones and Raiders of the Lost Ark</film>
</name>
<name>
<role>Walter Donovan</role>
<actor>Julian Glover</actor>
<part>antagonist</part>
<film>Indiana Jones and the Last Crusade</film>
</name>
<name>
<role>Colonel Vogel</role>
<actor>Michael Bryne</actor>
<part>antagonist</part>
<film>Indiana Jones and the Last Crusade</film>
</name>
<name>
<role>Irina Spalko</role>
<actor>Cate Blanchett</actor>
<part>antagonist</part>
<film>Indiana Jones and the Kingdom of the Crystal Skull</film>
</name>
</character>

2. Create an XSL file to transform the XML. This one actually filters the file by movie, checks for non-antogonists, and sorts the data. It also provides a couple tricks, like entering the decimal commas, apostrophes, and line returns. An apostrophe is a decimal $#39, a comma is a decimal $#44, and a line return is a $#10.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<!-- This loops through the branch when a sibling meets a condition. -->
<xsl:for-each select="character/name[film='Indiana Jones and the Last Crusade']">
<!-- Sorts based on the value in the "role" element. -->
<xsl:sort select="role" />
<!-- Eliminates anyone that has a "part" element value of "antagonist". -->
<xsl:if test="part != 'antagonist'">
<!-- An apostrophe before and after with a line return. -->
<xsl:text>$#39;</xsl:text>
<xsl:value-of select="role"/>
<xsl:text>$#39;&$#44</xsl:text>
<xsl:text>$#39;</xsl:text>
<xsl:value-of select="actor"/>
<xsl:text>$#39;&$#44</xsl:text>
<xsl:text>$#39;</xsl:text>
<xsl:value-of select="film"/>
<xsl:text>$#39;&$#10</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

3. Run the XSLT Processor (Xalan) to convert it to a comma separated values file:

C:\> java org.apache.xalan.xslt.Process -IN RaidersXML.xml -XSL RaidersXML.xsl -TEXT > FilteredRaidersXML.csv

4. Test the contents (use cat on Linux or Unix):

C:\> type FilteredRaidersXML.csv
'Elsa Schneider','Alison Doody','Indiana Jones and the Last Crusade'
'Indiana Jones','Harrison Ford','Indiana Jones and Raiders of the Lost Ark'
'Marcus Brody','Denholm Elliott','Indiana Jones and Raiders of the Lost Ark'
'Professor Henry Jones','Sean Connery','Indiana Jones and the Last Crusade'
'Sallah','Jonn Rhys-Davies','Indiana Jones and Raiders of the Lost Ark'

An earlier post has the details for uploading and downloading from Oracle external tables. You can check it here.

5. Create the virtual directory as a privileged user:

SQL> CREATE DIRECTORY download AS ‘C:\Download’

6. As the privileged user grant the read only privilege to the schema that will read the data:

SQL> GRANT READ ON DIRECTORY download TO importer;

7. Alternatively, if you want to write log or exception files, grant read and write like:

SQL> GRANT READ, WRITE ON DIRECTORY download TO importer;

8. Connect to the database and create a read-only external table that doesn’t log success, failures or bad rows:

CREATE TABLE movie
( role VARCHAR2(30)
, actor VARCHAR2(30)
, movie VARCHAR2(60))
ORGANIZATION EXTERNAL
( TYPE oracle_loader
DEFAULT DIRECTORY download
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
NOBADFILE
NODISCARDFILE
NOLOGFILE
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY "'"
MISSING FIELD VALUES ARE NULL )
LOCATION ('FilteredRaidersXML.csv'))
REJECT LIMIT 0;

9. Query the external table movie to verify that it works.

SELECT * FROM movie;

If you don’t like this solution, you can always default back to the Oracle XML Developer Guide for Oracle 11g, Chapter 35 for an alternative. Thanks to Kevin for reminding me that Oracle APEX lets you upload XML data through the interface: Utilities -> Data Load/Unload -> Load XML Data. Chapter 35 also contains a link to Chapter 3 which directs you to information about configuring and using Oracle XDB. Oracle XDB is the foundation component of Oracle APEX.

Advertisements

Written by maclochlainn

June 22, 2008 at 10:59 pm

6 Responses

Subscribe to comments with RSS.

  1. […] A quick example … « How to relate table, virtual directory, and external file names […]

  2. As an alternative, if you have Apex installed (10g) or enabled (11g), you have the option to load data in either XML or CSV format.

    Kevin

    June 23, 2008 at 12:33 am

  3. […] Xalan-Java of the example.  Xalan-Java is a convenient open source XSLT Processor, covered in an earlier blog. You can refer to that blog page for the download location and general configuration of Xalan-Java. […]

  4. I tried your example and got this error

    -bash-3.00$ java org.apache.xalan.xslt.Process -IN test.xml -XSL filter.xsl -TEXT
    [Fatal Error] filter.xsl:2:25: Open quote is expected for attribute “version” associated with an element type “xsl:stylesheet”.
    file:///export/home/ywu/xml/filter.xsl; Line #2; Column #25; org.xml.sax.SAXParseException: Open quote is expected for attribute “version” associated with an element type “xsl:stylesheet”.

    (Location of error unknown)XSLT Error (javax.xml.transform.TransformerException): No xml-stylesheet PI found in: test.xml
    Exception in thread “main” java.lang.RuntimeException: No xml-stylesheet PI found in: test.xml
    at org.apache.xalan.xslt.Process.doExit(Process.java:1155)
    at org.apache.xalan.xslt.Process.main(Process.java:1128)

    Yong

    September 18, 2008 at 11:12 pm

  5. Yong,

    Unfortunately, when I copy code in as plain text there are two outcomes. The first is people can cut and paste. The second is that the apostrophes aren’t purely ASCII apostrophes because WordPress converts them to specialized characters.

    You should be able to to replace the extended character set apostrophes in an editor and run the sample code. This error appears to indicate that the quotes aren’t matched.

    maclochlainn

    September 18, 2008 at 11:39 pm

  6. […] How to convert XML to CSV and upload into Oracle […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: