Maclochlainn's Weblog

Michael McLaughlin's Old Technical Blog

Hierarchical query basics

with 2 comments

You can use hierarchical queries to step through tree-like data stored in self-referencing tables, like this one for organizations and organization structures. The ORG_STRUCTURE table contains two foreign keys (ORG_PARENT_ID and ORG_CHILD_ID). They both reference the ORGANIZATION_ID column. The top-most, or root node, contains a zero as an ORG_PARENT_ID because organizations start numbering at 1. You could also define the root node ORG_PARENT_ID as a null value. Bottom nodes, or leaf nodes, are those organizations that appear as ORG_CHILD_ID but not as ORG_PARENT_ID.

Flexible Organization Hierarchy

Flexible Organization Hierarchy

You’ll find the seeding scripts at the end of this web page.

This page covers four hierarchical query techniques. It shows you how to navigate a complete tree from the top down, and order by siblings. It shows you how to navigate from a leaf, or any intermediary, node up a tree to the root node, and how to limit the depth of traversal. It also shows you how to find all leaf nodes, and use the result to navigate several trees concurrently. Lastly, it shows you how to use NOCYCLE to identify rows that cause an ORA-01436 error, which means the tree linking relationship is broken.

A quick caveat, the START WITH clause is technically optional. If you exclude it all nodes are root nodes and the default depth is only two. So, really it’s not optional for meaningful work.

Down the Tree

This query allows you to navigate down the tree from the root node. You navigate down the tree when the PRIOR keyword precedes the child or dependent node. The SQL*Plus formatting commands generate the output shown.

COL org_id FORMAT A20
COL org_name FORMAT A20

SELECT   LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,        o.organization_name org_name
FROM     organization o
,        org_structure os
WHERE    o.organization_id = os.org_child_id
START
WITH     os.org_parent_id = 0
CONNECT
BY PRIOR os.org_child_id = os.org_parent_id;

It produces the following output:

If there were an offending row in the table that didn’t have a connecting parent and child set of foreign keys, you’d raise an ORA-01436 error. It means you can’t CONNECT BY PRIOR because a row’s values are non-conforming. You can enter a row to break the hierarchy provided the ORG_PARENT_ID and ORG_CHILD_ID have the same value. Then, this modified query will identify the offending row for you:

SELECT        LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,             o.organization_name org_name
FROM          organization o
,             org_structure os
WHERE         o.organization_id = os.org_child_id
START WITH    os.org_parent_id = 0
CONNECT BY
NOCYCLE PRIOR os.org_child_id = os.org_parent_id;

The next query changes the output because it orders by siblings. This means that the numeric ordering of the parent and child nodes is overridden.

SELECT   LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,        o.organization_name org_name
FROM     organization o
,        org_structure os
WHERE    o.organization_id = os.org_child_id
START
WITH     os.org_parent_id = 0
CONNECT
BY PRIOR os.org_child_id = os.org_parent_id
ORDER
SIBLINGS
BY       o.organization_name;

The sort now represents a tree in alphabetical ordering:

Up the Tree

The query switches the position of the PRIOR keyword. It now precedes the parent node. This means that it will go up the tree. The START WITH in this case starts with an intermediary node.

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
START
WITH    os.org_child_id = 6
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id;

The output is:

Restricting the Depth of Search

This is another up the tree hierarchical query. It starts from the bottom most leaf node, which is at depth five from the root node. The query restricts upward navigation to three levels or the two immediate parents (or if you prefer parent and grandparent).

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
AND     LEVEL <= 3
START
WITH    os.org_child_id = 20
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id;

The output is:

Leaf Node Up

Traversing from a leaf node can start by inspecting which are leaf nodes first, and hard coding a value in the START WITH clause. A better approach is to use a subquery to identify leaf nodes and then a filter in the outer WHERE clause to limit the leaf nodes. The leaf nodes are plural in both of these solutions.
This solution works prior to Oracle 10g, and continues to work through Oracle 11g:

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
START
WITH    os.org_child_id IN (SELECT os1.org_child_id
FROM    org_structure os1 LEFT JOIN org_structure os2
ON      os2.org_parent_id = os1.org_child_id
MINUS
SELECT  os1.org_child_id
FROM    org_structure os1 JOIN org_structure os2
ON      os2.org_parent_id = os1.org_child_id)
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id;

This solution uses the CONNECT_BY_LEAF pseudocolumn (introduced in Oracle 10g) to replace the outer join minus the join in the subquery. I tested it in Oracle 10gR2 and Oracle 11gR1 (The terminal release is near, isn’t it?). It also uses the ORDER SIBLINGS BY clause to order the tree by the alphabetical ORGANIZATION_NAME column values.

SELECT  LPAD(' ', 2*(LEVEL - 1)) || os.org_child_id AS org_id
,       o.organization_name org_name
FROM    organization o
,       org_structure os
WHERE   o.organization_id = os.org_child_id
START
WITH    os.org_child_id IN (SELECT   org_child_id
FROM     org_structure
WHERE    CONNECT_BY_ISLEAF = 1
START
WITH     org_child_id = 1
CONNECT
BY PRIOR org_child_id = org_parent_id)
CONNECT
BY      os.org_child_id = PRIOR os.org_parent_id
ORDER
SIBLINGS
BY      o.organization_name;

The snapshot of output for the latter is:

I did leave out the CONNECT_BY_ROOT and SYS_CONNECT_BY_PATH. If you’ve got the urge, post me a comment with an example. Otherwise, I’ll try to update this page later.

The setup code for the test sample:

BEGIN
FOR i IN (SELECT null
FROM user_tables
WHERE table_name = 'ORGANIZATION') LOOP
EXECUTE IMMEDIATE 'DROP TABLE organization CASCADE CONSTRAINTS';
END LOOP;
FOR i IN (SELECT null
FROM user_tables
WHERE table_name = 'ORG_STRUCTURE') LOOP
EXECUTE IMMEDIATE 'DROP TABLE org_structure CASCADE CONSTRAINTS';
END LOOP;
FOR i IN (SELECT null
FROM user_sequences
WHERE sequence_name = 'ORGANIZATION_S1') LOOP
EXECUTE IMMEDIATE 'DROP SEQUENCE organization_s1';
END LOOP;
FOR i IN (SELECT null
FROM user_sequences
WHERE sequence_name = 'ORG_STRUCTURE_S1') LOOP
EXECUTE IMMEDIATE 'DROP SEQUENCE org_structure_s1';
END LOOP;
END;
/

CREATE TABLE organization
( organization_id NUMBER
, organization_name VARCHAR2(10));

CREATE SEQUENCE ORGANIZATION_S1;

DECLARE
TYPE list IS TABLE OF VARCHAR2(10);
ordinal LIST := list ('One','Two','Three','Four'
,'Five','Six','Seven','Eight'
,'Nine','Ten','Eleven','Twelve'
,'Thirteen','Fourteen','Fifteen','Sixteen'
,'Seventeen','Eighteen','Nineteen','Twenty');
BEGIN
FOR i IN 1..ordinal.COUNT LOOP
INSERT INTO organization VALUES (organization_s1.nextval,ordinal(i));
END LOOP;
COMMIT;
END;
/

CREATE TABLE org_structure
( org_structure_id NUMBER
, org_parent_id NUMBER
, org_child_id NUMBER );

CREATE SEQUENCE org_structure_s1;

INSERT INTO org_structure VALUES (1,0,1);
INSERT INTO org_structure VALUES (1,1,2);
INSERT INTO org_structure VALUES (1,1,3);
INSERT INTO org_structure VALUES (1,1,4);
INSERT INTO org_structure VALUES (1,2,5);
INSERT INTO org_structure VALUES (1,2,6);
INSERT INTO org_structure VALUES (1,3,7);
INSERT INTO org_structure VALUES (1,3,8);
INSERT INTO org_structure VALUES (1,4,9);
INSERT INTO org_structure VALUES (1,4,10);
INSERT INTO org_structure VALUES (1,5,11);
INSERT INTO org_structure VALUES (1,5,12);
INSERT INTO org_structure VALUES (1,6,13);
INSERT INTO org_structure VALUES (1,6,14);
INSERT INTO org_structure VALUES (1,7,15);
INSERT INTO org_structure VALUES (1,8,16);
INSERT INTO org_structure VALUES (1,8,17);
INSERT INTO org_structure VALUES (1,9,18);
INSERT INTO org_structure VALUES (1,9,19);
INSERT INTO org_structure VALUES (1,14,20);

COMMIT;

Written by maclochlainn

July 16, 2008 at 2:36 am

2 Responses

Subscribe to comments with RSS.

  1. […] a tree down or up, but they can die when you fail to connect rows correctly. I’ve put together a Hierarchical Query Basics page. Hopefully somebody finds it useful. Tagged with: CONNECT BY NOCYCLE PRIOR, CONNECT BY PRIOR, […]

  2. […] they better illustrate how you navigate trees in recursive tables. You’ll find them in the Hierarchical Queries Basics page if you’re […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: