Traversing Hierarchical Data Structures in SQL and NoSQL Databases

Introduction

Hierarchical data structures are everywhere — from file systems and organizational charts to product categories and dependency trees. Effectively querying and traversing these structures is crucial for building dynamic applications, whether you're working with SQL or NoSQL databases.

In this article, we'll dive deep into how to work with hierarchical data, comparing SQL's recursive queries with NoSQL's native tree structures. By the end, you'll have a solid grasp of how to traverse and manipulate hierarchies in both paradigms.

Hierarchical Data 101

A hierarchical data structure organizes elements into a tree-like format, where each node has one parent and potentially multiple children.

For example, a file system might look like this:

root
├── folder1
│   ├── file1
│   └── file2
└── folder2
    └── file3

In databases, this structure is typically represented using parent-child relationships:

id	name	parent_id
1	root	NULL
2	folder1	1
3	folder2	1
4	file1	2
5	file2	2
6	file3	3

The parent_id column points to a node's parent, with NULL indicating a root node.

Traversing Hierarchical Data in SQL

Relational databases like PostgreSQL and MySQL use recursive common table expressions (CTEs) to query hierarchical data.

Recursive CTE Example

WITH RECURSIVE FileSystemCTE AS (
    -- Anchor member: start from the root node
    SELECT id, name, parent_id, name AS path
    FROM FileSystem
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: join with the CTE to traverse the tree
    SELECT fs.id, fs.name, fs.parent_id, CONCAT(fsc.path, '/', fs.name) AS path
    FROM FileSystem fs
    INNER JOIN FileSystemCTE fsc ON fs.parent_id = fsc.id
)
SELECT * FROM FileSystemCTE;

Explanation:

The anchor member selects the root node (parent_id IS NULL).
The recursive member repeatedly joins the CTE to itself, appending each child's name to the path.
The final query retrieves all nodes with their full paths.

Output:

id	name	parent_id	path
1	root	NULL	root
2	folder1	1	root/folder1
3	folder2	1	root/folder2
4	file1	2	root/folder1/file1
5	file2	2	root/folder1/file2
6	file3	3	root/folder2/file3

Use Cases

Even if you don't load folder contents immediately, recursive CTEs are useful for:

Breadcrumb navigation: Generating paths for dynamic UIs.
Access control: Propagating permissions through a hierarchy.
Analytics: Counting nested items without fully expanding trees.
Path-based search: Querying paths with LIKE or regex.

Limitations

However, SQL recursion has downsides:

Flat result sets: Queries produce rows, not true nested structures.
Performance: Deep trees can cause slow joins.
Complexity: Recursion logic in SQL can be hard to debug.

Traversing Hierarchical Data in NoSQL

NoSQL databases like MongoDB handle tree structures differently by allowing nested documents and graph lookups.

Sample MongoDB Collection

[
  { "_id": 1, "name": "root", "parent_id": null },
  { "_id": 2, "name": "folder1", "parent_id": 1 },
  { "_id": 3, "name": "folder2", "parent_id": 1 },
  { "_id": 4, "name": "file1", "parent_id": 2 },
  { "_id": 5, "name": "file2", "parent_id": 2 },
  { "_id": 6, "name": "file3", "parent_id": 3 }
]

Using `$graphLookup`

db.FileSystem.aggregate([
  { $match: { parent_id: null } },
  { $graphLookup: {
      from: "FileSystem",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      as: "children"
  }}
])

Explanation:

$match: Finds the root node.
$graphLookup: Recursively joins documents by matching _id to parent_id.
The result nests children directly under each node.

Result:

[
  {
    "_id": 1,
    "name": "root",
    "children": [
      {
        "_id": 2,
        "name": "folder1",
        "children": [
          { "_id": 4, "name": "file1" },
          { "_id": 5, "name": "file2" }
        ]
      },
      {
        "_id": 3,
        "name": "folder2",
        "children": [
          { "_id": 6, "name": "file3" }
        ]
      }
    ]
  }
]

Advantages

Native nesting: Hierarchies are represented directly in JSON.
Efficient recursion: $graphLookup fetches descendants in one query.
Flexibility: Adding extra metadata is easy — just extend the document.

Limitations

Depth limits: Recursion depth can hit performance bottlenecks.
Atomic updates: Modifying deeply nested trees requires careful transaction handling.

SQL vs. NoSQL: When to Use Which?

Feature	SQL	NoSQL (MongoDB)
Structure	Flat rows	Nested JSON
Recursion	Recursive CTEs	`$graphLookup`
Query flexibility	Strong relational joins	Flexible document fields
Performance	Joins can slow down deep trees	Efficient for nested docs
Schema evolution	Rigid schema	Dynamic schema
Ideal use case	Fixed hierarchies (e.g., org charts)	Dynamic nested data (e.g., file systems)

Conclusion

Traversing hierarchical data requires different strategies depending on your database choice:

SQL excels at structured, relational data but requires recursion for tree traversal.
NoSQL like MongoDB natively supports nested documents and recursive lookups.

Choosing the right approach depends on your use case. For dynamic, nested data, NoSQL often shines. For strongly typed, relational hierarchies, SQL is reliable.

Would you like to see how this works in your stack? Let’s build something together — whether it’s a file system, org chart, or category tree!

Traversing Hierarchical Data Structures in SQL and NoSQL Databases

Introduction

Hierarchical Data 101

Traversing Hierarchical Data in SQL

Recursive CTE Example

Use Cases

Limitations

Traversing Hierarchical Data in NoSQL

Sample MongoDB Collection

Using $graphLookup

Advantages

Limitations

SQL vs. NoSQL: When to Use Which?

Conclusion

Using `$graphLookup`