Traversing Hierarchical Data Structures in SQL and NoSQL Databases
Introduction
Hierarchical data structures are everywhere — from file systems and organizational charts to product categories and dependency trees. Effectively querying and traversing these structures is crucial for building dynamic applications, whether you're working with SQL or NoSQL databases.
In this article, we'll dive deep into how to work with hierarchical data, comparing SQL's recursive queries with NoSQL's native tree structures. By the end, you'll have a solid grasp of how to traverse and manipulate hierarchies in both paradigms.
Hierarchical Data 101
A hierarchical data structure organizes elements into a tree-like format, where each node has one parent and potentially multiple children.
For example, a file system might look like this:
root
├── folder1
│ ├── file1
│ └── file2
└── folder2
└── file3
In databases, this structure is typically represented using parent-child relationships:
id | name | parent_id |
1 | root | NULL |
2 | folder1 | 1 |
3 | folder2 | 1 |
4 | file1 | 2 |
5 | file2 | 2 |
6 | file3 | 3 |
The parent_id
column points to a node's parent, with NULL
indicating a root node.
Traversing Hierarchical Data in SQL
Relational databases like PostgreSQL and MySQL use recursive common table expressions (CTEs) to query hierarchical data.
Recursive CTE Example
WITH RECURSIVE FileSystemCTE AS (
-- Anchor member: start from the root node
SELECT id, name, parent_id, name AS path
FROM FileSystem
WHERE parent_id IS NULL
UNION ALL
-- Recursive member: join with the CTE to traverse the tree
SELECT fs.id, fs.name, fs.parent_id, CONCAT(fsc.path, '/', fs.name) AS path
FROM FileSystem fs
INNER JOIN FileSystemCTE fsc ON fs.parent_id = fsc.id
)
SELECT * FROM FileSystemCTE;
Explanation:
The anchor member selects the root node (
parent_id IS NULL
).The recursive member repeatedly joins the CTE to itself, appending each child's name to the path.
The final query retrieves all nodes with their full paths.
Output:
id | name | parent_id | path |
1 | root | NULL | root |
2 | folder1 | 1 | root/folder1 |
3 | folder2 | 1 | root/folder2 |
4 | file1 | 2 | root/folder1/file1 |
5 | file2 | 2 | root/folder1/file2 |
6 | file3 | 3 | root/folder2/file3 |
Use Cases
Even if you don't load folder contents immediately, recursive CTEs are useful for:
Breadcrumb navigation: Generating paths for dynamic UIs.
Access control: Propagating permissions through a hierarchy.
Analytics: Counting nested items without fully expanding trees.
Path-based search: Querying paths with
LIKE
or regex.
Limitations
However, SQL recursion has downsides:
Flat result sets: Queries produce rows, not true nested structures.
Performance: Deep trees can cause slow joins.
Complexity: Recursion logic in SQL can be hard to debug.
Traversing Hierarchical Data in NoSQL
NoSQL databases like MongoDB handle tree structures differently by allowing nested documents and graph lookups.
Sample MongoDB Collection
[
{ "_id": 1, "name": "root", "parent_id": null },
{ "_id": 2, "name": "folder1", "parent_id": 1 },
{ "_id": 3, "name": "folder2", "parent_id": 1 },
{ "_id": 4, "name": "file1", "parent_id": 2 },
{ "_id": 5, "name": "file2", "parent_id": 2 },
{ "_id": 6, "name": "file3", "parent_id": 3 }
]
Using $graphLookup
db.FileSystem.aggregate([
{ $match: { parent_id: null } },
{ $graphLookup: {
from: "FileSystem",
startWith: "$_id",
connectFromField: "_id",
connectToField: "parent_id",
as: "children"
}}
])
Explanation:
$match
: Finds the root node.$graphLookup
: Recursively joins documents by matching_id
toparent_id
.The result nests children directly under each node.
Result:
[
{
"_id": 1,
"name": "root",
"children": [
{
"_id": 2,
"name": "folder1",
"children": [
{ "_id": 4, "name": "file1" },
{ "_id": 5, "name": "file2" }
]
},
{
"_id": 3,
"name": "folder2",
"children": [
{ "_id": 6, "name": "file3" }
]
}
]
}
]
Advantages
Native nesting: Hierarchies are represented directly in JSON.
Efficient recursion:
$graphLookup
fetches descendants in one query.Flexibility: Adding extra metadata is easy — just extend the document.
Limitations
Depth limits: Recursion depth can hit performance bottlenecks.
Atomic updates: Modifying deeply nested trees requires careful transaction handling.
SQL vs. NoSQL: When to Use Which?
Feature | SQL | NoSQL (MongoDB) |
Structure | Flat rows | Nested JSON |
Recursion | Recursive CTEs | $graphLookup |
Query flexibility | Strong relational joins | Flexible document fields |
Performance | Joins can slow down deep trees | Efficient for nested docs |
Schema evolution | Rigid schema | Dynamic schema |
Ideal use case | Fixed hierarchies (e.g., org charts) | Dynamic nested data (e.g., file systems) |
Conclusion
Traversing hierarchical data requires different strategies depending on your database choice:
SQL excels at structured, relational data but requires recursion for tree traversal.
NoSQL like MongoDB natively supports nested documents and recursive lookups.
Choosing the right approach depends on your use case. For dynamic, nested data, NoSQL often shines. For strongly typed, relational hierarchies, SQL is reliable.
Would you like to see how this works in your stack? Let’s build something together — whether it’s a file system, org chart, or category tree!