Traversing Hierarchical Data Structures in SQL and NoSQL Databases
Introduction
Hierarchical data structures are everywhere — from file systems and organizational charts to product categories and dependency trees. Effectively querying and traversing these structures is crucial for building dynamic applications, whether you're working with SQL or NoSQL databases.
Actually, this all started when I was building a simple file manager in Node.js for fun. You know, one of those side projects that pulls you in way deeper than you expected? The goal was to recursively traverse a file system and display it in a neat little UI. Sounds simple, right? Well, let's just say I ended up learning a lot more about hierarchical data than I initially signed up for.
What began as a straightforward weekend project quickly became a fascinating journey through the world of tree traversal algorithms, query optimization, and database design patterns. I found myself sketching tree diagrams on napkins and dreaming in nested JSON structures. The rabbit hole was deep, but the knowledge gained was worth it.
In this article, I'll dive deep into how to work with hierarchical data, comparing SQL's recursive queries with NoSQL's native tree structures. By the end, you'll (hopefully) have a solid grasp of how to traverse and manipulate hierarchies in both paradigms. And if you spot any typos, that's probably because I stayed up way too late hacking on this.
Hierarchical Data 101
A hierarchical data structure organizes elements into a tree-like format, where every node except the root has exactly one parent and can have any number of children. This pattern appears naturally in many domains:
File systems with folders and files
Organization charts with managers and reports
Comment threads with parent and child comments
Product categories and subcategories
Navigation menus with nested items
For example, a file system might look like this (yes, this is what I ended up staring at for hours while debugging):
root
├── folder1
│   ├── file1
│   └── file2
└── folder2
    └── file3
In databases, this structure is typically represented using parent-child relationships:
id | name | parent_id | path |
1 | root | NULL | root |
2 | folder1 | 1 | root/folder1 |
3 | folder2 | 1 | root/folder2 |
4 | file1 | 2 | root/folder1/file1 |
5 | file2 | 2 | root/folder1/file2 |
6 | file3 | 3 | root/folder2/file3 |
The parent_id column points to a node's parent, with NULL indicating a root node. This approach, known as the "adjacency list model," is straightforward but requires recursive queries for traversal.
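To make the adjacency list model concrete, here's a minimal JavaScript sketch that mirrors the table above as an in-memory array. The `rows` array and the `childrenOf` / `descendantsOf` helpers are names I made up for illustration, not part of any library.

```javascript
// In-memory copy of the table above (adjacency list: each row knows only its parent).
const rows = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

// Direct children: a single filter on parent_id, no recursion needed.
function childrenOf(rows, id) {
  return rows.filter((row) => row.parent_id === id);
}

// All descendants: recursion is required, because each row only points one level up.
function descendantsOf(rows, id) {
  return childrenOf(rows, id).flatMap((child) => [
    child,
    ...descendantsOf(rows, child.id),
  ]);
}

console.log(descendantsOf(rows, 1).map((row) => row.name));
```

Finding children is one cheap lookup, while finding a whole subtree needs one lookup per level. That asymmetry is exactly why the adjacency list model leans on recursive queries.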
Alternative Representation: Nested Set Model
While we're on the topic, it's worth mentioning another common approach to representing hierarchies in SQL: the nested set model. Instead of parent-child relationships, this model assigns left and right values to each node:
id | name | left | right |
1 | root | 1 | 12 |
2 | folder1 | 2 | 7 |
3 | folder2 | 8 | 11 |
4 | file1 | 3 | 4 |
5 | file2 | 5 | 6 |
6 | file3 | 9 | 10 |
The beauty of this model is that you can find all descendants of a node with a simple query:
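-- LEFT and RIGHT are reserved words in most SQL dialects, so the column names
-- must be quoted (double quotes in standard SQL and PostgreSQL, backticks in MySQL)
-- or renamed, commonly to lft and rgt.
SELECT * FROM FileSystem
WHERE "left" > (SELECT "left" FROM FileSystem WHERE id = 2)
AND "right" < (SELECT "right" FROM FileSystem WHERE id = 2);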
This would return all descendants of folder1. However, this model makes insertions and deletions more complex, as you need to update multiple nodes' left and right values.
I experimented with this approach but found it overkill for simple projects. It shines in read-heavy applications where hierarchies change infrequently, such as content management systems with complex category structures or product category trees in e-commerce platforms.
Traversing Hierarchical Data in SQL
Relational databases like PostgreSQL and MySQL (8.0 and later) support recursive common table expressions (CTEs) for querying hierarchical data. This feature lets you express the whole traversal in a single declarative query instead of procedural code that walks the tree one query at a time, which can become a performance bottleneck.
Recursive CTE Example
WITH RECURSIVE FileSystemCTE AS (
    -- Anchor member: start from the root node.
    -- If name is a fixed-width VARCHAR, cast it to a wider text type here;
    -- PostgreSQL and MySQL both take the path column's type from this SELECT.
    SELECT id, name, parent_id, name AS path, 0 AS depth
    FROM FileSystem
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: join with the CTE to traverse the tree
    SELECT fs.id, fs.name, fs.parent_id,
           CONCAT(fsc.path, '/', fs.name) AS path,
           fsc.depth + 1 AS depth
    FROM FileSystem fs
    INNER JOIN FileSystemCTE fsc ON fs.parent_id = fsc.id
)
SELECT * FROM FileSystemCTE;
Explanation:
The anchor member selects the root node (parent_id IS NULL) and establishes the initial path and depth.
The recursive member repeatedly joins the FileSystem table to the rows produced by the previous iteration, appending each child's name to the path and incrementing the depth counter. It stops when an iteration produces no new rows.
The final query retrieves all nodes with their full paths and depth levels.
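The three steps above can be imitated in a few lines of JavaScript, which is how I finally got the evaluation order straight in my head. This is only a mental model under my own assumptions (the `fileSystem` array and the `walkLikeRecursiveCte` name are hypothetical), not how a database engine actually executes the query.

```javascript
const fileSystem = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

function walkLikeRecursiveCte(table) {
  // Anchor member: rows with no parent, with the initial path and depth.
  let working = table
    .filter((row) => row.parent_id === null)
    .map((row) => ({ ...row, path: row.name, depth: 0 }));
  const result = [];

  // Recursive member: join the table against the previous iteration's rows
  // until an iteration produces nothing new.
  while (working.length > 0) {
    result.push(...working);
    working = working.flatMap((parent) =>
      table
        .filter((row) => row.parent_id === parent.id)
        .map((row) => ({
          ...row,
          path: `${parent.path}/${row.name}`,
          depth: parent.depth + 1,
        }))
    );
  }
  return result;
}

console.table(walkLikeRecursiveCte(fileSystem));
```

Each pass of the `while` loop corresponds to one round of the recursive member: first the root, then its children, then the grandchildren.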
Output:
id | name | parent_id | path | depth |
1 | root | NULL | root | 0 |
2 | folder1 | 1 | root/folder1 | 1 |
3 | folder2 | 1 | root/folder2 | 1 |
4 | file1 | 2 | root/folder1/file1 | 2 |
5 | file2 | 2 | root/folder1/file2 | 2 |
6 | file3 | 3 | root/folder2/file3 | 2 |
I've added a depth column to track how deep each node is in the hierarchy. This becomes invaluable when you want to limit recursion depth or create indented visual representations.
I admit this was a bit of a head-scratcher that took me a while to grasp. What helped me was focusing on the recursive JOIN. I learned to visualize how each iteration connects parent nodes to their children, building the complete tree path step by step. Think of it like following a family tree, where each recursive step reveals the next branch. Practice with small examples, tracing how the query walks through the hierarchy. The key is patience and breaking the concept down into simple, digestible pieces.
Some Considerations
Recursive CTEs are powerful but can be resource-intensive for very deep hierarchies. Here are some tips I learned the hard way:
Add a depth limit: Prevent infinite recursion with a depth check.
WHERE fsc.depth < 10 -- Only traverse 10 levels deep
Use indexes: Ensure parent_id columns are properly indexed. The recursive join looks up rows by parent_id on every iteration, so without an index each level can turn into a full table scan. This one should be a no-brainer.
CREATE INDEX idx_parent_id ON FileSystem(parent_id);
In a practical file system, you don't have to load every folder's contents up front as this example does. You could load contents lazily with pagination and defer a sub-folder's contents until that sub-folder is opened. A recursive query is a better fit for breadcrumb navigation, where you traverse upward to find a node's ancestors. Recursive CTEs are also useful for:
Access control: Propagating permissions through a hierarchy.
Analytics: Counting nested items without fully expanding trees.
Path-based search: Querying paths with LIKE or regex.
Tree pruning: Selectively returning parts of a hierarchy based on conditions.
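Two of the ideas above, the depth limit and the upward traversal for breadcrumbs, are easy to try outside the database. Here's a minimal JavaScript sketch that walks from a node up to the root and refuses to go past a maximum depth. The `breadcrumbFor` name and the `maxDepth` default are my own assumptions.

```javascript
const nodes = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

// Index rows by id once, playing the role of a primary-key lookup.
const byId = new Map(nodes.map((node) => [node.id, node]));

// Walk upward from a node to the root, collecting names for a breadcrumb.
function breadcrumbFor(byId, id, maxDepth = 10) {
  const trail = [];
  let current = byId.get(id);
  while (current) {
    // Same idea as "WHERE depth < 10": bail out instead of looping forever
    // if bad data ever makes parent_id point back into its own subtree.
    if (trail.length >= maxDepth) {
      throw new Error(`Gave up after ${maxDepth} levels (possible cycle in parent_id)`);
    }
    trail.unshift(current.name);
    current = current.parent_id === null ? undefined : byId.get(current.parent_id);
  }
  return trail.join(" / ");
}

console.log(breadcrumbFor(byId, 5)); // root / folder1 / file2
```

Walking upward touches one row per level, so it stays cheap even in a large tree, which is why breadcrumbs are such a comfortable use case.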
Traversing Hierarchical Data in NoSQL
NoSQL databases like MongoDB handle tree structures differently by allowing nested documents and graph lookups. This approach often feels more natural when working with tree-like data.
Sample MongoDB Collection
[
  { "_id": 1, "name": "root", "parent_id": null },
  { "_id": 2, "name": "folder1", "parent_id": 1 },
  { "_id": 3, "name": "folder2", "parent_id": 1 },
  { "_id": 4, "name": "file1", "parent_id": 2 },
  { "_id": 5, "name": "file2", "parent_id": 2 },
  { "_id": 6, "name": "file3", "parent_id": 3 }
]
At first glance, this looks similar to our SQL table. The key difference is how we query and structure the data.
Using $graphLookup in MongoDB
MongoDB's $graphLookup is a powerful aggregation stage that performs recursive searches on collections:
db.FileSystem.aggregate([
  { $match: { parent_id: null } },
  { $graphLookup: {
      from: "FileSystem",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      depthField: "depth",
      as: "descendants"
  }},
  { $project: {
      _id: 1,
      name: 1,
      descendants: {
        $map: {
          input: "$descendants",
          as: "descendant",
          in: {
            _id: "$$descendant._id",
            name: "$$descendant.name",
            parent_id: "$$descendant.parent_id",
            depth: "$$descendant.depth"
          }
        }
      }
  }}
])
Explanation:
$match: Finds the root node.
$graphLookup: Recursively joins documents by matching _id to parent_id.
depthField: Tracks the depth of each node in the hierarchy.
$project: Shapes the output to include only the fields we need.
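To see what the recursive matching amounts to, here's a rough plain-JavaScript imitation of the lookup step. It's a mental model only: `graphLookupSketch` is a name I invented, it runs against an in-memory array, and it makes no claim about how the server implements `$graphLookup` internally.

```javascript
const collection = [
  { _id: 1, name: "root", parent_id: null },
  { _id: 2, name: "folder1", parent_id: 1 },
  { _id: 3, name: "folder2", parent_id: 1 },
  { _id: 4, name: "file1", parent_id: 2 },
  { _id: 5, name: "file2", parent_id: 2 },
  { _id: 6, name: "file3", parent_id: 3 },
];

function graphLookupSketch(collection, startWith, connectFromField, connectToField) {
  const found = [];
  const seen = new Set(); // guards against revisiting documents
  let frontier = [startWith];

  while (frontier.length > 0) {
    // Match documents whose connectToField equals a value in the current frontier.
    const matches = collection.filter(
      (doc) => frontier.includes(doc[connectToField]) && !seen.has(doc._id)
    );
    for (const doc of matches) {
      seen.add(doc._id);
      found.push(doc);
    }
    // The next round starts from the connectFromField values of what we just found.
    frontier = matches.map((doc) => doc[connectFromField]);
  }
  return found;
}

const rootDoc = collection.find((doc) => doc.parent_id === null);
const descendants = graphLookupSketch(collection, rootDoc._id, "_id", "parent_id");
console.log(descendants.map((doc) => doc.name));
```

The loop expands level by level, which is why the result comes back as one flat array rather than a nested tree.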
Result:
[
  {
    "_id": 1,
    "name": "root",
    "descendants": [
      { "_id": 2, "name": "folder1", "parent_id": 1, "depth": 0 },
      { "_id": 3, "name": "folder2", "parent_id": 1, "depth": 0 },
      { "_id": 4, "name": "file1", "parent_id": 2, "depth": 1 },
      { "_id": 5, "name": "file2", "parent_id": 2, "depth": 1 },
      { "_id": 6, "name": "file3", "parent_id": 3, "depth": 1 }
    ]
  }
]
Notice how this returns a flat list of descendants rather than a nested structure. Two details are easy to miss: the depth value counts from 0 at the first level of matches, so root's direct children have depth 0 (the shell shows these as 64-bit Long values), and $graphLookup does not guarantee the order of the descendants array. If you want a true tree structure, you'll need additional aggregation stages.
Creating a Nested Tree Structure
To transform our flat list into a proper tree, we can follow $graphLookup with a $function expression that assembles the tree recursively:
db.FileSystem.aggregate([
  { $match: { parent_id: null } },
  { $graphLookup: {
      from: "FileSystem",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      as: "allDescendants"
  }},
  // Put the root and all of its descendants into one flat array
  { $addFields: {
      "allNodes": { $concatArrays: [ ["$$ROOT"], "$allDescendants" ] }
  }},
  { $project: {
      "tree": {
        $function: {
          body: function(nodes) {
            function createTree(nodes, parentId = null) {
              return nodes
                .filter(node => node.parent_id === parentId)
                // Drop the helper array that $$ROOT carries over from $graphLookup
                .map(({ allDescendants, ...node }) => ({
                  ...node,
                  children: createTree(nodes, node._id)
                }));
            }
            return createTree(nodes);
          },
          args: ["$allNodes"],
          lang: "js"
        }
      }
  }}
])
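This aggregation uses the $function operator (available in MongoDB 4.4+, and only when server-side JavaScript is enabled) to recursively build a proper tree structure. MongoDB 8.0 deprecates server-side JavaScript, so on newer deployments consider doing the same assembly step in application code.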
I'll admit, though, I found $graphLookup's syntax a bit intimidating at first. But once it clicked, it was way smoother than debugging my SQL joins.
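If you'd rather keep the pipeline simple, the nesting step can also happen in your Node.js code after fetching the flat list. Here's a minimal sketch of that alternative. The `buildTree` helper is hypothetical, and it swaps the repeated filtering for a single pass over a `Map`, which avoids rescanning the whole list for every node.

```javascript
// Flat list as it might come back from the database (root plus descendants).
const flatNodes = [
  { _id: 1, name: "root", parent_id: null },
  { _id: 2, name: "folder1", parent_id: 1 },
  { _id: 3, name: "folder2", parent_id: 1 },
  { _id: 4, name: "file1", parent_id: 2 },
  { _id: 5, name: "file2", parent_id: 2 },
  { _id: 6, name: "file3", parent_id: 3 },
];

function buildTree(flatNodes) {
  // Copy every node and give it an empty children array, indexed by _id.
  const byId = new Map(
    flatNodes.map((node) => [node._id, { ...node, children: [] }])
  );
  const roots = [];

  // Attach each node to its parent; nodes without a known parent become roots.
  for (const node of byId.values()) {
    const parent =
      node.parent_id === null ? undefined : byId.get(node.parent_id);
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

console.log(JSON.stringify(buildTree(flatNodes), null, 2));
```

Because the input order doesn't matter here, this also copes with the fact that the descendants array isn't guaranteed to arrive sorted.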
Alternative: Embedded Documents Approach
Another common approach in MongoDB is to embed child documents directly within their parents:
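As a rough sketch, assuming the same node names as the earlier collection and a hypothetical `children` array on every node, an embedded version of the file system could be written like this in JavaScript:

```javascript
// One document holds the entire tree; children live inside their parent.
const embeddedRoot = {
  _id: 1,
  name: "root",
  children: [
    {
      _id: 2,
      name: "folder1",
      children: [
        { _id: 4, name: "file1", children: [] },
        { _id: 5, name: "file2", children: [] },
      ],
    },
    {
      _id: 3,
      name: "folder2",
      children: [{ _id: 6, name: "file3", children: [] }],
    },
  ],
};

// Once the document is loaded, walking it is ordinary in-memory recursion.
function listPaths(node, prefix = "") {
  const path = prefix ? `${prefix}/${node.name}` : node.name;
  return [path, ...node.children.flatMap((child) => listPaths(child, path))];
}

console.log(listPaths(embeddedRoot));
```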
This approach makes retrieving entire subtrees trivial—just one query! However, it has drawbacks:
Document size limits: MongoDB documents are capped at 16MB, limiting tree size.
Update complexity: Modifying deeply nested nodes requires complex updates.
Partial tree queries: It's harder to retrieve just a portion of the tree.
For my file manager project, I started with embedded documents but switched to parent references when I realized I needed more flexible querying.
Advantages of NoSQL for Hierarchies
Native nesting: Hierarchies are represented directly in JSON.
Efficient recursion: $graphLookup fetches descendants in one query.
Flexibility: Adding extra metadata is easy — just extend the document.
Schema evolution: No migrations needed when adding new node properties.
Horizontal scaling: NoSQL databases often scale out more easily.
Materialized Path Pattern
Before we compare SQL and NoSQL directly, let's discuss another pattern that works well in both systems: the materialized path.
This approach stores the full path to each node as a string:
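Here's a small illustrative sketch of what such records could look like, written as a JavaScript array. The `path` values reuse the ones from the table near the start of the article, and the `pathFor` helper is a made-up name showing how a new node's path derives from its parent's.

```javascript
// Each node carries its full path from the root as a plain string.
const pathNodes = [
  { _id: 1, name: "root", path: "root" },
  { _id: 2, name: "folder1", path: "root/folder1" },
  { _id: 3, name: "folder2", path: "root/folder2" },
  { _id: 4, name: "file1", path: "root/folder1/file1" },
  { _id: 5, name: "file2", path: "root/folder1/file2" },
  { _id: 6, name: "file3", path: "root/folder2/file3" },
];

// A new node's path is its parent's path plus its own name.
function pathFor(parentPath, name) {
  return parentPath ? `${parentPath}/${name}` : name;
}

console.log(pathFor("root/folder2", "file4")); // root/folder2/file4
```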
With this pattern, finding descendants becomes a simple string matching operation:
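In SQL that usually means a `LIKE 'root/folder1/%'` filter, and in MongoDB an anchored regular expression on the path field. The matching logic itself is easiest to show in plain JavaScript, so here's a minimal sketch with a hypothetical `descendantsByPath` helper:

```javascript
const storedNodes = [
  { name: "root", path: "root" },
  { name: "folder1", path: "root/folder1" },
  { name: "folder2", path: "root/folder2" },
  { name: "file1", path: "root/folder1/file1" },
  { name: "file2", path: "root/folder1/file2" },
  { name: "file3", path: "root/folder2/file3" },
];

// Descendants are the nodes whose path starts with "<ancestor path>/".
function descendantsByPath(nodes, ancestorPath) {
  const prefix = `${ancestorPath}/`;
  return nodes.filter((node) => node.path.startsWith(prefix));
}

console.log(descendantsByPath(storedNodes, "root/folder1").map((n) => n.name));
```

The trailing slash in the prefix matters: without it, a sibling named `folder10` would be mistaken for a descendant of `folder1`.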
This pattern excels in read-heavy applications and works equally well in SQL and NoSQL. The trade-off is maintaining the path strings during updates.
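To give a feel for that maintenance cost, here's a hedged sketch of what moving a folder involves: every node in the moved subtree needs its path rewritten. The `moveSubtree` function is an illustrative helper of my own, working on an in-memory array rather than a real database.

```javascript
const before = [
  { name: "root", path: "root" },
  { name: "folder1", path: "root/folder1" },
  { name: "folder2", path: "root/folder2" },
  { name: "file1", path: "root/folder1/file1" },
  { name: "file2", path: "root/folder1/file2" },
  { name: "file3", path: "root/folder2/file3" },
];

// Move the node at oldPath (and everything under it) below newParentPath.
function moveSubtree(nodes, oldPath, newParentPath) {
  const name = oldPath.split("/").pop();
  const newPath = `${newParentPath}/${name}`;
  return nodes.map((node) => {
    if (node.path === oldPath) {
      return { ...node, path: newPath };
    }
    if (node.path.startsWith(`${oldPath}/`)) {
      // Swap the old prefix for the new one, keeping the rest of the path.
      return { ...node, path: newPath + node.path.slice(oldPath.length) };
    }
    return node; // outside the moved subtree, untouched
  });
}

const after = moveSubtree(before, "root/folder1", "root/folder2");
console.log(after.map((n) => n.path));
```

One move touched three records here. In a real tree the number of writes grows with the size of the subtree, which is the price of those cheap reads.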
SQL vs. NoSQL: When to Use Which?
When to Choose SQL
You need strong transaction support across multiple operations
Your hierarchy has a stable, well-defined schema
You're performing complex joins with other relational data
You need advanced reporting tools
Your teams are more familiar with SQL
When to Choose NoSQL
Your hierarchy changes frequently or has varying node properties
You need to store and retrieve entire subtrees frequently
Your application is JavaScript/Node.js based
You value development speed over strict schema validation
You need horizontal scaling for very large hierarchies
Real-World Implementation: My File Manager
In my Node.js file manager project, I ended up using MongoDB with a hybrid approach:
Parent references for the basic structure
Materialized paths for efficient traversal
Embedded metadata for file/folder properties
The schema looked something like this:
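A hypothetical document combining those three ingredients might look roughly like the sketch below. Every field name here (`parent_id`, `path`, `metadata`, and everything inside `metadata`) is an illustrative assumption rather than the exact schema from the project.

```javascript
// Hypothetical node document: all field names are assumptions for illustration.
const exampleNode = {
  _id: 4,
  name: "file1",
  parent_id: 2, // parent reference: the basic structure
  path: "root/folder1/file1", // materialized path: fast prefix lookups
  metadata: {
    // embedded metadata: file/folder properties kept with the node
    type: "file",
    sizeBytes: 1024,
    modifiedAt: new Date("2024-01-01T00:00:00Z"),
  },
};

// Cheap sanity check that the materialized path agrees with the node's name.
function pathMatchesName(node) {
  return node.path === node.name || node.path.endsWith(`/${node.name}`);
}

console.log(pathMatchesName(exampleNode)); // true
```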
This design gave me the best of both worlds: efficient traversal with materialized paths and the flexibility of document-based storage.
Conclusion
So, yeah, what started as a simple Node.js project spiraled into this deep dive into hierarchical data. Traversing hierarchical data requires different strategies depending on your database choice:
SQL excels at structured, relational data but requires recursion for tree traversal.
NoSQL databases like MongoDB natively support nested documents and recursive lookups.
Choosing the right approach depends on your use case. For dynamic, nested data, NoSQL often shines. For strongly typed, relational hierarchies, SQL is reliable.
If you're building something similar, save yourself some time: debug your recursion early. Trust me on this one.
And remember—there's no one-size-fits-all solution. The best approach is the one that matches your application's specific requirements and your team's expertise.
Now, if you'll excuse me, I'm off to add the finishing touches to my file manager. Maybe I'll even restore that folder3 I accidentally deleted...