Traversing Hierarchical Data Structures in SQL and NoSQL Databases

Introduction

Hierarchical data structures are everywhere — from file systems and organizational charts to product categories and dependency trees. Effectively querying and traversing these structures is crucial for building dynamic applications, whether you're working with SQL or NoSQL databases.

Actually, this all started when I was building a simple file manager in Node.js for fun. You know, one of those side projects that pulls you in way deeper than you expected? The goal was to recursively traverse a file system and display it in a neat little UI. Sounds simple, right? Well, let's just say I ended up learning a lot more about hierarchical data than I initially signed up for.

What began as a straightforward weekend project quickly became a fascinating journey through the world of tree traversal algorithms, query optimization, and database design patterns. I found myself sketching tree diagrams on napkins and dreaming in nested JSON structures. The rabbit hole was deep, but the knowledge gained was worth it.

In this article, I'll dive deep into how to work with hierarchical data, comparing SQL's recursive queries with NoSQL's native tree structures. By the end, you'll (hopefully) have a solid grasp of how to traverse and manipulate hierarchies in both paradigms. And if you spot any typos, that's probably because I stayed up way too late hacking on this.

Hierarchical Data 101

A hierarchical data structure organizes elements into a tree-like format, where each node has one parent and potentially multiple children. This pattern appears naturally in many domains:

  • File systems with folders and files

  • Organization charts with managers and reports

  • Comment threads with parent and child comments

  • Product categories and subcategories

  • Navigation menus with nested items

For example, a file system might look like this (yes, this is what I ended up staring at for hours while debugging):

root
├── folder1
│   ├── file1
│   └── file2
└── folder2
    └── file3

In databases, this structure is typically represented using parent-child relationships:

id  name     parent_id  path
1   root     NULL       root
2   folder1  1          root/folder1
3   folder2  1          root/folder2
4   file1    2          root/folder1/file1
5   file2    2          root/folder1/file2
6   file3    3          root/folder2/file3

The parent_id column points to a node's parent, with NULL indicating a root node. This approach, known as the "adjacency list model," is straightforward but requires recursive queries for traversal.
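To make the adjacency list a bit more concrete, here is a minimal JavaScript sketch that groups rows by parent_id and prints the tree with indentation. The rows array mirrors the table above; indexByParent and renderTree are illustrative names I made up, not part of any library.

```javascript
// Rows mirror the table above; in a real app they would come from a query.
const rows = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

// Group rows by parent_id so that finding a node's children is a Map lookup.
function indexByParent(rows) {
  const byParent = new Map();
  for (const row of rows) {
    if (!byParent.has(row.parent_id)) byParent.set(row.parent_id, []);
    byParent.get(row.parent_id).push(row);
  }
  return byParent;
}

// Depth-first walk that returns one indented line per node.
function renderTree(rows) {
  const byParent = indexByParent(rows);
  const lines = [];
  const visit = (parentId, depth) => {
    for (const node of byParent.get(parentId) ?? []) {
      lines.push("  ".repeat(depth) + node.name);
      visit(node.id, depth + 1);
    }
  };
  visit(null, 0); // roots are the rows whose parent_id is NULL
  return lines;
}

console.log(renderTree(rows).join("\n"));
```

The recursion here happens in application code; the SQL sections later push the same walk into the database.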

Alternative Representation: Nested Set Model

While we're on the topic, it's worth mentioning another common approach to representing hierarchies in SQL: the nested set model. Instead of parent-child relationships, this model assigns left and right values to each node:

id  name     left  right
1   root     1     12
2   folder1  2     7
3   folder2  8     11
4   file1    3     4
5   file2    5     6
6   file3    9     10

The beauty of this model is that you can find all descendants of a node with a simple query:

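-- left and right are reserved words in most SQL dialects, so they are quoted here
-- (standard double quotes; MySQL uses backticks by default)
SELECT * FROM FileSystem
WHERE "left" > (SELECT "left" FROM FileSystem WHERE id = 2)
AND "right" < (SELECT "right" FROM FileSystem WHERE id = 2);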

This would return all descendants of folder1. However, this model makes insertions and deletions more complex, as you need to update multiple nodes' left and right values.
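If you're wondering where the left and right numbers come from, one way to produce them is a depth-first walk that hands out a counter value on the way down and another on the way back up. The sketch below is my own illustration (numberNestedSet and descendantsOf are hypothetical helper names), starting from the adjacency list version of the same tree:

```javascript
// Same sample tree as an adjacency list (id, name, parent_id).
const nodes = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

// Assign left/right numbers with a depth-first walk: "left" is taken when a
// node is entered, "right" when all of its children have been visited.
function numberNestedSet(nodes) {
  const childrenOf = new Map();
  for (const n of nodes) {
    if (!childrenOf.has(n.parent_id)) childrenOf.set(n.parent_id, []);
    childrenOf.get(n.parent_id).push(n);
  }
  let counter = 0;
  const numbered = new Map();
  const visit = (node) => {
    const entry = { id: node.id, name: node.name, left: ++counter, right: 0 };
    numbered.set(node.id, entry);
    for (const child of childrenOf.get(node.id) ?? []) visit(child);
    entry.right = ++counter;
  };
  for (const root of childrenOf.get(null) ?? []) visit(root);
  return numbered;
}

// Descendants are the nodes whose interval lies strictly inside the ancestor's.
function descendantsOf(numbered, id) {
  const ancestor = numbered.get(id);
  return [...numbered.values()].filter(
    (n) => n.left > ancestor.left && n.right < ancestor.right
  );
}

const numbered = numberNestedSet(nodes);
console.log(numbered.get(2)); // folder1 gets left 2 and right 7, as in the table
console.log(descendantsOf(numbered, 2).map((n) => n.name)); // file1 and file2
```

It also shows why inserts are painful: adding a node shifts the counter for everything visited after it.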

I experimented with this approach but found it overkill for simple projects. It shines in read-heavy applications where hierarchies change infrequently, such as content management systems with complex category structures or product category trees in e-commerce platforms.

Traversing Hierarchical Data in SQL

Relational databases like PostgreSQL and MySQL (8.0 and later) support recursive common table expressions (CTEs) for querying hierarchical data. This powerful feature lets you traverse a tree in a single query without resorting to procedural code, which can introduce performance bottlenecks.

Recursive CTE Example

WITH RECURSIVE FileSystemCTE AS (
    -- Anchor member: start from the root node
    SELECT id, name, parent_id, name AS path, 0 AS depth
    FROM FileSystem
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: join with the CTE to traverse the tree
    SELECT fs.id, fs.name, fs.parent_id, 
           CONCAT(fsc.path, '/', fs.name) AS path,
           fsc.depth + 1 AS depth
    FROM FileSystem fs
    INNER JOIN FileSystemCTE fsc ON fs.parent_id = fsc.id
)
SELECT * FROM FileSystemCTE;

Explanation:

  1. The anchor member selects the root node (parent_id IS NULL) and establishes the initial path and depth.

  2. The recursive member repeatedly joins the FileSystem table to the rows the CTE has produced so far, appending each child's name to the path and incrementing the depth counter.

  3. The final query retrieves all nodes with their full paths and depth levels.
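If the three steps above still feel abstract, it can help to simulate them outside the database. The sketch below is a rough JavaScript imitation of how a recursive CTE is evaluated, not how any particular engine implements it; fileSystem and walkLikeRecursiveCte are names I invented for the illustration.

```javascript
// In-memory stand-in for the FileSystem table.
const fileSystem = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

function walkLikeRecursiveCte(table) {
  // Anchor member: rows with no parent, with the initial path and depth.
  let working = table
    .filter((row) => row.parent_id === null)
    .map((row) => ({ ...row, path: row.name, depth: 0 }));
  const result = [];

  // Recursive member: join the table against the rows from the previous
  // iteration, and stop once an iteration produces no new rows.
  while (working.length > 0) {
    result.push(...working);
    const next = [];
    for (const parent of working) {
      for (const row of table) {
        if (row.parent_id === parent.id) {
          next.push({
            ...row,
            path: `${parent.path}/${row.name}`,
            depth: parent.depth + 1,
          });
        }
      }
    }
    working = next;
  }
  return result;
}

for (const row of walkLikeRecursiveCte(fileSystem)) {
  console.log(row.id, row.path, row.depth);
}
```

Each pass of the while loop corresponds to one level of the tree, which is why the depth counter goes up by one per iteration.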

Output:

id  name     parent_id  path                depth
1   root     NULL       root                0
2   folder1  1          root/folder1        1
3   folder2  1          root/folder2        1
4   file1    2          root/folder1/file1  2
5   file2    2          root/folder1/file2  2
6   file3    3          root/folder2/file3  2

I've added a depth column to track how deep each node is in the hierarchy. This becomes invaluable when you want to limit recursion depth or create indented visual representations. I admit recursive CTEs were a bit of a head-scratcher for me and took a while to grasp. What helped was focusing on the recursive JOIN: I learned to visualize how each iteration connects parent nodes to their children, building the complete tree path step by step. Think of it like following a family tree, where each recursive step reveals the next branch. Practice with small examples, tracing how the query walks through the hierarchy. The key is patience and breaking the concept down into simple, digestible pieces.

Some Considerations

Recursive CTEs are powerful but can be resource-intensive for very deep hierarchies. Here are some tips I learned the hard way:

  1. Add a depth limit: Prevent infinite recursion with a depth check in the recursive member.

    WHERE fsc.depth < 10 -- Only traverse 10 levels deep

  2. Use indexes: Ensure the parent_id column is indexed. Indexing the columns you filter and join on speeds up queries on almost any table, so this should be a no-brainer.

    CREATE INDEX idx_parent_id ON FileSystem(parent_id);
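To see why the depth limit matters, consider what happens when the data contains a cycle. The following JavaScript sketch uses deliberately corrupted, made-up rows in which two nodes are each other's parent; descendantsWithLimit is a hypothetical helper that mimics the depth check, not a database API.

```javascript
// Hypothetical corrupted data: node 1 and node 2 point at each other.
const cyclicRows = [
  { id: 1, name: "a", parent_id: 2 },
  { id: 2, name: "b", parent_id: 1 },
];

// Level-by-level walk that gives up after maxDepth levels, in the same
// spirit as the WHERE fsc.depth < 10 check in the recursive member.
function descendantsWithLimit(rows, startId, maxDepth) {
  const found = [];
  let frontier = [startId];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const row of rows) {
      if (frontier.includes(row.parent_id)) {
        found.push({ id: row.id, depth });
        next.push(row.id);
      }
    }
    frontier = next;
  }
  return found;
}

// Without the limit this walk would never finish; with it, it stops at level 10.
console.log(descendantsWithLimit(cyclicRows, 1, 10).length); // 10
```

The limit doesn't repair the bad data, it only guarantees that the traversal terminates so you can notice the problem.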

In a practical file manager, you don't have to load every folder's contents up front as this example does. You could load contents lazily with pagination and defer a sub-folder's contents until it is opened. Recursive CTEs are a better fit for breadcrumb navigation, where you traverse upward to find a node's ancestors. They are also useful for:

  • Access control: Propagating permissions through a hierarchy.

  • Analytics: Counting nested items without fully expanding trees.

  • Path-based search: Querying paths with LIKE or regex.

  • Tree pruning: Selectively returning parts of a hierarchy based on conditions.
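The breadcrumb case mentioned above is the upward traversal: start at a node and follow parent_id until you reach the root. Here is a small JavaScript sketch of that idea on in-memory rows (breadcrumb is an illustrative helper name; in SQL the same walk would be a recursive CTE anchored at the node instead of the root):

```javascript
const rows = [
  { id: 1, name: "root", parent_id: null },
  { id: 2, name: "folder1", parent_id: 1 },
  { id: 3, name: "folder2", parent_id: 1 },
  { id: 4, name: "file1", parent_id: 2 },
  { id: 5, name: "file2", parent_id: 2 },
  { id: 6, name: "file3", parent_id: 3 },
];

// Follow parent_id links upward and collect the names from root to node.
function breadcrumb(rows, id) {
  const byId = new Map(rows.map((row) => [row.id, row]));
  const trail = [];
  const seen = new Set(); // guards against accidental cycles in parent_id
  let current = byId.get(id);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    trail.unshift(current.name);
    current = current.parent_id === null ? undefined : byId.get(current.parent_id);
  }
  return trail;
}

console.log(breadcrumb(rows, 4).join(" > ")); // root > folder1 > file1
```

Upward walks touch one node per level, so they stay cheap even in large trees.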

Traversing Hierarchical Data in NoSQL

NoSQL databases like MongoDB handle tree structures differently by allowing nested documents and graph lookups. This approach often feels more natural when working with tree-like data.

Sample MongoDB Collection

[
  { "_id": 1, "name": "root", "parent_id": null },
  { "_id": 2, "name": "folder1", "parent_id": 1 },
  { "_id": 3, "name": "folder2", "parent_id": 1 },
  { "_id": 4, "name": "file1", "parent_id": 2 },
  { "_id": 5, "name": "file2", "parent_id": 2 },
  { "_id": 6, "name": "file3", "parent_id": 3 }
]

At first glance, this looks similar to our SQL table. The key difference is how we query and structure the data.

Using $graphLookup on MongoDB

MongoDB's $graphLookup is a powerful aggregation stage that performs recursive searches on collections:

db.FileSystem.aggregate([
  { $match: { parent_id: null } },
  { $graphLookup: {
      from: "FileSystem",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      depthField: "depth",
      as: "descendants"
  }},
  { $project: {
      _id: 1,
      name: 1,
      descendants: {
        $map: {
          input: "$descendants",
          as: "descendant",
          in: {
            _id: "$$descendant._id",
            name: "$$descendant.name",
            parent_id: "$$descendant.parent_id",
            depth: "$$descendant.depth"
          }
        }
      }
  }}
])

Explanation:

  1. $match: Finds the root node.

  2. $graphLookup: Recursively joins documents by matching each node's _id to its children's parent_id.

  3. depthField: Records the recursion depth of each descendant. MongoDB starts counting at 0 for the first level of matches, so direct children get depth 0, not 1 as in the SQL example.

  4. $project: Shapes the output to include only the fields we need.

Result:

[
  {
    "_id": 1,
    "name": "root",
    "descendants": [
      { "_id": 2, "name": "folder1", "parent_id": 1, "depth": 0 },
      { "_id": 3, "name": "folder2", "parent_id": 1, "depth": 0 },
      { "_id": 4, "name": "file1", "parent_id": 2, "depth": 1 },
      { "_id": 5, "name": "file2", "parent_id": 2, "depth": 1 },
      { "_id": 6, "name": "file3", "parent_id": 3, "depth": 1 }
    ]
  }
]

Notice how this returns a flat list of descendants rather than a nested structure, and that $graphLookup does not guarantee the order of that list. If you want a true tree structure, you'll need additional aggregation stages.

Creating a Nested Tree Structure

To transform our flat list into a proper tree, we can follow $graphLookup with a recursive function that nests each node under its parent:

db.FileSystem.aggregate([
  { $match: { parent_id: null } },
  { $graphLookup: {
      from: "FileSystem",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      as: "allDescendants"
  }},
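  { $addFields: {
      // Rebuild the root from its own fields so it does not carry the allDescendants array along
      "allNodes": { $concatArrays: [
        [ { _id: "$_id", name: "$name", parent_id: "$parent_id" } ],
        "$allDescendants"
      ] }
  }},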
  { $project: { 
      "tree": {
        $function: {
          body: function(nodes) {
            function createTree(nodes, parentId = null) {
              return nodes
                .filter(node => node.parent_id === parentId)
                .map(node => ({
                  ...node,
                  children: createTree(nodes, node._id)
                }));
            }
            return createTree(nodes);
          },
          args: ["$allNodes"],
          lang: "js"
        }
      }
  }}
])

This complex aggregation uses the $function operator (available in MongoDB 4.4+, and only when server-side JavaScript is enabled) to recursively build a proper tree structure.
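One thing worth knowing about the createTree helper above is that it rescans the whole array for every node. If you do the nesting in application code instead (an assumption on my part; the pipeline above does it inside the database), a single pass with a Map is a possible alternative. A sketch, with buildTree as an illustrative name:

```javascript
// Flat documents shaped like the root plus the $graphLookup output.
const docs = [
  { _id: 1, name: "root", parent_id: null },
  { _id: 2, name: "folder1", parent_id: 1 },
  { _id: 3, name: "folder2", parent_id: 1 },
  { _id: 4, name: "file1", parent_id: 2 },
  { _id: 5, name: "file2", parent_id: 2 },
  { _id: 6, name: "file3", parent_id: 3 },
];

function buildTree(docs) {
  // First pass: copy every document and give it an empty children array.
  const byId = new Map();
  for (const doc of docs) byId.set(doc._id, { ...doc, children: [] });

  // Second pass: attach each node to its parent, whatever the input order.
  const roots = [];
  for (const node of byId.values()) {
    const parent = byId.get(node.parent_id);
    if (parent) parent.children.push(node);
    else roots.push(node); // no known parent: treat the node as a root
  }
  return roots;
}

console.log(JSON.stringify(buildTree(docs), null, 2));
```

Because the second pass only does Map lookups, the input does not need to be sorted, which fits the unordered output of $graphLookup.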

I'll admit, though, I found $graphLookup's syntax a bit intimidating at first. But once it clicked, it was way smoother than debugging my SQL joins.

Alternative: Embedded Documents Approach

Another common approach in MongoDB is to embed child documents directly within their parents, so the whole hierarchy lives in a single document.
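As an illustration, an embedded version of the sample tree could look like the object below. The field names (children in particular) and the findPath helper are my own assumptions for this sketch, not a MongoDB convention.

```javascript
// Illustrative embedded version of the sample tree: children live inside
// their parent, so the whole hierarchy is one document.
const embedded = {
  _id: 1,
  name: "root",
  children: [
    {
      name: "folder1",
      children: [
        { name: "file1", children: [] },
        { name: "file2", children: [] },
      ],
    },
    {
      name: "folder2",
      children: [{ name: "file3", children: [] }],
    },
  ],
};

// Locating one node means walking the nesting in application code.
function findPath(node, targetName, trail = []) {
  const here = [...trail, node.name];
  if (node.name === targetName) return here;
  for (const child of node.children) {
    const hit = findPath(child, targetName, here);
    if (hit) return hit;
  }
  return null; // not found in this subtree
}

console.log(findPath(embedded, "file3").join("/")); // root/folder2/file3
```

Reading the whole tree is a single fetch, but anything more selective ends up as a walk like findPath.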

This approach makes retrieving entire subtrees trivial—just one query! However, it has drawbacks:

  1. Document size limits: MongoDB documents are capped at 16MB, limiting tree size.

  2. Update complexity: Modifying deeply nested nodes requires complex updates.

  3. Partial tree queries: It's harder to retrieve just a portion of the tree.

For my file manager project, I started with embedded documents but switched to parent references when I realized I needed more flexible querying.

Advantages of NoSQL for Hierarchies

  • Native nesting: Hierarchies are represented directly in JSON.

  • Efficient recursion: $graphLookup fetches descendants in one query.

  • Flexibility: Adding extra metadata is easy — just extend the document.

  • Schema evolution: No migrations needed when adding new node properties.

  • Horizontal scaling: NoSQL databases often scale out more easily.

Materialized Path Pattern

Before we compare SQL and NoSQL directly, let's discuss another pattern that works well in both systems: the materialized path.

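This approach stores the full path to each node as a string, just like the path column in the first table of this article (for example, root/folder1/file1).

With this pattern, finding descendants becomes a simple string-matching operation: any node whose path starts with the ancestor's path followed by a separator is a descendant.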

This pattern excels in read-heavy applications and works equally well in SQL and NoSQL. The trade-off is maintaining the path strings during updates.
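Here is a small JavaScript sketch of that prefix match on in-memory rows. The folder10 row is a made-up addition to show a common pitfall, and descendantsByPath is an illustrative name; in SQL the equivalent filter would be along the lines of path LIKE 'root/folder1/%'.

```javascript
const rows = [
  { id: 1, name: "root", path: "root" },
  { id: 2, name: "folder1", path: "root/folder1" },
  { id: 3, name: "folder2", path: "root/folder2" },
  { id: 4, name: "file1", path: "root/folder1/file1" },
  { id: 5, name: "file2", path: "root/folder1/file2" },
  { id: 6, name: "file3", path: "root/folder2/file3" },
  // Hypothetical extra row, only here to demonstrate the prefix pitfall.
  { id: 7, name: "folder10", path: "root/folder10" },
];

function descendantsByPath(rows, ancestorPath) {
  // The trailing slash keeps "root/folder1" from also matching "root/folder10".
  const prefix = ancestorPath + "/";
  return rows.filter((row) => row.path.startsWith(prefix));
}

console.log(descendantsByPath(rows, "root/folder1").map((row) => row.name));
// [ 'file1', 'file2' ]
```

Matching on the bare path without the separator is an easy mistake to make, and it only shows up once sibling names share a prefix.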

SQL vs. NoSQL: When to Use Which?

When to Choose SQL

  • You need strong transaction support across multiple operations

  • Your hierarchy has a stable, well-defined schema

  • You're performing complex joins with other relational data

  • You need advanced reporting tools

  • Your teams are more familiar with SQL

When to Choose NoSQL

  • Your hierarchy changes frequently or has varying node properties

  • You need to store and retrieve entire subtrees frequently

  • Your application is JavaScript/Node.js based

  • You value development speed over strict schema validation

  • You need horizontal scaling for very large hierarchies

Real-World Implementation: My File Manager

In my Node.js file manager project, I ended up using MongoDB with a hybrid approach:

  1. Parent references for the basic structure

  2. Materialized paths for efficient traversal

  3. Embedded metadata for file/folder properties

The schema looked something like this:

This design gave me the best of both worlds: efficient traversal with materialized paths and the flexibility of document-based storage.
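To give a feel for what combining these three elements can involve, here is a purely hypothetical sketch. The field names (parent_id, path, meta), the sample values, and the moveNode helper are assumptions of mine for illustration, not the actual schema from the project. It also shows the maintenance cost of materialized paths: moving a folder means rewriting the path of every descendant.

```javascript
// Hypothetical documents: parent reference + materialized path + embedded metadata.
const docs = [
  { _id: 1, name: "root", parent_id: null, path: "root", meta: { type: "folder" } },
  { _id: 2, name: "folder1", parent_id: 1, path: "root/folder1", meta: { type: "folder" } },
  { _id: 3, name: "folder2", parent_id: 1, path: "root/folder2", meta: { type: "folder" } },
  { _id: 4, name: "file1", parent_id: 2, path: "root/folder1/file1", meta: { type: "file", size: 120 } },
];

// Returns a new array in which the node has a new parent and the paths of the
// node and all of its descendants are rewritten. This sketch does not check
// for moving a node into its own subtree.
function moveNode(docs, id, newParentId) {
  const node = docs.find((d) => d._id === id);
  const newParent = docs.find((d) => d._id === newParentId);
  const oldPath = node.path;
  const newPath = `${newParent.path}/${node.name}`;
  return docs.map((d) => {
    if (d._id === id) return { ...d, parent_id: newParentId, path: newPath };
    if (d.path.startsWith(oldPath + "/")) {
      return { ...d, path: newPath + d.path.slice(oldPath.length) };
    }
    return d;
  });
}

const moved = moveNode(docs, 2, 3);
console.log(moved.map((d) => d.path));
```

In a real database the same rewrite would be a multi-document update, which is where transactions or careful ordering come into play.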

Conclusion

So, yeah, what started as a simple Node.js project spiraled into this deep dive into hierarchical data. Traversing hierarchical data requires different strategies depending on your database choice:

  • SQL excels at structured, relational data but requires recursion for tree traversal.

  • NoSQL databases like MongoDB natively support nested documents and recursive lookups.

Choosing the right approach depends on your use case. For dynamic, nested data, NoSQL often shines. For strongly typed, relational hierarchies, SQL is reliable.

If you're building something similar, save yourself some time: debug your recursion early. Trust me on this one.

And remember—there's no one-size-fits-all solution. The best approach is the one that matches your application's specific requirements and your team's expertise.

Now, if you'll excuse me, I'm off to add the finishing touches to my file manager. Maybe I'll even restore that folder3 I accidentally deleted...

Resources