Traversing Hierarchical Data Structures in SQL and NoSQL Databases

Introduction

Hierarchical data structures are everywhere — from file systems and organizational charts to product categories and dependency trees. Effectively querying and traversing these structures is crucial for building dynamic applications, whether you're working with SQL or NoSQL databases.

In this article, we'll dive deep into how to work with hierarchical data, comparing SQL's recursive queries with NoSQL's native tree structures. By the end, you'll have a solid grasp of how to traverse and manipulate hierarchies in both paradigms.


Hierarchical Data 101

A hierarchical data structure organizes elements into a tree-like format, where each node has one parent and potentially multiple children.

For example, a file system might look like this:

root
├── folder1
│   ├── file1
│   └── file2
└── folder2
    └── file3

In databases, this structure is typically represented using parent-child relationships:

idnameparent_id
1rootNULL
2folder11
3folder21
4file12
5file22
6file33

The parent_id column points to a node's parent, with NULL indicating a root node.


Traversing Hierarchical Data in SQL

Relational databases like PostgreSQL and MySQL use recursive common table expressions (CTEs) to query hierarchical data.

Recursive CTE Example

WITH RECURSIVE FileSystemCTE AS (
    -- Anchor member: start from the root node
    SELECT id, name, parent_id, name AS path
    FROM FileSystem
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: join with the CTE to traverse the tree
    SELECT fs.id, fs.name, fs.parent_id, CONCAT(fsc.path, '/', fs.name) AS path
    FROM FileSystem fs
    INNER JOIN FileSystemCTE fsc ON fs.parent_id = fsc.id
)
SELECT * FROM FileSystemCTE;

Explanation:

  • The anchor member selects the root node (parent_id IS NULL).

  • The recursive member repeatedly joins the CTE to itself, appending each child's name to the path.

  • The final query retrieves all nodes with their full paths.

Output:

idnameparent_idpath
1rootNULLroot
2folder11root/folder1
3folder21root/folder2
4file12root/folder1/file1
5file22root/folder1/file2
6file33root/folder2/file3

Use Cases

Even if you don't load folder contents immediately, recursive CTEs are useful for:

  • Breadcrumb navigation: Generating paths for dynamic UIs.

  • Access control: Propagating permissions through a hierarchy.

  • Analytics: Counting nested items without fully expanding trees.

  • Path-based search: Querying paths with LIKE or regex.

Limitations

However, SQL recursion has downsides:

  • Flat result sets: Queries produce rows, not true nested structures.

  • Performance: Deep trees can cause slow joins.

  • Complexity: Recursion logic in SQL can be hard to debug.


Traversing Hierarchical Data in NoSQL

NoSQL databases like MongoDB handle tree structures differently by allowing nested documents and graph lookups.

Sample MongoDB Collection

[
  { "_id": 1, "name": "root", "parent_id": null },
  { "_id": 2, "name": "folder1", "parent_id": 1 },
  { "_id": 3, "name": "folder2", "parent_id": 1 },
  { "_id": 4, "name": "file1", "parent_id": 2 },
  { "_id": 5, "name": "file2", "parent_id": 2 },
  { "_id": 6, "name": "file3", "parent_id": 3 }
]

Using $graphLookup

db.FileSystem.aggregate([
  { $match: { parent_id: null } },
  { $graphLookup: {
      from: "FileSystem",
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent_id",
      as: "children"
  }}
])

Explanation:

  • $match: Finds the root node.

  • $graphLookup: Recursively joins documents by matching _id to parent_id.

  • The result nests children directly under each node.

Result:

[
  {
    "_id": 1,
    "name": "root",
    "children": [
      {
        "_id": 2,
        "name": "folder1",
        "children": [
          { "_id": 4, "name": "file1" },
          { "_id": 5, "name": "file2" }
        ]
      },
      {
        "_id": 3,
        "name": "folder2",
        "children": [
          { "_id": 6, "name": "file3" }
        ]
      }
    ]
  }
]

Advantages

  • Native nesting: Hierarchies are represented directly in JSON.

  • Efficient recursion: $graphLookup fetches descendants in one query.

  • Flexibility: Adding extra metadata is easy — just extend the document.

Limitations

  • Depth limits: Recursion depth can hit performance bottlenecks.

  • Atomic updates: Modifying deeply nested trees requires careful transaction handling.


SQL vs. NoSQL: When to Use Which?

FeatureSQLNoSQL (MongoDB)
StructureFlat rowsNested JSON
RecursionRecursive CTEs$graphLookup
Query flexibilityStrong relational joinsFlexible document fields
PerformanceJoins can slow down deep treesEfficient for nested docs
Schema evolutionRigid schemaDynamic schema
Ideal use caseFixed hierarchies (e.g., org charts)Dynamic nested data (e.g., file systems)

Conclusion

Traversing hierarchical data requires different strategies depending on your database choice:

  • SQL excels at structured, relational data but requires recursion for tree traversal.

  • NoSQL like MongoDB natively supports nested documents and recursive lookups.

Choosing the right approach depends on your use case. For dynamic, nested data, NoSQL often shines. For strongly typed, relational hierarchies, SQL is reliable.

Would you like to see how this works in your stack? Let’s build something together — whether it’s a file system, org chart, or category tree!