How to Build Flexible Data Models in Django with JSONField and Pydantic

Traditional relational databases have long struggled with storing semi-structured data. This inflexibility comes from the need for a predefined schema, which can require upfront planning. But in recent years, RDBMS have evolved, offering more flexibility without losing the reliability that makes them so dependable in software development.

One game-changer for me has been the addition of JSON and JSONB data types, particularly in PostgreSQL. These features allow relational databases to store semi-structured data directly, which is great for handling things like log files, sensor data, or social media posts without completely changing the schema. JSONB, for example, offers efficient storage, supports indexing, and enables advanced queries, making it a powerful tool for dynamic data models.

In this article, I’ll walk you through how Django’s JSONField (a JSON & JSONB wrapper) can be used to model semi-structured data and how you can enforce a schema on that data using Pydantic—an approach that should feel natural for a Python web developer.

Flexible type definitions

Let’s consider a system that processes payments. Its Transaction table might look like this:

from django.db import models

class Transaction(models.Model):
    # Other relevant fields...
    payment_method = models.JSONField(default=dict, null=True, blank=True)

Our focus is on the payment_method field. In a real-world system, we’ll need to support several payment methods:

  • Credit card

  • PayPal

  • Buy Now, Pay Later

  • Cryptocurrency

Our system must be adaptable to store the specific data required by each payment method while maintaining a consistent and validatable structure.

We'll use Pydantic to define precise schemas for different payment methods:

from typing import Optional
from pydantic import BaseModel, EmailStr  # EmailStr requires the "email-validator" extra

class CreditCardSchema(BaseModel):
    last_four: str
    expiry_month: int
    expiry_year: int
    cvv: str


class PayPalSchema(BaseModel):
    email: EmailStr
    account_id: str


class CryptoSchema(BaseModel):
    wallet_address: str
    network: Optional[str] = None


class BillingAddressSchema(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str
    state: Optional[str] = None


class PaymentMethodSchema(BaseModel):
    credit_card: Optional[CreditCardSchema] = None
    paypal: Optional[PayPalSchema] = None
    crypto: Optional[CryptoSchema] = None
    billing_address: Optional[BillingAddressSchema] = None
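A quick sanity check shows how these schemas behave in practice. The sketch below is self-contained, so it redefines trimmed versions of the models above (PayPal is omitted so the example doesn’t need the optional email-validator dependency); it assumes only that pydantic is installed.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


# Trimmed, self-contained versions of the schemas above.
class CreditCardSchema(BaseModel):
    last_four: str
    expiry_month: int
    expiry_year: int
    cvv: str


class PaymentMethodSchema(BaseModel):
    credit_card: Optional[CreditCardSchema] = None


# Nested dicts are coerced into the nested models during validation.
payment = PaymentMethodSchema(
    credit_card={"last_four": "4242", "expiry_month": 12, "expiry_year": 2030, "cvv": "123"}
)
print(payment.credit_card.last_four)  # 4242

# Data that does not fit the schema is rejected.
try:
    PaymentMethodSchema(
        credit_card={"last_four": "4242", "expiry_month": "soon", "expiry_year": 2030, "cvv": "123"}
    )
    rejected = False
except ValidationError:
    rejected = True
```

Note that the dict passed for credit_card is validated and converted into a full CreditCardSchema instance, so downstream code gets attribute access and type guarantees rather than a raw dict.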

This approach offers several significant benefits:

  1. Each payment method has a precise, typed structure, so malformed entries are caught before they reach the database.

  2. It’s easy to extend or modify without complex database migrations.

  3. It ensures data integrity at the model level.

To enforce a schema on our payment_method field, we leverage the Pydantic model to ensure that any data passed to the field aligns with the schema we've defined.

from typing import Optional, Mapping
from pydantic import ValidationError as PydanticValidationError
from django.core.exceptions import ValidationError

def payment_method_validator(value: Optional[dict]) -> None:
    if value is None:
        return

    if not isinstance(value, Mapping):
        raise TypeError("Payment method must be a dictionary")

    try:
        PaymentMethodSchema(**value)
    except (TypeError, PydanticValidationError) as e:
        raise ValidationError(f"Invalid payment method: {e}")

Here, we perform a few checks to make sure the data entering our validator is of the correct type so that Pydantic can validate it. We do nothing for null values, and we raise a TypeError if the value passed in is not an instance of a Mapping type, such as a dict or an OrderedDict.

When we create an instance of the Pydantic model using the value we pass into the constructor, Pydantic validates that value. It raises an exception if the value doesn't match the schema. If the structure of the value doesn't fit the defined schema for PaymentMethodSchema, Pydantic will raise a validation error. For example, if we pass an invalid email value for the email field in PayPalSchema, Pydantic will raise a validation error like this:

ValidationError: 1 validation error for PaymentMethodSchema
paypal.email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='Check me out on LinkedIn: https://linkedin.com/in/daniel-c-olah', input_type=str]

We can enforce this validation in two ways:

  1. Custom Validation Method

    During the save process, we call the validation function to ensure the payment method matches the expected schema.

     from django.db import models
    
     class Transaction(models.Model):
         # ... other fields ...
         payment_method = models.JSONField(null=True, blank=True)

         def save(self, *args, **kwargs):
             # Override save method to include custom validation
             payment_method_validator(self.payment_method)
             super().save(*args, **kwargs)
    

    While effective, this approach can become cumbersome and less idiomatic in Django. We could even replace the function with a class method that does the same thing to make the code cleaner.

  2. Using Field Validators

    This method leverages Django's built-in field validation mechanism:

     from django.db import models
    
     class Transaction(models.Model):
         # The validator is attached directly to the field
         payment_method = models.JSONField(
             help_text="Extensible payment method info.",
             validators=[payment_method_validator], 
             null=True, 
             blank=True
         )
    
         def save(self, *args, **kwargs):
             # Ensures full validation is triggered
             self.full_clean()
             super().save(*args, **kwargs)
    
    💡
    In Django, field validators are only triggered when full_clean() is explicitly called—this typically occurs when using Django Forms or calling is_valid() on DRF serializers. For more details, you can refer to the Django validator documentation.
    💡
    A more advanced approach to address this would be implementing a custom Django field that integrates Pydantic to handle both serialization and validation of JSON data internally. While this warrants a dedicated article, for now, you can explore libraries that offer ready-made solutions for this problem for example: django-pydantic-jsonfield

    For our use case, we’ll use the second approach, which feels more idiomatic to Django, especially for models or tables not directly exposed to end users.

This approach balances flexibility and control over the values stored in the payment_method field. It allows us to adapt to future changes in requirements without compromising the integrity of existing data in that field. For example, if we later added a Paystack schema, including a Paystack ID field in it would be seamless, as we wouldn't have to deal with complex database migrations.

We could even add a pay_later method in the future without any hassle. The types of fields could also change, and we wouldn't face database field migration constraints, like those encountered when migrating from integer primary keys to UUID primary keys. You can check out the complete code here to understand the concept completely.
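This kind of migration-free evolution is easy to demonstrate with plain Pydantic. In the sketch below, a new optional field (a hypothetical `memo`, used purely for illustration) is added to the crypto schema; rows stored under the old shape still validate because the new field simply defaults to None.

```python
from typing import Optional

from pydantic import BaseModel


# Original shape of the schema.
class CryptoSchemaV1(BaseModel):
    wallet_address: str
    network: Optional[str] = None


# A later revision adds an optional field. Existing JSON rows are
# unaffected: no database migration, no backfill required.
class CryptoSchemaV2(BaseModel):
    wallet_address: str
    network: Optional[str] = None
    memo: Optional[str] = None  # hypothetical new field


old_row = {"wallet_address": "0xabc123"}  # data stored before the change
upgraded = CryptoSchemaV2(**old_row)
print(upgraded.memo)  # None
```

The only caveat is that new fields must be optional (or carry defaults) if historical rows are expected to keep validating.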

Denormalization

Denormalization involves the deliberate duplication of data across multiple documents or collections to optimize for performance and scalability. This approach contrasts with the strict normalization used in traditional relational databases, and NoSQL databases have been instrumental in popularizing denormalization by introducing flexible, document-oriented storage paradigms.

Consider an e-commerce scenario with separate tables for products and orders. When a customer places an order, it’s essential to capture a snapshot of the product details included in the cart. Rather than referencing the current product records, which could change over time due to updates or deletions, we store the product information directly within the order. This ensures that the order retains its original context and integrity, reflecting the exact state of the products at the time of purchase. Denormalization plays a crucial role in achieving this consistency.
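Before any Django or Pydantic machinery, the snapshot idea can be shown with plain dictionaries (the field names here are illustrative): the order copies the product data it needs at purchase time, so later catalogue edits don’t rewrite history.

```python
import copy

# Live catalogue record - free to change after the sale.
product = {"id": 1, "name": "Mechanical keyboard", "price": "89.00"}

# At checkout, the order stores its own frozen copy of the product.
order = {"id": 1001, "products": [copy.deepcopy(product)]}

# A later price change does not alter the order's snapshot.
product["price"] = "99.00"
print(order["products"][0]["price"])  # 89.00
```

The deep copy matters: storing a reference to the live product dict (or a foreign key to the live row) would let subsequent updates leak into historical orders.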

One possible approach might involve duplicating some product fields in the orders table. However, this method can introduce scalability challenges and compromise the cohesion of the order schema. A more effective solution is to serialize the relevant product fields into a JSON structure, allowing the order to maintain a self-contained record of the products without relying on external queries. The following code illustrates this technique:

from uuid import UUID
from typing import Optional, Any
from pydantic import constr, BaseModel, ValidationError as PydanticValidationError
from django.db import models
from django.core.exceptions import ValidationError

class FrozenProductSchema(BaseModel):
    id: UUID
    type: Any
    name: constr(min_length=2, strict=True)
    description: str = ""
    pricing: dict

class FrozenProductsSchema(BaseModel):
    products: list[FrozenProductSchema]

def frozen_products_validator(value: Optional[list[dict]]) -> None:
    if not value:
        return
    try:
        FrozenProductsSchema(products=value)
    except PydanticValidationError as e:
        raise ValidationError(e.errors())

class Order(models.Model):
    # Other fields
    products = models.JSONField(
        help_text="Validated snapshot of product details at time of order",
        validators=[frozen_products_validator],
        null=True,
        blank=True,
    )

Since we’ve covered most of the concepts in the previous section, you should begin to appreciate Pydantic’s role in all of this. In the example above, we use Pydantic to validate a list of products linked to an order. By defining a schema for the product structure, Pydantic ensures that every product added to the order meets the expected requirements. If the data provided does not conform to the schema, Pydantic raises a validation error.
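The validation step can be exercised on its own, outside of Django. The self-contained sketch below uses a trimmed version of the frozen-product schema (with Field constraints in place of constr, and no Django dependency) to show a snapshot passing and a malformed one being rejected; it assumes only that pydantic is installed.

```python
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


# Trimmed, self-contained version of the frozen-product schema.
class FrozenProductSchema(BaseModel):
    id: UUID
    name: str = Field(min_length=2)
    description: str = ""
    pricing: dict


class FrozenProductsSchema(BaseModel):
    products: list[FrozenProductSchema]


# A well-formed snapshot validates; the UUID string is coerced to a UUID.
snapshot = [{"id": str(uuid4()), "name": "Keyboard", "pricing": {"amount": "89.00"}}]
validated = FrozenProductsSchema(products=snapshot)
print(len(validated.products))  # 1

# A malformed snapshot (bad id, name too short) is rejected.
try:
    FrozenProductsSchema(products=[{"id": "not-a-uuid", "name": "K", "pricing": {}}])
    rejected = False
except ValidationError:
    rejected = True
```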

Querying JSONField in Django

We can query JSONField keys the same way we perform lookups on regular Django fields. Here are a few examples based on our use case.

# Find transactions with credit card payments
credit_card_transactions = Transaction.objects.filter(
    payment_method__credit_card__isnull=False
)

# Find transactions from a specific country in billing address
us_transactions = Transaction.objects.filter(
    payment_method__billing_address__country='USA'
)

# Complex nested filtering
complex_filter = Transaction.objects.filter(
    payment_method__credit_card__last_four__startswith='4111',
    payment_method__billing_address__city='New York'
)

# Orders where any product has id = 1
orders = Order.objects.filter(products__contains=[{"id": 1}])
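On PostgreSQL, that last contains lookup maps to the JSONB containment operator (@>), which matches partially: the stored list must hold at least one object carrying the given keys and values, regardless of any extra keys. A rough in-Python analogue of that containment rule (illustrative only, not Django's implementation):

```python
def jsonb_contains(doc, sub):
    # Rough analogue of PostgreSQL's JSONB @> containment operator:
    # every key/value in `sub` must appear (recursively) in `doc`.
    if isinstance(sub, dict):
        return isinstance(doc, dict) and all(
            key in doc and jsonb_contains(doc[key], val) for key, val in sub.items()
        )
    if isinstance(sub, list):
        return isinstance(doc, list) and all(
            any(jsonb_contains(item, s) for item in doc) for s in sub
        )
    return doc == sub


products = [{"id": 1, "name": "Keyboard"}, {"id": 2, "name": "Mouse"}]
print(jsonb_contains(products, [{"id": 1}]))  # True: partial match on extra keys
print(jsonb_contains(products, [{"id": 3}]))  # False: no product with id 3
```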

You can check out the documentation to learn more about filtering JSON fields.

Conclusion

Using JSON and JSONB in PostgreSQL provides great flexibility for working with semi-structured data in relational databases. Tools like Pydantic and Django’s JSONField help enforce rules for data structure, making it easier to maintain accuracy and adapt to changes. However, this flexibility needs to be used carefully. Without proper planning, it can lead to slower performance or unnecessary complexity as your data changes over time.

Denormalization is another useful strategy, allowing you to optimize specific tasks by combining structured and unstructured data. Together, these features make relational databases more versatile and powerful. But they aren’t magic solutions. Using them effectively requires clear documentation, careful design, and a good understanding of how your data might evolve.