Friday, April 29, 2011

Map raw SQL to multiple related Django models

For performance reasons I can't use Django's ORM query methods, and I have to use raw SQL for some complex queries. I want to find a way to map the results of a SQL query to several models.

I know I can use the following statement to map the query results to a single model, but I can't figure out how to extend it to map related models as well (as I can with select_related in the Django ORM).

model_instance = MyModel(**dict(zip(field_names, row_data)))

Is there a relatively easy way to map the fields of related tables that are also in the query result set?

From stackoverflow
  • First, can you prove the ORM is what's hurting your performance? Sometimes performance problems are simply poor database design or improper indexes. Usually this comes from trying to force-fit Django's ORM onto a legacy database design. Stored procedures and triggers can have an adverse impact on performance -- especially when working with Django, where the trigger code is expected to be in the Python model code.

    Sometimes poor performance is an application issue. This includes needless order-by operations being done in the database.

    The most common performance problem is an application that "over-fetches" data: casually calling the .all() method and building large in-memory collections will crush performance. Django query sets should be touched as little as possible, so that the query set iterator itself is handed to the template for display.

    Once you choose to bypass the ORM, you have to fight out the Object-Relational Impedance Mismatch problem. Again. Specifically, relational "navigation" has no concept of "related": it has to be a first-class fetch of a relational set using foreign keys. To assemble a complex in-memory object model via SQL is simply hard. Circular references make this very hard; resolving FK's into collections is hard.
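    To give a feel for the assembly work involved, here is a minimal sketch of splitting one joined row into two related objects. It uses the standard-library sqlite3 module and plain dataclasses as stand-ins for Django models; the Author and Book classes, the table layout, and the rows_to_books helper are all hypothetical. The same zip-and-slice idea should carry over to real models, but that is an assumption, not something tested here.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:  # hypothetical stand-in for a Django model
    id: int
    name: str


@dataclass
class Book:  # hypothetical stand-in for a model with a foreign key
    id: int
    title: str
    author_id: int
    author: Optional[Author] = None


# Column order must match the SELECT list below: book columns, then author columns.
BOOK_FIELDS = ["id", "title", "author_id"]
AUTHOR_FIELDS = ["id", "name"]


def rows_to_books(rows):
    """Build Book objects with their Author attached from joined rows."""
    authors = {}  # primary key -> Author, so repeated authors share one instance
    books = []
    split = len(BOOK_FIELDS)
    for row in rows:
        book = Book(**dict(zip(BOOK_FIELDS, row[:split])))
        author_data = dict(zip(AUTHOR_FIELDS, row[split:]))
        if author_data["id"] not in authors:
            authors[author_data["id"]] = Author(**author_data)
        book.author = authors[author_data["id"]]
        books.append(book)
    return books


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann');
    INSERT INTO author VALUES (2, 'Bob');
    INSERT INTO book VALUES (10, 'First', 1);
    INSERT INTO book VALUES (11, 'Second', 1);
    INSERT INTO book VALUES (12, 'Third', 2);
""")

rows = conn.execute("""
    SELECT b.id, b.title, b.author_id, a.id, a.name
    FROM book b JOIN author a ON a.id = b.author_id
    ORDER BY b.id
""").fetchall()

books = rows_to_books(rows)
```

    Even in this tiny case you have to track column order by hand and deduplicate the related objects yourself; reverse relations and circular references add more bookkeeping on top.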

    If you're going to use raw SQL, you have two choices.

    1. Eschew "select related" -- it doesn't exist -- and it's painful to implement.

    2. Invent your own ORM-like "select related" features. A common approach is to add stateful getters that (a) check a private cache to see whether the related object has already been fetched and, (b) if it hasn't, fetch the related object from the database and update the cache.
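    A stateful getter of the kind described in option 2 might look roughly like this. It is only a sketch: the Book class, the fetch_author helper, and the query_log list (used here just to show how many queries ran) are hypothetical, and sqlite3 stands in for whatever database connection you actually use.

```python
import sqlite3

query_log = []  # records each related-object query, for illustration only


def fetch_author(conn, author_id):
    """Fetch one author row as a dict (hypothetical helper)."""
    query_log.append(author_id)
    row = conn.execute(
        "SELECT id, name FROM author WHERE id = ?", (author_id,)
    ).fetchone()
    if row is None:
        raise LookupError("no author with id %r" % (author_id,))
    return dict(zip(("id", "name"), row))


class Book:
    def __init__(self, conn, id, title, author_id):
        self._conn = conn
        self.id = id
        self.title = title
        self.author_id = author_id
        self._author_cache = None  # private cache for the related object

    @property
    def author(self):
        # (a) check the private cache; (b) on a miss, fetch and remember.
        if self._author_cache is None:
            self._author_cache = fetch_author(self._conn, self.author_id)
        return self._author_cache


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO author VALUES (1, 'Ann');
""")

book = Book(conn, 10, "First", 1)
first = book.author   # runs the query and fills the cache
second = book.author  # served from the cache, no second query
```

    Note that this fetches lazily, one query per related object, which is the opposite of what select_related does; getting the eager, single-query behaviour back is the painful part.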

    In the process of inventing your own stateful getters, you'll be reinventing Django's, and you'll probably discover that the problem isn't the ORM layer but a database design or application design issue.

    Michael : The performance problem is due to the way I had to go around some limitations in the ORM itself. The database design is good (no legacy database). Maybe I should ask if there is a simpler way to write the query with Django. In SQL the query is really simple. But that would be a topic on its own. ;-)
    S.Lott : That's my point -- get the Django ORM query to actually work in the Django ORM and everything will be better. Whatever the "limitations" are, it may be a simple misunderstanding or an application design issue that can be fixed.

