Sunday, March 6, 2011

DB Design: more tables vs less tables

Say I want to design a database for a community site with blogs, photos, forums etc., one way to do this is to single out the concept of a "post", as a blog entry, a blog comment, a photo, a photo comment, a forum post all can be thought as a post. So, I could potentially have one table named Post [PostID, PostType, Title, Body .... ], the PostType will tell what type of post it is.

Or I could design this whole thing with more tables, BlogPost, PhotoPost, ForumPost, and I'll leave Comment just it's own table with a CommentType column.

Or have a Post table for all types of post, but have a separate Comment table.

To be complete I'm using ADO.NET Entity Framework to implement my DAL.

Now the question what are some of the implications if I go with any route described above that will influence on my DB performance and manageability, middle tier design and code cleaness, EF performance etc.?

Thank you very much!

Ray.

From stackoverflow
  • Let me ask you this:

    What happens if two years from now you decide to add a 'music post' as a blog type? Do you have to create a new table for MusicPost, and then re-code your application to integrate it? Or would you rather log on to your blog admin panel, add a blog type in a drop-down box called 'Music', and be on your merry way?

    In this case, less tables!

    Kev : Wouldn't you have to add more specific code, e.g., to have a music player object embed in the page? But generally I agree.
    cLFlaVA : Potentially, yes. However, the modification you mention would still be far less work than revisiting the entire database structure.
  • The problem is similar to the question of how deep your hierarchy should be in an OO design.

    A simple approach in OO terms would be to have a base class Post and children for BlogPost, ForumPost and so on. Comment could either be a child of Post or its own hierarchy, depending on your requirements.

    Then how this is going to be mapped to DB tables is an entirely different question. This classical essay by Scott Ambler deals with the different mapping strategies and explains their advantages and disadvantages in a rather detailed way.

    Tony Andrews : Essays by Scott Ambler on database design always make me queasy - but I'm not down-voting you for it!
    Joachim Sauer : @Tony: really? is there a background to that feeling? I'd be interested in that.
  • Generally, life will be easier if you can have all the posts in one table:

    • less joins to perform
    • less tables to maintain
    • common attributes are not repeated between tables
    • code more generic

    However, you could run into some issues:

    • if each subtype has a lot of its own attributes, you could end up with many columns - maybe too many for your DBMS
    • if a subtype has an attribute (e.g. a stored picture) that is expensive for your DBMS to maintain even when unused, you might not want that column in all rows

    Should you run unto such an issue, you can create a new table just for the specific attributes of that post subtype - for example:

    create table posts (post_id number primary key, 
                        post_date date,
                        post_title ...); /* All the common attributes */
    
    create table photo_post (post_id references posts, photograph ...);
    

    In many cases, no such issues arise and a single table for all will suffice.

    I can't think of any merit in creating a distinct table for every subtype.

0 comments:

Post a Comment