I was reading Jeff Atwood's blog on Behold WordPress, Destroyer of CPUs and saw that many people there considered 20 SQL Queries per page load to be a lot. What is the average amount of queries per page nowadays for a highly dynamic page with auto suggest, auto refreshing of data, customized pages, and the kitchen sink?
For a simple example, Amazon.com practically customizes my homepage with stuff they think I will buy. To me, that doesn't look it just uses 5 or less queries for the front page.
I'm still a newbie with databases so please tell me if I'm missing something obvious.
-
You can usually bring all the data in two or three big queries instead of on twenty small ones. Minimizing the amount of queries is as important as, if not most important than, writing optimal queries to maximize performace.
Of course you should always analyze the query plans and aim towards optimal queries, be them small or big.
The thing is that badly designed webpages do many queries, one per each tiny little task, which could easily be grouped in a single query.
For example, a badly designed stackoverflow could do a query to get all the question ids it will show on the main page, then do one query per each question to get the summary and the votes. Then you have easily 20 useless queries. A well designed will do a single query getting all the information about all the questions it'll display.
Of course the impact of this all is reduced with good caching, which is what all big sites do, that way you actually can do a lot of queries and still get decent performance.
JoshBerke : problem with combing lots of queries is you need to be careful how you combine code. Don't combine unrelated things otherwise you might get coupled code that becomes difficult to break apart.Vinko Vrsalovic : I do fully agree with the principle you are stating. I'd add that it depends on how unrelated things are, and on how big (and how needed) a performance gain can be achieved. You can always break things apart on code instead of on SQL if it's really necessary. -
My rule of thumb is keep the front pages down to under 5-7 if possible, depending on the type of the site.
Interior pages, depending on what they need might have more, but I do what I can to keep it under 20.
However, at the same time, depending on what you are trying to do AND what types of caching you are doing with that information 20 may not be bad if 15 of them are heavily cached...
-
The answer really depends on a few key things: - The amount of traffic of your site - The IT budget for your support - The complexity of the site and the resources required to optimize
If you have a website that gets a few hits a day, then who cares about 20 queries. On the flip side, if you are Amazon then you are going to offer the needed content at a large infrastructure cost.
Just about everyone else in the world is somewhere between those two extremes and has to balance based on their own resources.
The only other thing I'll say is caching is your friend.
-
It's more about caching.
If you're getting a high number of concurrent page views, and each page view does a lot of queries, it doesn't make a lot of sense to hit the database every. single. time. Especially when a lot of the data coming back will by semi-dynamic reference data that only changes every now and then (as opposed to session or real time data which is always changing).
You may as well cache those database results using memcached or something similar. You don't necessarily need to cache the whole page (although that is what most Wordpress caching plugins do), as this kills interactivity, but you can cache on a data-by-data basis.
There's also the issue of optimising the queries. Especially avoiding the dreaded N+1 situation where you do one query for a parent record, then an extra query for each of its children. The latency of the round trip back and forth to the database alone will kill your page rendering performance, not to mention cause grief on the DB itself.
-
If you have to do 20 queries, then so be it, but it would make me a little nervous if it were a front page.
Combining queries where possible can help, but thinking about the caching is the most important part.
I am currently upgrading a site where data that changes 5 or 6 times a year is queried thousands of times a day, using some very nasty SQL to make it into a tree, but can be held as a tree structure in about 200k of RAM. (700k of viewstate on front page too, but that's another story...) These are the kind of things that cripple web sites for no good reason.
So, there is no magic number as to how many queries you should or should not do, but think about every one of them, even if you cache some of them for only 5 minutes, that will make a huge difference if ever you hit the front page of digg.
5 minutes of caching on just 1 query could remove thousands of DB hits when your site is under stress.
-
Given that, short of using Ajax, each page is atomic, I haven't found it that difficult to generate quite complex pages in 3 or less queries. Conceptually, a typical page-set involves:
- Context info (related to session and other global state);
- Header (and related 1:0-1 joins);
- Detail (1:M from 2).
It takes some planning ahead; but on the other hand it's an easy refactoring exercise in most cases.
-
The number of queries isn't so important all the time. It's really how you handle connections. If you have connection pooling then it really doesn't matter and the physical location of the servers matters. If your servers are next to eachother in a data center setting up a connection is probably really fast. Most of the time your website spends loading if it's a database driven site is going to be spent waiting for connections to open and for the data to be fetched. Figure to open a connection it takes 100 - 300ms. So if you have to open 20 connections for each database access, that's 4 - 6 seconds just opening and closing connections.
Since Jeff Atwood is using LINQ, I'm assuming he's only opening a single connection, executing his 20 queries and then closing the connection. It all probably happens pretty quick.
Also, Jeff's database runs on the same physical machine and uses internal machine communication to communicate with the database and not a network so there really isn't any delay you'd associate with the TCP type connection opening. (He talked about this on the Hanselminutes podcast a few weeks ago.)
I have a similar configuration for one of my sites using LINQ and with the database on the same box. When I run the site on my local machine hitting the database on the server in another state, it takes up to 6 seconds to load a couple of the data heavy pages. When I run the site on the server, the page loads in less than a second because everything is local to the server.
-
20 SQL Queries per page load is "bush-league." If you really want to destroy your web server's CPU - use ExpressionEngine. A default install of ExpressionEngine 1.6.6 will use 36 queries to build the home page, 38 queries to build a blog post page and 40 queries to build a category page.
Those WordPress guys are just dabbling!
If you want to get serious about bad performance - get ExpressionEngine!
:)
0 comments:
Post a Comment