I'm pretty new to databases, so forgive me if this is a silly question.
In modern databases, if I use an index to access a row, I believe this will be O(1) complexity. But if I do a query to select another column, will it be O(1) or O(n)? Does the database have to iterate through all the rows, or does it build a sorted list for each column?
-
You have indexes. Clustered indexes are physically sorted on the disk, you can have only one per table. Unclustered indexes are logically sorted and you can have many of those (careful not to abuse it either, it might slow down write actions). If there is no index on your column then I believe it's the good old row by row method.
-
Actually, I think access based on an index will be O(log(n)), because you'll still be searching down through a B-Tree-esque organization to get to your record.
David Schmitt : Except for a hash index, where it is O(bucket-chain-length) -
Indexes are per column, so if you use a where clause on a un-indexed column it will do a so called tablescan which is O(n).
-
To answer your literal question, yes if there is no index on a column, the database engine will have to look at all rows.
In the more interesting case of selecting by multiple columns, both with and without index, the situation becomes more complex: If the Query Optimizer chooses to use the index, then it'll first select rows based on the index and then apply a filter with the remaining constraints. Thus reducing the second filtering operation from O(number of rows) to O(number of selected rows by index). The ratio between these two number is called selectivity and an important statistic when choosing which index to use.
-
There are different types of indexes, different execution plans and different implementations for different databases. Most of the code of relations database is in search-optimising algorithms. There is not a single answer to your question. You can use a tool to visualise the execution plan when you want to know how a query is going to be executed.
Javier : true, but still a good approximation (and what he's looking for) is: O(log(n)) when indexed and O(n) when notPaco : That's true, but indexes are not always the most limiting factor in queries. In some cases you might not notice the difference between using an index or not. -
I don't know the answer, but keep in mind that big-O notation only gives you an indication of performance for data-set sizes which are arbitrarily large.
For example, the bottleneck for database performance is typically disk seeks. Therefore, performance is greatly increased if the working data-set can be kept in memory. Big-O notation won't tell you anything about such optimizations, because they are only relevant for finite data-sets.
0 comments:
Post a Comment