Monday, April 11, 2011

Is Full Text search the answer?

OK I have a mySQL Database that looks something like this

ID - an int and the unique ID of the recorded

Title - The name of the item

Description - The items description

I want to search both title and description of key words, currently I'm using.

SELECT * From ‘item’ where title LIKE %key%

And this works and as there’s not much in the database, as however searching for “this key” doesn’t find “this that key” I want to improve the search engine of the site, and may be even add some kind of ranking system to it (but that’s a long time away).

So to the question, I’ve heard about something called “Full text search” it is (as far as I can tell) a staple of database design, but being a Newby to this subject I know nothing about it so…

1) Do you think it would be useful?

And an additional questron…

2) What can I read about database design / search engine design that will point me in the right direction.

If it’s of relevance the site is currently written in stright PHP (I.E. without a framework) (thro the thought of converting it to Ruby on Rails has crossed my mind)

update

Thanks all, I'll go for Fulltext search. And for any one finding this later, I found a good tutorial on fulltext search as well.

From stackoverflow
  • The problem with the '%keyword%' type search is that there is no way to efficiently search on it in a regular table, even if you create an index on that column. Think about how you would look that string up in the phone book. There is actually no way to optimize it - you have to scan the entire phone book - and that is what MySQL does, a full table scan.

    If you change that search to 'keyword%' and use an index, you can get very fast searching. It sounds like this is not what you want, though.

    So with that in mind, I have used fulltext indexing/searching quite a bit, and here are a few pros and cons:

    Pros

    • Very fast
    • Returns results sorted by relevance (by default, although you can use any sorting)
    • Stop words can be used.

    Cons

    • Only works with MyISAM tables
    • Words that are too short are ignored (default minimum is 4 letters)
    • Requires different SQL in where clause, so you will need to modify existing queries.
    • Does not match partial strings (for example, 'word' does not match 'keyword', only 'word')

    Here is some good documentation on full-text searching.

    Another option is to use a searching system such as Sphinx. It can be extremely fast and flexible. It is optimized for searching and integrates well with MySQL.

    Scott Herbert : Thanks, I thorght about useing something like Sphinx, but to be honest, unless it's going to take a huge amount of time, I'd rather build it myself that way I understand how it works.
    jonstjohn : we're getting close to implementing sphinx in one of our applications. we do heavy searching on a ton of data. we used fulltext searching for a while, but needed to move to innodb for transactions. But, fulltext was easy to implement and really fast.
  • You might also consider Zend_Lucene. It's slightly easier to integrate than Sphinx, because it is pure PHP.

    deresh : wouldn't recomend that because it's prety slow
  • I would guess that MySQL fulltext is sufficient for your needs, but it's worth noting that the built in support doesn't scale very well. For average size documents it starts to become unusable for table sizes as small as a few hundred thousand rows. If you think that this might become a problem further on you should probably look into Sphinx already. It's becoming the defacto standard for MYSQL-users, even though I personally prefer to implement my own solution using java lucene. :)

    Also, I'd like to mention that full text search is fundamentally different from the standard LIKE '%keyword%'-search. Unlike the LIKE-search full text indexing allows you to search for several keywords that doesn't have to appear right next to each other. Standard search engines such as google are full text search engines, for example.

0 comments:

Post a Comment