Thursday, March 3, 2011

Optimum web folder structure for ~250,000 images

Hello

I will have around 200,000 images as part of my website. Each image will be stored 3 times: full size, thumbnail, larger thumbnail. Full-size images are around 50 KB to 500 KB.

Normal tech: Linux, Apache, MySQL, PHP on a VPS.

What is the optimum way to store these for fast retrieval and display via a browser?

Should I store everything in a single folder? Should I store the full-size images in one folder, the thumbnails in another, and so on? Should I store the images in folders of 1000, and keep an index of which folder each image is in?

Thanks for any advice. Albert.

From stackoverflow
  • It depends on how you index them and how you intend to retrieve them.

    There's nothing particularly against storing them all in a single folder, but it becomes difficult to manage. If you're storing them by filename, and the filenames are reasonably evenly distributed, you might want to have subfolders separated by the first letter of the name, etc. If you're indexing by date added, you may want to segregate them by that.

    As far as I know, there's no "faster" or "slower" way to store the images for browser retrieval.

    MrChrister : Would it be beneficial to store the small thumbnails in the database?
  • I'd use a split directory structure, three or four levels deep. The idea is to spread the files evenly across many directories, mainly for easy maintenance and fast access.

    How to do it? There are various alternatives:

    • Taking the first characters of the image names
    • Taking the first characters of a hash of the name
    • Taking the last digits of the Unix timestamp (seconds since 1970) of when the picture was added
    • Taking the last characters of the image's ID in a database (if one exists)

    Let's suppose we have IMG8993_full.jpg, IMG8993_thumb.jpg, IMG8993_smallthumb.jpg

    Then we could have, for example:

    /images/I/M/G/8/IMG8993:
    IMG8993_full.jpg
    IMG8993_thumb.jpg
    IMG8993_smallthumb.jpg
    
    Parand : I'm guessing the image namespace will be particularly crowded around a few common prefixes (e.g. IMG, DSC, etc.). It may be better to use a hash of the name instead of the name itself for splitting the directories.
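
A minimal sketch of the hash-based split suggested above, written in Python for brevity (the same idea ports directly to PHP). The function name `image_dir`, the choice of MD5, and the defaults of three levels with one hex character each are assumptions for illustration, not something from the answers. With those defaults there are 16^3 = 4,096 buckets, so 200,000 images average about 50 image folders per bucket.

```python
import hashlib
from pathlib import PurePosixPath


def image_dir(image_name, root="/images", levels=3, width=1):
    """Return the directory that holds all size variants of one image.

    The leading characters of an MD5 hex digest of the name pick the
    subdirectories, so common prefixes such as IMG or DSC do not pile
    up in the same bucket. MD5 is used only to spread names, not for
    security.
    """
    digest = hashlib.md5(image_name.encode("utf-8")).hexdigest()
    # Slice the digest into `levels` chunks of `width` hex characters.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, image_name)


if __name__ == "__main__":
    folder = image_dir("IMG8993")
    for suffix in ("full", "thumb", "smallthumb"):
        print(folder / f"IMG8993_{suffix}.jpg")
```

Because the path is computed from the name alone, no separate index of "which folder holds which image" is needed; the same function is used when writing and when reading.
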
  • Whatever you do, ensure that directory indexing is enabled on the filesystem (choose a filesystem that supports it; most modern ones do).

    In practice on, say, ext3, this isn't a problem, as it's enabled by default on newer systems. You can find out by using tune2fs (see its man page).
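
One way to check this, sketched in Python under some assumptions: `tune2fs -l` prints a "Filesystem features:" line, and directory indexing on ext3/ext4 appears there as the `dir_index` feature. The helper names and the device path `/dev/sda1` are hypothetical, and running `tune2fs` normally requires root.

```python
import subprocess


def has_dir_index(tune2fs_output):
    """Return True if `tune2fs -l` output lists the dir_index feature."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            features = line.split(":", 1)[1].split()
            return "dir_index" in features
    return False


def device_has_dir_index(device):
    """Run `tune2fs -l` on a device (usually needs root) and inspect it."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True, text=True, check=True,
    )
    return has_dir_index(result.stdout)


if __name__ == "__main__":
    # Hypothetical device path; substitute the partition that holds the images.
    print(device_has_dir_index("/dev/sda1"))
```
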

  • With those kinds of numbers you may or may not run into an inode limit set on your server. That could be problematic depending upon who controls that box.

    In general, I would come up with some scheme to split them up into more manageable sizes. Even running ls on a directory that size would take ages to sort and display all of it.
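
A rough way to see whether the inode budget is a concern, sketched in Python with `os.statvfs`. The function names and the figure of 4,096 directories are assumptions for illustration; the 200,000 images and 3 variants come from the question. Note that some filesystems report 0 for the inode counters, in which case the result is not meaningful.

```python
import os


def estimate_inodes(images, variants, directories):
    """Each stored file and each directory consumes one inode."""
    return images * variants + directories


def inode_headroom(path, inodes_needed):
    """Free inodes available to unprivileged users, minus what is needed.

    A negative result means the filesystem cannot hold the files as-is.
    """
    stats = os.statvfs(path)
    return stats.f_favail - inodes_needed


if __name__ == "__main__":
    needed = estimate_inodes(images=200_000, variants=3, directories=4_096)
    print("inodes needed:", needed)
    print("headroom on /:", inode_headroom("/", needed))
```
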

  • Unless your users are going to an open folder with a directory listing of your images, I don't think folder structure will significantly increase or decrease retrieval speeds for your users. As other people have said, make sure indexing is turned on. However, if I were you, I'd look into writing (or copying and pasting) a service that dynamically serves the images, rather than storing them directly in your web file structure. Look into using LibGD within PHP -- it should be preinstalled on most LAMP servers.

    Disadvantages:

    • Serving the images via a service will be a tad slower than providing direct links
    • If you use a backend image store, such as a database, it could crash and render all of your images temporarily unavailable

    Advantages:

    • You'll save storage space by dynamically resizing the images to thumbnails, and make maintenance easier
    • Generally, processor speed is cheaper than storage space

    Using URL rewriting, you can even turn ugly URLs such as

    /imageServer.php?userID=12345&imageId=67890&size=full
    

    into something sleeker and more transparent to your users:

    /jeremyZX/images/myPhoto.jpg
    /jeremyZX/images/tn/myPhoto.jpg
    

    This will give the appearance of an entire directory structure of images, whereas they're really stored in whatever backend format you'd like.
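
The mapping such a rewrite rule performs can be sketched as follows, in Python rather than as Apache configuration. The pattern, the function name `parse_image_url`, and the reading of `tn` as "thumbnail" are assumptions based on the two example URLs above; how the user and file names translate into backend IDs is left out because it depends on the image store.

```python
import re

# /<user>/images/<file>      -> full-size image
# /<user>/images/tn/<file>   -> thumbnail
PRETTY_URL = re.compile(
    r"^/(?P<user>[^/]+)/images/(?:(?P<size>tn)/)?(?P<name>[^/]+)$"
)


def parse_image_url(path):
    """Split a pretty image URL into the parameters a serving script needs.

    Returns None when the path does not look like an image URL.
    """
    match = PRETTY_URL.match(path)
    if match is None:
        return None
    return {
        "user": match.group("user"),
        "size": "thumb" if match.group("size") else "full",
        "name": match.group("name"),
    }


if __name__ == "__main__":
    print(parse_image_url("/jeremyZX/images/myPhoto.jpg"))
    print(parse_image_url("/jeremyZX/images/tn/myPhoto.jpg"))
```
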
