Sunday, January 23, 2011

How can I avoid an error in this .htaccess file?

I have a blog. The blog is stored under the /blog/ prefix on my website. It has the usual URLs for a blog, so articles have URLs in the format /blog/:year/:month/:day/:title/.

First and foremost, I want to automatically redirect visitors to the www subdomain (in case they leave that off), and internally rewrite the root URL to /blog/, so that the front page of the blog appears on the front page of the site. I have accomplished that with the following set of rewrite rules in my .htaccess file:

RewriteEngine On

# Rewrite monkey-robot.com to www.monkey-robot.com
RewriteCond %{HTTP_HOST} ^monkey-robot\.com$
RewriteRule ^(.*)$ http://www.monkey-robot.com/$1 [R=301,L]

RewriteRule ^$             /blog/               [L]
RewriteRule ^feeds/blog/?$ /feeds/blog/atom.xml [L]

That works fine. The problem is that the front page of the blog now appears at two distinct URLs: / and /blog/. So I'd like to redirect the /blog/ URL to the root URL. Initially I tried to accomplish this with the following set of rewrite rules:

RewriteEngine On

# Rewrite monkey-robot.com to www.monkey-robot.com
RewriteCond %{HTTP_HOST} ^monkey-robot\.com$
RewriteRule ^(.*)$ http://www.monkey-robot.com/$1 [R=301,L]

RewriteRule ^$             /blog/               [L]
RewriteRule ^blog/?$       /                    [R,L]
RewriteRule ^feeds/blog/?$ /feeds/blog/atom.xml [L]

But that gave me an infinite redirect (maybe because of the preceding rule?). So then I tried this set:

RewriteEngine On

# Rewrite monkey-robot.com to www.monkey-robot.com
RewriteCond %{HTTP_HOST} ^monkey-robot\.com$
RewriteRule ^(.*)$ http://www.monkey-robot.com/$1 [R=301,L]

RewriteRule ^$             /blog/                       [L]
RewriteRule ^blog/?$       http://www.monkey-robot.com/ [R,L]
RewriteRule ^feeds/blog/?$ /feeds/blog/atom.xml         [L]

But I got a 500 Internal Server Error with the following log message:

Invalid command '[R,L]', perhaps misspelled or defined by a module not included in the server configuration

What gives? I don't think [R,L] is a syntax error.

  • I suspect that you might be able to avoid the infinite loop in your second ruleset by putting the rewriting directives in the main virtual host configuration, not in an .htaccess file. That's generally good practice anyway, since rewriting runs a lot faster when you put it in the virtual host, and you can get further speed benefits if you can tell Apache to ignore .htaccess files altogether.

    The logic behind this is explained in the mod_rewrite technical documentation. Here's the gist: as you know, rewrite rules can change which filename a given request corresponds to, and even which directory the resulting file is in. In fact, your

    RewriteRule ^$ /blog/
    

    is a perfect example: it maps a request for the site root to the blog directory. But Apache needs to know the directory in order to figure out which .htaccess files to check for the request. Maybe you can see the problem here: by the time Apache reaches your .htaccess file, it has already translated the URL into a filename, which means it's too late for your rewriting rules to apply in the normal processing order.

    Apache internally solves this problem by creating a new subrequest, which starts processing over from the beginning, and injecting the rules from your .htaccess file into the processing stream so they will apply to the request. As a side effect of this, all applicable rewrite rules are applied again, and this means that the [L] flag is kind of a lie when you use it in a .htaccess file. So even though you put [L] on your rule to try to force it to be the last rule applied, it really isn't; all the rules are applied again when Apache processes its internal subrequest.
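
    If you do control the main configuration, the setup might look roughly like the following. This is an untested sketch: the DocumentRoot path is a placeholder that does not come from the question, and the patterns gain a leading slash because rules in virtual-host context match against the full URL path.

    ```apache
    <VirtualHost *:80>
        ServerName www.monkey-robot.com
        ServerAlias monkey-robot.com
        # Placeholder path (assumption); use the site's real document root.
        DocumentRoot /path/to/docroot

        <Directory /path/to/docroot>
            # Stop Apache from looking for .htaccess files under this tree.
            AllowOverride None
        </Directory>

        RewriteEngine On

        # Redirect monkey-robot.com to www.monkey-robot.com
        RewriteCond %{HTTP_HOST} ^monkey-robot\.com$
        RewriteRule ^/(.*)$ http://www.monkey-robot.com/$1 [R=301,L]

        # Externally redirect /blog/ to the root URL
        RewriteRule ^/blog/?$       /                    [R,L]
        # Internally serve the blog front page at the root URL
        RewriteRule ^/$             /blog/               [L]
        RewriteRule ^/feeds/blog/?$ /feeds/blog/atom.xml [L]
    </VirtualHost>
    ```

    In this context the rules run before the URL is mapped to a file, so the internal rewrite of / to /blog/ should not be fed back through the /blog/ redirect rule.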

    If you don't have access to the main server configuration, you might be able to do this:

    RewriteRule ^blog/?$     /           [R,L,NS]
    

    The NS flag prevents the rule from being applied to subrequests.
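
    If NS turns out not to stop the loop (the second pass after a per-directory rewrite may not be treated as a subrequest), another commonly used guard is to test what the client originally asked for. The sketch below is untested against this site and assumes the rules stay in the .htaccess file at the document root. It relies on the THE_REQUEST variable, which holds the original request line and is not altered by internal rewrites.

    ```apache
    RewriteEngine On

    # Redirect /blog/ to / only when the client itself asked for /blog/.
    # THE_REQUEST looks like "GET /blog/ HTTP/1.1" and keeps that value
    # even after the root URL has been internally rewritten to /blog/.
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s/blog/?[?\s]
    RewriteRule ^blog/?$ / [R,L]

    # Internally serve the blog front page at the root URL.
    RewriteRule ^$ /blog/ [L]
    ```

    When the rules are applied again after the rewrite of / to /blog/, the condition fails because the original request line still names /, so no redirect is issued.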

    mipadi : I'll give the `NS` thing a try. Unfortunately, I'm on a shared host so I can't change the main host configuration file. :(
  • After reading through the comments, I think the most straightforward solution is to rewrite the root page to the actual file (index.html, I suppose?).

    RewriteRule ^blog/$ / [R,L]
    RewriteRule ^$ /blog/index.html [L]
    

    This way looping can be avoided, because the rewritten path (blog/index.html) no longer matches the redirect rule. One problem remains, though: /blog/index.html is still accessible externally. If that's not acceptable, I think your best bet is following David's advice and moving the configuration one level up.
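
    Combined with the rest of the rules from the question, the whole .htaccess file might then look something like this. It is a sketch rather than a tested configuration, and it assumes the blog's front page really is served from /blog/index.html.

    ```apache
    RewriteEngine On

    # Redirect monkey-robot.com to www.monkey-robot.com
    RewriteCond %{HTTP_HOST} ^monkey-robot\.com$
    RewriteRule ^(.*)$ http://www.monkey-robot.com/$1 [R=301,L]

    # Externally redirect /blog/ to the root URL
    RewriteRule ^blog/?$       /                    [R,L]
    # Internally serve the blog's index file at the root URL.
    # Assumes that file is /blog/index.html.
    RewriteRule ^$             /blog/index.html     [L]
    RewriteRule ^feeds/blog/?$ /feeds/blog/atom.xml [L]
    ```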

    Some alternatives that came to my mind:

    • Leave /blog/index.html accessible and add a rel="canonical" link element pointing at / (not the best, but easy)
    • If server-side scripting (PHP, probably) is available, redirect /blog/index.php to / at the script level
