Making sense of RewriteMaps in Apache

Title: Using RewriteMap to ease site migrations
Date: 2010-05-21 20:43:11

This post outlines the approach we took when migrating content from one site to another on a recent project. The documentation currently available for this could be much clearer, and it's a problem that others working on the web are likely to run into in the near future, as people realise just how much you can do with Wordpress and move more sites to it.

As mentioned before, we had to move a fairly large site across from one content management system based around ColdFusion to a modified version of Wordpress this month, making sure we didn't break any of the links amassed over the last few years the site had been online.

If we were dealing with just a few urls, we could do this by hand using Apache's better-known RewriteRule directives, but with more than 5500 links to preserve, this approach just isn't realistic. We need a different tool for this.

Enter RewriteMaps

Fortunately, Apache RewriteMaps are designed specifically for these situations. As you'd expect from the name, they allow you to map one url to another without needing to write the same RewriteRule snippets thousands of times.

Anything that cuts repetition reduces the chances for typos to slip in, and makes things easier to maintain in the long run, so RewriteMaps are a handy tool to add to your repertoire.

How to use a RewriteMap

I find it helps to think of using a rewrite map as a four-step process:

  • create the rewritemap

  • define the rewritemap inside an Apache conf file

  • define what conditions you want to test the map against

  • define what you want to do with the return value of the rewritemap, and redirect accordingly

Now, in more detail...

Create the rewritemap

The mapping file can be as simple as a plain text file with pairs of values, separated by at least one space, like so:


/content/press		        22
/content/jobs		        23
/content/contact		    24
/content/accessibility		25

The amount of whitespace between the two columns isn't significant, so you can happily align them for readability here.

Define the RewriteMap

Now that we have a mapping file, we need to let Apache know that we'd like to use it:


    RewriteEngine on

    RewriteMap url_rewrite_map txt:/srv/html/domain.com/domain.migration.map

We've told Apache to switch on its url rewriting features (RewriteEngine on), and declared the text file as a map named url_rewrite_map, whose keys are the urls to match against.
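
One detail worth spelling out is where these two lines live. As far as the mod_rewrite documentation describes it, RewriteMap is only accepted in the main server config or inside a virtual host block, not in an .htaccess file. A minimal sketch of the placement, where the ServerName and the port are assumptions made for illustration and not taken from the original setup:

```apache
<VirtualHost *:80>
    # ServerName and port are assumed values, for illustration only.
    ServerName domain.com

    # Switch on url rewriting for this virtual host.
    RewriteEngine on

    # Declare the plain text file as a map named url_rewrite_map.
    RewriteMap url_rewrite_map txt:/srv/html/domain.com/domain.migration.map
</VirtualHost>
```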

Define what conditions you want to test the map against

Now it's time to actually use the rewritemap:


    RewriteCond ${url_rewrite_map:$1|NOT_FOUND} !NOT_FOUND 

This condition definitely needs some unpacking.

RewriteCond directives in Apache are designed as tests to see if the RewriteRule that follows should be applied, depending on whether the expression they're testing returns true. RewriteMaps work by having a value passed into them, as we do here with $1 (whatever the RewriteRule's pattern captured), but they also accept a fallback value in case the value passed in doesn't match any of the keys in the url map file declared earlier, which should explain what the NOT_FOUND is doing inside the curly braces.

So far, we're testing if we have a corresponding value for the captured pattern passed in (which would return true), and returning NOT_FOUND if there's nothing there.
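
To make the lookup concrete, here is a small worked sketch of what the test string expands to for two requests. The first path comes from the sample map file above; the second, /content/no-such, is a made-up path assumed to have no entry in the map:

```apache
# Map entry used here (from the sample map file): /content/press  22
#
#   $1 = /content/press    ->  ${url_rewrite_map:$1|NOT_FOUND}  expands to  22
#   $1 = /content/no-such  ->  ${url_rewrite_map:$1|NOT_FOUND}  expands to  NOT_FOUND
#
# The expanded string is then compared against the pattern on the right.
RewriteCond ${url_rewrite_map:$1|NOT_FOUND} !NOT_FOUND
```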

However, we only want to apply the url rewrite if we have a match in the map, and as it stands both kinds of return value would count as 'true' here. Every request would then be rewritten, including the ones we had just redirected, which creates an infinite rewriting loop. So we need to add a ! operator to check whether the value is NOT_FOUND or not, and only apply the rewrite rule if the result is !NOT_FOUND.

Yes, we really did just test for not NOT_FOUND.

I'm sorry - I have a real problem with this; it's extremely confusing to read, and utterly unintuitive, as well as just being bad English.

Sadly, this seems to have become a fairly common Apache htaccess idiom, and right now, I can't think of another way to express this that works using Apache's syntax. Any other suggestions would be gratefully accepted.
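
One possible alternative, offered as a sketch and not something that was run against the site described here: leave out the fallback value, so a missed lookup expands to an empty string, and let the condition capture whatever the map did return. The [L] flag and the $ anchors are additions for this sketch and are not part of the original rules.

```apache
# With no fallback, a miss expands to an empty string, which ^(.+)$
# cannot match, so the rule below is skipped for unmapped paths.
RewriteCond ${url_rewrite_map:$1} ^(.+)$

# %1 is the text captured by the RewriteCond above, i.e. the mapped id.
# [L] (an addition in this sketch) stops further rules being processed.
RewriteRule ^(.*)$ http://domain.com/?p=%1 [R=301,L]
```

This reads as "if the map returns something, redirect to it", and it only performs the lookup once, at the cost of relying on the slightly obscure %1 backreference.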

Define what to do with the results


    RewriteRule ^(.*) http://domain.com/?p=${url_rewrite_map:$1} [R=301]

This rule uses a regular expression to match the path of the request sent to us (that's the ^(.*) part), and then passes it through the url map. So a content item with a unique id of "34" and a previous path along the lines of "/press/importantpressrelease" would have its numeric id returned, giving us a rewritten url like "http://domain.com/?p=34".

In our case, the unique numeric id was the only constant we could rely on during migration from one system to another. Thankfully, Wordpress has some handy built-in rules of its own that can convert these numeric id requests into nice friendly urls, along the lines of "http://domain.com/press/2009/06/10/important-press-release".

We're using a permanent 301 redirect flag with this RewriteRule (that's the [R=301]), so that search engine spiders know to follow the link, retaining any search engine mojo we may have had before.
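
Putting the pieces together, the directives from the steps above can be read as one block. This is only the snippets already shown, assembled in order with comments added; it assumes they sit in server or virtual host context.

```apache
# Step 2: switch rewriting on and declare the map file.
RewriteEngine on
RewriteMap url_rewrite_map txt:/srv/html/domain.com/domain.migration.map

# Step 3: only carry on if the requested path has an entry in the map.
RewriteCond ${url_rewrite_map:$1|NOT_FOUND} !NOT_FOUND

# Step 4: redirect permanently to the id-based url that Wordpress understands.
RewriteRule ^(.*) http://domain.com/?p=${url_rewrite_map:$1} [R=301]
```

With the sample map file from earlier, a request for /content/press should come back as a 301 pointing at http://domain.com/?p=22, while a path with no entry in the map is left alone. In server or virtual host context, $1 keeps its leading slash, which is why the keys in the map file start with one.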

Conclusion

So, using the url mapping here, we now have a sufficiently fast way to preserve the integrity of old urls, without locking ourselves into this structure for the future, and without diving too deep into the Apache directive rabbit hole.



Copyright © 2020 Chris Adams