Making sense of RewriteMaps in Apache
Title: Using RewriteMap to ease site migrations Date: 2010-05-21 20:43:11
This post outlines the approach we took when migrating content from one site to another on a recent project. The documentation currently available for this could be much clearer, and it's a problem that others working on the web are likely to run into in the near future, as people realise just how much you can do with Wordpress and move more sites to it.
As mentioned before, this month we had to move a fairly large site from a content management system based around ColdFusion to a modified version of Wordpress, making sure we didn't break any of the links amassed over the years the site had been online.
If we were dealing with just a few urls, we could do this by hand using Apache's better-known RewriteRule directives, but with more than 5500 links to preserve, this approach just isn't realistic. We need a different tool for this.
Enter RewriteMaps
Fortunately, Apache RewriteMaps are designed specifically for these situations. As you'd expect from the name, they allow you to map one url to another without needing to write the same RewriteRule snippets thousands of times.
Anything that cuts repetition reduces the chances for typos to slip in, and makes things easier to maintain in the long run, so RewriteMaps are a handy tool to add to your repertoire.
How to use a RewriteMap
I find it helps to think of using a rewrite map as a four-step process:
1. create the rewritemap
2. define the rewritemap inside an apache conf file
3. define what conditions you want to test the map against
4. define what you want to do with the return value of the rewritemap, and redirect accordingly
Now, in more detail.
Create the rewritemap
The mapping file can be as simple as a plain text file with pairs of values, separated by at least one space, like so:
/content/press 22
/content/jobs 23
/content/contact 24
/content/accessibility 25
The amount of whitespace between the two values isn't significant, so you can happily line the columns up for readability here.
Define the RewriteMap
Now that we have a mapping file, we need to let Apache know that we'd like to use it:
RewriteEngine on
RewriteMap url_rewrite_map txt:/srv/html/domain.com/domain.migration.map
We've told Apache to switch on its url rewriting features (RewriteEngine On), and declared the text file as a map named url_rewrite_map, whose entries we'll match requests against.
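One detail worth flagging: the RewriteMap directive is only accepted in the main server configuration or inside a virtual host, so it can't be declared from an .htaccess file. Below is a minimal sketch of where the two lines might live, assuming a virtual host for domain.com. The ServerName value and the surrounding block are illustrative assumptions, not our actual setup.

```apache
<VirtualHost *:80>
    # Hypothetical host name, used here only for illustration
    ServerName domain.com

    # Switch on mod_rewrite for this virtual host
    RewriteEngine On

    # Declare a plain-text map called url_rewrite_map, backed by the mapping file
    RewriteMap url_rewrite_map txt:/srv/html/domain.com/domain.migration.map
</VirtualHost>
```

Apache caches lookups from text maps and should re-read the file when its modification time changes, so editing the map ought not to need a restart.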
Define what conditions you want to test the map against
Now it's time to actually use the rewritemap:
RewriteCond ${url_rewrite_map:$1|NOT_FOUND} !NOT_FOUND
This condition definitely needs some unpacking.
RewriteCond directives in Apache are tests that decide whether the RewriteRule that follows should be applied, depending on whether the string they're testing matches a pattern. A RewriteMap lookup works by passing a key into the map, as we do here with $1 (the text captured by the pattern of the RewriteRule that follows). The lookup also accepts a fallback value in case the key doesn't match any of the entries in the url map file declared earlier, which should explain what the NOT_FOUND is doing inside the curly braces.
So far, the lookup gives us the corresponding value for the captured path if there is one, and NOT_FOUND if there's nothing there.
However, we only want to apply the url rewrite if we have a match in the map, and as it stands we'd get a value back either way. Redirecting every request regardless would create an infinite rewriting loop, so we add a ! operator to check the result against NOT_FOUND, and only apply the rule if the result is !NOT_FOUND.
Yes, we really did just test for not NOT_FOUND.
I'm sorry - I have a real problem with this; it's extremely confusing to read and utterly unintuitive, as well as just being bad English.
Sadly, this seems to have become a fairly common Apache htaccess idiom, and right now, I can't think of another way to express this that works using Apache's syntax. Any other suggestions would be gratefully accepted.
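For what it's worth, one alternative that seems to read a little better is to drop the default value entirely and test whether the lookup returned anything at all. This is a sketch rather than something we ran on the migrated site. It relies on a failed lookup returning an empty string when no default is given, and on %1 referring back to the group captured by the RewriteCond. It is shown paired with the RewriteRule covered in the next step.

```apache
# With no default value, a failed lookup yields an empty string, so this
# condition only passes when the map returned something for the path in $1.
# The parentheses capture the returned id so the rule can reuse it as %1.
RewriteCond ${url_rewrite_map:$1} ^(.+)$

# %1 is the id captured by the condition above, so the lookup is written only once
RewriteRule ^(.*) http://domain.com/?p=%1 [R=301]
```

The condition now reads as "the map returned something", which avoids the double negative at the cost of introducing the %1 back-reference.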
Define what to do with the results
RewriteRule ^(.*) http://domain.com/?p=${url_rewrite_map:$1} [R=301]
This rule uses a regular expression to capture the path of the request sent to us (that's the ^(.*) part), and then passes it through the url map. So a content item with a unique id of "34" and a previous path along the lines of "/press/importantpressrelease" would have its numeric id returned, giving us a rewritten url like "http://domain.com/?p=34".
In our case, the unique numeric id was the only constant we could rely on during the migration from one system to another; thankfully Wordpress has some handy built-in rules of its own that can convert these numeric id requests into nice friendly urls, along the lines of "http://domain.com/press/2009/06/10/important-press-release".
We're using a permanent 301 redirect flag with this RewriteRule (that's the [R=301]), so that search engine spiders know to follow the link, retaining any search engine mojo we may have had before.
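Putting the steps together, the directives from this post might sit in the server or virtual host configuration roughly as follows. This is a sketch that assembles the lines shown earlier; the comments and the worked example use the /content/press entry from the sample map file.

```apache
RewriteEngine On

# Step 2: declare the map file created in step 1
RewriteMap url_rewrite_map txt:/srv/html/domain.com/domain.migration.map

# Step 3: look up the path captured by the rule below ($1), and
# skip the rule when the lookup falls back to NOT_FOUND
RewriteCond ${url_rewrite_map:$1|NOT_FOUND} !NOT_FOUND

# Step 4: redirect permanently to the id-based url,
# e.g. /content/press becomes http://domain.com/?p=22
RewriteRule ^(.*) http://domain.com/?p=${url_rewrite_map:$1} [R=301]
```

A request for a path that isn't in the map, including the redirected "/" itself, fails the condition and is left alone, which is what keeps the redirect from looping.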
Conclusion
So, using the url mapping here, we now have a sufficiently fast way to preserve the integrity of old urls, without locking ourselves into this structure for the future, and without diving too deep into the Apache directive rabbit hole.