Mod_Rewrite: Facts and Information
mod_rewrite is one of the most powerful and least understood modules available for Apache. Understanding when not to use it is at least as important as knowing how to use it. Throughout this book, we'll show alternate ways to solve problems, when appropriate, using methods other than mod_rewrite.
Mod_Rewrite: Tutorial and Course
mod_rewrite, frequently called the "Swiss Army Knife" of URL manipulation, is one of the most popular, and least understood, modules in the Apache Web Server's bag of tricks. In this Tutorial and Course for mod_rewrite, we'll discuss what it is, why it's necessary, and the basics of using it.
For many people, mod_rewrite rules, and regular expressions in general, are magical incantations that they mutter over their website to make it do wondrous things. If the results are not quite what they wanted, they'll add a pinch of this and a smidgen of that, in the hopes that doing so will nudge it in the right direction.
The goal of this Tutorial and Course for mod_rewrite is to help you reach the point where crafting a rewrite rule set is a scientific process with predictable results. You'll know what difference a particular change will make, and by reading a rule that has been handed to you, you'll be able to determine what it will do, or why it's not doing what it's supposed to do.
Mod_Rewrite: When To Use Mod_Rewrite
mod_rewrite rewrites and redirects URLs dynamically, using powerful pattern matching to handle even very complex situations.
It is difficult to give a better definition than that, largely because the uses of mod_rewrite are almost as numerous as the people who use it. There are, however, a few very common uses, and I aim to cover most of them in the examples in this Tutorial and Course for mod_rewrite.
The uses of mod_rewrite tend to fall into a few broad categories, as described in the following sections.
When To Use Mod_Rewrite: SEO-Friendly URLs
Perhaps the most common use of mod_rewrite is to make ugly URLs more attractive and SEO-friendly. People want this for various reasons, but mostly it's so that the URL is easier to type, easier to remember, easier to tell someone over the phone, and easier to put into print; in short, easier.
There are also people who believe that URLs that do not contain question marks, ampersands, and other "special characters" will necessarily appear higher in the rankings on search engines. This is, for the most part, completely untrue. However, a large number of firms billing themselves as "search engine optimization" companies have made large sums of money by persuading people otherwise.
URLs produced by this kind of rewriting are often called "clean" URLs, or "permalinks" in various software packages. A permalink, for example, will often replace an ID number in a URL (e.g., http://www.example.com/index.php?p=985) with something more user-friendly (e.g., http://www.example.com/seo-friendly-urls). How one URL actually gets translated into the other is of no concern to end users, who only care that they receive the article they wanted to read.
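As a minimal sketch of such a rewrite, assuming the rules live in the main server or <VirtualHost> configuration (where the pattern sees the leading slash) and reusing the two example URLs above, the following serves the clean URL from the underlying script without the visitor ever seeing the query string:

RewriteEngine On
# Internally map the clean URL onto the script that actually generates the page.
# [PT] passes the result back through Apache's URL mapping; [L] stops further rule processing.
RewriteRule ^/seo-friendly-urls$ /index.php?p=985 [PT,L]

A package that offers permalinks will often route many such URLs through one more general rule and let the application look up the article itself; this one-to-one rule only illustrates the mechanism.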
When To Use Mod_Rewrite: Mass Virtual Hosting
When you have two or three virtual hosts, manually writing out a <VirtualHost> configuration block for each one is not a big problem. By the time you have a few hundred of them, not only does it become cumbersome to maintain the configuration for all of them, but it also makes Apache take a long time to start up, as it has to load every one of those blocks.
Many people use mod_rewrite to translate a hostname dynamically into a directory path, and can thus support an arbitrary number of virtual hosts with a few lines of configuration. This approach imposes some limitations. In particular, every virtual host has to follow the same layout, in terms of where its document root is located and what options are enabled. But for most ISPs this is a reasonable limitation, since they have a standard way of setting up new customers and want those customers to be as similar as possible to simplify maintenance.
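A minimal sketch of this approach, assuming, hypothetically, that every host is named www.<name>.example.com and is served from /home/<name>/www. It belongs in the main server configuration, because RewriteMap cannot be declared in .htaccess files:

RewriteEngine On
# Normalize the Host header to lowercase before matching it.
RewriteMap lowercase int:tolower
# Capture <name>, restricted to letters, digits, and hyphens so it cannot contain dots or slashes.
RewriteCond ${lowercase:%{HTTP_HOST}} ^www\.([a-z0-9-]+)\.example\.com$
# %1 is the name captured above; $1 is the full requested path, leading slash included.
RewriteRule ^(.*)$ /home/%1/www$1

Requests for hostnames that don't match the condition are not rewritten and fall through to the normal DocumentRoot. The /home/<name>/www directories still need suitable <Directory> access settings.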
And whatever your physical directory structure is, you'll frequently want to have root-level URLs (such as http://www.example.com/events and http://www.harvard.edu/events), which in fact map to deeper levels in the physical directory structure. You can do this with a Redirect, or you can do it transparently using mod_rewrite. Which of these is "best" depends on a number of factors, many of which just boil down to preference.
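Both options can be sketched as follows, assuming, hypothetically, that the content behind /events actually lives at /calendar/public/events; you would use one or the other, not both:

# Option 1: a visible redirect; the browser's address bar changes.
Redirect /events http://www.example.com/calendar/public/events

# Option 2: a transparent internal rewrite; the browser keeps showing /events.
RewriteEngine On
RewriteRule ^/events(/.*)?$ /calendar/public/events$1 [PT]

In the second form, the optional group carries any deeper path along, so /events/2024 is served from /calendar/public/events/2024.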
When To Use Mod_Rewrite: Site Rearrangement
No matter how carefully you plan your website, you're going to have to redesign it some day. Part of that redesign is going to involve rearranging your directory structure. What seemed like a good idea a few years ago might turn out to be not so great today. However, you want your old URLs to keep working, because people have them bookmarked.
mod_rewrite lets you map your old URL structure onto your new one without scattering dozens of redirect statements all over the place. This assumes, of course, that both the old and new directory structures follow a certain logic, so that one can be mapped onto the other.
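For example, suppose, hypothetically, that articles used to live at /articles/<year>/<month>/<name>.html and now live at /blog/<year>/<month>/<name>. One rule covers every old URL that follows that pattern, and a permanent (R=301) redirect tells browsers and search engines where the content has moved:

RewriteEngine On
# $1 = four-digit year, $2 = two-digit month, $3 = article name without ".html"
RewriteRule ^/articles/(\d{4})/(\d{2})/([^/]+)\.html$ /blog/$1/$2/$3 [R=301,L]

Because this mapping needs nothing more than a regular expression, the RedirectMatch directive could express it equally well.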
When To Use Mod_Rewrite: Conditional Changes
Many uses of mod_rewrite are conditional. That is, you want the rewrite to happen sometimes, but not always. The condition might be the time of day, the person accessing the website, the user's preferred language, or any other arbitrary criterion.
Using the RewriteCond directive, mod_rewrite lets you base your rewrite rules on any condition you want to impose, or on any combination of conditions.
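A sketch of two such conditions, using hypothetical file and directory names. The first rule serves a night-time page between 22:00 and 05:59 server time by chaining two RewriteCond lines with [OR]; the second sends visitors whose browser lists German first to a /de/ section. The comparisons on %{TIME_HOUR} are string comparisons, which work here because the hour is always two digits:

RewriteEngine On

# Rule 1: between 22:00 and 05:59, transparently serve night.html instead of day.html.
RewriteCond %{TIME_HOUR} >21 [OR]
RewriteCond %{TIME_HOUR} <06
RewriteRule ^/day\.html$ /night.html [PT,L]

# Rule 2: redirect the home page to /de/ when German is the first preferred language.
RewriteCond %{HTTP:Accept-Language} ^de [NC]
RewriteRule ^/$ /de/ [R,L]

Each group of RewriteCond lines applies only to the RewriteRule that immediately follows it.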
When To Use Mod_Rewrite: Other Stuff
As soon as you think you've heard every possible use of mod_rewrite, someone will ask for a set of rewrite rules to do something you've never considered. The amazing thing is that, in most of these cases, mod_rewrite can be twisted into doing what is wanted. These odd examples are hard to categorize, but we'll try to illustrate some of them as we proceed through the Tutorial and Course for mod_rewrite.
Mod_Rewrite: When Not To Use Mod_Rewrite
As important as knowing when and how to use mod_rewrite is having a firm grasp of the other tools Apache offers, so that you know when not to use mod_rewrite. All of mod_rewrite's amazing power comes at the cost of performance: running regular expressions consumes time and memory, so it's best avoided when alternative approaches are available. However, even when one or more alternatives exist, it is seldom the case that one option is clearly the best to use all the time. There are always a number of factors to consider.
Just as there are several categories in which mod_rewrite use tends to fall, there are also several categories into which common misuse of mod_rewrite falls, as we'll cover in the following sections.
When Not To Use Mod_Rewrite: Simple Redirection
Probably the most common misuse of mod_rewrite is for simple redirection. Redirection is used when a client requests one URL and we want to send it to a different one instead. In many cases this is a simple one-to-one mapping: one URL to another URL, one directory to another directory, or sometimes even one virtual host to another, or to another server entirely.
In each of these cases, the Redirect directive is sufficient. The syntax of the Redirect directive is as follows:
Redirect [Original] [Target]
Where [Original] is the URL-path that was originally requested (for example, /index.cfm), and [Target] is the fully qualified URL to which you wish to redirect it. When the user requests the original URL, Apache sends a redirection response back to the browser, which then requests the new URL. The address in the user's browser address bar changes to the new URL. This approach requires a second round trip to the web server in order to retrieve the content.
Besides simplicity, this approach has two advantages: the new, corrected URL is announced to the user (who may or may not notice), and an automated process such as a search engine indexer can update its records to reflect the new URL and stop requesting the old one. The latter works best when the redirect is marked as permanent; by default, Redirect issues a temporary (302) redirect.
Several examples of the Redirect directive follow:
Redirect /index.cfm http://www.example.com/index.php
In this example, only one possible URL is redirected. That is, if someone requests http://www.example.com/index.cfm, they will be sent instead to http://www.example.com/index.php, but no other URLs will be affected.
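To make the redirect in this example permanent, so that browsers and search engine indexers treat the new URL as the lasting one, the status can be given as the first argument, either as a keyword or as a numeric code:

Redirect permanent /index.cfm http://www.example.com/index.php
# Equivalent form using the numeric status code:
Redirect 301 /index.cfm http://www.example.com/index.php

Either line on its own is enough; there is no need to use both.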
In this next example, we've renamed our /pics/ directory to /images/, and we want all requests for things in /pics/ to go to /images/ instead:
Redirect /pics/ http://www.example.com/images/
The Redirect directive matches a URL-path prefix, so it can redirect an entire directory, not just a single exact URL; any path beyond the matched prefix is appended to the target. Thus, in this example, a request for http://www.example.com/pics/camel.jpg will be redirected to http://www.example.com/images/camel.jpg as desired.
The following example is simply a special case of the previous example:
Redirect / http://other.example.com/
This is what you'd use if your website moved entirely to another site. With this directive, every URL requested from http://www.example.com (assuming the directive appears in the configuration for www.example.com) will be sent to the same path on http://other.example.com. One final special case follows:
Redirect / https://www.example.com/
This rule should be used with care. The goal here is to redirect all requests for http://www.example.com/ and anything beneath it to the corresponding URL on https://www.example.com/, that is, to require that all access to the site be via SSL. It is important to note that the directive must appear in the non-SSL virtual host for this domain. Putting it anywhere else could result in an infinite redirection loop: every request would be redirected to itself, then redirected to itself again, and so on, until the browser gets frustrated and throws an error message.
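A minimal sketch of that arrangement, assuming mod_ssl is loaded and Apache listens on port 443; the certificate paths and document root are hypothetical. Only the port-80 virtual host carries the Redirect, and the port-443 virtual host serves the content, so a request that arrives over HTTPS is never redirected again. The permanent keyword is an optional addition that marks the move as permanent:

<VirtualHost *:80>
    ServerName www.example.com
    # Plain-HTTP requests are sent to the same path on the HTTPS site.
    Redirect permanent / https://www.example.com/
</VirtualHost>

<VirtualHost *:443>
    ServerName www.example.com
    # No Redirect here; placing one in this block would cause a loop.
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/www.example.com.pem
    SSLCertificateKeyFile /etc/ssl/private/www.example.com.key
    DocumentRoot /var/www/example
</VirtualHost>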
When Not To Use Mod_Rewrite: More Complicated Redirects
For more complicated redirects, the RedirectMatch directive is available. RedirectMatch is a halfway point between a standard Redirect and a RewriteRule. It issues redirects just as Redirect does, but it matches the requested URL-path against a regular expression rather than a fixed string.
RedirectMatch allows for quite complex redirections and is often a very acceptable solution to many problems for which you might be tempted to use mod_rewrite.
Several examples follow:
RedirectMatch ^(.*)\.gif$ http://images.example.com$1.png
In this example, we've converted all of our GIF files to PNG files and moved them to another server. The regular expression is anchored so that it matches only paths that end in .gif, and the backreference $1 retains the entire requested path minus its extension, so that the same image is requested on the other server.
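As another RedirectMatch sketch, suppose, hypothetically, that a site's ColdFusion pages were rewritten in PHP under the same names. A single permanent redirect covers all of them, with $1 carrying the path minus the .cfm extension:

# Any path ending in .cfm is sent to the same path ending in .php.
RedirectMatch permanent ^/(.*)\.cfm$ http://www.example.com/$1.php

Because the redirected URL ends in .php, it no longer matches the pattern, so the redirect cannot loop.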
RedirectMatch is slower than Redirect, since it has to evaluate a regular expression. In the tests that we've performed, however, it is marginally faster than RewriteRule.
When Not To Use Mod_Rewrite: Virtual Hosts
As mentioned earlier, mod_rewrite can be used to produce dynamic virtual hosts. But just because you can do this doesn't mean you should. You should consider using standard virtual hosts, as well as possibly using mod_vhost_alias, before using mod_rewrite.
mod_vhost_alias provides a hostname-to-directory mapping so that virtual hosts can be added without changing the configuration file. Although this approach is less flexible than using mod_rewrite, it is possible that it will be sufficient for your needs.
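A minimal mod_vhost_alias sketch, assuming the module is loaded and, hypothetically, that hosts are named www.<name>.example.com with their files in /home/<name>/www. In VirtualDocumentRoot, %2 stands for the second dot-separated part of the hostname, and UseCanonicalName Off makes Apache take that hostname from the request's Host header:

UseCanonicalName Off
# www.acme.example.com -> /home/acme/www (%1 would be "www"; %2 is "acme").
VirtualDocumentRoot /home/%2/www

No RewriteEngine, RewriteMap, or regular expression is involved; the trade-off is that the directory layout must follow the hostname's dot-separated parts.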
When Not To Use Mod_Rewrite: Other Stuff
Of course, we can't give a formula for when to use mod_rewrite and when not to. But we can tell you what you need to do when faced with a situation where mod_rewrite appears to be an option: consider first whether you're just doing a simple Redirect or perhaps a plain ProxyPass.
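For example, a rule that proxies a path to a back-end server can usually be replaced by mod_proxy's own directives. A sketch, with a hypothetical back end at backend.example.com and assuming mod_proxy and mod_proxy_http are loaded:

# The mod_rewrite version, shown only for comparison:
# RewriteRule ^/app/(.*)$ http://backend.example.com/$1 [P]
# The plain mod_proxy version:
ProxyPass /app/ http://backend.example.com/
# Adjust Location headers in back-end redirects so clients stay on this server.
ProxyPassReverse /app/ http://backend.example.com/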
Removing mod_rewrite from a scenario reduces complexity and usually makes things run faster. Treat mod_rewrite as a last resort, rather than as the first tool you reach for in your toolbox.
It's also important to understand that mod_rewrite was written in 1996, when Apache was still rather limited. Ralf Engelschall wrote the module to solve problems that had no other solution. Many of the mod_rewrite tutorials that you may find online come from that era and don't take into consideration the fact that many of these problems now have easier solutions with standard Apache configuration directives that didn't exist in 1996. So, even if you encounter an example in a mod_rewrite tutorial or how-to somewhere, this doesn't necessarily mean that it's the best way to handle the problem.