Search engines reacting to a new WordPress permalink structure
Tuesday, July 18th, 2006
This blog originally started with the default permalink structure: to reach a post, the URL would contain the ID number of that post, like this: http://www.randomsynapses.org/?p=43 . Actually, it started with a different URL, as a subdomain. It’s only been 11 months since the blog got its own domain name. Anyway, the links are only examples, take ‘em that way.
After the first few posts, I decided on using a more intuitive structure which included the date and the post “slug” (the title in lowercase, with dashes instead of spaces), like this: http://www.randomsynapses.org/2005-10-23/fixing-western-digital-scorpio-clicking-noise/ .
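To make the “slug” idea concrete, here is a minimal Python sketch of how a title could be turned into the dated URL above. The function names (slugify, dated_permalink) and the post title are my own assumptions for illustration; WordPress generates slugs with its own code, which handles many more cases (accents, duplicates and so on).

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its words with dashes (simplified)."""
    lowered = title.lower()
    # Collapse every run of characters that is not a-z or 0-9 into one dash.
    dashed = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop any dash left over at the start or end.
    return dashed.strip("-")


def dated_permalink(base: str, date: str, title: str) -> str:
    """Build a URL of the form base/date/slug/ (hypothetical helper)."""
    return f"{base}/{date}/{slugify(title)}/"


if __name__ == "__main__":
    # The title is assumed; only the resulting slug appears in the post.
    print(dated_permalink(
        "http://www.randomsynapses.org",
        "2005-10-23",
        "Fixing Western Digital Scorpio clicking noise",
    ))
```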
This is how Google and the other search spiders indexed this site, and how I posted links to my own content on various sites and forums over nearly two years (one year, considering the current domain name).
About a week ago I became annoyed with the date being included in the link. Ok, it helps having it there as an obvious indicator of how old a post is; but it really looks ugly. So I thought, why not just change the permalink structure? What can go wrong? My two “fans” subscribed to this blog in BlogLines not finding my posts? The forums that don’t draw many visitors here having wrong URLs? I had little to lose, but a lot to learn: let’s see how fast search engines catch up with the new structure, which now looks like: http://www.randomsynapses.org/fixing-western-digital-scorpio-clicking-noise/ .
By the way, all three URLs exemplified so far point to the same post. The second one no longer works because of the new permalink structure in WordPress, but I’ll get to that a bit later.
To make things more complicated, I have a custom 404 page that redirects the browser to the main “index”, so to speak, in case a visitor tries to access a nonexistent address. Although the web server should clearly indicate this with the 404 status in the response header, I’m not exactly sure whether search engines pick it up or just follow the redirect in the meta tag. Normally, search engines should identify 404 responses and drop those pages from their database. So my old URLs should disappear from Google fairly quickly, while the new URLs are being added. Just to make sure, and to force search engines to drop the old URLs from the index, I added specific lines to the robots.txt file to deny access to /2006-*-*/ for instance, which would match all URLs for posts published this year.
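For the curious, here is a rough Python sketch of how a wildcard rule like /2006-*-*/ can be checked against a URL path. It assumes the crawler treats “*” as “any run of characters” and matches rules as prefixes, which is how Google documents its handling of robots.txt; other spiders may behave differently. The function names and the sample paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turn a Disallow pattern with '*' wildcards into a compiled regex."""
    # Escape everything, then turn the escaped '*' back into "match anything".
    body = re.escape(pattern).replace(r"\*", ".*")
    # Rules are prefix matches, so anchor only at the start of the path.
    return re.compile("^" + body)


def is_blocked(path: str, disallow_patterns: list) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


if __name__ == "__main__":
    rules = ["/2006-*-*/"]  # the rule mentioned in the post
    for path in ("/2006-07-18/some-post/",   # hypothetical old-style URL
                 "/some-post/"):             # hypothetical new-style URL
        print(path, is_blocked(path, rules))
```

With this rule only the dated 2006 paths are caught; posts from 2005 would need their own line.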
But I didn’t stop here. Google has a really neat service for webmasters, Webmaster Help Center, where you can get some useful stuff done with the sites you manage. I was interested in the way Google used my robots.txt file to decide what to index and what to skip, in the Google Sitemaps service, and in some statistics about what Google “sees” on my site. So I added my site to Sitemaps, confirmed it by placing a meta tag in the head section as instructed, installed the Google Sitemaps Generator plugin for WordPress, generated a sitemap file and had Google use it.
And then, I waited, watching my web host’s recent access log every couple of days and Google’s tools for webmasters.
At first, the blog would get a lot of hits on old URLs with the former permalink structure, including the date. Because of WordPress’ current design, once I changed the permalink structure it no longer recognized the parameters passed through the old structure, so all visits to old URLs were getting the 404 page and being redirected to the main page of the blog. This has probably pissed off or turned away some people who were not getting the content they were looking for.
It was search spiders I was interested in, not real visitors. Google, MSN and Yahoo seemed determined to hit, verify and reindex URLs with the old permalink structure. But a day or two after the switch, they all started requesting my robots.txt like mad, and indexing the most recent posts with the new permalink structure. Day after day, they kept requesting old and new URLs, with a clear trend of increasing frequency for the new structure and decreasing frequency for the old one. A promising start, isn’t it?
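I did this tallying by eye, but the trend is easy to count with a few lines of Python. This is only a sketch: it assumes the (day, path) pairs have already been pulled out of the access log, and the sample data, the patterns and the names classify and tally are all made up for illustration.

```python
import re
from collections import Counter

# Old structure: /YYYY-MM-DD/slug/ ; new structure: /slug/
OLD_STYLE = re.compile(r"^/\d{4}-\d{2}-\d{2}/[^/]+/?$")
NEW_STYLE = re.compile(r"^/[a-z0-9-]+/$")


def classify(path: str) -> str:
    """Label a requested path as 'robots', 'old', 'new' or 'other'."""
    if path == "/robots.txt":
        return "robots"
    if OLD_STYLE.match(path):
        return "old"
    if NEW_STYLE.match(path):
        return "new"
    return "other"


def tally(requests) -> Counter:
    """Count requests per (day, kind) from an iterable of (day, path) pairs."""
    return Counter((day, classify(path)) for day, path in requests)


if __name__ == "__main__":
    # Hypothetical sample, not real log data.
    sample = [
        ("2006-07-12", "/2006-07-01/some-post/"),
        ("2006-07-12", "/robots.txt"),
        ("2006-07-13", "/some-post/"),
        ("2006-07-13", "/2006-07-01/some-post/"),
        ("2006-07-13", "/another-post/"),
    ]
    counts = tally(sample)
    for key in sorted(counts):
        print(key, counts[key])
```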
It’s true that Google still shows the old structure in search results and keeps a cached copy, although I added a meta tag to prevent that. It will take some time for this to sink in, apparently. See for yourself, search for wd400ve-75hdt0 — the post linked above appears 8th from the top.
Curiously enough, Google’s stats show me that it got over 25 “Page not found” (404) messages for old URLs, but also that on its most recent visit it could not find the robots.txt file on the site. Say what?
Before I close, I’m curious whether there is a WordPress plugin of sorts that helps smooth out the transition to a new permalink structure. I imagine I could do it by fiddling with the .htaccess rewrite rules, but is there a neat, clean, easy way of doing it by extending WordPress’ permalink functionality with a plugin?
[Edit] Yes, there is: Dean Lee’s Permalinks Migration Plugin. Now you don’t have to go through my pain.
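For reference, the mapping that such a rewrite has to perform is simple: strip the date from the old path and send the visitor to what is left, ideally with a permanent (301) redirect so that search engines update their index. Below is a minimal Python sketch of that mapping only. It is not how the plugin works internally (I haven’t looked), and the name redirect_target is hypothetical.

```python
import re
from typing import Optional

# Old structure: /YYYY-MM-DD/slug/ (the trailing slash is optional here).
DATED = re.compile(r"^/\d{4}-\d{2}-\d{2}/(?P<slug>[^/]+)/?$")


def redirect_target(path: str) -> Optional[str]:
    """Return the new-style path for an old dated path, or None if it is not one."""
    match = DATED.match(path)
    if match is None:
        return None  # not an old-style URL, nothing to redirect
    return f"/{match.group('slug')}/"


if __name__ == "__main__":
    old = "/2005-10-23/fixing-western-digital-scorpio-clicking-noise/"
    print(redirect_target(old))
```

Anything that returns None would fall through to the normal handling, including the 404 page.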

