Search Engine Friendly URLs

Category: PHP
Reviewed by: redemption   
Reviewed on: Oct 11 2003

Today we're going to talk about Search Engine Friendly URLs, also known as SEF URLs. This topic has lost much of its initial popularity since it became apparent that Google can parse pages that use regular PHP URLs, so long as you don't have session variables in the URL. However, I feel that SEF URLs are STILL very useful for other search engines, and they double as a user-friendly technique for certain areas of your site.

Search Engine Friendly URLs: What are they, and why do we want them?

So what does an SEF URL look like? Below are some examples:

http://www.domain.com/gallery.php/cat/15/id/112/
http://www.domain.com/articles/112/
http://www.domain.com/news/20030405.html
http://www.domain.com/forums/forumid/15/threadid/2563/messageid/531
http://www.domain.com/forums/forumid/15/page/12/
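
To make the key/value style of the forum URLs above concrete, here is a minimal sketch of how such a path could be turned back into ordinary parameters. The function name parseSefPath() is hypothetical, and the sketch assumes the first path segment names the script or section and the rest alternate key, value. It does not handle the shorter styles such as /articles/112/ or /news/20030405.html.

```php
<?php
// Hypothetical helper (not from any library): split a key/value style
// SEF path into a section name and an associative array of parameters.
function parseSefPath(string $path): array
{
    // Drop leading and trailing slashes, then split into segments.
    $segments = explode('/', trim($path, '/'));

    // Assumption: the first segment names the script or section.
    $section = array_shift($segments);

    // Read the remaining segments as alternating key, value pairs.
    // A trailing key with no value is ignored.
    $params = [];
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $params[$segments[$i]] = $segments[$i + 1];
    }

    return ['section' => $section, 'params' => $params];
}

$result = parseSefPath('/forums/forumid/15/threadid/2563/messageid/531');

// Show the equivalent query string a regular PHP URL would carry.
echo $result['section'], '?', http_build_query($result['params'], '', '&'), "\n";
// prints: forums?forumid=15&threadid=2563&messageid=531
```

The point of the sketch is only that the SEF form carries the same information as the query-string form; how the request reaches a script like this in the first place depends on your web server setup.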

The first thing about SEF URLs is that they don't contain any of the question marks (?) or ampersands (&) characteristic of a CGI or other dynamic script. Back in the 1990s this was important because those particular characters signalled to a search engine that the page was a live dynamic script and might not be worth spidering because it might change at any moment.

Take for instance the last URL on the list that I showed earlier: we can deduce that we're browsing page 12 of forum id 15. Now the problem is that the contents of page 12 will change over time, so even if a person were to find that page on a search engine, the content that they were searching for may no longer be there.

The flipside of the problem is that there are some pages that look dynamic, but which aren't. Take the second to last URL up there, for instance. It presumably shows message 531 in a forum, and chances are that message 531 will be the same even months after it was spidered. So what was happening was that STATIC content (i.e. a specific message, or article, or news post) was being ignored by search engine spiders because they assumed that any CGI generated pages would contain ever changing content.

Note the distinction between dynamic CONTENT and dynamic PAGES: the former implies that a certain URL's text and content will change over time, while the latter indicates that the PAGE itself must be generated on the fly, whether the core content changed significantly or not.
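
The old spider behaviour described above can be pictured as a very crude test on the URL itself. The sketch below is only an illustration of that idea, not any real search engine's code; the function name looksDynamic() and the query-string form of the forum URL are assumptions made up for the example.

```php
<?php
// Hypothetical illustration of the old heuristic: treat any URL that
// contains "?" or "&" as a dynamic script and skip it.
function looksDynamic(string $url): bool
{
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk($url, '?&') !== false;
}

$urls = [
    // Assumed regular PHP form of the forum page (not from the article).
    'http://www.domain.com/forums.php?forumid=15&page=12',
    // SEF form taken from the list of examples.
    'http://www.domain.com/forums/forumid/15/page/12/',
];

foreach ($urls as $url) {
    echo $url, ' => ', looksDynamic($url) ? 'skip' : 'spider', "\n";
}
```

Both URLs could point at exactly the same page, yet only the first would be skipped by a test like this, which is the whole reason the SEF form existed.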

So the original purpose of SEF URLs was to encourage search engines to include dynamic pages in their databases. However, with the majority of users using Google as their search engine, either directly or indirectly, the problem is not as pronounced as it used to be. Google WILL successfully store pages whose URLs were once taboo. The reason we still WANT search engine friendly URLs is that some of the other engines still prefer them, and also because they are more human readable.