Clean URLs through readable slugs in PHP
In these wonderful times of compulsory festivities, this will be the last article of 2008. Since most of you will probably be drunk tonight and terribly hung-over by tomorrow, I figured this would be a good time to provide a simple, yet effective, PHP how-to. Happy new year!
Clean URLs for SEO
No rocket science here, nor a reinvention of the wheel. It's been said tons of times that clean URLs are an essential part of search engine optimization. They're not that hard to implement using Apache's mod_rewrite, and there are a lot of good how-tos.
Readable URLs for the user
This is where the fun begins of course. How many times have you been confronted with someone sending you an indecipherable, thus untrustworthy link? Right, so we agree that for a user, it is important to have a clean URL that is readable and includes the title of the page or (at least) some description related to the content. Slug time!
The Slug
A slug is a short URL-friendly description of the content on your page. Mostly, slugs are derived from the page title or article title to make the end-result a URL-friendly and descriptive piece of text that can be used as part of the links to your article. An example:
This title: Clean URLs through readable slugs in PHP ... becomes this url: http://www.socialgeek.be/blog/read/clean-urls-through-readable-slugs-in-php
At first sight, this looks like a fairly simple task in PHP. Make everything lowercase and replace the spaces with dashes, right? Not exactly. This article gives a good overview of what is and what is not allowed in a URL. Here is my short version:
- All lowercase
- Only alphanumeric characters
Nothing we can't handle with PHP and some regular expressions! Here we go.
function toSlug($string, $space = "-") {
    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
    $string = strtolower($string);
    $string = str_replace(" ", $space, $string);
    return $string;
}
As you can see, the function takes two parameters: the string to convert (duh!) and the character to use as a replacement for spaces. The dash is nothing but a personal preference here; I am fully aware that a lot of people prefer an underscore.
This function is pretty straightforward. It removes every character that is not a letter, a digit, a space or a dash using the preg_replace function. Next, it converts all characters to lowercase with the strtolower function. To finish it all off, it replaces all spaces with our predefined replacement character using str_replace.
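To see the three steps at work, here is a small sketch that runs the function on this article's title and on a second title with punctuation. The second input string and the underscore separator are made up for illustration.

```php
<?php
// The first version of toSlug(), repeated so this snippet runs on its own.
function toSlug($string, $space = "-") {
    // 1. Drop everything except ASCII letters, digits, spaces and dashes.
    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
    // 2. Lowercase what is left.
    $string = strtolower($string);
    // 3. Swap every space for the separator.
    return str_replace(" ", $space, $string);
}

echo toSlug("Clean URLs through readable slugs in PHP"), "\n";
// clean-urls-through-readable-slugs-in-php

echo toSlug("Hello, World!", "_"), "\n";
// hello_world
```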
We can do better than this
One can quickly see that a lot of characters will disappear from the URL. Imagine some accented characters (è, ë etc.) in the page title. These will be removed by the first line of the function since they are not exactly URL-friendly. What we actually want is to convert these characters to their base character, meaning that è would become a regular e, and so on. Make way for iconv!
function toSlug($string, $space = "-") {
    if (function_exists('iconv')) {
        $string = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    }
    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
    $string = strtolower($string);
    $string = str_replace(" ", $space, $string);
    return $string;
}
Since not all PHP configurations have iconv installed, it's wise to check for the existence of iconv first. Next, we just call the iconv function itself (notice the @ operator to suppress warning messages). iconv converts a string from one encoding to another. The nice thing here is the "//TRANSLIT" part. This makes sure that when a certain character is not found in the target encoding, it is replaced by a similar-looking character.
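One thing to keep in mind: iconv() returns false when the conversion fails, and with the warning suppressed the rest of the function would then quietly work on an empty string. A minimal sketch of a more defensive variant follows; the helper name toAscii() is hypothetical and not part of the original function.

```php
<?php
// Hypothetical helper: transliterate to ASCII when possible,
// otherwise hand the input back unchanged.
function toAscii($string) {
    if (!function_exists('iconv')) {
        return $string;
    }
    // iconv() returns false when it cannot convert the input.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    return ($converted === false) ? $string : $converted;
}

function toSlug($string, $space = "-") {
    $string = toAscii($string);
    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
    $string = strtolower($string);
    return str_replace(" ", $space, $string);
}

// The exact result for accented input depends on the iconv implementation
// and the active locale, so no expected output is shown for this call.
echo toSlug("Crème brûlée à la carte"), "\n";
```

Whatever iconv produces, the regular expression afterwards still guarantees that only lowercase letters, digits and the separator end up in the slug.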
That's all folks
So there you have it. A simple, yet effective way to convert your strings to slugs. Should you include this in your applications, I have one more tip for you. Make a convert class and make the toSlug function static. This way you can have nice-looking code like this:
$slug = Convert::toSlug($title);
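A minimal sketch of what such a class could look like. Only the class name Convert and the static toSlug() call come from the line above; the layout and the extra check on iconv's return value are assumptions.

```php
<?php
class Convert
{
    // Turn a title into a URL-friendly slug.
    public static function toSlug($string, $space = "-")
    {
        if (function_exists('iconv')) {
            $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
            // Assumption: keep the original string if iconv fails.
            if ($converted !== false) {
                $string = $converted;
            }
        }
        $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
        $string = strtolower($string);
        return str_replace(" ", $space, $string);
    }
}

$title = "Clean URLs through readable slugs in PHP";
$slug = Convert::toSlug($title);
echo $slug, "\n";
// clean-urls-through-readable-slugs-in-php
```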
There are 21 comments for this article
Nicruo on Dec 31, 2008
Thanks Jonas, for the wonderful article. I've been looking for something like this but it was always complex and heavy. This is the way to go. Thanks again.
Hans, Robarov on Dec 31, 2008
Nice way to deal with slugs! Few questions though...
How about:
Jonas on Dec 31, 2008
Thanks for the positive responses.
$string = preg_replace("/ +/", $space, $string);
Hans, Robarov on Dec 31, 2008
One more important issue: You have to remember to set the locale to some Unicode encoding to make iconv handle //TRANSLIT correctly!
Jonas on Dec 31, 2008
Indeed. A good server configuration should of course already have this as a default setting.
Nevertheless, should you run into trouble, this should fix it:
setlocale(LC_ALL, 'en_US.UTF8');
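Locale names differ between systems, so it can help to check what setlocale() actually selected. A small sketch, assuming a handful of common UTF-8 locale spellings and using LC_CTYPE rather than LC_ALL so that number and date formatting stay untouched (both choices are assumptions, adjust them to your server):

```php
<?php
// setlocale() tries each candidate in order and returns the name it
// selected, or false when none of them is installed.
$locale = setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.UTF8', 'C.UTF-8');

if ($locale === false) {
    echo "No UTF-8 locale available, //TRANSLIT may not behave as expected\n";
} else {
    echo "Using locale: $locale\n";
}
```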
Stefan Gehrig on Jan 1, 2009
Just to add something very useful for using the iconv() approach on a Windows server: The iconv() approach is not usable on Windows machines because of limitations with setlocale() imposed by the Windows CRT. So e.g. which is the Windows equivalent of on Linux will not work at all!
Please see http://sgehrig.wordpress.com/2008/12/08/update-on-strcoll-utf-8-issue/ and http://sgehrig.wordpress.com/2008/09/24/on-how-to-sort-an-array-of-utf-8-strings/ for details on this.
Jonas on Jan 2, 2009
Thanks Stefan, I wasn't aware of that issue. Then again ... I never use Windows as a PHP server :)
CHelmertz on Jan 7, 2009
A tiny prettyfication: use strtolower() before preg_replace() and remove the capitals from the pattern. I told you it was tiny :)
Bryan on Mar 5, 2009
Great article, thanks.
One note: iconv can't translit certain characters and will halt and throw a warning when it reaches one of these. As a patch, I have changed the line to:
$string = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
This will obviously translit everything it can, and ignore what it can't, but won't halt early. Other than that, works like a charm.
I have a question for everyone though, with a slug'd URL, how do you link that back to a database record? Doing a string lookup? I insert a UID at the beginning of the title myself... better ways?
Bryan on Mar 5, 2009
Also, to address the multi-space comment question:
Replace the last str_replace line with:
$string = preg_replace('/[\s]+/', '-', $string);
That'll replace all multiple spaces with a single dash. You can run another preg to remove double dashes as a result of "- " turning into "--".
$string = preg_replace('/-{2,}/', '-', $string);
Bryan on Mar 5, 2009
Last comment didn't finish posting... to continue:
Taking both these, we can combine it into a single call:
$string = preg_replace('/[\s-]+/', '-', $string);
Take any combination of one or more spaces and dashes, and replace it with a single dash. That should solve everyone's problems.
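A quick sketch of that combined pattern in action. The function name collapseSeparators() and the sample strings are made up for illustration.

```php
<?php
// Collapse any run of whitespace and/or dashes into a single dash.
function collapseSeparators($string) {
    return preg_replace('/[\s-]+/', '-', $string);
}

echo collapseSeparators("clean   urls - through -- slugs"), "\n";
// clean-urls-through-slugs

// Leading and trailing separators still become a dash,
// so a trim() afterwards tidies up the ends.
echo trim(collapseSeparators("  hello world  "), '-'), "\n";
// hello-world
```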
Jonas on Mar 5, 2009
@Bryan: What I prefer is to store the slug in its own database record. This allows you to generate slugs automatically and to have user-tweaked slugs without too much extra hassle. Furthermore, it keeps your URL really clean.
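A rough sketch of such a lookup with PDO. The table and column names (articles, id, title, slug) and the function name are hypothetical, and the database connection is assumed to exist already.

```php
<?php
// Assumed table (hypothetical names):
//   CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, slug TEXT UNIQUE);

// Look up one article by its stored slug.
// Returns the row as an associative array, or null when nothing matches.
function findArticleBySlug(PDO $pdo, $slug) {
    // A prepared statement keeps the slug from the URL out of the SQL text.
    $stmt = $pdo->prepare('SELECT id, title, slug FROM articles WHERE slug = ?');
    $stmt->execute(array($slug));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return ($row === false) ? null : $row;
}
```

A unique index on the slug column keeps the lookup fast and prevents two articles from sharing one URL.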
Bryan on Mar 5, 2009
Thanks for the reply Jonas. That's a great solution, especially giving the user the ability to tweak it.
thomas on Apr 30, 2009
Hi all,
Really nice article
I tried it but ....
I am on Mac Os X 10.5 (but I reproduced the issue on 10.4)
and characters like é or à are turned into :
è => 'e
à => `a
I tried it at the command line
iconv -f UTF-8 -t ASCII//TRANSLIT//IGNORE myutf8file.txt
which works fine on a Linux machine
but not on my local mac os x machine (locale LC_ALL="fr_FR.UTF-8")
I really don't understand why iconv returns this weird output on mac os x but all is fine on linux
any help ? or directions ?
thanks in advance
osu on Jul 19, 2009
Thanks for this interesting article.
However, I'm not sure how i can implement it? How would I affect the slug from within a php script? I thought you have to do it from within an .htaccess file using mod_rewrite?
Thanks
canon on Sep 2, 2009
Thanks to Bryan. I have used this function and sometimes found multiple dashes.
Diego on Oct 12, 2009
Here's a quick and dirty one:
$url_Freindly = preg_replace("[\W]", "-", strtolower($title));
charline on Dec 10, 2009
Thanks to everyone; using all your comments I've done this:
Christoph on Jan 27, 2010
Charline's code, with respect for $space and strtolower:
function toSlug($string, $space = "-") {
    if (function_exists('iconv')) {
        $string = @iconv('utf-8', 'us-ascii//TRANSLIT', $string);
    }
    $string = strtolower($string);
    $string = preg_replace("/[^a-z0-9 -]/", "", $string);
    $string = preg_replace('/[\s]+/', $space, $string);
    $string = trim($string, " \t. -");
    return $string;
}
Weblap.ro on Mar 6, 2010
function remove_accent($str) { $a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ÿ','Ā','ā','Ă','ă','Ą','ą','Ć','ć','Ĉ','ĉ','Ċ','ċ','Č','č','Ď','ď','Đ','đ','Ē','ē','Ĕ','ĕ','Ė','ė','Ę','ę','Ě','ě','Ĝ','ĝ','Ğ','ğ','Ġ','ġ','Ģ','ģ','Ĥ','ĥ','Ħ','ħ','Ĩ','ĩ','Ī','ī','Ĭ','ĭ','Į','į','İ','ı','IJ','ij','Ĵ','ĵ','Ķ','ķ','Ĺ','ĺ','Ļ','ļ','Ľ','ľ','Ŀ','ŀ','Ł','ł','Ń','ń','Ņ','ņ','Ň','ň','ʼn','Ō','ō','Ŏ','ŏ','Ő','ő','Œ','œ','Ŕ','ŕ','Ŗ','ŗ','Ř','ř','Ś','ś','Ŝ','ŝ','Ş','ş','Š','š','Ţ','ţ','Ť','ť','Ŧ','ŧ','Ũ','ũ','Ū','ū','Ŭ','ŭ','Ů','ů','Ű','ű','Ų','ų','Ŵ','ŵ','Ŷ','ŷ','Ÿ','Ź','ź','Ż','ż','Ž','ž','ſ','ƒ','Ơ','ơ','Ư','ư','Ǎ','ǎ','Ǐ','ǐ','Ǒ','ǒ','Ǔ','ǔ','Ǖ','ǖ','Ǘ','ǘ','Ǚ','ǚ','Ǜ','ǜ','Ǻ','ǻ','Ǽ','ǽ','Ǿ','ǿ'); $b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O','U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o','o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E','e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i','I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n','O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T','t','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o','U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o'); return str_replace($a, $b, $str); }
function post_slug($str) {
    return strtolower(preg_replace(
        array('/[^a-zA-Z0-9 -]/', '/[ -]+/', '/^-|-$/'),
        array('', '-', ''),
        remove_accent($str)
    ));
}