Clean URLs through readable slugs in PHP

In these wonderful times of compulsory festivities, this will be the last article of 2008. Since most of you will probably be drunk tonight and terribly hung-over by tomorrow, I figured this would be a good time to provide a simple, yet effective, PHP how-to. Happy new year!

Clean URLs for SEO

No rocket science here, nor the re-invention of the wheel. It's been said tons of times that clean URLs are an essential part of search engine optimization. They are not that hard to implement using Apache's mod_rewrite, and there are a lot of good how-tos.

Readable URLs for the user

This is where the fun begins of course. How many times have you been confronted with someone sending you an indecipherable, thus untrustworthy link? Right, so we agree that for a user, it is important to have a clean URL that is readable and includes the title of the page or (at least) some description related to the content. Slug time!

The Slug

A slug is a short URL-friendly description of the content on your page. Mostly, slugs are derived from the page title or article title to make the end-result a URL-friendly and descriptive piece of text that can be used as part of the links to your article. An example:

	This title:
	Clean URLs through readable slugs in PHP
	...
	becomes this url:
	http://www.socialgeek.be/blog/read/clean-urls-through-readable-slugs-in-php

At first sight, this looks like a fairly simple task in PHP. Make everything lowercase and replace the spaces with dashes, right? Not exactly. This article gives a good overview of what is and what is not allowed in a URL. Here is my short version:

  • All lowercase
  • Only alphanumeric characters

Nothing we can't handle with PHP and some regular expressions! Here we go.

	function toSlug($string,$space="-") {
	    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
	    $string = strtolower($string);
	    $string = str_replace(" ", $space, $string);
	    return $string;
	}

As you can see, the function takes two parameters: the string to convert (duh!) and the character to use as a replacement for spaces. The dash is nothing but a personal preference here; I am fully aware that a lot of people would rather use an underscore.

The function is pretty straightforward. It removes every character other than letters, digits, spaces and dashes using the preg_replace function. Next, it converts all characters to lowercase with the strtolower function. To finish it all off, it replaces all spaces with our predefined replacement character using str_replace.
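To see the function in action, here is a small usage sketch. The function is repeated so the snippet runs on its own, and the sample titles other than the article title are made up for illustration.

```php
<?php
// Same function as above, repeated so this snippet is self-contained.
function toSlug($string, $space = "-") {
    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
    $string = strtolower($string);
    $string = str_replace(" ", $space, $string);
    return $string;
}

// Plain title: lowercased, spaces turned into dashes.
echo toSlug("Clean URLs through readable slugs in PHP"), "\n";
// clean-urls-through-readable-slugs-in-php

// Punctuation is stripped; the second argument picks the separator.
echo toSlug("Hello, World!", "_"), "\n";
// hello_world

// The ampersand is removed, but the two spaces around it both survive.
echo toSlug("Coffee & cream"), "\n";
// coffee--cream
```

The last call shows that every single space becomes its own separator, so removed punctuation can leave doubled dashes behind.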

We can do better than this

One can quickly see that a lot of characters will disappear from the URL. Imagine some accented characters (è, ë etc.) in the page title. These will be removed by the first line of the function since they are not exactly URL-friendly. What we actually want is to convert these characters to their base character, meaning that è would become a regular e, and so on. Make way for iconv!

	function toSlug($string,$space="-") {
		
	    if (function_exists('iconv')) {
	    	$string = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
	    }

	    $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
	    $string = strtolower($string);
	    $string = str_replace(" ", $space, $string);

	    return $string;

	}

Since not all PHP configurations have iconv installed, it's wise to check for the existence of iconv first. Next, we just call the iconv function itself (notice the @ operator to suppress warning messages). What iconv does is convert a string from one encoding to another. The nice thing here is the "//TRANSLIT" part. This makes sure that when a certain character is not found in the target encoding, it is replaced by a similar-looking character.
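One detail worth guarding against: iconv returns false when the conversion fails, and with the @ operator in place that failure is silent, so the slug ends up empty. Below is a minimal sketch, assuming you would rather fall back to the untransliterated string in that case. The helper name toAsciiOrOriginal is made up for this example and is not part of PHP or of the function above.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the article).
// Transliterates UTF-8 to ASCII when possible, otherwise returns the
// input unchanged instead of letting a failed conversion wipe it out.
function toAsciiOrOriginal($string) {
    if (!function_exists('iconv')) {
        return $string; // iconv is not installed: nothing to do
    }

    // iconv() returns false on failure; @ only hides the notice.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);

    return ($converted === false) ? $string : $converted;
}

// Plain ASCII passes through untouched.
echo toAsciiOrOriginal("Hello World"), "\n";
// Hello World

// What accented input turns into depends on the platform's iconv
// implementation and the active locale, so no output is claimed here.
var_dump(toAsciiOrOriginal("caf\xC3\xA9"));
```

The result can then be fed into the preg_replace, strtolower and str_replace steps exactly as before.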

That's all folks

So there you have it. A simple, yet effective way to convert your strings to slugs. Should you include this in your applications, I have one more tip for you. Make a Convert class and make the toSlug function static. This way you can have nice looking code like this:

	$slug = Convert::toSlug($title);
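For completeness, here is a rough sketch of what such a class could look like. Only the class name Convert and the static toSlug call come from the line above; the check for a failed iconv call is an extra assumption on top of the function shown earlier.

```php
<?php
// Sketch of a small conversion class with a static slug method.
class Convert
{
    public static function toSlug($string, $space = "-")
    {
        if (function_exists('iconv')) {
            $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
            // Assumption: keep the original string if iconv fails.
            if ($converted !== false) {
                $string = $converted;
            }
        }

        $string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
        $string = strtolower($string);

        return str_replace(" ", $space, $string);
    }
}

$title = "Clean URLs through readable slugs in PHP";
$slug = Convert::toSlug($title);
echo $slug, "\n";
// clean-urls-through-readable-slugs-in-php
```

Because the method is static, no Convert object has to be created before calling it.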

There are 21 comments for this article

Nicruo on Dec 31, 2008

Thanks Jonas, for the wonderful article. I've been looking for something like this but it was always complex and heavy. This is the way to go. Thanks again.

Hans, Robarov on Dec 31, 2008

Nice way to deal with slugs! Few questions though...

How about:

  • Titles with a trailing space? ("Trailing space test " -> "trailing-space-test-")
  • Titles with multiple spaces? ("multiple---space---test")
  • Titles with an ampersand? ("Coffee & cream" -> "coffee--cream")

Jonas on Dec 31, 2008

Thanks for the positive responses.

  • About the trailing spaces: a simple trim() before the last str_replace() should do the trick.
  • Considering multiple spaces and ampersands: you can easily write a regular expression that replaces every run of spaces with the selected character, and use it instead of the str_replace() call at the end:

$string = preg_replace('/ +/', $space, $string);

Hans, Robarov on Dec 31, 2008

One more important issue: You have to remember to set locale to some unicode encoding to make iconv handle //TRANSLIT correctly!

Jonas on Dec 31, 2008

Indeed. A good server configuration should of course already have this as a default setting.

Nevertheless, should you run into trouble, this should fix it:

setlocale(LC_ALL, 'en_US.UTF8');

Stefan Gehrig on Jan 1, 2009

Just to add something very useful for using the iconv()-approach on a Windows server: The iconv()-approach is not usable on Windows machines because of limitations with setlocale() imposed by the Windows CRT. So e.g.

setlocale(LC_ALL, 'German_Germany.65001');

which is the Windows equivalent of

setlocale(LC_ALL, 'de_DE.UTF8');

on Linux will not work at all!

Please see http://sgehrig.wordpress.com/2008/12/08/update-on-strcoll-utf-8-issue/ and http://sgehrig.wordpress.com/2008/09/24/on-how-to-sort-an-array-of-utf-8-strings/ for details on this.

Jonas on Jan 2, 2009

Thanks Stefan, I wasn't aware of that issue. Then again ... I never use Windows as a PHP server :)

CHelmertz on Jan 7, 2009

A tiny prettyfication: use strtolower() before preg_replace() and remove the capitals from the pattern. I told you it was tiny :)

Bryan on Mar 5, 2009

Great article, thanks.

One note, iconv can't translit certain characters and will halt and throw a warning when it reaches one of these. As a patch, I have changed the line to:

$string = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);

This will obviously translit everything it can, and ignore what it can't, but won't halt early. Other than that, works like a charm.

I have a question for everyone though, with a slug'd URL, how do you link that back to a database record? Doing a string lookup? I insert a UID at the beginning of the title myself... better ways?

Bryan on Mar 5, 2009

Also, to address the multi-space comment question:

Titles with multiple spaces? ("multiple---space---test") Titles with an ampersand? ("Coffee & cream" -> "coffee--cream")

Change the last str_replace line with:

$string = preg_replace('/[\s]+/', '-', $string);

That'll replace all multiple spaces with a single dash. You can run another preg_replace to collapse the repeated dashes that can result, for example when " - " turns into "---".

$string = preg_replace('/-{2,}/', '-', $string);

Bryan on Mar 5, 2009

Last comment didn't finish posting... to continue:

Taking both these, we can combine it into a single call:

$string = preg_replace('/[\s-]+/', '-', $string);

Take any combination of one or more spaces and dashes, and replace it with a single dash. That should solve everyone's problems.
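Pulling the suggestions so far together (lowercase first, trim, and the single preg_replace above), a minimal sketch could look like this. The function name toSlugCollapsed is made up for the example, and iconv is left out to keep the focus on separators, so accented characters are simply dropped here.

```php
<?php
// Hypothetical name; a sketch that combines the ideas from this thread.
function toSlugCollapsed($string, $space = "-") {
    // Lowercase first, so the pattern only needs a-z.
    $string = strtolower($string);

    // Drop everything except lowercase letters, digits, whitespace, dashes.
    $string = preg_replace('/[^a-z0-9\s-]/', '', $string);

    // Trim before replacing, so edges never turn into separators.
    $string = trim($string, " \t\n\r-");

    // Any run of whitespace and/or dashes becomes one separator.
    return preg_replace('/[\s-]+/', $space, $string);
}

echo toSlugCollapsed("Trailing space test "), "\n";    // trailing-space-test
echo toSlugCollapsed("multiple   space   test"), "\n"; // multiple-space-test
echo toSlugCollapsed("Coffee & cream"), "\n";          // coffee-cream
echo toSlugCollapsed("a - b", "_"), "\n";              // a_b
```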

Jonas on Mar 5, 2009

@Bryan: What I prefer is to store the slug in its own database record. This allows you to generate slugs automatically and to have user-tweaked slugs without too much extra hassle. Furthermore, it keeps your URL really clean.

Bryan on Mar 5, 2009

Thanks for the reply Jonas. That's a great solution, especially giving the user the ability to tweak it.

thomas on Apr 30, 2009

Hi all,

Really nice article

I tried it but ....

I am on Mac Os X 10.5 (but I reproduced the issue on 10.4)

and characters like é or à are turned into :

è => 'e

à => `a

I tried it at the command line

iconv -f UTF-8 -t ASCII//TRANSLIT//IGNORE myutf8file.txt

which works fine on a Linux machine

but not on my local mac os x machine (locale LC_ALL="fr_FR.UTF-8")

I really don't understand why iconv returns this weird output on Mac OS X when all is fine on Linux

any help ? or directions ?

thanks in advance

osu on Jul 19, 2009

Thanks for this interesting article.

However, I'm not sure how i can implement it? How would I affect the slug from within a php script? I thought you have to do it from within an .htaccess file using mod_rewrite?

Thanks

canon on Sep 2, 2009

Thanks to Bryan. I have used this function and sometimes found multiple dashes.

Diego on Oct 12, 2009

Here's a quick and dirty one:

$url_Freindly = preg_replace("[\W]", "-", strtolower($title));

charline on Dec 10, 2009

Thanks to everyone; using all your comments I've done this:

if (function_exists('iconv')) {
$string = @iconv('utf-8', 'us-ascii//TRANSLIT', $string);
}
$string = strtolower($string);
$string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
$string = preg_replace('/[\s]+/', '-', $string);
$string = trim($string, " \t. -");

Christoph on Jan 27, 2010

Charline's code, with respect for $space and strtolower:

function toSlug($string, $space = "-") {
    if (function_exists('iconv')) {
        $string = @iconv('utf-8', 'us-ascii//TRANSLIT', $string);
    }
    $string = strtolower($string);
    $string = preg_replace("/[^a-z0-9 -]/", "", $string);
    $string = preg_replace('/[\s]+/', $space, $string);
    $string = trim($string, " \t. -");
    return $string;
}

Weblap.ro on Mar 6, 2010

function remove_accent($str)

{

$a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ÿ','Ā','ā','Ă','ă','Ą','ą','Ć','ć','Ĉ','ĉ','Ċ','ċ','Č','č','Ď','ď','Đ','đ','Ē','ē','Ĕ','ĕ','Ė','ė','Ę','ę','Ě','ě','Ĝ','ĝ','Ğ','ğ','Ġ','ġ','Ģ','ģ','Ĥ','ĥ','Ħ','ħ','Ĩ','ĩ','Ī','ī','Ĭ','ĭ','Į','į','İ','ı','IJ','ij','Ĵ','ĵ','Ķ','ķ','Ĺ','ĺ','Ļ','ļ','Ľ','ľ','Ŀ','ŀ','Ł','ł','Ń','ń','Ņ','ņ','Ň','ň','ʼn','Ō','ō','Ŏ','ŏ','Ő','ő','Œ','œ','Ŕ','ŕ','Ŗ','ŗ','Ř','ř','Ś','ś','Ŝ','ŝ','Ş','ş','Š','š','Ţ','ţ','Ť','ť','Ŧ','ŧ','Ũ','ũ','Ū','ū','Ŭ','ŭ','Ů','ů','Ű','ű','Ų','ų','Ŵ','ŵ','Ŷ','ŷ','Ÿ','Ź','ź','Ż','ż','Ž','ž','ſ','ƒ','Ơ','ơ','Ư','ư','Ǎ','ǎ','Ǐ','ǐ','Ǒ','ǒ','Ǔ','ǔ','Ǖ','ǖ','Ǘ','ǘ','Ǚ','ǚ','Ǜ','ǜ','Ǻ','ǻ','Ǽ','ǽ','Ǿ','ǿ');

$b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O','U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o','o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E','e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i','I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n','O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T','t','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o','U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o');

return str_replace($a, $b, $str);

}

function post_slug($str)

{

return strtolower(preg_replace(array('/[^a-zA-Z0-9 -]/', '/[ -]+/', '/^-|-$/'), array('', '-', ''), remove_accent($str)));

}
