Programmed Under The Influence
Converting Non-English Characters in PHP

In my previous post, “Removing Non-English Characters in PHP,” I showed a way to strip non-English characters. I’ve since found another neat trick that converts accented characters and ligatures into a close ASCII equivalent instead of removing them. Here’s the code:

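function unaccent($string) {
    // Encode to HTML entities, so accented letters become e.g. &eacute; or &uuml;
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');

    if (strpos($string, '&') !== false) {
        // Keep the base letter(s) of each accent or ligature entity and drop the rest
        $string = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string);
        // Decode any remaining entities (&amp;, &quot;, ...) back to characters
        $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
    }

    return $string;
}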
…and here’s the source.
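
To see what the trick does in practice, here is a quick test run. Treat it as a sketch: it repeats unaccent() so the snippet runs on its own, and the sample strings are my own picks for illustration, not taken from the original source. It assumes the file is saved as UTF-8.

```php
<?php
// Same function as in the post, repeated so this snippet runs standalone.
function unaccent($string) {
    if (strpos($string = htmlentities($string, ENT_QUOTES, 'UTF-8'), '&') !== false) {
        $string = html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string), ENT_QUOTES, 'UTF-8');
    }

    return $string;
}

// Illustrative inputs (chosen for this example, not from the original source).
$samples = array('Crème Brûlée', 'Ærøskøbing', 'Straße', 'Tom & Jerry', 'Łódź');

foreach ($samples as $sample) {
    echo $sample, ' => ', unaccent($sample), "\n";
}
// Crème Brûlée => Creme Brulee
// Ærøskøbing => AEroskobing
// Straße => Strasze      (ß encodes as &szlig;, so it becomes "sz", not "ss")
// Tom & Jerry => Tom & Jerry
// Łódź => Łodź           (Ł and ź have no HTML 4.01 named entity, so they stay)
```

Two things to keep in mind. Only characters that htmlentities() turns into a named entity ending in one of the listed suffixes get converted, so letters outside that set pass through untouched. Also, because the flags here are just ENT_QUOTES, htmlentities() returns an empty string when the input is not valid UTF-8, which means unaccent() does too. If that matters for your data, validate the encoding first.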

  1. skrolikowski posted this