
PHP: Converting a Unicode code point to UTF-8

PHP
Asked by HUH函数 on 2019-10-18 14:44:37

My data is in a format like U+597D or U+6211. I want to convert these to UTF-8 (the original characters are 好 and 我). How can I do that?




3 Answers

繁星淼淼

$utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $string), ENT_NOQUOTES, 'UTF-8');

This is probably the simplest solution.
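
As a sketch of a variation on the one-liner above: the helper below (the name decode_u_plus is hypothetical, not from the answer) wraps the same html_entity_decode() idea in a function, accepts lowercase digits and four to six hex digits, and leaves a token untouched when the entity does not decode. It assumes the tokens are delimited, so that no stray hex letters directly follow a code point.

```php
<?php
// Hypothetical helper: replace every "U+XXXX" token in $text with the
// UTF-8 character it names, using a numeric HTML character reference.
function decode_u_plus(string $text): string
{
    return preg_replace_callback(
        '/U\+([0-9A-Fa-f]{4,6})/',
        function (array $m): string {
            $entity  = '&#x' . $m[1] . ';';
            $decoded = html_entity_decode($entity, ENT_NOQUOTES, 'UTF-8');
            // html_entity_decode() returns the entity unchanged when it
            // cannot decode it; in that case keep the original token.
            return $decoded === $entity ? $m[0] : $decoded;
        },
        $text
    );
}

echo decode_u_plus('U+6211 U+597D'), "\n"; // prints: 我 好
```

Using a callback instead of a plain preg_replace() makes it possible to check each token individually before replacing it.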


Answered 2019-10-18

MMTTMM

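function utf8($num)
{
    // Encode a Unicode code point as a UTF-8 byte string; '' if out of range.
    if ($num < 0 || $num > 0x10FFFF) return '';
    if ($num <= 0x7F)   return chr($num);
    if ($num <= 0x7FF)  return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num <= 0xFFFF) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
}

function uniord($c)
{
    // Decode the first UTF-8 character of $c to its code point; false on bad input.
    // Uses $c[0] because the $c{0} offset syntax was removed in PHP 8.0.
    $len = strlen($c);
    if ($len < 1) return false;
    $ord0 = ord($c[0]); if ($ord0 <= 127) return $ord0;
    if ($len < 2) return false;
    $ord1 = ord($c[1]); if ($ord0 >= 192 && $ord0 <= 223) return ($ord0 - 192) * 64 + ($ord1 - 128);
    if ($len < 3) return false;
    $ord2 = ord($c[2]); if ($ord0 >= 224 && $ord0 <= 239) return ($ord0 - 224) * 4096 + ($ord1 - 128) * 64 + ($ord2 - 128);
    if ($len < 4) return false;
    $ord3 = ord($c[3]); if ($ord0 >= 240 && $ord0 <= 247) return ($ord0 - 240) * 262144 + ($ord1 - 128) * 4096 + ($ord2 - 128) * 64 + ($ord3 - 128);
    return false;
}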

utf8() and uniord() try to mirror PHP's chr() and ord() functions:

echo utf8(0x6211)."\n";
echo uniord(utf8(0x6211))."\n";
echo "U+".dechex(uniord(utf8(0x6211)))."\n";

// In your case:
$wo='U+6211';
$hao='U+597D';
echo utf8(hexdec(str_replace("U+","", $wo)))."\n";
echo utf8(hexdec(str_replace("U+","", $hao)))."\n";

Output:

我
25105
U+6211
我
好
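
The bit arithmetic inside utf8() can be traced by hand for U+6211. The sketch below is only an illustration of the three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx); the variable names are made up for the example.

```php
<?php
// Worked example: encode U+6211 by hand using the three-byte UTF-8 layout.
$cp = 0x6211;                      // binary 0110 0010 0001 0001

$b1 = 0xE0 | ($cp >> 12);          // top 4 bits:    1110 0110 = 0xE6
$b2 = 0x80 | (($cp >> 6) & 0x3F);  // middle 6 bits: 10 001000 = 0x88
$b3 = 0x80 | ($cp & 0x3F);         // low 6 bits:    10 010001 = 0x91

$bytes = chr($b1) . chr($b2) . chr($b3);

echo bin2hex($bytes), "\n";        // prints: e68891
echo $bytes, "\n";                 // prints: 我 (on a UTF-8 terminal)
```

Adding 224 or 128, as utf8() does, gives the same result as the bitwise OR here, because the shifted values never overlap the marker bits.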


Answered 2019-10-18

守候你守候我

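I just wrote a polyfill for the missing multibyte versions of ord and chr, with the following points in mind:

It defines the functions mb_ord and mb_chr only if they do not already exist. If they do exist in your framework or in a later version of PHP (mbstring has provided both since PHP 7.2), the polyfill is ignored.

It uses the widely available mbstring extension for the conversion. If mbstring is not loaded, it falls back to the iconv extension.

I also added functions for HTML entity encoding/decoding and for encoding/decoding to the JSON escape format, along with some demo code showing how to use them.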
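
Whether a guarded definition ends up being used depends on the PHP build it runs on. As a rough sketch, the snippet below reports whether a function is built in, user-defined (for example, supplied by a polyfill), or missing. The helper name describe_function is hypothetical; ReflectionFunction is part of PHP core.

```php
<?php
// Hypothetical helper: report where a function comes from.
function describe_function(string $name): string
{
    if (!function_exists($name)) {
        return 'missing';
    }
    // isInternal() is true for functions provided by PHP or an extension.
    return (new ReflectionFunction($name))->isInternal() ? 'built-in' : 'user-defined';
}

foreach (['mb_chr', 'mb_ord', 'iconv'] as $fn) {
    echo $fn, ': ', describe_function($fn), "\n";
}
```

The output varies with the installed extensions, which is why none is shown here.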


// Encode a string to JSON \uXXXX escapes (without the surrounding quotes).
if (!function_exists('codepoint_encode')) {
    function codepoint_encode($str) {
        return substr(json_encode($str), 1, -1);
    }
}

// Decode a string of JSON \uXXXX escapes back to UTF-8.
if (!function_exists('codepoint_decode')) {
    function codepoint_decode($str) {
        return json_decode(sprintf('"%s"', $str));
    }
}

// iconv-based fallback, used only when mbstring is not loaded.
// Note: the iconv encoding getters/setters are deprecated since PHP 5.6.
if (!function_exists('mb_internal_encoding')) {
    function mb_internal_encoding($encoding = NULL) {
        return ($encoding === NULL)
            ? iconv_get_encoding('internal_encoding')
            : iconv_set_encoding('internal_encoding', $encoding);
    }
}

if (!function_exists('mb_convert_encoding')) {
    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }
}

// Code point to character: pack as big-endian UCS-4, then convert.
if (!function_exists('mb_chr')) {
    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }
}

// Character to code point: convert to big-endian UCS-4, then unpack.
if (!function_exists('mb_ord')) {
    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }
}

// Replace every non-ASCII character with a numeric HTML entity.
if (!function_exists('mb_htmlentities')) {
    function mb_htmlentities($string, $hex = true, $encoding = 'UTF-8') {
        return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) use ($hex) {
            return sprintf($hex ? '&#x%X;' : '&#%d;', mb_ord($match[0]));
        }, $string);
    }
}

if (!function_exists('mb_html_entity_decode')) {
    function mb_html_entity_decode($string, $flags = null, $encoding = 'UTF-8') {
        return html_entity_decode($string, ($flags === NULL) ? ENT_COMPAT | ENT_HTML401 : $flags, $encoding);
    }
}

How to use

echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));

echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));

echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));

echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));

echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('我好', false));
var_dump(mb_html_entity_decode('&#25105;&#22909;'));

echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('我好'));
var_dump(mb_html_entity_decode('&#x6211;&#x597D;'));

echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));

Output

Get string from numeric DEC value
string(3) "我"
string(3) "好"

Get string from numeric HEX value
string(3) "我"
string(3) "好"

Get numeric value of character as DEC int
int(25105)
int(22909)

Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"

Encode / decode to DEC based HTML entities
string(16) "&#25105;&#22909;"
string(6) "我好"

Encode / decode to HEX based HTML entities
string(16) "&#x6211;&#x597D;"
string(6) "我好"

Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"
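
The JSON route used by codepoint_decode() can also be pointed at the U+XXXX format from the question. The sketch below is one possible way to do that, not part of the original answer: the helper name u_plus_to_utf8_via_json is hypothetical, it assumes the json extension is available, and it builds a surrogate pair for code points above U+FFFF because JSON \u escapes are limited to four hex digits.

```php
<?php
// Hypothetical helper: convert one "U+XXXX" token to UTF-8 via a JSON escape.
// Returns null when the token is malformed or not a valid code point.
function u_plus_to_utf8_via_json(string $token): ?string
{
    if (!preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $token, $m)) {
        return null;
    }
    $cp = hexdec($m[1]);
    // Reject values above the Unicode range and lone surrogates.
    if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;
    }
    if ($cp < 0x10000) {
        $escape = sprintf('\u%04x', $cp);
    } else {
        // Supplementary planes: split into a UTF-16 surrogate pair.
        $v = $cp - 0x10000;
        $escape = sprintf('\u%04x\u%04x', 0xD800 | ($v >> 10), 0xDC00 | ($v & 0x3FF));
    }
    return json_decode('"' . $escape . '"');
}

echo u_plus_to_utf8_via_json('U+6211'), u_plus_to_utf8_via_json('U+597D'), "\n"; // prints: 我好
```

On PHP 7.2 and later, mb_chr(hexdec(...)) is a shorter option when mbstring is loaded; the JSON route avoids that dependency.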


Answered 2019-10-18
