PHP中GBK和UTF8编码处理(中文,韩文)
一、编码范围
1.gbk (gb2312/gb18030)
x00-xff gbk双字节编码范围
x20-x7f ascii
xa1-xff 中文
x80-xff 中文
2.utf-8 (unicode)
u4e00-u9fa5 (中文)
x3130-x318f (韩文)
xac00-xd7a3 (韩文)
u0800-u4e00 (日文)
ps: 韩文是大于[u9fa5]的字符
正则例子:
preg_replace("/([x80-xff])/","",$str);
preg_replace("/([u4e00-u9fa5])/","",$str);
二、代码例子
//判断内容里有没有中文-gbk (php教程) function check_is_chinese( $s ){ return preg_match( '/[x80-xff]./' , $s ); } //获取字符串长度-gbk (php) function gb_strlen( $str ){ $count = 0; for ( $i =0; $i < strlen ( $str ); $i ++){ $s = substr ( $str , $i , 1); if (preg_match( "/[x80-xff]/" , $s )) ++ $i ; ++ $count ; } return $count ; } //截取字符串字串-gbk (php) function gb_substr( $str , $len ){ $count = 0; for ( $i =0; $i < strlen ( $str ); $i ++){ if ( $count == $len ) break ; if (preg_match( "/[x80-xff]/" , substr ( $str , $i , 1))) ++ $i ; ++ $count ; } return substr ( $str , 0, $i ); } //统计字符串长度-utf8 (php) function utf8_strlen( $str ) { $count = 0; for ( $i = 0; $i < strlen ( $str ); $i ++){ $value = ord( $str [ $i ]); if ( $value > 127) { $count ++; if ( $value >= 192 && $value <= 223) $i ++; elseif ( $value >= 224 && $value <= 239) $i = $i + 2; elseif ( $value >= 240 && $value <= 247) $i = $i + 3; else die ( 'not a utf-8 compatible string' ); } $count ++; } return $count ; } //截取字符串-utf8(php) function utf8_substr( $str , $position , $length ){ $start_position = strlen ( $str ); $start_byte = 0; $end_position = strlen ( $str ); $count = 0; for ( $i = 0; $i < strlen ( $str ); $i ++){ if ( $count >= $position && $start_position > $i ){ $start_position = $i ; $start_byte = $count ; } if (( $count - $start_byte )>= $length ) { $end_position = $i ; break ; } $value = ord( $str [ $i ]); if ( $value > 127){ $count ++; if ( $value >= 192 && $value <= 223) $i ++; elseif ( $value >= 224 && $value <= 239) $i = $i + 2; elseif ( $value >= 240 && $value <= 247) $i = $i + 3; else die ( 'not a utf-8 compatible string' ); } $count ++; } return ( substr ( $str , $start_position , $end_position - $start_position )); } //字符串长度统计-utf8 [中文3个字节,俄文、韩文占2个字节,字母占1个字节] (ruby) def utf8_string_length(str) temp = cgi::unescape(str) i = 0; j = 0; temp.length.times{|t| if temp[t] < 127 i += 1 elseif temp[t] >= 127 and temp[t] < 224 j += 1 if 0 == (j % 2) i += 2 j = 0 end else j += 1 if 0 == (j % 3) i +=2 j = 0 end end } return i } //判断是否是含有韩文-utf-8 (网页特效) function checkkoreachar(str) { for (i=0; i<str.length; i++) { if (((str.charcodeat(i) > 0x3130 && str.charcodeat(i) < 0x318f) || (str.charcodeat(i) >= 0xac00 && str.charcodeat(i) <= 0xd7a3))) { return true; } } return false; } //开源代码phpfensi测试数据 //判断是否有中文字符-gbk (javascript) function check_chinese_char(s){ return (s.length != s.replace(/[^x00-xff]/g, "**" ).length); }查看更多关于PHP中GBK和UTF8编码处理(中文,韩文) - php函数的详细内容...
声明:本文来自网络,不代表【好得很程序员自学网】立场,转载请注明出处:http://haodehen.cn/did30864