Converting from a larger character set to a smaller one loses characters: some characters simply do not exist in the target character set.
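For example, converting "囧" or "镕" to gb2312 fails. Oddly, web pages declared as gb2312 do display these characters, which puzzled me. A likely explanation is that browsers decode pages labelled gb2312 as GBK, a superset that includes both characters.
Let's first look at the official documentation for iconv.
Quote: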
out_charset
The output charset.
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character and an E_NOTICE is generated.
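From the documentation and its examples we can see:
Appending //TRANSLIT to the output charset replaces characters that cannot be represented with similar-looking ones.
Appending //IGNORE drops the characters that cannot be converted.
With neither suffix, conversion stops at the first error, so the result is truncated.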
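To compare the three variants side by side, here is a minimal sketch. It assumes the iconv extension is loaded; the helper name `try_iconv_modes` is made up for illustration and is not part of PHP or of the original post. The exact output depends on the PHP version and on the underlying iconv implementation (glibc, libiconv, and so on), so no expected output is shown.

```php
<?php
// Hypothetical helper (not from the original post): runs the same conversion
// with each of the three out_charset variants described above.
function try_iconv_modes($utf8, $target = 'GB2312')
{
    $results = array();
    foreach (array('', '//TRANSLIT', '//IGNORE') as $suffix) {
        $label = ($suffix === '') ? 'strict' : $suffix;
        // @ suppresses the notice raised for unconvertible characters;
        // iconv() returns false when the conversion fails.
        $results[$label] = @iconv('UTF-8', $target . $suffix, $utf8);
    }
    return $results;
}

// Print each result as hex so the raw gb2312 bytes are visible.
foreach (try_iconv_modes("abc囧def") as $mode => $out) {
    echo $mode, ' => ', ($out === false) ? 'false' : bin2hex($out), "\n";
}
```

Running this on your own server is the quickest way to see whether a failed conversion gives you a truncated string, an empty string, or false.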
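How can you tell whether the result was truncated or the conversion failed?
Here is an example for reference (the lengths are in bytes, since strlen counts bytes rather than characters):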
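<?php
//code from http://www.aslibra.com
//code by hqlulu @ 2010-1-25
$title_origin = "something";
$title = iconv('utf-8', 'gb2312//IGNORE', $title_origin);
// iconv() can return false or an empty string; handle that first to avoid dividing by zero
if ($title === false || $title === '') {
    if ($title_origin !== '') {
        $error[] = array('str', $title_origin);
    }
} else {
    // strlen() counts bytes: a Chinese character takes 3 bytes in UTF-8 and 2 bytes in gb2312
    // so the ratio is at most 1.5; if characters were lost, the ratio grows beyond that
    // simple example: urlencode of "我" = %E6%88%91 (utf-8) = %CE%D2 (gb2312)
    // caveat: in text that is mostly ASCII, the ratio can stay below 1.5 even when a character was dropped
    $percent = round(strlen($title_origin) / strlen($title), 3);
    if ($percent > 1.5) {
        $error[] = array('str', $title_origin);
    }
}
?>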
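The byte-ratio check above can miss a dropped character when the text is mostly ASCII: "abcdefgh囧" is 11 bytes in UTF-8 and 8 bytes after "囧" is dropped, a ratio of 1.375. As an alternative, here is a minimal sketch of a round-trip check, which converts to the target charset and back and compares the result with the original. The function name `converts_losslessly` is hypothetical, and this is a different technique from the author's, not a description of how iconv itself reports errors.

```php
<?php
// Hypothetical helper (not from the original post).
// Returns true when $utf8 survives a UTF-8 -> $target -> UTF-8 round trip unchanged.
function converts_losslessly($utf8, $target = 'GB2312')
{
    // //IGNORE drops unconvertible characters; some iconv builds return
    // false here instead, which is also treated as a lossy conversion.
    $converted = @iconv('UTF-8', $target . '//IGNORE', $utf8);
    if ($converted === false) {
        return false;
    }
    // Convert back and compare byte for byte with the original string.
    $back = @iconv($target, 'UTF-8//IGNORE', $converted);
    return $back === $utf8;
}

$error = array();
foreach (array("我", "abcdefgh囧") as $title_origin) {
    if (!converts_losslessly($title_origin)) {
        // Same error format as the example above.
        $error[] = array('str', $title_origin);
    }
}
```

The round trip costs a second conversion, but it does not depend on how many bytes each character takes, so mixed ASCII and Chinese text is handled the same way as pure Chinese text.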
Original content. If you repost it, please credit the source: 阿权的书房

PS: Blogger, the top of your page is laid out incorrectly in Firefox.
UTF-8 text does get longer: the same text takes more storage space.