Common Python Encoding Errors
妙音
posted @ May 17, 2018 11:08 in python, 1811 views
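- Error 1: encoding error when a variable is loaded into memory
File contents of a.py:
    a="中国"
    print a  # raises an error
Running it:
    ➜ ~ python a.py
      File "a.py", line 1
    SyntaxError: Non-ASCII character '\xe4' in file a.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
When Python 2 compiles the file, it has to read the literal assigned to a into memory. No source encoding is declared, so the bytes are interpreted with the default encoding (ASCII), and the Chinese characters trigger the error before any code runs.
UTF-8 bytes on disk --> cannot be turned into an in-memory str under ASCII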
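The error message points to PEP 263, which describes the usual fix: declare the source encoding on the first or second line of the file. Below is a minimal sketch, assuming the file is saved as UTF-8. The u prefix and the repr call are additions that are not in the original a.py; they are there only so the snippet behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration: must sit on line 1 or 2 of the source file.
# Assumption: the file is actually saved as UTF-8.

a = u"中国"               # unicode literal, decoded using the declared encoding
data = a.encode("utf-8")  # explicit conversion back to a byte string

print(len(a))      # number of characters: 2
print(repr(data))  # the six UTF-8 bytes, shown as escapes
```

With the declaration in place, the original a.py also compiles under Python 2, where a="中国" is then a str holding the UTF-8 bytes.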
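- Error 2: calling encode directly on a str raises an error
    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'
    >>> a="中国"
    >>> a.encode("gbk")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
    >>> b=u"中国"
    >>> b.encode("gbk")
    '\xd6\xd0\xb9\xfa'
The variable a is a str (here a UTF-8 byte string). Calling encode on it makes Python 2 first call decode implicitly with the default encoding (ASCII) to obtain a unicode object, and only then call encode.
Decoding a UTF-8 byte string with the default ASCII codec fails, which is why the error is a UnicodeDecodeError even though the code called encode.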
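The implicit ASCII decode can be avoided by doing both steps explicitly. The following is a small sketch, not taken from the original post; the variable names are illustrative, and the bytes are written as escapes so the snippet does not depend on the terminal or source file encoding.

```python
# UTF-8 bytes of the two characters used in the session above.
utf8_bytes = b"\xe4\xb8\xad\xe5\x9b\xbd"

# Step 1: decode with the encoding the bytes are really stored in.
text = utf8_bytes.decode("utf-8")

# Step 2: encode the unicode text into the target encoding.
gbk_bytes = text.encode("gbk")

print(repr(gbk_bytes))  # same four bytes as b.encode("gbk") above
```

Because the source encoding is named explicitly, the default encoding reported by sys.getdefaultencoding() never comes into play.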
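- Error 3: converting between encodings raises an error
    >>> "中国".decode("utf-8").encode("ascii")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
    >>> "中国".decode("utf-8").encode("gbk")
    '\xd6\xd0\xb9\xfa'
    >>> "a".decode("utf-8").encode("ascii")
    'a'
ASCII is a subset of UTF-8, so converting from the smaller character set to the larger one always works.
Converting from the larger set to the smaller one fails whenever the text contains a character that the smaller set cannot represent.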
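The subset relationship can be checked before converting. The sketch below uses a hypothetical helper, can_encode, which is not part of the original post; it also shows the standard errors argument of encode ("replace" and "ignore") as one possible way to convert lossy text without raising.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

text = u"\u4e2d\u56fd"  # the two Chinese characters from the session above

print(can_encode(text, "ascii"))  # False: ASCII has no Chinese characters
print(can_encode(text, "gbk"))    # True
print(can_encode(u"a", "ascii"))  # True: plain ASCII text fits everywhere

# Lossy conversion instead of an exception.
replaced = text.encode("ascii", "replace")  # each character becomes "?"
ignored = text.encode("ascii", "ignore")    # characters are dropped
```

Whether a lossy conversion is acceptable depends on the application; for stored data it is usually safer to keep a target encoding that covers the text.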
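- A unicode object cannot be decoded again
    >>> a = "中国"
    >>> b = a.decode("utf-8")
    >>> b
    u'\u4e2d\u56fd'
    >>> b.decode("utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
b is already unicode, so calling decode on it makes Python 2 first encode it implicitly to a str with the default ASCII codec. That hidden step is what fails, hence the UnicodeEncodeError.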
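One way to avoid decoding twice is to check the type first. The sketch below uses a hypothetical helper, to_text, that is not from the original post; it assumes UTF-8 unless told otherwise and relies on bytes being an alias of str in Python 2.6+.

```python
def to_text(value, encoding="utf-8"):
    """Return unicode text, decoding only when value is a byte string."""
    if isinstance(value, bytes):  # in Python 2, bytes is the same type as str
        return value.decode(encoding)
    return value                  # already unicode: leave it untouched

raw = b"\xe4\xb8\xad\xe5\x9b\xbd"  # UTF-8 bytes
once = to_text(raw)                # decoded to unicode
twice = to_text(once)              # second call is a no-op, no error

print(once == twice)  # True
```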
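- Summary
To understand an encoding error, first determine which encoding the str is stored in while in memory, then check whether the operation follows the conversion chain (str -(decode)-> unicode -(encode)-> str).
To determine the encoding in Python 2, use len(str) or bytes(str) to inspect the byte string: a Chinese character typically takes 3 bytes in UTF-8 and 2 bytes in GBK, while an ASCII character takes 1 byte.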
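The byte counts above can be confirmed with a short sketch. The variable names are illustrative only, and the characters are written as escapes so the result does not depend on the source file encoding.

```python
text = u"\u4e2d\u56fd"  # two Chinese characters

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")
ascii_bytes = u"ab".encode("ascii")

print(len(text))         # 2 characters
print(len(utf8_bytes))   # 6 bytes: 3 per character
print(len(gbk_bytes))    # 4 bytes: 2 per character
print(len(ascii_bytes))  # 2 bytes: 1 per character
```

In Python 2, len on a str counts bytes, so comparing it with the expected character count gives a quick hint about which encoding the str holds.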