Bytes object saved in "repr format" as b'foo' instead of being decode()-ed

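An unlucky colleague saved some data to a file like this:

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(str(s))

when they should have used

s = b'The em dash: \xe2\x80\x94'
with open('foo.txt', 'w') as f:
    f.write(s.decode())

Now foo.txt looks like

b'The em dash: \xe2\x80\x94'

instead of

The em dash: —

I have already read this file as a string:

with open('foo.txt') as f:
    bad_foo = f.read()

How can I now convert bad_foo from the incorrectly saved format into the correctly saved string?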


3 Answers

忽然笑


You can try ast.literal_eval:

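from ast import literal_eval

# The file contents as read back: the repr of a bytes object.
# The raw string keeps the backslashes as literal characters.
test = r"b'The em-dash: \xe2\x80\x94'"
print(test)

# Parse the text as a Python bytes literal, then decode the UTF-8 bytes
res = literal_eval(test)
print(res.decode())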
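Applied to the bad_foo string from the question, the same idea might look like the sketch below. The helper name recover_text and the UTF-8 default are assumptions for illustration, not part of the original answer; the type check is there because literal_eval will happily parse other literals too.

```python
from ast import literal_eval


def recover_text(bad_text: str, encoding: str = "utf-8") -> str:
    """Turn the text "b'...'" (the repr of a bytes object) back into the intended string.

    Hypothetical helper; assumes the original bytes were encoded with `encoding`.
    """
    value = literal_eval(bad_text.strip())
    # literal_eval also accepts strings, numbers, lists, etc., so verify the type
    if not isinstance(value, bytes):
        raise ValueError("expected the repr of a bytes object")
    return value.decode(encoding)


# Simulate what the colleague wrote to foo.txt
bad_foo = str(b'The em dash: \xe2\x80\x94')
fixed = recover_text(bad_foo)
print(fixed)
```

The result can then be written back with open('foo.txt', 'w', encoding='utf-8') to repair the file in place.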

2021-09-14

BIG阳


If you trust that the input is not malicious, you can use ast.literal_eval on the broken string.

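import ast

# Create a sad broken string. The r prefix matters: without it Python would
# interpret \xe2\x80\x94 itself and literal_eval would reject the non-ASCII result.
s = r"b'The em-dash: \xe2\x80\x94'"

# Parse and evaluate the string as a Python literal, creating a `bytes` object
s_bytes = ast.literal_eval(s)

# Now decode the `bytes` as normal
s_fixed = s_bytes.decode()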

Otherwise, you will have to parse the string manually and remove or replace the offending doubled escape sequences.
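One possible way to do that manual parsing, as a rough sketch rather than a vetted solution: strip the b'...' wrapper, undo the backslash escapes with the standard unicode_escape codec, and map the resulting code points back to bytes via latin-1. The function name recover_without_eval is hypothetical, and the sketch assumes the text is exactly what str() produced for a bytes object whose content was UTF-8.

```python
def recover_without_eval(bad_text: str, encoding: str = "utf-8") -> str:
    """Undo str(bytes_obj) without evaluating the text as Python source.

    Hypothetical helper; assumes `bad_text` is the unmodified repr of a bytes object.
    """
    text = bad_text.strip()
    # repr() wraps bytes as b'...' or, if the data contains a single quote, b"..."
    if len(text) < 3 or text[0] != "b" or text[1] not in "'\"" or text[-1] != text[1]:
        raise ValueError("not the repr of a bytes object")
    body = text[2:-1]
    # The repr of bytes is pure ASCII; unicode_escape turns each \xNN into code point U+00NN
    unescaped = body.encode("ascii").decode("unicode_escape")
    # latin-1 maps code points 0-255 back to the identical byte values
    raw = unescaped.encode("latin-1")
    return raw.decode(encoding)


print(recover_without_eval(r"b'The em dash: \xe2\x80\x94'"))
```

This avoids evaluating untrusted text, at the cost of relying on the repr format staying exactly as described above.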

2021-09-14
