I'm trying to get the text from a web page, and it raises this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in <module>
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence

I searched and tried textFile.write(str(results).decode('utf-8')), but that fails with an attribute error.

import requests
import os
from bs4 import BeautifulSoup
outputFolderName = "output"
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" + outputFolderName
r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)
try :
    os.mkdir(outputDir)
    print("output directory generated")
except :
    print("using existing directory")
textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()

Is there a way to convert the codec of str(results) and save it correctly? The Python version is 3.7.3.
1 Answer
紫衣仙女
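Specify the encoding explicitly when you open the output file, as in this example. Without an encoding argument, open() uses the platform default (cp949 here), which cannot represent '\xa9'.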
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import os
from bs4 import BeautifulSoup
outputFolderName = "output"
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" + outputFolderName
r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)
try:
    os.mkdir(outputDir)
    print("output directory generated")
except FileExistsError:
    # The directory is already there, so reuse it.
    print("using existing directory")
# Pass the encoding explicitly so the platform default (cp949) is not used.
textFile = open(outputDir + '/output.txt', mode='w', encoding='utf8')
textFile.write(str(results))
textFile.close()
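To see why the explicit encoding matters without fetching a page, here is a minimal sketch that reproduces the failure and the fix. The sample string and the temporary directory are assumptions for illustration only; the character '\xa9' and the cp949 codec are taken from the traceback in the question.

```python
import os
import tempfile

# Hypothetical sample text containing the character from the traceback
# ('\xa9' is the copyright sign).
text = "Copyright \xa9 example"

# Encoding with cp949 fails, as the traceback in the question reports.
try:
    text.encode("cp949")
    cp949_failed = False
except UnicodeEncodeError as exc:
    cp949_failed = True
    print("cp949 cannot encode the text:", exc.reason)

# Writing with an explicit UTF-8 encoding works regardless of the
# platform's default encoding.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")
    with open(path, mode="w", encoding="utf-8") as text_file:
        text_file.write(text)
    # Read the file back with the same encoding to check the round trip.
    with open(path, mode="r", encoding="utf-8") as text_file:
        round_trip = text_file.read()

print("round trip ok:", round_trip == text)
```

Whatever reads output.txt later needs to open it as UTF-8 as well, otherwise the same character may show up garbled.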
