
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9'

白衣染霜花 2020-02-02 14:53:00
I'm writing a script that goes through a list of links and parses the information. It works for most sites, but it chokes on some with:

    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 13: ordinal not in range(128)

It stops in client.py of urllib on Python 3. The exact link is:

http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html

There are quite a few similar posts here, but none of the answers seem to work for me. My code is:

    import socket
    from urllib import request
    from urllib.error import HTTPError, URLError

    def __request(link, debug=0):
        try:
            # made this long as I was getting lots of timeouts
            html = request.urlopen(link, timeout=35).read()
            unicode_html = html.decode('utf-8', 'ignore')
        # NOTE the except HTTPError must come first, otherwise
        # except URLError will also catch an HTTPError.
        except HTTPError as e:
            if debug:
                print('The server couldn\'t fulfill the request for ' + link)
                print('Error code: ', e.code)
            return ''
        except URLError as e:
            if isinstance(e.reason, socket.timeout):
                print('timeout')
                return ''
        else:
            return unicode_html

This calls the request function:

    link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'
    page = __request(link)

The traceback is:

    Traceback (most recent call last):
      File "<string>", line 250, in run_nodebug
      File "C:\reader\get_news.py", line 276, in <module>
        main()
      File "C:\reader\get_news.py", line 255, in main
        body = get_article_body(item['link'],debug=0)
      File "C:\reader\get_news.py", line 155, in get_article_body
        page = __request('na',url)
      File "C:\reader\get_news.py", line 50, in __request
        html = request.urlopen(link, timeout=35).read()
      File "C:\Python33\Lib\urllib\request.py", line 156, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Python33\Lib\urllib\request.py", line 469, in open
        response = self._open(req, data)
      File "C:\Python33\Lib\urllib\request.py", line 487, in _open

Any help is appreciated. It's driving me crazy; I think I've tried every combination of x.decode and the like.

3 Answers

MM们

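I wasn't sure whether the problem could also occur in other parts of the URL, so I split the URL into its components, percent-encoded the ones that may contain non-ASCII characters, and rebuilt it:

    from urllib import parse

    url_tuple = parse.urlsplit(link)
    # Indices: 0 = scheme, 1 = netloc, 2 = path, 3 = query, 4 = fragment.
    # urlunsplit only adds '?' and '#' when the query or fragment is non-empty.
    encode_link = parse.urlunsplit((
        url_tuple[0],
        url_tuple[1],
        parse.quote(url_tuple[2]),
        url_tuple[3],
        parse.quote(url_tuple[4]),
    ))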
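For illustration, the split-and-rebuild idea can be wrapped in a small function and checked against the link from the question. This is only a sketch under a few assumptions: the helper name `encode_url` is hypothetical, the `safe` character sets are a simplification chosen so that already-encoded `%XX` sequences are not encoded a second time, and non-ASCII host names (which would need IDNA encoding) are not handled.

```python
from urllib import parse


def encode_url(link):
    """Percent-encode the non-ASCII parts of a URL (hypothetical helper)."""
    parts = parse.urlsplit(link)
    # Keep '%' safe so text that is already percent-encoded is left alone,
    # and keep the usual delimiters of each component unescaped.
    path = parse.quote(parts.path, safe="/%")
    query = parse.quote(parts.query, safe="=&+%")
    fragment = parse.quote(parts.fragment, safe="%")
    # urlunsplit only emits '?' and '#' for non-empty components.
    return parse.urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


link = "http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html"
encoded_link = encode_url(link)
print(encoded_link)
# http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html
```

The resulting string contains only ASCII characters, so it can be passed to `request.urlopen` without triggering the `UnicodeEncodeError` from the question. Running the helper on an already-encoded link should return it unchanged.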

Replied 2020-02-02