Requests library crashes on Python 2 and Python 3

I am trying to parse arbitrary web pages with the requests and BeautifulSoup libraries, using the following code:

    try:
        response = requests.get(url)
    except Exception as error:
        return False

    if response.encoding == None:
        soup = bs4.BeautifulSoup(response.text) # This is line 809
    else:
        soup = bs4.BeautifulSoup(response.text, from_encoding=response.encoding)

On most web pages this works fine. However, on some arbitrary pages (<1%) I get this crash:

    Traceback (most recent call last):
      File "/home/dotancohen/code/parser.py", line 155, in has_css
        soup = bs4.BeautifulSoup(response.text)
      File "/usr/lib/python3/dist-packages/requests/models.py", line 809, in text
        content = str(self.content, encoding, errors='replace')
    TypeError: str() argument 2 must be str, not None

For reference, this is the relevant method of the requests library:

    @property
    def text(self):
        """Content of the response, in unicode.

        if Response.encoding is None and chardet module is available, encoding
        will be guessed.
        """

        # Try charset from content-type
        content = None
        encoding = self.encoding

        # Fallback to auto-detected encoding.
        if self.encoding is None:
            if chardet is not None:
                encoding = chardet.detect(self.content)['encoding']

        # Decode unicode from given encoding.
        try:
            content = str(self.content, encoding, errors='replace') # This is line 809
        except LookupError:
            # A LookupError is raised if the encoding was not found which could
            # indicate a misspelling or similar mistake.
            #
            # So we try blindly encoding.
            content = str(self.content, errors='replace')

        return content

As can be seen, I am not passing an encoding when this error is thrown. How am I using the library incorrectly, and what can I do to prevent this error? This is on Python 3.2.3, but I get the same results on Python 2.


1 Answer

天涯尽头无女友


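This means that the server did not send an encoding for the content in its headers, and the chardet library could not determine an encoding for the content either, so the encoding passed to str() is None. In fact, you are explicitly testing for the missing encoding; why try to get the decoded text when no encoding is available?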
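The failure can be reproduced without any network access. The following is a minimal sketch, assuming Python 3; `decode_like_text` is a hypothetical stand-in that only mimics the decode step of the `text` property quoted in the question, and the sample bytes are made up for illustration.

```python
def decode_like_text(content, encoding):
    """Hypothetical stand-in for the decode step of Response.text."""
    try:
        # With encoding=None this raises TypeError, not LookupError,
        # so the fallback below is never reached.
        return str(content, encoding, errors="replace")
    except LookupError:
        # Only reached for an unknown encoding name such as "utf-99".
        return str(content, errors="replace")


raw = b"\xff\xfe\x00 some undetectable bytes"  # made-up sample payload

try:
    decode_like_text(raw, None)  # neither the headers nor chardet gave an encoding
    outcome = "decoded"
except TypeError:
    outcome = "TypeError"

# A misspelled encoding name takes the LookupError path and still decodes.
fallback_text = decode_like_text(raw, "utf-99")
```

The `except LookupError` clause covers a misspelled encoding name but not a missing one, which is why the `None` case surfaces as an uncaught `TypeError`.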

You can try leaving the decoding to the BeautifulSoup parser by passing it the raw bytes:

    if response.encoding is None:
        soup = bs4.BeautifulSoup(response.content)

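There is also no need to pass the encoding to BeautifulSoup: if .text does not fail, you are already working with Unicode, and BeautifulSoup ignores the encoding parameter anyway: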

    else:
        soup = bs4.BeautifulSoup(response.text)
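The two branches can be folded into one small helper. This is only a sketch: `markup_for_parser` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for real `requests` responses so that the snippet runs without a network connection or third-party packages.

```python
from types import SimpleNamespace


def markup_for_parser(response):
    """Pick what to hand to the HTML parser.

    Returns the raw bytes when no encoding is known, so the parser can do
    its own detection, and the already-decoded text otherwise.
    """
    if response.encoding is None:
        return response.content
    return response.text


# Stand-ins for requests.Response objects (assumed attributes only).
unknown = SimpleNamespace(encoding=None, content=b"<p>hi</p>", text=None)
known = SimpleNamespace(encoding="utf-8", content=b"<p>hi</p>", text="<p>hi</p>")

bytes_markup = markup_for_parser(unknown)  # bytes, decoding left to the parser
text_markup = markup_for_parser(known)     # str, already Unicode
```

With a real response, the result would be passed on as `soup = bs4.BeautifulSoup(markup_for_parser(response))`. The helper never touches `response.text` when the encoding is None, so the failing code path is avoided.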

Answered 2021-04-01
