Scrapy - Too many requests?

I am trying to get the latitude and longitude coordinates of cities from https://www.latlong.net/. My code is:

    # -*- coding: utf-8 -*-
    import re
    import json
    import scrapy


    class geo_spider(scrapy.Spider):
        name = "geo"
        allowed_domains = ["www.latlong.net"]
        start_urls = ['https://www.latlong.net/']

        custom_settings = {
            'COOKIES_ENABLED': True,
            'DOWNLOAD_DELAY' : 1,
        }

        # Raw string so the backslash in \( reaches the regex engine unchanged.
        LAT_LONG_REGEX = r'sm\((?P<lat>.+),(?P<long>.+),'

        def start_requests(self):
            FILE_PATH = 'C:/Users/coppe/tutorial/cities.json'
            with open(FILE_PATH) as json_file:
                cities_data = json.load(json_file)
                # One request to the home page per city, to obtain the form and its token.
                for d in cities_data:
                    yield scrapy.Request(
                        url='https://www.latlong.net/',
                        callback=self.gen_csrftoken,
                        meta={'city': d['city']},
                        dont_filter=True,
                    )

        def gen_csrftoken(self, response):
            city = response.meta['city']
            # Submit the search form for this city.
            yield scrapy.FormRequest.from_response(
                response,
                formid='frmPlace',
                formdata={'place': city},
                callback=self.get_geo,
                meta={'city': city}
            )

        def get_geo(self, response):
            lat_long_search = re.search(self.LAT_LONG_REGEX, response.body.decode('utf-8'))
            if lat_long_search:
                yield {
                    'coord': (lat_long_search.group('lat'), lat_long_search.group('long')),
                    'city': response.meta['city']
                }
            else:
                # No match: open a Scrapy shell to inspect the response.
                from scrapy.shell import inspect_response
                inspect_response(response, self)

I expect to get something like (50, 5) as the coordinates for each of the 589 cities in my JSON file. Everything works, except that I get (0, 0) for every city. I thought it was a JavaScript problem, but it is not: when I reduce the JSON file to, say, 6 cities, I get the correct coordinates for each one. I tried the DOWNLOAD_DELAY setting with different values (1, 2 and 3), but it still does not work. Is my JSON file too big? Does anyone have a clue about this problem?

1 Answer

红糖糍粑

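It looks like the site is using an API such as the Google Maps Geocoding API, documented at https://developers.google.com/maps/documentation/geocoding/intro
That documentation (which says nothing about sending several requests at once, so perhaps it is not the API actually used?) states that an API URL has a maximum length of 8192 characters, counting the URL itself and all the locations you are looking up.
So yes, apart from possibly being rate limited, there must be a maximum number of characters for your city names!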
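If rate limiting is the cause, which is only a guess here, slowing the spider down is one thing to try. The sketch below lists standard Scrapy settings that reduce request pressure; the specific values and the name THROTTLED_SETTINGS are assumptions chosen for illustration, not something confirmed to fix this site. Note that the question reports that changing DOWNLOAD_DELAY alone did not help.

```python
# Hypothetical values for the spider's custom_settings.
# Every key is a standard Scrapy setting; the numbers are illustrative guesses.
THROTTLED_SETTINGS = {
    "COOKIES_ENABLED": True,
    "DOWNLOAD_DELAY": 2,                  # base delay in seconds between requests
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # at most one request in flight per domain
    "AUTOTHROTTLE_ENABLED": True,         # let Scrapy adapt the delay to server latency
    "AUTOTHROTTLE_START_DELAY": 2,        # initial AutoThrottle delay in seconds
    "AUTOTHROTTLE_MAX_DELAY": 30,         # upper bound for the adapted delay
    "RETRY_TIMES": 3,                     # retry a failed request up to 3 times
    "RETRY_HTTP_CODES": [429, 500, 502, 503, 504],  # statuses that trigger a retry
}
```

To try it, assign the dictionary to `custom_settings` in the spider class in place of the existing one and rerun the crawl on the full city list.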

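A Geocoding API request takes the following form: https://maps.googleapis.com/maps/api/geocode/outputFormat?parameters ...
Note: URLs must be properly encoded to be valid and are limited to 8192 characters for all web services. Be aware of this limit when constructing your URLs. Note that different browsers, proxies, and servers may have different URL character limits as well.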
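As a rough illustration of the note above, here is a minimal Python sketch that builds a percent-encoded geocoding URL for one address and checks it against the 8192-character limit. The helper names `build_geocode_url` and `fits_url_limit` and the placeholder key `YOUR_API_KEY` are hypothetical; `json` is used as the output format, and whether latlong.net actually calls this API is not confirmed.

```python
from urllib.parse import urlencode

# Limit quoted in the documentation note.
MAX_URL_LENGTH = 8192

# Request form from the documentation, with "json" as the output format.
BASE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key="YOUR_API_KEY"):
    """Return a percent-encoded geocoding URL for a single address."""
    # urlencode escapes spaces, commas and non-ASCII characters.
    query = urlencode({"address": address, "key": api_key})
    return f"{BASE_URL}?{query}"


def fits_url_limit(url, limit=MAX_URL_LENGTH):
    """Return True if the URL is no longer than the given limit."""
    return len(url) <= limit


if __name__ == "__main__":
    url = build_geocode_url("Liège, Belgium")
    print(url, fits_url_limit(url))
```

A single city name stays far below the limit, so the limit would only matter if many locations were packed into one URL.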

Answered 2021-12-26
