Iterating over a server URL for an indexed directory and reading the files

There is a directory on an HTTP server whose URL is http://somehost/maindir/recent/. This "recent" directory contains 50 zip sub-directories (.zip archives). I can read a single zip file:

zfile = "http://somehost/maindir/recent/1.zip"
with RemoteZip(zfile) as zip:
    for zip_info in zip.infolist():
        data = zip.read(zip_info.filename)

But I don't know how to iterate over "http://somehost/maindir/recent/" and read the data from each zip. I tried glob, os.join and os.walk, but in vain. I want something like this:

for zfile in baseurl:  # unable to do this line
    with RemoteZip(zfile) as zip:
        for zip_info in zip.infolist():
            data = zip.read(zip_info.filename)


1 Answer

月关宝盒


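You cannot get a directory listing directly, because it is the HTTP server that decides what response to return. In some cases you will get an HTML page showing links to all the files inside the "directory". In your case, "http://somehost/maindir/recent/" will list all the zip files in the recent directory as HTML.

One solution could be to use BeautifulSoup to parse that HTML page and collect all the links to zip files from the "recent" directory page.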

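from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from remotezip import RemoteZip  # assuming RemoteZip comes from the remotezip package

url = 'http://somehost/maindir/recent/'

def get_files(url):
    # Download the index page that the server generates for the directory
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    # Keep only links ending in .zip; urljoin resolves each href against url,
    # which avoids a doubled slash when url already ends with '/'
    return [urljoin(url, a['href']) for a in soup.find_all('a', href=True)
            if a['href'].endswith('.zip')]

file_links = get_files(url)

for zfile in file_links:
    with RemoteZip(zfile) as zip:
        for zip_info in zip.infolist():
            data = zip.read(zip_info.filename)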
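
To check the link-extraction step without touching the network, it may help to run it against a small HTML string first. The sketch below is only an illustration: it swaps BeautifulSoup for the standard library's html.parser so it has no third-party dependencies, and the names ZipLinkParser, extract_zip_links and SAMPLE_INDEX are hypothetical. The sample page is made up; a real server's index page may be laid out differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that end in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and links that are not zip files
        if href and href.lower().endswith(".zip"):
            # urljoin handles relative, root-relative and absolute hrefs
            self.links.append(urljoin(self.base_url, href))


def extract_zip_links(html, base_url):
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical index page, standing in for what the server might return
SAMPLE_INDEX = """
<html><body>
<a href="../">Parent Directory</a>
<a href="1.zip">1.zip</a>
<a href="/maindir/recent/2.zip">2.zip</a>
<a href="notes.txt">notes.txt</a>
<a name="anchor-without-href">x</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_zip_links(SAMPLE_INDEX, "http://somehost/maindir/recent/"):
        print(link)
```

Each URL returned this way can then be passed to RemoteZip exactly as in the loop from the question.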

2023-06-06
