2 Answers
It may be simpler than what you are doing. Let's try to simplify it:
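import requests
from bs4 import BeautifulSoup as bs
boston_url = 'https://www.mass.gov/service-details/request-for-proposal-rfp-notices'
# Some sites reject requests that lack a browser-like User-Agent header
hdr = {'User-Agent': 'Mozilla/5.0'}
req = requests.get(boston_url, headers=hdr)
req.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = bs(req.text, 'lxml')  # the 'lxml' parser requires the lxml package
# First <p> directly inside div.ma__rich-text, which sits inside two nested <main> elements
print(soup.select('main main div.ma__rich-text>p')[0].text)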
Output:
'PERAC has not reviewed the RFP notices or other related materials posted on this page for compliance with M.G.L. Chapter 32, section 23B. The publication of these notices should not be interpreted as an indication that PERAC has made a determination as to that compliance.'
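The selector logic can be tried without hitting the network. The sketch below runs the same CSS selector against a small, made-up HTML snippet (an assumption: the real mass.gov markup is much larger and may be structured differently) and uses select_one, which returns None instead of raising IndexError the way select(...)[0] does when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup standing in for the mass.gov page;
# the real page structure may differ.
SAMPLE_HTML = """
<main>
  <main>
    <div class="ma__rich-text">
      <p>PERAC has not reviewed the RFP notices.</p>
      <p>Second paragraph.</p>
    </div>
  </main>
</main>
"""


def first_rich_text_paragraph(html):
    """Return the text of the first <p> that is a direct child of
    div.ma__rich-text inside a nested <main>, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns None rather than raising IndexError on no match
    p = soup.select_one("main main div.ma__rich-text > p")
    return p.get_text(strip=True) if p else None


print(first_rich_text_paragraph(SAMPLE_HTML))
```

The helper name first_rich_text_paragraph is hypothetical; on a page without that structure it simply returns None, so the caller can decide how to handle a layout change.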
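You can use bs.find('p', text=re.compile('PERAC')) to extract that paragraph (newer versions of Beautiful Soup call this argument string= instead of text=):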
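from bs4 import BeautifulSoup
import requests
import re
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/83.0.4103.61 Safari/537.36'
}
boston_url = (
    'https://www.mass.gov/service-details/request-for-proposal-rfp-notices'
)
resp = requests.get(boston_url, headers=headers)
resp.raise_for_status()  # stop on HTTP errors
# Name the parser explicitly; BeautifulSoup warns when none is given
bs = BeautifulSoup(resp.text, 'html.parser')
# string= (formerly text=) matches a <p> whose .string matches the regex
paragraph = bs.find('p', string=re.compile('PERAC'))
print(paragraph.get_text() if paragraph else 'No matching paragraph found')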
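One caveat with this approach, illustrated below on a made-up snippet (not the real page): when string= (or the older text=) is combined with a tag name, Beautiful Soup compares the regex against each tag's .string, which is None for a <p> that contains nested tags such as a link. The sketch contrasts that with a function filter, a swapped-in alternative rather than part of the answer above, that searches each paragraph's full get_text().

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the second <p> contains a nested <a>, so its .string is None.
SAMPLE_HTML = """
<div>
  <p>PERAC has not reviewed the RFP notices.</p>
  <p>See <a href="#">PERAC</a> guidance for details.</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pattern = re.compile("PERAC")

# string= is tested against each <p>'s .string, so only single-string paragraphs match.
by_string = soup.find_all("p", string=pattern)

# A function filter sees every Tag; checking the full text also catches nested markup.
by_text = soup.find_all(
    lambda tag: tag.name == "p" and bool(pattern.search(tag.get_text()))
)

print(len(by_string), len(by_text))
```

Here by_string should find only the first paragraph, while by_text finds both, so the function filter is the safer choice if the target paragraph might ever contain links or formatting.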
