
How to get links from a web page - BeautifulSoup/Python


慕容708150 2023-12-08 17:15:25
I want to get a list of all the links (href) on this site - https://clinicaltrials.gov/ct2/results?cond=brain+tumor&term=&cntry=&state=&city=&dist=

This is the code I am using. It produces an empty list:

import requests
from bs4 import BeautifulSoup
import pandas as pd

productlinks=[]
url='https://clinicaltrials.gov/ct2/results?cond=brain+tumor&term=&cntry=&state=&city=&dist='
r=requests.get(url)
soup=BeautifulSoup(r.content,'html.parser')
section=soup.find_all('tr', class_='odd parent')
for link in section:
    productlinks.append(link.a['href'])
print(productlinks)

2 Answers

慕盖茨4494581

Contributed 1850 experience points, received over 11 likes

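Try using selenium instead of requests to scrape all the links from the page. The results table appears to be filled in by JavaScript after the page loads, so the HTML that requests receives does not contain the rows you are looking for. Here is the complete code: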


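from selenium import webdriver
from bs4 import BeautifulSoup
import time

productlinks = []
url = 'https://clinicaltrials.gov/ct2/results?cond=brain+tumor&term=&cntry=&state=&city=&dist='

# Load the page in a real browser so that its JavaScript runs
driver = webdriver.Chrome()
driver.get(url)

# Give the page a few seconds to finish rendering
time.sleep(4)

html = driver.page_source

# quit() closes the window and also ends the browser session
driver.quit()

# The 'html5lib' parser is a separate package (pip install html5lib)
soup = BeautifulSoup(html, 'html5lib')

# Collect the href of every <a> tag that has one
a_tags = soup.find_all('a')
for a in a_tags:
    if a.get('href'):
        productlinks.append(a.get('href'))

print(productlinks)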

Output:


['https://clinicaltrials.gov/ct2/manage-recs/resources#DataElement', 'https://prsinfo.clinicaltrials.gov/results_definitions.html#DelayResultsType', 'https://www.fda.gov/news-events/public-health-focus/expanded-access', 'https://prsinfo.clinicaltrials.gov/results_definitions.html#DelayResultsType', 'https://clinicaltrials.gov/ct2/about-site/history', 'https://clinicaltrials.gov/ct2/about-studies/learn#Participating', 'https://clinicaltrials.gov/ct2/about-studies/learn#Participating', '#main-content', 'https://www.coronavirus.gov', 'https://www.nih.gov/coronavirus', '/ct2/home', '/ct2/search/index', '/ct2/home', '/ct2/search/advanced', '/ct2/search/browse?brwse=cond_cat', '/ct2/search/map', '/ct2/help/how-find/index', '/ct2/help/how-use-search-results', '/ct2/help/how-find/find-study-results', '/ct2/help/how-read-study', '/ct2/about-studies', '/ct2/about-studies/learn', '/ct2/about-studies/other-sites', '/ct2/about-studies/glossary', '/ct2/manage-recs', '/ct2/manage-recs/submit-study', '/ct2/manage-recs/background', '/ct2/manage-recs/fdaaa', '/ct2/manage-recs/how-apply', '/ct2/manage-recs/how-register', '/ct2/manage-recs/how-edit', '/ct2/manage-recs/how-report', '/ct2/manage-recs/faq', '/ct2/manage-recs/resources', '/ct2/manage-recs/present', '/ct2/resources', '/ct2/resources/pubs', '/ct2/resources/alert', '/ct2/resources/rss', '/ct2/resources/trends', '/ct2/resources/download', '/ct2/about-site', '/ct2/about-site/new', '/ct2/about-site/background', '/ct2/about-site/results', '/ct2/about-site/history', '/ct2/about-site/modernization', '/ct2/about-site/for-media', '/ct2/about-site/link-to', '/ct2/about-site/terms-conditions', '/ct2/about-site/disclaimer', '/ct2/manage-recs/register', '/ct2/search/index', '/ct2/home', '/ct2/search/advanced', '/ct2/search/browse?brwse=cond_cat', '/ct2/search/map', '/ct2/help/how-find/index', '/ct2/help/how-use-search-results', '/ct2/help/how-find/find-study-results', '/ct2/help/how-read-study', '/ct2/about-studies', 
'/ct2/about-studies/learn', '/ct2/about-studies/other-sites', '/ct2/about-studies/glossary', '/ct2/manage-recs', '/ct2/manage-recs/submit-study', '/ct2/manage-recs/background', '/ct2/manage-recs/fdaaa', '/ct2/manage-recs/how-apply', '/ct2/manage-recs/how-register', '/ct2/manage-recs/how-edit', '/ct2/manage-recs/how-report', '/ct2/manage-recs/faq', '/ct2/manage-recs/resources', '/ct2/manage-recs/present', '/ct2/resources', '/ct2/resources/pubs', '/ct2/resources/alert', '/ct2/resources/rss', '/ct2/resources/trends', '/ct2/resources/download', '/ct2/about-site', '/ct2/about-site/new', '/ct2/about-site/background', '/ct2/about-site/results', '/ct2/about-site/history', '/ct2/about-site/modernization', '/ct2/about-site/for-media', '/ct2/about-site/link-to', '/ct2/about-site/terms-conditions', '/ct2/about-site/disclaimer', '/ct2/manage-recs/register', '/ct2/home', '#', '/ct2/home', '/ct2/results/refine?cond=brain+tumor', '/ct2/results/details?cond=brain+tumor', '/ct2/results/browse?cond=brain+tumor&brwse=cond_alpha_all', '/ct2/results/map?cond=brain+tumor&map=', '/ct2/results/details?cond=brain+tumor', '/ct2/resources/rss', '/ct2/resources/download', '/ct2/resources/download#DownloadAllData', '/ct2/show/NCT03286335?cond=brain+tumor&draw=2&rank=1', '/ct2/show/NCT02740933?cond=brain+tumor&draw=2&rank=2', '/ct2/show/NCT03328858?cond=brain+tumor&draw=2&rank=3', '/ct2/show/NCT02367469?cond=brain+tumor&draw=2&rank=4', '/ct2/show/NCT01627535?cond=brain+tumor&draw=2&rank=5', '/ct2/show/NCT03980431?cond=brain+tumor&draw=2&rank=6', '/ct2/show/NCT02956291?cond=brain+tumor&draw=2&rank=7', '/ct2/show/NCT02824731?cond=brain+tumor&draw=2&rank=8', '/ct2/show/results/NCT02034708?cond=brain+tumor&draw=2&rank=9', '/ct2/show/NCT02034708?cond=brain+tumor&draw=2&rank=9', '/ct2/show/NCT00557375?cond=brain+tumor&draw=2&rank=10', '#wrapper', '/ct2/help/for-patient', '/ct2/help/for-researcher', '/ct2/help/for-manager', '/ct2/home', '/ct2/resources/rss', '/ct2/sitemap', 
'/ct2/about-site/terms-conditions', '/ct2/about-site/disclaimer', 'https://support.nlm.nih.gov/knowledgebase/category/?id=CAT-01242&category=clinicaltrials.gov&hd_url=https%3A%2F%2Fclinicaltrials.gov%2Fct2%2Fresults%3Fcond%3Dbrain%2Btumor', 'https://www.nlm.nih.gov/copyright.html', 'https://www.nlm.nih.gov/privacy.html', '/ct2/accessibility', 'https://www.nlm.nih.gov/plugins.html', 'https://www.nih.gov/icd/od/foia/index.htm', 'https://www.usa.gov/', 'https://www.nlm.nih.gov/', 'https://www.nih.gov/', 'https://www.hhs.gov/']
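
The list above mixes absolute URLs, site-relative paths, in-page anchors such as '#main-content', and duplicates. If only the study pages are wanted, a post-processing step along the following lines may help. This is a minimal sketch and not part of the original answer: the helper name `study_links` is hypothetical, the '/ct2/show/' prefix is an assumption inferred from the output shown, and only the standard library's `urllib.parse` is used.

```python
from urllib.parse import urljoin, urlparse

# Page the links were scraped from; relative hrefs are resolved against it.
BASE_URL = 'https://clinicaltrials.gov/ct2/results?cond=brain+tumor'


def study_links(hrefs, base_url=BASE_URL, prefix='/ct2/show/'):
    """Return absolute, de-duplicated URLs whose path starts with `prefix`.

    Hypothetical helper: the default prefix is inferred from the sample output.
    """
    seen = set()
    result = []
    for href in hrefs:
        if href.startswith('#'):
            continue  # in-page anchor, not a separate page
        absolute = urljoin(base_url, href)
        if not urlparse(absolute).path.startswith(prefix):
            continue  # navigation, help pages, external sites
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)  # keep first-seen order
    return result


sample = [
    '#main-content',
    '/ct2/home',
    '/ct2/show/NCT03286335?cond=brain+tumor&draw=2&rank=1',
    '/ct2/show/NCT03286335?cond=brain+tumor&draw=2&rank=1',
    'https://www.nih.gov/',
]
print(study_links(sample))
```

Note that a path such as '/ct2/show/results/NCT02034708' also starts with the assumed prefix, so it would be kept alongside the plain study link; tighten the filter if that is not wanted.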



Answered 2023-12-08
拉风的咖菲猫

Contributed 1995 experience points, received over 2 likes

import re
import json
import requests
from bs4 import BeautifulSoup

url = 'https://clinicaltrials.gov/ct2/results?cond=brain+tumor&term=&cntry=&state=&city=&dist='

# Pull the AJAX endpoint out of the page source with a regular expression
ajax_url = 'https://clinicaltrials.gov/' + re.search(r'"url": "(.*?)"', requests.get(url).text).group(1)

payload = {
    'start': 0,
    'length': 10
}

# Request 10 results at a time; raise the upper bound (100) to fetch more results
for start in range(0, 100, 10):
    payload['start'] = start
    data = requests.post(ajax_url, data=payload).json()

    # uncomment this to see all data:
    # print(json.dumps(data, indent=4))

    for d in data['data']:
        # d[3] is an HTML fragment that contains the link to the study
        s = BeautifulSoup(d[3], 'html.parser')
        print(s.a['title'])
        print(s.a['href'])
    print('-' * 80)

Prints:


...
--------------------------------------------------------------------------------
Show study NCT00003475: Antineoplaston Therapy in Treating Patients With Primary Malignant Brain Tumors
/ct2/show/NCT00003475?cond=brain+tumor&rank=81
Show study NCT00949026: Assessment of Systemically Administered Torisel Delivery to Brain Tumors by Intratumoral Microdialysis
/ct2/show/NCT00949026?cond=brain+tumor&rank=82
Show study NCT04118426: Cognitive Function After Radiation Therapy for Brain Tumours
/ct2/show/NCT04118426?cond=brain+tumor&rank=83
Show study NCT03033706: Intraoperative Goal Directed Fluid Management in Supratentorial Brain Tumor Craniotomy
/ct2/show/NCT03033706?cond=brain+tumor&rank=84
Show study NCT00996450: Educational Follow-up in a Cohort of Children at the Royal Marsden Hospital (RMH)
/ct2/show/NCT00996450?cond=brain+tumor&rank=85
Show study NCT03373487: Cognitive Rehabilitation in Brain Tumor Patients After Neurosurgery
/ct2/show/NCT03373487?cond=brain+tumor&rank=86
Show study NCT03619694: Role of MR Spectroscopy in Brain Tumors
/ct2/show/NCT03619694?cond=brain+tumor&rank=87
Show study NCT00850278: Assessment of [18F]FLT-PET Imaging for Diagnosis and Prognosis of Brain Tumors
/ct2/show/NCT00850278?cond=brain+tumor&rank=88
Show study NCT02575521: Effect of Propofol-Dexmedetomidine on Cerebral Oxygenation and Metabolism During Brain Tumor Resection
/ct2/show/NCT02575521?cond=brain+tumor&rank=89
Show study NCT03248544: Therapeutic Benefit of Preoperative Supplemental Vitamin D in Patients Undergoing Brain Tumor Surgery
/ct2/show/NCT03248544?cond=brain+tumor&rank=90
--------------------------------------------------------------------------------
Show study NCT03248544: Therapeutic Benefit of Preoperative Supplemental Vitamin D in Patients Undergoing Brain Tumor Surgery
/ct2/show/NCT03248544?cond=brain+tumor&rank=91
Show study NCT00418899: Gliogene: Brain Tumor Linkage Study
/ct2/show/NCT00418899?cond=brain+tumor&rank=92
Show study NCT03216148: 18F-FET PET in Childhood Brain Tumours
/ct2/show/NCT03216148?cond=brain+tumor&rank=93
Show study NCT00961922: Pediatric Research on Improving Speed, Memory and Attention
/ct2/show/NCT00961922?cond=brain+tumor&rank=94
Show study NCT02006563: Metabolic Tumor Volumes in Radiation Treatment of Primary Brain Tumors
/ct2/show/NCT02006563?cond=brain+tumor&rank=95
Show study NCT03234309: Ferumoxytol in Magnetic Resonance Imaging of Pediatric Patients With Brain Tumors
/ct2/show/NCT03234309?cond=brain+tumor&rank=96
Show study NCT01737671: Methotrexate Infusion Into the Fourth Ventricle in Children With Malignant Fourth Ventricular Brain Tumors: A Pilot Study
/ct2/show/NCT01737671?cond=brain+tumor&rank=97
Show study NCT03649880: Feasibility of FMISO in Brain Tumors
/ct2/show/NCT03649880?cond=brain+tumor&rank=98
Show study NCT03465618: A First in Human Study Using 89Zr-cRGDY Ultrasmall Silica Particle Tracers for Malignant Brain Tumors
/ct2/show/NCT03465618?cond=brain+tumor&rank=99
Show study NCT02389530: Use of Fluorescein Dye for the Removal of Brain Tumors
/ct2/show/NCT02389530?cond=brain+tumor&rank=100
--------------------------------------------------------------------------------



...
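
Each printed title follows the pattern "Show study <NCT id>: <title>". If the study ID and the title are needed as separate fields, something like the following could be used. It is only a sketch: the pattern is an assumption inferred from the sample output above, and `parse_title` is a hypothetical helper name.

```python
import re

# Assumed format, inferred from the sample output: "Show study <NCT id>: <title>"
TITLE_RE = re.compile(r'^Show study (NCT\d+): (.*)$')


def parse_title(text):
    """Split a link title into (nct_id, study_title); return None if it does not match."""
    match = TITLE_RE.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_title('Show study NCT03649880: Feasibility of FMISO in Brain Tumors'))
print(parse_title('Show study NCT00418899: Gliogene: Brain Tumor Linkage Study'))
```

Because the ID group only matches digits after "NCT", a colon inside the study title itself (as in the second call) stays part of the title.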


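The loop in this answer stops at a fixed upper bound. One possible variation is to keep requesting pages until one comes back empty. The sketch below shows only the paging logic, with the network call passed in as a function so it can be tried without hitting the site. `iter_rows` and `fetch_page` are hypothetical names, and it is an assumption that the endpoint returns an empty 'data' list once `start` is past the last result.

```python
def iter_rows(fetch_page, page_size=10, max_rows=None):
    """Yield rows page by page until a page comes back empty.

    fetch_page(start, length) is a caller-supplied function that returns a
    list of rows. max_rows is an optional safety cap on the start offset.
    """
    start = 0
    while max_rows is None or start < max_rows:
        rows = fetch_page(start, page_size)
        if not rows:
            break  # assumed end-of-results signal
        yield from rows
        start += page_size


# Stand-in for the real request, serving 25 fake rows from memory.
FAKE_ROWS = ['row %d' % i for i in range(25)]
requested_starts = []


def fake_fetch(start, length):
    requested_starts.append(start)
    return FAKE_ROWS[start:start + length]


rows = list(iter_rows(fake_fetch, page_size=10))
print(len(rows), requested_starts)
```

With the code from the answer, `fetch_page` could wrap the existing call, for example `lambda start, length: requests.post(ajax_url, data={'start': start, 'length': length}).json()['data']`, and passing `max_rows` guards against an endless loop if the end-of-results assumption does not hold.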

Answered 2023-12-08