
How do I extract information from this source code? I want to extract the name, address, and course from this link

qq_笑_17 2022-12-27 15:44:51
I want to extract the name, address, course, and institute type from this markup. I suspect the table is what is getting in my way. Every attempt gives me an empty list, and I don't know what to do next.

<div class="row">
  <div class="col-md-12">
    <div class="panel panel-default">
      <div class="panel-body ">
        <div class="row">
          <div id="ContentPlaceHolder1_pnldefault">
            <table id="ContentPlaceHolder1_dlstCollege" class="table table-bordered table responsive" cellspacing="0" style="border-collapse:collapse;">
              <tr>
                <td>
                  <input type="hidden" name="ctl00$ContentPlaceHolder1$dlstCollege$ctl00$hdnInstituteId" id="ContentPlaceHolder1_dlstCollege_hdnInstituteId_0" value="968  " />
                  <a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0" href="CollegeDetailedInformation.aspx?Inst=968  ">**A R INSTITUTE OF PHARMACY , BIJNOR (968)**</a>
                  <br />
                  <b>Location:</b>
                  <span id="ContentPlaceHolder1_dlstCollege_lblAddress_0">**TAJPUR** </span>
                  <br />
                  <b>Course:</b>
                  <span id="ContentPlaceHolder1_dlstCollege_lblCourse_0">**B.Pharm**,</span>
                  <br />
                  <b>Category:</b>
                  <span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_0">**Private**</span>
                  <br />
                  <b>Web Address:</b>
                  <a id="lnkBtnWebURL" href='' target="_blank"></a>
                  <br />
                </td>
              </tr>

This is what I tried:

res = requests.get('http://kyc.aktu.ac.in/')
soup = BeautifulSoup(res.content, 'html.parser')
weblinks = soup.find_all('a', attrs = {'id':'ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0'})
pagelinks = []
for link in weblinks:
    link = link.find('a')
    pagelinks.append(link.get('href'))

1 Answer

慕妹3242003


Try this:


from bs4 import BeautifulSoup as bs

# Sample markup from the question. Triple quotes let the string hold both quote styles safely.
html = """<div class="row"><div class="col-md-12"><div class="panel panel-default"><div class="panel-body "><div class="row"><div id="ContentPlaceHolder1_pnldefault"><table id="ContentPlaceHolder1_dlstCollege" class="table table-bordered table responsive" cellspacing="0" style="border-collapse:collapse;"><tr><td><input type="hidden" name="ctl00$ContentPlaceHolder1$dlstCollege$ctl00$hdnInstituteId" id="ContentPlaceHolder1_dlstCollege_hdnInstituteId_0" value="968  " /><a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0" href="CollegeDetailedInformation.aspx?Inst=968  ">**A R INSTITUTE OF PHARMACY , BIJNOR (968)**</a><br /><b>Location:</b><span id="ContentPlaceHolder1_dlstCollege_lblAddress_0">**TAJPUR** </span><br /><b>Course:</b><span id="ContentPlaceHolder1_dlstCollege_lblCourse_0">**B.Pharm**,</span><br /><b>Category:</b><span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_0">**Private**</span><br /><b>Web Address:</b><a id="lnkBtnWebURL" href='' target="_blank"></a><br /></td></tr>"""

# The 'lxml' parser needs the lxml package installed; 'html.parser' also works here.
soup = bs(html, 'lxml')

# Every field has its own id, so find() by tag name and id returns exactly one element.
name = soup.find('a', id='ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0').text.strip()
address = soup.find('span', id='ContentPlaceHolder1_dlstCollege_lblAddress_0').text.strip()
course = soup.find('span', id='ContentPlaceHolder1_dlstCollege_lblCourse_0').text.strip()
institute_type = soup.find('span', id='ContentPlaceHolder1_dlstCollege_lblInstituteType_0').text.strip()

print(name)
print(address)
print(course)
print(institute_type)

Output:


**A R INSTITUTE OF PHARMACY , BIJNOR (968)**

**TAJPUR**

**B.Pharm**,

**Private**
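A note on the snippet in the question: `find_all('a', ...)` already returns the `<a>` tags, so calling `link.find('a')` on each one looks for a nested `<a>` and returns `None`. The ids also end in `_0`, which suggests that later rows end in `_1`, `_2`, and so on. That numbering is an assumption, not something the posted markup confirms. Under that assumption, a rough sketch for collecting every row might look like this. The second row in the sample is made up for illustration, the `**` markers are left out, and `parse_colleges` and `text_of` are hypothetical helper names. It uses the built-in `html.parser` so that only `bs4` is required.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://kyc.aktu.ac.in/"  # URL taken from the question
PREFIX = "ContentPlaceHolder1_dlstCollege_"

# Row 0 mirrors the question's markup; row 1 is invented for illustration.
HTML = """
<table id="ContentPlaceHolder1_dlstCollege">
  <tr><td>
    <a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0"
       href="CollegeDetailedInformation.aspx?Inst=968  ">A R INSTITUTE OF PHARMACY , BIJNOR (968)</a>
    <span id="ContentPlaceHolder1_dlstCollege_lblAddress_0">TAJPUR </span>
    <span id="ContentPlaceHolder1_dlstCollege_lblCourse_0">B.Pharm,</span>
    <span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_0">Private</span>
  </td></tr>
  <tr><td>
    <a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_1"
       href="CollegeDetailedInformation.aspx?Inst=999  ">EXAMPLE COLLEGE (999)</a>
    <span id="ContentPlaceHolder1_dlstCollege_lblAddress_1">EXAMPLE TOWN </span>
    <span id="ContentPlaceHolder1_dlstCollege_lblCourse_1">B.Tech,</span>
    <span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_1">Government</span>
  </td></tr>
</table>
"""


def text_of(cell, tag, id_fragment):
    """Return the stripped text of the first tag in cell whose id contains PREFIX + id_fragment."""
    node = cell.find(tag, id=re.compile(re.escape(PREFIX + id_fragment)))
    return node.get_text(strip=True) if node else None


def parse_colleges(html, base_url=BASE_URL):
    """Collect one dict per institute link found in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    colleges = []
    # Match the name link of every row, whatever its numeric suffix is.
    for name_link in soup.find_all("a", id=re.compile(r"hlpkInstituteName_\d+$")):
        cell = name_link.find_parent("td")  # the other fields share this cell
        colleges.append({
            "name": name_link.get_text(strip=True),
            # The href is relative and padded with spaces, so strip and resolve it.
            "url": urljoin(base_url, name_link["href"].strip()),
            "address": text_of(cell, "span", "lblAddress_"),
            "course": (text_of(cell, "span", "lblCourse_") or "").rstrip(","),
            "institute_type": text_of(cell, "span", "lblInstituteType_"),
        })
    return colleges


colleges = parse_colleges(HTML)
for college in colleges:
    print(college)
```

To run it against the live page, pass `requests.get(BASE_URL).text` instead of `HTML`. If the list still comes back empty, it is worth checking whether the table is actually present in the response you get back.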


Answered 2022-12-27