BeautifulSoup cannot extract all of the HTML

皈依舞 2023-11-13 10:27:17
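I am trying to write a program that extracts all the songs in Daily Mix 1 on Spotify. I know the logic I need to use, but I cannot get the whole page source. This is the code I wrote:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"}
result = requests.get("https://open.spotify.com/playlist/37i9dQZF1E38L6D2gtQHWw", headers=headers)
src = result.content
soup = BeautifulSoup(src, 'lxml')
print(soup.prettify())

This is the output I get: [output not included]

The headers I use work for other sites such as Amazon and Wikipedia, so I don't think they are the problem. I also don't think the problem is related to JavaScript, because in other programs I have written to scrape sites (for example Amazon, which also contains a lot of <script> tags) the source is displayed just fine. Please tell me what the problem is.

PS - Please do not recommend selenium or scrapy in your solution.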

1 Answer

隔江千里

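The data you are trying to scrape is populated by JavaScript, so you won't find it in the page source. You can, however, get it through the API that the site itself uses: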


import json
import requests
from bs4 import BeautifulSoup as bs

base_url = 'https://open.spotify.com/playlist/37i9dQZF1E38L6D2gtQHWw'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"}

# Get the access token first; it is sent in the authorization header to the API endpoint.
page = requests.get(base_url, headers=headers)
soup = bs(page.text, 'html.parser')
access_token_tag = soup.find('script', {'id': 'config'})
json_obj = json.loads(access_token_tag.text)
access_token_text = json_obj['accessToken']

# Call the playlist endpoint with the token and the headers the web player sends.
endpoint = "https://api.spotify.com/v1/playlists/37i9dQZF1E38L6D2gtQHWw"
headers.update({"authorization": f"Bearer {access_token_text}",
                'referer': base_url,
                'accept': 'application/json',
                'app-platform': 'WebPlayer'})
url_parameters = {'type': 'track,episode', 'market': 'EG'}
data = requests.get(endpoint, params=url_parameters, headers=headers).json()

# Print each track name with a 1-based index.
tracks = data['tracks']['items']
for index, track in enumerate(tracks, 1):
    print(f'{index} - ', track['track']['name'])

Output:


1 -  Tu Hi Haqeeqat
2 -  Hasi - Female Version
3 -  Kabhi Jo Baadal Barse
4 -  Tere Bin Nahi Laage (Male Version)
5 -  Dekhte Dekhte (Rahat Fateh Ali Khan Version) [From "Batti Gul Meter Chalu"]
6 -  Panchhi Bole
7 -  Jame Raho
8 -  Banjaara (From "Ek Villain")
9 -  Mitwa
10 -  Agar Tu Hota (From "Baaghi")
11 -  Aasan Nahin Yahan
12 -  Jiyo Re Bahubali
13 -  Pyaar Manga Hai
14 -  Kaun Hain Voh
15 -  Mamta Se Bhari
16 -  Zehnaseeb
17 -  Dil Ibaadat
18 -  Tu Hi Tu (Reprise)
19 -  Haule Haule
20 -  Manohari
21 -  Ilahi (From "Yeh Jawaani Hai Deewani")
22 -  Humsafar (From "Badrinath Ki Dulhania")
23 -  Kiya Kiya
24 -  Sunn Raha Hai (Female)
25 -  Phir Le Aya Dil
26 -  Tere Naal Nachna (From "Nawabzaade")
27 -  Galliyan (From "Ek Villain")
28 -  Valentine's Mashup 2019(Remix By Kedrock,Sd Style)
29 -  Halka Halka
30 -  Raabta (From "Agent Vinod")
31 -  Mere Bina - Unplugged
32 -  Agar Tum Saath Ho-Maahi Ve
33 -  Swapn Sunehere
34 -  Radha
35 -  Behti Hawa Sa Tha Woh
36 -  Mere Rashke Qamar
37 -  Kehta Hai Pal Pal
38 -  Maana Ke Hum Yaar Nahin
39 -  Khoya Hain
40 -  O Re Piya
41 -  Jal Rahin Hain
42 -  Zero Hour Mashup 2015(Remix By Dj Kiran Kamath)
43 -  Aashiq Banaya Aapne
44 -  Bikhri Bikhri
45 -  Maula Mere Lele Meri Jaan
46 -  Yadaan Teriyaan (Version 2)
47 -  Tujh Mein Rab Dikhta Hai
48 -  Veeron Ke Veer Aa
49 -  Bolo Har Har Har (feat. Mohit Chauhan, Sukhwinder Singh, Badshah, Megha Sriram Dalton, Anugrah, Sandeep Shrivastava)
50 -  Main Agar

Answered 2023-11-13