
Convert web-scraped data to a DataFrame


紫衣仙女 2023-09-19 14:56:51
I scraped some data and tried to convert it to JSON, but it doesn't seem to work. I want to turn the dictionary into keys and values and then convert that into a DataFrame.

from urllib.request import Request, urlopen
import json

# url is defined earlier in the script (not shown in the question)
req = Request(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6)'})
webpage = urlopen(req).read().decode("utf-8")
webpage = json.loads(webpage)

Output:

{'data': [{'id': 'GILD',
   'attributes': {'longDesc': "Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.",
    'sectorname': 'Health Care',
    'sectorgics': 35,
    'primaryname': 'Biotechnology',
    'primarygics': 35201010,
    'numberOfEmployees': 11800.0,
    'yearfounded': 1987,
    'streetaddress': '333 Lakeside Drive',
    'streetaddress2': None,
    'streetaddress3': None,
    'streetaddress4': None,
    'city': 'Foster City',
    'peRatioFwd': 9.02045209903122,
    'lastClosePriceEarningsRatio': None,
    'divRate': 2.72,
    'divYield': 4.33,
    'shortIntPctFloat': 1.433,
    'impliedMarketCap': None,
    'marketCap': 78796576654.0,
    'divTimeFrame': 'forward'}}]}

The result I want is:

df = {'id': 'GILD', 'longDesc': 'Gilead...'}


当年话下


Based on your data, you can try this as part of your code:


d = {
    'data': [
        {
            'id': 'GILD',
            'attributes': {
                'longDesc': "Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.",
                'sectorname': 'Health Care',
                'sectorgics': 35,
                'primaryname': 'Biotechnology',
                'primarygics': 35201010,
                'numberOfEmployees': 11800.0,
                'yearfounded': 1987,
                'streetaddress': '333 Lakeside Drive',
                'streetaddress2': None,
                'streetaddress3': None,
                'streetaddress4': None,
                'city': 'Foster City',
                'peRatioFwd': 9.02045209903122,
                'lastClosePriceEarningsRatio': None,
                'divRate': 2.72,
                'divYield': 4.33,
                'shortIntPctFloat': 1.433,
                'impliedMarketCap': None,
                'marketCap': 78796576654.0,
                'divTimeFrame': 'forward'}
        }
    ]
}


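try:
    # Take the first record and pull out its id and description
    _id = d['data'][0]['id']
    ld = d['data'][0]['attributes']['longDesc']
    df = {'id': _id, 'longDesc': ld}
    # Print inside the try block so df is only used when it was actually built
    print(df)
except (KeyError, IndexError, TypeError) as error:
    # KeyError: missing key; IndexError: empty 'data' list; TypeError: a level is None
    print(f"Failed to load data: {error}")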

Output:


{'id': 'GILD', 'longDesc': 'Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.'}
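If the response can contain more than one record, the same extraction can be done for every entry in the data list. A minimal sketch, assuming each record has the id/attributes shape shown above; the second record ("EXAMPLE") is hypothetical and only there to show what happens when longDesc is missing:

```python
# Hypothetical response with two records; the second one has no longDesc
d = {
    "data": [
        {"id": "GILD", "attributes": {"longDesc": "Gilead Sciences, Inc. ..."}},
        {"id": "EXAMPLE", "attributes": {}},
    ]
}

# One flat dict per record; .get() yields None instead of raising KeyError,
# and "or {}" also covers a missing or null attributes field
rows = [
    {"id": rec.get("id"), "longDesc": (rec.get("attributes") or {}).get("longDesc")}
    for rec in d.get("data", [])
]
print(rows)
# [{'id': 'GILD', 'longDesc': 'Gilead Sciences, Inc. ...'}, {'id': 'EXAMPLE', 'longDesc': None}]
```

A list of flat dicts like rows can be passed directly to pd.DataFrame(rows) to get one row per record.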

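Note: df is the name conventionally given to a DataFrame, which is usually created with the pandas module. However, what you have is the JSON returned by your request, which json.loads turns into a plain Python dict. That said, the output you want is actually a dictionary, but I kept your naming convention.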
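If an actual pandas DataFrame is wanted rather than a dict, two options are sketched below. This is an illustration, not part of the original answer: the description string is shortened, and the payload passed to pd.json_normalize is a trimmed copy of the question's output with only two attributes kept.

```python
import pandas as pd

# The flat dict built by the answer above (description shortened for brevity)
record = {"id": "GILD", "longDesc": "Gilead Sciences, Inc., a research-based biopharmaceutical company ..."}

# A list containing one dict becomes one row; the dict keys become column names
one_row = pd.DataFrame([record])
print(one_row.columns.tolist())  # ['id', 'longDesc']

# To keep every attribute instead, flatten the parsed response with json_normalize.
# Trimmed copy of the question's output: only two attributes kept.
payload = {"data": [{"id": "GILD",
                     "attributes": {"sectorname": "Health Care", "city": "Foster City"}}]}
wide = pd.json_normalize(payload["data"])

# Nested keys come out as "attributes.<name>"; strip that prefix
wide.columns = wide.columns.str.replace("attributes.", "", regex=False)
print(wide.columns.tolist())  # ['id', 'sectorname', 'city']
```

Applied to the full response, json_normalize would produce one column per attribute (longDesc, sectorname, marketCap, and so on), with None values becoming missing values in the frame.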


Edit:


To convert your dict into a DataFrame, just do the following:


import pandas as pd

d = {'id': 'GILD', 'longDesc': 'Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.'}

# Each (key, value) pair becomes one row; note the capital F in DataFrame
df = pd.DataFrame(list(d.items()))

print(df)

This outputs:


          0                                                  1
0        id                                               GILD
1  longDesc  Gilead Sciences, Inc., a research-based biopha...
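The default headers 0 and 1 can be replaced by passing columns= to the constructor. A small sketch; the names "field" and "value" are arbitrary choices for illustration, and the description is shortened:

```python
import pandas as pd

# Same flat dict as above (description shortened for brevity)
d = {"id": "GILD", "longDesc": "Gilead Sciences, Inc., a research-based biopharmaceutical company ..."}

# list(...) turns the items view into a list of (key, value) tuples;
# columns= replaces the default 0/1 headers with readable names
kv = pd.DataFrame(list(d.items()), columns=["field", "value"])
print(kv)
```

This keeps the same two-row, key/value layout as the output above, only with named columns.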


Answered 2023-09-19