为了账号安全,请及时绑定邮箱和手机立即绑定

使用VBA进行Web抓取(当HTML <> DOM时)

/ 猿问

使用VBA进行Web抓取(当HTML <> DOM时)

我在抓取该特定网页的数据时经历了可怕的时间...基本上,当我在浏览器中加载URL并手动点击F12时,但是在以编程方式尝试执行以下操作时,可以在“ DOM资源管理器”中看到所需的信息相同(请参阅下文)HTMLDoc不包含我在“ DOM资源管理器”中可以看到的相同信息...


Public Sub testCode()


    Dim IE As SHDocVw.InternetExplorer

    Dim HTMLDoc As MSHTML.HTMLDocument

    Set IE = New SHDocVw.InternetExplorer

    With IE

        .navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW"

        While .Busy = True Or .ReadyState <> READYSTATE_COMPLETE: Wend

        Set HTMLDoc = .Document

    End With


End Sub

有人可以帮我访问“ DOM资源管理器”中的信息吗?我知道HTML并不总是您在浏览器中看到的内容,而是创建在浏览器中可以看到的内容的说明,但是必须有一种从HTML以编程方式创建DOM的方法...


另外,我相信我要获取的数据是由脚本或iFrame生成的,但是我一直无法生成我正在寻找的数据。


更新


请参见下面的DOM Explorer图片:

http://img.mukewang.com/5daf13180001c11719201279.jpg

查看完整描述

2 回答

?
MMTTMM

大纲:


使GET XHR到https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW。


从HTML响应行中提取位置,var query = 'zmw:' + '00000.271.03969';并从这些行中提取键var citypage_options = {k: 'c991975b7f4186c0', ...。


使用https://api-ak-aws.wunderground.com/api/c991975b7f4186c0/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/的位置00000.271.03969和密钥c991975b7f4186c0进行GET XHR bestfct:1 / q / ZMW:00000.271.03969.json。


使用解析JSON响应(例如,使用VBA JSON解析器),使用Parse()转换所需的数据ToArray(),并作为表格输出到工作表。


实际上,每次打开该网页时,Web浏览器都会执行几乎相同的操作。


您可以使用下面的VBA代码来解析响应并输出结果。将JSON.bas模块导入VBA项目以进行JSON处理。


Sub TestScrapeWunderground()


    Dim sContent As String

    Dim sKey As String

    Dim sLocation As String

    Dim vJSON As Variant

    Dim sState As String

    Dim oDays As Object

    Dim oHours As Object

    Dim vDay As Variant

    Dim vHour As Variant

    Dim aRows() As Variant

    Dim aHeader() As Variant


    ' GET XHR to retrieve location and key

    With CreateObject("MSXML2.ServerXMLHTTP")

        .Open "GET", "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW", False

        .Send

        sContent = .responseText

    End With

    ' Extract location and key from HTML content

    sLocation = Split(Split(sContent, "var query = 'zmw:' + '", 2)(1), "'", 2)(0)

    sKey = Split(Split(sContent, vbTab & "k: '", 2)(1), "'", 2)(0)

    ' GET XHR to retrieve JSON data

    With CreateObject("MSXML2.ServerXMLHTTP")

        .Open "GET", "https://api-ak-aws.wunderground.com/api/" & sKey & "/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:" & sLocation & ".json", False

        .Send

        sContent = .responseText

    End With

    ' Parse JSON response to data structure

    JSON.Parse sContent, vJSON, sState

    ' Populate dictionaries with daily and hourly forecast data

    Set oDays = CreateObject("Scripting.Dictionary")

    Set oHours = CreateObject("Scripting.Dictionary")

    For Each vDay In vJSON("forecast")("days")

        oDays(vDay("summary")) = ""

        For Each vHour In vDay("hours")

            oHours(vHour) = ""

        Next

    Next

    ' Convert daily forecast data to arrays

    JSON.ToArray oDays.Keys(), aRows, aHeader

    ' Output daily forecast data to table

    With Sheets(1)

        .Cells.Delete

        OutputArray .Cells(1, 1), aHeader

        Output2DArray .Cells(2, 1), aRows

        .Columns.AutoFit

    End With

    ' Convert hourly forecast data to arrays

    JSON.ToArray oHours.Keys(), aRows, aHeader

    ' Output hourly forecast data to table

    With Sheets(2)

        .Cells.Delete

        OutputArray .Cells(1, 1), aHeader

        Output2DArray .Cells(2, 1), aRows

        .Columns.AutoFit

    End With

    ' Convert response data to arrays

    JSON.ToArray Array(vJSON("response")), aRows, aHeader

    ' Output response transposed data to table

    With Sheets(3)

        .Cells.Delete

        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

        .Columns.AutoFit

    End With

    ' Convert current data to arrays

    JSON.ToArray Array(vJSON("current_observation")), aRows, aHeader

    ' Output current transposed data to table

    With Sheets(4)

        .Cells.Delete

        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

        .Columns.AutoFit

    End With

    ' Populate dictionary with daily astronomy data

    Set oDays = CreateObject("Scripting.Dictionary")

    For Each vDay In vJSON("astronomy")("days")

        oDays(vDay) = ""

    Next

    ' Convert daily astronomy data to arrays

    JSON.ToArray oDays.Keys(), aRows, aHeader

    ' Output daily astronomy transposed data to table

    With Sheets(5)

        .Cells.Delete

        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

        .Columns.AutoFit

    End With

    ' Convert hourly history data to arrays

    JSON.ToArray vJSON("history")("days")(0)("hours"), aRows, aHeader

    ' Output hourly history data to table

    With Sheets(6)

        .Cells.Delete

        OutputArray .Cells(1, 1), aHeader

        Output2DArray .Cells(2, 1), aRows

        .Columns.AutoFit

    End With

    MsgBox "Completed"


End Sub


Sub OutputArray(oDstRng As Range, aCells As Variant)


    With oDstRng

        .Parent.Select

        With .Resize( _

                1, _

                UBound(aCells) - LBound(aCells) + 1)

            .NumberFormat = "@"

            .Value = aCells

        End With

    End With


End Sub


Sub Output2DArray(oDstRng As Range, aCells As Variant)


    With oDstRng

        .Parent.Select

        With .Resize( _

                UBound(aCells, 1) - LBound(aCells, 1) + 1, _

                UBound(aCells, 2) - LBound(aCells, 2) + 1)

            .NumberFormat = "@"

            .Value = aCells

        End With

    End With


End Sub

第二个XHR返回JSON数据,以清楚说明如何从中提取必要的数据,您可以将JSON保存到文件中,复制内容并将其粘贴到任何JSON查看器中以进行进一步研究。我使用在线工具http://jsonviewer.stack.hu,根元素结构如下所示:

//img1.mukewang.com/5daf132a00012c5202550495.jpg

有6个主要部分,提取了数据的相关部分并将其输出到6个工作表(必须在运行前手动创建):


Sheet1 - Daily forecast

Sheet2 - Horly forecast

Sheet3 - Response data (transposed)

Sheet4 - Current data (transposed)

Sheet5 - Astronomy (transposed)

Sheet6 - Hourly history data

有了该示例,您可以从该JSON响应中提取所需的数据。


查看完整回答
反对 回复 2019-10-22
?
慕丝7291255

最终通过了所有测试,效果非常好,非常感谢您的帮助,但我确实做了几件事-1)调整了API URL,以仅请求我感兴趣的数据,并将单位更改为英语; 2)使用JSON解析器解决方案的过程中,我实际上使用了一系列拆分和循环来解析数据-这样做的一个原因是因为我很难遵循JSON解析代码,而且我不喜欢使用复制和粘贴的代码我不明白,但是我也发现它比使用Rage和Scripting Dictionary解析快60%。无论如何再次感谢!

查看完整回答
反对 回复 2019-10-22

添加回答

回复

举报

0/150
提交
取消
意见反馈 帮助中心 APP下载
官方微信