为了账号安全,请及时绑定邮箱和手机立即绑定

vba 中的网页抓取 - 构造工作数据并从左到右单元格写入

vba 中的网页抓取 - 构造工作数据并从左到右单元格写入

一只萌萌小番薯 2024-01-11 14:12:34
刚刚在这里注册了一个帐户,是的,我是一个真正的菜鸟 - 请对我好一点。现在我面临的挑战是:我正在用 VBA 构建一个网络抓取工具,并找到了一个代码,我根据自己的需要做了一些修改。一切都很完美,而且实际上非常顺利。现在我希望加载到我的 exel 文档中的文本不要太长,而是很宽。我怀疑它与“.Offset(I,j)”有关。我玩过一点,但我只是设法毁了一切。这是我使用的代码:Dim IE As InternetExplorerDim htmldoc As MSHTML.IHTMLDocument 'Document objectDim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tagsDim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tagsDim eleRow As MSHTML.IHTMLElement 'Row elementsDim eleCol As MSHTML.IHTMLElement 'Column elementsDim ieURL As String 'URL'Open InternetExplorerSet IE = CreateObject("InternetExplorer.Application")IE.Visible = True'Navigate to webpageieURL = "#"IE.Navigate ieURL'WaitDo While IE.Busy Or IE.ReadyState <> 4 DoEventsLoopSet htmldoc = IE.Document 'Document webpageSet eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags'This section populates ExcelI = 0 'start with first value in tr collectionFor Each eleRow In eleColtr 'for each element in the tr collection Set eleColtd = htmldoc.getElementsByTagName("tr")(I).getElementsByTagName("td") 'get all the td elements in that specific tr j = 0 'start with the first value in the td collection For Each eleCol In eleColtd 'for each element in the td collection Sheets("Sheet1").Range("A1").Offset(I, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time j = j + 1 'move to next element in td collection Next eleCol 'rinse and repeat I = I + 1 'move to next element in td collectionNext eleRow 'rinse and repeatEnd Sub ```
查看完整描述

1 回答

?
慕村225694

TA贡献1880条经验 获得超4个赞

你不需要浏览器。您可以使用更快的 xhr。抓取表格并循环行,然后循环填充预先确定大小的数组的列(请务必删除标题所在的行。它们可以被识别为[colspan='2']在第一个中具有td)。然后转置数组并写入工作表。

Option Explicit


Public Sub TransposeTable()

    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument, table As MSHTML.htmltable

    'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ;  Microsoft XML, v6 (your version may vary)


    Set xhr = New MSXML2.XMLHTTP60

    Set html = New MSHTML.HTMLDocument

    '  7NXBG2 ;  8QT2E3


    With xhr

        .Open "GET", "https://www.chrono24.com/watch/8QT2E3", False

        .send

        html.body.innerHTML = .responseText

    End With


    Set table = html.querySelector(".specifications table")


    Dim results(), rowCountToExclude As Long


    rowCountToExclude = html.querySelectorAll(".specifications table [colspan='2']").Length

    ReDim results(1 To table.rows.Length - rowCountToExclude, 1 To table.getElementsByTagName("tr")(0).Children(0).getAttribute("colspan"))


    Dim r As Long, c As Long, outputRow As Long, outputColumn As Long, html2 As MSHTML.HTMLDocument


    Set html2 = New MSHTML.HTMLDocument


    For r = 0 To table.getElementsByTagName("tr").Length - 1

        Dim row As Object


        Set row = table.getElementsByTagName("tr")(r)

        html2.body.innerHTML = "<body> <table>" & row.outerHTML & "</table></body> "


        If html2.querySelectorAll("[colspan='2']").Length = 0 Then

            outputRow = outputRow + 1: outputColumn = 1

            For c = 0 To row.getElementsByTagName("td").Length - 1

                results(outputRow, outputColumn) = row.getElementsByTagName("td")(c).innerText

                outputColumn = outputColumn + 1

            Next

        End If

        Set row = Nothing

    Next


    results = Application.Transpose(results)

    ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results

End Sub


查看完整回答
反对 回复 2024-01-11
  • 1 回答
  • 0 关注
  • 35 浏览

添加回答

举报

0/150
提交
取消
意见反馈 帮助中心 APP下载
官方微信