
Eliminating duplicate links on a web page and avoiding stale element errors

Java

达令说 2023-07-13 13:49:16

I have a list of 20 links, some of which are duplicates. I click the first link, which takes me to the next page, and I download some files from that page.

Page 1
Link 1
Link 2
Link 3
Link 1
Link 3
Link 4
Link 2

Link 1 (click) --> (opens) Page 2
Page 2 (click the browser's back button) --> (returns to) Page 1

Now I click Link 2 and repeat the same steps.

System.setProperty("webdriver.chrome.driver", "C:\\chromedriver.exe");
String fileDownloadPath = "C:\\Users\\Public\\Downloads";

//Set properties to suppress popups
Map<String, Object> prefsMap = new HashMap<String, Object>();
prefsMap.put("profile.default_content_settings.popups", 0);
prefsMap.put("download.default_directory", fileDownloadPath);
prefsMap.put("plugins.always_open_pdf_externally", true);
prefsMap.put("safebrowsing.enabled", "false");

//assign driver properties
ChromeOptions option = new ChromeOptions();
option.setExperimentalOption("prefs", prefsMap);
option.addArguments("--test-type");
option.addArguments("--disable-extensions");
option.addArguments("--safebrowsing-disable-download-protection");
option.addArguments("--safebrowsing-disable-extension-blacklist");

WebDriver driver = new ChromeDriver(option);
driver.get("http://www.mywebpage.com/");

List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')]"));
Thread.sleep(500);
pageSize = listOfLinks.size();
System.out.println( "The number of links in the page is: " + pageSize);

//iterate through all the links on the page
for ( int i = 0; i < pageSize; i++)
{
    System.out.println( "Clicking on link: " + i );
    try
    {
        linkText = listOfLinks.get(i).getText();
        listOfLinks.get(i).click();
    }

The code works well: it clicks every link and downloads the files. Now I need to improve the logic so that duplicate links are skipped. I tried filtering the duplicates out of the list, but I am not sure how to handle org.openqa.selenium.StaleElementReferenceException. The solution I am looking for clicks the first occurrence of a link and does not click it again when it reappears.

(This is part of a complex workflow for downloading multiple files from a portal that I do not control. So please do not come back with questions such as why the page has duplicate links in the first place.)


3 Answers

ibeautiful

Contributed 1993 experience points, received over 5 likes

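First, I don't recommend sending repeated requests (findElements) to WebDriver. Down that path you will run into performance problems, especially if you have many links and pages.

Also, if you always work in the same tab, you have to wait for two page loads per link (the links page and the download page). If you open each link in a new tab instead, you only wait for the page you want to download from.

My suggestion is to filter out the duplicate links as @supputuri described and to open each link in a new tab. That way you don't have to deal with stale elements, you don't have to search the page for the links every time, and you don't have to wait for the links page to reload on each iteration.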

List<WebElement> uniqueLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));

for ( int i = 0; i < uniqueLinks.size(); i++)
{
    new Actions(driver)
        .keyDown(Keys.CONTROL)
        .click(uniqueLinks.get(i))
        .keyUp(Keys.CONTROL)
        .build()
        .perform();

    // if you want you can create the array here on this line instead of create inside the method below.
    driver.switchTo().window(new ArrayList<>(driver.getWindowHandles()).get(1));

    //do your wait stuff.
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();

    //do your wait stuff.
    driver.close();
    driver.switchTo().window(new ArrayList<>(driver.getWindowHandles()).get(0));
}

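I can't test this code properly right now. If you find any problem with it, just leave a comment and I will update the answer, but the idea is sound and quite simple.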
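A related way to avoid stale references altogether, sketched here under assumptions rather than taken from the answer: read every href into a plain list of strings once, keep only the first occurrence of each, and then visit the URLs one by one. Strings cannot go stale the way WebElement references do. The class `LinkDedup` and the method `firstOccurrences` are hypothetical names, and the Selenium calls appear only in comments so that the block runs without a browser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper for illustration; not part of Selenium.
final class LinkDedup {

    // Returns the hrefs in their original order, keeping only the first
    // occurrence of each value. LinkedHashSet preserves insertion order.
    static List<String> firstOccurrences(List<String> hrefs) {
        return new ArrayList<>(new LinkedHashSet<>(hrefs));
    }

    public static void main(String[] args) {
        // With Selenium, the input would be collected roughly like this:
        //   for (WebElement a : driver.findElements(By.xpath("//a[contains(@href,'Link')]")))
        //       hrefs.add(a.getAttribute("href"));
        // The values below are made-up stand-ins for the real hrefs.
        List<String> hrefs = List.of(
            "Link1", "Link2", "Link3", "Link1", "Link3", "Link4", "Link2");

        for (String href : firstOccurrences(hrefs)) {
            // With Selenium: driver.get(href); then click the download element.
            System.out.println("Would visit: " + href);
        }
    }
}
```

Because the loop works on strings, it needs neither the back button nor a second findElements call. It does assume that each link is a plain navigation that can be reproduced with driver.get.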

Answered 2023-07-13

叮当猫咪

Contributed 1776 experience points, received over 12 likes

First, let's look at the XPath.

Sample HTML:

<!DOCTYPE html>
<html>
<body>
<div>
<a href='https://google.com'>Google</a>
<a href='https://yahoo.com'>Yahoo</a>
<a href='https://google.com'>Google</a>
</div>
</body>
</html>

Here is an XPath that selects the distinct links from the HTML above.

//a[not(@href = following::a/@href)]

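The logic of the XPath is that a link is kept only if its href does not match the href of any following link. If it does match, the link is treated as a duplicate and the XPath does not return it. As a consequence, the element returned for each href is its last occurrence on the page.

Stale elements: now it is time to deal with the stale element problem in your code. As soon as you click link 1, every reference stored in listOfLinks becomes invalid, because Selenium assigns new references to the elements each time the page is loaded. When you try to access an element through an old reference, you get a StaleElementReferenceException. The code below should give you the idea.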
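To see which occurrence the predicate keeps, here is a minimal sketch that evaluates the expression with the JDK's built-in XPath 1.0 engine instead of a browser. The class name `XPathDedupDemo` and the link labels "Google 1" and "Google 2" are assumptions added so the two duplicates can be told apart. It also evaluates a `preceding::` variant, which is not part of the answer above but is the mirror image that keeps the first occurrence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses only the JDK, no Selenium.
final class XPathDedupDemo {

    // Same structure as the sample HTML, written as well-formed XML.
    // The labels "Google 1" / "Google 2" are made up for this demo.
    static final String PAGE =
        "<html><body><div>"
      + "<a href='https://google.com'>Google 1</a>"
      + "<a href='https://yahoo.com'>Yahoo</a>"
      + "<a href='https://google.com'>Google 2</a>"
      + "</div></body></html>";

    // Evaluates the expression against PAGE and returns the text of each match.
    static List<String> linkTexts(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Drops a link when a later link has the same href: the last occurrence survives.
        System.out.println(linkTexts("//a[not(@href = following::a/@href)]"));
        // Drops a link when an earlier link has the same href: the first occurrence survives.
        System.out.println(linkTexts("//a[not(@href = preceding::a/@href)]"));
    }
}
```

If the requirement is strictly to click the first occurrence of each link, the `preceding::` form is the one that matches it. For the download use case both forms visit each distinct href once.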

List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));
Thread.sleep(500);
pageSize = listOfLinks.size();
System.out.println( "The number of links in the page is: " + pageSize);

//iterate through all the links on the page
for ( int i = 0; i < pageSize; i++)
{
    // ===> consider adding step to explicit wait for the Link element with "//a[contains(@href,'Link')][not(@href = following::a/@href)]" xpath present using WebDriverWait
    // don't hard code the sleep

    // ===> added this line: look the link up again on every iteration so the reference is fresh
    WebElement link = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]")).get(i);
    System.out.println( "Clicking on link: " + i );

    // ===> updated next 2 lines
    linkText = link.getText();
    link.click();

    // ===> consider adding explicit wait using WebDriverWait to make sure the span exist before clicking.
    driver.findElement(By.xpath("//span[contains(@title,'download')]")).click();

    // ===> check this answer (https://stackoverflow.com/questions/34548041/selenium-give-file-name-when-downloading/56570364#56570364) for make sure the download is completed before clicking on browser back rather than sleep for x seconds.
    driver.navigate().back();

    // ===> removed hard coded wait time (sleep)
}
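The comment about making sure the download has completed can be approached in several ways. The sketch below is one simple possibility and is not the technique from the linked answer: it polls the download directory until Chrome's temporary `*.crdownload` files are gone. The class `DownloadWait`, its method names, and the timeout values are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper; it relies on Chrome naming unfinished downloads *.crdownload.
final class DownloadWait {

    // True while at least one partial download is present in the directory.
    static boolean hasPartialDownloads(Path dir) {
        try (DirectoryStream<Path> partial = Files.newDirectoryStream(dir, "*.crdownload")) {
            return partial.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Polls until no partial downloads remain or the timeout elapses.
    // Returns true if the directory is clean, false on timeout or interruption.
    static boolean waitForDownloads(Path dir, Duration timeout, Duration poll) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (hasPartialDownloads(dir)) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(poll.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pass the download directory as the first argument; defaults to the current one.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        boolean done = waitForDownloads(dir, Duration.ofSeconds(30), Duration.ofMillis(250));
        System.out.println("Downloads finished: " + done);
    }
}
```

In the loop above, a call such as `DownloadWait.waitForDownloads(Path.of(fileDownloadPath), ...)` would go just before `driver.navigate().back()`. One limitation to keep in mind: if the check runs before Chrome has created the temporary file, it returns immediately, so a stricter version would also wait for the expected file to appear.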

If you want to open the links in a new window, use the following logic.

// Selenium 3 constructor; Selenium 4 takes a Duration, e.g. new WebDriverWait(driver, Duration.ofSeconds(20))
WebDriverWait wait = new WebDriverWait(driver, 20);
wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]")));
List<WebElement> listOfLinks = driver.findElements(By.xpath("//a[contains(@href,'Link')][not(@href = following::a/@href)]"));
JavascriptExecutor js = (JavascriptExecutor) driver;

for (WebElement link : listOfLinks) {
    // get the href
    String href = link.getAttribute("href");

    // open the link in new tab (href is passed as a script argument, so quotes in the URL cannot break the script)
    js.executeScript("window.open(arguments[0])", href);

    // switch to new tab
    ArrayList<String> tabs = new ArrayList<String> (driver.getWindowHandles());
    driver.switchTo().window(tabs.get(1));

    //click on download

    //close the new tab
    driver.close();

    // switch to parent window
    driver.switchTo().window(tabs.get(0));
}

Answered 2023-07-13

慕仙森

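Contributed 1827 experience points, received over 7 likes

You can do it like this.

Save the index of each list element in a hash table, keyed by the link text
If the hash table already contains that key, skip the element
Once this is done, the HT holds only the unique elements, i.e. the first ones found

The values of the HT are the indexes into listOfLinks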

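        Hashtable<String, Integer> hs1 = new Hashtable<>();
        for (int i = 0; i < listOfLinks.size(); i++) {
            String text = listOfLinks.get(i).getText();
            // record only the first index seen for each link text
            if (!hs1.containsKey(text)) {
                hs1.put(text, i);
            }
        }
        // Hashtable does not preserve insertion order; use a LinkedHashMap
        // if the links must be clicked in page order
        for (int i : hs1.values()) {
            listOfLinks.get(i).click();
        }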

Answered 2023-07-13
