Wednesday, October 29, 2014

Selenium WebDriver - Find out broken links on a page

Today I will explain how to find all the broken links on a webpage using Selenium WebDriver. Below is a ready-made method, written in Java, that takes the page URL as an argument and returns a list of all the broken links found on that page. A link is treated as broken when requesting its URL does not return HTTP status 200, or when the request fails altogether:


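    // Imports required by the methods below
    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    public static void main(String[] args) throws IOException {

        // Replace the placeholder with the page to check
        List<String> brokenLinks = getBrokenLinksOnWebpage("<Webpage URL e.g. https://www.google.com/>");
        System.out.println(brokenLinks);

    }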

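    // Find all the broken links on the given page
    public static List<String> getBrokenLinksOnWebpage(String pageUrl)
            throws IOException {

        WebDriver driver = new ChromeDriver();
        List<String> brokenLinks = new ArrayList<String>();

        try {
            driver.get(pageUrl);

            // Every anchor tag on the page is a candidate link
            List<WebElement> webElements = driver.findElements(By.tagName("a"));

            for (WebElement element : webElements) {
                String currentUrl = element.getAttribute("href");

                // Anchors without an href have nothing to check
                if (currentUrl == null || currentUrl.isEmpty()) {
                    continue;
                }

                // Anything other than 200 (including -1 for a failed request) counts as broken
                if (getHttpResponseCode(currentUrl) != 200) {
                    brokenLinks.add(currentUrl);
                }
            }
        } finally {
            // Close the browser even if something goes wrong
            driver.quit();
        }

        return brokenLinks;

    }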


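    // Get the HTTP response code for a URL; returns -1 if the request cannot be made
    public static int getHttpResponseCode(String linkUrl) {

        HttpURLConnection httpConnection = null;
        try {

            URL url = new URL(linkUrl);
            httpConnection = (HttpURLConnection) url.openConnection();
            httpConnection.setRequestMethod("GET");
            httpConnection.connect();
            return httpConnection.getResponseCode();

        } catch (Exception e) {
            // Malformed URLs, non-HTTP schemes such as mailto:, and network errors all end up here
            return -1;
        } finally {
            // Release the connection once the status code has been read
            if (httpConnection != null) {
                httpConnection.disconnect();
            }
        }

    }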
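
The check above counts every response other than 200 as broken. That is a strict rule: codes such as 204 or an unfollowed 3xx redirect would also be reported. If that turns out to be too noisy, one possible variation is to treat only client and server errors (4xx and 5xx) and failed requests as broken. The class and method names below (`LinkStatus`, `isBroken`) are hypothetical and not part of the original code; this is only a sketch of the idea:

```java
public class LinkStatus {

    // Hypothetical helper: decides whether a status code should count as broken.
    // A negative value matches the -1 that getHttpResponseCode returns on failure.
    public static boolean isBroken(int statusCode) {
        return statusCode < 0 || statusCode >= 400;
    }

    public static void main(String[] args) {
        // A few sample codes to show how each one is classified
        int[] samples = {200, 204, 301, 404, 500, -1};
        for (int code : samples) {
            System.out.println(code + " -> " + (isBroken(code) ? "broken" : "ok"));
        }
    }
}
```

With a helper like this, the condition in the loop could become `isBroken(getHttpResponseCode(currentUrl))` instead of comparing against 200.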


Now we just need to call the getBrokenLinksOnWebpage method and pass in the page URL as the argument. It returns a list of the URLs of all the broken links on the page. We can then read the list and do whatever we want with it, such as printing it or storing it in a text file.
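
As a rough illustration of the "print it or store it in a text file" step, here is a minimal sketch. The class name `BrokenLinkReport`, the method `writeReport`, the file name `broken-links.txt`, and the sample URLs are all assumptions made for this example; in real use the list would come from getBrokenLinksOnWebpage. It also assumes the list contains no null entries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class BrokenLinkReport {

    // Hypothetical helper: writes one URL per line, replacing any existing file
    public static void writeReport(List<String> brokenLinks, Path outputFile)
            throws IOException {
        Files.write(outputFile, brokenLinks, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data standing in for the result of getBrokenLinksOnWebpage(...)
        List<String> brokenLinks = Arrays.asList(
                "https://example.com/missing",
                "https://example.com/gone");

        // Print each broken link on its own line
        for (String link : brokenLinks) {
            System.out.println(link);
        }

        // Store the same list in a text file in the working directory
        writeReport(brokenLinks, Paths.get("broken-links.txt"));
    }
}
```

Writing one URL per line keeps the file easy to read back later or to feed into another tool.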

Hope it helps!
