Wednesday, October 29, 2014

Selenium Webdriver - Find out broken links on a page

Today I will explain how to find all the broken links on a webpage using Selenium WebDriver. Below is a ready-made method written in Java that takes a page URL as an argument and returns a list of all the broken links found on that page:


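    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    // The class name is arbitrary; the methods can live in any class
    public class BrokenLinkFinder {

        public static void main(String[] args) {
            List<String> brokenLinks = getBrokenLinksOnWebpage("<Webpage URL e.g. https://www.google.com/>");
            System.out.println(brokenLinks);
        }

        // Find all the broken links on the given page
        public static List<String> getBrokenLinksOnWebpage(String pageUrl) {
            List<String> brokenLinks = new ArrayList<String>();
            WebDriver driver = new ChromeDriver();
            try {
                driver.get(pageUrl);
                List<WebElement> webElements = driver.findElements(By.tagName("a"));

                for (WebElement element : webElements) {
                    String currentUrl = element.getAttribute("href");
                    // Skip anchors without an href and non-HTTP links (mailto:, javascript:, ...)
                    if (currentUrl == null || !currentUrl.startsWith("http")) {
                        continue;
                    }
                    // Any response other than 200 (including -1 for a failed request) counts as broken
                    int responseCode = getHttpResponseCode(currentUrl);
                    if (responseCode != 200) {
                        brokenLinks.add(currentUrl);
                    }
                }
            } finally {
                // Always close the browser, even if something above fails
                driver.quit();
            }
            return brokenLinks;
        }

        // Get the HTTP response code for a URL; returns -1 if the request fails
        public static int getHttpResponseCode(String linkUrl) {
            HttpURLConnection httpConnection = null;
            try {
                URL url = new URL(linkUrl);
                httpConnection = (HttpURLConnection) url.openConnection();
                httpConnection.setRequestMethod("GET");
                // Timeouts in milliseconds so a dead server cannot hang the check
                httpConnection.setConnectTimeout(10000);
                httpConnection.setReadTimeout(10000);
                httpConnection.connect();
                return httpConnection.getResponseCode();
            } catch (IOException e) {
                // Covers malformed URLs, unreachable hosts and timeouts
                return -1;
            } finally {
                if (httpConnection != null) {
                    httpConnection.disconnect();
                }
            }
        }
    }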


Now we just need to call the getBrokenLinksOnWebpage method and pass in the page URL as the argument. It returns a list of the URLs of all the broken links on the page. We can then read the list and do whatever we want with it, such as printing it or storing it in a text file.
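Storing the list in a text file could look roughly like the sketch below. It is only an illustration: the class name BrokenLinkReport, the method writeReport, the file name broken-links.txt and the sample URLs are all made up for this example, and in practice the list would come from getBrokenLinksOnWebpage.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class BrokenLinkReport {

    // Writes one URL per line, replacing the file if it already exists
    public static void writeReport(List<String> brokenLinks, Path target) throws IOException {
        Files.write(target, brokenLinks, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; normally this is the list returned by getBrokenLinksOnWebpage
        List<String> brokenLinks = Arrays.asList(
                "http://example.com/missing-page",
                "http://example.com/old-download");
        Path target = Paths.get("broken-links.txt");
        writeReport(brokenLinks, target);
        System.out.println("Wrote " + brokenLinks.size() + " broken links to " + target.toAbsolutePath());
    }
}
```

Writing one URL per line keeps the file easy to diff between runs or to feed into another tool.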

Hope it helps!

Monday, October 13, 2014

IEDriverServer : Start up messages - Some Pointers

With WebDriver + Internet Explorer, the following startup messages appear in the console:

Started InternetExplorerDriver server (32-bit)
2.43.0.0
Listening on port 47206
Oct 13, 2014 3:12:21 PM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
Oct 13, 2014 3:12:21 PM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: Retrying request

Many of us feel uneasy about these messages and assume they point to a potential error in IEDriverServer. Here are some pointers provided by Jim Evans on why we are wrong:

If you want to block these messages anyway, here is a code snippet that will do the trick.

Disable log messages

java.util.logging.Logger.getLogger("org.apache.http.impl.client").setLevel(java.util.logging.Level.WARNING);

This stops the log messages from being displayed on the console. The updated code will look like the following:

// Disable log messages

java.util.logging.Logger.getLogger("org.apache.http.impl.client").setLevel(java.util.logging.Level.WARNING);
System.setProperty("webdriver.ie.driver","D:\\Selenium Info\\IEDriverServer.exe");
DesiredCapabilities dc = DesiredCapabilities.internetExplorer();
dc.setCapability("silent", true);
WebDriver driver = new InternetExplorerDriver(dc);

Note: dc.setCapability("silent", true); only blocks the IEDriverServer startup messages. The HttpClient INFO messages are handled by the logger setting.
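To see why the logger line hides the INFO entries, here is a small sketch that uses nothing but java.util.logging. It assumes, as the snippet above does, that the HttpClient messages reach the console through java.util.logging; the class name LogLevelDemo and the method raiseThreshold are made up for this example. It also keeps the logger in a static field, because the JDK's LogManager holds loggers only weakly and a level set on an unreferenced logger can be lost after garbage collection.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {

    // Strong reference so the configured level survives garbage collection
    private static final Logger HTTP_CLIENT_LOGGER =
            Logger.getLogger("org.apache.http.impl.client");

    // Only WARNING and above will pass through this logger and its children
    public static void raiseThreshold() {
        HTTP_CLIENT_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        raiseThreshold();
        // The logger named in the console output is a child and inherits the level
        Logger child = Logger.getLogger("org.apache.http.impl.client.DefaultRequestDirector");
        System.out.println("INFO logged?    " + child.isLoggable(Level.INFO));
        System.out.println("WARNING logged? " + child.isLoggable(Level.WARNING));
    }
}
```

Running it reports that INFO is no longer loggable while WARNING still is, which is why the "Retrying request" lines disappear but genuine warnings would still show up.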


Hope this helps!