Can Selenium verify text inside a PDF loaded by the browser?

firefox testing pdf selenium selenium-ide

While this isn't natively supported, I have found a couple of ways to do it with the Java driver. One is to open the PDF in the browser (with Adobe Acrobat installed), use keyboard shortcuts to select all the text (CTRL+A) and copy it to the clipboard (CTRL+C), and then verify the text in the clipboard. E.g.:

```java
// Returns the name of the most recently opened window (Selenium RC getEval)
protected String getLastWindow() {
    return session().getEval("var windowId; for(var x in selenium.browserbot.openedWindows ){windowId=x;} ");
}

@Test
public void testTextInPDF() throws Exception { // Thread.sleep throws InterruptedException
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);
    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000); // give the PDF plug-in time to render

    // CTRL+A: select all text in the PDF viewer
    session().keyDownNative("17");  // 17 = CTRL key code
    session().keyPressNative("65"); // 65 = key code for A
    session().keyUpNative("17");    // release CTRL
    Thread.sleep(1000);

    // CTRL+C: copy the selection to the clipboard
    session().keyDownNative("17");  // CTRL
    session().keyPressNative("67"); // 67 = key code for C
    session().keyUpNative("17");    // release CTRL

    // TextTransfer is a clipboard helper class, not part of Selenium
    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}
```
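The TextTransfer class used above isn't shown. As a rough sketch only, a hypothetical ClipboardReader could play the same role using the standard java.awt.datatransfer API. Taking a Clipboard parameter lets you exercise it against a private clipboard; readSystemClipboard() reads the real one and needs a display (it throws HeadlessException on a headless machine).

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;

/**
 * Hypothetical stand-in for the TextTransfer helper:
 * returns the plain text currently held by a clipboard.
 */
public class ClipboardReader {

    /** Returns the clipboard's text, or "" if it holds no plain text. */
    public static String readText(Clipboard clipboard) {
        try {
            if (!clipboard.isDataFlavorAvailable(DataFlavor.stringFlavor)) {
                return "";
            }
            return (String) clipboard.getData(DataFlavor.stringFlavor);
        } catch (UnsupportedFlavorException | IOException e) {
            return "";
        }
    }

    /** Reads the system clipboard; requires a display (not headless). */
    public static String readSystemClipboard() {
        return readText(Toolkit.getDefaultToolkit().getSystemClipboard());
    }
}
```

In the test above, the assertion would then become something like assertTrue(ClipboardReader.readSystemClipboard().contains("Some text in my pdf")).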

Another way, still in Java, is to download the PDF and then convert it to text with PDFBox; see http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example of how to do this.
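For the download step, a minimal sketch could look like the following. PdfDownloader and its download method are hypothetical names. The sketch assumes the PDF URL is reachable directly (for example, the value of driver.getCurrentUrl() once the PDF window has focus) and does not require the browser's session cookies; a PDF behind a login would need those cookies copied onto the request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper: copies the PDF at pdfUrl to a local file so PDFBox
 * can parse it. Assumes no authentication/cookies are needed.
 */
public class PdfDownloader {

    public static Path download(String pdfUrl, Path target) throws IOException {
        // openStream() works for http(s) and file URLs alike
        try (InputStream in = URI.create(pdfUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

The downloaded file can then be handed to PDFBox as in the linked example.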


You cannot do this natively with WebDriver. However, the PDFBox API can be used to read the content of a PDF file. First, switch focus to the browser window where the PDF is open, so you can get hold of the PDF itself (for example via its URL). You can then parse the entire content of the PDF and search it for the desired text string.
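One way to handle the "switch focus" step, sketched under assumptions: record driver.getWindowHandles() before clicking the PDF link and again afterwards, and pick the one handle that is new. NewWindowFinder is a hypothetical name; it only does the set arithmetic, so it runs without Selenium on the classpath.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical helper: given window handles captured before and after
 * opening the PDF (both from driver.getWindowHandles()), returns the
 * handle of the newly opened window.
 */
public class NewWindowFinder {

    public static Optional<String> findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        // Exactly one new window is expected; anything else is ambiguous.
        return added.size() == 1 ? Optional.of(added.iterator().next()) : Optional.empty();
    }
}
```

With the handle in hand, driver.switchTo().window(handle) moves focus to the PDF window, and driver.getCurrentUrl() gives the address to download and parse.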

Here is code that uses the PDFBox API to extract the text of a PDF document, which you can then search:


```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintWriter;

import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class PdfToTextConverter {

    public static void pdfToText(String pathToPdfFile, String pathToOutputTextFile) throws IOException {
        // Parse text from the PDF into a string (use the parameter, not a string literal)
        File f = new File(pathToPdfFile);
        PDFParser parser = new PDFParser(new FileInputStream(f));
        parser.parse();
        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);
        try {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);

            // Write the parsed text into the output file
            try (PrintWriter pw = new PrintWriter(pathToOutputTextFile)) {
                pw.print(parsedText);
            }
        } finally {
            pdDoc.close(); // release the parsed document
        }
    }
}
```

JAR source: http://sourceforge.net/projects/pdfbox/files/latest/download?source=files
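To do the actual search on the extracted string, a plain contains() can miss matches because text stripped from a PDF often has line breaks or repeated spaces where the layout wrapped a sentence. A hedged sketch of a whitespace-tolerant check follows; PdfTextSearch is a hypothetical name, and it only collapses ASCII/Java \s whitespace.

```java
/**
 * Hypothetical search helper for text extracted by PDFTextStripper.
 * Both strings are whitespace-normalized before comparing.
 */
public class PdfTextSearch {

    static String normalize(String s) {
        // Collapse every run of whitespace (spaces, tabs, CR/LF) into one space
        return s.replaceAll("\\s+", " ").trim();
    }

    public static boolean containsText(String extractedText, String expected) {
        return normalize(extractedText).contains(normalize(expected));
    }
}
```

For example, after String parsedText = pdfStripper.getText(pdDoc); a test could assert PdfTextSearch.containsText(parsedText, "Some text in my pdf").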

CodeHunter
