Android Web Scraping with a Headless Browser [closed]


OK, after two weeks I admit defeat and am using a workaround, which works great for me at the moment.

The problem:
It is too difficult to port HTMLUnit to Android (at least at my level of expertise). I am sure it's a worthwhile project (and probably not that time-consuming for an experienced Java programmer). I emailed the HTMLUnit developers. They said they are not looking into a port or into what effort it would involve, but suggested that anyone who wants to start such a project should send a message to their mailing list to get more developers involved (http://htmlunit.sourceforge.net/mail-lists.html).

The workaround:
I used Android's built-in WebView and overrode the onPageFinished method of WebViewClient to inject JavaScript that grabs all the HTML after the page has fully loaded. The WebView can also be used to run further JavaScript actions, such as clicking buttons and filling in forms.
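Those further actions come down to injecting more JavaScript into the loaded page. The sketch below is a minimal illustration, not the author's code: the `FormScript` class and its methods are hypothetical, and the element ids `username` and `submit` are made up. It builds a script that fills in a field and clicks a button, escaping values so that a quote in the input cannot break the JavaScript string literal. On Android (API 19+) the resulting string could be run with `webView.evaluateJavascript(script, null)`.

```java
// Hypothetical helper (not part of the original code): builds JavaScript that
// fills in a form field and clicks a button on the page loaded in the WebView.
final class FormScript {

    // Wraps a Java string in a single-quoted JavaScript literal, escaping the
    // characters that would otherwise end or break the literal.
    static String jsQuote(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    // Sets the value of the element with the given id.
    static String fillField(String id, String value) {
        return "document.getElementById(" + jsQuote(id) + ").value = " + jsQuote(value) + ";";
    }

    // Clicks the element with the given id, e.g. a submit button.
    static String click(String id) {
        return "document.getElementById(" + jsQuote(id) + ").click();";
    }

    public static void main(String[] args) {
        // "username" and "submit" are made-up ids; use the ids of the real page.
        String script = fillField("username", "O'Brien") + click("submit");
        System.out.println(script);
        // On Android (API 19+): webView.evaluateJavascript(script, null);
    }
}
```

Clicking through `element.click()` fires the page's normal click handlers, so form submission behaves much as it would for a user tap.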

Code:

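    webView.getSettings().setJavaScriptEnabled(true);
    MyJavaScriptInterface jInterface = new MyJavaScriptInterface(context);
    webView.addJavascriptInterface(jInterface, "HtmlViewer");

    webView.setWebViewClient(new WebViewClient() {
        @Override
        public void onPageFinished(WebView view, String url) {
            // Page has finished loading: inject JavaScript that sends the full HTML
            // back to Java through the "HtmlViewer" interface registered above
            view.loadUrl("javascript:window.HtmlViewer.showHTML("
                    + "'<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
        }
    });

    webView.loadUrl(StartURL);
    // loadUrl() returns immediately, so jInterface.html is not populated yet here;
    // the HTML is handled in showHTML() once the page has loaded

    public class MyJavaScriptInterface {
        private Context ctx;
        public volatile String html; // written from a WebView background thread

        MyJavaScriptInterface(Context ctx) {
            this.ctx = ctx;
        }

        // Called by the injected JavaScript on a WebView background thread, not the UI thread
        @JavascriptInterface
        public void showHTML(String _html) {
            html = _html;
            ParseHtml(html); // if ParseHtml touches views, post it to the UI thread instead
        }
    }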
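Because `loadUrl()` returns before the page has loaded, the HTML only becomes available when `showHTML` is called from JavaScript. One way to hand it to the parsing code, sketched here under assumptions rather than taken from the original, is a `CompletableFuture`. `HtmlReceiver` is a hypothetical stand-in for `MyJavaScriptInterface`, and the `@JavascriptInterface` annotation is left out so the sketch compiles without the Android SDK. In an app you would register the parsing step with something like `jInterface.htmlFuture().thenAccept(...)` rather than blocking: WebViewClient callbacks such as `onPageFinished` run on the UI thread, so waiting on the future there would likely stall the page load.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for MyJavaScriptInterface: completes a future with the
// page HTML instead of only storing it in a field.
class HtmlReceiver {
    private final CompletableFuture<String> html = new CompletableFuture<>();

    // Called by the injected JavaScript, on a WebView background thread.
    // On Android this method would also carry @JavascriptInterface.
    public void showHTML(String pageHtml) {
        html.complete(pageHtml); // only the first call has an effect
    }

    // Future that completes once the page has delivered its HTML.
    public CompletableFuture<String> htmlFuture() {
        return html;
    }
}

class HtmlReceiverDemo {
    public static void main(String[] args) {
        HtmlReceiver receiver = new HtmlReceiver();

        // Register the parsing step first; it runs whenever the HTML arrives.
        // String::length is a placeholder for the real parsing (e.g. ParseHtml).
        CompletableFuture<Integer> parsed = receiver.htmlFuture().thenApply(String::length);

        // Simulate the WebView calling back from another thread after the page loads.
        new Thread(() -> receiver.showHTML("<html><body>hi</body></html>")).start();

        // Blocking is fine in this desktop demo, but not on Android's UI thread.
        System.out.println("HTML length: " + parsed.join());
    }
}
```

Since `thenAccept`/`thenApply` callbacks usually run on the thread that completes the future (here the WebView's bridge thread), any UI updates made from the parsing step would still need to be posted to the UI thread.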


I have taken the implementation mentioned above (injecting JavaScript), and it works for me. I simply keep the WebView hidden underneath other UI elements. I was also thinking of doing the same with Selenium. I have used Selenium with Chrome in Python and it's great, but as you mentioned, it is not easy to keep the browser window from showing. I think it might be possible to simply not show the component on Android, though. I'll have to try.