Scraping data from Highcharts using selenium

Mozilla provides a simple REST API to get the stats, so you don't need to use Selenium.

With the requests module:

    import requests

    url = "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20170823-20171023.json"
    data = requests.get(url).json()

To select the range, simply update the dates in the URL.
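As a minimal sketch of "updating the dates in the URL", the helper below builds the address from two dates. The function name `stats_url` and the `addon` parameter are only illustrative; the URL pattern itself is copied from the example above, and I am assuming the two numbers in it are the start and end dates in YYYYMMDD form.

```python
from datetime import date

# URL pattern taken from the example above; only the slug and dates vary.
BASE = ("https://addons.mozilla.org/en-US/firefox/addon/"
        "{addon}/statistics/downloads-day-{start}-{end}.json")


def stats_url(addon, start, end):
    """Return the download statistics URL for an add-on slug and a date range."""
    return BASE.format(
        addon=addon,
        start=start.strftime("%Y%m%d"),  # e.g. 20170823
        end=end.strftime("%Y%m%d"),      # e.g. 20171023
    )


url = stats_url("adblock-plus", date(2017, 8, 23), date(2017, 10, 23))
print(url)
```

The resulting string can be passed to `requests.get(url).json()` exactly as before.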

But if you still want to scrape the chart with Selenium:

    dates = driver.execute_script("return Highcharts.charts[0].series[0].xData")
    users = driver.execute_script("return Highcharts.charts[0].series[0].yData")
    downloads = driver.execute_script("return Highcharts.charts[0].series[1].yData")
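The three lists come back as plain Python lists, so they still need to be lined up. Here is a rough sketch of that step. It assumes the chart uses a datetime axis, in which case Highcharts stores `xData` as milliseconds since the Unix epoch; I have not checked this for this particular page. The helper name `to_rows` is made up for the example.

```python
from datetime import datetime, timezone


def to_rows(x_data, *series):
    """Pair each x value (assumed ms since epoch, UTC) with the matching y values."""
    rows = []
    for i, ms in enumerate(x_data):
        # Convert the millisecond timestamp to an ISO date string.
        day = datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()
        rows.append((day, *(s[i] for s in series)))
    return rows


# With the lists returned by Selenium this would be:
# rows = to_rows(dates, users, downloads)
```

Each row is then a tuple of the form (date, users, downloads), which is easy to write out with the `csv` module.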

python selenium highcharts

One thing I noticed.

At first it may look as if this is true:

"when I use custom search option, csv file that automatically generated by the website is not updated".

But it is not. The file is updated; the maximum custom date range simply seems to be 1 year.

For example, if you set the range from 2013-09-23 to 2017-10-23, the generated .csv (or .json) contains at most 1 year of data (in this example, from 2016-10-22 to 2017-10-21).

You can see this more clearly if you play with the endpoints of the range.

For example, with:

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131023-20141023.json

first element: {"date": "2014-10-23", "count": 212730, "end": "2014-10-23"}
last element: {"date": "2013-10-24", "count": 163094, "end": "2013-10-24"}

If you move the end date one day later:

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131023-20141024.json

first element: {"date": "2014-10-24", "count": 215105, "end": "2014-10-24"}
last element: {"date": "2013-10-25", "count": 168018, "end": "2013-10-25"}

Or, if you move the start date one day earlier instead:

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131022-20141023.json

the result is the same as in the first example:

first element: {"date": "2014-10-23", "count": 212730, "end": "2014-10-23"}
last element: {"date": "2013-10-24", "count": 163094, "end": "2013-10-24"}
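The examples above suggest each response holds at most 365 daily entries, counted back from the end date. Under that assumption (it is inferred from the observed output, not from any documentation), a long range can be cut into windows that each fit under the cap. The names `split_range` and `MAX_DAYS` are just for this sketch.

```python
from datetime import date, timedelta

MAX_DAYS = 365  # assumed cap, inferred from the responses shown above


def split_range(start, end, max_days=MAX_DAYS):
    """Split [start, end] into windows of at most max_days days, newest first."""
    windows = []
    window_end = end
    while window_end >= start:
        # A window of max_days days, inclusive of both endpoints.
        window_start = max(start, window_end - timedelta(days=max_days - 1))
        windows.append((window_start, window_end))
        window_end = window_start - timedelta(days=1)
    return windows


for window_start, window_end in split_range(date(2013, 9, 23), date(2017, 10, 23)):
    print(window_start.strftime("%Y%m%d"), window_end.strftime("%Y%m%d"))
```

Each printed pair can be dropped into the URL in place of the two dates. Whether the server treats the start date as inclusive is not clear from the examples, so it may be worth checking the first and last element of each response.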

So, to get the data for the last 5 years, you could download it one year at a time:

    import subprocess

    interestedYears = 5
    year = 1
    today = "2017-10-23"
    tokenDataToday = today.split("-")
    dateEnd = tokenDataToday[0] + tokenDataToday[1] + tokenDataToday[2]
    url = "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-"

    # Download one .csv per year, walking backwards from today.
    while year <= interestedYears:
        yearStart = str(int(tokenDataToday[0]) - year)
        dateStart = yearStart + tokenDataToday[1] + tokenDataToday[2]
        #print("dateStart: " + dateStart)
        #print("dateEnd: " + dateEnd)
        tmpUrl = url + dateStart + "-" + dateEnd + ".csv"
        cmd = 'curl -O ' + tmpUrl
        print(cmd)
        args = cmd.split()
        process = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = process.communicate()
        dateEnd = dateStart  # the next window ends where this one started
        year = year + 1
        print("-----------------------------")
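If you fetch the .json variant instead of the .csv files, the yearly pieces still have to be stitched together. A possible sketch, assuming each response is a list of records shaped like the `{"date": ..., "count": ..., "end": ...}` elements shown earlier; `merge_records` is a hypothetical helper, not part of any API.

```python
def merge_records(*chunks):
    """Merge lists of daily records into one list, oldest first, without duplicate dates."""
    by_date = {}
    for chunk in chunks:
        for record in chunk:
            # Keep the first record seen for each date; adjacent windows
            # may share a boundary day.
            by_date.setdefault(record["date"], record)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(by_date.values(), key=lambda r: r["date"])
```

For example, `merge_records(year_1, year_2, year_3)` returns a single chronological list, where each `year_n` is the parsed JSON of one request.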
