Scraping Google Analytics by Scrapy


Your error occurs because headers needs to be a dict, not a dict wrapped in a list:

  headers={'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
           'Galaxy-Ajax': 'true',
           'Origin': 'https://analytics.google.com',
           'Referer': 'https://analytics.google.com/analytics/web/?hl=fr&pli=1',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36',
           },

That will fix your current issue, but you will then get a 411 (Length Required) because you also need to specify the Content-Length. If you add what you want to pull, I will be able to show you how. You can see the output below:

  2016-03-29 14:02:11 [scrapy] DEBUG: Redirecting (302) to <GET https://www.google.com/analytics/web/?hl=fr> from <GET https://accounts.google.com/CheckCookie?hl=fr&checkedDomains=youtube&pstMsg=0&chtml=LoginDoneHtml&service=analytics&continue=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr&gidl=CAA>
  2016-03-29 14:02:13 [scrapy] DEBUG: Crawled (200) <GET https://www.google.com/analytics/web/?hl=fr> (referer: https://accounts.google.com/AccountLoginInfo)
  Login Successful!!
  2016-03-29 14:02:14 [scrapy] DEBUG: Crawled (411) <POST https://analytics.google.com/analytics/web/getPage?id=trafficsources-all-traffic&ds=a5425w87291514p94531107&hl=fr&authuser=0> (referer: https://analytics.google.com/analytics/web/?hl=fr&pli=1)
  2016-03-29 14:02:14 [scrapy] DEBUG: Ignoring response <411 https://analytics.google.com/analytics/web/getPage?id=trafficsources-all-traffic&ds=a5425w87291514p94531107&hl=fr&authuser=0>: HTTP status code is not handled or not allowed
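The 411 in that log means the POST went out without a body length. As a rough sketch of the fix (plain standard-library Python, so it runs without Scrapy), the helper below builds the headers as a flat dict and derives Content-Length from the urlencoded body. The function name build_form_post and the form fields are hypothetical placeholders; the fields the getPage endpoint actually expects depend on the report you want to pull, which the question does not show.

```python
from urllib.parse import urlencode


def build_form_post(form_fields):
    """Hypothetical helper: return (headers, body) for an urlencoded POST.

    headers is a plain dict (not a list wrapping a dict), and
    Content-Length is the byte length of the encoded body, which is
    what a 411 Length Required response is complaining about.
    """
    # Encode the form fields as application/x-www-form-urlencoded bytes.
    body = urlencode(form_fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        "Galaxy-Ajax": "true",
        "Origin": "https://analytics.google.com",
        "Referer": "https://analytics.google.com/analytics/web/?hl=fr&pli=1",
        # Length is computed from the encoded bytes, not the dict.
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical form fields: the real ones depend on the report requested.
headers, body = build_form_post({"field_a": "value", "field_b": "1"})
print(headers["Content-Length"], body)
```

In Scrapy itself, scrapy.FormRequest(url, formdata=..., headers=..., callback=...) urlencodes formdata into the request body (values must be strings) and defaults to POST. With a non-empty body the downloader should send a matching Content-Length on its own, so setting it by hand as above mainly makes the requirement explicit.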