How to use Tor socks5 in R getURL
RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor recognizes that the proxy client (RCurl) is trying to speak HTTP to its SOCKS port, which is why it returns an error message in HTML.
To make RCurl (and curl) use a SOCKS proxy, add a protocol prefix to the proxy address. There are two protocol prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).
Here is a pure R solution that uses Tor for DNS queries as well:
    library(RCurl)
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
    my.handle <- getCurlHandle()
    html <- getURL(url = 'https://www.torproject.org', curl = my.handle)
If you want to specify additional curl options, put them in the same list:
    library(RCurl)
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                                useragent = "Mozilla",
                                followlocation = TRUE,
                                referer = "",
                                cookiejar = "my.cookies.txt"))
    my.handle <- getCurlHandle()
    html <- getURL(url = 'https://www.torproject.org', curl = my.handle)
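If you would rather not change the global RCurlOptions, the proxy can also be set on a single handle. The following is only a minimal sketch, assuming Tor is listening on 127.0.0.1:9050 as in the examples above; `tor_proxy` and `fetch_via_tor` are hypothetical helper names, not part of RCurl.

```r
# Hypothetical helper: build the proxy string for a local Tor SOCKS port.
# remote_dns = TRUE selects "socks5h", so hostnames are resolved by Tor.
tor_proxy <- function(host = "127.0.0.1", port = 9050, remote_dns = TRUE) {
  scheme <- if (remote_dns) "socks5h" else "socks5"
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

# Hypothetical wrapper: fetch a URL through Tor using a handle that carries
# its own proxy option, leaving options(RCurlOptions = ...) untouched.
# Requires the RCurl package and a running Tor instance when called.
fetch_via_tor <- function(url, ...) {
  handle <- RCurl::getCurlHandle(proxy = tor_proxy(...))
  RCurl::getURL(url, curl = handle)
}
```

Usage would then look like `html <- fetch_via_tor("https://www.torproject.org")`. Passing `remote_dns = FALSE` switches to the "socks5" prefix, which makes the client resolve hostnames itself and is the case Tor warns about.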