
Tor Web Crawler


cURL also supports SOCKS connections; try this:

<?php
$ch = curl_init('http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
// SOCKS5
curl_setopt($ch, CURLOPT_PROXY, 'localhost:9050');
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
curl_exec($ch);
curl_close($ch);
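One caveat for a crawler: with CURLPROXY_SOCKS5, cURL resolves the host name locally before handing the connection to the proxy, so DNS lookups bypass Tor and .onion names cannot be resolved. CURLPROXY_SOCKS5_HOSTNAME asks the proxy to do the lookup instead. Below is a minimal sketch of that variant, assuming Tor listens on 127.0.0.1:9050 and PHP has the cURL extension loaded; the function names torCurlOptions and fetchViaTor are made up for illustration.

```php
<?php
// Hypothetical helper: cURL options for routing a request through a local
// Tor SOCKS5 listener. Host and port are assumptions (Tor's usual default).
function torCurlOptions(string $proxyHost = '127.0.0.1', int $proxyPort = 9050): array
{
    return [
        CURLOPT_PROXY          => $proxyHost . ':' . $proxyPort,
        // Let the proxy resolve host names, so lookups also go through Tor.
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_CONNECTTIMEOUT => 30,   // Tor circuits can be slow to build
    ];
}

// Hypothetical wrapper: fetch a URL through Tor, or throw on failure.
function fetchViaTor(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, torCurlOptions());
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Request through Tor failed: ' . $error);
    }
    curl_close($ch);
    return $body;
}
```

Calling fetchViaTor('http://google.com') should then return the page body as a string, provided Tor is actually running on that port.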


Unless I'm missing something, the answer is yes, and here is some documentation on the Tor site. The instructions are pretty specific. I've not set Tor up as a proxy myself, though it's something I've considered; this is the place I would start.

EDIT: It is dead simple to set up Tor on Linux and use it as a proxy, as the documentation suggests.

sudo apt-get install tor
sudo /etc/init.d/tor start
netstat -ant | grep 9050 # verify Tor is running

Now, after looking through the OP's code, we see calls to file_get_contents. While it is the easiest method to use at first, file_get_contents becomes cumbersome when you want to start parametrizing the request, because you have to use stream contexts.
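To give an idea of what that parametrizing looks like, here is a rough sketch of a stream context for file_get_contents. Note that the http wrapper's proxy option talks plain HTTP to the proxy, so it cannot be pointed straight at Tor's SOCKS port; the sketch assumes some HTTP proxy is available. The helper name buildHttpContext, the user agent string and the proxy address are all placeholders, not anything from the OP's code.

```php
<?php
// Hypothetical helper: build a stream context for file_get_contents().
// $httpProxy must be an HTTP proxy, because the http:// wrapper does not
// speak SOCKS. Pass null to make a direct request.
function buildHttpContext(?string $httpProxy = null, float $timeout = 30.0)
{
    $http = [
        'method'        => 'GET',
        'timeout'       => $timeout,
        'user_agent'    => 'ExampleCrawler/0.1', // placeholder value
        'ignore_errors' => true,                 // return the body even on 4xx/5xx
    ];
    if ($httpProxy !== null) {
        $http['proxy']           = $httpProxy;   // e.g. 'tcp://127.0.0.1:8118'
        $http['request_fulluri'] = true;         // send the full URI in the request line
    }
    return stream_context_create(['http' => $http]);
}
```

Usage would be along the lines of file_get_contents('http://google.com', false, buildHttpContext('tcp://127.0.0.1:8118')), where the proxy address is an assumption about the local setup.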

The first suggestion is to move to cURL, but again, more reading on how SOCKS works with HTTP is probably in order to truly answer this question. But to answer the question technically, how to send an HTTP request to a Tor SOCKS proxy on localhost: again, easy. Here Tor's port is simply handed to cURL as if it were an HTTP proxy:

<?php
$ch = curl_init('http://google.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
// No CURLOPT_PROXYTYPE is set, so cURL speaks HTTP to Tor's SOCKS port
curl_setopt($ch, CURLOPT_PROXY, 'http://127.0.0.1:9050/');
curl_exec($ch);
curl_close($ch);

But what does Tor tell us?

HTTP/1.0 501 Tor is not an HTTP Proxy
Content-Type: text/html; charset=iso-8859-1

Basically, learn more about SOCKS & HTTP. Another option is to google around for PHP SOCKS clients. A quick inspection reveals a library that claims it can send HTTP requests over SOCKS.

EDIT:

Alright, 1 more edit! Seconds after finishing my last post, I've found a way to do it. This article shows us how to set up something called Privoxy, an HTTP proxy that can forward the requests it receives over SOCKS. Put that in front of Tor and blamo, we're sending proxied HTTP requests through Tor!
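With Privoxy in front, the PHP side no longer needs to know about SOCKS at all. A rough sketch, assuming Privoxy listens on its usual default of 127.0.0.1:8118 and has been configured to forward to Tor; the helper name privoxyCurlOptions is made up, and the config line in the comment is only an example of what such a setup typically contains.

```php
<?php
// Hypothetical helper: cURL options for an HTTP proxy such as Privoxy.
// Assumes Privoxy forwards to Tor, e.g. with a config line like:
//     forward-socks5 / 127.0.0.1:9050 .
function privoxyCurlOptions(string $proxy = '127.0.0.1:8118'): array
{
    return [
        CURLOPT_PROXY          => $proxy,
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP, // plain HTTP proxy on the PHP side
        CURLOPT_HEADER         => true,           // include response headers in the result
        CURLOPT_RETURNTRANSFER => true,           // return the response instead of printing it
    ];
}
```

To use it, pass the array to curl_setopt_array($ch, privoxyCurlOptions()) before calling curl_exec($ch).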


You have to intercept the DNS lookup request from the PHP script by configuring Tor with the "DNSPort" directive. Then you have to configure a "TransPort" for Tor and a "VirtualAddrNetworkIPv4" range. With "AutomapHostsOnResolve" enabled, what happens when your PHP script does a DNS lookup through Tor is that Tor sees a request for an onion address and answers with an IP address from the "VirtualAddrNetworkIPv4" range. You now have to redirect the traffic going to this address to the address defined with "TransPort". Read the "torrc" manual on "AutomapHostsOnResolve", "VirtualAddrNetworkIPv4", "DNSPort" and "TransPort".