
How to Block 100,000+ Individual IP addresses


Something that you can try is keeping a list of the IP addresses you want to block in a text file, or converting it to a dbm hash file, and then using mod_rewrite's RewriteMap. You have to set this up in your server/vhost config, because you cannot initialize a map in an htaccess file.

RewriteEngine On
RewriteMap deny_ips txt:/path/to/deny_ips.txt
RewriteCond ${deny_ips:%{REMOTE_ADDR}|0} !=0
RewriteRule ^ - [L,F]

The /path/to/deny_ips.txt file would look something like this:

12.34.56.78 1
11.22.33.44 1
etc.

Essentially, each line is an IP that you want to deny, then a space, then a "1". Any IP in this text file will cause the server to return a 403 Forbidden. To speed things up a bit, you can use httxt2dbm to generate a dbm hash, and then define the mapping like so:

RewriteMap deny_ips dbm:/path/to/deny_ips.dbm
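If the deny list comes from somewhere else (logs, an abuse feed, etc.), the text map can be generated rather than edited by hand. Here is a minimal Python sketch of that step; the function name `build_deny_map` is made up for illustration, and it assumes one "IP 1" pair per line as described above.

```python
import ipaddress


def build_deny_map(ips):
    """Return the text of a RewriteMap txt file: one 'IP 1' pair per line."""
    seen = set()
    lines = []
    for raw in ips:
        # ip_address() raises ValueError on malformed input, so bad
        # entries never reach the map file.
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip not in seen:  # skip duplicates, keep first-seen order
            seen.add(ip)
            lines.append(f"{ip} 1\n")
    return "".join(lines)


if __name__ == "__main__":
    print(build_deny_map(["12.34.56.78", "11.22.33.44", "12.34.56.78"]), end="")
```

The resulting text can be saved as deny_ips.txt and then converted with something like `httxt2dbm -i deny_ips.txt -o deny_ips.dbm`.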

I'm not sure what the performance hit is for using mod_rewrite like this with a lot of IPs, but in a quick benchmark on Apache 2.2 running on a 3 GHz i686 under Linux, the difference between 5 IPs in the list and 102418 was negligible. According to ab's output, the two runs were nearly identical.


Addressing specific questions:

Is it possible for htaccess to get the list from database (Redis,Crunchbase,Mongo, MySQL or even Sqlite) ... any

Using a rewrite map, you can use the "prg" map type to run an external program as the mapping function. You can then write a Perl, PHP, etc. script that talks to a database in order to look up an IP address. Also note the caveats listed under "Caution" in the RewriteMap documentation. You'd then use this map like you would any other map (RewriteCond ${deny_ips:%{REMOTE_ADDR}|0} !=0). However, every request has to go through that one program, which essentially creates a bottleneck. It is not the best solution for talking to a database.
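To give an idea of what such a "prg" program looks like, here is a minimal Python sketch. It relies on the documented protocol (one key per line on stdin, one answer per line on stdout, the literal string NULL for "no match", and unbuffered output). The in-memory set is only a stand-in for the database query, and the names `lookup` and `serve` are made up.

```python
import sys


def lookup(key, denied):
    """Return the map value for one key; 'NULL' means no match."""
    return "1" if key.strip() in denied else "NULL"


def serve(denied, stdin=sys.stdin, stdout=sys.stdout):
    """Answer one lookup per input line until Apache closes the pipe."""
    for line in stdin:
        stdout.write(lookup(line, denied) + "\n")
        stdout.flush()  # replies must not sit in a buffer, or Apache hangs


if __name__ == "__main__":
    # Hypothetical hard-coded list; a real script would ask the database.
    serve({"12.34.56.78", "11.22.33.44"})
```

With a NULL reply, the "|0" default in the RewriteCond kicks in, so unknown addresses are let through.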

In Apache 2.4, though, there is a dbd/fastdbd map type, which allows you to run queries through mod_dbd. This is a much better option, since the mod_dbd module manages connections to the database, pools connections, etc. So the map definition would look something like:

RewriteMap deny_ips "fastdbd:SELECT active FROM deny_ips WHERE source = %s"

Assuming you have a table "deny_ips" with 2 columns "source" (the IP address) and "active" (1 for active, 0 for inactive).
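As a rough illustration of that table, here is a Python sketch using the standard sqlite3 module with an in-memory database. The helper names are made up, and sqlite3 uses "?" placeholders where the mod_dbd query above uses "%s"; the table and column names are the ones assumed above.

```python
import sqlite3


def make_deny_table(conn):
    """Create the deny_ips table with the two columns described above."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deny_ips ("
        " source TEXT PRIMARY KEY,"
        " active INTEGER NOT NULL DEFAULT 1)"
    )


def set_ip(conn, ip, active=1):
    """Insert an address, or update its flag if it is already listed."""
    conn.execute(
        "INSERT OR REPLACE INTO deny_ips (source, active) VALUES (?, ?)",
        (ip, active),
    )


def lookup(conn, ip):
    """Same query as the map definition; None when the IP is not listed."""
    row = conn.execute(
        "SELECT active FROM deny_ips WHERE source = ?", (ip,)
    ).fetchone()
    return row[0] if row else None
```

An inactive row returns 0 and a missing row falls back to the "|0" default, so in both cases the "!=0" condition does not fire and the request is allowed.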

Is there a visible solution to manage such kind of issue in production

If you are storing all of the blocked IPs in a database, it's a matter of managing the contents of your database table. If you are using the dbm map type, I know that at least Perl has a DBI interface for managing dbm files, so you can use that to add/remove IP entries from the deny list. I've never used it before, so I can't really say much about it. Managing a flat text file is going to be a lot trickier, especially if you plan on removing entries and not just appending to it. Outside of using a database and Apache 2.4's mod_dbd, I don't think any of these solutions are out of the box or production ready. It's going to require custom work.
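To show why removal from a flat file needs some care, here is a minimal Python sketch that rewrites the map into a temporary file and swaps it in, so the server never sees a half-written file. The function name `remove_ip` is made up, and it assumes the "IP 1" line format from the text map above.

```python
import os
import shutil
import tempfile


def remove_ip(map_path, ip):
    """Rewrite the map without `ip`; return True if an entry was removed."""
    removed = False
    directory = os.path.dirname(os.path.abspath(map_path))
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem.
    with open(map_path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            fields = line.split()
            if fields and fields[0] == ip:
                removed = True
                continue  # drop this entry
            tmp.write(line)
    # Temp files are created private; copy the original permission bits so
    # the web server can still read the map.
    shutil.copymode(map_path, tmp.name)
    os.replace(tmp.name, map_path)  # swap the new file into place
    return removed
```

This does not handle two writers at once; that would need a lock on top, which is part of the custom work mentioned above.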

I know the best solution is Block the IPs at the firewall level is there any way to pragmatically add/remove IP to the firewall

For iptables, there is a Perl interface that's marked as beta, but I've never used it before. There's also libiptc, but according to netfilter's FAQ:

Is there an C/C++ API for adding/removing rules?

The answer unfortunately is: No.

Now you might think 'but what about libiptc?'. As has been pointed out numerous times on the mailinglist(s), libiptc was NEVER meant to be used as a public interface. We don't guarantee a stable interface, and it is planned to remove it in the next incarnation of linux packet filtering. libiptc is way too low-layer to be used reasonably anyway.

We are well aware that there is a fundamental lack for such an API, and we are working on improving that situation. Until then, it is recommended to either use system() or open a pipe into stdin of iptables-restore. The latter will give you a way better performance.

So I don't know how viable a libiptc solution is if there's no API stability.


ANOTHER PERSPECTIVE

Hello. You can check whether an address is blocked by accessing two bytes in two data chunks, each 8KB long. Yes, I am serious... Please be patient, because it takes a while to explain.

THE THEORY

An IP address is an address, actually a 4 byte number.

The question is: what if we use it to address bit positions?

The answer: Well ok, we will have

  2^32 = 4 Giga Bits 

of addressing space and that will take

 4Gb/8 = 512 Mega Bytes

of allocation. Ouch! But do not worry, we are not going to block everything in the ipverse and 512MB is an exaggeration.

This can open us a path to the solution.

The Lilliputian Case

Think of a Lilliputian world in which only IP addresses from 0 to 65535 exist. So addresses are like 0.1 or 42.42, up to 255.255.

Now the king of this world wants to block several L-IP (Lilliput IP) addresses.

First he builds a virtual 2D bit map which is 256 * 256 bits long that takes up :

 64 K Bits = 8 K Bytes.

He decides to block that nasty "revolution" site which he hates because he is the king, the address is 56.28 for instance.

Address      = (56 * 256) + 28  = 14364 (bit position in whole map)
Byte in map  = floor(14364 / 8) =  1795
Bit position = 14364 % 8        =     4 (modulus)

He opens the map file, reads byte 1795 and sets bit 4 (by OR-ing it with 16), then writes it back to mark the site as blocked.

When his script sees the 56.28, it does the same calculation and looks at the bit, and if it is set, blocks the address.
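The king's procedure can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, with the map held in memory as a bytearray instead of a file; the function names are made up.

```python
MAP_BYTES = 256 * 256 // 8  # 65536 bits = 8192 bytes


def bit_position(high, low):
    """Map a two-part address such as 56.28 to (byte index, bit mask)."""
    address = high * 256 + low
    return address // 8, 1 << (address % 8)


def block(bitmap, high, low):
    """Set the bit that belongs to the address."""
    byte, mask = bit_position(high, low)
    bitmap[byte] |= mask


def is_blocked(bitmap, high, low):
    """Test the bit that belongs to the address."""
    byte, mask = bit_position(high, low)
    return (bitmap[byte] & mask) != 0


bitmap = bytearray(MAP_BYTES)  # all bits clear: nothing is blocked yet
block(bitmap, 56, 28)          # the "revolution" site
```

For 56.28, `bit_position` gives byte 1795 and mask 16, matching the hand calculation.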

Now what is the moral of the story? Well we can use this lilliputian structure.

THE PRACTICE

The Real World Case

We can apply the Lilliputian case to real world with a "use it when you need" approach since allocating a 512MB file is not a good choice.

Think of a database table named BLOCKS with entries like that:

IpHead(key): unsigned 16 bit integer,
Map        : 8KB BLOB (fixed size),
EntryCount : unsigned 16 bit integer.

And another table with just one entry with the structure below named BASE

Map        : 8KB BLOB(fixed size).

Now let's say you have an incoming address 56.28.10.2

Script accesses BASE table and gets the Map.

It looks up the higher order IP numbers 56.28:

Address      = (56 * 256) + 28  = 14364 (bit position in whole map)
Byte in map  = floor(14364 / 8) =  1795
Bit position = 14364 % 8        =     4 (modulus)

Looks at byte 1795 bit 4 in the Map.

If bit is not set no further operation is needed meaning there is no blocked ip address in range 56.28.0.0 - 56.28.255.255 .

If bit is set then the script accesses the BLOCKS table.

The higher order IP numbers were 56.28 which gives 14364 so the script queries the BLOCKS table with index IpHead = 14364. Fetches the record. The record should exist since it is marked at BASE.

Script does the calculation for lower order IP address

Address      = (10 * 256) + 2  = 2562 (bit position in whole map)
Byte in map  = floor(2562 / 8) =  320
Bit position = 2562 % 8        =    2 (modulus)

Then it checks if address is blocked by looking at bit 2 of byte 320 of field Map.

Job done!

Q1: Why do we use BASE at all, we could directly query BLOCKS with 14364.

A1: Yes, we could, but the BASE map lookup will be faster than the B-tree search of any database server.

Q2: What is the EntryCount field in BLOCKS table for?

A2: It is the count of IP addresses blocked in the Map field of the same record. So if we unblock IPs and EntryCount reaches 0, that BLOCKS record becomes unnecessary. It can be erased, and the corresponding bit in the BASE map can be unset.
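To make the whole flow concrete, here is a small in-memory Python sketch of the two-level scheme, including the EntryCount bookkeeping from A2. A bytearray stands in for the BASE record and a dict for the BLOCKS table; the class and method names are made up, and a real version would read and write the database instead.

```python
MAP_BYTES = 8192  # 65536 bits per map


def bit_position(high, low):
    """Return (byte index, bit mask) for a two-byte half of the address."""
    address = high * 256 + low
    return address // 8, 1 << (address % 8)


class TwoLevelBlocklist:
    def __init__(self):
        self.base = bytearray(MAP_BYTES)  # stands in for the BASE record
        self.blocks = {}                  # IpHead -> [Map, EntryCount]

    def block(self, ip):
        a, b, c, d = (int(part) for part in ip.split("."))
        byte, mask = bit_position(a, b)
        self.base[byte] |= mask           # mark the /16 as "has entries"
        record = self.blocks.setdefault(a * 256 + b, [bytearray(MAP_BYTES), 0])
        byte, mask = bit_position(c, d)
        if not (record[0][byte] & mask):  # count each address only once
            record[0][byte] |= mask
            record[1] += 1

    def unblock(self, ip):
        a, b, c, d = (int(part) for part in ip.split("."))
        head = a * 256 + b
        record = self.blocks.get(head)
        if record is None:
            return
        byte, mask = bit_position(c, d)
        if record[0][byte] & mask:
            record[0][byte] &= ~mask & 0xFF
            record[1] -= 1
        if record[1] == 0:                # record is empty: erase it and
            del self.blocks[head]         # clear its bit in BASE
            byte, mask = bit_position(a, b)
            self.base[byte] &= ~mask & 0xFF

    def is_blocked(self, ip):
        a, b, c, d = (int(part) for part in ip.split("."))
        byte, mask = bit_position(a, b)
        if not (self.base[byte] & mask):
            return False                  # nothing blocked in this /16
        byte, mask = bit_position(c, d)
        return bool(self.blocks[a * 256 + b][0][byte] & mask)
```

For example, after `block("56.28.10.2")` the lookup for 56.28.10.2 touches byte 1795 of BASE and byte 320 of the 14364 record, as in the walkthrough.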

IMHO this approach will be lightning fast. The blob allocation is also a fixed 8K per record. Since db servers keep blobs in separate files, file systems with 4K, 8K, or other multiples of 4K paging will react fast.

In case blocked addresses are too dispersed

Well, that is a concern, since it will make the BLOCKS table grow unnecessarily.

For such cases, the alternative is to use a 256*256*256 bit cube, which is 16777216 bits long, equal to 2097152 bytes = 2MB.

For our previous example Higher Ip resolving is :

(56 * 65536)+(28 * 256)+10      

So BASE will become a 2MB file instead of a db table record. The script opens it (fopen etc.), addresses the bit by seeking (fseek or similar; reading the whole file is unnecessary), and then accesses the BLOCKS table with the structure below:

IpHead(key): unsigned 32 bit integer (only 24 bits are used),
Map        : 32 unsigned 8 bit integers (char maybe) (256 bits, fixed),
EntryCount : unsigned 8 bit integer.
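The seek-based access to the 2MB BASE file could look roughly like the following Python sketch. The function names and the temporary file path are made up; the point is only that each check or update reads and writes a single byte at a computed offset.

```python
import os
import tempfile

BASE_BYTES = 256 ** 3 // 8  # 16777216 bits = 2097152 bytes


def base_position(a, b, c):
    """Return (file offset, bit mask) for the upper three address bytes."""
    address = a * 65536 + b * 256 + c
    return address // 8, 1 << (address % 8)


def create_base(path):
    """Create the BASE file with every bit clear."""
    with open(path, "wb") as f:
        f.write(bytes(BASE_BYTES))


def mark(path, a, b, c):
    """Set one bit in place: read a single byte, OR the mask in, write back."""
    offset, mask = base_position(a, b, c)
    with open(path, "r+b") as f:
        f.seek(offset)
        current = f.read(1)[0]
        f.seek(offset)
        f.write(bytes([current | mask]))


def is_marked(path, a, b, c):
    """Seek straight to the byte; the rest of the file is never read."""
    offset, mask = base_position(a, b, c)
    with open(path, "rb") as f:
        f.seek(offset)
        return bool(f.read(1)[0] & mask)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "base.map")  # throwaway location
    create_base(demo)
    mark(demo, 56, 28, 10)
    print(is_marked(demo, 56, 28, 10))
```

For 56.28.10.x the address is 3677194, which lands at byte offset 459649 with mask 4.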

Here is the PHP example code for block checking with the bitplane-bitplane (8K 8K) version:

Side note: this script can be optimized further by eliminating several calls etc., but it is written like this to keep it easy to understand.

<?php
define('BLOCK_ON_ERROR', true); // WARNING: if true, any error blocks everyone

// Make mysqli return false on failure instead of throwing (the PHP 8.1+
// default), so the checks below keep working.
mysqli_report(MYSQLI_REPORT_OFF);

$shost = 'hosturl';
$suser = 'username';
$spass = 'password';
$sdbip = 'database';

$slink = mysqli_connect($shost, $suser, $spass, $sdbip);
if (! $slink) {
    $blocked = BLOCK_ON_ERROR;
} else {
    $blocked = isBlocked();
    mysqli_close($slink); // clean, tidy...
}

if ($blocked) {
    // do whatever you want when blocked
} else {
    // do whatever you want when not blocked
}
exit(0);

function getUserIp() {
    // Only REMOTE_ADDR is set by the server itself. Headers such as
    // Client-IP or X-Forwarded-For come from the client and can be forged
    // to dodge the block, so read them only behind a proxy you control.
    return (isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : "");
}

function ipToArray($ip) {
    // Returns four integers, or null if $ip is not a dotted IPv4 address
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false)
        return (null);
    return (array_map('intval', explode('.', $ip)));
}

function calculateBitPos($IpH, $IpL) {
    $BitAdr = ($IpH * 256) + $IpL;
    $BytAdr = $BitAdr >> 3; // integer divide by 8 (floor() returns a float)
    $BitOfs = $BitAdr % 8;
    $BitMask = 1 << $BitOfs;
    return (array(
            'bytePos' => $BytAdr,
            'bitMask' => $BitMask
    ));
}

function getBaseMap($link) {
    // The alias fixes the array key regardless of the column's letter case
    $q = 'SELECT Map AS map FROM BASE WHERE id = 0';
    $r = mysqli_query($link, $q);
    if (! $r)
        return (null);
    $m = mysqli_fetch_assoc($r);
    mysqli_free_result($r);
    return ($m ? $m['map'] : null);
}

function getBlocksMap($link, $IpHead) {
    $q = 'SELECT Map AS map FROM BLOCKS WHERE IpHead = ' . (int) $IpHead;
    $r = mysqli_query($link, $q);
    if (! $r)
        return (null);
    $m = mysqli_fetch_assoc($r);
    mysqli_free_result($r);
    return ($m ? $m['map'] : null);
}

function isBlocked() {
    global $slink;
    $ip = ipToArray(getUserIp());
    if ($ip === null)
        return (BLOCK_ON_ERROR); // missing or non-IPv4 address

    // here you can embed preliminary checks like ip[0] = 10 exit(0)
    // for unblocking or blocking address range 10 or 192 or 127 etc....

    // Look at base table base record.
    // map is a php string, which in fact is a good byte array
    $map = getBaseMap($slink);
    if (! $map)
        return (BLOCK_ON_ERROR);
    $p = calculateBitPos($ip[0], $ip[1]);
    $c = ord($map[$p['bytePos']]);
    if (($c & $p['bitMask']) == 0)
        return (false); // No address blocked

    // Look at blocks table related record, keyed by the upper 16 bits
    $ipHead = ($ip[0] * 256) + $ip[1];
    $map = getBlocksMap($slink, $ipHead);
    if (! $map)
        return (BLOCK_ON_ERROR);
    $p = calculateBitPos($ip[2], $ip[3]);
    $c = ord($map[$p['bytePos']]);
    return (($c & $p['bitMask']) != 0);
}
?>

I hope this helps.

If you have questions on the details, I will be happy to answer.


Block the traffic before it reaches the www server using iptables and ipset.

Catch the blacklisted IP traffic in the INPUT chain of the filter table, assuming your web server is on the same machine. If you are blocking IPs on a router, you will want the FORWARD chain instead.

First create the ipset:

ipset create ip_blacklist hash:ip

IPs can be added via:

ipset add ip_blacklist xxx.xxx.xxx.xxx

Add the ipset match rule to your iptables (DROP all packets that match the set):

iptables --table filter --insert INPUT --match set --match-set ip_blacklist src -j DROP

This will stop the blacklisted traffic before the www server.

Edit: I had a chance to look up the default maximum size and it is 65536 so you will need to adjust this to support 100000+ entries:

ipset create ip_blacklist hash:ip maxelem 120000

You can also tweak the hash size:

ipset create ip_blacklist hash:ip maxelem 120000 hashsize 16384

(The hash size must be a power of 2.)

In my experience, ipset lookups have a negligible effect on my system (~45000 entries). There are a number of test cases on the net. Memory for the set is the limiting factor.
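Adding 100000+ addresses with one `ipset add` call each is slow, so a common approach is to generate a batch of commands and feed it to `ipset restore` in one go. Here is a minimal Python sketch of the generating side only. The function name and the `headroom` parameter are made up, and it assumes IPv4 addresses and the set name used above.

```python
import ipaddress


def ipset_restore_script(set_name, ips, headroom=1.2):
    """Build 'create' and 'add' lines in the format ipset restore reads."""
    # IPv4Address() raises ValueError on bad input; the set drops duplicates.
    unique = sorted({ipaddress.IPv4Address(ip.strip()) for ip in ips})
    # Never go below the default of 65536, and leave room for growth.
    maxelem = max(65536, int(len(unique) * headroom))
    lines = [f"create {set_name} hash:ip maxelem {maxelem}"]
    lines += [f"add {set_name} {ip}" for ip in unique]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ipset_restore_script("ip_blacklist", ["10.0.0.2", "10.0.0.1"]), end="")
```

The output would then be piped into `ipset restore` as root. Note that the create line fails if the set already exists, so for updates you would drop that line or build a fresh set.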