
Processing large amounts of data in PHP without a browser timeout


I would write two scripts:

File index.php:

<iframe src="job.php" frameborder="0" scrolling="no" width="1" height="1"></iframe><script type="text/javascript">    function progress(percent){        document.getElementById('done').innerHTML=percent+'%';    }</script><div id="done">0%</div>

File job.php:

set_time_limit(0);                    // ignore php timeout
ignore_user_abort(true);              // keep on going even if user pulls the plug*
while(ob_get_level()) ob_end_clean(); // remove output buffers
ob_implicit_flush(true);              // output stuff directly

// * This absolutely depends on whether you want the user to be able to stop the process
//   or not. For example, you might create a stop button in index.php like so:
//     <a href="javascript:window.frames[0].location='';">Stop!</a>
//     <a href="javascript:window.frames[0].location='job.php';">Start</a>
//   But of course, you will need that line of code commented out for this feature to work.

function progress($percent){
    echo '<script type="text/javascript">parent.progress('.$percent.');</script>';
}

$total = count($mobiles);

echo '<!DOCTYPE html><html><head></head><body>'; // webkit hotfix
foreach($mobiles as $i => $mobile){
    // send sms
    progress($i / $total * 100);
}
progress(100);
echo '</body></html>'; // webkit hotfix


I'm assuming these numbers are in a database; if so, you should add a new column titled isSent (or whatever you fancy).
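As a rough sketch of that idea (the mobiles table, its columns, and the PDO connection details below are assumptions, not something from your post):

// One-off schema change: flag every number that has already been messaged.
$pdo = new PDO('mysql:host=localhost;dbname=sms', 'user', 'pass');
$pdo->exec("ALTER TABLE mobiles ADD COLUMN isSent TINYINT(1) NOT NULL DEFAULT 0");

// After each message goes out, mark that row so it is never picked up again.
$markSent = $pdo->prepare("UPDATE mobiles SET isSent = 1 WHERE id = ?");
$markSent->execute(array($mobileId)); // $mobileId would come from whatever loop sends the SMS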

This next paragraph you typed describes work that should be queued and done nightly/weekly/whenever is appropriate. Unless you have a specific reason to, it shouldn't be done in bulk on demand. You can even add a column to the db recording when each number was last checked, so that if a number hasn't been checked in at least X days you can perform a check on that number on demand.

Processing of the data involves checking mobile number type (e.g. CDMA), assigning unique ids to all the numbers for further referencing, checking for network/country-unique charges, etc.

But that still leads you back to the same question of how to do this for 50,000 numbers at once. Since you mentioned cron jobs, I'm assuming you have SSH access to your server, which means you don't need a browser. These cron jobs can be executed via the command line like so:

/usr/bin/php /home/username/example.com/myscript.php

My recommendation is to process 1,000 numbers at a time every 10 minutes via cron and to time how long this takes, then save it to a DB. Since you're using a cron job, it doesn't seem like these are time-sensitive SMS messages so they can be spread out. Once you know how long it took for this script to run 50 times (50*1000 = 50k) then you can update your cron job to run more/less frequently.

$time_start = microtime(true);
set_time_limit(0);

doSendSMS($phoneNum, $msg, $blah); // your actual sending routine

$time_end = microtime(true);
$time = $time_end - $time_start;
saveTimeRequiredToSendMessagesInDB($time);

Also, you might have noticed the set_time_limit(0); this tells PHP not to time out after the default 30 seconds (the max_execution_time setting). If you are able to modify php.ini then you don't need this line of code, but even so I would recommend leaving that setting alone, since you probably still want other pages to time out.

http://php.net/manual/en/function.set-time-limit.php
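Putting those pieces together, a cron-driven batch script might look roughly like this (the mobiles table, its columns, and the connection details are illustrative assumptions; doSendSMS() and saveTimeRequiredToSendMessagesInDB() are the placeholders from the snippet above):

<?php
// Hypothetical myscript.php, run from cron every 10 minutes:
//   */10 * * * * /usr/bin/php /home/username/example.com/myscript.php

set_time_limit(0);
$time_start = microtime(true);

$pdo = new PDO('mysql:host=localhost;dbname=sms', 'user', 'pass');

// Grab the next 1,000 numbers that haven't been sent yet.
$rows = $pdo->query("SELECT id, number FROM mobiles WHERE isSent = 0 LIMIT 1000")
            ->fetchAll(PDO::FETCH_ASSOC);

$markSent = $pdo->prepare("UPDATE mobiles SET isSent = 1 WHERE id = ?");

foreach ($rows as $row) {
    doSendSMS($row['number'], $msg, $blah);  // your actual sending routine
    $markSent->execute(array($row['id']));   // never pick this number up again
}

// Record how long this batch took so you can tune the cron frequency later.
saveTimeRequiredToSendMessagesInDB(microtime(true) - $time_start);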


If this isn't a one-off type of situation, consider engineering a better solution.

What you basically want is a queue that your browser-bound process can write to, and that 1-N worker processes can read from and update.

Putting work in the queue should be rather inexpensive - perhaps a bunch of simple INSERT statements to a SQL RDBMS.
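For example, the enqueue step could be nothing more than one cheap INSERT per number into a jobs table (the sms_jobs table and its columns below are assumptions, purely for illustration):

// Enqueue: one row per SMS job, all marked 'pending'.
// $mobiles and $message come from whatever page the user submitted.
$pdo = new PDO('mysql:host=localhost;dbname=sms', 'user', 'pass');
$insert = $pdo->prepare(
    "INSERT INTO sms_jobs (mobile_number, message, status) VALUES (?, ?, 'pending')"
);
foreach ($mobiles as $mobile) {
    $insert->execute(array($mobile, $message));
}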

Then you can have a daemon or two (or 100, distributed across multiple servers) that read from the queue and process stuff. You'll want to be careful here and avoid two workers taking on the same task, but that's not hard to code around.
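One common way to keep two workers off the same task, sketched below with assumed table and column names, is to claim a row with an atomic UPDATE before doing any work on it:

<?php
// Hypothetical worker daemon: claims one pending job at a time, so two workers
// can never grab the same row (MySQL-style UPDATE ... LIMIT 1 shown here).
$pdo = new PDO('mysql:host=localhost;dbname=sms', 'user', 'pass');
$workerId = getmypid();

while (true) {
    // Atomically mark one pending job as ours; affects 0 rows if the queue is empty.
    $claimed = $pdo->exec(
        "UPDATE sms_jobs SET status = 'processing', worker = {$workerId}
         WHERE status = 'pending' LIMIT 1"
    );
    if (!$claimed) {
        sleep(5); // nothing to do, back off for a bit
        continue;
    }

    $job = $pdo->query(
        "SELECT id, mobile_number, message FROM sms_jobs
         WHERE status = 'processing' AND worker = {$workerId} LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);

    doSendSMS($job['mobile_number'], $job['message'], null); // your sending routine
    $pdo->exec("UPDATE sms_jobs SET status = 'done' WHERE id = {$job['id']}");
}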

So your browser-bound workflow is: click some button that causes a bunch of stuff to get added to the queue, then redirect to some "queue status" interface, where the user can watch the system chew through all their work.
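The "queue status" page can then be little more than a couple of COUNT queries that the browser polls (again, the sms_jobs table is a placeholder):

<?php
// queue_status.php -- minimal sketch of the progress endpoint the status page could poll.
$pdo = new PDO('mysql:host=localhost;dbname=sms', 'user', 'pass');

$total = (int)$pdo->query("SELECT COUNT(*) FROM sms_jobs")->fetchColumn();
$done  = (int)$pdo->query("SELECT COUNT(*) FROM sms_jobs WHERE status = 'done'")->fetchColumn();

header('Content-Type: application/json');
echo json_encode(array(
    'done'    => $done,
    'total'   => $total,
    'percent' => $total ? round($done / $total * 100) : 100,
));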

A system like this is nice, because it's easy to scale horizontally quite a ways.

EDIT: Christian Sciberras' answer is going in this direction, except that the browser ends up driving both sides (it adds to the queue, then drives the worker process).