Compute 2,3 quartile average in SQL (tagged: mysql)


Look at the answer and comment by @Richard aka cyberkiwi in this question:

    SELECT *
    FROM (
        SELECT tbl.*, @counter := @counter + 1 counter
        FROM (SELECT @counter := 0) initvar, tbl
        ORDER BY ordcolumn
    ) X
    WHERE counter >= (25/100 * @counter)
      AND counter <= (75/100 * @counter)
    ORDER BY ordcolumn;
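The query above returns the middle rows; to get their average (which is what the title asks for), the same bounds can be applied with window functions. This is only a sketch and assumes MySQL 8.0 or later, where window functions exist and assigning user variables inside expressions is deprecated. `tbl` and `ordcolumn` are the placeholder names from the query above, and averaging `ordcolumn` itself is an assumption; substitute the column you actually want to average.

```sql
-- Sketch, assuming MySQL 8.0+ (window functions).
-- rn numbers the rows in sort order, total is the overall row count.
SELECT AVG(ordcolumn) AS mid_avg
FROM (
    SELECT ordcolumn,
           ROW_NUMBER() OVER (ORDER BY ordcolumn) AS rn,
           COUNT(*)     OVER ()                   AS total
    FROM tbl
) AS ranked
WHERE rn >= 0.25 * total   -- same inclusive bounds as the
  AND rn <= 0.75 * total;  -- user-variable version above
```

Both bounds are inclusive, as in the original query, so with very few rows the selection is slightly wider than exactly half of the table.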


You can compute the quartile values by using IF to zero out the rows that fall in the wrong quartile:

Let's assume the raw data table is created by

    DROP TABLE IF EXISTS `rawdata`;
    CREATE TABLE `rawdata` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `url` varchar(250) NOT NULL DEFAULT '',
      `time` int(11) NOT NULL,
      PRIMARY KEY (`id`),
      KEY `time` (`time`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

(and of course populated).

Let's also assume the quartiles table is created by

    DROP TABLE IF EXISTS `quartiles`;
    CREATE TABLE `quartiles` (
      `url` varchar(250) NOT NULL,
      `Q1` float DEFAULT '0',
      `Q2` float DEFAULT '0',
      `Q3` float DEFAULT '0',
      `Q4` float DEFAULT '0',
      PRIMARY KEY (`url`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

(and left empty).

Then a procedure to populate quartiles from rawdata would look like

    DELIMITER ;;
    CREATE PROCEDURE `ComputeQuartiles`()
        MODIFIES SQL DATA  -- the procedure writes to `quartiles`
    BEGIN
        DECLARE numrows int DEFAULT 0;
        DECLARE qrows int DEFAULT 0;
        DECLARE rownum int DEFAULT 0;
        DECLARE done int DEFAULT 0;
        DECLARE currenturl VARCHAR(250) CHARACTER SET utf8;
        DECLARE Q1,Q2,Q3,Q4 float DEFAULT 0.0;
        DECLARE allurls CURSOR FOR SELECT DISTINCT url FROM rawdata;
        -- When the cursor runs out of rows, an empty URL ends the loop.
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET currenturl='';
        OPEN allurls;
        FETCH allurls INTO currenturl;
        WHILE currenturl<>'' DO
            SELECT COUNT(*) INTO numrows FROM rawdata WHERE url=currenturl;
            SET qrows=FLOOR(numrows/4);
            IF qrows>0 THEN
                -- Only user-defined session variables can be reassigned inside a query,
                -- so @rownum:=@rownum+1 will work, but rownum:=rownum+1 will not.
                SET @rownum=0;
                SELECT
                    SUM(IFNULL(QA,0))/qrows,
                    SUM(IFNULL(QB,0))/qrows,
                    SUM(IFNULL(QC,0))/qrows,
                    SUM(IFNULL(QD,0))/qrows
                FROM (
                    SELECT
                        if(@rownum<qrows,time,0) AS QA,
                        if(@rownum>=qrows AND @rownum<2*qrows,time,0) AS QB,
                        -- the middle 0-3 rows are left out
                        if(@rownum>=(numrows-2*qrows) AND @rownum<(numrows-qrows),time,0) AS QC,
                        if(@rownum>=(numrows-qrows),time,0) AS QD,
                        @rownum:=@rownum+1 AS dummy
                    FROM rawdata
                    WHERE url=currenturl ORDER BY time
                ) AS baseview
                INTO Q1,Q2,Q3,Q4
                ;
                REPLACE INTO quartiles values (currenturl,Q1,Q2,Q3,Q4);
            END IF;
            FETCH allurls INTO currenturl;
        END WHILE;
        CLOSE allurls;
    END ;;
    DELIMITER ;

The main points being:

  • Use a cursor to cycle the URLs (or adapt the sample to accept the URL as a parameter)
  • For every URL find the total number of rows
  • Do some trivial math to leave out the middle rows, if (rowcount % 4) != 0
  • Select all raw rows for the URL, assigning the value of time to one of QA-QD depending on the row number, and 0 to the other three
  • Use this query as a subquery of another one, which sums up and normalizes the values
  • Use the result of this outer query to update the quartiles table

I tested this with 18432 raw rows, url=concat('http://.../',floor(rand()*10)), time=round(rand()*10000) on an 8x1.9GHz machine, and it finished consistently in 0.50-0.54 sec.
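For comparison, the same arithmetic as the procedure (0-based row numbers, qrows = FLOOR(numrows/4), the middle 0-3 rows left out) could be written as one set-based statement. This is only a sketch under the assumption that MySQL 8.0 or later is available, since it relies on window functions. It has not been benchmarked against the timing above, and the aliases `rn`, `n` and `q` are names introduced here for illustration.

```sql
-- Sketch, assuming MySQL 8.0+ and the rawdata table defined earlier.
-- rn = 0-based row number within a URL (plays the role of @rownum),
-- n  = row count for that URL (numrows), q = FLOOR(n/4) (qrows).
SELECT url,
       SUM(IF(rn <  q,                           `time`, 0)) / q AS Q1,
       SUM(IF(rn >= q         AND rn < 2 * q,    `time`, 0)) / q AS Q2,
       SUM(IF(rn >= n - 2 * q AND rn < n - q,    `time`, 0)) / q AS Q3,
       SUM(IF(rn >= n - q,                       `time`, 0)) / q AS Q4
FROM (
    SELECT url, `time`, rn, n, FLOOR(n / 4) AS q
    FROM (
        SELECT url, `time`,
               ROW_NUMBER() OVER (PARTITION BY url ORDER BY `time`) - 1 AS rn,
               COUNT(*)     OVER (PARTITION BY url)                     AS n
        FROM rawdata
    ) AS numbered
) AS sized
WHERE q > 0          -- skip URLs with fewer than 4 rows, like the IF in the procedure
GROUP BY url, q;
```

Prefixing the statement with `REPLACE INTO quartiles` should fill the quartiles table in one pass, because the output columns are in the same order as the table's columns.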


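How about this? (PREPARE only accepts a string literal or a user variable, so the statement text is built first; LIMIT also needs a comma between offset and length.)

    SET @sql = CONCAT('select * from test where a="a" LIMIT ', @of, ',', @len);
    PREPARE stmt FROM @sql;
    EXECUTE stmt;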
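A variant of the same idea avoids string concatenation, because MySQL allows `?` placeholders for LIMIT arguments inside prepared statements. The sketch below is an assumption-heavy illustration: the answer does not say how `@of` and `@len` are obtained, so deriving them from the row count is a guess at the intent, and `val` is a hypothetical name for the column that defines the ordering.

```sql
-- Sketch. Assumptions: `val` is a hypothetical sort column of `test`;
-- @of / @len are derived here from the row count to select the middle half.
SELECT COUNT(*) INTO @n FROM test WHERE a = 'a';
SET @of  = @n DIV 4;        -- rows to skip: the first quartile
SET @len = @n - 2 * @of;    -- rows to keep: all but the outer quartiles

PREPARE stmt FROM
    'SELECT * FROM test WHERE a = ''a'' ORDER BY val LIMIT ?, ?';
EXECUTE stmt USING @of, @len;
DEALLOCATE PREPARE stmt;
```

`DIV` is used instead of `/` so that both values stay integers, which LIMIT requires. To get the average rather than the rows, the prepared text could wrap the same SELECT in a derived table and apply AVG(val) to it.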