MySQL "Group By" and "Order By"


A simple solution is to wrap the query in a subselect, applying the ORDER BY first and the GROUP BY afterwards:

    SELECT *
    FROM (
        SELECT `timestamp`, `fromEmail`, `subject`
        FROM `incomingEmails`
        ORDER BY `timestamp` DESC
    ) AS tmp_table
    GROUP BY LOWER(`fromEmail`)

This is similar to using the join but looks much nicer.

Using non-aggregate columns in a SELECT with a GROUP BY clause is non-standard. MySQL will generally return the values of the first row it finds and discard the rest. Any ORDER BY clauses will only apply to the returned column value, not to the discarded ones.

IMPORTANT UPDATE: Selecting non-aggregate columns used to work in practice but should not be relied upon. Per the MySQL documentation, "this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."

As of MySQL 5.7.5, ONLY_FULL_GROUP_BY is enabled by default, so selecting non-aggregate columns that are not named in the GROUP BY causes a query error (ER_WRONG_FIELD_WITH_GROUP).
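As a rough illustration of that failure (a sketch, assuming the `incomingEmails` table from the question, that `fromEmail` is not a primary or unique key, and a server where ONLY_FULL_GROUP_BY is active), the statements below check the mode and then run a query the server should reject:

```sql
-- Show the active SQL modes; on 5.7.5+ the default list includes ONLY_FULL_GROUP_BY.
SELECT @@sql_mode;

-- `timestamp` and `subject` are neither aggregated nor named in GROUP BY,
-- so with ONLY_FULL_GROUP_BY active the server refuses this query
-- with error 1055 (ER_WRONG_FIELD_WITH_GROUP).
SELECT `timestamp`, `fromEmail`, `subject`
FROM `incomingEmails`
GROUP BY `fromEmail`;
```

Wrapping the ordered rows in a subselect and grouping the outer `SELECT *` runs into the same check, since the outer columns are still not aggregated.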

As @mikep points out below, a way around the error is to use ANY_VALUE(), available from 5.7 onwards.
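A minimal sketch of what that could look like, assuming the same `incomingEmails` columns as in the query above. Keep in mind that ANY_VALUE() only suppresses the error; it makes no promise about which row the values come from:

```sql
-- ANY_VALUE() tells the server that any value from the group is acceptable,
-- which suppresses error 1055 under ONLY_FULL_GROUP_BY.
-- It does NOT pick the latest row: the returned `timestamp` and `subject`
-- are not guaranteed to come from the newest email of each sender.
SELECT
    `fromEmail`,
    ANY_VALUE(`timestamp`) AS `timestamp`,
    ANY_VALUE(`subject`)   AS `subject`
FROM `incomingEmails`
GROUP BY `fromEmail`;
```

So this is mostly useful when the values are known to be identical within each group; for "latest row per sender" a deterministic query is still needed.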

See:
http://www.cafewebmaster.com/mysql-order-sort-group
https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_any-value


Here's one approach:

    SELECT cur.textID, cur.fromEmail, cur.subject,
           cur.timestamp, cur.read
    FROM incomingEmails cur
    LEFT JOIN incomingEmails next
        ON cur.fromEmail = next.fromEmail
        AND cur.timestamp < next.timestamp
    WHERE next.timestamp IS NULL
        AND cur.toUserID = '$userID'
    ORDER BY LOWER(cur.fromEmail)

Basically, you join the table to itself, searching for later rows. In the WHERE clause you require that no later row exists. This leaves only the latest row.

If there can be multiple emails with the same timestamp, this query returns every row tied for a sender's latest timestamp and would need refining. If there's an incremental ID column in the email table, change the JOIN like this:

    LEFT JOIN incomingEmails next
        ON cur.fromEmail = next.fromEmail
        AND cur.id < next.id
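Put together, the ID-based variant might look like the sketch below. The `id` column is hypothetical (assumed here to be an auto-increment key whose values grow in insertion order), and the WHERE test has to switch to the same column:

```sql
-- Assumes a hypothetical auto-increment column `id` whose values grow
-- in insertion order, so the highest id per sender is the newest email.
SELECT cur.id, cur.fromEmail, cur.subject, cur.timestamp
FROM incomingEmails cur
LEFT JOIN incomingEmails next
    ON cur.fromEmail = next.fromEmail
    AND cur.id < next.id          -- "next" is any newer row from the same sender
WHERE next.id IS NULL;            -- keep rows for which no newer row exists
```

Because `id` values are unique, at most one row per sender can satisfy the WHERE condition, which removes the tie problem.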


As already pointed out in a reply, the current answer is wrong, because the GROUP BY selects an arbitrary record from each group.

If one is using MySQL 5.6, or MySQL 5.7 with ONLY_FULL_GROUP_BY, the correct (deterministic) query is:

    SELECT incomingEmails.*
    FROM (
        SELECT fromEmail, MAX(timestamp) `timestamp`
        FROM incomingEmails
        GROUP BY fromEmail
    ) filtered_incomingEmails
    JOIN incomingEmails USING (fromEmail, timestamp)
    GROUP BY fromEmail, timestamp

In order for the query to run efficiently, proper indexing is required.
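One plausible choice, offered as an assumption rather than a measured recommendation, is a composite index covering the grouping column followed by the timestamp. The index name below is made up for illustration:

```sql
-- Hypothetical index name. Column order matters:
-- fromEmail first (grouping and join key), then timestamp (MAX per group).
CREATE INDEX idx_fromEmail_timestamp
    ON incomingEmails (fromEmail, `timestamp`);

-- EXPLAIN shows whether the optimizer actually uses the index
-- for the per-sender MAX() subquery.
EXPLAIN
SELECT fromEmail, MAX(`timestamp`) AS `timestamp`
FROM incomingEmails
GROUP BY fromEmail;
```

If `fromEmail` is a very long string column, the index may need a prefix length or a different design; checking the EXPLAIN output on real data is the safer guide.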

Note that, for simplicity, I've removed the LOWER() call, which in most cases won't be needed.