問題描述
有一個表 messages
包含如下所示的數據:
There is a table messages
that contains data as shown below:
Id Name Other_Columns
-------------------------
1 A A_data_1
2 A A_data_2
3 A A_data_3
4 B B_data_1
5 B B_data_2
6 C C_data_1
如果我運行一個查詢select * from messages group by name
,我會得到如下結果:
If I run a query select * from messages group by name
, I will get the result as:
1 A A_data_1
4 B B_data_1
6 C C_data_1
什么查詢會返回以下結果?
What query will return the following result?
3 A A_data_3
5 B B_data_2
6 C C_data_1
即返回每組最后一條記錄.
That is, the last record in each group should be returned.
目前,這是我使用的查詢:
At present, this is the query that I use:
SELECT
*
FROM (SELECT
*
FROM messages
ORDER BY id DESC) AS x
GROUP BY name
但這看起來效率極低.還有其他方法可以達到相同的結果嗎?
But this looks highly inefficient. Any other ways to achieve the same result?
推薦答案
MySQL 8.0 現在支持窗口函數,就像幾乎所有流行的 SQL 實現一樣.使用這種標準語法,我們可以編寫最大 n-per-group 查詢:
MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:
WITH ranked_messages AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;
以下是我在 2009 年為這個問題寫的原始答案:
Below is the original answer I wrote for this question in 2009:
我是這樣寫解決方案的:
I write the solution this way:
SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;
關于性能,一種解決方案或另一種解決方案可能更好,具體取決于數據的性質.因此,您應該測試這兩個查詢,并使用給定數據庫性能更好的查詢.
Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that is better at performance given your database.
例如,我有一份 StackOverflow 八月數據轉儲的副本.我將使用它進行基準測試.Posts
表中有 1,114,357 行.這是在我的 Macbook Pro 2.40GHz 上的 MySQL 5.0.75 上運行的.
For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts
table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.
我將編寫一個查詢來查找給定用戶 ID(我的)的最新帖子.
I'll write a query to find the most recent post for a given user ID (mine).
首先使用技術顯示 by @Eric 在子查詢中使用 GROUP BY
:
First using the technique shown by @Eric with the GROUP BY
in a subquery:
SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
FROM Posts pi GROUP BY pi.owneruserid) p2
ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;
1 row in set (1 min 17.89 sec)
即使是 EXPLAIN
分析 需要超過 16 秒:
Even the EXPLAIN
analysis takes over 16 seconds:
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 76756 | |
| 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where |
| 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)
現在使用 我的技術與LEFT JOIN
:
Now produce the same query result using my technique with LEFT JOIN
:
SELECT p1.postid
FROM Posts p1 LEFT JOIN posts p2
ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;
1 row in set (0.28 sec)
EXPLAIN
分析表明兩個表都能夠使用它們的索引:
The EXPLAIN
analysis shows that both tables are able to use their indexes:
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index |
| 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)
這是我的 Posts
表的 DDL:
CREATE TABLE `posts` (
`PostId` bigint(20) unsigned NOT NULL auto_increment,
`PostTypeId` bigint(20) unsigned NOT NULL,
`AcceptedAnswerId` bigint(20) unsigned default NULL,
`ParentId` bigint(20) unsigned default NULL,
`CreationDate` datetime NOT NULL,
`Score` int(11) NOT NULL default '0',
`ViewCount` int(11) NOT NULL default '0',
`Body` text NOT NULL,
`OwnerUserId` bigint(20) unsigned NOT NULL,
`OwnerDisplayName` varchar(40) default NULL,
`LastEditorUserId` bigint(20) unsigned default NULL,
`LastEditDate` datetime default NULL,
`LastActivityDate` datetime default NULL,
`Title` varchar(250) NOT NULL default '',
`Tags` varchar(150) NOT NULL default '',
`AnswerCount` int(11) NOT NULL default '0',
`CommentCount` int(11) NOT NULL default '0',
`FavoriteCount` int(11) NOT NULL default '0',
`ClosedDate` datetime default NULL,
PRIMARY KEY (`PostId`),
UNIQUE KEY `PostId` (`PostId`),
KEY `PostTypeId` (`PostTypeId`),
KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
KEY `OwnerUserId` (`OwnerUserId`),
KEY `LastEditorUserId` (`LastEditorUserId`),
KEY `ParentId` (`ParentId`),
CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;
評論者注意:如果您想使用不同版本的 MySQL、不同的數據集或不同的表設計進行另一個基準測試,請自行完成.我已經展示了上面的技術.Stack Overflow 是為了向您展示如何進行軟件開發工作,而不是為您完成所有工作.
這篇關于檢索每組中的最后一條記錄 - MySQL的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!