Problem description
I have this table of documents (simplified version here):
id | rev | content |
---|---|---|
1 | 1 | ... |
2 | 1 | ... |
1 | 2 | ... |
1 | 3 | ... |
How do I select one row per id, keeping only the row with the greatest rev? With the above data, the result should contain two rows: `[1, 3, ...]` and `[2, 1, ...]`. I'm using MySQL.
Currently I use checks in the `while` loop to detect and overwrite old revs in the result set. But is this the only way to achieve the result? Isn't there a SQL solution?
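For reference, the client-side workaround described above might look roughly like the sketch below. It is written in Python purely for illustration (the question does not say which language the loop is in), and the row values are hypothetical stand-ins for whatever `SELECT id, rev, content FROM YourTable` would return.

```python
# Hypothetical rows, in the same order as the sample table above.
rows = [
    (1, 1, "..."),
    (2, 1, "..."),
    (1, 2, "..."),
    (1, 3, "..."),
]

latest = {}  # maps id -> (id, rev, content) with the greatest rev seen so far
for doc_id, rev, content in rows:
    current = latest.get(doc_id)
    # Overwrite the stored row only when this one carries a higher rev.
    if current is None or rev > current[1]:
        latest[doc_id] = (doc_id, rev, content)

result = sorted(latest.values())
print(result)  # [(1, 3, '...'), (2, 1, '...')]
```

This works, but every row has to be transferred to the client first, which is what a SQL-side solution avoids.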
Recommended answer
At first glance...
All you need is a `GROUP BY` clause with the `MAX` aggregate function:
```sql
SELECT id, MAX(rev)
FROM YourTable
GROUP BY id
```
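To try this query without a MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite standing in for MySQL, the column types, and the added `ORDER BY` (only there to make the output order predictable) are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "..."), (2, 1, "..."), (1, 2, "..."), (1, 3, "...")],
)

# One row per id with its greatest rev; note that content is not selected.
max_revs = conn.execute(
    "SELECT id, MAX(rev) FROM YourTable GROUP BY id ORDER BY id"
).fetchall()
print(max_revs)  # [(1, 3), (2, 1)]
```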
It's never that simple, is it?
I just noticed you need the `content` column as well.
This is a very common question in SQL: find the whole row that holds the maximum value of some column within each group. I have heard it a lot during my career. Actually, it was one of the questions I answered in the technical interview for my current job.
It is, in fact, so common that the Stack Overflow community has created a dedicated tag for questions like this: greatest-n-per-group.
Basically, there are two approaches to solving this problem:
Joining with a simple group-identifier, max-value-in-group sub-query
In this approach, you first find the `group-identifier, max-value-in-group` pairs (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both `group-identifier` and `max-value-in-group`:
```sql
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    -- greatest rev for each id
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
```
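The sub-query join can be checked end to end with a small sketch like the one below. It again assumes an in-memory SQLite database via Python's `sqlite3` in place of MySQL; the content strings such as `doc1-rev3` are made up so that it is visible which row each result came from, and the `ORDER BY` is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "doc1-rev1"),  # hypothetical content values
        (2, 1, "doc2-rev1"),
        (1, 2, "doc1-rev2"),
        (1, 3, "doc1-rev3"),
    ],
)

# Join each row to the (id, max rev) pairs; only rows matching both survive.
latest_rows = conn.execute(
    """
    SELECT a.id, a.rev, a.content
    FROM YourTable a
    INNER JOIN (
        SELECT id, MAX(rev) AS rev
        FROM YourTable
        GROUP BY id
    ) b ON a.id = b.id AND a.rev = b.rev
    ORDER BY a.id
    """
).fetchall()
print(latest_rows)  # [(1, 3, 'doc1-rev3'), (2, 1, 'doc2-rev1')]
```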
Left joining with self, tweaking join conditions and filters
In this approach, you left join the table with itself. Equality goes on the `group-identifier`. Then, two smart moves:
- The second join condition requires the left-side value to be less than the right-side value.
- With that condition in place, the row(s) that actually hold the max value will have `NULL` on the right side (it's a `LEFT JOIN`, remember?). Then we filter the joined result, keeping only the rows where the right side is `NULL`.
So you end up with:
```sql
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
```
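To see why the `NULL` filter works, the sketch below first lists the raw self-join and then applies the filter. As before, an in-memory SQLite database via Python's `sqlite3` stands in for MySQL, the content strings are hypothetical, and the `ORDER BY` clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "doc1-rev1"),  # hypothetical content values
        (2, 1, "doc2-rev1"),
        (1, 2, "doc1-rev2"),
        (1, 3, "doc1-rev3"),
    ],
)

# Unfiltered join: each row paired with every higher rev of the same id.
# Rows that have no higher rev get NULL (None in Python) on the right side.
pairs = conn.execute(
    """
    SELECT a.id, a.rev, b.rev
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    ORDER BY a.id, a.rev, b.rev
    """
).fetchall()
print(pairs)
# [(1, 1, 2), (1, 1, 3), (1, 2, 3), (1, 3, None), (2, 1, None)]

# Keeping only the NULL rows leaves exactly the max-rev row for each id.
latest_rows = conn.execute(
    """
    SELECT a.id, a.rev, a.content
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id
    """
).fetchall()
print(latest_rows)  # [(1, 3, 'doc1-rev3'), (2, 1, 'doc2-rev1')]
```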
Conclusion
Both approaches return exactly the same result.
If two rows share the `max-value-in-group` for a `group-identifier`, both rows will appear in the result with either approach.
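The tie behaviour can be confirmed with a small sketch. It assumes an in-memory SQLite database via Python's `sqlite3` rather than MySQL, and the data set (two rows for id 1 that both have rev 3, with made-up content values) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "old"),
        (1, 3, "tie-a"),  # two rows share the max rev for id 1
        (1, 3, "tie-b"),
        (2, 1, "only"),
    ],
)

subquery_join = conn.execute(
    """
    SELECT a.id, a.rev, a.content
    FROM YourTable a
    INNER JOIN (
        SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id
    ) b ON a.id = b.id AND a.rev = b.rev
    ORDER BY a.id, a.content
    """
).fetchall()

left_self_join = conn.execute(
    """
    SELECT a.id, a.rev, a.content
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id, a.content
    """
).fetchall()

print(subquery_join)
# [(1, 3, 'tie-a'), (1, 3, 'tie-b'), (2, 1, 'only')]
print(subquery_join == left_self_join)  # True
```

If exactly one row per id is required, an extra tie-breaking rule is needed; which rule is appropriate depends on the data and is outside what the answer covers.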
Both approaches are ANSI SQL compatible and should therefore work with your favorite RDBMS, regardless of its "flavor".
Both approaches are also performance friendly, but your mileage may vary (RDBMS, DB structure, indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one that makes the most sense to you.
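As a starting point for such a benchmark, here is a rough timing harness. Everything in it is an assumption made for illustration: it uses an in-memory SQLite database through Python's `sqlite3`, a synthetic data set of 200 ids with 10 revs each, and a hypothetical index named `idx_id_rev`. Timings from SQLite say nothing about MySQL, so the same comparison should be repeated on the real server with real data.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
# Synthetic data: 200 ids, each with revs 1..10 (hypothetical sizes).
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(i, r, f"doc{i}-rev{r}") for i in range(1, 201) for r in range(1, 11)],
)
# Hypothetical composite index; both queries filter and join on (id, rev).
conn.execute("CREATE INDEX idx_id_rev ON YourTable (id, rev)")

QUERIES = {
    "subquery join": """
        SELECT a.id, a.rev, a.content
        FROM YourTable a
        INNER JOIN (
            SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id
        ) b ON a.id = b.id AND a.rev = b.rev
        ORDER BY a.id
    """,
    "left self-join": """
        SELECT a.id, a.rev, a.content
        FROM YourTable a
        LEFT OUTER JOIN YourTable b
            ON a.id = b.id AND a.rev < b.rev
        WHERE b.id IS NULL
        ORDER BY a.id
    """,
}


def time_query(sql, repeats=20):
    """Run sql `repeats` times; return the last result and the mean seconds per run."""
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed / repeats


results = {}
for name, sql in QUERIES.items():
    rows, avg_seconds = time_query(sql)
    results[name] = rows
    print(f"{name}: {len(rows)} rows, {avg_seconds * 1000:.3f} ms per run")
```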