問題描述
hash
方法狀態的Java doc,
Java doc of hash
method states,
檢索對象散列碼并將補充散列函數應用于結果散列,以防止劣質散列函數.這很關鍵,因為 HashMap 使用長度為二的冪的哈希表,否則會遇到低位沒有差異的 hashCode 的沖突.
Retrieve object hash code and applies a supplemental hash function to the result hash, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.
我無法理解的是,
1) 為什么 HashMap 使用長度為二的冪的哈希表?
1) Why HashMap uses power-of-two length hash tables?
在聲明表時也有說明:
/**
* The table, resized as necessary. Length MUST Always be a power of two.
*/
transient Entry<K,V>[] table;
為什么會有這個限制?
2) 否則會遇到低位相同的 hashCode 沖突. 是什么意思?
推薦答案
hashmap 的目的是快速縮小在搜索特定鍵時需要查看的對象數量(理想情況下為 0 或 1).
The purpose of a hashmap is to very quickly narrow down how many objects you need to look at (ideally 0 or 1) when searching for a specific key.
HashMap.get(key)
的一般方法如下:
調用
key.hashCode()
以獲取表示對象的單個整數.
Call
key.hashCode()
to get a single integer that represents the object.
查看基于該哈希碼的哈希桶",其中可以包含零個或多個條目.
Look in a hash "bucket" based on that hashcode, which can contain zero or more entries.
遍歷bucket中的每一個entry,看看是否有entry的key是.equals(key)
.如果是這樣,請將其退回.如果存儲桶中沒有條目與搜索的條目具有相同的鍵,則返回 null.
Go through each entry in the bucket and find if any entry's key is .equals(key)
. If so, return it. If no entry in the bucket has an equal key to the one searched for, return null.
good hashmap 和 bad hashmap 的區別在于速度.您必須平衡所有這三個問題:
The difference between a good hashmap and a bad hashmap is speed. You have to balance all three of these concerns:
您可以多快將密鑰轉換為哈希碼?
How quickly can you transform the key into a hashcode?
兩個不同的鍵映射到同一個哈希碼的頻率如何?
How often do two different keys map to the same hashcode?
您多久會將具有不同哈希碼的兩個密鑰放入同一個桶"中?
How often will you put two keys with different hashcodes into the same "bucket"?
Java 的設計者選擇了一組他們認為最平衡的折衷方案.沒有正確的答案,但您必須選擇一種特定的方法,并在文檔中寫入該方法是什么.
Java's designers have chosen a set of tradeoffs they think balances best. There is no right answer, but you do have to choose a specific approach, and write into the documentation what that approach is.
Java 的設計者可能有一些基于添加到哈希圖中的典型數據的統計證據.
Java's designers likely have some statistic evidence based on typical data added to hashmaps.
他們選擇通過提取哈希碼的最低 n 位來將哈希碼轉換為桶,因為這些位的變化比高位更頻繁.他們選擇提取位而不是另一種將哈希碼轉換為桶的典型方法(除以素數后的整數余數),因為在 Java 最常部署到的平臺上,這通常是一種更快的操作.
They chose to convert hashcode to bucket by extracting the lowest n bits of the hashcode, because those vary more often than the upper bits. They chose extracting bits over another typical method of converting hashcode to bucket (integer remainder after dividing by a prime number) because it's typically a faster operation on the platforms Java is most commonly deployed to.
Java 的設計者可能發現,第 1 步,hashCode()
的實現,是由 Java 用戶編寫的,而且通常很糟糕,為他們想要的許多對象返回相同的 hashCode存儲在同一個哈希圖中.想象一下,如果 hashCode 是這樣的:
What Java's designers may have found is that step 1, the implementation of hashCode()
, is written by Java users, and can often be terrible, returning the same hashCode for lots of objects they want to store in the same hashmap. Imagine if the hashCode was this:
public class MyInteger {
final int i;
public MyInteger(int i) {
this.i = i;
}
public int hashCode() {
return i << 24; // will return 0x00000000, 0x01000000, etc.
}
public boolean equals(Object o) {
return (o != null) && (o instanceof MyInteger) && ((MyInteger)o).i == i;
}
}
這就是他們所說的劣質";哈希碼的低位變化不大.在這種病態的實現中,低 24 位根本沒有變化!
This is what they call "poor quality"; the lower bits of the hashcode don't vary very much. In this pathological implementation, the lower 24 bits don't vary at all!
在這種情況下,對于任何小于 16,777,216 個桶的 hashmap,可以進入 hashmap 的每個鍵都將轉到桶 0.其他 16,777,215 個桶將為空.
In this case, for hashmaps any smaller than 16,777,216 buckets, every single key that could go in the hashmap will go to bucket 0. The other 16,777,215 buckets will be empty.
其他人的哈希碼可能沒有這么糟糕,但它們已經夠糟糕了,以至于 Java 的設計者選擇添加第二個哈希碼來幫助提高兩個不同鍵進入兩個不同存儲桶的機會,從而減少對象的數量每次檢索給定鍵時都需要檢查是否相等.
Other people's hashcodes may not be as bad as this, but they're bad enough that Java's designers chose to add a second hashcode to help improve the chance that two different keys will go into two different buckets, thus reducing how many objects need to be checked for equality each time a given key is retrieved.
這篇關于為什么要在 HashMap 中使用哈希方法的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!