Problem description
Since Java 8, HashMap has been modified slightly to use a balanced tree instead of a linked list when more than 8 (TREEIFY_THRESHOLD=8) items land in the same bucket. Is there any reason for choosing 8?
Would it impact performance if it were 9?
Recommended answer
Using a balanced tree instead of a linked list is a tradeoff. With a list, a lookup in a bucket requires a linear scan, while a tree allows logarithmic-time access. When the list is small, the lookup is fast and a tree provides no real benefit; at around 8 elements, the cost of a list lookup becomes significant enough that the tree provides a speed-up.
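To make the tradeoff concrete, here is a rough sketch that compares worst-case comparison counts for a linear scan and for an idealized, perfectly balanced binary tree. The class and method names are hypothetical. The real HashMap uses a red-black tree, which can be somewhat deeper, and a tree comparison is not necessarily as cheap as a list step, so treat the numbers as an illustration only.

```java
// Sketch only: idealized comparison counts, not a measurement of HashMap itself.
// LookupCost and its methods are hypothetical names used for illustration.
final class LookupCost {

    // Worst case for a linked list: the key is last (or absent), so every node is visited.
    static int listWorstCase(int n) {
        return n;
    }

    // Worst case for a perfectly balanced binary search tree with n nodes:
    // the number of levels, floor(log2(n)) + 1 for n >= 1, and 0 for n == 0.
    static int balancedTreeWorstCase(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int[] sizes = {1, 2, 4, 8, 16, 64, 1024};
        System.out.println("n\tlist\ttree");
        for (int n : sizes) {
            System.out.println(n + "\t" + listWorstCase(n) + "\t" + balancedTreeWorstCase(n));
        }
    }
}
```

Under these assumptions, the gap is small for a handful of nodes and only becomes large for buckets far bigger than the threshold.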
I suspect that the tree is intended for the exceptional case where the key hash is catastrophically broken (e.g. many keys collide). A linear lookup would then degrade performance severely, whereas a tree mitigates the loss somewhat, provided the keys are directly comparable.
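The collision scenario can be set up with a key type whose hashCode is deliberately broken. The sketch below is an assumption-laden illustration: BadKey and CollidingKeysDemo are hypothetical names, and whether a bucket is actually converted to a tree is an internal detail that the public Map API does not expose. The code only shows that such a map stays correct, and that the keys implement Comparable, which is the condition mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: every key lands in the same bucket because hashCode is constant.
final class CollidingKeysDemo {

    // Hypothetical key type with a deliberately broken hashCode.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // all keys collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id); // gives the keys a total order
        }
    }

    // Builds a map of n colliding keys, each mapped to its own id.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(10_000);
        System.out.println(map.get(new BadKey(9_999))); // prints 9999
    }
}
```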
Therefore, the exact threshold of 8 entries may not be terribly significant: assuming a good key distribution, the chance of a bin holding 8 entries is about 0.00000006, so tree bins are obviously used very rarely in that case. And when the hash algorithm fails catastrophically, the number of keys in the bucket is far greater than 8 anyway.
This comes at a space penalty, since a tree node must hold additional fields: four references to other tree nodes and a boolean, in addition to the fields of a LinkedHashMap.Entry (see its source).
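The field layout described here can be mirrored with simplified classes. This is a sketch, not the JDK source: the class names ending in "Sketch" and the NodeFieldCount helper are hypothetical, and counting fields is only a crude proxy for memory footprint, which also depends on the JVM's object layout.

```java
import java.lang.reflect.Field;

// Simplified mirror of a plain hash bucket node: hash, key, value, next.
class NodeSketch<K, V> {
    int hash;
    K key;
    V value;
    NodeSketch<K, V> next;
}

// Mirror of a LinkedHashMap.Entry: adds links for iteration order.
class LinkedEntrySketch<K, V> extends NodeSketch<K, V> {
    LinkedEntrySketch<K, V> before;
    LinkedEntrySketch<K, V> after;
}

// Mirror of a tree node: four more references and a boolean colour flag.
class TreeNodeSketch<K, V> extends LinkedEntrySketch<K, V> {
    TreeNodeSketch<K, V> parent;
    TreeNodeSketch<K, V> left;
    TreeNodeSketch<K, V> right;
    TreeNodeSketch<K, V> prev;
    boolean red;
}

final class NodeFieldCount {

    // Counts the non-synthetic fields declared by a class and all its superclasses.
    static int fieldCount(Class<?> type) {
        int count = 0;
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!f.isSynthetic()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("plain node fields: " + fieldCount(NodeSketch.class));
        System.out.println("tree node fields:  " + fieldCount(TreeNodeSketch.class));
    }
}
```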
From the comments in the HashMap class source:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).
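The formula in the quoted comment is easy to evaluate directly. The following sketch (PoissonBinSizes is a hypothetical name, not part of the JDK) computes exp(-0.5) * pow(0.5, k) / factorial(k) for bin sizes 0 through 8, under the same idealized assumption of random hash codes and ignoring variance.

```java
// Sketch only: evaluates the Poisson formula from the HashMap source comment.
final class PoissonBinSizes {

    // Poisson parameter from the comment: about 0.5 at the default load factor of 0.75.
    static final double LAMBDA = 0.5;

    // Expected frequency of bins holding exactly k nodes:
    // exp(-LAMBDA) * pow(LAMBDA, k) / factorial(k)
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-LAMBDA) * Math.pow(LAMBDA, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```

For k = 8 the value is roughly 6e-8, which is where the figure of 0.00000006 comes from.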