1. What Is a Hash Table
Before JDK 1.8, HashMap was implemented with an array plus linked lists: values whose hashes map to the same bucket are chained in a linked list, so lookups degrade as a chain grows long.
Since JDK 1.8, HashMap is implemented with an array plus linked lists plus red-black trees: when a chain's length reaches 8 (and the table holds at least 64 buckets, as discussed below), the list is converted to a red-black tree, which improves lookup efficiency.
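Conceptually, each slot of the array (a bucket) holds a chain of entries; a minimal sketch of the idea (not HashMap's actual code):

// bucket index = hash & (table.length - 1); colliding entries chain through next
class Entry<K, V> {
    final int hash;   // cached hash of the key
    final K key;
    V value;
    Entry<K, V> next; // next entry in the same bucket, or null
    Entry(int hash, K key, V value, Entry<K, V> next) {
        this.hash = hash; this.key = key; this.value = value; this.next = next;
    }
}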
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*
- HashMap is an implementation of the Map interface and provides all of the optional map operations. Unlike Hashtable, it is not thread-safe, and it permits null keys and values.
- It guarantees neither the order of the map nor that the order remains constant over time.
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
- Under normal hashing, HashMap's put and get operations take constant time. Iterating over a HashMap takes time proportional to its capacity (the number of buckets) plus its size (the number of key-value mappings), so if iteration performance matters, neither set the initial capacity too high nor the load factor too low.
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
- The initial capacity and the load factor are the two parameters that affect a HashMap's performance.
- The capacity is the number of buckets in the hash table, and the load factor measures how full the table is allowed to get before its capacity is increased.
- When the number of key-value mappings exceeds capacity × load factor, the map counts as too full: a rehash takes place and the table grows to twice its previous number of buckets.
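For example, with the default constants (shown later in this article), the first resize fires on the 13th insertion:

// default capacity 16 × load factor 0.75 = threshold 12
HashMap<Integer, Integer> m = new HashMap<>();
for (int i = 0; i < 13; i++)
    m.put(i, i); // the 13th put pushes size past 12 and the table doubles to 32 buckets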
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
*
- The default load factor of 0.75 offers a good trade-off between time and space costs. A higher value reduces the space overhead but increases the lookup cost (reflected in most HashMap operations, including get and put). When setting the initial capacity, take the expected number of mappings and the load factor into account so as to minimize the number of rehash operations: if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash ever occurs.
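A quick sizing example (just the arithmetic above, not any special API): to hold about 1000 entries without a single rehash, the initial capacity must exceed 1000 / 0.75 ≈ 1334.

// HashMap rounds 1334 up to the next power of two, 2048; its threshold is
// 2048 * 0.75 = 1536 > 1000, so loading 1000 entries never triggers a rehash
Map<String, Integer> m = new HashMap<>(1334);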
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
- If many mappings are to be stored, creating the map with a sufficiently large capacity up front stores them more efficiently than letting it rehash automatically as the table grows.
- Many keys sharing the same hashCode() causes collisions and is a sure way to slow any hash table down. When such keys are Comparable, HashMap uses their comparison order to break ties within a bin, which softens the impact (it does not reduce the collisions themselves).
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
- HashMap is not synchronized. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally, the map must be synchronized externally. This is typically done by synchronizing on some object that naturally encapsulates the map.
- A structural modification is any operation that adds or deletes mappings; merely changing the value associated with an existing key does not count.
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
*
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*
- If no such object exists, the map should be wrapped with the Collections.synchronizedMap method, preferably at creation time.
- The iterators returned by all of this class's collection view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException.
- Thus, in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
- The fail-fast mechanism cannot be guaranteed, however; it throws ConcurrentModificationException only on a best-effort basis. A program must therefore never depend on this exception for its correctness: fail-fast behavior should be used only to detect bugs.
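A quick demonstration of the fail-fast behavior (best-effort, so "typically" rather than "guaranteed"):

Map<String, Integer> m = new HashMap<>();
m.put("a", 1);
m.put("b", 2);
// a structural modification during iteration, not via the iterator's remove():
// the next step of the iteration typically throws ConcurrentModificationException
for (String k : m.keySet()) {
    if (k.equals("a"))
        m.remove("b");
}
// the safe pattern is the iterator's own remove:
// for (Iterator<String> it = m.keySet().iterator(); it.hasNext(); )
//     if (it.next().equals("b")) it.remove();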
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
*
- HashMap usually acts as a binned (bucketed) hash table, but when a bin grows too large it is transformed into a bin of TreeNodes, each structured much like those in java.util.TreeMap. Most methods use normal bins and relay to the TreeNode methods when applicable (simply by checking instanceof on a node). Bins of TreeNodes can be traversed and used like any others, but additionally support faster lookup when overpopulated.
- However, since the vast majority of bins in normal use are never overpopulated, checking for the existence of tree bins may be deferred in the course of the table methods.
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
*
- Tree bins are ordered primarily by hashCode; when hashes collide, if two elements are of the same "class C implements Comparable<C>" type, their compareTo method is used for ordering (the generic type is conservatively checked via reflection; see comparableClassFor).
- The added complexity of tree bins is worthwhile because it provides worst-case O(log n) operations whenever keys either have distinct hashes or are orderable.
- So even if hashCode() values are poorly distributed, or many keys share a single hashCode, performance degrades gracefully as long as the keys are also Comparable (see the sketch below).
- If neither applies, we may waste roughly a factor of two in time and space compared to taking no precautions, but the only known cases stem from user programming practices already so slow that this makes little difference.
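An illustration of this graceful degradation (BadKey is a made-up class): every key below collides, yet lookups stay O(log n) once the bin treeifies, because the keys are Comparable:

final class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; } // every instance collides
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
}
// once enough BadKey entries pile into the single bin (and the table capacity
// has reached MIN_TREEIFY_CAPACITY = 64), the bin becomes a red-black tree
// ordered by compareTo, so get() no longer walks a long linked chain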
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
The worst-case time complexity of insertion, deletion, and traversal in a red-black tree is O(log n).
But because TreeNodes are about twice the size of regular nodes, they are used only when a bin contains enough nodes to warrant it; when a bin shrinks (through removal or resizing), it is converted back to a plain bin.
With well-distributed user hashCodes, tree bins are rarely used at all.
Ideally, under random hashCodes and the default resizing threshold of 0.75, the number of nodes in a bin follows a Poisson distribution with a parameter of about 0.5 on average.
Ignoring variance, the expected frequency of bins of size k is exp(-0.5) * pow(0.5, k) / factorial(k).
As the frequency table shows, the probability of a bin growing past 8 nodes is vanishingly small, which is presumably why the authors chose 8 as the treeify threshold, based on these statistics.
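The frequency table above is easy to reproduce; a small sketch of the formula:

// Poisson(0.5): P(k) = exp(-0.5) * 0.5^k / k!
double p = Math.exp(-0.5); // the k = 0 term, about 0.60653066
for (int k = 0; k <= 8; k++) {
    System.out.printf("%d: %.8f%n", k, p);
    p *= 0.5 / (k + 1); // step from the k term to the k + 1 term
}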
* The root of a tree bin is normally its first node. However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).
*
* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.
All applicable internal methods accept a hash code as an argument (normally supplied by a public method), allowing them to call each other without recomputing user hashCodes. Most internal methods also accept a "tab" argument, which is normally the current table but may be a new or an old one when resizing or converting.
* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.
*
* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)
*
* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
When bin lists are treeified, split, or untreeified, they are kept in the same relative access/traversal order (i.e., field Node.next) to better preserve locality and to slightly simplify the handling of splits and traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes are compared as tie-breakers so that a total ordering (or as close as is required here) is kept across rebalancings.
The use of, and transitions between, plain and tree modes are complicated by the existence of the subclass LinkedHashMap. Hook methods (see below) are invoked upon insertion, removal, and access, allowing the LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.)
The concurrent-programming-like, SSA-based coding style helps avoid aliasing errors amid all the twisty pointer operations.
2. Class Fields
/**
* The default initial capacity - MUST be a power of two.
* The default initial capacity is 16; it must be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* The maximum number of buckets: 2^30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The load factor used when none specified in constructor.
* The default load factor: 0.75.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
* A bin is converted to a red-black tree when an element is added to a bin that already holds at least this many (8) nodes.
* Only when the node count shrinks into the 2-6 range can a tree convert back to a plain bin.
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
* The node-count threshold for converting a (split) tree bin back to a list during a resize operation. It must be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection under removal.
*/
*Why use 6 here?
The gap (7 sits unused between 6 and 8) prevents frequent conversions between list and tree.
Imagine the rule were "treeify when a chain exceeds 8 nodes, untreeify when it drops below 8": a HashMap that keeps inserting and deleting elements so that a chain hovers around 8 would convert tree to list and list to tree over and over, which would be very inefficient.
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
* The smallest table capacity at which bins may be treeified (otherwise, the table is resized when a bin has too many nodes). To avoid conflicts between the resizing and treeification thresholds, it should be at least 4 * TREEIFY_THRESHOLD.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
// The table, initialized on first use and resized as necessary; when allocated, its length is always a power of two.
* Why is this field marked transient?
Mainly because Object.hashCode() is JVM-dependent and may be implemented differently on different platforms, so the same key can produce different hash values across platforms. The solution is to serialize the map's contents (via its own writeObject/readObject) rather than the table field itself, and to rebuild the HashMap on deserialization.
With default serialization, elements would keep their old array positions after deserialization, but because the hashCode values may differ, the bucket index computed by (n - 1) & hash would no longer match where an entry actually sits.
transient Node<K,V>[] table;
/**
* The number of key-value mappings contained in this map.
* The number of key-value mappings contained in this map.
*/
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
* The number of times this HashMap has been structurally modified; used by the fail-fast mechanism.
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
* The next size value at which to resize: capacity * load factor. When the map holds more entries than this, a resize is triggered.
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
/**
* The load factor for the hash table.
* The load factor for this hash table.
* @serial
*/
final float loadFactor;
3. Analysis of the Key Methods
1. hash()
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
- This method computes key.hashCode() and spreads (XORs) the higher bits of the hash downward into the lower bits.
- Because the table's length is a power of two, the index is computed as (n - 1) & hash, which keeps only the low bits; hashes that differ only in the bits above the mask would always collide.
- So, as a trade-off among speed, utility, and quality, the high 16 bits are XORed with the low 16 bits.
- Many common sets of hashes are already reasonably distributed, and trees handle large sets of collisions in bins anyway,
- so this single XOR is the cheapest way to reduce systematic loss while also folding in the highest bits, which would otherwise never take part in the index computation when the table is small.
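To see this collision case concretely, here is a small illustration with assumed hashCode values and a default table of 16 buckets:

int n = 16;                      // table length; index = (n - 1) & h
int h1 = 0x10000, h2 = 0x20000;  // hashes that differ only above bit 15
// without spreading, both keys land in bucket 0:
//   (n - 1) & h1 == 0 and (n - 1) & h2 == 0
int s1 = h1 ^ (h1 >>> 16);       // 0x10001
int s2 = h2 ^ (h2 >>> 16);       // 0x20002
// after spreading they separate: (n - 1) & s1 == 1, (n - 1) & s2 == 2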
2. put & putVal()
HashMap's put method delegates to the putVal() method.
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Step 1: if the table is uninitialized or has length 0, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Step 2: compute the index as (n - 1) & hash and fetch the bucket's first
    // node p; if the bucket is empty, the new node goes straight into the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // the bucket already holds at least one node
    else {
        Node<K,V> e; K k;
        // Step 3: the key already exists, so its value will be overwritten below;
        // here p is the first element of tab[index]: if its hash and key both
        // equal the inserted ones, remember that node in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Step 4: p tells us this bin is a red-black tree
        else if (p instanceof TreeNode)
            // insert into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Step 5: this bin is a linked list
        else {
            // walk the list, inserting at its tail
            for (int binCount = 0; ; ++binCount) {
                // two things happen here: e is assigned p.next, then tested
                // for null, which signals the tail of the list
                if ((e = p.next) == null) {
                    // append the new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // once the node count reaches the threshold, treeify the bin
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // leave the loop
                    break;
                }
                // does this list node's key equal the inserted key?
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // equal: leave the loop with e pointing at the match
                    break;
                // advance p; combined with e = p.next above, this traverses the list
                p = e;
            }
        }
        // e != null means a node was found in the bucket whose key and hash
        // equal the inserted ones; putVal() returns its old value
        if (e != null) {
            // record e's value
            V oldValue = e.value;
            // overwrite when onlyIfAbsent is false or the old value is null
            if (!onlyIfAbsent || oldValue == null)
                // replace the old value with the new one
                e.value = value;
            // post-access callback
            afterNodeAccess(e);
            // return the old value
            return oldValue;
        }
    }
    // a structural modification
    ++modCount;
    // Step 6: if the actual size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    // post-insertion callback
    afterNodeInsertion(evict);
    return null;
}
To sum up, HashMap stores an entry as follows:
- Compute the key's hash: key.hash = (h = k.hashCode()) ^ (h >>> 16);
- Compute the bucket index from it: index = key.hash & (table.length - 1); this locates the slot for the key:
- If the slot holds no data, wrap the new data in a fresh node, store it there, and return null;
- If the slot holds data:
  - If the key already exists, the original node is remembered in e and, at the end, the new value overwrites the old one; oldValue is returned;
  - If the keys differ, this is a new node:
    - If p is a TreeNode, insert into the red-black tree;
    - If p heads a linked list, append at the tail; once the chain reaches 8 nodes, convert the list to a red-black tree;
- If the number of mappings then exceeds the threshold, trigger a resize.
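To see the return-value behavior described above in action:

HashMap<String, Integer> map = new HashMap<>();
System.out.println(map.put("k", 1)); // null: there was no previous mapping
System.out.println(map.put("k", 2)); // 1: the old value is returned and replaced
System.out.println(map.get("k"));    // 2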
3. get & getNode()
HashMap's get method delegates to the getNode method.
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    // local references: the table tab, the bucket's first node, and a cursor e
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // proceed only if the table is initialized, its length is greater than 0,
    // and the target bucket is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // always check the first node: if the sought hash and key match first,
        // return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // otherwise continue from the node after first
        if ((e = first.next) != null) {
            // search in the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // search in the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null); // loop via e.next
        }
    }
    return null;
}
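Note that a null return from get is ambiguous, because HashMap permits null values; containsKey tells the two cases apart:

HashMap<String, Integer> map = new HashMap<>();
map.put("x", null);
System.out.println(map.get("x"));         // null, yet the key is present
System.out.println(map.containsKey("x")); // true
System.out.println(map.containsKey("y")); // false: "y" truly is absent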
4. resize()
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
* Initializes the table or doubles its size. If the table is null, it is allocated according to the initial capacity target held in the threshold field.
* Because the expansion uses powers of two, each bucket's elements either stay at the same index or move by a power-of-two offset (the old capacity) in the new table.
* @return the table
*
*/
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // capture the old table's capacity and threshold
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // the map already has a table
    if (oldCap > 0) {
        // already at the maximum capacity: pin the threshold at the largest
        // int value and return the old table unchanged
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // if the doubled capacity is still below the maximum and oldCap is at
        // least the default of 16, double the threshold as well
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // the table is empty but oldThr holds a value (the constructor stored the
    // initial capacity in threshold): use it as the new capacity; newThr is
    // computed further below. Otherwise, fall through to the defaults.
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // install the new threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // allocate the new bucket array
    table = newTab; // point the table field at the new array
    if (oldTab != null) { // copy the nodes from the old buckets into the new ones
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) { // non-empty old bucket; e is its head
                oldTab[j] = null; // clear the old slot to help GC
                if (e.next == null)
                    // a single node: mask its hash with the new length for its new slot
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // a tree bin: split it across the new table
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // copy the linked list; note that this does not recompute each
                    // element's position from scratch: the new index is either the
                    // old index (the "lo" list) or the old index plus the old
                    // array length (the "hi" list)
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next; // remember the following node
                        // (e.hash & oldCap) tests only the single bit that the
                        // doubled mask adds; it decides whether the node must move,
                        // it is not the node's new index itself
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null) // first node of the lo list
                                loHead = e;
                            else
                                loTail.next = e; // append e to the lo list
                            loTail = e;
                        }
                        // the tested bit is 1: the node moves up by oldCap
                        else {
                            if (hiTail == null) // first node of the hi list
                                hiHead = e;
                            else
                                hiTail.next = e; // append e to the hi list
                            hiTail = e;
                        }
                    } while ((e = next) != null); // until the old chain is exhausted
                    if (loTail != null) {
                        loTail.next = null; // terminate the lo list
                        newTab[j] = loHead; // it stays at index j
                    }
                    if (hiTail != null) {
                        hiTail.next = null; // terminate the hi list
                        newTab[j + oldCap] = hiHead; // it moves to index j + oldCap
                    }
                }
            }
        }
    }
    return newTab;
}
As the code shows, whether an element keeps its position after expansion depends entirely on the extra bit that the doubled mask exposes: if that bit of the hash ANDs to 0, the new index equals the old one and no move is needed; if it is non-zero, the element moves. And the new position is the old position plus the old array length precisely because doubling the capacity adds exactly one leading 1 to the mask: 15 is 0000 1111 while 31 is 0001 1111, so the index can only change in that single high bit, and setting that bit adds exactly the old array length.

n = (tab = resize()).length;
(n - 1) & hash; // this step computes the bucket index

After expansion, n goes from 16 to 32 and n - 1 from 15 to 31,
so the AND mask gains one extra 1 in its highest position.
When that bit of the hash is 0, the AND result is the same as before, so the position does not change;
when that bit is 1, the result gains that extra 1, which adds the old array length and yields the new position.
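A concrete example with an assumed hash value:

int oldCap = 16;
int hash = 0b10101;                      // 21
int oldIndex = hash & (oldCap - 1);      // 21 & 15 = 5  (binary 0101)
int decisive = hash & oldCap;            // 21 & 16 = 16, non-zero, so the node moves
int newIndex = hash & (2 * oldCap - 1);  // 21 & 31 = 21 = 5 + 16 = oldIndex + oldCap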
Why not simply call the key's (native) hashCode() again during a resize? Because every Node caches its hash in a final field when it is created, resize can reuse e.hash directly, avoiding another potentially expensive hashCode() call; that cached hash is also what the lo/hi split above relies on.
5. Callback Methods
// Callbacks to allow LinkedHashMap post-actions
// These methods are used in putVal and elsewhere; all three exist solely to serve LinkedHashMap, the subclass of HashMap.
void afterNodeAccess(Node<K,V> p) { } // relinks the just-accessed node to the tail of the doubly linked list via its before/after references
void afterNodeInsertion(boolean evict) { } // callback used to evict the eldest entry placed in the map
void afterNodeRemoval(Node<K,V> p) { } // after a removal, splices the removed node out of its before/after links
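As an example of what these hooks enable: LinkedHashMap's afterNodeInsertion drives its removeEldestEntry method, which makes a minimal LRU cache possible (an illustrative sketch):

Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true) { // true = access order
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > 3; // evict the least recently used entry once past 3
    }
};
lru.put("a", 1); lru.put("b", 2); lru.put("c", 3);
lru.get("a");    // touching "a" makes it most recently used
lru.put("d", 4); // evicts "b", the least recently used
System.out.println(lru.keySet()); // [c, a, d]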