HashMap在JDK1.7和JDK1.8的实现区别

Java

HashMap在JDK1.7和JDK1.8的实现区别

1、实现方式

JDK1.7中，使用的是Entry数组 + 单向链表

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
        // 省略下面代码
}

JDK1.8中，使用的是Node数组 + 单向链 + 红黑树

 transient Node<K,V>[] table;
 
 static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // 省略下面代码
}

虽然结点名字不一样，但是从源码上看，是一样的。

2、新节点插入到链表是的插入顺序不同

JDK7插入在头部，JDK8插入在尾部

JDK7中的插入方法:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

其中，bucketIndex即为当前元素通过hash算法得出的在Entry数组的位置，元素e就是链表的第一个元素。Entry构造函数中，e参数位置会被赋值到next属性上。然后把当前table的位置放上新建的Entry，也就是插入在链表的头部了。据说是考虑到热点数据的原因，即最近插入的元素也很可能最近会被使用到。所以为了缩短链表查找元素的时间，所以每次都会将新插入的元素放到表头。

JDK8中，是将元素放在链表尾部。putVal方法的插入部分实现

if ((e = p.next) == null) {
    p.next = newNode(hash, key, value, null);
    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
        treeifyBin(tab, hash);
    break;
}

我们知道jdk8链表大于8的时候就要树化，本身就要算这个链表的长度有多大，链表算大小就要一个个去遍历了，遍历到了才能知道数量，也可以直接把数据放到后面了，这样也是很方便，减少了把数据移动到头数组位置这一步，效率也提高了。而且，JDK7中出现的那个HashMap死循环bug不会有了，因为这里是只是平移，没有调换顺序。

如果元素超过了阈值8，则转换成红黑树。

JDK8的hash算法有所简化

JDK7默认初始化大小16，加载因子0.75。如果传入了size，会变为大于等于当前值的2的n次方的最小的数

// Find a power of 2 >= toSize
int capacity = roundUpToPowerOf2(toSize);

为什么是2次方数？因为indexFor方法的时候，h&(length-1),length是2的次方，那么length-1总是00011111等后面都是1的数，h&它之后，其实就相当于取余，与的效率比取余高，所以用了这种方式达到高效率。

/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

另外在hash函数里面方法里面，数会经过右移，为什么要右移？因为取余操作都是操作低位，hash碰撞的概率会提高，为了减少hash碰撞，右移就可以将高位也参与运算，减少了hash碰撞。

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions.  This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

JDK8的右移操作就比较简单

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

JDK8中我们知道用的是链表过度到红黑树，效率会提高，所以JDK8提高查询效率的地方由红黑树去实现，没必要像JDK7那样右移那么复杂

JDK8扩容机制有所优化

在JDK8中，也是通过取余找到元素所在的数组的位置的，JDK8里面取余的方式在putVal里面
i = (n - 1) & hash

if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

我们假设，在扩容之前，key取余之后留下了n位。扩容之后，容量变为2倍，所以key取余得到的就有n+1位。在这n+1位里面，如果第1位是0，那么扩容前后这个key的位置还是在相同的位置（因为hash相同，并且余数的第1位是0，和之前n位的时候一样，所以余数还是一样，位置就一样了）；如果这n+1位的第一位是1，那么就和之前的不同，那么这个key就应该放在之前的位置再加上之前整个数组的长度的位置，也就是这段代码

hiTail.next = null;
newTab[j + oldCap] = hiHead;

这样子就减少了移动所有数据带来的消耗

JDK7中并发情况下的环形链表问题

待补充

如果觉得文章对你有用，请随意赞赏

框架

HashMap在JDK1.7和JDK1.8的实现区别

https://www.cychee.cn/archives/hashmap-zai-jdk17-he-jdk18-de-shi-xian-qu-bie

作者

小蓝同学

发布于

2022-02-25

更新于

2022-03-01

许可协议

CC BY 4.0