集合深度学习09—HashMap源码解析

admin • 2022-08-29 12:33 • 人工智能

一、HashMap的原理简单介绍

HashMap的对象：存储的是双列数据，键值对 key - value

键相同，则哈希值相同，则直接替换值，返回原值
键不同，如果计算后哈希值也相同了。则链表法。
在遇到哈希冲突时，使用链表法：7上8下

JDK7及之前是插在头部
JDK8及之后是插在尾部

二、HashMap的重要属性

  		static final int DEFAULT_INITIAL_CAPACITY = 16;//哈希表主数组的默认长度
       
       //定义了一个float类型的变量，以后作为：默认的负载因子
       			 //负载因子是表示Hsah表中元素的填满的程度
      				 //太大容易引起哈西冲突，太小容易浪费  0.75是经过大量运算后得到的最好值
       //这个值其实可以自己改，但是不建议改，因为这个0.75是大量运算得到的
       static final float DEFAULT_LOAD_FACTOR = 0.75f;
       
       transient Entry<K,V>[] table;//主数组,每个元素为Entry类型
       
       transient int size;
       
       int threshold;//数组扩容的界限值,门槛值   16*0.75=12 
       
       final float loadFactor;//用来接收装填因子的变量，上面默认是常量，不可直接修改

三、HashMap的构造器

JDK1.7

	   public HashMap() {
	        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
	    }
        //本类中带参数构造器：--》作用给一些数值进行初始化的！
        public HashMap(int initialCapacity, float loadFactor) {
        //给capacity赋值，capacity的值一定是 大于你传进来的initialCapacity 的 最小的 2的幂次
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        //给loadFactor赋值，将装填因子0.75赋值给loadFactor
        this.loadFactor = loadFactor;
        //数组扩容的界限值,门槛值       16 * 0.75 = 12       1<<30 + 1
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        //给table数组赋值，初始化数组长度为16
        table = new Entry[capacity];            
    }

JDK1.8

底层：数组+链表 ——>数组+红黑树
什么时候变为红黑树？

红黑树：（非平衡二叉树，相对平衡，有序的）

每个节点是黑的或者红的

根结点是黑的

如果一个节点是红的，则它的两个儿子节点是黑的

对于每个叶子节点，则从该节点到根结点的所有路径上包含相同数目的黑节点。

链表长度>8并且数组长度>64 ，两个条件都要满足
碰撞后，七上八下，七头插法。八尾插法。单向链表会变成双向链表，为树化做准备。

3.1 Map map = new HashMap();

先来看HashMap的结构

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16 初始容量
    static final int MAXIMUM_CAPACITY = 1 << 30; //最大容量
    static final float DEFAULT_LOAD_FACTOR = 0.75f; //默认负载因子，有效元素 超过 负载因子 * 容量 则扩容
    static final int TREEIFY_THRESHOLD = 8; //扩容阈值，等于  负载因子 * 容量
    static final int MIN_TREEIFY_CAPACITY = 64; //数组长度超过64 且 链表长度>=8 就树化
    transient Node<K,V>[] table; // HashMap的数组
    transient Set<Map.Entry<K,V>> entrySet; //链表中存放的结点数据类型
    transient int size;//有效元素个数
    int threshold;//扩容阈值变量
    final float loadFactor;//负载因子

    // 数组中的元素是Node类型，存   [哈希值，key，value，下一个元素地址]
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        .....
	}
}

看看这行代码会发生什么事

把默认的负载因子赋值给负载因子变量

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
    }

三、HashMap的 put( ) 方法

JDK1.8

放元素
存放哈希值，key，value …

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    
   static final int hash(Object key) {
        int h;
        //hashmap可以存键为null，哈希值为0,，如果不是null，会进行二次散列 + 扰动算法，减少哈希碰撞
        //hashtable不可以存
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
  }
  
	 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {         
        Node<K,V>[] tab;
        Node<K,V> p; 
        int n;//数组长度
        int i;
        //如果是空数组
        if ((tab = table) == null || (n = tab.length) == 0)
        //就调用 resize()方法，并获取它的长度
            n = (tab = resize()).length;
        //计算元素在数组中存放的位置，效果等效 i=hash%length; p=tab[i]
        //如果这个位置没有元素
        if ((p = tab[i = (n - 1) & hash]) == null)
        //就在这个位置创建一个节点
            tab[i] = newNode(hash, key, value, null);
        else {
        //如果这个位置有元素
            Node<K,V> e; K k;
            //判断当前传来的元素和存放的元素的哈希值是否相等 并且 （key 是否是同一个 或 如果key不为空，key的值是否相等）
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                //如果都满足，则是同一个元素，e指向当前位置的元素
                e = p;
            //如果结点是 TreeNode 类型    
            else if (p instanceof TreeNode)
            //添加树结点
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
            //获取到当前链表（桶）的尾元素，如果链表长度>=8，则树化，否则在尾部添加元素。（如果是JDK1.7.则在头部添加元素。这里是JDK1.8）
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                    //添加尾结点
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                        //树化
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            //如果存在这个键（哈希值相同并且key的值也相同，则是同一个键），则替换旧值，返回新值
            if (e != null) { 
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        //如果size到达阈值，则扩容
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

//扩容
  final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        //如果数组为null，则旧容量赋值为0，否则赋值为当前数组容量
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        //旧的扩容阈值
        int oldThr = threshold;
        //新容量和新的扩容阈值
        int newCap, newThr = 0;
        //如果旧容量>0
        if (oldCap > 0) {
        //如果旧容量>=最大容量，则赋值为int最大值，返回旧容量
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            //新容量=旧容量*2，扩容两倍，如果小于最大容量 并且 旧容量>=默认初始化容量
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                //则新阈值也扩大2倍
                newThr = oldThr << 1; 
        }
        //如果旧阈值0>0
        else if (oldThr > 0) 
        //则新容量=旧阈值，能走到这层说明旧容量为0，旧阈值>0。 所以 数组 在 初始化时大小为旧阈值，即为 threshold 
        //而此时threshold 还是0，还无法进入这层
            newCap = oldThr;
        else {            
        //当阈值和容量都为0
        //新容量为默认初始容量  16
            newCap = DEFAULT_INITIAL_CAPACITY;
            //新阈值为 负载因子 （数组使用比例） * 默认初始容量  0.75 * 16 = 12
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // 如果新阈值为0
        if (newThr == 0) {
        // 阈值= 新容量 * 负载因子
            float ft = (float)newCap * loadFactor;
            // 新阈值 = ft 或 int 最大值         MAXIMUM_CAPACITY 1 <<< 30 
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        //阈值容量初始化为新阈值
        threshold = newThr;
        // 创建 HashMap的数组，为Node类型，初始容量为 newCap   16
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // newTab 赋值给 table
        table = newTab;
        //如果旧表不为空
        if (oldTab != null) {
        //遍历旧表
            for (int j = 0; j < oldCap; ++j) {
                //获取旧表每个元素
                Node<K,V> e;
                //如果不为空
                //赋值给e
                if ((e = oldTab[j]) != null) {
                    //旧表该位置设为null
                    oldTab[j] = null;
                    //如果有下一个元素，新表的 计算后的位置 设为 原旧表元素
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                     //如果的TreeNode类型   
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else {
                    //loHead，下标不变情况下的链表头
                    //loTail，下标不变情况下的链表尾
                    
                    //hiHead，下标改变情况下的链表头
                    //hiTail，下标改变情况下的链表尾
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                        //设置next指针指向当前元素的下一个
                            next = e.next;
                            //如果其新位置是0
                            if ((e.hash & oldCap) == 0) {
                                //如果没有元素，就头结点赋值为e
                                //否则尾结点的下一个赋值为e
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                               //尾结点后移     
                                loTail = e;
                            }
                            else {
                            //如果不是0位置
                               //如果没有元素，就头结点赋值为e                  
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    //否则尾结点的下一个赋值为e
                                    hiTail.next = e;
                               //尾结点后移 
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        //如果有尾结点
                        if (loTail != null) {
                        //尾结点置空，新表当前位置设置Wie头结点
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        //如果尾结点不空
                        if (hiTail != null) {
                        //尾结点下一个置空
                            hiTail.next = null;
                            //新表 j+旧容量位置 设为头结点
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        /返回新表
        return newTab;
    }
// 设置容量为 大于当前容量的最小2的幂次    
 static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

//为树化做准备
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        //容量 < 最小树化容量阈值 则不树化，只扩容
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

JDK1.7

      public V put(K key, V value) {
                //【1】对空值的判断
        if (key == null)
            return putForNullKey(value);
                //【2】调用hash方法，获取哈希码
        int hash = hash(key);
                //【3】得到key对应在数组中的位置
        int i = indexFor(hash, table.length);
                //【4】如果你放入的元素，在主数组那个位置上没有值，e==null  那么下面这个循环不走
                //当在同一个位置上放入元素的时候
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
             //哈希值一样  并且  equals相比一样   
             //(k = e.key) == key  如果是一个对象就不用比较equals了
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // 旧值替换新值
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
       //【4】走addEntry添加这个节点的方法：
        addEntry(hash, key, value, i);
        return null;
    }
        
        //【2.1】hash方法返回这个key对应的哈希值，内部进行二次散列，为了尽量保证不同的key得到不同的哈希码！
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }
                //k.hashCode()函数调用的是key键值类型自带的哈希函数，
                //由于不同的对象其hashCode()有可能相同，所以需对hashCode()再次哈希，以降低相同率。
        h ^= k.hashCode();
                /*
                接下来的一串与运算和异或运算，称之为“扰动函数”，
                扰动的核心思想在于使计算出来的值在保留原有相关特性的基础上，
                增加其值的不确定性，从而降低冲突的概率。
                不同的版本实现的方式不一样，但其根本思想是一致的。
                往右移动的目的，就是为了将h的高位利用起来，减少哈西冲突
                */
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
        //【3.1】返回int类型数组的坐标
        static int indexFor(int h, int length) {
                //其实这个算法就是取模运算：h%length，取模效率不如位运算
        return h & (length-1);
    }
    //【4.1】调用addEntry
    void addEntry(int hash, K key, V value, int bucketIndex) {
    //满足条件则扩容
        //size的大小  大于 16*0.75=12的时候，比如你放入的是第13个，这第13个你打算放在没有元素的位置上的时候
        if ((size >= threshold) && (null != table[bucketIndex])) {
           //主数组扩容为2倍
            resize(2 * table.length);
            //重新调整当前元素的hash码
            hash = (null != key) ? hash(key) : 0;
            //重新计算元素位置
            bucketIndex = indexFor(hash, table.length);
        }
      //将hash,key,value,bucketIndex位置  封装为一个Entry对象：
        createEntry(hash, key, value, bucketIndex);
    }

        void createEntry(int hash, K key, V value, int bucketIndex) {
         //获取bucketIndex位置上的元素给e
        Entry<K,V> e = table[bucketIndex];
         //然后将hash, key, value封装为一个对象，然后将下一个元素的指向为e 
         //（链表的头插法）
          //将新的Entry放在table[bucketIndex]的位置上
        table[bucketIndex] = new Entry<>(hash, key, value, e);
           //集合中加入一个元素 size+1
        size++;
    }

五、HashMap底层数组的扩容

每次扩容为之前的1倍，比如原16，扩容后为32

JDK1.7

  void resize(int newCapacity) {
  //存储旧表
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
        //如果无法再扩容1倍，就再扩容 0.25 （之前是0.75倍） 
            threshold = Integer.MAX_VALUE;
            return;
        }
         //创建长度为newCapacity的数组
        Entry[] newTable = new Entry[newCapacity];
        boolean oldAltHashing = useAltHashing;
        useAltHashing |= sun.misc.VM.isBooted() &&
                (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean rehash = oldAltHashing ^ useAltHashing;
                //转让方法：将老数组中的东西都重新放入新数组中
        transfer(newTable, rehash);
                //老数组替换为新数组
        table = newTable;
         //重新计算 扩容阈值
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
        //老数组里的东西加到新数组里
  void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                                //将哈希值，和新的数组容量传进去，重新计算key在新数组中的位置
                int i = indexFor(e.hash, newCapacity);
                                //头插法
                e.next = newTable[i];//获取链表上元素给e.next
                newTable[i] = e;//然后将e放在i位置 
                e = next;//e再指向下一个节点继续遍历
            }
        }
    }

六、HashMap的两个经典面试题

1. 负载因子（装填因子、加载因子）为什么是 0.75？

装填因子设置为1：空间利用率得到了很大的满足，但很容易碰撞，产生链表-> 查询效率低，时间长
装填因子设置为0.5：发生碰撞的概率低，扩容，产生链表的几率也低，查询效率高，但空间利用率低
在时间和空间的开销中，取中间值

2. 主数组的长度为什么必须为2的n次幂？

h % ( length - 1) 等效 h % length 操作，等效的前提就是：length必须是2的整数倍
为了防止哈希冲突，位置冲突，不然很容易很容易算出相同的哈希值。

七、HashMap1.8的底层原理

底层原理：数组 + 链表 = 哈希表

本图文内容来源于网友网络收集整理提供，作为学习参考使用，版权属于原作者。

THE END

java 散列表

二维码

python如何导入不同目录下的包

< <上一篇

java+python+nodejs+vue+php留守儿童帮扶网站

下一篇>>

搜索内容

集合深度学习09—HashMap源码解析

一、HashMap的原理简单介绍

二、HashMap的重要属性

三、HashMap的构造器

JDK1.7

JDK1.8

3.1 Map map = new HashMap();

三、HashMap的 put( ) 方法

JDK1.8

JDK1.7

五、HashMap底层数组的扩容

JDK1.7

六、HashMap的两个经典面试题

1. 负载因子（装填因子、加载因子）为什么是 0.75？

2. 主数组的长度为什么必须为2的n次幂？

七、HashMap1.8的底层原理

最新文章

分类

标签云