How map Is Implemented in Go

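Hi, I'm 小魔童哪吒. Let's recap what we covered last time:

What a slice is
The difference between slices and arrays
The data structure behind a slice
How slice growth works
The difference between an empty slice and a nil slice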

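What is a map?

It is a data type in Go whose underlying implementation is a hash table. Does "hash table" ring a bell?

When we write C++, there is also a map data structure, and it stores key-value pairs too.

But let's not mix them up: Go's map and the C++ map (std::map) are not implemented the same way.

The C++ std::map is built on a red-black tree
Go's map is built on a hash table

Don't forget, though, that C++ also has unordered_map, an unordered map whose underlying implementation is a hash table, similar to how Go's map is implemented.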
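One visible consequence of the hash-table design is that ranging over a Go map gives no guaranteed key order, whereas a tree-based map iterates in key order. Here is a minimal sketch of that; the helper name sortedKeys and the sample data are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys is a hypothetical helper: it collects the keys of m and sorts
// them, because ranging over a Go map yields keys in no guaranteed order.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ages := map[string]int{"nezha": 3, "aobing": 4, "lijing": 40}

	// The comma-ok form tells a missing key apart from a stored zero value.
	if v, ok := ages["nezha"]; ok {
		fmt.Println("nezha:", v)
	}

	// For a stable order we have to sort the keys ourselves.
	for _, k := range sortedKeys(ages) {
		fmt.Println(k, ages[k])
	}
}
```

If ordered iteration matters, sorting the keys like this is the usual workaround.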

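What does the map data structure look like?

The earlier posts on how string and slice are implemented in Go each showed the underlying data structure.

No exception here: map has its own data structure too, with a few more fields than those two. Let's take a look.

The map implementation lives in: src/runtime/map.go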

// A header for a Go map.
type hmap struct {
   // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
   // Make sure this stays in sync with the compiler's definition.
   count     int // # live cells == size of map.  Must be first (used by len() builtin)
   flags     uint8
   B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
   noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
   hash0     uint32 // hash seed

   buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
   oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
   nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

   extra *mapextra // optional fields
}

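Let's go through the fields of hmap one by one:

Field	Meaning
count	Number of elements currently stored in the map; this is what len() returns
flags	A few special state flags
B	log2 of the bucket count, so there are 2^B buckets
noverflow	Approximate number of overflow buckets
hash0	Hash seed
buckets	Pointer to the array of 2^B buckets; may be nil when count is 0
oldbuckets	Pointer to the bucket array from before growth; non-nil only while the map is growing
nevacuate	Evacuation progress counter used during growth; buckets with an index below this value have already been evacuated
extra	Optional fields, typically holding the lists of overflow buckets and the address of the next free, not yet used overflow bucket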
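To make the meaning of B more concrete, here is a rough sketch of how a bucket count of 2^B relates to the number of items, following the "loadFactor * 2^B" comment in hmap. It assumes 8 slots per bucket and a load factor of 6.5 (written as 13/2), as in the classic runtime implementation; pickB is a made-up helper for illustration, not a runtime function.

```go
package main

import "fmt"

// Assumed constants, mirroring the classic bucket-based runtime map:
// 8 slots per bucket and an average load factor of 6.5 (13/2).
const (
	bucketCnt     = 8
	loadFactorNum = 13
	loadFactorDen = 2
)

// overLoadFactor reports whether count items are too many for 2^B buckets.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt &&
		uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

// pickB (hypothetical helper) returns the smallest B whose 2^B buckets
// can hold count items without exceeding the load factor.
func pickB(count int) uint8 {
	B := uint8(0)
	for overLoadFactor(count, B) {
		B++
	}
	return B
}

func main() {
	for _, n := range []int{8, 9, 13, 14, 100} {
		B := pickB(n)
		fmt.Printf("count=%d -> B=%d, buckets=%d\n", n, B, uint64(1)<<B)
	}
}
```

Under these assumptions, up to 8 items fit in a single bucket (B = 0), and 100 items need B = 4, that is 16 buckets.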

The extra field has type mapextra. Let's look at the details:

// mapextra holds fields that are not present on all maps.
type mapextra struct {
   // If both key and elem do not contain pointers and are inline, then we mark bucket
   // type as containing no pointers. This avoids scanning such maps.
   // However, bmap.overflow is a pointer. In order to keep overflow buckets
   // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
   // overflow and oldoverflow are only used if key and elem do not contain pointers.
   // overflow contains overflow buckets for hmap.buckets.
   // oldoverflow contains overflow buckets for hmap.oldbuckets.
   // The indirection allows to store a pointer to the slice in hiter.
   overflow    *[]*bmap
   oldoverflow *[]*bmap

   // nextOverflow holds a pointer to a free overflow bucket.
   nextOverflow *bmap
}

Inside it, the main thing to look at together is the bmap data structure.

This is the structure that implements a bucket in a Go map:

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

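// Conceptual view of a bucket (not the literal runtime definition)
type bmap struct {
    tophash  [8]uint8 // top 8 bits of each key's hash
    data     [1]byte  // key/value data: key/key/key/.../value/value/value...
    overflow *bmap    // address of the overflow bucket
}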

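Here is what the source is saying:

tophash normally holds the top byte of the hash of each key in the bucket. If tophash[0] < minTopHash, then tophash[0] records the bucket's evacuation state instead.

There is one thing to note in the source:

When memory is actually allocated, the first 8 bytes are the bmap itself (the tophash array), followed by 8 keys, 8 values, and 1 overflow pointer.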
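The source comment mentions map[int64]int8 as a case where grouping keys and values saves padding. The sketch below compares the two layouts with unsafe.Sizeof. Both struct types are illustrative stand-ins written for this comparison; the runtime computes the real bucket layout itself.

```go
package main

import (
	"fmt"
	"unsafe"
)

// packedBucket imitates the layout described above for map[int64]int8:
// tophash, then 8 keys, then 8 values, then the overflow pointer.
type packedBucket struct {
	tophash  [8]uint8
	keys     [8]int64
	elems    [8]int8
	overflow unsafe.Pointer
}

// interleavedBucket stores key/value pairs side by side, so every pair
// carries padding after its int8 value.
type interleavedBucket struct {
	tophash [8]uint8
	pairs   [8]struct {
		key  int64
		elem int8
	}
	overflow unsafe.Pointer
}

// packedSize and interleavedSize return the sizes of the two layouts in bytes.
func packedSize() uintptr      { return unsafe.Sizeof(packedBucket{}) }
func interleavedSize() uintptr { return unsafe.Sizeof(interleavedBucket{}) }

func main() {
	fmt.Println("packed:     ", packedSize())
	fmt.Println("interleaved:", interleavedSize())
}
```

On a typical 64-bit platform the packed layout comes to 88 bytes and the interleaved one to 144 bytes; the exact numbers depend on the platform's alignment rules.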

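Let's look at some diagrams

The underlying data structure of a Go map has more fields than string or slice, but it isn't that complicated. Let's draw it out.

Our hmap looks like this; pay attention to the bucket array (hmap.buckets)

If B = 3 in the diagram, the bucket array has length 8

As shown above, each bucket can hold at most 8 key/value pairs

Once there are more than 8, the bucket overflows and links to an extra overflow bucket

You can picture it like this

Strictly speaking, each bucket holds only 8 key/value pairs. If more than 8 are needed, the bucket overflows, and its overflow pointer points to another bucket with room for another 8 pairs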
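The overflow behaviour described above can be sketched with a toy bucket chain. This is a simplified model, not the runtime code: toyBucket, put, get and chainLen are made-up names, keys are compared directly without tophash, and deletion is left out.

```go
package main

import "fmt"

const slotsPerBucket = 8 // matches the 8 key/value pairs per bucket

// toyBucket is a simplified stand-in for bmap: fixed slots plus an overflow link.
type toyBucket struct {
	keys     [slotsPerBucket]string
	vals     [slotsPerBucket]int
	used     int
	overflow *toyBucket
}

// put updates the key if it is already in the chain; otherwise it appends
// the pair to the last bucket, linking a new overflow bucket when that one is full.
func (b *toyBucket) put(key string, val int) {
	last := b
	for cur := b; cur != nil; cur = cur.overflow {
		for i := 0; i < cur.used; i++ {
			if cur.keys[i] == key {
				cur.vals[i] = val
				return
			}
		}
		last = cur
	}
	if last.used == slotsPerBucket {
		last.overflow = &toyBucket{}
		last = last.overflow
	}
	last.keys[last.used] = key
	last.vals[last.used] = val
	last.used++
}

// get walks the bucket and its overflow chain looking for key.
func (b *toyBucket) get(key string) (int, bool) {
	for cur := b; cur != nil; cur = cur.overflow {
		for i := 0; i < cur.used; i++ {
			if cur.keys[i] == key {
				return cur.vals[i], true
			}
		}
	}
	return 0, false
}

// chainLen counts the buckets in the chain, including the first one.
func (b *toyBucket) chainLen() int {
	n := 0
	for cur := b; cur != nil; cur = cur.overflow {
		n++
	}
	return n
}

func main() {
	head := &toyBucket{}
	for i := 0; i < 10; i++ {
		head.put(fmt.Sprintf("k%d", i), i)
	}
	v, ok := head.get("k9")
	fmt.Println(head.chainLen(), v, ok) // 10 pairs need 2 buckets: prints "2 9 true"
}
```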

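Let's tie this back to the bmap data structure above:

tophash is an array of length 8

When keys whose hashes share the same low-order bits are stored in the current bucket, the high-order bits of each hash are saved in this array, which makes later matching faster

data holds the key-value data

The layout is 8 keys in a row followed by 8 values in a row. Why?

Because of alignment: packing all the keys together and then all the values together avoids the padding that alternating key/value pairs would need (for example in map[int64]int8), which saves space

The overflow pointer points to another bucket

This solves 2 problems: first, overflow; second, collisions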
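The split between low-order and high-order hash bits can be sketched as follows. This assumes a 64-bit hash value and a minTopHash of 5, as in the classic runtime; the sample hash values are picked by hand for the demo, and bucketIndex and topHash are illustrative names.

```go
package main

import "fmt"

// minTopHash is assumed to be 5: smaller tophash values are reserved
// for bucket state markers, so real hashes are shifted out of that range.
const minTopHash = 5

// bucketIndex selects a bucket using the low B bits of the hash.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

// topHash keeps the top 8 bits of the hash for quick comparison inside a bucket.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	const B = 3 // 2^3 = 8 buckets

	// Two hand-picked hashes with the same low bits but different top bytes.
	h1 := uint64(0xAB00000000000005)
	h2 := uint64(0x1700000000000005)

	fmt.Printf("h1: bucket=%d tophash=%#x\n", bucketIndex(h1, B), topHash(h1))
	fmt.Printf("h2: bucket=%d tophash=%#x\n", bucketIndex(h2, B), topHash(h2))
}
```

Both hashes land in bucket 5, but their tophash bytes (0xab and 0x17) differ, so a lookup can skip a non-matching slot without comparing the full key.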

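What is a hash collision?

We just mentioned hash collisions, so let's see what they are and how to resolve them

A hash collision occurs when elements with different keys map to the same address in the hash table

Mapped onto the data structure above, we can think of it this way

When two or more keys hash to the same bucket, those keys collide

There are roughly 4 ways to resolve hash collisions. The following is gathered from material found online, quoted and organized here:

Open addressing

When a collision occurs, some probing technique is used to form a probe sequence in the hash table.

The table is searched slot by slot along this sequence until either the given key is found or an open address (an empty slot) is reached. For an insertion, the new entry can be stored in the open address that was found.

During a lookup, reaching an open address means the key is not in the table, so the lookup fails.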