Java ArrayList.add 的实现方法

arraylist 是平时相当常用的list实现, 其中 boolean add(e e) 的实现比较直接:

								
									 /** 

									    * appends the specified element to the end of this list. 

									    * 

									    * @param e element to be appended to this list 

									    * @return <tt>true</tt> (as specified by {@link collection#add}) 

									    */ 

									 public   boolean   add(e e) { 

									     ensurecapacityinternal(size +   1  );   // increments modcount!! 

									     elementdata[size++] = e; 

									     return   true  ; 

									 }

有时候也使用 void add(int index, e element) 把元素插入到指定的 index 上. 在jdk中的实现是:

								
									 /** 

									    * inserts the specified element at the specified position in this 

									    * list. shifts the element currently at that position (if any) and 

									    * any subsequent elements to the right (adds one to their indices). 

									    * 

									    * @param index index at which the specified element is to be inserted 

									    * @param element element to be inserted 

									    * @throws indexoutofboundsexception {@inheritdoc} 

									    */ 

									 public   void   add(  int   index, e element) { 

									     rangecheckforadd(index); 

									     ensurecapacityinternal(size +   1  );   // increments modcount!! 

									     system.arraycopy(elementdata, index, elementdata, index +   1  , 

									              size - index); 

									     elementdata[index] = element; 

									     size++; 

									 }

略有差别, 需要保证当前 elementdata 数组容量够用, 然后把从 index 处一直到尾部的数组元素都向后挪一位. 最后把要插入的元素赋给数组的 index 处.

一直以来, 我都认为 system.arraycopy 这个native方法, 它的c++实现是调用底层的 memcpy , 直接方便, 效率也没问题.

但今天看了openjdk的源码发现并非如此.

以openjdk8u60 为例, 在 objarrayklass.cpp 中:

								
									 void   objarrayklass::copy_array(arrayoop s,   int   src_pos, arrayoop d, 

									                   int   dst_pos,   int   length, traps) { 

									    assert  (s->is_objarray(),   "must be obj array"  ); 

									    if   (!d->is_objarray()) { 

									     throw  (vmsymbols::java_lang_arraystoreexception()); 

									    } 

									    // check is all offsets and lengths are non negative 

									    if   (src_pos <   0   || dst_pos <   0   || length <   0  ) { 

									     throw  (vmsymbols::java_lang_arrayindexoutofboundsexception()); 

									    } 

									    // check if the ranges are valid 

									    if   ( (((unsigned   int  ) length + (unsigned   int  ) src_pos) > (unsigned   int  ) s->length()) 

									      || (((unsigned   int  ) length + (unsigned   int  ) dst_pos) > (unsigned   int  ) d->length()) ) { 

									     throw  (vmsymbols::java_lang_arrayindexoutofboundsexception()); 

									    } 

									    // special case. boundary cases must be checked first 

									    // this allows the following call: copy_array(s, s.length(), d.length(), 0). 

									    // this is correct, since the position is supposed to be an 'in between point', i.e., s.length(), 

									    // points to the right of the last element. 

									    if   (length==  0  ) { 

									     return  ; 

									    } 

									    if   (usecompressedoops) { 

									     narrowoop*   const   src = objarrayoop(s)->obj_at_addr<narrowoop>(src_pos); 

									     narrowoop*   const   dst = objarrayoop(d)->obj_at_addr<narrowoop>(dst_pos); 

									     do_copy<narrowoop>(s, src, d, dst, length, check); 

									    }   else   { 

									     oop*   const   src = objarrayoop(s)->obj_at_addr<oop>(src_pos); 

									     oop*   const   dst = objarrayoop(d)->obj_at_addr<oop>(dst_pos); 

									     do_copy<oop> (s, src, d, dst, length, check); 

									    } 

									 }

可以看到copy_array在做了各种检查之后, 最终copy的部分在do_copy方法中, 而这个方法实现如下:

								
									 // either oop or narrowoop depending on usecompressedoops. 

									 template <  class   t>   void   objarrayklass::do_copy(arrayoop s, t* src, 

									                   arrayoop d, t* dst,   int   length, traps) { 

									    barrierset* bs = universe::heap()->barrier_set(); 

									    // for performance reasons, we assume we are that the write barrier we 

									    // are using has optimized modes for arrays of references. at least one 

									    // of the asserts below will fail if this is not the case. 

									    assert  (bs->has_write_ref_array_opt(),   "barrier set must have ref array opt"  ); 

									    assert  (bs->has_write_ref_array_pre_opt(),   "for pre-barrier as well."  ); 

									    if   (s == d) { 

									     // since source and destination are equal we do not need conversion checks. 

									     assert  (length >   0  ,   "sanity check"  ); 

									     bs->write_ref_array_pre(dst, length); 

									     copy::conjoint_oops_atomic(src, dst, length); 

									    }   else   { 

									     // we have to make sure all elements conform to the destination array 

									     klass* bound = objarrayklass::cast(d->klass())->element_klass(); 

									     klass* stype = objarrayklass::cast(s->klass())->element_klass(); 

									     if   (stype == bound || stype->is_subtype_of(bound)) { 

									      // elements are guaranteed to be subtypes, so no check necessary 

									      bs->write_ref_array_pre(dst, length); 

									      copy::conjoint_oops_atomic(src, dst, length); 

									     }   else   { 

									      // slow case: need individual subtype checks 

									      // note: don't use obj_at_put below because it includes a redundant store check 

									      t* from = src; 

									      t* end = from + length; 

									      for   (t* p = dst; from < end; from++, p++) { 

									       // xxx this is going to be slow. 

									       t element = *from; 

									       // even slower now 

									       bool element_is_null = oopdesc::is_null(element); 

									       oop new_val = element_is_null ? oop(  null  ) 

									                      : oopdesc::decode_heap_oop_not_null(element); 

									       if   (element_is_null || 

									         (new_val->klass())->is_subtype_of(bound)) { 

									        bs->write_ref_field_pre(p, new_val); 

									        *p = element; 

									       }   else   { 

									        // we must do a barrier to cover the partial copy. 

									        const   size_t pd = pointer_delta(p, dst, (size_t)heapoopsize); 

									        // pointer delta is scaled to number of elements (length field in 

									        // objarrayoop) which we assume is 32 bit. 

									        assert  (pd == (size_t)(  int  )pd,   "length field overflow"  ); 

									        bs->write_ref_array((heapword*)dst, pd); 

									        throw  (vmsymbols::java_lang_arraystoreexception()); 

									        return  ; 

									       } 

									      } 

									     } 

									    } 

									    bs->write_ref_array((heapword*)dst, length); 

									 }

可以看到, 在设定了heap barrier之后, 元素是在for循环中被一个个挪动的. 做的工作比我想象的要多.

如果有m个元素, 按照给定位置, 使用arraylist.add(int,e)逐个插入到一个长度为n的arraylist中, 复杂度应当是o(m*n), 或者o(m*(m+n)), 所以, 如果m和n都不小的话, 效率确实是不高的.

效率高一些的方法是, 建立m+n长度的数组或arraylist, 在给定位置赋值该m个要插入的元素, 其他位置依次赋值原n长度list的元素. 这样时间复杂度应当是o(m+n).

还有, 在前面的实现中, 我们可以看到有对ensurecapacityinternal(int) 的调用. 这个保证数组容量的实现主要在:

								
									 /** 

									    * increases the capacity to ensure that it can hold at least the 

									    * number of elements specified by the minimum capacity argument. 

									    * 

									    * @param mincapacity the desired minimum capacity 

									    */ 

									 private   void   grow(  int   mincapacity) { 

									     // overflow-conscious code 

									     int   oldcapacity = elementdata.length; 

									     int   newcapacity = oldcapacity + (oldcapacity >>   1  ); 

									     if   (newcapacity - mincapacity <   0  ) 

									       newcapacity = mincapacity; 

									     if   (newcapacity - max_array_size >   0  ) 

									       newcapacity = hugecapacity(mincapacity); 

									     // mincapacity is usually close to size, so this is a win: 

									     elementdata = arrays.copyof(elementdata, newcapacity); 

									 }

大家知道由于效率原因, arraylist容量增长不是正好按照要求的容量mincapacity来设计的, 新容量计算的主要逻辑是: 如果要求容量比当前容量的1.5倍大, 就按照要求容量重新分配空间; 否则按当前容量1.5倍增加. 当然不能超出integer.max_value了. oldcapacity + (oldcapacity >> 1) 实际就是当前容量1.5倍, 等同于(int) (oldcapacity * 1.5), 但因这段不涉及浮点运算只是移位, 显然效率高不少.

所以如果arraylist一个一个add元素的话, 容量是在不够的时候1.5倍增长的. 关于1.5这个数字, 或许是觉得2倍增长太快了吧. 也或许有实验数据的验证支撑.

关于这段代码中出现的arrays.copyof这个方法, 实现的是重新分配一段数组, 把elementdata赋值给新分配的空间, 如果新分配的空间大, 则后面赋值null, 如果分配空间比当前数组小则截断. 底层还是调用的system.arraycopy.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持。

原文链接：https://segmentfault测试数据/a/1190000016910760

查看更多关于Java ArrayList.add 的实现方法的详细内容...

声明：本文来自网络，不代表【好得很程序员自学网】立场，转载请注明出处：http://haodehen.cn/did250252

更新时间：2023-07-13 阅读：12次