The complete String source is available "here" (https://link.segmentfault.com/?enc=bxbybZquw5u2RhlvGUnoEw%3D%3D.uiUUCy9F9CxNbFum2x1CNU799HjmUUvZiu56v8mwfkiQWQ38QZwH4NMtHjFALh0sihBa5rnvyNXA%2BaKh1fbAE4I8A3%2FPZen7jJocsKQgS9CBR2fp5hfD1Xzw8gjoqe64yW06NkL6wMXXH%2FsbdqOGmw%3D%3D);
it appears to be from JDK 13.
The definition and documentation of COMPACT_STRINGS are easy to find there:
/**
* If String compaction is disabled, the bytes in {@code value} are
* always encoded in UTF16.
*
* For methods with several possible implementation paths, when String
* compaction is disabled, only one code path is taken.
*
* The instance field value is generally opaque to optimizing JIT
* compilers. Therefore, in performance-sensitive place, an explicit
* check of the static boolean {@code COMPACT_STRINGS} is done first
* before checking the {@code coder} field since the static boolean
* {@code COMPACT_STRINGS} would be constant folded away by an
* optimizing JIT compiler. The idioms for these cases are as follows.
*
* For code such as:
*
* if (coder == LATIN1) { ... }
*
* can be written more optimally as
*
* if (coder() == LATIN1) { ... }
*
* or:
*
* if (COMPACT_STRINGS && coder == LATIN1) { ... }
*
* An optimizing JIT compiler can fold the above conditional as:
*
* COMPACT_STRINGS == true => if (coder == LATIN1) { ... }
* COMPACT_STRINGS == false => if (false) { ... }
*
* @implNote
* The actual value for this field is injected by JVM. The static
* initialization block is used to set the value here to communicate
* that this static final field is not statically foldable, and to
* avoid any possible circular dependency during vm initialization.
*/
static final boolean COMPACT_STRINGS;
static {
    COMPACT_STRINGS = true;
}
This comment is mostly self-explanatory: if "COMPACT_STRINGS" is "false", then "value" is always
encoded as UTF16. It also hints that the flag is related to "coder".
The source for "coder" can be found as well:
/**
* The identifier of the encoding used to encode the bytes in
* {@code value}. The supported values in this implementation are
*
* LATIN1
* UTF16
*
* @implNote This field is trusted by the VM, and is a subject to
* constant folding if String instance is constant. Overwriting this
* field after construction will cause problems.
*/
private final byte coder;
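So there are really only two values, representing "LATIN1" and "UTF16". In Java a field and a method may share a name, so besides the field coder there is also a method
coder(), which is easy to read: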
byte coder() {
    return COMPACT_STRINGS ? coder : UTF16;
}
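The effect of coder() can be tried in isolation. This is a minimal sketch, not JDK code: CoderSketch and effectiveCoder are hypothetical names, and the compaction flag is passed as a parameter (an assumption, so that both settings can be exercised in one run), whereas in String it is a static final field.

```java
// Hypothetical stand-in for the coder() logic in java.lang.String.
class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Mirrors coder(): with compaction off, the stored coder is ignored
    // and the answer is always UTF16.
    static byte effectiveCoder(boolean compactStrings, byte storedCoder) {
        return compactStrings ? storedCoder : UTF16;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCoder(true, LATIN1));  // 0 (LATIN1)
        System.out.println(effectiveCoder(true, UTF16));   // 1 (UTF16)
        System.out.println(effectiveCoder(false, LATIN1)); // 1 (UTF16)
    }
}
```

With compaction disabled, every string reports UTF16 regardless of what the field holds, which is why the coder comparison can be skipped in that case.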
With that, this expression becomes easy to read:
(!COMPACT_STRINGS || this.coder == aString.coder)
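If "COMPACT_STRINGS == false", everything is UTF16, so this condition passes and evaluation moves on to the next one. Otherwise (compaction is enabled), the two "coder" values
must be equal; if they differ, the result is immediately false. If this is hard to follow, write the code out by hand and split the check into separate branches: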
boolean flag = false;
if (!COMPACT_STRINGS) {
    flag = true; // per the COMPACT_STRINGS comment, UTF16 is used here and coder is ignored
} else if (this.coder == aString.coder) {
    flag = true; // both strings use the same coder
}
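To confirm that the expanded branches mean the same thing as the one-line condition, both can be evaluated for every combination of inputs. A minimal sketch, assuming hypothetical names (CoderCheckSketch, shortForm, expandedForm) and passing the flag and coders as parameters instead of reading them from String:

```java
// Hypothetical check that the short-circuit form and the expanded
// if/else form agree for every input combination.
class CoderCheckSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Same shape as (!COMPACT_STRINGS || this.coder == aString.coder).
    static boolean shortForm(boolean compact, byte a, byte b) {
        return !compact || a == b;
    }

    // The hand-written version with separate branches.
    static boolean expandedForm(boolean compact, byte a, byte b) {
        boolean flag = false;
        if (!compact) {
            flag = true;  // compaction off: coder is ignored
        } else if (a == b) {
            flag = true;  // compaction on: coders must match
        }
        return flag;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        byte[] coders = {LATIN1, UTF16};
        for (boolean compact : flags) {
            for (byte a : coders) {
                for (byte b : coders) {
                    System.out.printf("compact=%b a=%d b=%d -> short=%b expanded=%b%n",
                            compact, a, b,
                            shortForm(compact, a, b), expandedForm(compact, a, b));
                }
            }
        }
    }
}
```

The only rows that come out false are the ones where compaction is on and the two coders differ.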
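Then comes the next condition, "StringLatin1.equals(value, aString.value)", which compares the strings' internal data
"value" directly using the Latin1 rules. As for what value is, the source states it plainly: it is the storage for the characters.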
/**
* The value is used for character storage.
*
* @implNote This field is trusted by the VM, and is a subject to
* constant folding if String instance is constant. Overwriting this
* field after construction will cause problems.
*
* Additionally, it is marked with {@link Stable} to trust the contents
* of the array. No other facility in JDK provides this functionality (yet).
* {@link Stable} is safe here, because value is never null.
*/
@Stable
private final byte[] value;
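So the overall comparison logic is:
1. Check whether the argument is a String; if not, the precondition fails and the comparison is false.
2. Check whether both strings have the same coder (the value of COMPACT_STRINGS indirectly affects this check); if not, the comparison is false.
3. With the same coder, compare the internal data; this decides the final result.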
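The three steps can be sketched on a small stand-in class. Everything here is hypothetical and for illustration only: MiniString is not a JDK class, the constructor's choice of ISO_8859_1 or UTF_16LE is an assumption about the byte layout, and Arrays.equals stands in for the byte comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical miniature of String, reduced to the fields used by equals.
final class MiniString {
    static final boolean COMPACT_STRINGS = true;
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    MiniString(String s) {
        // Assumption: one byte per char when every char fits in 8 bits.
        boolean latin1 = COMPACT_STRINGS && s.chars().allMatch(c -> c < 256);
        this.coder = latin1 ? LATIN1 : UTF16;
        this.value = s.getBytes(latin1 ? StandardCharsets.ISO_8859_1
                                       : StandardCharsets.UTF_16LE);
    }

    @Override
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        // Step 1: the argument must be the same kind of object.
        if (!(anObject instanceof MiniString)) {
            return false;
        }
        MiniString other = (MiniString) anObject;
        // Step 2: with compaction on, the coders must match.
        if (COMPACT_STRINGS && this.coder != other.coder) {
            return false;
        }
        // Step 3: compare the stored bytes.
        return Arrays.equals(this.value, other.value);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(value);
    }

    public static void main(String[] args) {
        System.out.println(new MiniString("a").equals(new MiniString("a")));      // true
        System.out.println(new MiniString("a").equals(new MiniString("\u4e2d"))); // false, coders differ
        System.out.println(new MiniString("a").equals("a"));                      // false, not a MiniString
    }
}
```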
***
«2024-03-03 17:33:32, additional reply to comments: "If COMPACT_STRINGS == false, then it is
UTF16", but this JDK 13 source only uses StringLatin1.equals, so how is UTF16 compared?»
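To be honest, I am not sure how UTF16 is compared either. But once the encodings are known to be the same, and provided StringLatin1
compares byte by byte, there is no need to care which encoding it actually is, since a byte-wise comparison is the lowest-level method. The premise, of course, is to read the StringLatin1
code and check whether its implementation matches this assumption.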
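The idea that a byte-wise comparison does not care about the encoding can be tried with a small sketch. The names ByteCompareSketch and sameBytes are hypothetical, the loop is an assumed shape and not a copy of StringLatin1.equals, and UTF_16LE is chosen here only to make the byte layout deterministic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical byte-by-byte comparison; it never looks at the encoding.
class ByteCompareSketch {
    static boolean sameBytes(byte[] value, byte[] other) {
        if (value.length != other.length) {
            return false;
        }
        for (int i = 0; i < value.length; i++) {
            if (value[i] != other[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = "\u4f60\u597d".getBytes(StandardCharsets.UTF_16LE);
        byte[] y = "\u4f60\u597d".getBytes(StandardCharsets.UTF_16LE);
        byte[] z = "\u4f60\u4eec".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(sameBytes(x, y)); // true: same UTF16 bytes
        System.out.println(sameBytes(x, z)); // false: second char differs

        // Different strings can share the same bytes under different
        // encodings, so the coder has to be checked before the bytes.
        byte[] latin1 = "a\u0000".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = "a".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(sameBytes(latin1, utf16)); // true
    }
}
```

The last case shows why the byte comparison is only safe after the encodings are known to match.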
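«It does run many times as if in a loop, but it is essentially a return with no loop statement, so why does it run repeatedly?
While debugging with a breakpoint, I found that for "a".equals("a") the Object anObject argument passed in was GBK. Why is that?»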
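It is not a loop. If the value being compared during ""a".equals("a")" turns out to be ""GBK"", then an encoding name is being compared somewhere along the way. Such a comparison would seem to happen only inside
"StringLatin1.equals", so finding out what is really going on probably means reading the "StringLatin1" source. Keep in mind that a breakpoint inside String.equals is hit by every call to that method in the process, not only by the line under test. Also, since you are already stepping through with the debugger,
look at the call stack and follow it level by level to find the code at each call site.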
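For readers who have not read a call stack before, the same information the debugger shows can be printed from code. This is a minimal sketch with hypothetical names (CallStackSketch, currentCallChain, compare); it cannot be placed inside String.equals itself, so it only illustrates how the chain of callers is listed from the innermost frame outward.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the current call chain, innermost first.
class CallStackSketch {
    static List<String> currentCallChain() {
        List<String> frames = new ArrayList<>();
        // A Throwable records the stack at the point where it is created.
        for (StackTraceElement e : new Throwable().getStackTrace()) {
            frames.add(e.getClassName() + "." + e.getMethodName());
        }
        return frames;
    }

    // Stand-in for "some method that ends up being called by other code".
    static List<String> compare() {
        return currentCallChain();
    }

    public static void main(String[] args) {
        for (String frame : compare()) {
            System.out.println(frame);
        }
    }
}
```

Reading such a list from top to bottom answers the question "who called this?" one level at a time, which is the same walk suggested for the debugger's stack view.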