Understanding the internal logic of Java String's equals method? - 灵析社区

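Environment: JDK 18

## Preface

Today I was reading the source of Java's `String.equals`. The main logic is easy to follow: first check whether the argument is the same object, and if so return true immediately; otherwise check whether it is a `String` and whether every element has the same content (length first, then content).

## Problem description

While stepping through with a breakpoint in the debugger, I ran into two problems.

Question 1:

```java
return (anObject instanceof String aString)
        && (!COMPACT_STRINGS || this.coder == aString.coder)
        && StringLatin1.equals(value, aString.value);
```

This line **runs over and over, as if in a loop**, and sometimes **the arrays `value` and `aString.value` have different lengths (even when the strings are equal, e.g. `"a".equals("a")`)**.

Question 2:

* After `"a".equals(new String("a"));` enters `equals`, the parameters look like this: ![image.png](https://wmprod.oss-cn-shanghai.aliyuncs.com/c/user/20241008/aac546615c33412c5a1bfd1e252a3ae0.png)
* After `"a".equals("a");` enters `equals`, the parameters look like this: ![image.png](https://wmprod.oss-cn-shanghai.aliyuncs.com/c/user/20241008/d7ff57fcba18a8d22d5eca1937fbb601.png)

For both lines above, shouldn't the value passed in simply be "a"?

* * *

I really can't figure out these two problems, and I'd be grateful if someone could explain.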
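A small check that may help with both questions, relying on the standard Java rule that identical string literals refer to the same interned `String` instance: for `"a".equals("a")` the first check in `equals` (`this == anObject`) already succeeds, so that particular call should return before it reaches the `return (anObject instanceof String aString) ...` line at all. The class name `IdentityVsEquality` and its helper `sameObject` are made up for this illustration.

```java
// Hypothetical demo class: shows which calls pass the identity check
// that String.equals performs before anything else.
final class IdentityVsEquality {

    // True when both references point at the same object, i.e. the
    // `this == anObject` shortcut in String.equals would return true.
    static boolean sameObject(String a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "a";
        String alsoLiteral = "a";             // same interned instance as `literal`
        String constructed = new String("a"); // `new` always creates a distinct object

        System.out.println(sameObject(literal, alsoLiteral)); // true: shortcut taken
        System.out.println(sameObject(literal, constructed)); // false: falls through to the return line
        System.out.println(literal.equals(constructed));      // true: contents compared
        System.out.println(constructed.intern() == literal);  // true: intern() returns the pooled instance
    }
}
```

If that holds, a stop on the `return` line that shows some other argument (such as `"GBK"`) most likely comes from a different caller of `String.equals`, since a breakpoint inside a JDK method fires for every call made while the program runs, including calls from JDK code itself; the call stack in the debugger shows which caller it is.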

Daily毅星

The complete `String` source (apparently JDK 13) is available [here](https://link.segmentfault.com/?enc=bxbybZquw5u2RhlvGUnoEw%3D%3D.uiUUCy9F9CxNbFum2x1CNU799HjmUUvZiu56v8mwfkiQWQ38QZwH4NMtHjFALh0sihBa5rnvyNXA%2BaKh1fbAE4I8A3%2FPZen7jJocsKQgS9CBR2fp5hfD1Xzw8gjoqe64yW06NkL6wMXXH%2FsbdqOGmw%3D%3D). The definition and documentation of `COMPACT_STRINGS` are easy to find there:

```java
/**
 * If String compaction is disabled, the bytes in {@code value} are
 * always encoded in UTF16.
 *
 * For methods with several possible implementation paths, when String
 * compaction is disabled, only one code path is taken.
 *
 * The instance field value is generally opaque to optimizing JIT
 * compilers. Therefore, in performance-sensitive place, an explicit
 * check of the static boolean {@code COMPACT_STRINGS} is done first
 * before checking the {@code coder} field since the static boolean
 * {@code COMPACT_STRINGS} would be constant folded away by an
 * optimizing JIT compiler. The idioms for these cases are as follows.
 *
 * For code such as:
 *
 *    if (coder == LATIN1) { ... }
 *
 * can be written more optimally as
 *
 *    if (coder() == LATIN1) { ... }
 *
 * or:
 *
 *    if (COMPACT_STRINGS && coder == LATIN1) { ... }
 *
 * An optimizing JIT compiler can fold the above conditional as:
 *
 *    COMPACT_STRINGS == true  => if (coder == LATIN1) { ... }
 *    COMPACT_STRINGS == false => if (false) { ... }
 *
 * @implNote
 * The actual value for this field is injected by JVM. The static
 * initialization block is used to set the value here to communicate
 * that this static final field is not statically foldable, and to
 * avoid any possible circular dependency during vm initialization.
 */
static final boolean COMPACT_STRINGS;

static {
    COMPACT_STRINGS = true;
}
```

This comment is mostly easy to follow: if `COMPACT_STRINGS` is `false`, `value` is always encoded in UTF16. You can also guess that it is related to `coder`. For `coder`, you can find the corresponding source:

```java
/**
 * The identifier of the encoding used to encode the bytes in
 * {@code value}. The supported values in this implementation are
 *
 * LATIN1
 * UTF16
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 */
private final byte coder;
```

It really has only two values, representing `LATIN1` and `UTF16`. In Java a field and a method may share a name, so besides the field `coder` there is also a method `coder()`, which is easy to understand:

```java
byte coder() {
    return COMPACT_STRINGS ? coder : UTF16;
}
```

With that, this condition becomes clear:

```java
(!COMPACT_STRINGS || this.coder == aString.coder)
```

If `COMPACT_STRINGS == false`, everything is UTF16, so evaluation moves on to the next condition. Otherwise, the two `coder` values must be equal; if they differ, the result is immediately "false". If this is hard to follow, write the logic out by hand and split the condition apart:

```java
boolean flag = false;
if (!COMPACT_STRINGS) {
    flag = true; // per the COMPACT_STRINGS doc, UTF16 is used in this case, so the coder value is ignored
} else if (this.coder == aString.coder) {
    flag = true; // the coders are the same
}
```

The next condition, `StringLatin1.equals(value, aString.value)`, compares the string's internal data `value` directly with the Latin1 routine. As for what `value` is, the code states it plainly: it is the character storage.

```java
/**
 * The value is used for character storage.
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 *
 * Additionally, it is marked with {@link Stable} to trust the contents
 * of the array. No other facility in JDK provides this functionality (yet).
 * {@link Stable} is safe here, because value is never null.
 */
@Stable
private final byte[] value;
```

So the whole comparison works like this:

1. Check whether the argument is a string. If it isn't, the precondition fails and the comparison fails.
2. Check whether the coder is the same (the value of `COMPACT_STRINGS` indirectly affects this check). If it isn't, the precondition is not met and the comparison fails.
3. With the same coder, compare whether the internal data is identical; that decides the final result.

***

«2024-03-03 17:33:32, follow-up answering the comment: "If `COMPACT_STRINGS == false`, it's UTF16", but in the JDK 13 source only `StringLatin1.equals` is used, so how are UTF16 strings compared?»

To be honest, I don't fully understand how UTF16 strings are compared either. But once both strings are known to use the same encoding, and `StringLatin1` compares byte by byte, you don't need to care which encoding it is: the same characters stored with the same encoding produce the same bytes, and byte-by-byte comparison is the lowest-level check there is. Of course, this assumes you read the `StringLatin1` code to confirm that its implementation matches this assumption.

«It does run many times, but it's just a `return` with no loop statement, so why does it run repeatedly? Also, while debugging `"a".equals("a")`, the `Object anObject` argument was `"GBK"`. Why is that?»

It isn't a loop. If you see `"GBK"` being compared while debugging `"a".equals("a")`, then some comparison involving an encoding is happening along the way, and the only place such a comparison seems able to happen here is `StringLatin1.equals`. So to find out what is really going on, you may still need to read the `StringLatin1` source. Also, since you are already tracing in the debugger, look at the call stack and walk up it level by level to inspect the code at each call site.
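The three-step logic summarized above, together with the follow-up point that a byte-wise comparison also covers UTF16 once the coders match, can be sketched on a toy class. `MiniString` is entirely hypothetical and is not `java.lang.String`: it is a minimal model assuming Latin-1 storage when every char fits in one byte and UTF-16LE otherwise (the real JDK's UTF16 byte order is an internal detail), with `COMPACT_STRINGS` as a plain constant rather than a VM-injected value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified model of String storage, for illustration only.
final class MiniString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;
    // Assumption: a plain constant here; in the JDK the VM injects the value.
    static final boolean COMPACT_STRINGS = true;

    private final byte[] value; // raw character storage
    private final byte coder;   // which encoding `value` uses

    MiniString(String s) {
        // Compact to one byte per char when every char fits in Latin-1,
        // otherwise store UTF-16 (little-endian is an assumption of this sketch).
        boolean latin1 = COMPACT_STRINGS && s.chars().allMatch(c -> c <= 0xFF);
        this.coder = latin1 ? LATIN1 : UTF16;
        this.value = s.getBytes(latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16LE);
    }

    // Byte-by-byte comparison in the spirit of StringLatin1.equals:
    // lengths first, then every byte. The bytes are never decoded.
    static boolean bytesEqual(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true; // same object: no further checks needed
        }
        return (anObject instanceof MiniString other)             // step 1: right type?
                && (!COMPACT_STRINGS || this.coder == other.coder) // step 2: same coder?
                && bytesEqual(value, other.value);                 // step 3: same bytes?
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal objects share coder and bytes.
        return Arrays.hashCode(value) * 31 + coder;
    }

    public static void main(String[] args) {
        System.out.println(new MiniString("a").equals(new MiniString("a"))); // true (Latin-1 path)
        System.out.println(new MiniString("a").equals(new MiniString("b"))); // false
        System.out.println(new MiniString("\u4E2D\u6587").equals(new MiniString("\u4E2D\u6587"))); // true (UTF-16 path)
        System.out.println(new MiniString("a").equals("a")); // false: not a MiniString
    }
}
```

In this model a string whose characters all fit in Latin-1 is always stored as `LATIN1`, so two strings with different coders can never hold the same characters; that is why the coder check can reject early, and why `bytesEqual` never needs to know which encoding it is looking at.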