Java HashCode Identity Crisis
I was unit testing a global UDF that accepts a string and leverages Java's built-in hashCode method to return a signed 32-bit integer. This seemed trivial, but I wondered whether different versions of ColdFusion that are outside of my control would consistently return the same values. I pasted a very basic example into TryCF.com, naively used CGI.REMOTE_ADDR as the value, and got different results... but this was entirely my fault. After outputting the values of CGI.REMOTE_ADDR, it became obvious that some TryCF engines were configured to proxy the request, so my thought-to-be-consistent IP address wasn't being used. Phew! This was a relief to discover, but it did cause me to pause and wonder about the consistency of Java's hashCode generation.
According to the JavaDocs, "As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.", but this is meaningless if the JVM's hashCode flag (-XX:hashCode) is not set consistently. There are 6 different options. Historically, Java 6 & 7 used a hashCode flag of 0 and Java 8+ uses a default value of 5.
Summary of Each Flag Value (0-5):
- 0: Global Park-Miller RNG (the default in Java 6 and 7).
- 1: A function of the object's memory address and a global state, stable across stop-the-world (STW) operations.
- 2: Always returns a constant value of 1 (used for testing purposes, not practical for production).
- 3: A global incrementing counter.
- 4: Directly uses the object's memory address (cast to an integer).
- 5: Marsaglia's XOR-shift with thread-specific state (default from Java 8 onwards).
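A small sketch contrasting the two kinds of hash code may help here. System.identityHashCode exposes the JVM-generated identity hash that the flag values above control, while String overrides hashCode with a content-based calculation described in its JavaDocs. The class and method names below (HashKinds, identityHash, contentHash) are hypothetical and only serve the illustration.

```java
// Hypothetical demo class; the names are illustrative only.
class HashKinds {
    // Identity hash: generated by the JVM per object instance.
    // This is the value the flag settings above influence.
    static int identityHash(Object o) {
        return System.identityHashCode(o);
    }

    // Content hash: String overrides hashCode(), so the result
    // depends only on the characters in the string.
    static int contentHash(String s) {
        return s.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct String objects with identical content.
        String a = new String("192.168.0.1");
        String b = new String("192.168.0.1");

        // Same content, so the content hashes match: prints true.
        System.out.println(contentHash(a) == contentHash(b));

        // Identity hashes are assigned by the JVM and can differ between runs.
        System.out.println(identityHash(a) + " vs " + identityHash(b));
    }
}
```

Running this on different engines is one way to check which of the two values a given UDF actually depends on.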
If this method's behavior can be overridden, or its default behavior has changed in the past, it's probably best not to blindly assume that it won't change in the future. This is what my dotNET brother was trying to explain to me when he indicated that it could be different on every PC his apps are installed on; his recommendation was to never leave anything to chance.
A recent mantra of mine has been "Trust, but verify". I've been burned too many times over the last 25 years as a ColdFusion developer when I've made assumptions regarding "black boxes of code that don't have any transparency". I've encountered integers, dates & email addresses in production that are invalid, except that built-in ColdFusion functions insisted they were all "valid". I've created and processed JPEGs & PDFs that are valid, yet built-in ColdFusion functions can't seem to agree. (i.e., isImageFile & isPDFFile will return TRUE, but then CFImage or CFPDF will throw an error when attempting to read the file.)
I endeavor to develop cross-platform ColdFusion code, but this isn't possible if the result of Java's hashCode method is left to however it's been configured at the Java level, which could potentially be different per platform (ColdFusion, ColdFusion "future", Lucee & BoxLang).
In an effort to normalize values and return a consistent 32-bit integer regardless of how Java may be configured, I spent some time today developing a CFScript UDF alternative that performs the same function. The results returned are consistent with Java's hashCode, but the performance is substantially poorer. 1,000 iterations using built-in Java consistently take ~2 ms, whereas the CFML UDF fluctuates between 58-70 ms. I'm not sure if there's any way to optimize this any further. (If you can, let me know.)
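For reference, the String.hashCode JavaDocs define the value as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with 32-bit int arithmetic. The sketch below is a rough Java rendering of that formula with the 32-bit wraparound spelled out by hand, which is the part a port to a language without fixed-width int overflow has to emulate. It is not the UDF from the gist, and the class and method names (PortableStringHash, hash) are hypothetical.

```java
// Hypothetical sketch; not the UDF from the gist.
class PortableStringHash {
    // Computes s[0]*31^(n-1) + ... + s[n-1] with explicit 32-bit wraparound,
    // avoiding any reliance on Java's native int overflow behavior.
    static int hash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiply-and-add, then keep only the low 32 bits.
            h = (31 * h + s.charAt(i)) & 0xFFFFFFFFL;
        }
        // Reinterpret the unsigned 32-bit value as a signed 32-bit integer.
        return (h >= 0x80000000L) ? (int) (h - 0x100000000L) : (int) h;
    }

    public static void main(String[] args) {
        String sample = "192.168.0.1";
        // Compare the hand-rolled result against the built-in method: prints true.
        System.out.println(hash(sample) == sample.hashCode());
    }
}
```

Masking after every step keeps the intermediate value small, so the same loop should carry over to runtimes where numbers are not fixed-width integers.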
Source Code
https://gist.github.com/JamoCA/111cb4ef9855c1bdbfc4a8409b1d8db9