The Setup
Before I plunge into my rant, let’s review a little ol’ documentation. Under java.lang.Object, for equals(), we have this:
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
Over in java.util.Collection land, there is this little bit of documentation, highlighting the importance of overriding the default implementations of equals() and hashCode() if one wants to play nice in a Collection:
Many methods in Collections Framework interfaces are defined in terms of the equals method. This specification should not be construed to imply that invoking Collection.contains with a non-null argument o will cause o.equals(e) to be invoked for any element e. Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two elements.
So for instances of a class to function well as a key in a java.util.Map or be placed into a java.util.Set, the class should override the standard equals() and hashCode() provided by Object. Because of this universal need, it is a reasonable expectation that implementors of equals() and hashCode() pay attention to performance. For example, the programmers who wrote java.lang.String did a fairly simple implementation of hashCode() and then cached the result so the calculation was not repeated for a given String object.
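To make the caching idea concrete, here is a minimal sketch of an immutable key class that computes its hash code once and remembers it, in the spirit of what String does. The Endpoint class and its fields are made up for illustration; they are not from the JDK.

```java
// Hypothetical immutable key type; the name and fields are illustrative only.
final class Endpoint {
    private final String host;
    private final int port;
    private int hash; // cached hash code; 0 means "not computed yet"

    Endpoint(String host, int port) {
        if (host == null) {
            throw new NullPointerException("host");
        }
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true;
        }
        if (!(o instanceof Endpoint)) {
            return false;
        }
        Endpoint other = (Endpoint) o;
        // Pure field comparison: no I/O, no network, no surprises.
        return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // Computed from the same fields equals() reads, then remembered.
            h = 31 * host.hashCode() + port;
            hash = h;
        }
        return h;
    }
}
```

Because the fields are final and the hash is derived only from them, two Endpoint objects built from the same host and port stay equal, and keep the same hash code, for their whole lifetime. That is exactly the behavior a Map key needs.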
The resulting ubiquity of overridden equals() and hashCode() results in a certain set of expectations:
- if the object doesn’t change, the results of equals()/hashCode() shouldn’t change (otherwise Set collections will break)
- equals()/hashCode() should be fast, and for potentially expensive hashCode() operations the results should be cached.
- Immutables (such as URL) should be good candidates to be keys in a java.util.Map
The Takedown
So I was completely blindsided by the brain-dead, high school, freshman implementation of java.net.URL.
The only positive thing I will say about the java.net.URL implementation of equals()/hashCode() is this: buried deep, deep in the documentation, the high school students who wrote the implementation do casually tell you that they are going to screw you over.
In the documentation for java.net.URL.equals() (and only if you go look at the detailed documentation) you see this:
Compares this URL for equality with another object.
If the given object is not a URL then this method immediately returns false.
Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; (emphasis mine) else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.
Since hosts comparison requires name resolution (emphasis mine), this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP. (emphasis mine)
Translation: We are going to screw you and you will enjoy it.
The Scream
How does this screw you?
- Two fundamental operations, equals()/hashCode(), are now ridiculously expensive since they involve a DNS lookup to see if the hosts resolve to the same IP address. So an operation that is optimized in other classes is outrageously expensive (milliseconds long) for a very fundamental internet class.
- URL looks to be immutable. For an immutable object, identity should hold: x.equals(x) is true. But this isn’t the case for URL. If you serialize a URL, the resolved host address (it’s transient) is lost. So serialize/deserialize a URL. Wait a little bit. Add a little bit of Dynamic DNS magic. Presto: “http://google.com” will not equal the deserialized version of itself.
- It is simply wrong. Take the case of a hosting service that hosts two different sites on the same server. The DNS lookup resolves the 2 domains to the same physical IP address. URL (with an “assist” from URLStreamHandler, InetAddress, and our very expensive DNS lookup) compares URLs based on the IP address, not the entered hostname. As a result, http://my-porn-site.com could ‘equal’ http://jesus-has-saved-my-soul.org. I think someone might disagree with this assessment, don’t you?
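For contrast, here is a small sketch of the same kind of comparison done with java.net.URI, whose equals() works on the parsed components and does not touch the network. The UriKeys helper class and the host names are placeholders I made up for the example.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the host names used with it are placeholders.
final class UriKeys {
    // Parses both strings and compares them with URI.equals(), which looks at
    // the parsed components only and never performs a DNS lookup.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // Collects the distinct URIs among the given strings.
    static Set<URI> distinct(String... specs) {
        Set<URI> result = new HashSet<>();
        for (String spec : specs) {
            result.add(URI.create(spec));
        }
        return result;
    }
}
```

With this, two sites that happen to share a server are still two different keys, because only the text of the host name matters. URI does treat the scheme and host as case-insensitive, while the path stays case-sensitive.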
The Takeaway
So even after these very expensive operations, URL.equals()/hashCode() is still broken. The implementers of URL clearly decided that they were going to be ‘clever’, and for all their ‘cleverness’ the only thing they managed to do is screw users of this class. Sun should just admit the error of its ways and reimplement URL.equals() (with hashCode() being the equivalent).
    public boolean equals(Object o) {
        if (!(o instanceof URL)) {
            return false;
        } else if (o == this) {
            return true;
        } else {
            return ((URL) o).toString().equalsIgnoreCase(this.toString());
        }
    }
The old functionality should be quietly pushed to a uselessEquals() method some place.
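Until that happens, one way to get roughly this behavior today is to wrap the URL in a key class of your own. The sketch below is only that, a sketch: UrlKey is a made-up name, and it lower-cases the whole external form once so that equals() and hashCode() are guaranteed to agree. Like the proposal above, it ignores case everywhere, including the path.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

// Hypothetical wrapper; UrlKey is not part of the JDK.
final class UrlKey {
    private final URL url;
    private final String normalized; // lower-cased external form, used for identity

    UrlKey(URL url) {
        this.url = url;
        // toString() only formats the URL; it does not resolve the host.
        this.normalized = url.toString().toLowerCase(Locale.ROOT);
    }

    // Convenience factory that turns the checked exception into an unchecked one.
    static UrlKey of(String spec) {
        try {
            return new UrlKey(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    URL url() {
        return url;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true;
        }
        if (!(o instanceof UrlKey)) {
            return false;
        }
        // Plain string comparison: no DNS lookup, no blocking.
        return normalized.equals(((UrlKey) o).normalized);
    }

    @Override
    public int hashCode() {
        // Derived from the same string equals() compares.
        return normalized.hashCode();
    }
}
```

Use UrlKey as the Map key or Set element and call url() when you need the real URL back. Whether folding the case of the path is acceptable depends on the servers you talk to, so treat that part as an assumption to revisit.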
The Net
Performance problems and bugs can be introduced from the wildest and least expected locations. Always recheck your assumptions.
“When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.”
–Sherlock Holmes.
hehe … i remember a year’s old comments of yours about url’s implementation of equals – did you forget?
Anyway, i had recently seen http://video.google.com/videoplay?docid=9214177555401838409 – take the time to look at their first (i think) puzzle which deals exactly with that… and btw, they propose using the URI class.
That’s really interesting.. and discouraging.
Andy — I don’t remember ever noticing that URL.equals() was so badly broken. But I guarantee now I do!
Andy — well it turned out to be the second item in.
I am going to watch the rest of that video. From what I saw – it shouldn’t be “Java Puzzlers” it should be “Broken Java”. Some of the things mentioned.. they should just fix and stop worrying about supporting broken APIs.