List, Set, Map or Collection?
OneToMany and ManyToMany types
When creating @Entity beans you have to choose how to map your @OneToMany and @ManyToMany
types (see the JPA specification, Section 2.1.1, pg 19).
Specifically, you need to choose between List, Set, Map and Collection.
Collection
I do not know of a reason why Collection should be used over the more specific Set or List.
There is nothing I have read or seen in the JPA Spec about what behaviour you would get
if you specify a Collection.
Specifically, you cannot be sure whether the implementation will give you Set or List
semantics, which I think is fairly important to know.
If anyone knows of a reason for specifying Collection I'd love to know - thanks.
Map
Map is sufficiently different from List or Set that it should be fairly easy to choose
whether or not to use it.
The implication is that you get the benefit of the Map API (get(), put() etc).
The potential issue with this theory is that the ORM mapping should represent
a 'logical model' that is independent of how that model is used. Specifying a
Map (instead of Set or List) implies knowledge of 'how' the data is used.
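To make the Map API benefit concrete, here is a minimal plain-Java sketch with no JPA annotations. The OrderLine class, its productCode key and the helper method names are hypothetical and only serve the illustration. It contrasts a keyed lookup on a Map with the scan a List requires.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapVersusListLookup {

    // Hypothetical detail bean; 'productCode' plays the role of the map key.
    static final class OrderLine {
        final String productCode;
        final int quantity;

        OrderLine(String productCode, int quantity) {
            this.productCode = productCode;
            this.quantity = quantity;
        }
    }

    // With a List the caller has to scan for the entry it wants.
    static OrderLine findInList(List<OrderLine> lines, String productCode) {
        for (OrderLine line : lines) {
            if (line.productCode.equals(productCode)) {
                return line;
            }
        }
        return null;
    }

    // With a Map keyed by productCode the lookup is a single get().
    static OrderLine findInMap(Map<String, OrderLine> lines, String productCode) {
        return lines.get(productCode);
    }

    // Builds the keyed view; a later duplicate key replaces an earlier one.
    static Map<String, OrderLine> indexByProductCode(List<OrderLine> lines) {
        Map<String, OrderLine> byCode = new LinkedHashMap<>();
        for (OrderLine line : lines) {
            byCode.put(line.productCode, line);
        }
        return byCode;
    }

    public static void main(String[] args) {
        List<OrderLine> list = new ArrayList<>();
        list.add(new OrderLine("A100", 2));
        list.add(new OrderLine("B200", 5));

        Map<String, OrderLine> map = indexByProductCode(list);

        System.out.println(findInList(list, "B200").quantity);
        System.out.println(findInMap(map, "B200").quantity);
    }
}
```

Note that choosing the key (productCode here) is exactly the 'how the data is used' knowledge that the mapping would have to encode.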
Using Maps at Runtime with Ebean
Set - ordered or not? The JPA Spec doesn't say?
At this moment the JPA spec (as far as I can tell)
does not make it clear whether predictable iteration order on a Set (or Map) is the expected
behaviour. That is, HashSet does not provide a predictable iteration order (whereas LinkedHashSet does).
For example, if you use an @OrderBy annotation on a Set you may find that the iteration order
does not match the order specified in the order by. The likely reason for this is that the JPA
provider has chosen to use HashSet rather than LinkedHashSet as the basis for the actual implementation.
Ehrsson has noted this behaviour in his blog at
Ehrsson.java Blog
...
@OneToMany
@OrderBy("createdDate")
Set<OrderLine> lines;
...
Set<OrderLine> lines = order.getLines();
Iterator<OrderLine> it = lines.iterator();
...
// this iteration order may not match your @OrderBy annotation :(
If your JPA vendor does not provide predictable ordering for Sets then
this may be the most important factor in your decision.
That is, if you ever want the Collection to be ordered you may HAVE to use
List rather than Set.
Ebean uses LinkedHashSet by default for the underlying implementation of Sets.
This provides predictable iteration order for Sets.
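The difference between the two Set implementations can be seen in plain Java. The following is a small sketch, not provider code: the class and method names are made up, and the date strings simply stand in for rows that a query has already sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetIterationOrder {

    // Copies rows (assumed to arrive already sorted, e.g. by an ORDER BY clause)
    // into the given Set and returns the order in which the Set iterates them.
    static List<String> iterationOrder(Set<String> target, List<String> sortedRows) {
        target.addAll(sortedRows);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        List<String> sortedRows =
                Arrays.asList("2008-01-03", "2008-01-10", "2008-02-01", "2008-03-15");

        // LinkedHashSet iterates in insertion order, so the query order survives.
        System.out.println(iterationOrder(new LinkedHashSet<>(), sortedRows));

        // HashSet makes no ordering guarantee; the result may or may not match.
        System.out.println(iterationOrder(new HashSet<>(), sortedRows));
    }
}
```

A HashSet may happen to print the rows in the sorted order for some inputs, which is why this kind of problem can go unnoticed in small tests.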
List versus Set - Duplicates allowed
Lists allow duplicates and Sets do not allow duplicates. For some this will
be the main reason for them choosing List or Set.
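As a quick illustration in plain Java (the class and method names are hypothetical), adding the same element twice shows the difference:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

class DuplicateHandling {

    // Adds the same element twice and reports how many elements the collection holds.
    static int sizeAfterAddingTwice(Collection<String> target, String element) {
        target.add(element);
        target.add(element); // a Set returns false here and leaves itself unchanged
        return target.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingTwice(new ArrayList<>(), "line-1"));     // prints 2
        System.out.println(sizeAfterAddingTwice(new LinkedHashSet<>(), "line-1")); // prints 1
    }
}
```

With entity beans in a Set, what counts as a duplicate depends on how the bean implements equals() and hashCode().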
List versus Set - Performance
Performance may come into the decision for some. It's a detailed issue
so I am just going to skim the surface with the highlights:
contains:
In general, contains() will perform better on a hash-based Set (a hash lookup)
than on a List (a linear search).
iteration:
In general I would say iterating a List will be faster than iterating a Set.
It's hard to say exactly when and why, but there seem to be JVM optimizations
that can be made on List (and especially ArrayList).
In addition, you can iterate over a List by index without using an Iterator,
and so avoid that object creation.
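The two iteration styles look like this. This is only a sketch with made-up class and method names; it shows the shape of the loops and makes no claim about measured speed.

```java
import java.util.Arrays;
import java.util.List;

class IndexedIteration {

    // Sums the list with an index-based loop; no Iterator object is created.
    static int sumByIndex(List<Integer> values) {
        int total = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            total += values.get(i);
        }
        return total;
    }

    // Sums with the for-each loop, which obtains an Iterator behind the scenes.
    // This is the only style available for a Set.
    static int sumByIterator(Iterable<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumByIndex(values));    // prints 10
        System.out.println(sumByIterator(values)); // prints 10
    }
}
```

The index-based loop only pays off on lists with cheap random access such as ArrayList; on a LinkedList each get(i) is itself a linear walk.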
growth:
Lists and Sets grow in quite a different fashion.
ArrayList grows by copying its backing array (System.arraycopy), while hash-based Sets and Maps grow by rehashing.
Avoiding lots of rehashing of Sets and Maps should probably be something to keep in mind.
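When you build the collections yourself and know roughly how many elements to expect, one way to avoid repeated growth is to presize them. The sketch below assumes the default load factor of 0.75 used by HashSet and HashMap; the class and method names are hypothetical, and whether a JPA provider lets you influence this is vendor specific.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PresizedCollections {

    // Default load factor used by HashSet and HashMap.
    static final float LOAD_FACTOR = 0.75f;

    // Initial capacity large enough that expectedSize elements fit without a rehash.
    static int hashCapacityFor(int expectedSize) {
        return (int) (expectedSize / LOAD_FACTOR) + 1;
    }

    // The backing array is allocated once, so no array copies happen while filling.
    static List<Integer> fillList(int expectedSize) {
        List<Integer> list = new ArrayList<>(expectedSize);
        for (int i = 0; i < expectedSize; i++) {
            list.add(i);
        }
        return list;
    }

    // The hash table is sized up front, so no rehash happens while filling.
    static Set<Integer> fillSet(int expectedSize) {
        Set<Integer> set = new HashSet<>(hashCapacityFor(expectedSize));
        for (int i = 0; i < expectedSize; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(hashCapacityFor(1000)); // prints 1334
        System.out.println(fillList(1000).size()); // prints 1000
        System.out.println(fillSet(1000).size());  // prints 1000
    }
}
```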
List versus Set - Extra API, paging through a List
List provides a bit more API than Set. Specifically it provides the ability to get, set and remove
using an index (and provides subList() and listIterator()).
This extra API most notably provides an efficient way of paging through the list.
For example, you can process/view rows 30 to 40 of 100 more efficiently than a Set
by using the index based get() method (or subList() etc).
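A minimal sketch of the paging case follows. The class and method names are hypothetical; the Set version is included only to show the extra walking it needs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PagingSketch {

    // Returns a view of rows [fromIndex, toIndex), clamped to the list bounds.
    static <T> List<T> pageOfList(List<T> rows, int fromIndex, int toIndex) {
        int from = Math.max(0, Math.min(fromIndex, rows.size()));
        int to = Math.max(from, Math.min(toIndex, rows.size()));
        return rows.subList(from, to);
    }

    // A Set has no index, so the rows before the page must be iterated and skipped.
    static <T> List<T> pageOfSet(Set<T> rows, int fromIndex, int toIndex) {
        List<T> page = new ArrayList<>();
        int position = 0;
        for (T row : rows) {
            if (position >= toIndex) {
                break;
            }
            if (position >= fromIndex) {
                page.add(row);
            }
            position++;
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        Set<Integer> set = new LinkedHashSet<>(list);

        System.out.println(pageOfList(list, 30, 40)); // rows at index 30..39
        System.out.println(pageOfSet(set, 30, 40));   // same rows, after skipping 30
    }
}
```

Keep in mind that subList() returns a view backed by the original list rather than a copy.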
Summary
I see no benefit in using Collection.
You need to check if your JPA Vendor provides Set with a predictable iteration order or
not. If you are using Ebean, note that it provides a predictable iteration order for Sets.
For deciding between Set or List you need to weigh the issues around:
- Duplicates
- Does Set give predictable iteration order (Vendor specific)
- Performance - iteration, contains, growth
- List's extra API - paging through a List
Personally I feel the issue of duplicates is more a coding issue (I have never had a problem
with duplicates and List). I don't give this much weight.
If I use Sets I'd want them to be ABLE to have predictable iteration order. I'd avoid Set
if this was not available (for a given JPA vendor).
Personally I feel that performance is not a deciding factor. If I choose List I just need to
be aware of the cost of using contains(). For Set I need to be more wary of rehashing.
Personally I feel the benefits of the List API and the ability to easily page through a big
List are very compelling (a common problem worth having a good solution for).
My personal choice would be to use List over Set always (if it's up to me).