List, Set, Map or Collection?

OneToMany and ManyToMany types

When creating @Entity Beans you will have to choose how to map your @OneToMany and @ManyToMany types.

According to the JPA specification (Section 2.1.1, pg 19), you need to choose between List, Set, Map or Collection.

Collection

I do not know of a reason why Collection should be used over the more specific Set or List. I have not read or seen anything in the JPA spec about what behaviour you get if you specify a Collection. Specifically, you cannot be sure whether the implementation will give you Set or List semantics, which I think is fairly important to know.

If anyone knows of a reason for specifying Collection I'd love to know - thanks.

Map

Map is sufficiently different from List or Set that it should be fairly easy to choose whether or not to use it.

The implication is that you benefit from the Map API (the get() and put() methods, etc).

The potential issue with this theory is that the ORM mapping should represent a 'logical model' that is independent of how that model is used. Specifying a Map (instead of Set or List) implies knowledge of 'how' the data is used. See the section 'Using Maps at Runtime with Ebean'.

Set - ordered or not? The JPA Spec doesn't say?

At this moment the JPA spec (as far as I can tell) does not make it clear whether predictable iteration order on a Set (or Map) is the expected behaviour. This matters because HashSet does not provide a predictable iteration order, whereas LinkedHashSet does (it iterates in insertion order).

For example, if you use an @OrderBy annotation on a Set you may find that the iteration order does not match the order specified in the order by. The likely reason for this is that the JPA provider has chosen to use HashSet rather than LinkedHashSet as the basis for the actual implementation.

Ehrsson has noted this behaviour in his blog at Ehrsson.java Blog

...
@OneToMany
@OrderBy("createdDate")
Set<OrderLine> lines;
...

Set<OrderLine> lines = order.getLines();
Iterator<OrderLine> it = lines.iterator();
...
// this iteration order may not match your @OrderBy annotation :(

If your JPA vendor does not provide predictable ordering for Sets then this may be the most important factor in your decision.

That is, if you ever want the Collection to be ordered you may HAVE to use List rather than Set.

Ebean uses LinkedHashSet by default for the underlying implementation of Sets. This provides predictable iteration order for Sets.
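
As a rough illustration in plain Java (no JPA or Ebean involved), the sketch below copies rows that are already sorted, as if by an ORDER BY clause, into a Set and reports the iteration order that comes back. SetOrderDemo and iterationOrder are names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetOrderDemo {

    // Copies the already-sorted rows into the given Set and returns
    // the order in which that Set iterates over them.
    static List<String> iterationOrder(Set<String> target, List<String> sortedRows) {
        target.addAll(sortedRows);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        // Made-up values standing in for rows ordered by createdDate.
        List<String> sortedRows = Arrays.asList("2009-01-03", "2009-02-14", "2009-03-01", "2009-04-22");

        // LinkedHashSet iterates in insertion order, so the ORDER BY survives.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<>(), sortedRows));

        // HashSet iterates in hash-bucket order, which may or may not match.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<>(), sortedRows));
    }
}
```

With a LinkedHashSet the order always matches the input. With a HashSet it may happen to match for some data and not for other data, which is exactly why you cannot rely on it.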

List versus Set - Duplicates allowed

Lists allow duplicates and Sets do not allow duplicates. For some this will be the main reason for them choosing List or Set.
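
A minimal sketch of the difference, using plain Strings rather than entity beans (DuplicatesDemo and sizeAfterAddingTwice are hypothetical names for this example). For entity beans, what a Set treats as a duplicate depends on how equals() and hashCode() are implemented on the bean.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class DuplicatesDemo {

    // Adds the same value twice and reports how many elements the collection holds.
    static int sizeAfterAddingTwice(Collection<String> target, String value) {
        target.add(value);
        target.add(value);
        return target.size();
    }

    public static void main(String[] args) {
        // A List keeps both copies.
        System.out.println("List size: " + sizeAfterAddingTwice(new ArrayList<>(), "line-1"));
        // A Set silently ignores the second add.
        System.out.println("Set size:  " + sizeAfterAddingTwice(new HashSet<>(), "line-1"));
    }
}
```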

List versus Set - Performance

Performance may come into the decision for some. It's a detailed issue so I am just going to skim the surface with the highlights:

contains:

In general contains() performs better on a Set (a hash lookup, roughly constant time) than on a List (a linear search through the elements).
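
One way to see this without timing anything is to count equals() calls. The sketch below is only an illustration: Key is a made-up class with an instrumented equals(), and the exact counts depend on the JDK's collection implementations.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class ContainsCostDemo {

    // Counts every equals() call so we can see how much work contains() does.
    static int equalsCalls = 0;

    // Hypothetical key type, for illustration only.
    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            equalsCalls++;
            return other instanceof Key && ((Key) other).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Fills the collection with keys 0..n-1, then looks up the last key and
    // returns how many equals() calls that single contains() needed.
    static int equalsCallsForLookup(Collection<Key> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(new Key(i));
        }
        equalsCalls = 0;
        if (!target.contains(new Key(n - 1))) {
            throw new IllegalStateException("key should be present");
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("ArrayList equals() calls: " + equalsCallsForLookup(new ArrayList<>(), 1000));
        System.out.println("HashSet equals() calls:   " + equalsCallsForLookup(new HashSet<>(), 1000));
    }
}
```

The ArrayList has to compare against every element before it reaches the last one, while the HashSet jumps to the right bucket and needs only a handful of comparisons at most.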

iteration:

In general I would say iteration over a List will be faster than over a Set. It's hard to say exactly when and why, but there seem to be JVM optimizations that can be made on List (and especially ArrayList).

In addition, you can iterate over a List by index without using an Iterator, which avoids that object creation.
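
A small sketch of index-based iteration (IndexLoopDemo and sumByIndex are names invented for this example). This only makes sense for random-access lists such as ArrayList. On a LinkedList each get(i) walks the list, so the loop would be much slower.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexLoopDemo {

    // Walks the list by index, so no Iterator object is created.
    static long sumByIndex(List<Integer> values) {
        long total = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            total += values.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            values.add(i);
        }
        System.out.println("sum = " + sumByIndex(values));
    }
}
```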

growth:

Lists and Sets grow in quite different ways. ArrayList grows by copying its backing array into a larger one (System.arraycopy), while Sets and Maps grow by rehashing their entries into a larger table. Avoiding lots of rehashing of Sets and Maps is probably something to keep in mind.
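
If you know roughly how many elements to expect, one common way to limit regrowth is to pass an initial capacity to the constructor. The sketch below assumes the default HashSet load factor of 0.75. PresizeDemo and its methods are names made up for this example, and whether presizing is worth it depends on your data sizes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PresizeDemo {

    // A HashSet grows once its size exceeds capacity * loadFactor (0.75 by default),
    // so request enough capacity that 'expected' elements stay under that threshold.
    static int hashCapacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    static List<Integer> filledList(int expected) {
        // Backing array is sized up front, so add() should not need to copy it.
        List<Integer> list = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) {
            list.add(i);
        }
        return list;
    }

    static Set<Integer> filledSet(int expected) {
        // Table is sized up front, so add() should not need to rehash while filling.
        Set<Integer> set = new HashSet<>(hashCapacityFor(expected));
        for (int i = 0; i < expected; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println("capacity requested for 1000 elements: " + hashCapacityFor(1000));
        System.out.println("list size: " + filledList(1000).size());
        System.out.println("set size:  " + filledSet(1000).size());
    }
}
```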

List versus Set - Extra API, paging through a List

List provides a bit more API than Set. Specifically it provides the ability to get, set and remove using an index (and provides subList() and listIterator()).

This extra API most notably provides an efficient way of paging through the list. For example, you can process/view rows 30 to 40 of 100 more efficiently than with a Set by using the index based get() method (or subList() etc), because a Set would have to be iterated from the start to reach row 30.
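
A minimal sketch of paging with subList(). PagingDemo, page and numberedRows are hypothetical names, and the indexes are clamped here so that asking for a page past the end returns a shorter (or empty) page rather than throwing.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingDemo {

    // Returns a view of rows [fromIndex, toIndex), clamped to the list bounds.
    // subList() is a view on the original list, so no elements are copied.
    static <T> List<T> page(List<T> rows, int fromIndex, int toIndex) {
        int from = Math.max(0, Math.min(fromIndex, rows.size()));
        int to = Math.max(from, Math.min(toIndex, rows.size()));
        return rows.subList(from, to);
    }

    // Builds a list holding 0..n-1 to stand in for fetched rows.
    static List<Integer> numberedRows(int n) {
        List<Integer> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(i);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> rows = numberedRows(100);
        // Rows 30 up to (but not including) 40.
        System.out.println(page(rows, 30, 40));
    }
}
```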

Summary

I see no benefit in using Collection.

You need to check if your JPA Vendor provides Set with a predictable iteration order or not. If you are using Ebean, note that it provides a predictable iteration order for Sets.

For deciding between Set or List you need to weigh the issues around:

  • Duplicates
  • Does Set give predictable iteration order (Vendor specific)
  • Performance - iteration, contains, growth
  • List's extra API - paging through a List

Personally I feel the issue of duplicates is more a coding issue (I have never had a problem with duplicates and List). I don't give this much weight.

If I use Sets I'd want them to be ABLE to have predictable iteration order. I'd avoid Set if this was not available (for a given JPA vendor).

Personally I feel that performance is not a deciding factor. If I chose List I just need to be aware of the cost of using contains(). For Set I need to be more wary of rehashing.

Personally I feel the benefits of the List API and the ability to easily page through a big List are very compelling (a common problem worth having a good solution for).

My personal choice would be to use List over Set always (if it's up to me).

Note: HashSet and HashMap - unpredictable ordering

HashSet and HashMap do not provide a predictable iteration order. Ebean will by default use a LinkedHashSet or LinkedHashMap as the underlying implementation so that iteration order can match a query's OrderBy clause.

Some vendors may not use LinkedHashSet by default and the JPA spec does not seem to mention iteration order (that I can see).

Using Maps at Runtime with Ebean

@OneToMany Map<Key, Something> details;
Yup, it's very cool and maybe useful one day, but personally I think a Map implies too much knowledge of the 'USE' of the data as opposed to the pure 'LOGICAL RELATIONSHIP'.

Time will tell but currently I don't think I'll be using Map for many (if any) @OneToMany or @ManyToMany properties.

You can easily fetch maps using Ebean. Here are some examples:

Example: Simple example


Map map = Ebean.find(Order.class).findMap();

Example: Specify a map key


Map withKeyMap = Ebean.find(Order.class)
    .setMapKey("status")
    .findMap();

Example: Fetch a Map and 'merge' it into another map

// put the keys in the Map
// use LinkedHashMap to control the order
LinkedHashMap load = new LinkedHashMap();
load.put(1, null);
load.put(2, null);

// get a Map of Orders using IN keySet

Map fetchMap = Ebean.find(Order.class)
      .where().in("id", load.keySet())
      .findMap();

// merge the fetchMap into load
load.putAll(fetchMap);
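
The merge step relies on a property of LinkedHashMap: re-putting an existing key replaces its value without changing its position. The sketch below shows just that part in plain Java, with no Ebean calls. MergeOrderDemo and mergeInKeyOrder are names invented for this example, and the String values stand in for Order beans.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeOrderDemo {

    // Pre-loads the keys in the wanted order, then merges the fetched entries in.
    // Keys that were pre-loaded keep their position; any extra fetched keys go last.
    static <K, V> Map<K, V> mergeInKeyOrder(List<K> orderedKeys, Map<K, V> fetched) {
        Map<K, V> load = new LinkedHashMap<>();
        for (K key : orderedKeys) {
            load.put(key, null);
        }
        load.putAll(fetched);
        return load;
    }

    public static void main(String[] args) {
        // Stand-in for the Map returned by the query, in whatever order it arrived.
        Map<Integer, String> fetched = new LinkedHashMap<>();
        fetched.put(1, "order 1");
        fetched.put(2, "order 2");

        // We want key 2 first, then key 1.
        System.out.println(mergeInKeyOrder(Arrays.asList(2, 1), fetched));
    }
}
```

One thing to watch: the pre-loaded keys must be equal to the fetched keys, type included. An Integer 1 and a Long 1L are different keys, so a mismatch would leave the null placeholders in place and append the fetched entries after them.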
