Set is a collection of unique objects, it doesn't allow duplicate objects and modification of existing objects. Set types also allow basic operations like adding objects, removing objects, accessing objects, iterating objects but do not allow modification. There are two implementations of the Set interface they are HashSet and TreeSet.
HashSet gives better performance than TreeSet because , TreeSet is an ordered collection of objects and the objects are sorted while they are inserted in the TreeSet where as in case of HashSet objects are added in an adhoc manner. It is expensive to do all basic operations in TreeSet because it has to compare and sort every object. We can get better performance by using a HashSet and converting it to a TreeSet later on.
HashSet and TreeSet are backed by HashMap and TreeMap respectively. Whenever we use a HashSet we can specify an initial capacity and load factor using constructors. The default size for a HashSet is 11 and it's load factor is 0.75. Load factor determines at which capacity HashSet has to be resized. It's internal structure will become double in size when it reaches it's maximum capacity based on load factor. HashSet scales well when it is initialized with proper size and default load factor. When you know the the number of objects to be added, it is better to initialize with that capacity and put load factor as 1.0f. The objects in HashSet are stored and retrieved through hash code which provides constant look up time. I did not give any bench marks for these two Sets because We don't have many options here to compare and evaluate. We have two Sets to choose. Use TreeSet if you want sorted collection otherwise use HashSet.
The constructors for the HashSet to initialize with proper size are:
HashSet(int initialcapacity)
HashSet(int initialcapacity, float loadfactor)
HashSet gives better performance than TreeSet because , TreeSet is an ordered collection of objects and the objects are sorted while they are inserted in the TreeSet where as in case of HashSet objects are added in an adhoc manner. It is expensive to do all basic operations in TreeSet because it has to compare and sort every object. We can get better performance by using a HashSet and converting it to a TreeSet later on.
HashSet and TreeSet are backed by HashMap and TreeMap respectively. Whenever we use a HashSet we can specify an initial capacity and load factor using constructors. The default size for a HashSet is 11 and it's load factor is 0.75. Load factor determines at which capacity HashSet has to be resized. It's internal structure will become double in size when it reaches it's maximum capacity based on load factor. HashSet scales well when it is initialized with proper size and default load factor. When you know the the number of objects to be added, it is better to initialize with that capacity and put load factor as 1.0f. The objects in HashSet are stored and retrieved through hash code which provides constant look up time. I did not give any bench marks for these two Sets because We don't have many options here to compare and evaluate. We have two Sets to choose. Use TreeSet if you want sorted collection otherwise use HashSet.
The constructors for the HashSet to initialize with proper size are:
HashSet(int initialcapacity)
HashSet(int initialcapacity, float loadfactor)
Key Points :
- Use HashSet for maintaining unique objects if you don't want thread safe for the collection for all basic(add/remove/access) operations otherwise use synchronized HashSet for thread safe.
- Use TreeSet for ordered and sorted set of unique objects for non-thread safe collection otherwise use synchronized TreeSet for thread safe.
No comments:
Post a Comment