Tuesday, 21 July 2015

Optimization techniques in Lists

            List types represent an ordered collection of objects. ArrayList, Vector, Stack and LinkedList are the List implementation classes. All  List types support basic operations - adding objects, removing objects, accessing objects and iterating through the list. So which one to choose since all the list implementations support these basic operations? Performance is different for each class based on specific operations. So your choice is driven by the performance and the requirement options. Your requirement could be

1. Thread safe collection
2. Size of collection (large or small collection)
3. Type of operation ( adding, removing, accessing or iterating )


              If you want your collection to be thread safe then Vector or Stack must be used because both have synchronized methods. While ArrayList and LinkedList are not thread safe. Stack is meant for specific LIFO (last in - first out) operations, this can be filtered down based on this specific requirement. If you don't want your collection to be thread safe then you have can choose between ArrayList or LinkedList. General concept from performance point of view is that ArrayList gives better performance when accessing and iterating objects whereas LinkedList gives better performance when adding and removing objects. Although true in most cases, sometimes there is an exception.
conclusion is :
Type of operation ArrayList with out initialization ArrayList with initialization Vector with out initialization Vector with initialization LinkedList
Adding objects at end fast (but slower than initialization) fast fast (but sligtly slower than initialization and slower than ArrayList because of synchronization) fast(but sligtly slower than ArrayList because of synchronization) fast ( but slightly slower than ArrayList and Vector)
Adding objects at middle slow ( slower than when adding objects at last) slow ( slower than when adding objects at last) slow ( slower than when adding objects at last) slow ( slower than when adding objects at last) worse( worse than every operation)
Adding objects at first slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle)

                       The initial size for ArrayList and Vector is 10. ArrayList increases its capacity by half approximately whenever its capacity reaches maximum (10) but Vector increases its capacity by double whenever its capacity reaches maximum. That is the reason why ArrayList takes more time than Vector if it is not initialized with proper size though ArrayList is not synchronized. As soon as it reaches its maximum capacity when adding objects, it creates one more bigger array ( with 15 capacity for ArrayList approximately and 20 capacity for Vector) and copies the previous and new objects into new array. Obviously it is expensive to create new array and copy objects. So best approach is to initialize the ArrayList and Vector with proper size using constructors or using ensureCapacity(int capacity) which gives good performance.  If you initialize with proper size then the ArrayList gives better performance than Vector.
                        ArrayList with initialization gives better performance than others because its methods are non-synchronized. Synchronized methods are bit expensive because JVM has to lock the objects whenever it finds synchronized methods.
                        Vector takes slightly more time than ArrayList when you use JDK1.3 Hotspot JVM ,if you are not sure that whether your collection needs to be thread safe or not then it is better to use Vector to have higher safety.
                       You can convert an ArrayList as thread safe collection using Collections.synchronizedList(ArrayList object) but it is more expensive than using a Vector.
                      ArrayList and Vector maintain internal Object array ( Object[]) to store objects. So whenever you add an object, they add it to the end of the array which is  fine as long as it doesn't reach its maximum capacity. If you want to add an object at any other position, it creates a new object array and recopies all the objects which is expensive. That is the reason why adding objects at middle and beginning of collection takes a long time than when it is adding at the end
                     LinkedList gives good performance when adding elements at the end and beginning but it is worse when adding objects at middle because it needs to scan the node whenever it needs to add an object. LinkedList cannot be initialized.

The constructors for ArrayList and Vector to initialize with proper size are
ArrayList( int initialcapacity)
Vector( int initialcapacity)
Vector( int initialcapacity, int capacityIncrement)
You can give incremental capacity in Vector to change the default increment in capacity.
Here is the ListAddTest.java source code.
 ===================================================================
package com.test;

import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Vector;

public class ListAddTest {

    private static final int NUM = 50000;
    private static String[] objs = null;

    public void addLast(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {            list.add(objs[i]);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for adding Objects at End: " + (endTime - startTime));
    }
   public void addFirst(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {
            list.add(0, objs[i]);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for adding Objects at First : " + (endTime - startTime));
    }
    public void addMiddle(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {
            list.add(i / 2, objs[i]);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for adding Objects at Middle : " + (endTime - startTime));
    }
    public void doTest(List list) {
        addLast(list);
        clear(list);
       
        addMiddle(list);
        clear(list);
       
        addFirst(list);
        clear(list);
    }
    public void clear(List col) {
        if (!col.isEmpty())
            col.clear();
    }
    public static void main(String[] args) {
        objs = new String[NUM];
        for (int i = 0; i < NUM; i++) {
            objs[i] = "Object" + i;
        }
        ListAddTest col = new ListAddTest();
        ArrayList collection1 = new ArrayList();
        col.doTest(collection1);

        ArrayList collection1A = new ArrayList(NUM);
        col.doTest(collection1A);

        Vector collection2 = new Vector();
        col.doTest(collection2);

        Vector collection2A = new Vector(NUM);
        col.doTest(collection2A);

        LinkedList collection4 = new LinkedList();
        col.doTest(collection4);
    }
}
 ====================================================================
Here is the Output :
Time taken for adding Objects at End: 0
Time taken for adding Objects at Middle : 101
Time taken for adding Objects at First : 180
Time taken for adding Objects at End: 0
Time taken for adding Objects at Middle : 90
Time taken for adding Objects at First : 190
Time taken for adding Objects at End: 0
Time taken for adding Objects at Middle : 100
Time taken for adding Objects at First : 180
Time taken for adding Objects at End: 10
Time taken for adding Objects at Middle : 90
Time taken for adding Objects at First : 180
Time taken for adding Objects at End: 10
Time taken for adding Objects at Middle : 1091
Time taken for adding Objects at First : 0

====================================================================== 
Removing objects :
  1. All classes take approximately same time when removing objects from end
  2. ArrayList and Vector give similar performance with slight difference because of JDK1.3 Hotspot JVM.
  3. LinkedList  gives worst performance when removing objects from middle (similar to adding objects at middle).
  4. LinkedList gives better performance when removing objects from the beginning.
  5. Only LinkedList gives better performance when removing objects from the beginning.
======================================================================
Here is the ListRemoveTest.java source code
package com.test;

import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Vector;
import java.util.Arrays;

public class ListRemoveTest {

    private static final int NUM = 20000;
    private static Object[] objs = null;

    public void removeLast(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = NUM; i > 0; i--) {
            list.remove(i - 1);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for removing Objects at End: " + (endTime - startTime));
    }
    public void removeFirst(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {
            list.remove(0);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for removing Objects at First : " + (endTime - startTime));
    }
    public void removeMiddle(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {
            list.remove((NUM - i) / 2);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for removing Objects at Middle : " + (endTime - startTime));
    }
    public void doTest(List collection) {
        collection.addAll(getList());
        removeLast(collection);
        clear(collection);
 
        collection.addAll(getList());
        removeMiddle(collection);
        clear(collection);

        collection.addAll(getList());
        removeFirst(collection);
        clear(collection);
    }
    public void clear(List col) {
        if (!col.isEmpty())
            col.clear();
    }
    public List getList() {
        objs = new Object[NUM];
        for (int i = 0; i < NUM; i++) {
            objs[i] = new Object();
        }
        return Arrays.asList(objs);
    }
    public static void main(String[] args) {
        ListRemoveTest col = new ListRemoveTest();
      
        ArrayList collection1 = new ArrayList();
        col.doTest(collection1);

        Vector collection2 = new Vector();
        col.doTest(collection2);

        LinkedList collection4 = new LinkedList();
        col.doTest(collection4);
    }
}
=======================================================================
Here is the Output :
Time taken for removing Objects at End: 10
Time taken for removing Objects at Middle : 20
Time taken for removing Objects at First : 20
Time taken for removing Objects at End: 0
Time taken for removing Objects at Middle : 20
Time taken for removing Objects at First : 30
Time taken for removing Objects at End: 10
Time taken for removing Objects at Middle : 150
Time taken for removing Objects at First : 0
=======================================================================
The conclusion is :
  • ArrayList and Vector give best performance because they access objects using index. Vector takes slightly more time but it is negligible.
  • LinkedList gives worst performance  when accessing objects at end and middle because it has to scan nodes to access objects.
=========================================================================
Here is the ListAccessTest.java source code
package com.test;

import java.util.List;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Vector;
import java.util.Arrays;

public class ListAccessTest {

    private static final int NUM = 25000;
    private static Object[] objs = null;

    public void getFromLast(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = NUM; i > 0; i--) {
            list.get(i - 1);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for getting Objects at Last: " + (endTime - startTime));
    }
    public void getFromFirst(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {
            list.get(0);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for getting Objects at First : " + (endTime - startTime));
    }
    public void getFromMiddle(List list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < NUM; i++) {
            list.get(NUM / 2);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time taken for getting Objects at Middle : " + (endTime - startTime));
    }
    public void doTest(List list) {
        list.addAll(getList());
        getFromLast(list);
        getFromMiddle(list);
        getFromFirst(list);
    }
    public void clear(List col) {
        if (!col.isEmpty())
            col.clear();
    }
    public static List getList() {
        objs = new Object[NUM];
        for (int i = 0; i < NUM; i++) {
            objs[i] = new Object();
        }
        return Arrays.asList(objs);
    }
    public static void main(String[] args) {
        ListAccess col = new ListAccess();

        ArrayList collection1 = new ArrayList();
        col.doTest(collection1);

        Vector collection2 = new Vector();
        col.doTest(collection2);

        LinkedList collection4 = new LinkedList();
        col.doTest(collection4);
    }
}
 =======================================================================
Here is the Output :
Time taken for getting Objects at Last: 0
Time taken for getting Objects at Middle : 0
Time taken for getting Objects at First : 10
Time taken for getting Objects at Last: 0
Time taken for getting Objects at Middle : 0
Time taken for getting Objects at First : 0
Time taken for getting Objects at Last: 222
Time taken for getting Objects at Middle : 420
Time taken for getting Objects at First : 0
=======================================================================
Iterating collection:
                  Iterating collection using all three types of classes ,ArrayList ,Vector and LinkedList gives similar performance because they need not do any extra work  they simply iterate one by one. So I did not give any benchmark results. You can use any Iterator . But using ListIterator gives more flexibility than Iterator and Enumeration. You can traverse both sides.
 Key Points :
  • Use ArrayList with proper initialization if you don't want thread safe for the collection whenever you  add/remove/access objects at end and middle of collection.
  • Use Vector with proper initialization if you want thread safe for the collection whenever you  add/remove/access objects at end and middle of collection.
  • Use LinkedList if you don't want thread safe for the collection whenever you  add/remove/access objects at beginning of collection.
  • Use synchronized LinkedList if you want thread safe for the collection whenever you add/remove/access objects at beginning of collection.
  • Use ListIterator than Iterator and Enumeration for List types

No comments:

Post a Comment