Sunday, 12 July 2015

Optimization techniques when Concatenating Strings


You can concatenate multiple strings using the + operator, String.concat() or StringBuffer.append(). Which is the best one in terms of performance?
The choice depends on two scenarios. The first is compile-time resolution versus run-time resolution, and the second is whether you are using StringBuffer or String. In general, programmers think that StringBuffer.append() is better than the + operator or the String.concat() method. But this assumption is not true under certain conditions.
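
For reference, here is a minimal sketch of the three approaches side by side. The class and method names are made up for illustration; only the + operator, String.concat() and StringBuffer.append() themselves come from the JDK. All three produce the same text, so the rest of this post is only about how much work each one does.

```java
// Hypothetical demo class; the names are illustrative only.
class ConcatWays {

    // Concatenation with the + operator.
    static String withPlus(String a, String b) {
        return a + b;
    }

    // Concatenation with String.concat(), which returns a new String.
    static String withConcat(String a, String b) {
        return a.concat(b);
    }

    // Concatenation with StringBuffer.append(), converted back with toString().
    static String withBuffer(String a, String b) {
        return new StringBuffer().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("Hello, ", "world"));
        System.out.println(withConcat("Hello, ", "world"));
        System.out.println(withBuffer("Hello, ", "world"));
    }
}
```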

1) First scenario: compile time resolution versus run time resolution
Look at the following code, StringAppendTest.java, and its output.

package com.citruspay;

public class StringAppendTest {

    /**
     * This class shows the time taken by string concatenation at compile time
     * and run time.
     */

    public static void main(String[] args) {

        // Test the String concatenation using + operator
        long startTime = System.currentTimeMillis();

        for (int i = 0; i < 5000; i++) {
            String result = "This is" + "testing the" + "difference"
                    + "between" + "String" + "and" + "StringBuffer";
        }

        long endTime = System.currentTimeMillis();

        System.out.println("Time taken for string concatenation using + operator : "
                        + (endTime - startTime) + " milli seconds");

        // Test the String concatenation using StringBuffer
        long startTime1 = System.currentTimeMillis();

        for (int i = 0; i < 5000; i++) {

            StringBuffer result = new StringBuffer();

            result.append("This is");
            result.append("testing the");
            result.append("difference");
            result.append("between");
            result.append("String");
            result.append("and");
            result.append("StringBuffer");
        }

        long endTime1 = System.currentTimeMillis();

        System.out.println("Time taken for String concatenation using StringBuffer : "
                        + (endTime1 - startTime1) + " milli seconds");
    }
}

The output of this code is:
Time taken for string concatenation using + operator : 0 milli seconds
Time taken for String concatenation using StringBuffer : 50 milli seconds

Interestingly, the + operator is faster than the StringBuffer.append() method here. Let us see why.

Here the compiler does a good job of optimization. Because every operand is a string literal, the whole expression is a compile-time constant, so the compiler performs the concatenation itself instead of leaving it for run time, as shown below.
Before compilation:
String result = "This is"+"testing the"+"difference"+"between"+"String"+"and"+"StringBuffer";
After compilation:
String result = "This istesting thedifferencebetweenStringandStringBuffer";
The String expression is resolved at compile time, whereas the StringBuffer calls are executed at run time. Compile-time resolution happens when the value of the string is known in advance. Run-time resolution takes place when it is not, for example when an operand is a variable or a String object created with the 'new' keyword. Here is an example.
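Before compilation:
public String getString(String str1, String str2) {
    return str1 + str2;
}

After compilation (compilers before Java 5 generate the StringBuffer calls shown here; Java 5 to Java 8 compilers generate the same calls on StringBuilder):
    return new StringBuffer().append(str1).append(str2).toString();
This resolves at run time and takes more time to execute than a concatenation the compiler has already done.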
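
One way to observe the difference without a timer is to compare object identity. The sketch below is my own illustration, not part of the original test. It relies on the language rule that a constant expression made of literals refers to the same pooled String as the equivalent literal, while a concatenation involving a non-constant operand creates a new object at run time.

```java
// Hypothetical demo class; the names are illustrative only.
class ConstantFoldingDemo {

    // Both operands are literals, so the expression is a compile-time constant.
    static String foldedAtCompileTime() {
        return "Hello, " + "world";
    }

    // The operand is a method parameter, so the concatenation runs at run time.
    static String builtAtRunTime(String prefix) {
        return prefix + "world";
    }

    public static void main(String[] args) {
        String literal = "Hello, world";
        // Same pooled object: the compiler already produced "Hello, world".
        System.out.println(foldedAtCompileTime() == literal);          // true
        // Equal content, but a distinct object created at run time.
        System.out.println(builtAtRunTime("Hello, ") == literal);      // false
        System.out.println(builtAtRunTime("Hello, ").equals(literal)); // true
    }
}
```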

2) Second scenario: Using StringBuffer instead of String
If you look at the following code, you will find that StringBuffer is faster than String for concatenation, which is the opposite of the scenario above.

package com.demo;

public class StringBufferAppendTest {

    public static void main(String[] args) {

        // Test the String Concatenation using + operator
        long startTime = System.currentTimeMillis();

        String result = "hello";

        for (int i = 0; i < 1500; i++) {
            result += "hello";
        }

        long endTime = System.currentTimeMillis();

        System.out.println("Time taken for string concatenation using + operator : "
                    + (endTime - startTime) + " milli seconds");

        // Test the String Concatenation using StringBuffer
        long startTime1 = System.currentTimeMillis();

        StringBuffer result1 = new StringBuffer("hello");

        for (int i = 0; i < 1500; i++) {
            result1.append("hello");
        }

        long endTime1 = System.currentTimeMillis();

        System.out.println("Time taken for string concatenation using StringBuffer : "
                    + (endTime1 - startTime1) + " milli seconds");
    }
}

The output of the code is:
Time taken for string concatenation using + operator : 280 milli seconds
Time taken for string concatenation using StringBuffer : 0 milli seconds

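It shows that StringBuffer.append() is much faster than the + operator here. Why?
Both resolve at run time, but they do different amounts of work. Each result += "hello" creates a temporary buffer, copies the whole current string into it, appends the new text and converts the buffer back into a new String, so the amount of copying grows with every pass through the loop. The StringBuffer version keeps appending to the same buffer.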
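
The two loops can be sketched as follows. This is a rough model under stated assumptions, not the exact code every compiler emits: the first method spells out approximately what result += piece costs on each pass, and the class and method names are invented for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class LoopConcatDemo {

    // Roughly what "result += piece" does per pass: a fresh buffer, a copy of
    // everything accumulated so far, one append, and a new String.
    static String withPlusEquals(String piece, int times) {
        String result = "";
        for (int i = 0; i < times; i++) {
            result = new StringBuffer(result).append(piece).toString();
        }
        return result;
    }

    // One buffer reused across the loop; each pass copies only the new piece.
    static String withSingleBuffer(String piece, int times) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < times; i++) {
            buffer.append(piece);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String slow = withPlusEquals("hello", 1500);
        String fast = withSingleBuffer("hello", 1500);
        // Same text either way; only the amount of copying differs.
        System.out.println(slow.equals(fast));
        System.out.println(fast.length());
    }
}
```

With a piece of fixed length, the first version copies an amount of text that grows roughly with the square of the loop count, while the second grows roughly linearly.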

Optimization by initializing StringBuffer
You can set the initial capacity of a StringBuffer using its constructor, and this can improve performance significantly. The constructor is StringBuffer(int capacity), where capacity is the number of characters the StringBuffer can hold before it has to grow.

You can also set the capacity with ensureCapacity(int minimumCapacity) after the StringBuffer object has been created. We will look at the default behavior first and then at the better approach.

The default behavior of StringBuffer:
StringBuffer maintains a character array internally. When you create a StringBuffer with the default constructor StringBuffer(), without setting an initial capacity, it is initialized with room for 16 characters. When an append would exceed the current capacity, the buffer grows to twice the old capacity plus 2 (2*old capacity + 2), or to the required size if that is larger.

If you start with the default capacity and keep adding characters, the capacity grows to 34 (2*16+2) when you add the 17th character and to 70 (2*34+2) when you add the 35th character. Every time it grows, it has to create a new character array and copy the existing characters into it, which is expensive. So it is good practice to initialize the buffer with a suitable capacity whenever you can estimate the final length.
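
The growth steps can be watched through StringBuffer.capacity(). The following is a small sketch of my own (class and method names are hypothetical) that appends one character at a time and records every distinct capacity it sees. The exact growth policy on append is an implementation detail, so treat the printed values as what OpenJDK is expected to do rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class CapacityGrowthDemo {

    // Appends one character at a time and records each distinct capacity seen.
    static List<Integer> capacitiesWhileAppending(int totalChars) {
        StringBuffer buffer = new StringBuffer(); // default capacity
        List<Integer> capacities = new ArrayList<>();
        capacities.add(buffer.capacity());
        for (int i = 0; i < totalChars; i++) {
            buffer.append('x');
            int current = buffer.capacity();
            if (current != capacities.get(capacities.size() - 1)) {
                capacities.add(current);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        // Expected on OpenJDK: [16, 34, 70]
        System.out.println(capacitiesWhileAppending(40));
    }
}
```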

I ran a variation of the above StringBufferAppendTest.java with two StringBuffers, one without an initial capacity and the other with one. This time I appended "hello" 50000 times to each and did not use the + operator. I initialized the second StringBuffer with StringBuffer(250000), which is exactly the final length (50000 * 5 characters).

The output is:
Time taken for String concatenation using StringBuffer without setting size: 280 milli seconds
Time taken for String concatenation using StringBuffer with setting size: 0 milli seconds

It shows how effective initializing the StringBuffer is. So it is best to give the StringBuffer a suitable initial capacity whenever the final length can be estimated.
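
The code for that second test is not listed here, so the following is only a sketch of the idea, with hypothetical class and method names. It does not measure time. It checks that a buffer sized up front, either through the constructor or through ensureCapacity(), still has its original capacity after all the appends, which means it never had to reallocate.

```java
// Hypothetical demo class; the names are illustrative only.
class PresizedBufferDemo {

    // Appends "hello" the given number of times and returns the same buffer.
    static StringBuffer fill(StringBuffer buffer, int times) {
        for (int i = 0; i < times; i++) {
            buffer.append("hello");
        }
        return buffer;
    }

    public static void main(String[] args) {
        int times = 50000;
        int needed = times * "hello".length(); // 250000 characters in total

        // Capacity set through the constructor.
        StringBuffer presized = fill(new StringBuffer(needed), times);
        System.out.println(presized.capacity() == needed); // true: it never grew

        // Capacity set after creation through ensureCapacity().
        StringBuffer ensured = new StringBuffer();
        ensured.ensureCapacity(needed);
        fill(ensured, times);
        System.out.println(ensured.capacity() == needed);  // true: it never grew again
    }
}
```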

Key Points

  • Create strings as literals instead of creating String objects with the 'new' keyword whenever possible.
  • Use the String.intern() method if you create many equal String objects with the 'new' keyword and want them to share a single instance.
  • The + operator gives the best performance for String concatenation if the Strings resolve at compile time.
  • A StringBuffer with a suitable initial capacity gives the best performance for String concatenation if the Strings resolve at run time.
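
The first two points can be illustrated with a short sketch of my own (the class and method names are hypothetical). A String created with 'new' is always a separate object, while intern() returns the pooled instance that the equal literal already refers to.

```java
// Hypothetical demo class; the names are illustrative only.
class InternDemo {

    // Creates a distinct String object with the same characters.
    static String freshCopy(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String literal = "hello";
        String copy = freshCopy("hello");

        System.out.println(literal == copy);          // false: 'new' made a second object
        System.out.println(literal.equals(copy));     // true: same characters
        System.out.println(literal == copy.intern()); // true: intern() returns the pooled one
    }
}
```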
We appreciate and welcome your comments on this section.
