Tuesday, 27 August 2013

Java Collections : Internal Working

Java Collections:

Going for an Job Interview? be prepared with Java Collections framework. Java Collection framework is the most preferred topic by interviewers. It gives them the idea of how much effort the interviewee as taken to understand this framework and clear the interview.

And guess what, similar questions are now also asked in US for H1B visa holders, or MS post graduates looking for job.

This blog explains internal working of the Collection framework. For Interview Question on Collection refer Java Collection Interview Questions

For Java Interview questions for 2 to 5 years Senior Developer refer this blog.

Recently Blockchain has also made place in Java Interviews. Blockchain internally is also a Data Structure. To know more click here : Implementing Blockchain using Java

Click Here for Core Java Objective Questions and Answers

Custom implementation of Collections and Data Structure:

Java Collection internally uses the primitive and core elements like Arrays and datastructures like  Linked List, Tree etc. So if you are asked a question to explain the internal working of any of the Collection classes, don't be surprised. Be it an interview for an Junior Java developer or even for an Architect, Java Collection is always something that you will have on you plate.

Java provides many collection classes that can be used to store data. Knowing which collection class to use for best performance and optimum result is the key.

First the basics.
Below two image shows the complete hierarchy of interfaces and classes present in the Collection framework.

The Collections come in basic four flavors:
Lists : List of things ( Classes that implement List Interface)
Sets : Unique things ( Classes that implement Set Interface)
Maps : Things with unique id ( Classes that implement Map Interface)
Queues : Things arranged in order ( Classes that implement Queue Interface)

The blue ones are the Interfaces and the red ones are the implementation classes

Below is Map. A map is a special type of collection.



          Below table shows the different concrete classes implementing these interfaces



Maps
Sets
Lists
Queues
Utilities
HashMap
HashSet
PriorityQueue
Collections
Hashtable
LinkedHashSet
Vector
Arrays
TreeMap
TreeSet
LinkedList
LinkedHashMap


List

List interface promises that the elements maintain the order in which they are added. That means it is a ordered Collection. List implementations do not sort the elements.

Lets see each implementation of List

ArrayList

Ordered
ArrayList works on the principle of creating a array and adding elements to it. So basically it internally just creates and array and adds elements to it. Remember, in case of array the size of the array has to be defined during its creation.
Integer [] a = new Integer[5];

Then how come ArrayList is dynamic? We will see this below.

ArrayList class has a member variable elementData which is a Object array;
Object[] elementData;
When we do this:
List l = new ArrayList();

the array elementData is initialised with a size of 10

There are 2 important method in ArrayList:

add(E element)
When a new element is added the capacity of the array elementData is checked and if it is completely filled that is all element 10 are filled, a new array is created with a new capacity by using Arrays.copyOf. If the elementData array is not exhausted the new element is added in the array.
So adding a element in a array may take more time as a completely new array needs to be created with greater capacity and the data in the old array is transferred into the new array.

The second method adds the element on a specific location in an array.

add(index i, E element)
On adding a element at a particular index in ArrayList, ArrayList checks if a element is already present at that index. If no than the parameter is added at that index, otherwise a new array is created with the index kept vacant and the remaining element shifted to right.
For Eg:

List<Integer> l = new ArrayList<Integer>();
l.add(1);
l.add(2);
l.add(1,3);
l.add(4);

for(int i:l){
     System.out.println(i);
}


Output
1
3
2
4
Here above we are trying to add 3 and position 1, since position 1 already has value '2'. A new array is created with value at index 1 kept vacant and the remaining elements are shifted to right. Than the element 3 is added at index 1.

get(int index)
The element present at that index in that array is returned. This is very fast. Why? Because there is no need to traverse through the nodes as is the case with other collection classes.

When to use ArrayList
When the requirement is fetch data frequently and adding data is not so frequent activity.

When not to use ArrayList
When the list is updated frequently

A better way to understand internal working of ArrayList is to create one yourself.
Follow link Internal Working of ArrayList. Here a custom Arraylist is created with basic add and get operations




Linked List

Ordered
As opposed to ArrayList, LinkedList does not store elements in a array. Linked List is a actually a collection of objects linked together using a reference to each other. So in assence a java.util.LinkedList is a Doubly Linked List.
For more details about the Data structure of Singly Link List and Doubly Linked List refer this link Internal Working of LinkedList



LinkedList has a Node class called as Entry. It has the below structure
 class Entry {
 E element;
 Entry next;
 Entry previous;
 }

LinkedList class also has a instance variable of type 'Entry' called 'header'. This is the 'start' element or node of the list.

add(E element )
Every Time we call add(var);  a new instance of 'Entry' class is created and attached at the end of the list.

add(var, position)
Inserts the specified element at the specified position in this list.
Shifts the element currently at that position (if any) and any subsequent elements to the right. This is not as fast as ArrayList.

get(int index)
It iterates through the list and returns the element. This is very expensive and time consuming as opposed to ArraList.get(int index)

When to use LinkedList
When the elements are getting added and removed frequently.

When not to use LinkedList
When you want to access or fetch a element by index.



Map

As we saw above, the List allows us to add values in it. But to find that value you need to traverse through the complete list. Map is a special collection provided by Java. It helps to find a added element quickly.

Instead of just adding the value or element in a collection, Map allows you to add 2 elements. One called as key and the other as value.

Consider the key as employee id and the value as the employee object. The logic to save and fetch a value is based upon the key.

Lets see different implementations of Map.

HashMap

HashMap works on the principal of hashing. It stores values in the form of key-value pair and to access a value you need to provide the key.
HashMap is basically a 2 dimensional Singly Linked List. It can grow in both directions.
For efficient use of HashMap the 'key' element should implement equals() and hashcode() method. equals() method define that two objects are meaningfully equal. hashcode() helps HashMap to arrange elements separately in a bucket. So elements with same hascode are kept in the same bucket together. Now what the hell is a bucket?

Observe the diagram below. All elements that are stored horizontally are said to be in the same bucket.

So when we want to fetch a element using get(K key), HashMap first identifies the bucket in which all elements of the same hascode as the hashcode of the 'key' passed are present. Now since it knows the bucket, it will only have to traverse through that bucket to fetch the actual object.

Then it uses the equals() method to identify the actual object present in the bucket.

Lets see how HashMap implements this logic internally.

Hash Map
Hash Map Internal Structure
For fast access to a value HashMap places a element (both key and value) in a SinglyLinkedList(Bucket). All the elements that have the same hascode are placed in the same SinglyLinkedList. The number of SinglyLinkedList(buckets) depends upon how many objects are present with different hashcode. To hold these buckets together a array is used. The size of the array is initially defined to 12. And it changes as new elements with different hascodes are added. Lets see the pictorial view.
The structure of the 'Entry' class used above.

class Entry {
   K key;
   V value;
   Entry next;
   int hash;
}

HashMap also has some more variables which define the initial size of the array.

DEFAULT_LOAD_FACTOR = 0.75f;
DEFAULT_INITIAL_CAPACITY = 16;

Entry[] table = new Entry[DEFAULT_INITIAL_CAPACITY];
Below video explains the insertion of key values according to equals and hashcode result.





For more detailed and internal understanding of HashMap refer Internal Working of HashMap blog. This blog creates a CustomHashMap and explains step by step process.


TreeMap

Sorted
TreeMap is a structure which is designed to work as a Red - Black - Tree. Here each node has only two child nodes and the insertion is a tree happens same as the insertion strategy of Java Binary Search Tree explained here.

If you see below both the Binary Search Tree and Tree Map look same except for the node. The Tree Map has Key and Value in the node, while the BST only has the value.

         

 



So the elements in a TreeMap are always sorted.
The elements added is a TreeMap should implement comparable and provide implementation of compareTo method. On the basis of this TreeMap decides wether the node is smaller or greater than other node. If Comparable is not implemented, a class which is Comparator should be passed in the constructor of TreeMap. If both Comparable and Comparator are present TreeMap uses Comparator implementation.

TreeMap internally maintains a List of Nodes with each node being a Entry<K,V> class which is actually a implementation of Map.Entry<K,V> interface.
The basic structure of this Entry class is 

class Entry{
 K key;
 V value;
 Entry left = null;
 Entry right = null;
 Entry parent;
 boolean color = BLACK;
}


Entry<K,V> of TreeMap has internally three entries of Entry<K,V> class : left,right,parent;
When we use put(K,V) method it checks if root is pointing anywhere if no it makes the instance of Entry<K,V> and point to root;
The constructor of Entry<K,V> takes key, value and parent. In this case parent is null;
For the next time we enter something using put(K,V) it first identifies the comparison mechanism to use. First it check the Comparator class is present or not . This class is passed when creating the instance of TreeMap. If not present it uses the Key's Comparable implementation.
It then traverse through root and compares each node with the node entered and depending upon the comparison places the node either left or right of the parent node.

Set


Set is a collection that can not contain duplicate values. So if we want to have a unique collection, Set is the obvious choice.

Set has three implementations HashSet, TreeSet and LinkedHashedSet.

Hashset

Non Duplicate
Un-Ordered
Hashset is a special case
HashSet internally uses HashMap. Yes thats true.
HashSet has a instance variable called 'map' which is a instance of HashMap.

add(E element)
When we add a value in Hashset, Hashset internally adds a value in 'map' by calling put(E,o);
where E that is the key is the element passed in add(E element) of HashSet and 'o' as the value which is a dummy Object creted by doing Object o = new Object; which is common for all key's entered in HashMap 'map'.
HashSet internally checks wether the Key that is 'element' is already present by calling the equals method of 'element'.
This method returns false if the Key is already present in HashMap.

TreeSet

Non Duplicate
Sorted
Like HashSet uses HashMap internally, TreeSet uses TreeMap internally. TreeSet ensures that elements added are not duplicate and they are sorted. Sorting is done using TreeMap.

add(E element)
When we add a value in TreeSet, TreeSet internally adds a value in 'map' by calling put(E,o);
where E that is the key is the element passed in add(E element) of TreeSetand 'o' as the value which is a dummy Object creted by doing Object o = new Object; which is common for all key's entered in TreeMap 'map'.
TreeSet internally checks wether the Key that is 'element' is already present by calling the equals method of 'element'.
This method returns false if the Key is already present in TreeMap.

LinkedHashSet

Non Duplicate
Ordered
LinkedHashSet extends HashSet that means it is a HashMap without duplicates. But the difference here with HashSet is that LinkedHashSet is ordered.

It uses a Doubly Linked List that runs through the Set holding the order together.

Custom implementation of Collections and Data Structure:
Java Custom ArrayList
Java Custom LinkedList
Java Custom HashMap

For Interview questions related to Collections refer
Java Collection Interview Questions
For understanding Lambda Expression and its use in Collection refer Lambda Expression Java 8


26 comments:

  1. Hello Hunaid,
    Nice article I have a query regarding increment of ArrayList size "is any specific algorithm followed to increase size of ArrayList or it is always increased by 10"

    ReplyDelete
    Replies
    1. Hi Ankit,
      Good qurstion!
      The answer is NO. The size is not always increased by 10. The size of array increases as
      10 16 25 38 . This is calculated by a computation like
      int newSize = ((oldsize*3)/2)+1;

      Please let me know if you get the answer.

      Delete
    2. After jdk7, The size of array list increased 50% of oldCapacity by using right shift operator.
      This is calculated by a computation like
      int oldCapacity = elementData.length;
      int newCapacity = oldCapacity + (oldCapacity >> 1);

      Delete
    3. I have a doubt for array list size increasing. If we use initial capacity as 0,then as you specified in the formula oldCapacity=0 so 0+1>>1 result in 0 only. How it will be increased can you explain please

      Delete
    4. When capacity is 0 i.e. there is no element in Object[] elementData (internal), the default capacity is set to DEFAULT_CAPACITY which is 10, after that when 10 elements are filled then capacity is increased by 50%

      Delete
  2. Thx a ton man, please explain internal related to Set as well.

    ReplyDelete
    Replies
    1. Hello Manish,
      Thanks for you comment.
      I have added the details of Set as required.

      Thanks

      Delete
  3. thank u for these things.

    ReplyDelete
  4. Please write full code of ArrayList, also you are using foreach loop and using int without making List generics(Like List) please look this how you can do this.
    one more thing if please clear my doubt when we make class reference variable final during instantiation then which type of restriction provided by this Like
    Class A{}
    class B extends A{
    public static void main(String[] args){
    final A a=new B();
    }

    }

    ReplyDelete
    Replies
    1. Hello Bhagwan,

      I corrected the mistake of generics you mentioned above. Thanks for that. Do you want me to write a custom arraylist program?

      Delete
    2. Hello Bhagwan,
      I have created a custom array list here http://www.java-redefined.com/2014/03/java-custom-arraylist.html

      Delete
  5. Hi,
    I see a typo here in Linked List section.
    "When to use ArrayList
    When the elements are getting added and removed frequently.

    When not to use ArrayList
    When you want to access or fetch a element by index."

    In the above statement array list should be replaced by Linked List

    ReplyDelete
  6. Hi Hunaid,

    I have a question in ArrayList. When the arraylist in initialized the default size is 10.
    1)When new elements added to the arraylist, on reaching the arraylist value by 75%, say while adding the 7th value in arraylist will it expand the length.
    2)I have an arraylist al,i have added 3 values in the arraylist 10,20,30. I am trying to add the next value in the 10th index ,but its throwing arrayindexout of bound exception.
    Why it was not allowed to add the values using the index value?

    Regards,
    Mansur

    ReplyDelete
    Replies
    1. 1)When new elements added to the arraylist, on reaching the arraylist value by 75%, say while adding the 7th value in arraylist will it expand the length.
      Answer: Yes, you are right.

      2)I have an arraylist al,i have added 3 values in the arraylist 10,20,30. I am trying to add the next value in the 10th index ,but its throwing arrayindexout of bound exception.
      Why it was not allowed to add the values using the index value?
      Answer: This is a valid question Mansur.
      When we create a ArrayList the internal array has the size 10. But their is also a element in ArrayList called int size, its value is 0 and only increments on adding elements to the arraylist. The same 'size' attribute is used when we call the method size().
      Now if we try to add at an element at 10th index and the 'size' attribute is still 3, check is done as below
      if (index > size || index < 0)
      throw new IndexOutOfBoundsException;
      Hence you get the Exception.

      Please let me know if the post helped.

      Delete
    2. Hi

      I have query regarding add(index, value) in arraylist.

      1. initial size of arraylist is 0 and trying to put element in arraylist using add(2,3) then is it throw exception or not and why?

      2. if above example throw exception then what is the use of add(index, value)

      Delete
    3. Hi Nitin,

      Please check for the jdk version your are using if it greater or equal to jdk 1.7.0_40 then default size for ArrayList will be 0 but for previous versions it's 10.

      Refer below constructor for java 1.7.0_40

      private static final Object[] EMPTY_ELEMENTDATA = {};

      public ArrayList() {
      super(); this.elementData = EMPTY_ELEMENTDATA;
      }


      but for 1.6.30 it is like below

      public ArrayList() {
      this(10);
      }

      Delete
  7. Hi Java_sau,

    java 1.7.x.. also default arraysize 10.
    EMPTY_ELEMENTDATA={} //null, it means no default element data.

    you can see below one FYI

    /**
    * Default initial capacity.
    */
    private static final int DEFAULT_CAPACITY = 10;

    /**
    * Shared empty array instance used for empty instances.
    */
    private static final Object[] EMPTY_ELEMENTDATA = {};

    /**
    * The array buffer into which the elements of the ArrayList are stored.
    * The capacity of the ArrayList is the length of this array buffer. Any
    * empty ArrayList with elementData == EMPTY_ELEMENTDATA will be expanded to
    * DEFAULT_CAPACITY when the first element is added.
    */
    /**
    * Constructs an empty list with an initial capacity of ten.
    */
    public ArrayList() {
    super();
    this.elementData = EMPTY_ELEMENTDATA;
    }

    ReplyDelete
  8. i have a one question.. can we create custom ArrayList or HashMap.
    can't we use in Real time scenario... mean not for example sake, can we use in real time applications.
    if any one customize their own ArrayList or HashMap.. then wht is use of existing one..
    which one will give the best results.. how t get to know, mean how to measure the best one data structure.. if existing one is the best.. then why customizing behavior is given to any one...
    why most of companies[like VMWARE] in Interview focusing/asking create custom ones applying minds to create our own data structures...

    ReplyDelete
    Replies
    1. Hi Sasi,
      Sticking to java defined collection is the best option. The only reason companies ask about creating custom ArrayList or HashMap is to know whether you understand the internally working of it and not just the theory part.

      Hope it answers your question.

      Delete
  9. Superb Explanation, Can you please explain the actual internal working of Collections.sort() method.I want to know how it works actually. An overview is given in many links like basically sort()method use merge sort and quick sot, but for implementation performance it use insertion sort if fewer than seven elements are being sorted. Can you please explain more than this given detail ? Please

    ReplyDelete
  10. Hi Hussain,

    Superb explanation,it clears my whole doubts regarding collections.

    ReplyDelete
  11. Hi Hussain, can you please explain internals of Vectors and PriorityQueue?

    ReplyDelete
  12. Very nice article, just wanted to correct at one point

    When we do this

    List l = new ArrayList();

    the array elementData is initialized with a size of 10.

    I believe the size here is initialized to 0, and it increases only when we enter the 1st element to it, by using ensureCapacityInternal method.

    Pls correct me if I am wrong here.

    ReplyDelete
  13. Hello Hussain,
    Can you please tell me what exactly happens when in HashMap key collisions occurs?
    Is node get replaced or node become same and new node value gets assigned to old one?
    Btw this is one of the best article for collections. Thanks a lot.

    ReplyDelete

Share the post