Friday, 3 July 2015

Custom Java HashMap - Internal Working

HashMap is a Collection class in Java that stores objects with the help of hascode and equals. It is said to be faster in retrieving objects.
Lets create a custom implementation of HashMap  and understand how it works. To understand HashMap you need to first understand Singly and DoublyLinkedList.

HashMap stores objects in form of Key and Value. A value can be fetched by passing the Key.

HashMap stores objects into different buckets(LinkedList). The object with same hashcode are stored in same LinkedList. Remember that the objects are stored in key-value pairs. The hashcode of key is used and not the hashcode of value. Only Key alone is required to put and get the object. Value is just a object placed against the key, So depending upon how many objects(Keys) are present with different hashcodes, that many LinkedList gets created.



An array is used to bind all these LinkedList together. The elements of the array have the pointer to the start of the linked list. Below diagram gives a more clearer picture.


Ok. Lets start with the structure of the Node. This is the class that holds the key value pair together in the LinkedList. The Array shown in the above diagram is also created to keep elements of type Node. In java.util.HashMap the array is called 'Entry'

public class Node<T,K> {
 public T key;
 public K value;
 public Node<T,K> next;
 public Node<T,K> prev;
}
It has key, value and pointers to next and prev Nodes. So this can become the node of a Doubly Linked List

Below we are declaring array of type Node which holds all the doubly linked list together. Basically the 'next' of the element in array points to the first node of a Linked List

public class CustomHashMap<K, V> {

 // Array holding/pointing to the starting elements of the doubly linked
 // lists
 Node<K, V> arrayOfStartNodes[] = null;
}


Below method  findArrayElement is used while putting/adding data in HashMap and while getting data from HashMap. It takes a parameter of type Key. It iterates thorugh the array and finds out the element whose key's hashcode matches with the hashcode of the passed Key parameter.
 
private Node<K, V> findArrayElement(K key) {

  System.out.println("Finding array index for key with hashcode:"+ key.hashCode());

  // Iterate through array
  for (Node<K, V> n : arrayOfStartNodes) {
   if (n != null) {
    // return the node whose 'key' has the same hashcode as the
    // passed 'Key'
    System.out.println("-Matching passed key hasCode: "+ key.hashCode() + " with current element HasCode:" + n.key.hashCode());

    if (n.key.hashCode() == key.hashCode()) {
     System.out.println("--Found Array Index");
     return n;
    } else {
     System.out.println("--Not Found. Go to next index of Array");
    }
   }
  }
  return null;
 }

Below method is used to add the Key-Value pair in HashMap. It is similar to put method of java.util.HashMap. The method first calls the findArrayElement method to find out the element where the key-value pair should be placed. If matching elments key.hashcode is not found in array a new element is created with key as same as the key of the object being saved. Value is kept null in the array element as it is not required.

After the element is created/found in array. The key-value pair is attached to the LinkedList which is pointed by the found element array. Again if no Linked List is present for that element in Array and a new LinkedList is created. This element is attached at the end of the Linked List.

void add(K key, V value) {
  System.out.println("-----------------PUT-------------------------");
  System.out.println("Adding object with Key hashcode as : "+ key.hashCode());
  // Check if array is null
  if (arrayOfStartNodes == null) {
   System.out.println("Array is Null, Instantiating Array");
   // Instantiate the array
   arrayOfStartNodes = new Node[7];
  }
  // Find the index in the array where the Key,value pair can be added to
  // the doubly linked list
  Node<K, V> startNodeOfList = findArrayElement(key);
  // If no matching index node is found, create one
  if (startNodeOfList == null) {
   System.out.println("No element with same hascode found in Array. Creating a new element in Array");
   Node<K, V> startNode = new Node<K, V>();
   startNode.key = key;
   startNodeOfList = arrayOfStartNodes[arrIndex++] = startNode;
  }

  // Add the element key and value as a node to the end of the linked list
  addInList(key, value, startNodeOfList);
  

 }

Below method is used to iterate the Linked List and attached the new Node at the end of it.

public void addInList(K key, V value, Node<K, V> startNodeOfList) {
  System.out.println("Adding object in Linked List with hashCode as : "+ key.hashCode());
  // Create new node
  Node<K, V> p = new Node<K, V>();
  p.key = key;
  p.value = value;
  // Check if List is empty, If yes than point 'start' to the newly
  // created node
  if (startNodeOfList.next == null) {
   System.out.println("No Linked list present for HashCode : "+ key.hashCode());
   System.out.println("Attaching the New node to Next of the found Array element");
   startNodeOfList.next = p;
  } else {
   System.out.println("Linked list present for HashCode : "
     + key.hashCode());
   // If the list is not empty, than traverse till the end of the list
   // and add the node
   Node<K, V> s = startNodeOfList;
   System.out.println("Iterate through Linked List to attach Node at the end");
   while (s.next != null) {
    s = s.next;
   }
   s.next = p;
   p.prev = s;
  }

 }

Below method is used to get the Value based upon the passed key. This is a very important method. This is what makes HashMap different then other Collections.

First we use findArrayElement, to find out the correct bucket, that is the element in array which can point us to correct Linked List. Once the LinkedList is identified the correct key-value pair is identified by iterating though the LinkedList and comparing the key of the passed parameter with the key of key-value pairs in the LinkedList. The equality is checked by using the equals() method.


private V get(K key) {
  System.out.println("-----------------GET-------------------------");
  // Iterate through array to find out the key matches
  Node<K, V> startNodeInArray = findArrayElement(key);
  if (startNodeInArray != null) {
   // get the first node in the linked list
   System.out.println("-Iterate thorugh Linked List");

   Node<K, V> nodeTobeFound = startNodeInArray.next;
   boolean found = Boolean.FALSE;

   System.out.println("-Matching passed key value : " + key.toString()
     + " with current element value : "
     + nodeTobeFound.key.toString());
   while (nodeTobeFound != null && !nodeTobeFound.key.equals(key)) {
    System.out
      .println("--Not Found. Go to next Node in LinkedList");
    nodeTobeFound = nodeTobeFound.next;
    System.out.println("-Matching passed key value : " + key.toString()
      + " with current element value : "
      + nodeTobeFound.key.toString());
   }
   if (nodeTobeFound != null) {
    System.out.println("--Found Node in Linked List");
    return nodeTobeFound.value;
   }
  }
  return null;
 }


Lets now test our code. The key being used in test code is of type EmployeeId class, having Integer empId. Since the EmployeeId objects will be used as Keys, we have implemented hashcode() and equals() method. The equals checks for the empId to be equal. Hashcode() returns the int depending upon the value of empId. it return 1 if empId is 2 or 3, it return 2 if empId is 4 or 5 and so on. By this it is guaranteed that the size of LinkedList will not exceed by 2.
public class EmployeeId {
 private Integer empId;

 public EmployeeId(Integer empId) {
  super();
  this.empId = empId;
 }

 public Integer getEmpId() {
  return empId;
 }
 
 public boolean equals(Object e){
  EmployeeId e1=(EmployeeId)e;
  return e1.empId.equals(this.empId);
 }
 
 /**
  * For empId 2 and 3 the hascode will be 1, for 4 and 5  it will be 2
  */
 public int hashCode(){
  return empId/2;
 }
 
 public String toString(){
  return empId.toString();
 }
 
}
Below is the main method which creates object of our CustomHashMap and adds key-values to it.
The value could be anything from a normal String to a custom class. We have used here Employee class. Having id and name attributes.


public static void main(String[] args) {
  CustomHashMap<EmployeeId, Employee> hashMap = new CustomHashMap<EmployeeId, Employee>();
  
  hashMap.add(new EmployeeId(2), new Employee(1001, "A"));
  hashMap.add(new EmployeeId(4), new Employee(2002, "B"));
  hashMap.add(new EmployeeId(5), new Employee(11425, "C"));
  

  EmployeeId searchKey = new EmployeeId(5);

  Employee foundEmployee = hashMap.get(searchKey);
  if (foundEmployee != null) {
   System.out.println(foundEmployee.getName());
  } else {
   System.out.println("Employee not found");
  }
 }

The whole process of adding 3 key-value pairs is explained in the pictorial view below:

hashMap.add(new EmployeeId(2), new Employee(1001, "A"));




hashMap.add(new EmployeeId(4), new Employee(2002, "B"));



 
 hashMap.add(new EmployeeId(5), new Employee(11425, "C"));




Output of the program below
-----------------PUT-------------------------
Adding object with Key hashcode as : 1
Array is Null, Instantiating Array
Finding array index for key with hashcode:1
No element with same hascode found in Array. Creating a new element in Array
Adding object in Linked List with hashCode as : 1
No Linked list present for HashCode : 1
Attaching the New node to Next of the found Array element
-----------------PUT-------------------------
Adding object with Key hashcode as : 2
Finding array index for key with hashcode:2
-Matching passed key hasCode: 2 with current element HasCode:1
--Not Found. Go to next index of Array
No element with same hascode found in Array. Creating a new element in Array
Adding object in Linked List with hashCode as : 2
No Linked list present for HashCode : 2
Attaching the New node to Next of the found Array element
-----------------PUT-------------------------
Adding object with Key hashcode as : 2
Finding array index for key with hashcode:2
-Matching passed key hasCode: 2 with current element HasCode:1
--Not Found. Go to next index of Array
-Matching passed key hasCode: 2 with current element HasCode:2
--Found Array Index
Adding object in Linked List with hashCode as : 2
Linked list present for HashCode : 2
Iterate through Linked List to attach Node at the end
-----------------GET-------------------------
Finding array index for key with hashcode:2
-Matching passed key hasCode: 2 with current element HasCode:1
--Not Found. Go to next index of Array
-Matching passed key hasCode: 2 with current element HasCode:2
--Found Array Index
-Iterate thorugh Linked List
-Matching passed key value : 5 with current element value : 4
--Not Found. Go to next Node in LinkedList
-Matching passed key value : 5 with current element value : 5
--Found Node in Linked List
C
HashCode Rules:

The performance of the HashMap hugely depends on the implementation of hashCode() method. Based upon the hashCode() the size of the array of a hashMap is determined.

hashCode() should not be unique for each key. This will enforce only one Node present per LinkedList. So the complexity of HashMap will be similar to a ArrayList.

Also hashCode() should not be common for all keys. This will create a single LinkedList and the complexity of HashMap will be equal to a LinkedList.



HashCode should be designed in such a way that the structure of the HashMap is distributed in Array and size of LinkedList. Hashing is a huge topic and not in scope of this blog.

equals Rules:

Two elements with different equals can have same hashcode. This will allow both elements to lie in same LinkedList

If two elements have same equals they should not have same hashcode

If two elements have same equals their hashcode should be same

Differences from java.util.HashMap class:

The process of creating Array is kept simple and restricted to creating only 7 elements. In java.util.HashMap if the Array is about to get full and new array is created and the elements from old array is copied in the new one. The size of the new Array is 3/2 times the first one.

To read more about collection iterations refer Lamda Expression in Java 8

Click Here for Core Java Objective Questions and Answers

3 comments:

  1. Thank's for sharing this awesome post.

    ReplyDelete
  2. I don't like the iteration over arrayOfStartNodes. Instead, the hash code should be used for finding the correct index, that's the whole purpose of it. Like in original HashMap:

    static int indexFor(int h, int length) {
    return h & (length-1);
    }

    ReplyDelete

Share the post