Wednesday, 30 November 2011

Hash Algorithm in Java

Java collections like Hashtable, HashSet and HashMap uses Hashing Algorithm. The objective of this article is to help you to visualize how hashing algorithm works.

Each entry in Hashtable is stored as "Entry" object. The "Entry" object contains members like,
hash index, key object, value object, next entry object.

Let's say Hashtable contains group of employee objects - e1, e2, e3, ...e9. The Employee class must override equals & hashcode method. Each employee object contains members like id, name and department.

Populate the Hashtable,
hashtable.put(new Integer(e1.id), e1);
.......................................................
.......................................................
hashtable.put(new Integer(e9.id), e9);

The hash algorithm works like whenever you put any entry into hastable, the bucket index will be calculated
based on return value of hashcode() from Key object. The hastable contains array of Entry objects. Each entry is inserted into array in the corresponding bucket index.

In this case, the hashtable has got 9 keys and,
bucket index of e1/e5/e6  is 23
bucket index of e7/e2/e3 is 125
bucket index of e4/e8/e9 is 225

Each bucket is represented as vertical bar in the below diagram. In each bucket, the elements are stored in singly linked list format as shown in the picture. The next value (refer diagram - next) is maintained in each element inside hashtable.

The new element always be added into the bottom of the bucket and an element can be removed from any position.

For example, if object e2 is removed, then bucket 125 will look like,
next -> e7.next->e3.next->null

if object e10 is to be added into bucket 23, then it will look like,
next ->e1.next->e5.next->e6.next->e10.next->null

To get any value based on key, the bucket index will be calculated using hashcode() of Key object.
1. Identify the bucket based on bucket index
2. Iterate over each element in the bucket and compare element's key with the given key object.
3. Return the value object from element.
Note: Element is nothing but Entry object which has hash, key, value and next entry.

Also note down that if two objects are equals, the hashcode should be equal. At the same time, if two objects are not equl then hashcode should be different.

Hope this article can help programmers to visualize how the hashing algorithm works in Hashtable, HashSet & HashMap.

Note: one more important info is the HashSet internally uses HashMap. All the elements are represented as keys and dummy object as values for all those keys.

Java 1.4 Collections in Multithreaded Environment

List, Set and Map are not thread-safety by nature. Java 1.5 offers APIs to make it synchronized.
Collections.synchronizedList(new ArrayList());
Collections.synchronizedSet(new Hashset());
Collections.synchronizedMap(new HashMap());

The add/remove methods will become thread-safe by doing as above. One new real problem is "fail-fast". When a thread is iterating over this collections and another thread does add/remove into the same collection, then java.util.ConcurrentModificationException will be thrown.

The above said theory is explained in N number of java papers. But, I'm depicting the fail-fast scenario with an example program. This program is as simple as for self-explanatory.

The below code contains a list and methods to iterate over the list, add elements into it and remove elements from it. Thread t1, t2 and t3 are started together to produce the fail-fast scenario.


package samples.concurrency;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class Sample15_1
{
 private List<String> l = Collections.synchronizedList(new ArrayList<String>());
 private final static int iterationCount = 9000;

 Sample15_1()
 {
  for (int i = 0; i < iterationCount; i++)
  {
   l.add(new String("A" + i));
  }
 }

 public void addName(String str)
 {
  l.add(str);
 }

 public void removeName(String str)
 {
  l.remove(str);
 }

 public void printNames() throws Exception
 {
  int count = 0;
  Iterator<String> i = l.iterator();
  while (i.hasNext())
  {
   count++;
   if (count % 40 == 0)
    System.out.println();
   System.out.print(i.next()+" ");
  }
 }

 public static void main(String[] args) throws Exception
 {
  final Sample15_1 sample = new Sample15_1();
  Thread t1 = new Thread()
  {
   public void run()
   {
    try
    {
     sample.printNames();
    }
    catch (Exception e)
    {
     e.printStackTrace();
    }
   }
  };
  Thread t2 = new Thread()
  {
   public void run()
   {
    for (int i = 0; i < iterationCount; i++)
     sample.addName("B" + i);
   }
  };
  Thread t3 = new Thread()
  {
   public void run()
   {
    for (int i = 0; i < iterationCount; i++)
     sample.removeName("A" + i);
   }
  };
  t3.start();
  t2.start();
  t1.start();
 }
}

Friday, 18 November 2011

Java Collections using Algorithms

IMO, only pointer is not supported in JAVA thats how embedded applications are still using C language. The package java.util in which each and every class is implemented using Algorithms. These classes contains APIs in which some works in O(1) constant time and some other operations works in O(log n) linear time. The below list shows that JAVA collections algorithm names,

java.util.Hashtable, java.util.HashSet, java.util.HashMap -> uses "Singly Linked List" concept combined with "Hashing" algorithm
java.util.LinkedList -> uses "Doubly Linked List" concept.
java.util.TreeSet -> uses "Binary Search" / BFS (breadth first search) / DFS (Depth First Search) algorthims
java.util.Stack -> uses "LIFO" (Last In - First Out) algorithms
java.util.AbstractQueue -> uses "FIFO" (First In First Out) algorithms.

AFAIK, I've listed out all I know. Any comments?

Java Concurrency

Java 1.4 concurrency offers java.util package,

I've come across lot of articles to find out how Hashtable and Vector is replaced by Collections.synchronized wrappers till Java 1.4. Even, the Collections.synchronized wrappers had a major drawback of fail-fast effect. While iterating the collection in a Thread (say t1) and if thread (say t2) modifies the collection, then you will end up in ConcurrentModificationException. Also, the other draw back of Collections.synchronized wrapper is that, it can be thread-safe to an extend for only simple get() and put() methods.

Java 1.5 concurrency offers java.util.concurrent package,

To get rid of fail-fast effect with Collections wrappers, one can use CopyonWriteArrayList as an replacement for Collections.synchronizedList and CopyonWriteArraySet as an replacement for Collections.synchronizedSet.

Well, the CopyonWriteArrayList locks the entire table while iteration for one Thread even the other thread cannot fetch objects from this collection.

To make it more interestingly, the Java 1.5 offers thread-safe, highly scalable collections than in 1.4.

Guys, if you have requirements like iterate over Collection and do add/remove objects into the same collection in Multithreaded environment, then ideal choice would be ConcurrentHashMap as an replacement for Hashtable/HashMap. The retrieval operations are not blocked but overlaped with update operations in ConcurrentHashMap.

Overall, I can say one should live in his own world to think twice before choosing appropriate Collection to deal while working with Multithreaded environment.

Please look into this beautiful article from the URL - https://www6.software.ibm.com/developerworks/education/j-concur/j-concur-a4.pdf