Friday 30 May 2014

Do away with System.out.println and Logger.info, and your code runs much faster!

I wrote a 10-threaded program last evening and was astounded to find that performance hadn't improved as much as I had expected. I knew I had parallelized all of the computation, so the 10x speed-up over my initial single-threaded code should have been there, and then I was struck by the idea that 'sysout' and Logger.info were to blame.

My good friend confirmed it: if you print feedback to the console on every single operation, that alone can explain why the code doesn't perform as anticipated. System.out.println is synchronized and console I/O is slow, so the threads end up serializing on the output lock. I removed all those statements and bang! the code now ran smooth and fast.
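Here is a minimal sketch of the idea (the worker logic and thread count are made up for illustration): each thread keeps its own tally and prints a single summary line at the end, instead of printing on every operation.

public class QuietWorkers {

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[10];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                long done = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    // do the real work here...
                    // System.out.println("thread " + id + " did op " + i);  // <-- the slow part
                    done++;
                }
                // one summary line per thread instead of one line per operation
                System.out.println("thread " + id + " finished " + done + " operations");
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
    }
}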


Thursday 29 May 2014

Disjoint array writes are implicitly thread-safe

I am currently tasked with writing to a 2-dimensional array from multiple threads. Let me briefly explain the context of the problem.

I have a 2-D array which looks somewhat like this:

__ __ __ __ __
__ __ __ __ __
__ __ __ __ __
__ __ __ __ __
__ __ __ __ __
__ __ __ __ __
__ __ __ __ __
__ __ __ __ __

So, what I had to do was use 8 threads, one to write to each row of the array. I was wondering whether I would need to synchronize write access to the array when each thread writes to a separate region of memory, independent of the other threads. While the array itself is shared by the threads, each thread in fact writes to a different row, and no thread depends on any other thread for reads or writes.

As I had expected, you do not need synchronization in such a scenario: every element is written by exactly one thread, so there is no shared mutable state to protect. (To read the results safely afterwards, the main thread should still join the workers, which establishes the necessary happens-before relationship.) While it may sound trivial, basic concepts like these sometimes keep you thinking and wondering until you figure out that common sense is indeed your best friend.
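A minimal sketch of the scenario (the dimensions and values are arbitrary): eight threads each fill their own row without any locks, and the main thread joins them before reading the result.

public class RowWriters {

    public static void main(String[] args) throws InterruptedException {
        final int rows = 8, cols = 5;
        final int[][] grid = new int[rows][cols];

        Thread[] threads = new Thread[rows];
        for (int r = 0; r < rows; r++) {
            final int row = r;
            threads[r] = new Thread(() -> {
                // Each thread writes only to its own row: no element is shared,
                // so no synchronization is needed for the writes themselves.
                for (int c = 0; c < cols; c++) {
                    grid[row][c] = row * 10 + c;
                }
            });
            threads[r].start();
        }

        // join() gives the main thread a happens-before edge with each writer,
        // so the values printed below are guaranteed to be visible.
        for (Thread t : threads) {
            t.join();
        }

        for (int[] row : grid) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}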


Thursday 22 May 2014

Idea behind setting a color to transparent

Incredibly simple, but it works:

To set a color to transparent, we need to:

1. First express the color in the 4-channel ARGB model:

A (alpha channel) -> bits 31-24 (MSB)

R (red) -> bits 23-16

G (green) -> bits 15-8

B (blue) -> bits 7-0

2. Then AND the ARGB color (argb) with 0x00FFFFFF and voila there you are!

What this does is keep the RGB components of the color intact while setting the alpha channel to zero.

In computer graphics, a zero alpha channel means: do not render this color. It functions like a directive to image formats such as .png, giving them the flexibility to either render or not render a specific pixel. Also note that JPEG cannot handle transparency, since its color model consists of only 3 channels (RGB).
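A minimal sketch of the masking step using java.awt.image.BufferedImage (the image and pixel value here are arbitrary, chosen only to show the AND with 0x00FFFFFF):

import java.awt.image.BufferedImage;

public class MakeTransparent {

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, 0xFF336699);              // fully opaque pixel: alpha = 0xFF

        int argb = img.getRGB(0, 0);
        int transparent = argb & 0x00FFFFFF;       // keep RGB, zero the alpha channel
        img.setRGB(0, 0, transparent);

        System.out.printf("alpha before: %02X, after: %02X%n",
                argb >>> 24, transparent >>> 24);
    }
}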






Wednesday 14 May 2014

StringBuffer re-initialization with setLength method

Often, we need to re-initialize a string with a value other than the one it was initialized with. Using StringBuffer here is recommended if the original string is not needed once its value has been changed.

A common way to re-initialize a StringBuffer obj is to write:

obj = new StringBuffer();

What this does is create a brand-new object and reassign the reference obj to refer to it, leaving the old object for garbage collection.

A more efficient way of achieving the same goal is to set the length of obj to zero:

obj.setLength(0);

This does not create a new object; it resets the contents of the same StringBuffer, obj, while keeping its internal character buffer, so no new allocation is needed.
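A minimal sketch of reusing the same buffer across loop iterations (the strings are arbitrary):

public class BufferReuse {

    public static void main(String[] args) {
        StringBuffer obj = new StringBuffer();
        for (String word : new String[] {"alpha", "beta", "gamma"}) {
            obj.setLength(0);                 // reset contents, reuse the internal buffer
            obj.append("value: ").append(word);
            System.out.println(obj);
        }
    }
}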





Singleton Nested Class Holder idiom tamed!

The Singleton Nested Class Holder idiom works by deferring initialization of the singleton until the nested holder class is loaded, which happens on the first call to getInstance(). This gives the impression that it cannot work when the object's initialization depends on another resource. Here we show, with an interesting example, that this is not the case.

For example, here is a stream-reader class meant to provide a singleton reader object that can then be used to parse text from a URL. The URL to be parsed is supplied at run time by the user.


/* attempt to make Reader a singleton */

import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

public class Reader {

    private DataBean data;
    private InputStream ir;

    private Reader(DataBean d) {
        data = d;
        try {
            ir = new URL(data.getResource()).openStream();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /********** Oh oh, no DataBean available yet **********/
    private static class ReaderHolder {

        private static final Reader reader = new Reader();  // does not compile: the constructor needs a DataBean

    }

    public static Reader getInstance(DataBean d) {
        return ReaderHolder.reader;
    }
}

Now, the problem here is that the DataBean object is not available at class-loading time; it will only be supplied later by the user.

We work around this with a Context object that encapsulates the DataBean and can either be passed around easily or, as below, be made a singleton itself with eager initialization.


/* Context class */
public class Context {

    public static final Context INSTANCE = new Context();

    private DataBean dataBean;

    private Context() {
    }

    /**
     * @return the dataBean
     */
    public DataBean getDataBean() {
        return dataBean;
    }

    /**
     * @param db the dataBean to set
     */
    public void setDataBean(DataBean db) {
        dataBean = db;
    }
}


Now, our reader is modified to look like this:

/* Reader is a singleton */

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class Reader {

    private InputStreamReader ir;

    private static class ReaderHolder {

        private static final Reader reader = new Reader();

    }

    private Reader() {
        try {
            // The DataBean is fetched from the eagerly initialized Context,
            // so the holder class can construct the Reader without a parameter.
            InputStream stream = new URL(Context.INSTANCE.getDataBean().getResource()).openStream();
            ir = new InputStreamReader(stream);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * @return the ir
     */
    public InputStreamReader getIr() {
        return ir;
    }

    public static Reader getInstance() {
        return ReaderHolder.reader;
    }
}
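A minimal usage sketch (the Main class, the DataBean constructor, its setResource method, and the URL are assumptions for illustration; only getResource appears in the code above): the caller populates the Context before the first call to getInstance(), so the holder class is loaded only after the resource is known.

import java.io.InputStreamReader;

public class Main {

    public static void main(String[] args) {
        DataBean bean = new DataBean();
        bean.setResource("http://example.com/input.txt");   // hypothetical setter; the URL comes from the user

        Context.INSTANCE.setDataBean(bean);

        // The first call to getInstance() loads ReaderHolder, which constructs
        // the Reader using the DataBean already placed in the Context.
        Reader reader = Reader.getInstance();
        InputStreamReader ir = reader.getIr();
        System.out.println("reader ready: " + (ir != null));
    }
}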



This is actually an interesting enhancement opportunity for plain Java programs, borrowing an idea familiar from J2EE and the Spring framework.








Some high-level Hadoop information!

1. It is a good idea to minimize the amount of data transferred between the Mapper and the Reducer, in keeping with the bandwidth constraints of the network. To this end, a combiner function is employed on the mapper side to aggregate mapper output before it is passed on to the reducer (see the sketch after this list).


2. Job profiling and tuning a job: make a Hadoop job run faster
3. JobTracker and TaskTracker -> the JobTracker assigns tasks to one or more TaskTrackers, each of which actually runs its task on its own input split.


4. Hadoop Streaming: an interface provided by the Hadoop framework that enables programmers to write MapReduce code in any language that supports standard streams (stdin/stdout).


5. What is Pig? A high-level scripting language for writing Hadoop programs faster and more easily. Pig scripts are automatically translated into MapReduce jobs, saving developer effort.
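For point 1, here is a minimal sketch of wiring a combiner into a job, following the standard Hadoop WordCount pattern (the class names and the reuse of the reducer as the combiner are illustrative assumptions, not from the notes above):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);            // one record per word occurrence
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenizerMapper.class);
        // The combiner runs on the mapper side and pre-aggregates the counts,
        // shrinking the data shuffled across the network to the reducers.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}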