Sunday 29 December 2013

Variables, Scope and Parameter passing


This post explains what pass by reference and pass by value mean in programming, and how the stack and the heap relate to them.

Both C and Java pass arguments by value.
Let us consider the following segment of code:

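#include <stdio.h>

void swap ( int *x , int *y );   /* declared before main uses it */

int main () {
    int a = 5;
    int b = 4;
    swap (&a, &b);               /* pass the addresses of a and b */
    printf ( "%d : " , a );
    printf ( "%d\n" , b );
    return 0;
}

/* x and y are copies of the addresses; dereferencing them reaches main's a and b */
void swap ( int *x , int *y ) {
    int temp = *x;
    *x = *y;
    *y = temp;
}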
Listing 1

The above code swaps the values of a and b, so that a contains 4 and b contains 5.
The call swap (&a, &b); passes the addresses of a and b (pointers to them), and this is what allows swap to change the values stored in those two variables. Note that the pointers &a and &b are themselves passed by value: swap receives copies of the two addresses in its parameters x and y. This is precisely why we say function calls in C pass by value; C only emulates pass by reference by passing pointers.
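Java has no & operator, so a Java version of Listing 1 cannot hand swap the addresses of two local int variables. The minimal sketch below assumes the two values are kept in an int array, a stand-in for the pair of pointers (the class name SwapDemo is hypothetical). The array reference is copied into swap, but both copies refer to the same array object on the heap, so the swap is visible to the caller.

```java
import java.util.Arrays;

class SwapDemo {
    // 'pair' is a copy of the caller's reference, but it refers to the same array object
    static void swap(int[] pair) {
        int temp = pair[0];
        pair[0] = pair[1];
        pair[1] = temp;
    }

    public static void main(String[] args) {
        int[] pair = {5, 4};                          // plays the role of a = 5, b = 4
        swap(pair);
        System.out.println(Arrays.toString(pair));   // prints [4, 5]
    }
}
```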

Local variables or Stack variables

Now, if we look at the variable temp in Listing 1, we can see that it is a local variable. Local (automatic) variables are allocated on the stack, in the stack frame of the function that declares them. When the function returns, its frame is popped, the local variables in it become invalid, and that memory is free to be reused by later calls.

In Listing 1, the variables a and b are local to main and thus reside in main's stack frame. They remain valid until main returns (that is, until after return 0). Likewise, temp is a local variable of swap and its scope is confined to swap. Once swap returns, temp is invalid and its memory may be reused.
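Because every call gets its own stack frame, every call also gets its own copy of its local variables. A small recursive Java sketch makes this visible (FrameDemo and depthSum are hypothetical names): each active invocation holds its own n and partial, and a callee returning does not disturb the caller's locals.

```java
class FrameDemo {
    // Each recursive call pushes a new frame with its own 'n' and 'partial'
    static int depthSum(int n) {
        if (n == 0) {
            return 0;
        }
        int partial = depthSum(n - 1);   // the callee's frame is popped when it returns
        return partial + n;              // this frame's 'n' was not touched by the callee
    }

    public static void main(String[] args) {
        System.out.println(depthSum(4)); // prints 10 (4 + 3 + 2 + 1)
    }
}
```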

It is worth seeing how this relates to pass by value.

Consider Listing 2.
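#include <stdio.h>

int sum ( int a , int b )
{
          int s = 0;       /* s lives in sum's stack frame */
          s = a + b;       /* a and b here are copies of main's a and b */
          return s;        /* the value of s is copied back to the caller */
}
int main () {
          int a = 4;
          int b = 5;
          int s = sum ( a, b );
          printf ( "sum: %d\n" , s );
          return 0;
}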
Listing 2

Here the values contained in a and b are passed to the function sum. In C, copies of the values of a and b are made and passed as the function arguments; sum's parameters a and b are distinct variables from main's a and b. Java behaves the same way for primitive types such as int. For objects, Java passes a copy of the reference, so the caller and the callee hold two references to the same object. While this may look like pass by reference, the reference itself is copied, and that is why we say both C and Java pass by value.

When control returns from sum, its local variable s goes out of scope and its stack memory is reclaimed; by then the value of s has already been copied into main's own s. After printf prints sum: 9 and main returns, main's a, b and s go out of scope as well. In other words, stack-resident variables have a scope confined to the block in which they are defined.
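The Java half of the pass-by-value point can be sketched as follows (PassByValueDemo, Box and the method names are hypothetical). An int argument is copied, so the caller never sees changes to it; an object argument is a copied reference, so mutating the object through it is visible to the caller, but reassigning the parameter to a new object is not.

```java
class PassByValueDemo {
    static class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // Receives a copy of the int; the caller's variable is unaffected
    static void incrementInt(int x) {
        x = x + 1;
    }

    // Receives a copy of the reference; mutating the shared object is visible to the caller
    static void incrementBox(Box b) {
        b.value = b.value + 1;
    }

    // Reassigning the copied reference does not change what the caller's variable refers to
    static void replaceBox(Box b) {
        b = new Box(100);
    }

    public static void main(String[] args) {
        int a = 4;
        incrementInt(a);
        Box box = new Box(4);
        incrementBox(box);
        replaceBox(box);
        System.out.println(a + " " + box.value);   // prints 4 5
    }
}
```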

Variables allocated on the Heap

In C, variables declared outside any function body are not on the stack, but they are not on the heap either: they have static storage duration and live in the program's data segment for the whole run. They have file scope and may be accessed by every function in that file (and, unless declared static, from other files through an extern declaration). Memory obtained with malloc is allocated on the heap and stays allocated until you explicitly release it with the free function. In Java, objects created with the new keyword are allocated on the heap and are automatically garbage-collected by the JVM once they are no longer reachable.

Stack and Heap distribution and impact on variable scope

Each thread in a program has its own stack space while all the threads share the same heap space.
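A minimal Java sketch of this split, assuming a plain counting workload (ThreadStackDemo and run are hypothetical names): each worker thread counts in a local variable, which lives on its own stack, and also bumps one AtomicInteger, a single heap object shared by all threads. The returned array holds the per-thread local counts followed by the shared total.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadStackDemo {
    // Returns per-thread local counts followed by the shared total (last element)
    static int[] run(int threads, int perThread) {
        AtomicInteger shared = new AtomicInteger();   // one heap object, reachable by every thread
        int[] result = new int[threads + 1];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int local = 0;                        // each thread has its own copy on its own stack
                for (int i = 0; i < perThread; i++) {
                    local++;
                    shared.incrementAndGet();
                }
                result[id] = local;                   // each thread writes only its own slot
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                             // also makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        result[threads] = shared.get();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4, 1000)));   // prints [1000, 1000, 1000, 1000, 4000]
    }
}
```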

Consider the following block of code in Listing 3:
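public String foo ( String city1 , String city2 ) {
    // city1, city2 and city3 are references held in foo's stack frame;
    // the concatenated String object itself is created on the heap
    String city3 = city1 + "," + city2;
    save (city3);     // stores the result, e.g. in a database
    return city3;     // returns a copy of the reference
}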
Listing 3

Suppose several threads each invoke foo with their own values for city1 and city2. The method stores the concatenated result in a database using the save function and also returns a reference to the result. A few things are worth noting here:

·   city3 is a reference of type String, and the statement
String city3 = city1 + "," + city2; initializes it to point at a newly created String object.

·   After initialization, the String object that city3 refers to resides on the heap, while the reference itself (the local variable city3) resides on the stack of the calling thread.

·   The heap, as mentioned above, is shared by all threads. To reduce contention, allocators often let threads allocate from separate regions (glibc's malloc uses multiple arenas, and the HotSpot JVM gives each thread a thread-local allocation buffer, or TLAB), but this is only an allocation optimization: an object allocated there is an ordinary heap object that any thread holding a reference to it can use.

·   A thread's stack is divided into frames, one per active method invocation, and each frame holds the variables local to that invocation.

Bullet 3 tells us that the String object a thread, say Thread1, creates for city3 lives on the shared heap. Even if it was carved out of Thread1's allocation buffer, it is not private to Thread1: any thread that obtains a reference to it (for example, through foo's return value or through the database) can use it. From bullet 4, the reference itself, the local variable city3, is local in scope to foo and therefore lives in foo's stack frame on Thread1's stack.

When foo returns, a copy of the reference held in city3 is handed back to the caller, and foo's frame, including city3 itself, is discarded. The String object, however, remains on the heap until the JVM's garbage collector reclaims it, which can only happen once no live references to it remain. This reiterates that Java exchanges data between methods by passing values, which may be primitive values or reference values (addresses).
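A runnable sketch of Listing 3 under stated assumptions: save is replaced by a hypothetical in-memory DATABASE list (a synchronized list, since several threads write to it), foo is made static for brevity, and the class name CityDemo and the city names are illustrative only. Two threads call foo; after both have finished and their foo frames are gone, the String objects they created are still reachable from the main thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CityDemo {
    // Hypothetical in-memory stand-in for the database that save() writes to
    static final List<String> DATABASE = Collections.synchronizedList(new ArrayList<>());

    static void save(String value) {
        DATABASE.add(value);
    }

    // Same body as Listing 3, made static for brevity
    static String foo(String city1, String city2) {
        String city3 = city1 + "," + city2;   // the String object is on the heap; city3 is in foo's frame
        save(city3);
        return city3;                         // the caller receives a copy of the reference
    }

    public static void main(String[] args) throws InterruptedException {
        String[] results = new String[2];
        Thread t1 = new Thread(() -> results[0] = foo("Pune", "Mumbai"));
        Thread t2 = new Thread(() -> results[1] = foo("Delhi", "Agra"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Both foo frames are gone, yet both String objects are still reachable
        // from the main thread (via results) and from DATABASE
        System.out.println(results[0]);        // prints Pune,Mumbai
        System.out.println(results[1]);        // prints Delhi,Agra
        System.out.println(DATABASE.size());   // prints 2
    }
}
```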

Key things to remember:

1. Threads have their own stack space

2. Heap space is shared between threads. Objects created on the heap (with new in Java, with the malloc function in C, or implicitly, as with the String concatenation above) stay there until they are freed, either explicitly by the programmer (free for memory obtained from malloc) or automatically (JVM garbage collection). Threads may allocate from private regions, but the objects themselves remain visible to every thread.

3. Each method invocation on a thread's stack has its own stack frame, which contains the variables local to that invocation.

4. C and Java pass by value.

Two separate C programs are given below to illustrate these ideas. The first is unreliable: it returns pointers to local variables that live in the callee's stack frame, which is discarded as soon as the function returns, so the pointers dangle. The second works reliably because it allocates the values to be returned on the heap using malloc.

UNRELIABLE CODE:
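#include <stdio.h>

  long *power ( int *a );
  int *doubler ( int *a );

  int main ( int argc , char *argv[] ) {
    if ( argc < 2 ) {
      fprintf ( stderr, "usage: %s <digit>\n", argv[0] );
      return 1;
    }
    printf ( "Command line argument: %s\n" , argv[1] );
    int operand = *argv[1] - '0';   /* uses only the first character, assumed to be a digit */
    printf ( "Entered cmdline operand :%d\n" , operand );

    long *result1;
    int *result2;

    result1 = power (&operand);
    /* BUG (deliberate): result1 points into power's stack frame, which no longer exists.
       Dereferencing it is undefined behavior: it may print garbage, appear to work, or crash. */
    printf ( "Power of %d: %ld\n", operand, *result1 );

    result2 = doubler (&operand);
    /* Same problem: result2 dangles once doubler has returned. */
    printf ( "Double of %d: %d\n", operand, *result2 );
    return 0;
  }

  long *power ( int *a ) {
    printf ( "*a : %d\n" , *a );
    long result1;                 /* local: lives in power's stack frame */
    result1 = (long) (*a) * (*a);
    printf ( "result1 : %ld\n" , result1 );
    return &result1;              /* compilers warn: returning the address of a local variable */
  }

  int *doubler ( int *a ) {
    printf ( "*a : %d\n" , *a );
    int result2;                  /* local: lives in doubler's stack frame */
    result2 = (*a) * 2;
    printf ( "result2 : %d\n" , result2 );
    return &result2;              /* dangles as soon as doubler returns */
  }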

RELIABLE CODE:

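#include <stdio.h>
#include <stdlib.h>   /* malloc, free, exit */

  long *power ( int *a );
  int *doubler ( int *a );

  int main ( int argc , char *argv[] ) {
    if ( argc < 2 ) {
      fprintf ( stderr, "usage: %s <digit>\n", argv[0] );
      return 1;
    }
    printf ( "Command line argument: %c\n" , *argv[1] );
    int operand = *argv[1] - '0';   /* uses only the first character, assumed to be a digit */
    printf ( "Entered cmdline operand :%d\n" , operand );

    long *result1;
    int *result2;

    result1 = power (&operand);
    printf ( "Power of %d: %ld\n", operand, *result1 );

    result2 = doubler (&operand);
    printf ( "Double of %d: %d\n", operand, *result2 );

    /* heap memory stays allocated until it is freed explicitly */
    free ( result1 );
    free ( result2 );
    return 0;
  }

  long *power ( int *a ) {
    printf ( "*a : %d\n" , *a );
    long *result1 = malloc ( sizeof *result1 );   /* room for one long, not one pointer */
    if ( result1 == NULL ) {
      perror ( "malloc" );
      exit ( EXIT_FAILURE );
    }
    *result1 = (long) (*a) * (*a);
    printf ( "*result1 : %ld\n" , *result1 );
    return result1;               /* the heap block outlives power's stack frame */
  }

  int *doubler ( int *a ) {
    printf ( "*a : %d\n" , *a );
    int *result2 = malloc ( sizeof *result2 );    /* room for one int */
    if ( result2 == NULL ) {
      perror ( "malloc" );
      exit ( EXIT_FAILURE );
    }
    *result2 = (*a) * 2;
    printf ( "*result2 : %d\n" , *result2 );
    return result2;
  }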

Wednesday 4 December 2013

Intrinsic, Explicit and Client-Side Locking

Over the last few days, I have been reading about different locking mechanisms in Java, and what interests me most about them is how alike they are while still being different. Their differences are so subtle that it is easy to mistake one for another.

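Intrinsic locking is based on the lock built into every Java object; for an instance method, the object locked is the calling object, this. Adding the synchronized keyword to an instance method's signature is equivalent to wrapping the method body in synchronized (this) { ... }. (For a static synchronized method, the lock is the class's Class object instead.)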
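A minimal sketch of that equivalence (IntrinsicCounter and runConcurrently are hypothetical names): incrementA uses a synchronized method and incrementB a synchronized (this) block. Because both acquire the same intrinsic lock, threads mixing the two styles never lose an update, and the final count is exact.

```java
class IntrinsicCounter {
    private int count;

    // 'synchronized' on an instance method acquires the intrinsic lock of 'this'
    synchronized void incrementA() {
        count++;
    }

    // Equivalent form: explicitly acquires the same intrinsic lock
    void incrementB() {
        synchronized (this) {
            count++;
        }
    }

    synchronized int get() {
        return count;
    }

    // Hypothetical driver: each thread mixes both increment styles
    static int runConcurrently(int threads, int perThread) {
        IntrinsicCounter counter = new IntrinsicCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementA();
                    counter.incrementB();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000));   // prints 80000
    }
}
```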

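Explicit locking, as I use the term here, relies on the locks used by the class's underlying private data members. This covers delegating thread safety to the class members, that is, using thread-safe members such as ConcurrentHashMap, Vector, or a list wrapped with Collections.synchronizedList (a plain List implementation such as ArrayList is not thread-safe). Elsewhere, the term "explicit lock" often refers to a java.util.concurrent.locks.Lock such as ReentrantLock.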
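A sketch of delegation in this sense, with hypothetical names (VisitTracker, recordVisit, visitsTo): the class takes no locks of its own and relies entirely on its ConcurrentHashMap member, whose merge method performs the read-modify-write atomically. This works because each operation touches only that one member.

```java
import java.util.concurrent.ConcurrentHashMap;

class VisitTracker {
    // Thread safety is delegated entirely to this thread-safe member;
    // VisitTracker itself acquires no locks
    private final ConcurrentHashMap<String, Integer> visits = new ConcurrentHashMap<>();

    void recordVisit(String page) {
        visits.merge(page, 1, Integer::sum);   // ConcurrentHashMap performs merge atomically
    }

    int visitsTo(String page) {
        return visits.getOrDefault(page, 0);
    }

    // Hypothetical driver: several threads record visits to the same page
    static int runConcurrently(int threads, int perThread) {
        VisitTracker tracker = new VisitTracker();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    tracker.recordVisit("home");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tracker.visitsTo("home");
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000));   // prints 40000
    }
}
```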

Client-side locking relies on the lock employed by the class member object itself. At first glance this may seem identical to explicit locking, but it differs in essence. With explicit locking, each individual operation on the member object, such as list.add() or list.removeAll(), is protected by that object's own lock. Client-side locking wraps several such operations in a block that acquires that same lock from outside the object, so that they execute as one atomic unit. Such a use case arises when you need a compound action such as put-if-absent, or when you wish to perform operations on more than one data member of the class in a single atomic transaction: the code relies on the locks of each of the data members concerned and uses its own wrapper to group the operations into one atomic transaction.
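A sketch of client-side locking for the put-if-absent case (ListHelper, putIfAbsent and raceToInsert are hypothetical names). The list returned by Collections.synchronizedList guards its own methods with a lock on the wrapper object itself, so putIfAbsent synchronizes on list, not on this. Locking on this instead would leave the contains-then-add sequence unprotected against other code using the list directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ListHelper<E> {
    final List<E> list = Collections.synchronizedList(new ArrayList<>());

    // Client-side locking: hold the list's own lock so contains() + add() form one atomic step
    boolean putIfAbsent(E x) {
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }

    // Hypothetical driver: several threads race to insert the same value.
    // Returns {number of successful inserts, final list size}.
    static int[] raceToInsert(int threads, String value) {
        ListHelper<String> helper = new ListHelper<>();
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                if (helper.putIfAbsent(value)) {
                    winners.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { winners.get(), helper.list.size() };
    }

    public static void main(String[] args) {
        int[] r = raceToInsert(8, "x");
        System.out.println(r[0] + " " + r[1]);   // prints 1 1
    }
}
```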