This post explains what pass by reference and pass by value mean in
programming, and how the stack and the heap relate to them.
Both C and Java pass arguments by value.
Let us consider the following segment of code:
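#include <stdio.h>

void swap ( int *x , int *y );

int main () {
    int a = 5;
    int b = 4;
    swap (&a, &b);
    printf ( "%d : " , a );
    printf ( "%d : " , b );
    return 0;
}

/* Exchanges the values stored at the two addresses. */
void swap ( int *x , int *y ) {
    int temp = *x;
    *x = *y;
    *y = temp;
}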
Listing 1
The above code swaps the values of a and b
so that a contains 4 and b contains 5.
The function call, swap (&a, &b);
passes the addresses of the variables a and b, and this is what lets swap
exchange the values held in the two variables. Note that the addresses, &a
and &b, are themselves passed as values: swap receives copies of them in x
and y. This is precisely why we say function calls in C pass by value.
Local variables or Stack variables
Now, if we take a look at the variable
temp in Listing 1, we can see that it is a local variable. Variables which are
local in scope are allocated on the stack and when a function returns (or when
the last instruction in the function has been executed), the local variables on
the stack corresponding to that function are invalidated and the memory thus
reclaimed is open for reallocation.
In Listing 1, the variables a and b defined
in main are local to main and thus reside on the stack. They remain
stack-resident until main returns (that is, until return 0 has executed).
Likewise, temp is a local variable defined in swap and its scope is confined
to swap. After the final instruction in swap has executed, temp is
invalidated and the memory it occupied may be reallocated.
It is important to understand how this lifetime rule relates to pass by
value.
Consider Listing 2.
#include <stdio.h>

int sum ( int a , int b )
{
    int s = 0;
    s = a + b;
    return s;
}

int main () {
    int a = 4;
    int b = 5;
    int s = sum ( a, b );
    printf ( "sum: %d\n" , s );
    return 0;
}
Listing 2
Now, we can see that the values contained
in a and b are passed to the function sum. It is important to understand here
that in C, copies of the values contained in a and b are created and passed as
function arguments. In Java, primitives such as int are copied in the same
way, whereas for objects it is the reference to the object that is copied and
passed. While this may be suggestive of pass by reference, the references
are themselves treated as values, and that is why we say both C and
Java pass by value.
Now, when control returns from the
function sum, its variable s is out of scope and its memory is reclaimed; the
value 9 has already been copied into the s of main. After printf prints
sum: 9 and main returns, the variables a, b and s of main are out of scope
as well. Thus, we can see that stack-resident variables
have a scope that is confined to the block where they have been defined.
Variables allocated on the Heap
In C, memory obtained using malloc is allocated on the heap. Variables
declared outside any function body are a separate case: they have static
storage duration, exist for the whole run of the program, and typically live
in the program's data segment rather than on the heap. They have file-level
scope and may be accessed by all functions in that file. Memory allocated
using malloc remains allocated on the heap until you explicitly release it
using the free function. In Java, objects created using the new keyword
are allocated on the heap and are automatically garbage-collected by the
JVM once they are no longer reachable.
Stack and Heap distribution and impact on variable scope
Each thread in a program has its own
stack space while all the threads share the same heap space.
Consider the following block of code in
Listing 3:
public String foo ( String city1 , String city2 ) {
    String city3 = city1 + "," + city2;
    save (city3);
    return city3;
}
Listing 3
Suppose each thread invokes the method foo
with its own values for city1 and city2. The method stores the concatenated
result in a database using the save function and also returns the reference
to the result. It is important to understand a few things here:
· city3 is a reference variable of type String, and the statement
String city3 = city1 + "," + city2; initializes it to refer to a newly
created String object.
· Upon initialization, the object that city3 refers to
resides on the heap while the reference to it (the variable city3) resides on
the stack of the calling thread.
· The heap, as mentioned above, is shared by all threads, but allocators
commonly give each thread its own region to allocate from (glibc's malloc
calls such a region an arena; the HotSpot JVM uses thread-local allocation
buffers).
· The stack space corresponding to a thread is in turn divided into
frames, each frame corresponding to one function or method call and holding
the variables that are local in scope to that call.
Bullet 3 tells us that the object referred to by city3 in a
thread, say Thread1, is allocated from the heap region associated with
Thread1. The object is not private to Thread1, though: any thread that
obtains the reference can reach it. From bullet 4, we can see that
the reference itself, that is, the variable city3, is local in scope to the
function foo and thereby resident on the stack frame corresponding to foo in
the stack space for Thread1.
When foo returns, the value of the reference (the address of the object) is
copied to the caller. The local variable city3 is itself invalidated, whereas
the object remains in existence until the JVM reclaims it. This reiterates
that Java, too, exchanges data between methods by passing values, which may
be data values or values of addresses.
Key things to remember:
1. Threads have their own stack space.
2. Heap space is shared between threads, although each thread may allocate
from its own portion of it. Objects created by a thread (using malloc in C,
new in Java, or any other form of object creation such as the String
concatenation shown above) remain there until they are freed explicitly by
the programmer (e.g. using free for malloc) or automatically (e.g. JVM
garbage collection).
3. Methods running on a thread have their own stack frames, which contain
the variables local to those methods.
4. C and Java pass by value.
Two separate C programs are given
below to help us understand the above ideas. The first one is
unreliable in that it returns the addresses of local variables resident on a
function's stack frame to code outside that function, while the other works
reliably by allocating the values to be returned on the heap using the malloc
function.
UNRELIABLE CODE:
#include <stdio.h>

long *power ( int *a );
int *doubler ( int *a );

int main ( int argc , char *argv[] ) {
    if ( argc < 2 ) {
        printf ( "Usage: pass a single digit as the argument\n" );
        return 1;
    }
    printf ( "Command line argument: %s\n" , argv[1] );
    int operand = *argv[1] - '0';   /* first character, read as a digit */
    printf ( "Entered cmdline operand :%d\n" , operand );
    long *result1;
    int *result2;
    result1 = power (&operand);
    /* BUG: result1 points into power's frame, which no longer exists */
    printf ( "Power of %d: %ld\n", operand, *result1 );
    result2 = doubler (&operand);
    /* BUG: same problem with doubler's frame */
    printf ( "Double of %d: %d\n", operand, *result2 );
    return 0;
}

long *power ( int *a ) {
    printf ( "*a : %d\n" , *a );
    long result1;
    result1 = (*a) * (*a);
    printf ( "*result1 : %ld\n" , result1 );
    return &result1;   /* BUG: address of a local variable */
}

int *doubler ( int *a ) {
    printf ( "*a : %d\n" , *a );
    int result2;
    result2 = (*a) * 2;
    printf ( "*result2 : %d\n" , result2 );
    return &result2;   /* BUG: address of a local variable */
}
RELIABLE CODE:
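#include <stdio.h>
#include <stdlib.h>

long *power ( int *a );
int *doubler ( int *a );

int main ( int argc , char *argv[] ) {
    if ( argc < 2 ) {
        printf ( "Usage: pass a single digit as the argument\n" );
        return 1;
    }
    printf ( "Command line argument: %c\n" , *argv[1] );
    int operand = *argv[1] - '0';   /* first character, read as a digit */
    printf ( "Entered cmdline operand :%d\n" , operand );
    long *result1;
    int *result2;
    result1 = power (&operand);
    if ( result1 == NULL ) return 1;
    printf ( "Power of %d: %ld\n", operand, *result1 );
    result2 = doubler (&operand);
    if ( result2 == NULL ) { free (result1); return 1; }
    printf ( "Double of %d: %d\n", operand, *result2 );
    free (result1);   /* heap memory must be released explicitly */
    free (result2);
    return 0;
}

/* Returns a heap-allocated square of *a, or NULL if malloc fails. */
long *power ( int *a ) {
    printf ( "*a : %d\n" , *a );
    long *result1 = malloc ( sizeof (long) );   /* room for one long */
    if ( result1 == NULL ) return NULL;
    *result1 = (long) (*a) * (*a);
    printf ( "*result1 : %ld\n" , *result1 );
    return result1;   /* the block outlives this function */
}

/* Returns a heap-allocated double of *a, or NULL if malloc fails. */
int *doubler ( int *a ) {
    printf ( "*a : %d\n" , *a );
    int *result2 = malloc ( sizeof (int) );     /* room for one int */
    if ( result2 == NULL ) return NULL;
    *result2 = (*a) * 2;
    printf ( "*result2 : %d\n" , *result2 );
    return result2;   /* the block outlives this function */
}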