Wednesday, January 6, 2010

Argument Passing in Java Compared with C and C++

Introduction

The focus of this article is to explain how a function or method passes data arguments to another function or method. The former is referred to as the caller and the latter as the callee. Before we dive in, let's get some terminology right.

Primitive data and Objects

Broadly, the data arguments in C, C++ and Java are either primitive data, i.e. variables of primitive types like int, char and double, or objects of user-defined types: structs and unions in C and C++, and classes in C++ and Java.

Stack and Heap

Data can be stored either on the stack or on the heap. Data stored on the stack are the local variables of functions or methods, allocated on the call stack. When a function returns, its stack frame is popped and its local variables go out of scope, that is, they are deallocated. Every variable on the stack has a name, which is used to access it. It also has an address: the address of the memory location on the stack where it is stored. This address is no longer valid once the variable goes out of scope.

The heap is a common store where any function or method can allocate objects. It is particularly useful for creating primitive data or objects dynamically. Data on the heap has no name! It has only an address, and that address is the only way to access it. The address can be stored in a pointer variable. This pointer variable can in turn be either on the stack or on the heap, although Java handles this differently, as we will see.

Handle

In object-oriented terminology, a handle refers to the mechanism through which an object is accessed. C/C++ pointers and Java references are example implementations of handles. A handle is usually distinct from the object itself. The name of a stack variable is not commonly understood as a handle.

Pointers (C and C++)

Pointers are special kinds of data objects: they can store the addresses of primitive data or objects that are either on the stack or on the heap. The pointer itself can in turn be either on the stack or on the heap. In C and C++, pointers are treated as first-class data objects.

References (C++)

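A reference is an alias, a second name, for an existing primitive variable or object. Since heap objects have no name, a reference to a heap object can only be formed indirectly, by dereferencing a pointer to it (for example, int& r = *p;). In the common case, references are aliases to named variables, such as variables on the stack.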


In the simplest case, no extra memory is allocated to maintain a reference. It is an additional entry in the compiler's symbol table and is resolved to the exact same location as the name of the data it aliases. (The C++ standard leaves it unspecified whether a reference requires storage; reference parameters and reference members are commonly implemented as hidden addresses.) Note that this is the general meaning of the term reference, and it is implemented this way in C++. Such references are not supported in C or Java.

Java Reference (only in Java)

In Java, all objects are created only on the heap. It is impossible to create an object of a user-defined type (class) on the stack. So on the stack we need some variable that points to this heap object. (Recall that heap objects cannot have a name.) This variable is referred to as a Java reference. It is very different from a C or C++ pointer in that Java does not allow the programmer to treat it like ordinary data: there is no pointer arithmetic and no address-of operator. That is, Java references are not first-class data objects in the C/C++ sense; a Java reference is just a handle to the object. The type of the handle is either the same as that of the object or a supertype of it (to support polymorphism).


In Java, the only things allocated on the stack are primitive values and Java references; objects never are. Pointers to primitive data are not allowed, and there are no C/C++-style pointers in Java. This was one of the core design ideas of Java: to simplify the language by eliminating pointer types.

Now we have well-defined terminology for understanding argument passing and how it is implemented in C, C++ and Java.

Pass By Value

A function may pass a copy of local data on its stack to another function that it calls. This data is copied onto the stack memory of the callee. Naturally, changes made to the copy by the callee are not reflected in the original data in the caller's stack memory. Objects on the heap are not passed by value, and they don't need to be! They can be accessed from any function that has a direct or an indirect handle to them. (In C and C++, passing a dereferenced pointer such as *p copies the heap object into the callee's stack, and the callee then works on that copy.)

Pass By Pointer

A function may pass a pointer to data on its stack or on the heap to another function. If the callee modifies the data through the pointer, the changes will be observed by the caller, as it is the same data that gets modified. However, there is one subtlety. A pointer is itself a first-class data object in languages like C and C++ (unlike Java), so the pointer is, in fact, passed by value. That is, a copy of the address stored in the pointer is made for the callee. So if the callee changes its pointer to point somewhere else, the caller's copy of the pointer is unaffected.

Pass By Reference

A function may pass a reference to data on its stack. In this case, under the hood, the callee function gets direct access to a variable in the caller's stack memory. Any change made to the variable through the reference is reflected in the caller. Since functions in C, C++ and Java can return only one data value, passing references is often used as a mechanism for producing side effects in the caller's stack, to mimic returning multiple values. (References to pointers can also be passed. There are, however, no pointers to references, because references are not first-class data values.)

Argument Passing in C

C supports pass by value and pass by pointer. You can pass data on the stack by value or by pointer, and you can pass pointers to data on the heap.

Argument Passing in C++

In addition to what C supports, C++ also supports passing references to data on the stack.

Argument Passing in Java

Java supports pass by value and pass by Java reference. The latter can be thought of either as passing the reference by value or as pass by pointer, except that the handle passed is not a full-fledged pointer as in C or C++. It is a Java reference.

In Java, primitive arguments are always passed by value. Objects, which live only on the heap, are not actually passed at all. A copy of the Java reference to them is passed, and the callee gets to access the exact same object lying on the heap. As with pass by pointer in C and C++, reassigning the parameter inside the callee does not change the caller's reference.
