A process in UNIX, or for that matter in any operating system, may be defined
as a program in execution. While that statement is fairly complete and clear in
itself, there are some other things about processes that I will highlight today.
1. Program and Process
A program is, in essence, an executable sequence of instructions that is run by
the processor. The program, or the code, is not the process itself: the program
code constitutes only the content of the text segment of a process. The process
itself is more than the program, and the following figure gives a deeper
picture of this idea.
Figure: Process in memory [1]
We can see from the above figure
that a process in memory comprises various segments such as the stack, heap,
data and text (code).
Stack segment: Holds the local variables, arguments and return addresses of
the functions the process is executing. The stack is split into stack frames,
one for each function call that is currently active. Also recall that each
thread of the process has its own stack, which by convention is private to
that thread.
Heap: Dynamic portion of the memory assigned to a process, used to allocate
memory to objects on demand during their creation. The heap is shared by all
threads of the process: an object allocated by one thread can be used by any
other thread that holds a pointer to it. Some allocators (glibc malloc, for
example) divide the heap into several arenas so that threads contend less for
the allocator's locks, but this does not hide one thread's allocations from
another.
Data: Holds global and static variables, which are shared between the various
threads of the process.
The program, or code, is the passive entity: it contains the instructions that
govern execution by the processor. The process is the active entity: it resides
in memory and undergoes several state transitions during its lifetime, from its
creation until its death.
The process may also be viewed as an instance of the program in execution. This
is similar to the idea of classes and objects. Several processes can be
instances of the same program: they share the same code segment, but each has
its own independent heap, stack and data segments.
2. Process representation in the operating system
Each process is represented using a data structure called the Process Control
Block (PCB), also known as the Task Control Block. The PCB holds information
about a process such as its identifier (pid), the identifier of the parent
process that created it (ppid), its state (new, ready, running, blocked,
terminated), a reference to the next instruction to be executed for the process
(the instruction pointer, IP), the list of files opened by the process, a
reference to the process address space (this is OS dependent and is typically a
memory management concern), the values of the CPU flags, the amount of time
remaining in the current time slice, and other process-specific information
that needs to be recorded.
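A typical PCB in Linux is represented using a structure, task_struct. A
simplified excerpt is shown below:
    struct task_struct {
        pid_t pid;                   /* process identifier */
        long state;                  /* state of the process */
        unsigned int time_slice;     /* scheduling information */
        struct files_struct *files;  /* list of open files */
        struct mm_struct *mm;        /* address space of this process */
        /* ... many other fields ... */
    };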
[2]
The operating system maintains a doubly linked list of PCBs. The PCBs come into
play during a context switch from one process to another: when the CPU switches
from executing one process to executing another, the current CPU state of the
outgoing process is saved into its PCB, and the saved state in the PCB of the
process that is going to be executed next is loaded back into the CPU. It is
vital to understand the role of memory management schemes in this context,
particularly if physical memory constraints prevent some PCBs from being kept
in memory while the processes they correspond to are not in execution. In such
cases, it seems safe to assume that the operating system keeps at least
references to the PCBs (using virtual memory addressing techniques), if not the
PCBs themselves.
3. Fork, Memory Overlaying, Zombie state
A process in UNIX/Linux may create a new process using the fork system call.
Upon execution of the fork call, the operating system creates a new process and
gives the newly spawned child process an address space that is a copy of the
current state of the address space of the parent process. This is similar to
object cloning in programming.
Upon successful invocation of fork(), a new process is created. fork() returns
the process identifier of the child to the parent, and returns 0 to the child.
If it fails, it returns -1 to the parent and no child is created.
Typically, after creation, the new process is given some task of its own.
However, keep in mind that immediately after its creation the code segment of
the new process is identical to that of its parent, since its address space is
a mere replica of the parent's. To get the child process to execute other
instructions and accomplish a different objective, we use one of the exec
family of calls, to which we pass a reference to another program. This is shown
in the following program:
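    #include <sys/types.h>
    #include <sys/wait.h>    /* wait */
    #include <stdio.h>
    #include <stdlib.h>      /* exit, EXIT_SUCCESS, EXIT_FAILURE */
    #include <unistd.h>      /* fork, execlp, _exit */

    int main(void) {
        pid_t pid;

        pid = fork();
        if (pid < 0) {              /* fork failed, no child was created */
            fprintf(stderr, "Fork failed\n");
            exit(EXIT_FAILURE);
        }
        else if (pid == 0) {        /* in child */
            execlp("/bin/ls", "ls", (char *)NULL);
            perror("execlp");       /* reached only if execlp failed */
            _exit(EXIT_FAILURE);
        }
        else {                      /* parent */
            wait(NULL);             /* block until the child terminates */
            printf("Execution of child process is complete\n");
            exit(EXIT_SUCCESS);
        }
    }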
This particular example uses execlp, whose signature is
    int execlp(const char *file, const char *arg, ...);
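The first argument is a pointer to a string naming the file to execute; if the
name contains no slash, execlp searches the directories in PATH for it. The
second and subsequent arguments are pointers to strings that become the
argument list (argv) of the new program, and the list must end with a null
pointer. By convention the first of these, argv[0], is the name of the program.
In our example, execlp is called with "/bin/ls", the path of the file
containing the program, followed by "ls" as argv[0] and NULL to end the list.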
When execlp is executed, the operating system replaces the code segment in the
address space of the child process with the code of the program file named by
the first argument of execlp, and the heap, stack and data segments are also
reinitialized for the new program. (execlp is a library function built on top
of the execve system call.) This technique, wherein the address space is
entirely replaced without the creation of a new process (the process identifier
stays the same), is called Overlay. It is important to understand here that the
changes to the address space of the process are done within the context of the
process itself.
Often, before forking a process, a pipe is created so as to facilitate
communication between the process and its child: the child inherits the
parent's open file descriptors, so both ends of the pipe are available to both
processes. This detail has been ignored in the above example.
It is also important to remember that, in the example, the parent process waits
for the completion of the child process using the system call
pid_t wait(int *status). The wait system call is used to prevent the child
process from remaining in the Zombie state for too long.
In UNIX, when a process finishes executing, the memory allocated to it is
reclaimed but the entry of the process in the process table is not immediately
removed. Such a process is called a Zombie process, for it has terminated
execution but is not actually dead. If the parent of the process does not
execute wait(), the process remains a Zombie for as long as its parent is
alive. When the parent dies, the Zombie is adopted by init (pid = 1), and it is
then removed from the process table because init routinely executes wait() for
its children. The argument to wait is NULL in our example, which means that the
completion status of the child process is not stored. Generally, the parent
passes a pointer to an int in which the status code of the child's completion
is stored.
References
[2] Silberschatz, Galvin and Gagne, Operating System Concepts, 7th edition.