Programming in C Notes

Notes from the book Programming in C (3rd Edition) by Stephen G. Kochan (paperback, July 18, 2004).

In the early 1980s, a need was seen to standardize the definition of the C language. The American National Standards Institute (ANSI) is the organization that handles such things, so in 1983 an ANSI C committee was formed to standardize C. In 1990, the first official ANSI standard definition of C was published.

Because C is used around the world, the International Organization for Standardization (ISO) soon got involved and adopted the standard as well.

Since that time, additional changes have been made to the C language through revised ISO standards, commonly known as C99, C11, C17, and C23.


Programming

The basic operations of a computer system form what is known as the computer's instruction set.
To solve a problem using a computer, you must express the solution to the problem in terms of the instructions of the particular computer.


Higher-Level Languages

When computers were first developed, the only way they could be programmed was in terms of binary numbers that corresponded directly to specific machine instructions and locations in the computer's memory. The next software advance came with the development of assembly languages, which enabled the programmer to work with the machine on a slightly higher level. Instead of having to specify sequences of binary numbers to carry out particular tasks, assembly language permits the programmer to use symbolic names to perform various operations and to refer to specific memory locations. A special program, known as an assembler, translates the assembly language program from its symbolic format into the specific machine instructions of the computer system.

Because a one-to-one correspondence still exists between each assembly language statement and a specific machine instruction, assembly languages are regarded as low-level languages. The programmer must still learn the instruction set of the particular computer system to write a program in assembly language, and the resulting program is not portable; that is, the program will not run on a different processor type without being rewritten. This is because different processor types have different instruction sets, and because assembly language programs are written in terms of these instruction sets, they are machine dependent.

Then, along came the so-called higher-level languages, of which the FORTRAN language was one of the first. Programmers developing programs in FORTRAN no longer had to concern themselves with the architecture of the particular computer, and operations performed in FORTRAN were of a much more sophisticated or higher level, far removed from the instruction set of the particular machine. One FORTRAN instruction or statement resulted in many different machine instructions being executed, unlike the one-to-one correspondence found between assembly language statements and machine instructions.

To support a higher-level language, a special computer program must be developed that translates the statements of the program developed in the higher-level language into a form that the computer can understand, that is, into the particular instructions of the computer. Such a program is known as a compiler.


Operating Systems

An operating system is a program that controls the entire operation of a computer system. All input and output (that is, I/O) operations that are performed on a computer system are channeled through the operating system. The operating system must also manage the computer system's resources and must handle the execution of programs.


Compiling Programs

A compiler analyzes a program developed in a particular computer language and then translates it into a form that is suitable for execution on your particular computer system.

The program that is to be compiled is first typed into a file on the computer system. Computer installations have various conventions that are used for naming files, but in general, the choice of the name is up to you. C programs can typically be given any name, provided the last two characters are “.c”.

Under Unix, the command to initiate program compilation is called cc. If you are using the popular GNU C compiler, the command you use is gcc. Typing the line
gcc prog1.c
has the effect of initiating the compilation process with the source program contained in prog1.c.

The compiler first translates the program into an equivalent assembly language program. The next step in the compilation process is to translate the assembly language statements into actual machine instructions. This step might or might not involve the execution of a separate program known as an assembler. On most systems, the assembler is executed automatically as part of the compilation process.

The assembler takes each assembly language statement and converts it into a binary format known as object code, which is then written into another file on the system. Under Unix, this file typically has the same name as the source file, with the last letter an “o” (for object) instead of a “c”.

After the program has been translated into object code, it is ready to be linked. This process is once again performed automatically whenever the cc or gcc command is issued under Unix. The purpose of the linking phase is to get the program into a final form for execution on the computer. If the program uses other programs that were previously processed by the compiler, then during this phase the programs are linked together. Programs that are used from the system's program library are also searched and linked together with the object program during this phase.

The final linked file, which is in an executable object code format, is stored in another file on the system, ready to be run or executed. Under Unix, this file is called a.out by default.

When the program is executed, each of the statements of the program is sequentially executed in turn. If the program requests any data from the user, known as input, the program temporarily suspends its execution so that the input can be entered. Or, the program might simply wait for an event, such as a mouse being clicked, to occur. Results that are displayed by the program, known as output, appear in a window, sometimes called the console. Or, the output might be directly written to a file on the system.


The Preprocessor

The preprocessor is a part of the C compilation process that recognizes special statements that might be interspersed throughout a C program. As its name implies, the preprocessor actually analyses these statements before analysis of the C program itself takes place. Preprocessor statements are identified by the presence of a pound sign, #, which must be the first nonspace character on the line. As you will see, preprocessor statements have a syntax that is slightly different from that of normal C statements.


The #define Statement

One of the primary uses of the #define statement is to assign symbolic names to program constants. The preprocessor statement
#define YES 1
defines the name YES and makes it equivalent to the value 1. The name YES can subsequently be used anywhere in the program where the constant 1 could be used. Whenever this name appears, its defined value of 1 is automatically substituted into the program by the preprocessor.

It's analogous to doing a search and replace with a text editor; in this case, the preprocessor replaces all occurrences of the defined name with its associated text.


Arguments and Macros

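A #define can also take arguments. For example, the following macro tests whether the year y is a leap year:
#define IS_LEAP_YEAR(y) ((((y) % 4 == 0) && ((y) % 100 != 0)) || ((y) % 400 == 0))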
Unlike a function, you do not define the type of the argument y here because you are merely performing a literal text substitution and not invoking a function.

In C, definitions are frequently called macros. This terminology is more often applied to definitions that take one or more arguments. An advantage of implementing something in C as a macro, as opposed to as a function, is that in the former case, the type of the argument is not important. For example, consider a macro called SQUARE that simply squares its argument. The definition
#define SQUARE(x) ((x) * (x))
enables you to subsequently write statements, such as
y = SQUARE (v);
to assign the square of v to y. The parentheses ensure that an argument such as v + 1 is squared as a unit; without them, SQUARE (v + 1) would expand to v + 1 * v + 1. The point to be made here is that v can be of type int, or of type long, or of type float, for example, and the same macro can be used.

One consideration about macro definitions that might be relevant to your application: because the preprocessor substitutes a macro's full definition directly into the program at every place it is used, a macro used many times typically takes more memory space than an equivalently defined function. On the other hand, because a function takes time to call and to return, this overhead is avoided when a macro definition is used instead.


Variable Number of Arguments to Macros

A macro can be defined to take an indeterminate or variable number of arguments. This is specified to the preprocessor by putting three dots (...) at the end of the argument list. The remaining arguments in the list are collectively referenced in the macro definition by the special identifier __VA_ARGS__.
#define debugPrintf(...) printf ("DEBUG: " __VA_ARGS__)
Legitimate macro uses would include
debugPrintf ("Hello world!\n");
debugPrintf ("i = %i, j = %i\n", i, j);
In the first case, the output would be
DEBUG: Hello world!
And in the second case, if i had the value 100 and j the value 200, the output would be
DEBUG: i = 100, j = 200


The #include Statement

After you have programmed in C for a while, you will find yourself developing your own set of macros that you will want to use in each of your programs. But instead of having to type these macros into each new program you write, the preprocessor enables you to collect all your definitions into a separate file and then include them in your program, using the #include statement. These files normally end with the characters .h and are referred to as header or include files.

When the include filename is enclosed in double quotation marks, as in #include "defs.h" (where defs.h stands for one of your own header files), the preprocessor looks for the specified file in one or more file directories (typically first in the same directory that contains the source file, but the actual places the preprocessor searches are system dependent). If the file isn't located there, the preprocessor automatically searches other system directories.

Enclosing the filename within the characters < and > instead, as in #include <stdio.h>, causes the preprocessor to look for the include file in the special system include file directory or directories.

You can actually put anything you want in an include file, not just #define statements, as might have been implied. Using include files to centralize commonly used preprocessor definitions, structure definitions, prototype declarations, and global variable declarations is good programming technique.

Include files can be nested. That is, an include file can itself include another file, and so on.


The typedef Statement

C provides a capability that enables you to assign an alternate name to a data type. This is done with a statement known as typedef.
typedef int Counter;
defines the name Counter to be equivalent to the C data type int.

To define a new type name with typedef, follow these steps:
  1. Write the statement as if a variable of the desired type were being declared.
  2. Where the name of the declared variable would normally appear, substitute the new type name.
  3. In front of everything, place the keyword typedef.
For example, to define a type called Date to be a structure containing three integer members called month, day, and year, you write out the structure definition, substituting the name Date where the variable name would normally appear:
typedef struct {
    int month;
    int day;
    int year;
} Date;
With this typedef in place, you can subsequently declare variables to be of type Date, as in
Date birthdays[100];
This defines birthdays to be an array containing 100 Date structures.


Compiling Multiple Source Files from the Command Line

Suppose you have conceptually divided your program into three modules and have entered the statements for the first module into a file called mod1.c, the statements for the second module into a file called mod2.c, and the statements for your main routine into the file main.c. To tell the system that these three modules actually belong to the same program, you simply include the names of all three files when you enter the command to compile the program.
$ gcc mod1.c mod2.c main.c -o dbtest
Normally, the compiler generates intermediate object files for each source file that it compiles. For example, the compiler places the resulting object code from compiling mod1.c into the file mod1.o by default.

Typically, these intermediate object files are automatically deleted after the compilation process ends. Some C compilers (and, historically, the standard Unix C compiler) keep these object files around and do not delete them when you compile more than one file at a time. This fact can be used to your advantage for recompiling a program after making a change to only one or several of your modules. Suppose that the previous command reported errors only in mod2.c. Because mod1.c and main.c had no compiler errors, the corresponding .o files, mod1.o and main.o, would still be around after the command completed. Replacing the c in a filename such as mod1.c with an o tells the C compiler to use the object file that was produced the last time that file was compiled. So, after correcting mod2.c, the following command line could be used with a compiler (in this case, cc) that does not delete the object code files:
$ cc mod1.o mod2.c main.o -o dbtest
If your compiler automatically deletes the intermediate .o files, you can still take advantage of performing incremental compilations by compiling each module separately using the -c command-line option, which compiles a source file into an object file without linking it. Assuming mod1.o and main.o were produced earlier in the same way, after changing only mod2.c you could enter
$ gcc -c mod2.c
$ gcc mod1.o mod2.o main.o -o dbtest
The second command lists only object files and no source files. In this case, the object files are just linked together to produce the executable output file dbtest.


Communication Between Modules

If a function from one file needs to call a function contained inside another file, the function call can be made in the normal fashion, and arguments can be passed and returned in the usual way. Of course, in the file that calls the function, you should always make certain to include a prototype declaration so the compiler knows the function's argument types and the type of the return value.

It's important to remember that even though more than one module might be specified to the compiler at the same time on the command line, the compiler compiles each module independently. That means that no knowledge about structure definitions, function return types, or function argument types is shared across module compilations by the compiler. It's totally up to you to ensure that the compiler has sufficient information about such things to correctly compile each module.