Introduction to C
This is an introductory essay on C programming. It assumes that you know varying amounts about computers and programming in general. First, I recommend that you purchase The C Programming Language, Second Edition by Brian W. Kernighan and Dennis M. Ritchie (referred to by everyone as K&R2), and also Expert C Programming: Deep C Secrets by Peter van der Linden, and keep both at your side while you program. They are very useful and very handy books. The C language has changed some since the publication of these books, but overall its flavor is still much the same as it has always been.
This essay will focus entirely on “modern C”, that is to say, the ANSI Standard C language dating from 1989. A revised standard was created in 1999, incorporating numerous and sometimes significant changes to the language; however, I will not refer to it (much) here, since its features are of little interest to beginning programmers and (as I write this, in 2001) “C99” compilers are not in wide use, if they are in use at all. Beware: Peter van der Linden and K&R2 occasionally refer to anachronisms which you do not need to know about, and in fact probably should not know about, as they are confusing, useless, and dated.
First, C is a compiled language, like C++ or Fortran (speaking of dated, useless anachronisms) and unlike interpreted languages such as BASIC and DrScheme. Additionally, C has steps that Java (which occupies a middle ground between compilation and interpretation) lacks.
Conceptually, a programmer creates a series of text files, named in the manner *.c and *.h, which make up the source code to a program, and then uses a C compiler on them. The compiler behaves as if it first “preprocesses” the text files and then “compiles” them. You do not need to know the magic that actually goes on behind the scenes; in fact, you should stay away from knowing too much about your compiler. Compilers vary on different systems. If you program in a Windows environment, I suggest DJGPP, which is a port of gcc, the free software compiler best known from the Linux platform.
Preprocessing only acts on the text files themselves. The preprocessor, and the commands used to instruct it, form a sort of primitive proto-language on top of C. In fact, this is how it historically evolved, if you are curious. The C preprocessor is highly useful, and isn’t too “dangerous” if you know how to command it properly. Then compilation occurs, which transforms the preprocessed source code into a working application destined to be run on a system. Never mind how that works.
C programs are built out of “functions”. Functions take data passed to them (“arguments” – it is a nearly standard abbreviation to say “args”), and perform operations based on them. Functions can do other things while they’re at it – in fact, most C functions operate in this manner. Functions can also return a value to the original function that executed, or “called”, them, but they need not. (“Functional programming” takes this to an extreme and declares that doing anything but returning a value is a “side effect” and thus to be avoided. C is a “procedural” or “imperative” language, if you like. Do this, do that – it’s all a C program really is.) No functions are inherently “special” – you can name your functions, and basically everything else you use, whatever you like, as long as you remember that your C compiler reserves some (many) names for itself.
There is one special function, though, and that is the “main” function. The main function is called by whatever magic code gets your program actually up and running, and when the main function returns a value the program is finished.
If you are interested, it is perfectly legitimate for a function to call itself. Such “recursive” functions are not commonly seen in C, but are used occasionally. This applies to all functions: even the main function can be recursive. This is a difference from other languages such as C++. Never mind this too much; our early examples will not use recursion at all.
Functions in C use parentheses around a list of the args passed to them. For example: foo(bar, baz) is a function foo that takes the args bar and baz. If we have a function quux that takes no arguments at all, we “declare” it quux(void) to show that it takes nothing at all, but if we want to actually call the function, we just type the statement:
quux();
(statements in C are terminated with semicolons). This is indeed a minor inconsistency. We cannot declare a function that takes no arguments simply as quux(): because of an unusual and useless tidbit of history, that form says nothing about the arguments at all, rather than saying there are none.
Additionally, if we have a function properly declared as quux(void) and in a program type the statement:
quux;
nothing at all will happen, rather than the desired effect of executing whatever statements are in the quux function. More precisely, the line “quux;” evaluates the “address” of the function quux and then discards it, without ever actually calling the function. Addresses will be covered later.
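As a rough sketch of the declaring and calling rules above, here is a fragment (not yet a complete program, since it has no main). The bodies of foo and quux, the counter quux_calls, and the helper demo are all invented for illustration; the essay only names foo, bar, baz, and quux.

```c
/* Prototypes: foo takes two int args; quux takes none. */
int foo(int bar, int baz);
int quux(void);

/* Hypothetical counter, used only to observe whether quux really ran. */
int quux_calls = 0;

/* Hypothetical body: the essay never says what foo does. */
int foo(int bar, int baz) {
    return bar + baz;
}

int quux(void) {
    quux_calls = quux_calls + 1;
    return quux_calls;
}

/* Returns how many times quux ran during the two statements below. */
int demo(void) {
    int before = quux_calls;
    quux;      /* names the function and discards its address: no call */
    quux();    /* the parentheses make this a real call */
    return quux_calls - before;
}
```

Calling demo() should report exactly one call. A compiler run with warnings enabled will usually complain that the bare “quux;” statement has no effect, which is a helpful hint that the parentheses were forgotten.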
One more thing: the delimiters /* and */ enclose comments in C programs. Each comment is replaced by “whitespace”, essentially a single space, during a particular phase of compilation. C comments also do not nest, if you’re curious.
I think it’s time for our first program. I will include some unnecessary, redundant code in this program, with the intention that it will help you quickly learn how to use functions other than main (which I had trouble figuring out as a beginning C programmer), and will later explain why this quick snippet is unnecessary.
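The “single space” claim can be seen in a small fragment like the one below. The function name answer and the value 42 are made up for the purpose. Nesting is deliberately not shown: an inner opener is just more comment text, and the first closer ends the whole comment.

```c
/* A comment may span
   several lines. */
int answer(void) {
    int/* this comment behaves like one space */x = 42;
    return x;  /* so the line above reads as: int x = 42; */
}
```

Nobody should write code this way, of course; the point is only that the compiler sees “int x” and not “intx”.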
Try taking the code below and putting it into a text file called “hello.c” and executing the command
gcc -Wall -o hello.exe hello.c
or the equivalent on your system:
/* Here is a comment - the compiler ignores these */
/* Begin Hello, world! program */
#include <stdio.h>
int main(void);
int main(void) {
    printf("Hello, world!\n");
    return 0;
}
/* End Hello, world! program! */
If you successfully compile and run this program (which K&R2 correctly notes is the hardest part of learning a language!) it should print:
Hello, world!
and then return to your operating system.
Doubtless you are wondering exactly what you typed. The line
#include <stdio.h>
is a preprocessor directive as mentioned earlier. All commands to the preprocessor begin with # marks and do NOT end with semicolons. This tells the preprocessor “search for the file called ‘stdio.h’ somewhere where the compiler stashed it and put its entire contents right here as if they had been typed here all along”. stdio.h is the Standard Input/Output header file. We need it because the C language itself has no idea what a screen or a display is, and is incapable of doing anything interesting to us like printing stuff on the screen. If we include stdio.h, we get to use a bunch of functions already written for us, which take care of nasty details like sending stuff to the screen and formatting it properly. More about header files later.
The next line in our program is
int main(void);
This is a “function prototype”, and it ends with a semicolon. This tells the compiler “we have a function called ‘main’. It takes no arguments and returns a value of type ‘int’. (‘int’ values are integers, but ‘int’ has a special meaning that will be covered later.) Now that you know about it, I might go and use this function somewhere else in this file! The actual definition of what this function really does is below, but you can’t complain that I haven’t told you what this function really does because I’ve declared its existence to you right here”.
And then we go and immediately define what the function ‘main’ does. Enclosed in
int main(void) {
/* Stuff */
}
is everything that main does. The braces enclose a “compound statement”, in which many “statements” can appear. Statements are terminated by typing a semicolon. Compound statements end with the closing brace – don’t put a semicolon after the closing brace.
If you are curious, in certain circumstances you can get away with not using any braces at all. (If you’re ahead of me: this is when you only intend to put one statement there, and not a compound statement.) But, “to be safe”, always use the braces. Always format them as I have done: put the opening brace on the first line, with a space between it and whatever precedes it, then indent any code that comes below it four spaces, and then finish with a closing brace on a line of its own, at the original indentation level. This is essentially the “One True Brace Style”, and it is a Good Thing.
So, our compiler knows about the main() function, and it will call it upon program execution and clean up stuff once it’s over. But what does main actually do?
printf("Hello, world!\n");
return 0;
As we can see, main calls the function printf and passes one argument to it. Main then returns 0. When we return 0 from main, that means everything went okay.
What is the printf function? We neither gave a function prototype for it, nor did we define what printf does! Actually, we have. Including the header file stdio.h put the correct function prototype for printf at the beginning of our program, just as if we had typed it ourselves. This is good, because the function prototype for printf is pretty nasty. We also have the definition of what printf does already put in there for us by the compiler. (It’s expected that any sane program will use standard I/O functions and the like, so the relevant code is automagically “linked” in by the compiler.) So, that’s good!
What does ‘printf’ mean? It means “Print Formatted”. This is good, because formatting is a generally ugly process and we’d much rather have the compiler writer figure it out for us than use our own brainpower. What, precisely, is passed to printf?
This will actually require a short digression. C by itself is a very simple language. It has no concept of what a “string” is. You might recognize that term from other programming languages: a string is a sequence of characters (letters). Strings are customarily enclosed in double quotes: “I am a string.” When we give something enclosed in double quotes to printf, some introductory books will say we are passing a string to printf. This is not quite true.
When we write "Hello, world!\n" there, we create an array. Never mind exactly what an array is now – you’ll be able to play with your own arrays later. The C compiler, upon seeing "Hello, world!\n", stores the individual characters of that “string literal” somewhere in memory. This little region of memory is automatically “allocated” for you, and is called an array. This array of 15 characters (their data type is actually ‘char’) is then filled – again, all this is done for you – with numbers corresponding somehow to "Hello, world!\n". This unnamed array (you didn’t give it a name, after all – it was automatically made for you because you typed "Hello, world!\n") is then passed to printf, which happily takes it, looks at the numeric values stored in that array, and puts up “Hello, world!” on the screen.
You are almost certainly wondering what that '\n' I kept typing is, and why you had to type it into the Hello, world! program. That is an “escaped newline”. When typing a text file, we use “newlines” to create a new line for us to type with. But what if we want to store a “newline” character somewhere, to tell a printing function that we actually want to print out a new line on the screen? We certainly can’t type something like
“This is one line.
And this is another”
That just won’t work. Rather, if we type '\n' the compiler can recognize this, and knows to stick whatever numeric value corresponds to a “newline” into the unnamed array mentioned above. This keeps it separate from the newlines we have to use ourselves to actually type the code. If you omit the newline '\n' in the Hello, world! program, nothing particularly bad will happen, but whatever prompt you use might be printed immediately after, like:
C:\>hello.exe
Hello, world!C:\>
This is not exactly what you want. By the way, '\n' occupies a single space in our array, since it is a single newline character.
You may also be wondering why I said "Hello, world!\n" creates an array of 15 characters. The Hello, world! itself takes up thirteen characters (five for Hello, one for the comma, one for the space, five for world, one for the exclamation mark). A fourteenth is used by the newline '\n'. The fifteenth element of the array is a “null” character. It is numerically stored as zero, and we can represent the character by '\0'. The characters this array holds are:
'H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\n', '\0'
There are interesting things going on even in our little boring Hello, world! program!
Updates.
This essay will remain unchanged unless bugs are found in various claims it makes.
There is a clarification to be added. String literals do indeed create unnamed arrays of char just big enough to hold the contents of the null-terminated string. And character constants like 'a' do indeed work as described. However, character constants like 'a' have type “int”, and NOT type “char”. This is a fine point. More to follow in an essay on data types.