Updated on 2022-09-03
Understand the difference between headers and source files, what they do, and how they work together.
I'm writing this article for a friend of mine, primarily. However, I'd be doing a disservice to the community if I didn't share this with everyone, so here it is.
We're going to explore headers and source files and what they do. Much of this code works in both C and C++, but the C++-specific code will be identified as such.
I'll keep it mercifully brief, and cover the essentials.
The relationship between headers and source files can be confusing at first. In teaching myself C and C++ years ago, I struggled to understand how the two relate and when to use each.
Part of this was not fully understanding the necessity of prototyping, and also not understanding the linking process in C or C++. We'll be going over that here.
Before a struct, a variable, or a function can be used, it must be declared. With functions, you can and generally should provide a prototype for the function ahead of the function implementation itself. The prototype is essentially just the function signature: that is, the return type, name, and parameters, plus any modifiers like static or (C++) const.
Consider the following:
int sum(int lhs, int rhs);
This is the function prototype for a function named sum. Note that we didn't put { } after it, nor any implementation therein. Instead, it's terminated with a semicolon which indicates to the compiler that this is a prototype rather than the function itself.
Once the prototype has been declared, you may use the function, even if its implementation occurs later in the file, or even in a separate C or C++ source file.
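To make that concrete, here is a minimal single-file sketch. The report() helper is a hypothetical name I made up for illustration; the point is only that it can call sum() before the compiler has seen sum()'s body, because the prototype comes first.

```c
#include <stdio.h>

/* Prototype: tells the compiler the return type and parameter types of sum(). */
int sum(int lhs, int rhs);

/* report() is a hypothetical helper. It can call sum() because the prototype
   above has already been seen, even though the body of sum() comes later. */
int report(void) {
    int result = sum(2, 3);
    printf("2 + 3 = %d\n", result);
    return result;
}

/* The implementation may appear after its first use. */
int sum(int lhs, int rhs) {
    return lhs + rhs;
}
```

If you remove the prototype, most modern compilers will reject or at least warn about the call inside report(), since sum() is unknown at that point.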
When you use #include to include a header file, the compiler (technically the preprocessor) literally copies the contents of the include into the file that includes it, at the line where the #include directive occurs. This happens before any source code is actually compiled.
Therefore, the compilation process itself has no headers in it. It's all been turned into C or C++ source files with the headers copied directly into the source files themselves.
Suppose we put the sum() prototype in a header file (mymath.h) and the implementation in a source file (main.c). Then this all works, starting with mymath.h:
#ifndef MYMATH_H
#define MYMATH_H
int sum(int lhs, int rhs);
#endif // MYMATH_H
You may be asking what's with the #ifndef/#define/#endif stuff? Because of the way #include works in C and C++, it's quite likely that you'll end up including a header multiple times, often because you'll include more than one file that themselves include the same file. The stuff surrounding sum() ensures that only the first include is processed by the compiler.
In C++, a common alternative, which is not part of the language standard but is supported by nearly all modern compilers, is:
#pragma once
int sum(int lhs, int rhs);
You should always use one of these techniques in any header you write.
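Repeating a prototype is actually harmless, so the guard matters most once a header defines a type. Here is a rough sketch of what the compiler sees when a guarded header gets pasted in twice. The point.h header, struct point, and point_sum() are hypothetical names used only for this illustration, and I've pasted the header contents by hand so the example fits in one file.

```c
/* ---- contents of a hypothetical point.h, pasted by the first #include ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
int point_sum(struct point p);
#endif /* POINT_H */

/* ---- the same header pasted again by a second, indirect #include ---- */
#ifndef POINT_H /* POINT_H is already defined, so everything up to #endif is skipped */
#define POINT_H
struct point { int x; int y; };
int point_sum(struct point p);
#endif /* POINT_H */

/* The implementation, which would normally live in an associated source file. */
int point_sum(struct point p) {
    return p.x + p.y;
}
```

Without the guard, struct point would be defined twice in the same file, which C++ compilers and most C compilers reject as a redefinition.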
// main.c
#include <stdio.h>
#include "mymath.h"
int main(int argc, char** argv) {
    printf("2 + 3 = %d\n", sum(2, 3));
}
int sum(int lhs, int rhs) {
    return lhs + rhs;
}
You'll note the angle brackets around the first include. This means that the compiler will search the predefined include folders, which usually hold the "system" and "standard" headers. With quotes, the preprocessor typically searches relative to the directory of the including file first, and then falls back to those same include folders.
In main(), we use sum() without it first having been declared in this file. Normally, we'd have to provide at least a prototype (or the complete implementation) before the function can be used. However, we have - just not in this file, but rather in mymath.h.
Let's view this code more or less in its final form before the compiler begins compiling it.
// <stdio.h> omitted but would otherwise be here
//ifndef MYMATH_H
//define MYMATH_H
int sum(int lhs, int rhs);
//endif // MYMATH_H
int main(int argc, char** argv) {
    printf("2 + 3 = %d\n", sum(2, 3));
}
int sum(int lhs, int rhs) {
    return lhs + rhs;
}
That's what the compiler "sees" (minus the comments, which I simply added above to make everything clear.)
In this rendition, the prototype exists before the function is used, which satisfies the compiler.
Especially astute readers might be wondering why we go through the trouble of having both headers and source files, when we could put everything in headers, and just use one source file with all the headers in it.
You'll quickly find that doesn't work, especially if you ever have multiple source files that include the same header. You'll get linker errors due to duplicate function implementations.
In general, you want to keep your struct or (C++) class declarations and function prototypes in headers, and the implementation for those things in an associated source file. It is possible to create "header only" libraries, and they can have some advantages as well as some drawbacks. Creating header only libraries is beyond the scope here.
There are some exceptions to this in C++, such as templates. The compiler needs to see a template's full implementation wherever the template is instantiated, so splitting it into a prototype and a separate implementation is generally impractical. Ergo, template code, including the implementation, usually belongs in the header. Another case in C++ is where you have an inline function. Its implementation normally goes in the header too, so that every source file that calls it can see it.
Your compiler creates one binary file (an object file) for each source file in your project. The linker then takes those object files and combines them into a single executable file.
Multiple source files allow you to organize your source code better, and also make it more realistic to include source code from third parties in your project.
As steveb noted in the comments on this article, it also allows the build to recompile only the source files that need it, so if one changes, the others don't have to be recompiled.
As mentioned before however, you have to design your header files with only prototypes and type declarations in them or you will not be able to use that header in more than one source file, because there will be duplicate implementations of the function, even though they are in different binaries. As soon as the linker attempts to link the binaries together it will fail because it found more than one copy of the function, and it doesn't know which one to use.
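The difference between repeating a prototype and repeating an implementation can be sketched in a single file. This is only an analogy I'm assuming for illustration: in a real project the two prototypes would arrive through #include in two different source files, and the duplicate implementation would show up as a linker error rather than a compiler error.

```c
/* Prototype as main.c would see it after including mymath.h. */
int sum(int lhs, int rhs);

/* The identical prototype again, as a second source file would see it.
   Repeating a matching declaration is allowed. */
int sum(int lhs, int rhs);

/* The one and only implementation, which belongs in a single source file. */
int sum(int lhs, int rhs) {
    return lhs + rhs;
}

/* Uncommenting a second implementation is an error. Within one file the
   compiler reports a redefinition; across two files the linker reports
   a duplicate symbol.
int sum(int lhs, int rhs) { return lhs + rhs; }
*/
```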
Generally speaking, when you create a header file, it should have an associated source file with the same base name, but a different extension. For example, we might have a source file called mymath.c to complement mymath.h:
#include "mymath.h"
int sum(int lhs, int rhs) {
    return lhs + rhs;
}
Note that we included the associated header. This isn't always strictly necessary, but often it is, and it's good practice because it helps the reader understand which header file belongs to it. It also lets the compiler check that the prototype in the header agrees with the implementation.
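Here is a small sketch of why including the associated header pays off, with the contents of mymath.h pasted in by hand so the example fits in one file. The mismatched long version in the comment is a hypothetical mistake I made up to show what the compiler would catch.

```c
/* ---- mymath.h, as pasted in by #include "mymath.h" ---- */
#ifndef MYMATH_H
#define MYMATH_H
int sum(int lhs, int rhs);
#endif /* MYMATH_H */

/* ---- mymath.c ---- */
/* Because the prototype above is visible here, a C compiler would report a
   mismatched implementation such as
       long sum(long lhs, long rhs) { return lhs + rhs; }
   as a conflicting declaration, instead of letting the header and the
   source file silently drift apart. */
int sum(int lhs, int rhs) {
    return lhs + rhs;
}
```

Note that this check is specific to C. In C++, a version with different parameter types would simply be treated as a separate overload.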
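Now in main.c, we need to remove the sum() implementation. When you compile the code, you'll specify both source files. Depending on your toolchain, you either pass multiple source files to the compiler which invokes the linker for you, or you compile each source file and run the linker yourself as a separate step. With GCC, for example, gcc main.c mymath.c -o app does both at once (app is just an output name I picked), while gcc -c main.c and gcc -c mymath.c followed by gcc main.o mymath.o -o app does it in separate steps.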
Hopefully, this clears up some of the mystery behind C and C++ header and source files. Armed with this knowledge, you should be able to both better organize your projects, and understand other projects. Happy coding!