
[bjb: Re: [OCLUG-Tech] Re: Parsing when compiling C - generalized understanding question?]

  • Subject: [bjb: Re: [OCLUG-Tech] Re: Parsing when compiling C - generalized understanding question?]
  • From: "Brenda J. Butler" <bjb [ at ] istop [ dot ] com>
  • Date: Thu, 29 Sep 2005 23:26:05 -0400
----- Forwarded message from "Brenda J. Butler" <bjb> -----

Date: Thu, 29 Sep 2005 23:25:34 -0400
From: "Brenda J. Butler" <bjb>
Subject: Re: [OCLUG-Tech] Re: Parsing when compiling C - generalized understanding question?
To: William Case <billlinux [ at ] rogers [ dot ] com>

On Thu, Sep 29, 2005 at 08:22:12PM -0400, William Case wrote:
> It seems that 'make' through makefile runs as a script that groups all
> the files that have to be compiled or re-compiled together so that they
> can be compiled or linked in one directory.  I know it does more but
> that appears to be the core idea.

Close.  Make is like a souped-up shell script that compiles
all your files for you.

The extra feature make offers, besides collecting a bunch
of commands in one place, is:

    dependencies:  Through the makefile, you can tell make
                   which executables or libs are created from
                   which sources.  So when a change is
                   made to certain sources, only the
                   affected objects, executables and libs are
                   re-generated.

> Somewhere in that process makefile must actually compile the various
> files, yet when I looked in the default makefile I could not see a
> command that would start compilation.  Does the default makefile use
> gcc, or, another perhaps built-in compiler?

Makefiles are, as you say, the config file for the make
program.  However, make has a lot of "built-in" rules
and dependencies.  It knows it has to run the C compiler
to change a .c file into a .o file (and it has a lot
of other rules built-in).  You can get make to dump out
all its default rules with an option:  make -p -f/dev/null

Makefiles have various types of entries.  Two of the most
common are macro definitions and rules.  A macro definition
looks like:

CFLAGS = -g -Wall

and a rule looks like:

target: objs
	$(CC) $(CFLAGS) objs -o target

The line (or lines) in the rule indented by tabs are essentially
a little shell script.  Those lines really do get executed by a
shell.  In this case, you can see the C compiler is being called
to compile or link objs into target.

In some makefiles, the author relies entirely on built-in rules
and you might not see such rules unless you dump out the database.

It gets a lot more complicated, but that's a start for you.

> Now, suppose I got the compiler going (I have been using gcc -g foo.c -o
> foo). I then have the following questions:
> 
> Quoting from 'info gcc'.
> 
> "Compilation can involve up to four stages, always in the following
> order:
> 
> *preprocessing
> *compiling
> *assembling
> *linking
> 
> The first three stages apply to an individual source file:"
> 
> "preprocessing establishes the type of source code to process"
> 
> "preprocessor         A program invoked by various compilers to process
> code before compilation.  For example, the C preprocessor, cpp,
> handles textual macro substitution, conditional compilation and
> inclusion of other files."
> 
> I suppose from the above, preprocessing finds and/or adds the files in
> #include or replaces all the constants with the name with the value
> given in #define  value?
> 
> I am assuming that any #statements are what are called macros and the
> preprocessor takes care of macros?
> 
> Does preprocessor perform any other core functions? I have a list of
> options or commands.  When would I use them?  Or, should I just forget
> about them for the time being?

Preprocessors do text substitution, inclusion of other files, and
conditional inclusion/exclusion.  This gets done before the compiler
sees it.

Yes, the # statements are preprocessor directives, of which
macros are one kind.

> Is there a specific call in gcc to cpp to perform the preprocessing, or,
> does it have its own built-in preprocessor?

gcc calls the preprocessor.  You can ask gcc to stop after
preprocessing and dump out the results for you to inspect:

gcc -E input -o output

> If the gcc preprocessor is built-in when would I use cpp?
> 
> "compiling produces an object file,"
> 
> Brenda says:
> 
> "Well, the gcc is the Gnu Compiler Collection.  It has front ends for
> reading and parsing programs (a front end for each language), back ends
> for spitting out machine code (a back end for each architecture/os), and
> a middle section that connects the front end and back end and does
> optimization."
> 
> After preprocessing does gcc front end call flex and then yacc?

No, not at compile time.  The lexer and parser are built into
the compiler's front end.

gcc calls the preprocessor.

Then gcc compiles.
The compilation step has those three components (front-end,
middle, back-end).
Then gcc calls the linker.

> In a few words, what is optimization -- all the blanks are gone; all the
> syntax has already been rearranged?

Bart could probably do a much better job answering this one,
but anyway:  things that might get done in the optimization step
are getting rid of intermediate variables if you can make do
with fewer, re-arranging the code to be more efficient,
etc.  Removal of dead code and unused variables.  Moving
code into or out of loops.  That kind of thing.

> Then does it call what?? Is there a separate program that does spitting
> out of machine code?  Or, is that part of the gcc coding?

The compile stage is all one program.

However, a Fortran compiler might share a middle section and backend
with a C compiler.  How this is implemented, I'm not sure.  Does
it use shared objects (equivalent of dll as you mention above) or
does each compiler just get built from a shared set of sources?
Something like that.

> It sounds like the preprocessor, assembler and linker do everything?

For regular programs, written in a programming language
like Fortran or C, yes.

> "assembling establishes the syntax that the compiler expects for
> symbols, constants, expressions and the general directives."
> 
> This sounds like the lexicon and syntax stage, but apparently that has
> been already done by the compiler?
> 
> It sounds like the preprocessor, compiler and linker do everything, why
> do I need the assembler?

Sometimes compiler backends (gcc's included) produce assembly
language rather than actual machine code.  Then the last step is
to assemble the assembly language into machine code.

> If I print the file after the assember stage, I can see my source code
> changed into assembler code, can't I?  Shouldn't I see assembler code
> after the compiler is finished?
> 

You can get gcc to stop after that stage too:

gcc -S input.c -o input.s

You will see assembly language code (which is dressed-up
machine code).

> "The last stage, linking, completes the compilation
> process, combining all object files (newly compiled, and those
> specified as input) into an executable file."
> 
> Is this where the code (after being twisted into a meaningful lexicon
> and syntax, or assembler code) finally gets turned into binary?

No, it gets changed to binary in the compile stage, of which
assembly is the last part (it's the spit-out-machine-code part).

> How does the machines instruction set get used then?
> 
> "If you only want some of the stages of compilation, you can use -x (or
> filename suffixes) to tell gcc where to start, and one of the options
> -c, -S, or -E to say where gcc is to stop."
> 
> How would you do that?  Wouldn't you have to name the file+suffix?

See above...

> "ld combines a number of object and archive files, relocates their data
> and ties up symbol references. Usually the last step in compiling a
> program is to run ld."
> 
> Does gcc call ld or does it have its own linker?

gcc calls ld for you.

> Is this when the functions referenced through the header #include get
> added to the compiled program?

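#include is a preprocessor step; it happens before compilation.
A header usually only brings in declarations.  The compiled code
for those functions gets added (or referenced) at link time.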

> Is this just a reference or is the actual binary for the function added
> into the compiled file?

Hey!  We're way past two questions here!

> Isn't there something about dynamically linked files (dll) that applies
> here? What are they in relation to a compiled program?

Dynamically linked files are libraries that get linked at
run time instead of at compile/link time.

> Lastly, why would a distribution like Fedora Core 4 install all this
> compiling paraphernalia?  Its not a vicesious question.  Is there some
> use for extra preprocessors, assemblers, etc. that I am unaware of? 

The software developers who make the distro leave it in.

"facetious"

> My claiming that a question is probably a stupid one is an old habit.  I
> used to work professionally analyzing provincial government programs
> and/or businesses in order to write communications plans or business
> plans.  When it came to asking the core questions I got in the habit of
> prefacing those questions with "This may be a dumb question."  It was
> partly a joke or tease.  If the question was genuinely core, the
> respondent usually got either very quiet or began to talk really fast.
> If it was really a stupid question and showed that I had missed the
> mark, then the joke was on me.
...
> My wife, if she were here, would reassure you that protecting Bill's ego
> is probably pretty low on the list of things that need to be done in
> this world.

Ok, but don't threaten to ask privately again!  :-)

cheerio,
bjb

----- End forwarded message -----