
Re: [OCLUG-Tech] Re: Parsing when compiling C - generalized understanding question?

  • Subject: Re: [OCLUG-Tech] Re: Parsing when compiling C - generalized understanding question?
  • From: William Case <billlinux [ at ] rogers [ dot ] com>
  • Date: Thu, 29 Sep 2005 20:22:12 -0400
Thanks Brenda, Martin and Normand,

On Thu, 2005-09-29 at 13:28 -0400, Brenda J. Butler wrote:
> On Thu, Sep 29, 2005 at 01:05:17PM -0400, Martin Hicks wrote:
> > 
> > On Thu, Sep 29, 2005 at 12:24:45PM -0400, William Case wrote:
>  
> > > I am trying to develop an overview understanding of what happens when
> > > parsing any little C program I have written.  Where would I find the
> > > rules that the parser uses to translate specific symbols?  For example,
> > > I would like to see exactly what happens with the "{ }" braces or the
> > > ";" semi-colon when translated from source code to object code or
> > > binary.  But my question is not only about those two symbols alone.  I
> 
> Well, GCC is the GNU Compiler Collection.  It has front ends for reading
> and parsing programs (a front end for each language), back ends for spitting
> out machine code (a back end for each architecture/os), and a middle section
> that connects the front end and back end and does optimization.
> 
> The middle section deals with the code in a language- and machine-independent
> format.
> 
> So, you won't see the gcc compiler converting a { into machine or
> binary code.
> 
> Have fun with that :-)
> 
> ...
> 
> > Wow.  So this isn't just an easy question that someone can answer.  What
> > happens during parsing of a source language?  Lots.  Far too much to try
> > to explain, but here's a little summary of stuff you should go look
> > into.
> 
> If you *really* want to know more about how compilers work you should:

> Read the dragon book

I have heard of the dragon book and I hope to peruse it in the near
future.

> Read O'Reilly's lex & yacc

More than I need to know I think.

> Take an undergrad compilers course at university

Universities and I will never mix.

> Read the gcc sources and be prepared to welcome death

I am ready to accept death soon.

I have no intention of ever building a compiler, although the logic of
using things like BNF or EBNF seems intriguing.  In my reading, for
example, I saw the rule of reducing /* */ comments to a single space.  I
was hoping that somewhere in the literature, or easily accessible through
the source code, I could find the other rules.  Brenda has supplied me
with a verbal answer about the use of braces and semicolons, and I don't
distrust any advice that she would give me.  However, as I was reading
about compiling I thought, "Just maybe there is an easy way to see for
myself what is happening internally when the compiler translates."  It
wouldn't have been the first time that something I thought was hard to
follow in Linux turned out in fact to be easy.  Alas, apparently not
this time.

> > > I have a couple of other short dumb questions about compiling. If
> > > someone is willing to answer them they can email me directly and I will
> > > send them to you?  I just think they might be boring for the people on
> > > this list, and embarrassingly stupid for me.  To be honest, most of
> > > these questions I could probably sort out for myself but I have already
> > > spent more than a day on getting a compiling overview and a little help
> > > would be gratefully received.
> > 
> > Ask the list.  If dumb questions get answered once then maybe, just
> > maybe, the next person who wants to ask the same dumb question will
> > search the archives first.

I am going to take you at your word and ask the following questions
about compiling.

OK, here goes!

In my info index I have at least the following files related to
compiling: make, gcc, flex, yacc, bison, cpp, ld, as, NASM, gdb, DDD and
I am sure many others I haven't recognized.

I have read at least the introduction and overview of all these files,
and much more for some of them.  The quoted text below consists of
explanations I have saved concerning the use of these programs, taken
from Info, Red Hat manuals or Wikipedia.

My basic dumb question is: What are they all for?  
[That's a rhetorical question.]

To flesh out the question more:

It seems that 'make', driven by a makefile, runs as a script that groups
all the files that have to be compiled or recompiled together so that
they can be compiled or linked in one directory.  I know it does more,
but that appears to be the core idea.

Somewhere in that process make must actually compile the various files,
yet when I looked in the default makefile I could not see a command that
would start compilation.  Does the default makefile use gcc, or another,
perhaps built-in, compiler?

Now, suppose I got the compiler going (I have been using gcc -g foo.c -o
foo). I then have the following questions:

Quoting from 'info gcc'.

"Compilation can involve up to four stages, always in the following
order:

* preprocessing
* compiling
* assembling
* linking

The first three stages apply to an individual source file:"


"preprocessing establishes the type of source code to process"

"preprocessor         A program invoked by various compilers to process
code before compilation.  For example, the C preprocessor, cpp,
handles textual macro substitution, conditional compilation and
inclusion of other files."

I suppose from the above that preprocessing finds and inserts the files
named in #include, and replaces each name defined with #define by the
value given?

I am assuming that any #statements are what are called macros and the
preprocessor takes care of macros?

Does the preprocessor perform any other core functions?  I have a list
of its options or commands.  When would I use them?  Or should I just
forget about them for the time being?

Is there a specific call in gcc to cpp to perform the preprocessing, or
does it have its own built-in preprocessor?

If the gcc preprocessor is built in, when would I use cpp?

"compiling produces an object file,"

Brenda says:

"Well, GCC is the GNU Compiler Collection.  It has front ends for
reading and parsing programs (a front end for each language), back ends
for spitting out machine code (a back end for each architecture/os), and
a middle section that connects the front end and back end and does
optimization."

After preprocessing, does the gcc front end call flex and then yacc?

In a few words, what is optimization?  Are all the blanks gone and all
the syntax already rearranged by then?

Then what does it call?  Is there a separate program that spits out the
machine code, or is that part of gcc itself?

It sounds like the preprocessor, assembler and linker do everything?

"assembling establishes the syntax that the compiler expects for
symbols, constants, expressions and the general directives."

This sounds like the lexicon and syntax stage, but apparently that has
been already done by the compiler?

It sounds like the preprocessor, compiler and linker do everything, so
why do I need the assembler?

If I print the file after the assembler stage, I can see my source code
changed into assembler code, can't I?  Shouldn't I see assembler code
after the compiler is finished?

"The last stage, linking, completes the compilation
process, combining all object files (newly compiled, and those
specified as input) into an executable file."

Is this where the code (after being twisted into a meaningful lexicon
and syntax, or assembler code) finally gets turned into binary?

How does the machine's instruction set get used, then?

"If you only want some of the stages of compilation, you can use -x (or
filename suffixes) to tell gcc where to start, and one of the options
-c, -S, or -E to say where gcc is to stop."

How would you do that?  Wouldn't you have to name the file+suffix?

"ld combines a number of object and archive files, relocates their data
and ties up symbol references. Usually the last step in compiling a
program is to run ld."

Does gcc call ld or does it have its own linker?

Is this when the functions referenced through the header #include get
added to the compiled program?

Is this just a reference or is the actual binary for the function added
into the compiled file?

Isn't there something about dynamically linked files (dll) that applies
here? What are they in relation to a compiled program?


> I agree.  We don't usually find your questions dumb anyway, often
> they seem simple but spark some interesting discussions.
> 
> So if you finally do ask a dumb one, you've earned the right :-)
> 
> cheerio,
> bjb

Lastly, why would a distribution like Fedora Core 4 install all this
compiling paraphernalia?  It's not a facetious question.  Is there some
use for extra preprocessors, assemblers, etc. that I am unaware of?

Regards Bill

P.S. For Brenda.  Brenda, I appreciate your comforting me and my small
ego with the reassurances that you have given me on several occasions
that my questions are not dumb or stupid.  If you can keep a secret, I
will make a confession.

My claiming that a question is probably a stupid one is an old habit.  I
used to work professionally analyzing provincial government programs
and/or businesses in order to write communications plans or business
plans.  When it came to asking the core questions I got in the habit of
prefacing those questions with "This may be a dumb question."  It was
partly a joke or tease.  If the question was genuinely core, the
respondent usually got either very quiet or began to talk really fast.
If it was really a stupid question and showed that I had missed the
mark, then the joke was on me.

Friends who know me usually answer one of three ways:
1) By responding thoughtfully,
2) By listening carefully and then pointing out that I have mis-posed
the question and should have asked an alternative one, or,
3) By saying something like "Yeah, you're right, that is a really stupid
question."

My wife, if she were here, would reassure you that protecting Bill's ego
is probably pretty low on the list of things that need to be done in
this world.

Thanks for all your help.