
Re: [OCLUG-Tech] Re TCP Ports -- and the little guy listening

  • Subject: Re: [OCLUG-Tech] Re TCP Ports -- and the little guy listening
  • From: Adrian Irving-Beer <wisq-oclug [ at ] wisq [ dot ] net>
  • Date: Wed, 27 Jul 2005 12:23:41 -0400
On Wed, Jul 27, 2005 at 11:16:58AM -0400, William Case wrote:

> I am particularly confused by the concept that something is
> "listening" on certain ports.  I hope there isn't a little guy in my
> computer with big ears, because he is probably dead by now.  I
> haven't been feeding him.

No more than there are people singing inside your radio. :)

> If someone wants to send me an email, their 'sendmail' calls on
> various functions?? from a library?? that adds an IP and a TCP
> header that contains my machine's (and rogers.com) address and other
> info to a datagram(s).

Sorta.  The kernel itself handles the TCP negotiations over the IP
protocol.  I'm a bit rusty on C socket programming, but here's what the
manpages suggest is happening on sendmail's side.  First,

	fd = socket(PF_INET, SOCK_STREAM, 0);

which creates an IPv4 (PF_INET) socket that provides a reliable,
ordered, connection-based byte stream (SOCK_STREAM).  Since TCP is the
standard IP protocol that fits that specification, it doesn't need to
ask for TCP specifically; it simply passes '0' for "give me the default
protocol".

Now, sendmail needs to know who to talk to in order to reach your
domain.  So it would do various DNS lookups.  These go through the
resolver library rather than a single system call (something like
res_query() to fetch the MX record, though the details elude me).
Let's say it stores the resulting address in the_address.

It might also want to resolve "smtp" to a port number.  So it'd do

	serv = getservbyname("smtp", "tcp");	/* struct servent *; NULL if unknown */
	the_port = ntohs(serv->s_port);		/* s_port is in network byte order */

Then, it fills in the structure of the address it wants to connect to:

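	struct sockaddr_in addr;
	memset(&addr, 0, sizeof(addr));			/* zero the unused fields */
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = inet_addr(the_address);	/* dotted-quad string */
	addr.sin_port = htons(the_port);		/* host -> network byte order */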

Finally, it makes the connection.

	connect(fd, (struct sockaddr *)&addr, sizeof(addr));

Apart from the bookkeeping arguments, this asks the kernel to connect
the socket we made (way up at the top) to your mail server.  That's
actually all that sendmail does, aside from error-checking -- making
sure it really got a socket, that it really found an address via DNS,
that it found the right port, and that the connect() succeeded.
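The steps above, with the error-checking just described, could be
sketched roughly as follows.  This is a minimal illustration, not
sendmail's actual code: the helper names fill_addr() and
connect_to_ip() are hypothetical, it assumes the DNS step has already
produced a dotted-quad IPv4 string, and it uses inet_pton() in place of
inet_addr() because inet_addr() cannot tell an error apart from the
valid address 255.255.255.255.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill in an IPv4 address structure.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad string. */
int fill_addr(struct sockaddr_in *addr, const char *ip, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);               /* host -> network byte order */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: open a TCP connection to ip:service
 * (e.g. "smtp"). Returns a connected descriptor, or -1 on any failure. */
int connect_to_ip(const char *ip, const char *service)
{
    struct servent *serv = getservbyname(service, "tcp");
    if (serv == NULL)
        return -1;                              /* unknown service name */

    struct sockaddr_in addr;
    if (fill_addr(&addr, ip, ntohs((unsigned short)serv->s_port)) != 0)
        return -1;                              /* bad address */

    int fd = socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;                              /* couldn't get a socket */

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);                              /* don't leak the descriptor */
        return -1;
    }
    return fd;
}
```

A caller would then do something like `int fd = connect_to_ip(mx_ip,
"smtp");` and check for -1, where mx_ip (also hypothetical) is whatever
the DNS step produced.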

After that, all sendmail does is poll the socket using select(), read
from it using read() (when data is available), and write to it using
write().  Everything else is up to the kernel.
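One detail hidden behind "write to it using write()": on a stream
socket, write() may accept fewer bytes than were asked for, so programs
usually loop until everything has been handed to the kernel.  A minimal
sketch (write_all() is an illustrative name, not a standard function):

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying after partial writes
 * and signal interruptions. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* real error, e.g. connection reset */
        }
        p += n;                 /* skip past what the kernel accepted */
        len -= (size_t)n;
    }
    return 0;
}
```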

> In particular, on the TCP segment header bits 17 to 32 contain the
> receiver's (my) port number for smtp mail i.e. 25 - the standard smtp
> port number.

Yes.  But it's a multi-stage process.  It involves initiating the
connection (sending SYN), receiving the connection acknowledgement
(SYN+ACK), acknowledging that (ACK), waiting for the SMTP banner (PSH),
acknowledging that and sending the HELO command (ACK+PSH), waiting for
acknowledgement and response (ACK+PSH), etc.
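To tie this back to the header bits in the question: the destination
port is the second 16-bit field of the TCP header (bytes 2 and 3, i.e.
bits 16 to 31 counting from zero, or 17 to 32 counting from one), and
the SYN, ACK and PSH flags live in byte 13.  A small decoding sketch,
assuming hdr points at the start of a raw TCP header; the struct,
function and flag-macro names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header. */
#define TCPFLAG_FIN 0x01
#define TCPFLAG_SYN 0x02
#define TCPFLAG_RST 0x04
#define TCPFLAG_PSH 0x08
#define TCPFLAG_ACK 0x10
#define TCPFLAG_URG 0x20

struct tcp_summary {
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  flags;
};

/* Decode the fields discussed above; returns -1 if the buffer is too
 * short to hold a minimal (20-byte) TCP header. */
int tcp_decode(const uint8_t *hdr, size_t len, struct tcp_summary *out)
{
    if (len < 20)
        return -1;
    out->src_port = (uint16_t)((hdr[0] << 8) | hdr[1]);  /* big-endian */
    out->dst_port = (uint16_t)((hdr[2] << 8) | hdr[3]);
    out->flags    = hdr[13];
    return 0;
}
```

A SYN+ACK from the server, for instance, has flags equal to 0x12.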

The mail is not just instantly sent out; there's plenty of
negotiation going on.  Generally, it goes

	<-- 220 (server banner)
	--> HELO (my-domain-name)
	<-- 250 (acknowledge command)
	--> MAIL FROM: <my_address@my_domain>
	<-- 250 (ditto)
	--> RCPT TO: <actual_recipient@domain>
	<-- 250 (ditto)
	..> (repeat RCPT TO for all recipients, including CC and BCC)
	<.. 250 (ditto each time)
	--> DATA
	<-- 354 (acknowledgement and wait for mail body)
	--> (send entire message; period when done)
	--> .
	<-- 250 (acknowledge sending of mail)
	--> QUIT
	<-- 221 (acknowledge and disconnect)

	(connection closed by remote host)
	
> The sender's email is sent out; bounces around the internet for a bit
> and ends up at rogers.com (my IP provider).

The packets bounce around the Internet, yes.  But the mail itself is
typically delivered directly to the rogers.com servers, unless the
sending host is configured to send all mail via a relay.  In that case
it would hit the relay, which would then deliver directly to you.  Mail
doesn't bounce around like packets do; typically, it's a very short or
entirely direct process.

> Next, after I check rogers.com using my emailer, Evolution, and see
> that there is an email for me, I use Evolution (which uses
> fetchmail) to download my mail (or at least its headers - I have a
> pop account).

I'm guessing Evolution connects and sees what you have, then lets
fetchmail do the dirty work.  Okay.

> The message arrives through my cable modem as a series of 1 or 0
> electrical pulses. (Or, for a cable modem, does it still arrive as
> analogue sound pulses that are translated by the modem into electric
> (voltage) pulses?)

Dunno for sure, but I'm pretty sure it's digital, to get the kind of
speed it does.  I'm not a hardware geek.

As for the rest:

I can see your confusion.  Actually, fetchmail is designed to limit
itself to connecting to a server and retrieving the mail.  It doesn't
care about delivery whatsoever.  Instead, it hands the mail to a
*local* mail transfer agent (MTA), which could be sendmail, Postfix,
Exim, etc.

It's then your MTA's job to deliver the mail, and how it does this is
up to the MTA.  For something like Postfix, it's a multi-step process.
First, fetchmail hands the message to the MTA: by default it connects
to port 25 on the local machine, though it can also be told (with its
--mda option) to run a delivery program instead.  Either way, the MTA
receives the message, destined for your local mailbox.

Postfix, for instance, would just put the message into a queue and let
the later stages handle it.  Actually delivering the message is *not*
the responsibility of either the SMTP daemon (which is only concerned
with talking to incoming clients) or the sendmail-compatible binary
(which is only concerned with injecting mail into the system and
returning control to the calling application).  From there, various
daemons within Postfix move the mail along, until finally the last
daemon drops it into your mail spool file.

sendmail might take a more direct approach, not sure.

I'll see what I can do about the remaining questions...

> The instant the message arrives it has to be stored somewhere
> temporarily and a signal or interrupt has to be sent to the CPU to
> tell the CPU to process a message.

As the message arrives as 1's and 0's "down the pipe", fetchmail is
waiting for it.  It's actually using select(), asking the system to
wake it up every time more data arrives to process.  Then, it
processes the data.  It either stores it in memory, or perhaps just
pipes it directly to the MTA.  All this is internal to fetchmail.
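The "stores it in memory" option could look something like this: keep
calling read() and growing a buffer until the other end closes the
connection, which read() reports by returning 0.  This is a generic
sketch, not fetchmail's actual code, and read_all() is a hypothetical
name:

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from fd until end-of-file, returning a malloc()ed buffer
 * (NUL-terminated for convenience) and storing its length in *len.
 * Returns NULL on error. The caller must free() the result. */
char *read_all(int fd, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (used + 1 >= cap) {              /* keep room for the NUL */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + used, cap - used - 1);
        if (n == 0)
            break;                          /* peer closed: all data received */
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: read again */
            free(buf);
            return NULL;
        }
        used += (size_t)n;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```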

> Where is it stored?

fetchmail either keeps it in memory until it's ready to send it to the
MTA, or is piping it directly to the MTA, not sure.  Your MTA writes
it to a temporary file, internal to itself, until it's ready to dump
it in your mailbox file.

> Who or what sends the signal or interrupt?

Initially, the Ethernet card talking to your Rogers modem.  It raises a
hardware interrupt to notify the kernel that it has data ready.

> Does it use a stack or what?  I am assuming all of the above
> happens before the message can even be allocated to a port number.
> The CPU has to then call a library function that processes the TCP
> segment header looking for the port number. -- ???

All I really know is that the kernel reads data from the Ethernet card.
It processes it, determines it's IP, determines it's TCP, determines
it's part of the fetchmail connection (by matching the addresses and
ports, and checking the sequence number), reads the data inside, wakes
up fetchmail, and hands fetchmail the data on request.
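The "is this part of the fetchmail connection?" step is essentially a
lookup keyed on four values: remote address, remote port, local
address and local port.  Here is a toy model of that lookup, using a
linear scan over a hypothetical table; real kernels use hash tables and
keep far more state per connection:

```c
#include <stddef.h>
#include <stdint.h>

/* One established connection, identified by its 4-tuple.
 * owner_pid is a stand-in for "which process's socket this is". */
struct conn {
    uint32_t remote_addr, local_addr;
    uint16_t remote_port, local_port;
    int owner_pid;
};

/* Find the connection an incoming segment belongs to.
 * The segment's source is our remote end; its destination is us. */
const struct conn *demux(const struct conn *table, size_t n,
                         uint32_t src_addr, uint16_t src_port,
                         uint32_t dst_addr, uint16_t dst_port)
{
    for (size_t i = 0; i < n; i++) {
        const struct conn *c = &table[i];
        if (c->remote_addr == src_addr && c->remote_port == src_port &&
            c->local_addr == dst_addr && c->local_port == dst_port)
            return c;
    }
    return NULL;   /* no established connection; next, check listeners */
}
```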

> When it finds the port number -- 25 in our example -- it has to
> check against a table? of some kind for the memory address for port
> 25. -- ???

Port 25 doesn't actually enter into this (the initial retrieval of the
mail), since fetchmail makes an outbound connection to the POP3 server
(normally port 110).  But when fetchmail is ready to deliver, it
connects to your local machine on port 25.

Thing is, your MTA is already listening on port 25.  That means it has
registered its desire to accept connections.  When a connection is made,
the process is notified (generally by rousing it from its select() sleep
state) and accept()s the connection.

From that point on, the connection acts like a "pipe", where each end
just reads and writes without concern for how the data gets to the
other end.  The kernel's TCP stack is responsible for ensuring that
data is sent and received, that it arrives in the proper order, and
that the processes are notified if the connection appears to be dead.

> Is the memory address dynamic or static; i.e. is a permanent memory
> address set aside when the port is originally opened or is it set as
> needed?  In either case would not the CPU have to call a special
> routine to set port numbers to memory addresses and save that
> somewhere?
>
> Another routine, as I understand it, would have to move the message from
> its temporary location to the port memory address. -- ??.

Dunno.  I doubt it's per port.  There's probably just a buffer space
that holds stuff (probably per connection), and when a process asks
for the data sitting on its connection, the data is returned.  That's
also why kernels typically impose a maximum number of open sockets --
you need buffer space and processing time for all those.
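The per-connection guess matches what the sockets API exposes: each
socket has its own kernel receive buffer, whose size can be queried
(and adjusted) with the SO_RCVBUF socket option.  A short sketch;
recv_buffer_size() is a hypothetical wrapper name:

```c
#include <sys/socket.h>

/* Return the size in bytes of fd's kernel receive buffer, or -1. */
int recv_buffer_size(int fd)
{
    int size = 0;
    socklen_t optlen = sizeof size;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &optlen) < 0)
        return -1;
    return size;
}
```

On Linux, the value reported is double whatever was set with
setsockopt(), since the kernel reserves the extra space for its own
bookkeeping.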

> Before or after it goes into memory the TCP/IP segment headers have
> to somehow be removed. --??

Once the kernel has used the headers to work out where the data
belongs, it discards them and puts only the data itself into the
connection's buffer.
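The headers aren't stripped blindly, though: each header records its
own length, which is how the stack knows where the data starts.  A
sketch, assuming pkt holds one complete IPv4 packet carrying TCP
(tcp_payload_offset() is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the TCP payload within an IPv4 packet,
 * or -1 if the packet is too short or not IPv4/TCP. */
long tcp_payload_offset(const uint8_t *pkt, size_t len)
{
    if (len < 20 || (pkt[0] >> 4) != 4 || pkt[9] != 6)
        return -1;                       /* not IPv4, or not TCP (6) */
    size_t ip_len = (size_t)(pkt[0] & 0x0F) * 4;          /* IHL, 32-bit words */
    if (ip_len < 20 || len < ip_len + 20)
        return -1;
    size_t tcp_len = (size_t)(pkt[ip_len + 12] >> 4) * 4; /* data offset */
    if (tcp_len < 20 || len < ip_len + tcp_len)
        return -1;
    return (long)(ip_len + tcp_len);     /* everything after this is data */
}
```

With no options present, both headers are 20 bytes, so the data starts
at offset 40.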

> Also, somehow another routine has to send another signal or
> interrupt to tell the CPU to call the MTA ?? to begin processing
> the mail.

These processes are typically sitting in a select() loop, i.e. asleep
(possibly for a limited amount of time, if they request a timeout for
select()).  To wake them up, the kernel simply lets their select() call
return.  From the results of select(), the MTA (or fetchmail)
now knows that there's pending data to be read on the socket.
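The select()-then-read() pattern could look like this: a minimal
sketch that sleeps until a single descriptor has data (or, for a
listening socket, a pending connection) or a timeout expires, and only
then calls read().  The helper names and the -2 "timed out" return
value are hypothetical conventions:

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Sleep until fd is readable or timeout_sec seconds pass.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        int r = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (r < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: wait again */
        if (r < 0)
            return -1;
        return r > 0 && FD_ISSET(fd, &readfds);
    }
}

/* Wait, then read whatever is available. Returns the byte count
 * (0 means the peer closed), -1 on error, or -2 on timeout. */
ssize_t read_when_ready(int fd, char *buf, size_t len, int timeout_sec)
{
    int r = wait_readable(fd, timeout_sec);
    if (r < 0)
        return -1;
    if (r == 0)
        return -2;
    return read(fd, buf, len);      /* data is waiting, so this won't block */
}
```

A timeout of 0 turns the wait into a quick check ("is anything there
right now?"); note that this sketch restarts the full timeout after a
signal interruption.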

So it'll run read(), retrieving the data from the buffer.  Now the
data is no longer the kernel's concern; it's handled inside the
calling program.

> As you can see from all the question marks, none of the manuals that
> I can find are clear on what exactly is happening, but just use the
> anthropomorphic metaphor of "listening".

"Listening" is simply registering one's intent to handle all incoming
connections on a certain port.  So when you create a socket(), bind() it
to a port, and listen(), the kernel knows you want to "listen" for
connections on that port.  When you then select() on your socket, the
select() call returns as soon as a connection arrives.

You accept() the connection, and then you have *two* sockets: the
original listener, and a new streaming socket connected directly to the
client, which behaves *independently* of your listening socket.  If you
shut down your listening socket, incoming connections to that port will
be refused, but the connection you established will *remain*.
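The socket(), bind(), listen(), accept() sequence, and the independence
of the accepted socket, can be demonstrated over the loopback
interface.  A minimal sketch, assuming IPv4 loopback is available; it
binds to port 0 so the kernel picks a free port (a real MTA binds port
25, which normally requires root), and both function names are
hypothetical:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* socket() + bind() + listen() on 127.0.0.1:port (0 = any free port).
 * Returns the listening descriptor, or -1 on failure. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one connection, close the listener, and check that the
 * accepted connection still carries data. Returns 1 if it does. */
int accepted_socket_outlives_listener(void)
{
    int lfd = make_listener(0);
    if (lfd < 0)
        return 0;

    int client = -1, conn = -1, ok = 0;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char buf[4] = {0};

    /* Learn which port the kernel picked for us. */
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;

    conn = accept(lfd, NULL, NULL);   /* the second, independent socket */
    close(lfd);                       /* stop listening... */
    lfd = -1;
    ok = conn >= 0 &&                 /* ...but the connection still works */
         write(conn, "220", 3) == 3 &&
         read(client, buf, sizeof buf) == 3 &&
         memcmp(buf, "220", 3) == 0;
out:
    if (lfd >= 0) close(lfd);
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    return ok;
}
```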

Listening is simply the act of being able to accept new connections.
Once a connection is established, the kernel handles its socket
independently.


Hope this answers some questions.
