On Wed, Jul 27, 2005 at 11:16:58AM -0400, William Case wrote:
> I am particularly confused by the concept that something is
> "listening" on certain ports. I hope there isn't a little guy in my
> computer with big ears, because he is probably dead by now. I
> haven't been feeding him.

No more than there are people singing inside your radio. :)

> If someone wants to send me an email, their 'sendmail' calls on
> various functions?? from a library?? that adds an IP and a TCP
> header that contains my machines (and rogers.com) address and other
> info to a datagram(s) .

Sorta. The kernel itself handles the TCP negotiations over the IP
protocol. I'm a bit rusty on C socket programming, but here's what
the manpages suggest is happening on sendmail's side. First,

    fd = socket(PF_INET, SOCK_STREAM, 0);

which creates an IPv4 (PF_INET) socket that carries a reliable,
ordered, connection-based byte stream, possibly with out-of-band
capability (SOCK_STREAM). Since TCP is the default stream protocol
for IPv4, it doesn't need to ask for TCP specifically; it simply
passes 0 for "give me the right protocol".

Now, sendmail needs to know who to talk to, in order to reach your
domain. So it would do various DNS requests -- also using library
calls, but the exact call to get an MX record eludes me. Let's say it
stores the resulting IP address, as a dotted-quad string, in
the_address.

It might also want to resolve "smtp" to a port number. So it'd do

    serv = getservbyname("smtp", "tcp");
    the_port = serv->s_port;    /* already in network byte order */

Then, it fills in the structure of the address it wants to connect
to:

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr(the_address);
    addr.sin_port = the_port;   /* so no htons() needed here */

Finally, it makes the connection.

    connect(fd, (struct sockaddr *)&addr, sizeof(addr));

Ignoring the last argument (it's just the size of the address
structure), it's now requesting that the socket we made (way up at
the top) connect to you.

That's actually all that sendmail does to set up the connection,
aside from error-checking -- making sure it really got a socket, that
it really found an address via DNS, that it found the right port, and
that the connect() succeeded.
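Putting those steps together with the error-checking, here's a rough
sketch of what the client side might look like. This is my own
illustration, not sendmail's actual code: connect_ipv4() and
service_port() are hypothetical helper names, and I've swapped
inet_addr() for inet_pton() because inet_addr() can't tell a bad
address apart from 255.255.255.255.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: look up a TCP service name such as "smtp".
 * Returns the port in network byte order, or 0 if it is unknown. */
in_port_t service_port(const char *name)
{
    struct servent *serv = getservbyname(name, "tcp");
    return serv ? (in_port_t)serv->s_port : 0;
}

/* Hypothetical helper: connect to a dotted-quad IPv4 address.
 * port_net must already be in network byte order.
 * Returns a connected socket descriptor, or -1 on any failure. */
int connect_ipv4(const char *dotted_quad, in_port_t port_net)
{
    struct sockaddr_in addr;
    int fd;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = port_net;
    if (inet_pton(AF_INET, dotted_quad, &addr.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    fd = socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;                      /* didn't really get a socket */

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);                      /* the connect() failed */
        return -1;
    }
    return fd;
}
```

A caller would pass in the address it found via DNS and the result of
service_port("smtp"), checking that the port isn't 0 first. Everything
past the connect() is the kernel's business.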
After that, all sendmail does is poll the socket using select(), read
from it using read() (when data is available), and write to it using
write(). Everything else is up to the kernel.

> In particular, on the TCP segment header bits 17 to 32 contain the
> receivers (my) port number for smtp mail i.e. 25 - the standard smtp
> port number.

Yes. But it's a multi-stage process. It involves initiating the
connection (sending SYN), receiving the connection acknowledgement
(SYN+ACK), waiting for the SMTP banner (PSH), acknowledging that and
sending the HELO command (ACK+PSH), waiting for acknowledgement and
response (ACK+PSH), etc.

The mail is not just instantly sent out; there's plenty of
negotiation going on. Generally, it goes

    <-- 220 (server banner)
    --> HELO (my-domain-name)
    <-- 250 (acknowledge command)
    --> MAIL FROM: <my_address@my_domain>
    <-- 250 (ditto)
    --> RCPT TO: <actual_recipient@domain>
    <-- 250 (ditto)
    ..> (repeat RCPT TO for all recipients, including CC and BCC)
    <.. 250 (ditto each time)
    --> DATA
    <-- 354 (acknowledgement and wait for mail body)
    --> (send entire message; a line with only a period when done)
    --> .
    <-- 250 (acknowledge sending of mail)
    --> QUIT
    <-- 221 (acknowledge and disconnect)
    (connection closed by remote host)

> The senders email is sent out; bounces around the internet for a bit
> and ends up at rogers.com (my IP provider).

The packets bounce around the Internet, yes. The mail is typically
delivered directly to the rogers.com servers, unless the sending host
is configured to send all mail via a relay. In that case it would hit
the relay, which would then connect directly to the rogers.com
servers. Mail doesn't bounce around like packets do; typically, it's
a very short or entirely direct process.

> Next, after I check rogers.com using my emailer, Evolution, and see
> that there is an email for me, I use Evolution (which uses
> fetchmail) to download my mail (or at least its headers - I have a
> pop account).
I'm guessing Evolution connects and sees what you have, then lets
fetchmail do the dirty work. Okay.

> The message arrives through my cable modem as a series of 1 or 0
> electrical pulses. (Or, for a cable modem, does it still arrive as
> analogue sound pulses that are translated by the modem into electric
> (voltage) pulses?)

Dunno for sure, but I'm pretty sure it's digital, to get the kind of
speed it does. I'm not a hardware geek.

As for the rest: I can see your confusion. Actually, fetchmail is
designed to limit itself to the work of actually connecting to a
server and retrieving the mail. It doesn't care about delivery
whatsoever. Instead, it sends it to a *local* mail transport agent
(MTA), which could be sendmail, Postfix, Exim, etc.

It's then your MTA's job to deliver the mail. How it does this is up
to it. For something like Postfix, it's a very divided step.

First, fetchmail calls the actual MTA binary (or connects to a local
port, not sure). Either way, the MTA receives the message, destined
to your local mailbox. For something like Postfix, it would just put
the message into a queue and let the later steps handle it. Actually
delivering the message is *not* the responsibility of either the SMTP
daemon (which is only concerned with talking to incoming clients) or
the binary (which is only concerned with injecting mail into the
system and returning control to the calling application).

From there, various daemons within Postfix move the mail along, until
finally the last daemon drops it in your mail spool file.

sendmail might take a more direct approach, not sure.

I'll see what I can do about the remaining questions...

> The instant the message arrives it has to be stored somewhere
> temporarily and a signal or interrupt has to be sent to the CPU to
> tell the CPU to process a message.

As the message arrives as 1's and 0's "down the pipe", fetchmail is
waiting for it.
It's actually using select(), asking the system to wake it up every
time more data arrives to process. Then, it processes the data. It
either stores it in memory, or perhaps just pipes it directly to the
MTA. All this is internal to fetchmail.

> Where is it stored?

fetchmail either keeps it in memory until it's ready to send it to
the MTA, or is piping it directly to the MTA, not sure. Your MTA
writes it to a temporary file, internal to itself, until it's ready
to dump it in your mailbox file.

> Who or what sends the signal or interrupt?

Initially, the Ethernet card talking to your Rogers modem. It
notifies the kernel that it has data ready.

> Does it use a stack or what?. I am assuming all of the above
> happens before the message can even be allocated to a port number.
> The CPU has to then call a library function that processes the TCP
> segment header looking for the port number. -- ???

All I really know is, the kernel reads data from the Ethernet card.
It processes it, determines it's IP, determines it's TCP, determines
it's part of the fetchmail connection (looking at the addresses,
ports, and sequence number), reads the data inside, wakes up
fetchmail, and hands fetchmail the data on request.

> When it finds the port number -- 25 in our example -- it has to
> check against a table? of some kind for the memory address for port
> 25. -- ???

Port 25 doesn't actually enter into this (the initial retrieval of
the mail), since fetchmail is an outbound connection. But when
fetchmail is ready to deliver, it connects to your local machine on
port 25.

Thing is, your MTA is already listening on port 25. That means it has
registered its desire to accept connections. When a connection is
made, the process is notified (generally by rousing it from its
select() sleep state) and accept()s the connection. From that point
on, the connection acts like a "pipe", where each end just reads and
writes without concern for how the data is getting to the other end.
The kernel's TCP stack's job is to ensure that data is sent and
received, that the data arrives in proper order, and that it notifies
the processes if it believes the connection is dead.

> Is the memory address dynamic or static; i.e. is a permanent memory
> address set aside when the port is originally opened or is it set as
> needed? In either case would not the CPU have to call a special
> routine to set port numbers to memory addresses and save that
> somewhere?
>
> Another routine, as I understand it, would have to move the message from
> its temporary location to the port memory address. -- ??.

Dunno. I doubt it's per port. There's probably just a buffer space
that holds stuff (probably per connection), and when a process asks
for the data sitting on its connection, the data is returned. That's
also why kernels typically impose a maximum number of open sockets --
you need buffer space and processing time for all those.

> Before or after it goes into memory the TCP/IP segment headers have
> to somehow be removed. --??

It probably just throws that stuff away and puts the data itself into
the buffer.

> Also, somehow another routine has to send another signal or
> interrupt to tell the CPU to call the MTA ?? to begin processing
> the mail.

These processes are typically sitting in a select() loop, i.e. asleep
(possibly for a limited amount of time, if they request a timeout for
select()). To wake them up, the kernel simply lets them return from
their select loop. From the results of select(), the MTA (or
fetchmail) now knows that there's pending data to be read on the
socket. So it'll run read(), retrieving the data from the buffer. Now
the data is no longer the kernel's concern; it's handled inside the
calling program.

> As you can see from all the question marks, none of the manuals that
> I can find are clear on what exactly is happening, but just use the
> anthropomorphic metaphor of "listening".
"Listening" is simply registering one's intent to handle all incoming connections on a certain port. So when you create a socket(), bind() it to a port, and listen(), the kernel knows you want to "listen" for connections on that port. When you then select() on your socket, the kernel will return you from your select() when a connection arrives. You accept() the connection, and then you have *two* sockets -- one which is the original listener, and one of which is now a direct streaming socket that behaves *independently* of your listening socket. If you shut down your listening socket, incoming connections to that port will be refused, but the connection you established will *remain*. Listening is simply the act of being able to accept new sockets. Once a socket is established, the kernel handles it independently. Hope this answers some questions.