Figure 1. The Layers of the TCP/IP Protocol Suite
The first, the link layer, is responsible for communicating with the actual network hardware (e.g., the Ethernet card). Data it receives off the network wire it hands to the network layer; data it receives from the network layer it puts on the network wire. This is where device drivers for different interfaces reside.
The second, the network layer, is responsible for figuring out how to get data to its destination. Making no guarantee about whether data will reach its destination, it just decides where the data should be sent.
The third, the transport layer, provides data flows for the application layer. It is at the transport layer that guarantees of reliability may be made.
The fourth, the application layer, is where users typically interact with the network. This is where telnet, ftp, email, IRC, etc. reside.
Packets are the basic unit of transmission on the Internet. They contain both data and header information. Simply put, headers generally consist of some combination of checksums, protocol identifiers, destination and source addresses, and state information. Each layer may add its own header information, so that its counterpart on the receiving machine can interpret the data the lower layer hands it. In Figure 2, we see a sample Ethernet frame. This is the product of a packet which has gone from the application layer all the way to the link layer. Each layer takes the previous layer's packet, viewing almost all of it as data, and puts its own header on it.
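The wrapping described above can be sketched in a few lines of Python. This is only an illustration: the text labels standing in for headers are hypothetical, and real headers are binary structures with many fields.

```python
# Hypothetical, simplified headers: short text labels stand in for the
# binary headers that real protocols use.
SENDING_ORDER = ["transport", "network", "link"]


def encapsulate(data: bytes) -> bytes:
    """Wrap application data in one header per lower layer."""
    packet = data
    for layer in SENDING_ORDER:
        header = f"[{layer}-hdr]".encode()
        # Each layer treats whatever it was handed as opaque data.
        packet = header + packet
    return packet


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in the opposite order on the receiving side."""
    packet = frame
    for layer in reversed(SENDING_ORDER):
        header = f"[{layer}-hdr]".encode()
        if not packet.startswith(header):
            raise ValueError(f"missing {layer} header")
        packet = packet[len(header):]
    return packet


frame = encapsulate(b"hello")
print(frame)               # the link header ends up outermost
print(decapsulate(frame))  # the receiver recovers the original data
```

The outermost header belongs to the link layer because it is the last one added on the way down and the first one removed on the way up.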
Figure 2. A Sample Ethernet Frame
We will now examine each part in turn, with a particular emphasis on the network and transport layers. In examples that follow, we'll refer to two machines: swell.cs.umass.edu and cool.alaska.edu. swell is the machine we are on; cool is the destination computer. We assume cool and swell are on Ethernets at their respective organizations. Most of our examples assume an Ethernet, but could work with any kind of network (e.g., token-ring).
The benefit of separating out the hardware layer is that protocol implementors only have to write the network layer once. Then they provide a common interface to the network layer by writing different device drivers for each kind of network interface.
IP is able to get packets to their destinations because every network interface on the Internet has a unique, numeric address. Oddly enough, these numbers are called IP addresses. Note that every interface has its own address. If a machine has multiple interfaces (as is the case with a router), each one has its own IP address. The Internic is responsible for assigning sets of addresses to organizations, thereby ensuring uniqueness.
Because it's a pain to refer to machines with strings of numbers, the designers of TCP/IP allowed network administrators to associate names with IP addresses. Although this has nothing to do with the IP layer per se, we feel this is useful material. Originally, every host on the Internet maintained its own complete copy of this database (on Unix systems, it's in /etc/hosts). However, as the Internet grew, this soon became unwieldy -- both in terms of raw size and the administrative nightmare of updating it. And so was born the domain name system (DNS). It is a distributed database of IP addresses and their natural language names, called host names. In fact, one IP address can have multiple names associated with it. When a network administrator adds a new machine to her network, she is responsible for updating her organization's nameserver table. Her changes quickly propagate. All communication with a machine is done via numeric IP addresses, so the host name for a machine is only used at the beginning of a connection.
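To make the name-to-address mapping concrete, here is a rough sketch of a lookup against an /etc/hosts-style table. The addresses are made up for illustration, and a real program would normally ask the system resolver (for example through Python's socket.gethostbyname) rather than parse a table itself.

```python
# Hypothetical hosts-file content; the addresses are invented for this example.
HOSTS_TEXT = """
# address        canonical name        aliases
127.0.0.1        localhost
192.0.2.10       swell.cs.umass.edu    swell
198.51.100.20    cool.alaska.edu       cool
"""


def parse_hosts(text):
    """Return a dict mapping every name and alias to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name] = address  # several names may share one address
    return table


hosts = parse_hosts(HOSTS_TEXT)
print(hosts["cool.alaska.edu"])
print(hosts["cool"])  # an alias resolves to the same address
```

Once the name has been turned into an address, the rest of the conversation uses only the number.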
The steps IP takes to send a packet are simple: based on the packet's destination IP address, figure out how to get it there and send it on its way.
Deciding how to get the packet there, aka routing, is the critical task for IP. Fortunately, swell doesn't have to know how to get a packet all the way to Alaska; it just needs to figure out which local router is responsible for getting packets to Alaska. A router differs from a typical machine on the net because it has at least two network interfaces -- this allows it to connect to two or more networks. For a small organization, there will typically be a local network (e.g., Ethernet) and then a leased-line link to the Internet. The organization's router is connected to both the local network and the Internet link. All packets bound for the Internet are sent to the router, which then puts them on the leased line, bound for the next router.
Each router only needs to know about the routers to which it is connected. Those routers then know about all the routers to which they are connected. This allows swell's local router to say, ``Well, all packets bound for the West go to MIT, so I'll just send it there and let MIT figure out what to do next.'' MIT puts it on a T3 line to Cleveland; from there it goes to Chicago, San Francisco, Seattle, and into Alaska, where it goes from the organization's router to the Ethernet interface on cool. The router at each hop is only concerned with where to send it next. It doesn't try to determine the full path which the packet will take.
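The hop-by-hop idea can be sketched as a chain of purely local decisions. The table below is hypothetical: it records, for packets bound for cool, the single next hop each router would choose, using the cities from the example above as stand-in router names.

```python
# Hypothetical next-hop choices for packets bound for cool.
# Each router knows only its own entry, never the whole path.
NEXT_HOP = {
    "umass-router": "MIT",
    "MIT": "Cleveland",
    "Cleveland": "Chicago",
    "Chicago": "San Francisco",
    "San Francisco": "Seattle",
    "Seattle": "alaska-router",
    "alaska-router": "cool",
}


def forward(start, destination, max_hops=16):
    """Follow next-hop decisions one router at a time and record the path."""
    path = [start]
    node = start
    while node != destination:
        if len(path) > max_hops:
            raise RuntimeError("too many hops, giving up")
        node = NEXT_HOP[node]  # a local decision made at this hop only
        path.append(node)
    return path


print(" -> ".join(forward("umass-router", "cool")))
```

The full path appears only because this sketch records it; no single router in the chain ever holds it.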
To determine where a given packet will go next, machines on the Internet maintain routing tables. They consist of three major items: addresses of routers, addresses they can handle, and the interface to which they are connected. In the case of a machine on a local net (like swell.cs.umass.edu), it probably has three entries: one for the loopback interface (which allows a host to connect to itself), one for the local network, and a default entry.
The local network entry lets IP know that the machine is directly connected to a certain set of IP addresses. Rather than try to route those packets, IP figures out the hardware address of the Ethernet interface to which the IP address corresponds and sends the packet there. With this entry, swell is essentially its own router: the addresses it can handle are all the IP addresses on the local net, and the destination interface is an Ethernet card on the local net.
The default entry says ``for all other addresses, send it to this router.'' Instead of trying to deliver a packet for cool.alaska.edu to a machine on the local net, swell sends it to the router's interface, saying, ``Here, I don't know where this goes, you figure it out.'' The router then looks at its table, sees it doesn't have a direct connection to cool, and so sends it to its default destination, MIT. And so the process continues.
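A three-entry table of this kind might be modeled as follows. Everything specific here is an assumption made for illustration: the addresses, the interface names lo0 and eth0, and the /8 and /24 prefix notation, which is one of the netmask details this article otherwise leaves out.

```python
import ipaddress

# Hypothetical routing table for a host on a local Ethernet.
# Each entry: (addresses it covers, router to use or None if direct, interface).
ROUTES = [
    (ipaddress.ip_network("127.0.0.0/8"), None, "lo0"),        # loopback
    (ipaddress.ip_network("192.0.2.0/24"), None, "eth0"),      # local network
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1", "eth0"),  # default entry
]


def lookup(destination):
    """Return (router, interface) for the most specific matching entry."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if address in route[0]]
    # The default entry matches everything, so prefer the longest prefix.
    _, router, interface = max(matches, key=lambda route: route[0].prefixlen)
    return router, interface


print(lookup("127.0.0.1"))      # loopback: delivered to the host itself
print(lookup("192.0.2.77"))     # local net: delivered directly, no router
print(lookup("198.51.100.20"))  # anything else: handed to the default router
```

A router value of None stands for direct delivery; only the default entry names a router to hand the packet to.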
At this point, the reader should have a rough idea of how packets are transmitted on the Internet. When receiving data, IP takes the packet from the link layer, checks for any blatant corruption, and hands the packet to the proper process at the transport layer. If there is any problem with the packet, IP silently discards it because it doesn't have to worry about whether a packet reaches its destination.
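One concrete form the corruption check takes in IPv4 is a header checksum: a 16-bit one's-complement sum over the header. The sketch below assumes that is the check meant here; the receive function and the sample header bytes are illustrative and are not taken from any particular implementation.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def receive(header: bytes):
    """Return the header if it verifies, or None to model a silent discard."""
    # With a correct checksum field in place, the result over the whole header is 0.
    return header if internet_checksum(header) == 0 else None


# Sample 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
header[10:12] = internet_checksum(bytes(header)).to_bytes(2, "big")
good = bytes(header)

header[8] ^= 0xFF  # damage one byte in transit
bad = bytes(header)

print(receive(good) is not None)  # intact header is accepted
print(receive(bad) is None)       # damaged header is dropped without notice
```

Note that the function reports nothing when it discards: telling anyone about the loss is left to other layers.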
We have left a huge amount out of this picture. Here are just some of the issues we're ignoring: packet fragmentation, netmasks and other routing tricks, network error handling, and the interactions between the network and transport layer.
TCP creates a ``virtual circuit'' between two processes. It ensures that packets are received in the order they are sent and that lost packets are retransmitted. We won't go into the details of how it works, but interactive programs like ftp and telnet use it.
So far we have discussed addressing on the host level -- how to identify a particular machine. But once at a machine, we need a way to identify a particular service (e.g., mail). This is the function of ports -- identification numbers included with every UDP or TCP packet. TCP/IP ports are not hardware-based. They are just a way of labeling packets. A process on a machine ``listens'' on a particular port. When the transport layer receives a packet, it checks the port number and sends the data to the corresponding process. When a process starts up, it registers a port number with the TCP/IP stack. Only one process per protocol can listen on a given port. So while a process using UDP and one using TCP can both listen on port 111, two processes that both used TCP could not. There are a number of ports which are reserved for standard services. For example, SMTP, the mail protocol, is always on port 25, and telnetd is always on port 23. To see a list of the reserved ports on a Unix system, look at /etc/services.
We've examined how ports work on the server end -- specific ports are reserved for set tasks. On the initiator end, port assignment is dynamic. When a telnet client on swell starts up, it gets a new port number (e.g., 1066). This is the source port which swell's TCP layer puts on every packet. This allows the telnet daemon (telnetd) on cool to respond to the correct telnet process on swell. The combination of source/destination IP addresses and ports provides a unique conversation identifier. Each conversation is called a flow.
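The one-listener-per-protocol rule can be modeled with a small table keyed by protocol and port. This is a toy model, not how any real TCP/IP stack stores its listeners, and the process names are hypothetical.

```python
# Hypothetical port table: (protocol, port) -> name of the listening process.
listeners = {}


def listen(protocol, port, process):
    """Register a process; only one listener per protocol and port is allowed."""
    key = (protocol, port)
    if key in listeners:
        raise OSError(f"{protocol} port {port} already in use by {listeners[key]}")
    listeners[key] = process


def deliver(protocol, port, data):
    """Hand incoming data to whichever process listens on that port."""
    process = listeners.get((protocol, port))
    if process is None:
        return None  # nobody is listening on this port
    return f"{process} <- {data!r}"


listen("tcp", 25, "smtp-daemon")
listen("tcp", 23, "telnetd")
listen("udp", 111, "service-a")
listen("tcp", 111, "service-b")  # same number, different protocol: allowed

print(deliver("tcp", 23, b"login"))
try:
    listen("tcp", 111, "service-c")  # second TCP listener on 111: refused
except OSError as error:
    print(error)
```

Keying the table on the pair rather than on the port number alone is what lets UDP and TCP reuse the same number independently.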
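A flow identifier of this kind can be written down as a simple four-field tuple. The IP addresses below are made up; the port numbers 1066 and 23 come from the telnet example above, and 1067 is a hypothetical second client port.

```python
from collections import namedtuple

# Source and destination address plus source and destination port.
Flow = namedtuple("Flow", "src_ip src_port dst_ip dst_port")


def reply_flow(flow):
    """Return the same conversation as seen from the other end."""
    return Flow(flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)


# Two telnet sessions from swell to cool differ only in the source port.
first = Flow("192.0.2.10", 1066, "198.51.100.20", 23)
second = Flow("192.0.2.10", 1067, "198.51.100.20", 23)

sessions = {first: "first telnet session", second: "second telnet session"}
print(sessions[first])
print(reply_flow(first))  # addresses and ports swapped for the reply
```

Because the dynamically assigned source port differs, the two sessions stay distinct even though they share both machines and the same destination port.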
UDP is essentially IP with port numbers (flows). It gives the user access to IP-style datagrams. The network file system (NFS) and talk are two examples of UDP-based protocols.
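How little UDP adds on top of IP shows in its header, which holds just four 16-bit fields: source port, destination port, length, and checksum. The sketch below packs and unpacks such a header; the port numbers and payload are hypothetical, and the checksum is left at zero, which UDP over IPv4 permits as meaning no checksum was computed.

```python
import struct

# Source port, destination port, length, checksum: four 16-bit fields
# in network byte order, 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp(src_port, dst_port, payload):
    """Prefix the payload with a UDP header (checksum left at 0)."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp(datagram):
    """Split a datagram back into its ports and payload."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


datagram = build_udp(1066, 2049, b"read block 7")
print(len(datagram))
print(parse_udp(datagram))
```

There are no sequence numbers or acknowledgments in the header, which is why ordering and retransmission are left to the application.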
This has been an extremely cursory exploration of TCP and UDP. At this point, you should have a decent understanding of how the network (IP) and transport (TCP/UDP) layers interact. We now turn to the final layer.
Telnet is used for remote login. It removes the need for hardwired terminals. A user on swell types ``telnet cool.alaska.edu'' and he is rapidly connected to cool.alaska.edu, which asks him to log in. He can then interact with cool. Here's a breakdown of the process:
Copyright 1994 by Jason Yanowitz