SF Senior Mod
Joined: 21 Sep 2003
Posted: Sat Jan 14, 2006 5:46 pm Post subject: IPC: Ports, Services and Connections Explained
OK, I see some confusion here, so I will try to explain matters a bit. Some of this may already be known to some of you, but I believe the whole thing should be of interest to those with doubts on ports, IP, processes, sockets, and how they all relate.
IPC: Ports, Services and Connections Explained
Copyright 2006 Israel G. Lugo
Ports and sockets: Inter-process communication
When two processes want to share data, they need some vessel to transmit the data on. There are different types of inter-process communication (IPC) mechanisms available to them, some only applicable when the two are on the same machine (shared memory, for example, or anonymous pipes), some applicable for the general case - this includes named pipes and sockets. We will be looking into sockets here.
A socket is an IPC mechanism. It is an operating system resource that serves to let two processes communicate with each other (a process is a running program). These two processes may or may not be in the same machine.
Functionally, from the point of view of a process, a socket is a black box, that functions much like a kind of "special file". The process can write data to it, and the data will get sent to the other side - it will end up on the socket of the other process, available for reading.
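To make the "special file" idea concrete, here is a minimal Python sketch. It assumes socket.socketpair(), which hands back two already-connected sockets inside a single process; that is a shortcut for illustration only, since in real use each end would belong to a different process. The function name is made up for this example.

```python
import socket

def send_and_receive(message: bytes) -> bytes:
    """Write bytes into one end of a connected socket pair, read them from the other.

    Illustration only: meant for small messages, both ends live in one process.
    """
    writer, reader = socket.socketpair()
    with writer, reader:
        writer.sendall(message)           # "write" into the black box
        writer.shutdown(socket.SHUT_WR)   # tell the other side no more data is coming
        chunks = []
        while True:
            chunk = reader.recv(4096)     # the data shows up on the other socket
            if not chunk:                 # empty read: the writer has finished
                break
            chunks.append(chunk)
    return b"".join(chunks)

received = send_and_receive(b"hello")
print(received)
```

Neither side needs to know how the bytes travelled; each just writes to or reads from its own socket.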
Internally, sockets can use a wide array of protocols, both for actually transferring the data in the first place and for other functionalities - such as transport and addressing, for example ("who is this data directed at?"), or assuring reliability ("did the data get there intact?"), and so on. Describing the OSI seven-layer model in detail is outside the scope of this post; interested readers should consult the Wikipedia article on the OSI model.
On the Internet
In the context of the Internet, the common denominator in terms of protocols is IP (Internet Protocol). This is a network layer protocol; its function is to transfer a piece of data (a "packet") from host A to host B. Note that "host", in this context, refers to an online machine, specified by IP address. Communication at the IP level is made on a machine-wide basis; there is no concept of "port" in this protocol (or "connection" either, for that matter).
On top of the IP protocol, a transport protocol is generally used. The most common examples are of course TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). When used in conjunction with IP, they are referred to as TCP/IP and UDP/IP (meaning, respectively, "TCP over IP" and "UDP over IP", that is, TCP segments or UDP datagrams inside IP packets). The transport protocol is usually responsible, among other things, for providing more granular addressing (targeting a specific process on a machine), data integrity (in the case of TCP, not so in UDP)... In essence, it provides transparent transfer of data between end points.
It is at the TCP (and UDP) level that the concept of "port" arises. A port is simply a way of distinguishing between different connections to a given machine. Remember, IP only lets us target the machine itself (by IP address). Once data arrives at the machine, the operating system needs to hand it to the appropriate process. To specify the target process inside a given machine, port numbers are used. So, a port is simply a number that identifies a socket within a given machine (and thus the process that owns that socket).
So, you ask, how does this all work together, from a practical stance?
Let's say we are running an HTTP server program on machine A. The HTTP server wants to listen for incoming connections from other processes (the browsers), so it creates a socket (through special functions provided by the operating system). It tells the OS that it wants a TCP/IP socket for accepting incoming connections, and that it wants this socket to be identified by the port number 80. This number is completely arbitrary; it is only by convention that port number 80 is the one generally chosen for HTTP servers.
Assuming everything went well (that the server process was running with enough permissions, and that there were no system errors, etc) machine A's operating system will allocate internal resources to sustain the "socket", and it will register in its own internal tables that incoming TCP/IP connections destined to port number 80 are to be accepted, and their data directed to the HTTP server process.
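The server side of this can be sketched with Python's standard socket module. This is only an illustration under a few assumptions: the function name is made up, the socket is bound to the loopback address, and the default port is 0, which asks the OS to pick any free port. A real HTTP server would pass 80, which on many systems requires elevated permissions.

```python
import socket

def open_listening_socket(port: int = 0, host: str = "127.0.0.1") -> socket.socket:
    """Create a TCP/IP socket, tie it to a port and start accepting connections.

    port=0 is an assumption for this sketch: it lets the OS choose a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP over IPv4
    server.bind((host, port))   # "identify this socket by this port number"
    server.listen()             # "accept incoming connections on it"
    return server

server = open_listening_socket()
print("listening on", server.getsockname())
server.close()
```

If bind() fails (not enough permissions, port already taken), Python raises OSError, which corresponds to the "assuming everything went well" caveat above.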
On the other side, let's say we have a browser program running in machine B. The user enters http://A in the address field, and the browser knows this is an order to fetch the webpage from a server running on machine A. Since the user did not specify a port number within the target machine, the browser will assume the default convention for HTTP and use port 80. If the user wanted to specify a different port (because he knows the server at the target machine is listening on some non-standard port number), he could have typed http://A:123 or whatever.
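The "no port given, so assume 80" rule can be sketched in a few lines of Python using urllib.parse from the standard library. The helper name and the small table of defaults are assumptions made for this example, not how any particular browser is written. Note that urlsplit lowercases the host name.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Conventional default ports per URL scheme (only the two needed here).
DEFAULT_PORTS = {"http": 80, "https": 443}

def target_of(url: str) -> Tuple[Optional[str], int]:
    """Return (host, port) for a URL, falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

print(target_of("http://A"))      # no port typed by the user
print(target_of("http://A:123"))  # explicit non-standard port
```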
So, the browser will create a socket of its own; it will tell the OS that it needs a TCP/IP socket. The OS will create the socket, and assign it a random port number (because the browser will usually not ask for any specific port number for itself). This is fine because the browser doesn't really need to be "reachable" by anyone; it just needs to be able to call out.
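You can watch the OS pick the client's port with a small Python experiment. This is a sketch under the assumption that both ends run on the local machine (loopback), with a throwaway listener standing in for the server; the function name is made up. The client never calls bind(), so its port is whatever the OS assigns.

```python
import socket

def client_and_server_endpoints():
    """Connect a client to a local listener and report both ends of the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # stand-in server on an OS-chosen port
    listener.listen()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())  # no bind(): the OS assigns the source port
    local = client.getsockname()    # (client IP, port the OS picked for the client)
    remote = client.getpeername()   # (server IP, server port)

    client.close()
    listener.close()
    return local, remote

local, remote = client_and_server_endpoints()
print("client side:", local, "server side:", remote)
```

The client port will differ from run to run; the value 1500 used in this post is just an example.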
It may help if you think of port numbers inside a machine like phone extensions in an office building. The IP addresses would be the office building's general phone number. The server needs to have a recognizable port number (phone extension), because others want to be able to call it. The client doesn't really care what its port number (phone extension) is, because it just wants to call out to the server; it doesn't want to receive calls itself.
Making the connection
Now the browser has a newly created socket, with a random port number of its own. Let's assume the random port number is 1500. The browser will then ask the operating system to connect that socket to port 80 on machine A (that is, to the TCP/IP socket on machine A that is identified by port number 80). Of course, it will first have to find out the IP address for machine A, but we won't go into details there (see the Wikipedia article on DNS for more details).
So, here is where the connection is started. Machine B's operating system will send a TCP/IP connection request directed at machine A, and to port 80 within machine A (because it was told to do so by the browser, remember). Concretely, it will send a TCP/IP packet with the destination IP set to machine A's IP, and the destination port set to 80. The source IP will of course be that of machine B, and the source port will be 1500 (in this example). To signal that this is a request to initiate a connection, a TCP flag on the packet will be set (the SYN flag).
When machine A's operating system receives the connection request, it will notice it has an existing TCP/IP socket identified by port number 80 and that this socket is accepting connections; as such, it will accept the connection. It will send an acknowledgment back to machine B, that is, it will send a TCP/IP packet with the destination IP set to machine B's IP and the destination port set to 1500 (remember that was the source port number in machine B). The source IP will be machine A's, and the source port will be 80. The TCP flags SYN and ACK will be set, signaling that this is a "connection accepted" packet. Machine B's operating system then completes the handshake by sending back a final packet with the ACK flag set.
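The addressing of the request and its reply can be modelled in a few lines of Python. This is a toy model, not a packet parser: the Segment class and reply_to helper are invented for this illustration and carry only the four address fields plus the flags discussed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """Toy model of the address fields and flags of a TCP/IP packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    flags: frozenset

def reply_to(request: Segment, flags) -> Segment:
    """Build a reply: source and destination are simply swapped."""
    return Segment(
        src_ip=request.dst_ip,
        src_port=request.dst_port,
        dst_ip=request.src_ip,
        dst_port=request.src_port,
        flags=frozenset(flags),
    )

# Machine B (port 1500) asks machine A (port 80) for a connection.
syn = Segment("B", 1500, "A", 80, frozenset({"SYN"}))
# Machine A accepts: same addresses, reversed, with SYN and ACK set.
syn_ack = reply_to(syn, {"SYN", "ACK"})
print(syn_ack)
```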
An analogy that might help here is to think of packets as envelopes, with a source and destination address. The server process on machine A wants to send data back to the browser process on machine B; in the analogy, Andrea, who lives in Building A, wants to send data to Bob, who lives in Building B. Andrea would represent the server process, and Bob would represent the browser process. On the envelope, Andrea would place Building B's address as destination, and Bob's name to identify him inside his home (there may be other people living there). For the source, she would put her own name (so that Bob knows who this came from) and her Building A's address. Likewise, the TCP/IP packet will have machine B's IP as destination address, destination port number 1500 to identify the browser process, and machine A's IP as source, along with source port 80 to identify the server process.
Back on machine B, the browser process will be told by the operating system that its socket was able to successfully connect to the server on machine A's port 80. From then on, the browser will start writing data to its own socket (an HTTP request for a webpage, in this case). This data will be sent through the socket to the other machine's socket and from there to the server process, which will respond with its own data (in this case, the contents of the webpage, or an error if it doesn't exist). And so on, until the connection is closed by either side.
Of course, if in the meantime other machines try to connect to the HTTP server through port 80, the same thing will happen. A TCP/IP socket which is in a connection-accepting state is not limited to only one single connection. The operating system maintains and tracks each connection (e.g. machine A <-> machine B, machine A <-> machine C) in independent fashion.
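Here is a small Python sketch of one listening socket serving two connections at once. It assumes everything runs on the local machine, with two client sockets standing in for machines B and C; the function name is made up for the example.

```python
import socket

def accept_two_clients():
    """Accept two connections on one listening port and describe both."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # OS-chosen port, standing in for port 80
    listener.listen()
    address = listener.getsockname()

    # Two independent clients connect to the same server port.
    clients = [socket.create_connection(address) for _ in range(2)]
    accepted = [listener.accept()[0] for _ in range(2)]

    # On the server side every connection uses the same local port...
    local_ports = {conn.getsockname()[1] for conn in accepted}
    # ...and is told apart by the remote (IP, port) pair.
    peers = {conn.getpeername() for conn in accepted}

    for sock in clients + accepted + [listener]:
        sock.close()
    return address[1], local_ports, peers

server_port, local_ports, peers = accept_two_clients()
print(server_port, local_ports, peers)
```

Each accept() returns a separate socket object for that one connection, which is how the OS keeps the connections independent.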
Rejected connections, or "closed" ports
What happens if a machine receives a connection request to a port number that doesn't match any existing socket in "accept connections" state? The operating system will simply reject the request (it has to, it would have no process to send the data to, because no process has requested use of that port). It will send a packet to the source indicating that the connection request has been refused (a TCP/IP packet with the RST and ACK flags turned on, to indicate that it has acknowledged the connection request but has turned it down).
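From the client's point of view, a refused connection shows up as an error from the connect call. A minimal Python sketch (the function name is an invention for this example; it only distinguishes "accepted" from "refused" and lets any other error, such as a timeout, propagate):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection is accepted, False if it is refused."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True               # the other side answered with SYN+ACK
    except ConnectionRefusedError:
        return False                  # the other side answered with RST
```

Usage would look like is_port_open("127.0.0.1", 80); the result depends on whether some process on that machine has a listening socket on that port.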
Of course, a natural consequence of all this is the fact that you most certainly cannot "force" a port "open" remotely (it would not even make sense). A port is "open" when there is an existing socket on the machine, which is associated with that particular number and is set to accept connections. And a socket can only exist in the first place if it was created by some process running on the machine. Remember, a port is simply a way of identifying a given socket in a machine, and a socket is only a mechanism for two processes to communicate with each other.
Likewise, it makes no sense for two different processes to share the same port on the same machine. That would make it impossible for that machine's operating system to know to which process it should send incoming data on that port. Remember, the whole point of a port number is to uniquely identify a process on a given machine.
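The operating system enforces this when a second socket asks for a port that is already taken. A sketch in Python, assuming default socket options (some systems offer options such as SO_REUSEPORT that deliberately relax this, which the sketch does not use); the function name is made up, and two sockets in one process stand in for two processes:

```python
import errno
import socket

def second_bind_is_rejected() -> bool:
    """Try to bind two sockets to the same address and port with default options."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))     # OS-chosen free port
        first.listen()
        try:
            second.bind(first.getsockname())   # same IP, same port
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE   # "address already in use"
        return False
    finally:
        first.close()
        second.close()

print(second_bind_is_rejected())
```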
Firewall: a filter
All right, we should now have a reasonable understanding of what a port is, and how a normal connection is established. Knowing this, it is not very difficult to find out what a firewall really does: it is a filter. Nothing more, nothing less. A software firewall will sit between the processes and the network interface, and it will look at incoming and outgoing traffic, deciding what it lets through either way. For each packet that it sees, it will check its own rules table to see if that packet should be allowed or denied.
Let's say the client machine, machine B, is running a software firewall. When the browser tries to connect to the server on machine A, it will send a specific TCP/IP packet, remember? Destined to machine A's IP, port number 80, with the SYN flag set (to signal that this is a request to initiate a connection). The firewall on machine B will see this outgoing packet, and it will look at its own rule table to see if it's allowed. It will check whether that particular program is allowed to initiate connections, and in particular to that specific machine and port. Assuming the firewall has been configured to allow normal browsing, it will see that the browser has permission to connect to any machine on port 80, and will allow the packet to be sent. Otherwise, it will simply block the packet from being sent.
The firewall on the server side, machine A, will do a similar thing. It will see an incoming packet coming from machine B with a source port of 1500, with the destination IP set to machine A, and a destination port of 80. It will check its own internal rules, to see if packets with those characteristics should be allowed in. Assuming it's been properly configured, it will have a rule that tells it to allow incoming connection requests to this machine's port 80, from any source machine (we want any machine in the world to be able to connect to our server) and any source port (remember the browser can use any source port it wants). So it will simply let the packet in.
If the firewall sees something it should not accept (because it's in the rules that it should not accept it), such as a packet coming in to port 12345 (assuming there is no rule to accept incoming packets to port 12345), it will simply drop it. It will block the packet, and act as if it had never existed. The operating system will be unaware of the packet ever having arrived, and no reply will be sent (that is, no RST packet).
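The "rules table" idea can be sketched as a first-match filter in Python. This is a toy model under stated assumptions: the Packet and Rule classes, the decide function and the "drop by default" policy are all invented for this illustration, and real firewalls match on many more fields (flags, direction, program, connection state).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "drop"
    dst_port: Optional[int] = None   # None means "any destination port"
    src_ip: Optional[str] = None     # None means "any source machine"

    def matches(self, packet: Packet) -> bool:
        return (self.dst_port in (None, packet.dst_port)
                and self.src_ip in (None, packet.src_ip))

def decide(rules, packet: Packet, default: str = "drop") -> str:
    """Return the action of the first matching rule, or the default policy."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default

# Server-side rules: allow incoming connections to port 80 from anyone.
RULES = [Rule("allow", dst_port=80)]

print(decide(RULES, Packet("B", 1500, "A", 80)))      # matches the rule
print(decide(RULES, Packet("B", 1500, "A", 12345)))   # no rule: default policy
```

A dropped packet never reaches the operating system, so no reply of any kind goes back to the sender.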
That's it, folks; your "Stealthed" ports, all that buzzword stuff that firewall vendors like to use. It's simply a filter. A port being "Stealthed" simply means that incoming packets that request to connect to that port are silently dropped by the firewall, before the OS sees them and has a chance to send a reply back (either "connection refused" or "connection accepted", depending on whether there was a socket on the machine that matched that port number). Not so glamorous when you call things by what they are, though, is it?
Further reading on software firewalls
If you are interested in a somewhat more in-depth look at the structure and architecture of a software firewall (and some of its inherent design flaws), you are invited to read the following SecurityFocus articles, written by myself and alt.don:
Software Firewalls: Made of Straw? Part 1 of 2
Software Firewalls: Made of Straw? Part 2 of 2
The articles focus on Windows software firewalls, but the same inherent principles are true for other operating systems as well (although of course the internal implementation would differ hugely).
Last edited by capi on Wed Jul 19, 2017 2:41 am; edited 9 times in total
SF Senior Mod
Joined: 21 Sep 2003
Posted: Sun Jan 15, 2006 3:53 pm Post subject: Re: Ports, services and connections explained
I feel this needs some further clarification, because many times, processes do sort of share a single port for connection establishment and sometimes even communication. Linux's inetd is one example of such an implementation, where it multiplexes services by forwarding incoming connections to the "correct" binary. I am not a hundred percent sure, but TCP wrappers might serve as another example. A portmapper, commonly used by RPC implementations, might be another? It is however correct that just a single service can be bound to a single port, but nothing keeps that service from multiplexing more services.
True; while a given socket (and thus its port) can only be associated with one process, there is nothing to stop that process from doing whatever it wants with the data it receives from the socket. In particular, there is nothing stopping the process from resending the data elsewhere: to the console, to a file, or to another socket (which may be communicating with a process on the same or another computer). This isn't really anything specific to sockets, though; any process will of course do whatever it was meant to do with its input and output (provided it has enough permissions, obviously). This would be, for example, what application-level proxies and port redirectors do.
While the above can't really be considered an exception to the statement that a given port can only be used by a single process, there is something that does qualify as an exception.
Consider a server process, listening on port X, that upon accepting a new connection on that port, forks a child process to deal with that connection - your typical multiprocess server. The child process will usually close the file descriptor of the socket which is accepting connections (because it doesn't want to accept new connections); however, it will obviously keep the file descriptor of the socket that has the established connection (that's the whole point of its existence, to deal with that connection).
If one looks at two children of that parent server process, one can say that, externally, they are both sharing the same port; after all, the two are sending and receiving data using port 80. Of course, they are each sending and receiving to different ports on (possibly) different machines, but still, locally they are both using port 80.
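A stripped-down version of such a forking server can be sketched in Python. This is an illustration only and assumes a POSIX system (os.fork does not exist on Windows); the function name and the reply text are made up, and the "client" lives in the same script to keep the example self-contained.

```python
import os
import socket

def serve_one_connection_in_child():
    """Accept one connection, fork, and let the child answer on the inherited socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # OS-chosen port, standing in for port X
    listener.listen()
    port = listener.getsockname()[1]

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()       # socket for this one established connection

    pid = os.fork()
    if pid == 0:
        # Child: drop the accepting socket, keep the established connection.
        try:
            listener.close()
            conn.sendall(b"served from port %d" % conn.getsockname()[1])
            conn.close()
        finally:
            os._exit(0)               # never fall back into the parent's code

    # Parent: the child owns the conversation now, so close our copy.
    conn.close()
    data = client.recv(1024)
    os.waitpid(pid, 0)
    client.close()
    listener.close()
    return port, data

port, data = serve_one_connection_in_child()
print(port, data)
```

The child replies from the same local port the parent is listening on, which is the sense in which the two processes "share" it.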
I should probably rephrase the statement in question to reflect that a socket (and its associated port) can be shared by a process if it so chooses, with other processes launched by itself (that is, with other programs executed by it).