The Session Initiation Protocol (SIP) is an application-layer control (signaling) protocol for creating, modifying, and terminating multimedia sessions with one or more participants.
SIP can be used with other IETF protocols to build a complete multimedia architecture, such as the Real-time Transport Protocol (RTP) for transporting real-time data and providing QoS feedback, the Real-Time Streaming Protocol (RTSP) for controlling delivery of streaming media, the Media Gateway Control Protocol (MEGACO) for controlling gateways to the Public Switched Telephone Network (PSTN), and the Session Description Protocol (SDP) for describing multimedia sessions.
The main features of the SIP protocol are:
Lightweight: in terms of computation, not bandwidth.
Transport independent: SIP can be used with UDP, TCP, ATM and so on.
Text-based (UTF-8): SIP messages have a header with multiple header fields and a message body.
SIP is modular and uses other protocols to define media details (SDP) and voice timing and synchronization (RTP). The SDP description is carried in the message body (e.g. in an INVITE message) and contains the RTP UDP port number; classic SIP implementations use TCP port 5060 and/or UDP port 5060 for signaling and various UDP ports for RTP.
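To make the role of SDP concrete, the following minimal sketch extracts the address and UDP port that an SDP body advertises for RTP. The function name rtp_endpoint_from_sdp and the sample SDP are illustrative assumptions, not part of any SIP stack; only the c= and m= lines are interpreted, and only IPv4-style connection lines with three fields are handled.

```python
EXAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.168.1.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)


def rtp_endpoint_from_sdp(sdp, media="audio"):
    """Return (address, port) advertised for the first matching media line, or None."""
    session_addr = None  # from a c= line that appears before any m= line
    media_addr = None    # from a c= line inside the matching media section
    port = None
    in_wanted = False    # True while we are inside the matching media section
    seen_media = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            seen_media = True
            fields = line[2:].split()
            # Only the first media section of the requested type is used.
            in_wanted = port is None and len(fields) >= 2 and fields[0] == media
            if in_wanted:
                port = int(fields[1].split("/")[0])
        elif line.startswith("c="):
            fields = line[2:].split()
            if len(fields) == 3:
                addr = fields[2].split("/")[0]
                if not seen_media:
                    session_addr = addr
                elif in_wanted:
                    media_addr = addr
    if port is None:
        return None
    # A media-level c= line overrides the session-level one.
    return (media_addr or session_addr, port)


print(rtp_endpoint_from_sdp(EXAMPLE_SDP))
```

In this sample the advertised endpoint is a private address, which is exactly the information a remote party cannot use when the sender sits behind a NAT.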
SIP is a peer-to-peer protocol; the peers in a session are called User Agents (UAs). A user agent can function in one of the following roles:
User agent client (UAC): a client application which initiates the SIP request.
User agent server (UAS): a server application that contacts the user when a SIP request is received and returns a response on behalf of the user.
When a SIP client initiates a communication, it sends the server an INVITE request that contains the number of the UDP port used by RTP. If the client is behind a NAT, the UDP port for RTP is mapped to a different external address and port, so the SIP server cannot send voice to the right port; this is the SIP NAT traversal problem.
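One common symptom of this problem is an SDP body that advertises a private address. The sketch below is a heuristic, not something the text above prescribes: it checks an address against the RFC 1918 private ranges using Python's standard ipaddress module. The name sdp_address_is_private is an assumption made for illustration.

```python
import ipaddress

# RFC 1918 private IPv4 ranges; an address in one of these is not reachable
# from the public Internet without a NAT mapping.
RFC1918_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def sdp_address_is_private(address):
    """Return True if the address advertised in SDP lies in an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


print(sdp_address_is_private("192.168.1.10"))
print(sdp_address_is_private("198.51.100.7"))
```

A server that sees a private address in the SDP of a request arriving from a public source address can conclude that the sender is probably behind a NAT.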
There are four types of NATs:
Full Cone: all requests from the same internal IP address and port are mapped to the same external IP address and port. Any external host can send a packet to the internal host, by sending a packet to the mapped external address.
Restricted Cone: all requests from the same internal IP address and port are mapped to the same external IP address and port. Unlike a Full Cone NAT, an external host (with IP address X) can send a packet to the internal host only if the internal host had previously sent a packet to IP address X.
Port Restricted Cone: it's like a Restricted Cone NAT, but the restriction includes port numbers. An external host can send a packet, with source IP address X and source port P, to the internal host only if the internal host had previously sent a packet to IP address X and port P.
Symmetric: all requests from the same internal IP address and port, to a specific destination IP address and port, are mapped to the same external IP address and port. If the same host sends a packet with the same source address and port, but to a different destination, a different mapping is used. Only the external host that receives a packet can send a UDP packet back to the internal host.
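The four behaviors can be compared with a small simulation. This is a simplified model written for illustration only: the Nat class, its field names, the kind strings and the starting port 40000 are all assumptions, and real NATs also expire mappings, which is ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    # One of: "full_cone", "restricted_cone", "port_restricted_cone", "symmetric"
    kind: str
    public_ip: str = "203.0.113.1"
    next_port: int = 40000
    mappings: dict = field(default_factory=dict)   # mapping key -> public port
    contacted: dict = field(default_factory=dict)  # public port -> destinations sent to

    def outbound(self, src, dst):
        """Record an outgoing packet; return the public (ip, port) it appears to come from."""
        # A symmetric NAT allocates one mapping per destination; cone NATs reuse
        # the same mapping for every destination.
        key = (src, dst) if self.kind == "symmetric" else src
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        self.contacted.setdefault(port, set()).add(dst)
        return (self.public_ip, port)

    def accepts_inbound(self, public_port, remote):
        """Would a packet from remote (ip, port) to public_port reach the internal host?"""
        peers = self.contacted.get(public_port)
        if peers is None:
            return False  # no mapping exists for this public port
        if self.kind == "full_cone":
            return True
        if self.kind == "restricted_cone":
            return remote[0] in {ip for ip, _ in peers}
        # Port restricted cone and symmetric: exact address and port must match.
        return remote in peers


client = ("192.168.1.10", 5004)
probe = ("198.51.100.7", 3478)
peer = ("198.51.100.8", 9000)
for kind in ("full_cone", "restricted_cone", "port_restricted_cone", "symmetric"):
    nat = Nat(kind)
    mapped = nat.outbound(client, probe)
    print(kind, mapped, nat.accepts_inbound(mapped[1], peer))
```

The loop shows why a mapping learned by talking to one external host is only usable by a different host when the NAT is a Full Cone, or after the client has also sent a packet to that host.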
If the client is behind one of the first three NAT types, the solution for NAT traversal is simple. The client must find out how its internal address:port pair looks to the world (i.e. the NAT mapping) and then put that information into the SDP message (instead of the information reflecting its internal address:port pair). There are two basic methods for a client to determine the NAT-mapped public address:port pair: the first is to ask the NAT, the second is to ask someone outside the NAT what the actual address:port should be.
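Once the public mapping is known, putting it into the SDP amounts to rewriting two lines. The sketch below assumes the mapping has already been discovered by one of the two methods; the function name rewrite_sdp is hypothetical, and for simplicity it rewrites every IPv4 c= line and leaves the o= line untouched.

```python
def rewrite_sdp(sdp, public_ip, public_port, media="audio"):
    """Return a copy of the SDP advertising the NAT-mapped address:port pair."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Replace the internal connection address with the public one.
            line = "c=IN IP4 " + public_ip
        elif line.startswith("m=" + media + " "):
            # The second field of an m= line is the transport port.
            fields = line.split()
            fields[1] = str(public_port)
            line = " ".join(fields)
        out.append(line)
    return "\r\n".join(out) + "\r\n"


internal_sdp = "v=0\r\nc=IN IP4 192.168.1.10\r\nm=audio 49170 RTP/AVP 0\r\n"
print(rewrite_sdp(internal_sdp, "203.0.113.1", 40000))
```

This only helps when the mapping seen by the party that reported it is the same mapping the remote endpoint will use, which is the case for the three cone NAT types.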
Several solutions have been proposed:
UPnP: a client queries the NAT via the UPnP (Universal Plug and Play) protocol, asking what mapping it should use if it wants to receive on port P; the NAT responds with the address:port pair needed to reach the client on this port P. This solution doesn't work with Symmetric NATs and cascading NATs; furthermore, there is a huge installed base of existing NATs that do not support UPnP.
External query: a server sits listening for packets (call this a NAT probe); when it receives a packet from a client, it returns a message from the same port to the source of the received packet containing the address:port pair which it sees as the source of that packet. The client can then determine if it's behind a NAT and the public address:port pair it should use in the SDP message in order for the endpoint to reach it. This solution doesn't work with Symmetric NATs and cascading NATs.
STUN (Simple Traversal of UDP Through NATs): a protocol that standardizes the kind of NAT probe described under External query. It returns the public address:port pair and can also determine which kind of NAT the client is behind. Clients can set their SDP messages accordingly; the STUN server does not sit in the signaling or media data path. This solution doesn't work with Symmetric NATs.
Connection Oriented Media: the client must send RTP to, and receive RTP back from, the same IP address. Any RTP connection between an endpoint outside a NAT and one inside a NAT must be established point to point. The endpoint outside the NAT must wait until it receives a packet from the client before it can know where to reply. The client tells the endpoint to wait for the incoming packet by adding the a=direction:active tag to the SDP message; this approach is useful with Symmetric NATs but remains problematic because few endpoints support this tag.
Symmetric RTP: the server simply “ignores” the RTP UDP port indicated in the SDP of the INVITE request and always responds to the port from which it receives RTP traffic. This solution works well and is the de facto solution used by Cisco gateways for problems arising from Symmetric NATs.
TURN (Traversal Using Relay NAT): complements STUN and places the probe in the signaling and media path. The probe terminates the media for both endpoints, so the probe that detected the address:port pair is also the one sending media to the client. This takes care of the Symmetric NAT issue.
Media Relay: combines the strengths of both “Symmetric RTP” and the “TURN server”. The relay can send media packets to an endpoint on a port previously used to send a media packet to the relay. Unlike a TURN server, the relay has access to the SIP message, so this media port manipulation is quite trivial.
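The Symmetric RTP idea from the list above can be sketched in a few lines. The class SymmetricRtpSession and its method names are assumptions made for illustration; the fallback to the SDP address before any packet has arrived is also an assumption, since the text does not say what the server does in that interval.

```python
class SymmetricRtpSession:
    """Tracks where to send RTP for one call leg, preferring the observed source."""

    def __init__(self, sdp_addr):
        self.sdp_addr = sdp_addr    # (ip, port) advertised in the SDP of the INVITE
        self.observed_addr = None   # (ip, port) RTP was most recently received from

    def on_rtp_received(self, source_addr):
        # Remember the NAT-mapped source of the incoming RTP packet.
        self.observed_addr = source_addr

    def send_target(self):
        # Ignore the SDP address as soon as real traffic has been seen.
        return self.observed_addr or self.sdp_addr


session = SymmetricRtpSession(("192.168.1.10", 49170))
print(session.send_target())
session.on_rtp_received(("203.0.113.1", 40001))
print(session.send_target())
```

Because the reply goes back to the exact address and port the packet came from, it passes through the mapping that the client's own outgoing RTP created, which is why the approach also works with Symmetric NATs.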
The following figure shows the various situations concerning the NAT traversal problem with Abilis.
Cases 2 and 6 never work because a third party with a public IP would be needed, while in cases 3 and 7 there is no NAT.
Consider the other situations:
Case 1: UAC behind a NAT connected to a public IP UAS. If the UAC is behind a Full Cone, Restricted Cone, or Port Restricted Cone NAT, a simple STUN server may be used. With a Symmetric NAT, the UAC depends on the UAS supporting Symmetric RTP, as Cisco gateways and Asterisk PBX do.
Case 4: public IP UAC connected to UAS behind a NAT. Similar but opposite to case 1.
Case 5: UAS behind a NAT connected to a public IP UAC. In this case the NAT must be configured with port forwarding for TCP port 5060 and UDP port 5060, and for the other UDP ports needed by RTP; the UAS will send SDP using the configured public IP address (which must be known) and public domain. Qualifying should be enabled to keep a pinhole open in the NAT.
Case 8: public IP UAS connected to a UAC behind a NAT. UAS should support “Symmetric RTP”.
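The reasoning of Case 1 can be written down as a small decision function. This is only a restatement of the text for illustration; the function name case1_strategy, the NAT kind strings and the returned labels are all assumptions.

```python
CONE_NATS = {"full_cone", "restricted_cone", "port_restricted_cone"}


def case1_strategy(nat_kind, uas_supports_symmetric_rtp):
    """Pick a traversal approach for a UAC behind a NAT talking to a public IP UAS."""
    if nat_kind in CONE_NATS:
        # The mapping is the same for every destination, so a STUN probe suffices.
        return "stun"
    if nat_kind == "symmetric":
        # The mapping differs per destination; only the UAS itself can learn it.
        if uas_supports_symmetric_rtp:
            return "symmetric_rtp"
        return "no_direct_solution"
    raise ValueError("unknown NAT kind: " + nat_kind)


print(case1_strategy("restricted_cone", False))
print(case1_strategy("symmetric", True))
```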