IP Multimedia Subsystem Basics: SIP and SDP


Many different protocols have been developed for Voice over IP and conversational services. Some provide QoS mechanisms, such as resource reservation or the means to differentiate between services, while others deal with the transfer and encoding of media. The two major standards for session control are H.323 (by ITU-T) and SIP (by IETF).

SIP was not developed specifically for IMS; it can be used to establish any kind of session between end systems (referred to as User Agents in SIP). SIP allows users to discover each other’s terminal capabilities and to negotiate and decide on session properties. Services such as Presence, Messaging and Video conferencing can all be provided.

SIP gained considerable momentum when 3GPP selected it for session control in the IMS (IP Multimedia Subsystem). IMS is a product of 3GPP standardization: Release 5 specifies an advanced architecture for providing IP Multimedia services to mobile subscribers using GPRS access, and in Release 6 IMS was made ‘access independent’, allowing, for example, fixed and WLAN access.

The SIP standard only specifies a protocol for session establishment and control. IMS, on the other hand, is a complete architecture for IP Multimedia services (using SIP): it specifies how security, charging and QoS are managed, and it describes traffic cases, network elements, interfaces and interworking with different external networks and accesses. This article describes SIP with reference to IMS; familiarity with the IMS network is assumed.

One important decision in 3GPP was to reuse protocols developed for the Internet rather than specify new protocols for IMS. The benefit is that services and mechanisms developed for the Internet can be used seamlessly between IMS subscribers and Internet hosts. It was, however, not as straightforward as just selecting a set of existing standards: many extensions were needed to take into account the specific requirements of mobile subscribers and wireless access. 3GPP and IETF have worked closely together to extend SIP and its supporting protocols, and most of these extensions are of general use, benefiting not only IMS but the Internet as a whole.

SIP messages explained

Every SIP phone call involves two channels:

  • Signalling – SIP
  • Media – RTP

SIP is a transport-independent protocol that can run over any reliable or unreliable transport (e.g. TCP, SCTP or UDP). RFC 3261 defines procedures for transport over UDP and TCP and states that “All SIP elements MUST implement UDP and TCP. SIP elements MAY implement other protocols”.

The media is usually transferred using RTP/UDP/IP. The Real-time Transport Protocol (RTP) adds timestamps that are used to preserve the timing relationship between media frames, as well as sequence numbers and a payload type. The Message Session Relay Protocol (MSRP) can be used in combinational services, for example to transfer media files in non-real time.
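The RTP fields mentioned above live in a 12-byte fixed header defined in RFC 3550. The sketch below packs and unpacks that header with Python's struct module. It assumes RTP version 2 with no padding, header extension or CSRC list; the class and function names and the SSRC value are illustrative only.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    payload_type: int      # 7 bits, e.g. 0 = PCMU
    sequence: int          # 16 bits, +1 per packet; reveals loss and reordering
    timestamp: int         # 32 bits, counted in ticks of the codec clock
    ssrc: int              # 32 bits, identifies the media source
    marker: bool = False   # 1 bit, meaning defined by the payload format


def pack_rtp_header(h: RtpHeader) -> bytes:
    """Build the 12-byte fixed header: V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6
    byte1 = (int(h.marker) << 7) | (h.payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, h.sequence & 0xFFFF,
                       h.timestamp & 0xFFFFFFFF, h.ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> RtpHeader:
    """Parse the fixed header from the first 12 bytes of a packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if byte0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 >> 7))


# Two consecutive 20 ms packets of 8 kHz audio (hypothetical SSRC value):
# the sequence number grows by 1 and the timestamp by 8000 * 0.020 = 160.
first = RtpHeader(payload_type=0, sequence=1000, timestamp=0, ssrc=0x1234ABCD)
second = first._replace(sequence=1001, timestamp=160)
assert unpack_rtp_header(pack_rtp_header(second)) == second
```

The receiver uses the timestamp gap, not the arrival time, to play frames back with their original spacing.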

SIP Messages Listing

There are essentially two message types:

  1. Request messages, submitted using one of the SIP Methods listed below.
    • ACK – ACK is the final acknowledgement of an INVITE: the UAC (User Agent Client) sends it after receiving a final response from the UAS (User Agent Server). ACK itself is never answered with a response.
    • BYE – BYE ends a session/dialog and triggers a release in the user plane; it is typically sent when a user ends a call/session.
    • CANCEL – CANCEL is used by the UAC to cancel a pending request (typically an INVITE) before the final response has been received. It has no effect once a final response has been generated.
    • INFO – INFO transports application-level information, such as mid-call events, along the SIP signalling path without changing the state of the session or the SIP dialog.
    • INVITE – INVITE establishes a dialog and is typically used to request a session with another User Agent. In most cases the INVITE carries an attached offer listing the media formats (codecs) the User Agent wants to use and the IP address and port number where it wants to receive the media flow.
    • NOTIFY – NOTIFY is used by a Presence Server to send presence information about a user to other users. It is also used for other types of event notification, for example to deliver a Message Waiting Indicator.
    • OPTIONS – OPTIONS allows a UA to query another UA or a proxy server about its capabilities. This lets a client discover the supported methods, content types, extensions, codecs, etc. without “ringing” the other party. If a proxy server generates the response to an OPTIONS request, it returns a 200 OK listing the capabilities of the server, without a message body. A UA may include an SDP body in its response, so OPTIONS can be used to check media capabilities as well as signalling capabilities.
    • PRACK – In PSTN interworking scenarios, provisional responses are mapped to and from ISUP messages, so they must be delivered reliably. RFC 3262 defines the reliability mechanism used in IMS by introducing the PRACK (Provisional Response Acknowledgement) method.
    • REFER – REFER asks the recipient (identified by the Request-URI) to contact a third party using the ‘Refer-To’ information provided in the request. REFER enables many applications, including Call Transfer.
    • REGISTER – REGISTER creates a binding between the SIP address (Public User ID) of a user and the current UA contact address (IP address and port). In IMS its purpose has been extended to include authentication of the user.
    • SUBSCRIBE – SUBSCRIBE is used to request updates of presence information about a user from a Presence Server. It can also be used to subscribe to other types of event notification.
    • UPDATE – UPDATE is an extension to SIP (RFC 3311) that allows session parameters to be updated during the session establishment phase without affecting the state of the SIP dialog. This makes it particularly useful for early media negotiation.
  2. Status messages (responses), identified by a three-digit status code whose first digit gives the response class.
    • 1xx Provisional: Request received, continuing to process the request. Examples: “100 Trying”, “180 Ringing”.
    • 2xx Success: The action was successfully received, understood and accepted. Example: “200 OK”.
    • 3xx Redirection: Further action needs to be taken in order to complete the request. Example: “302 Moved Temporarily”.
    • 4xx Client Error: The request contains bad syntax or cannot be fulfilled at this server. Examples: “401 Unauthorized”, “404 Not Found”, “486 Busy Here”.
    • 5xx Server Error: The server failed to fulfil an apparently valid request. Examples: “500 Server Internal Error”, “501 Not Implemented”.
    • 6xx Global Failure: The request cannot be fulfilled at any server. Example: “606 Not Acceptable”.
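The first line of every message tells the two types apart: a request starts with a method name, while a response starts with "SIP/2.0" followed by a status code whose first digit gives its class. A minimal sketch, assuming only the methods listed above are accepted (real stacks also handle others, such as MESSAGE); the function name and the example URIs are hypothetical.

```python
SIP_METHODS = {"ACK", "BYE", "CANCEL", "INFO", "INVITE", "NOTIFY", "OPTIONS",
               "PRACK", "REFER", "REGISTER", "SUBSCRIBE", "UPDATE"}

STATUS_CLASSES = {1: "Provisional", 2: "Success", 3: "Redirection",
                  4: "Client Error", 5: "Server Error", 6: "Global Failure"}


def classify_start_line(line: str) -> tuple:
    """Return ("request", method) or ("response", class name)."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError("malformed start line")
    if parts[0] == "SIP/2.0":
        # Status line: SIP/2.0 <status code> <reason phrase>
        return "response", STATUS_CLASSES[int(parts[1]) // 100]
    # Request line: <METHOD> <Request-URI> SIP/2.0
    method, _request_uri, version = parts
    if version != "SIP/2.0" or method not in SIP_METHODS:
        raise ValueError(f"unsupported request line: {line!r}")
    return "request", method


print(classify_start_line("INVITE sip:bob@example.com SIP/2.0"))  # ('request', 'INVITE')
print(classify_start_line("SIP/2.0 486 Busy Here"))               # ('response', 'Client Error')
```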

The main SIP INVITE Header Fields explained

SIP messages are text-encoded, and the format has much in common with HTTP and SMTP. A SIP message consists of a start line, a variable number of header fields, an empty line and an optional message body. The INVITE transaction differs from those of other methods because of its potentially long duration: normally, human input is required to respond to an INVITE (someone has to answer/accept the session).
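That structure (start line, header fields, empty line, optional body, all with CRLF line endings) can be taken apart in a few lines. The sketch below is a simplification rather than a compliant parser: it ignores header line folding and compact header names, and the message, the example.com/example.org addresses, the 192.0.2.x addresses and the tag, branch and Call-ID values are all made up.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into (start line, [(name, value), ...], body)."""
    head, _, body = raw.partition("\r\n\r\n")   # the empty line ends the headers
    start_line, *header_lines = head.split("\r\n")
    headers = []                                # a list, since Via may repeat
    for line in header_lines:
        name, _, value = line.partition(":")    # split at the first colon only
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


# Illustrative INVITE without an SDP body (an offer would normally follow).
invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.org>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@192.0.2.10:5060>\r\n"
    "Allow: ACK, BYE, CANCEL, INVITE, OPTIONS\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
start, headers, body = parse_sip_message(invite)
print(start)                 # INVITE sip:bob@example.com SIP/2.0
print(dict(headers)["To"])   # <sip:bob@example.com>
```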

The image below shows a typical SIP INVITE.

[Figure: a typical SIP INVITE]


The Request-URI indicates the user (or service/proxy) to which this request is being addressed and is used for message routing unless a Route header field is included in the received message.


The Via header field records the path taken by the request so far and indicates the path that responses must follow. Before forwarding a request, a proxy inserts a Via header field containing its own address; this ensures that it stays in the signalling path for the response as well.
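The Via mechanism behaves like a stack: each proxy pushes its own entry on top before forwarding the request, and the response retraces the path by popping entries. A minimal sketch, assuming UDP transport and made-up addresses and branch values; the function names are hypothetical, and details such as the received and rport parameters are left out.

```python
def add_via(vias: list, sent_by: str, branch: str) -> list:
    """Proxy side: put its own Via on top of the stack before forwarding."""
    return [f"SIP/2.0/UDP {sent_by};branch={branch}"] + vias


def next_response_hop(vias: list):
    """Response side: drop the topmost Via (our own) and return the
    host:port of the next Via, plus the remaining stack."""
    _own, *remaining = vias
    if not remaining:
        return None, []   # no Via left: the response has reached the UAC
    next_via = remaining[0]
    # "SIP/2.0/UDP host:port;branch=..." -> "host:port"
    sent_by = next_via.split(" ", 1)[1].split(";", 1)[0]
    return sent_by, remaining


# UA -> proxy P1 -> proxy P2 -> UAS
vias = ["SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bKua1"]   # added by the UA
vias = add_via(vias, "198.51.100.1:5060", "z9hG4bKp1")     # P1 forwards
vias = add_via(vias, "203.0.113.5:5060", "z9hG4bKp2")      # P2 forwards
# The UAS copies this stack into its response and sends it to the top Via (P2).
hop, vias = next_response_hop(vias)   # at P2
print(hop)                            # 198.51.100.1:5060 (P1)
hop, vias = next_response_hop(vias)   # at P1
print(hop)                            # 192.0.2.10:5060 (the UA)
```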


The From header field indicates the originator of the SIP request (UAC). Note: From is an informational field and is not used for routing; in telephony, for example, it can carry the calling party number.


The To header field indicates the intended recipient of the SIP request (UAS). Note: To is an informational field and is not used for routing; in telephony, for example, it can carry the called party number.


The Contact header field carries the SIP URI to which the sender (UAC) expects subsequent requests to be addressed.


The Allow header field is optional and lists the methods supported by the UA generating the message. If the Allow header is included, it must list all supported methods, for example: ACK, BYE, CANCEL, INFO, INVITE, NOTIFY, OPTIONS, PRACK, REFER, REGISTER, SUBSCRIBE, UPDATE.


SIP URI – A SIP URI is a user’s SIP address, the SIP counterpart of a phone number. Its host part can be an FQDN or an IP address. With an FQDN, the URI follows the <user>@<host>.<domain> notation and the domain is resolved by DNS; alternatively, <host>.<domain> is replaced directly by an IP address. The URI starts with the scheme “sip” followed by a colon.


sip:<user>@<host>.<domain> (FQDN form)
sip:<user>@<IP address> (IP address form)
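Splitting a SIP URI into its user and host parts, and telling the FQDN form from the IP address form, can be sketched with the standard ipaddress module. This is a simplification: ports and URI parameters (such as ;user=phone) are dropped, IPv6 references and the sips: scheme are not handled, and the example URIs are made up.

```python
import ipaddress
from typing import NamedTuple


class SipUri(NamedTuple):
    user: str
    host: str
    host_is_ip: bool


def parse_sip_uri(uri: str) -> SipUri:
    uri = uri.strip().strip("<>")               # accept the <sip:...> form too
    scheme, colon, rest = uri.partition(":")
    if scheme.lower() != "sip" or not colon:
        raise ValueError(f"not a sip: URI: {uri!r}")
    user, at, hostport = rest.partition("@")
    if not at:
        raise ValueError("missing <user>@ part")
    hostport = hostport.split(";", 1)[0]        # drop URI parameters
    host = hostport.split(":", 1)[0]            # drop the port (IPv4/FQDN only)
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False                           # an FQDN, to be resolved by DNS
    return SipUri(user, host, is_ip)


print(parse_sip_uri("<sip:alice@example.com>"))
# SipUri(user='alice', host='example.com', host_is_ip=False)
print(parse_sip_uri("sip:+4631234567@192.0.2.10;user=phone"))
# SipUri(user='+4631234567', host='192.0.2.10', host_is_ip=True)
```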

The Session Description Protocol (SDP)

SIP is used to deliver an INVITE to User B, inviting that user to a media session with User A. SIP itself does not describe the session: details such as media attributes, codecs, IP addresses and ports are carried in the body of the SIP message.

Media stream establishment between two SIP User Agents is based on the Session Description Protocol (SDP), which describes a session in text format. SDP needs a protocol to carry it; in IMS this is usually SIP, used for session establishment. SDP was originally conceived to describe multicast sessions, but most VoIP sessions are unicast, so the two participants need to negotiate and agree on session properties in both directions. This is done with the offer/answer model defined in RFC 3264.

The Protocol explained

SDP describes the media-related information for a session, i.e. IP address, port, codecs, etc. As SDP has no transport protocol of its own, it is carried in other protocols such as SIP. A description coded according to SDP is basically a text file of session properties. Each line has the format <type>=<value>, where <type> is a one-letter code representing a certain property. SDP consists of the following mandatory and optional (marked with *) types:
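Because every SDP line is simply <type>=<value>, reading a body takes only a line split. A minimal sketch, assuming the body is already available as a string; the function name is hypothetical, and the sample offer (addresses, IDs, codec list) is made up for illustration.

```python
def parse_sdp(body: str) -> list:
    """Return the SDP lines as ordered (type, value) pairs.

    Order matters: a= lines after an m= line belong to that media description.
    """
    fields = []
    for line in body.splitlines():
        if not line:
            continue                      # tolerate a trailing empty line
        if len(line) < 2 or line[1] != "=" or not line[0].isalpha():
            raise ValueError(f"malformed SDP line: {line!r}")
        fields.append((line[0], line[2:]))
    return fields


# Illustrative offer for one audio stream.
sdp_offer = "\r\n".join([
    "v=0",
    "o=- 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 iLBC/8000",
    "a=ptime:20",
]) + "\r\n"

for sdp_type, value in parse_sdp(sdp_offer):
    print(sdp_type, value)
```

A list of pairs is used rather than a dictionary because types such as a= repeat and their position relative to m= lines carries meaning.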

Session Description

v= (protocol version)
o= (owner/creator and session identifier)
s= (session name)
i=* (session information)
u=* (URI of description)
e=* (email address)
p=* (phone number)
c=* (connection information – not required if included in all media)
b=* (bandwidth information)
One or more time descriptions (see below)
z=* (time zone adjustments)
k=* (encryption key)
a=* (zero or more session attribute lines)
Zero or more media descriptions (see below)

Time description

t= (time the session is active)
r=* (zero or more repeat times)

Media description

m= (media name and transport address)
i=* (media title)
c=* (connection information – optional if included at session-level)
b=* (bandwidth information)
k=* (encryption key)
a=* (zero or more media attribute lines)
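Because line order defines the structure, a receiver can group the lines into a session part and one part per m= line, then check the mandatory types listed above. A minimal sketch, assuming the lines have already been split into (type, value) pairs; it checks presence only, not full ordering, and the function names are hypothetical.

```python
def split_sections(fields: list):
    """Group (type, value) pairs into session-level lines and media descriptions."""
    session, media = [], []
    for sdp_type, value in fields:
        if sdp_type == "m":
            media.append([(sdp_type, value)])   # each m= starts a new description
        elif media:
            media[-1].append((sdp_type, value))
        else:
            session.append((sdp_type, value))
    return session, media


def missing_mandatory(fields: list) -> list:
    """List mandatory types that are absent (c= may instead appear in every m=)."""
    session, media = split_sections(fields)
    session_types = {t for t, _ in session}
    missing = [t for t in ("v", "o", "s", "t") if t not in session_types]
    c_in_every_media = all(any(t == "c" for t, _ in desc) for desc in media)
    if "c" not in session_types and not c_in_every_media:
        missing.append("c")
    return missing


example = [("v", "0"), ("o", "- 1 1 IN IP4 192.0.2.10"), ("s", "-"), ("t", "0 0"),
           ("m", "audio 49170 RTP/AVP 0"), ("c", "IN IP4 192.0.2.10")]
print(missing_mandatory(example))   # []
```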

An example of a simple SDP body is shown below:

[Figure: a simple SDP body]

Short description of each field

Version (v=):
Only v=0 is defined, as specified in RFC 2327 (since obsoleted by RFC 4566, which keeps version 0).

Origin (o=):
o=<username> <session id> <version> <network type> <address type> <address>.
<username> is typically set to ‘-’ (dash).
<session id> and <version> are numerical values; <version> is increased whenever the session description is modified.
<network type> and <address type> indicate ‘IN’ (Internet) and ‘IP4’ (IPv4, or ‘IP6’ for IPv6) respectively.
<address> indicates the IP address or FQDN of the originator.
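The version field matters in practice: RFC 3264 requires that when an offer modifies an existing session (for example to put a call on hold), the o= line stays identical except that the version is incremented by one. A minimal sketch of parsing and bumping the o= line; the class and function names and the sample values are illustrative.

```python
from typing import NamedTuple


class Origin(NamedTuple):
    username: str
    session_id: str
    version: int
    net_type: str
    addr_type: str
    address: str

    def to_line(self) -> str:
        return (f"o={self.username} {self.session_id} {self.version} "
                f"{self.net_type} {self.addr_type} {self.address}")


def parse_origin(value: str) -> Origin:
    """Parse the value of an o= line (the text after 'o=')."""
    username, session_id, version, net_type, addr_type, address = value.split()
    return Origin(username, session_id, int(version), net_type, addr_type, address)


def origin_for_new_offer(previous: Origin) -> Origin:
    """A modifying offer keeps the o= line but increments the version by one."""
    return previous._replace(version=previous.version + 1)


o = parse_origin("- 2890844526 2890844526 IN IP4 192.0.2.10")
print(origin_for_new_offer(o).to_line())
# o=- 2890844526 2890844527 IN IP4 192.0.2.10
```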

Session name (s=):
Name of the session. The default value is ‘-’ (dash); the field must NOT be left empty.

Session information (i=):
Descriptive information about the session. This field is optional and not considered necessary; if received, it can be ignored.

URI (u=):
A URI (Uniform Resource Identifier), as used by WWW clients, pointing to additional information about the session. This field is optional and not considered necessary; if received, it can be ignored.

Email address (e=):
Contact information for the person responsible for the session. This field is optional.

Phone number (p=):
Contact information for the person responsible for the session. This field is optional.

Connection information (c=):
Connection information for the media.
c=<network type> <address type> <connection address>.
<network type> and <address type> indicate ‘IN’ (Internet) and ‘IP4’ (IPv4, or ‘IP6’ for IPv6) respectively.
The <connection address> identifies the IP address where media should be received.

Bandwidth information (b=):
b=<modifier>:<bandwidth-value>, where <modifier> is a single alphanumeric word giving the meaning of the bandwidth figure and <bandwidth-value> is in kilobits per second.
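Both c= and b= are short fixed formats. The sketch below reads them, checking that the address matches the declared address type and converting the bandwidth from kilobits to bits per second. It assumes a unicast numeric address (no FQDN and no multicast /ttl suffix), and the function names and sample values are illustrative. Common modifiers are AS (application-specific maximum) and CT (conference total).

```python
import ipaddress


def parse_connection(value: str) -> tuple:
    """'IN IP4 192.0.2.10' -> ('IN', 'IP4', '192.0.2.10')."""
    net_type, addr_type, address = value.split()
    version = ipaddress.ip_address(address).version
    if (addr_type, version) not in (("IP4", 4), ("IP6", 6)):
        raise ValueError(f"{address} does not match address type {addr_type}")
    return net_type, addr_type, address


def parse_bandwidth(value: str) -> tuple:
    """'AS:64' -> ('AS', 64000): SDP gives the value in kilobits per second."""
    modifier, _, kbps = value.partition(":")
    return modifier, int(kbps) * 1000


print(parse_connection("IN IP4 192.0.2.10"))   # ('IN', 'IP4', '192.0.2.10')
print(parse_bandwidth("AS:64"))                # ('AS', 64000)
```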

Time zone (z=):
This field is optional and not considered necessary; if received, it can be ignored. If used, it specifies offsets from the base time, so that repeated sessions spanning a daylight-saving change keep their intended times.

Encryption key (k=):
This field is optional and not considered necessary; if received, it can be ignored. SDP can carry an encryption key for the whole session or for each media description, but only if the channel carrying the SDP is secure and trusted.

Attributes (a=):
a=<attribute> or a=<attribute>:<value>.
An attribute can be a property attribute, a=<flag> (e.g. a=recvonly), or a value attribute, a=<attribute>:<value> (e.g. a=orient:landscape).
Attributes can be “session level”, “media level” or both. Session-level attributes apply to the whole session, whereas media-level attributes apply to a specific media stream. The following attributes are common in IMS:
• a=rtpmap: media attribute mapping a payload type to an encoding name and clock rate, required for dynamic payload types
• a=ptime: media attribute giving the RTP packetization time, i.e. the length of media in milliseconds carried in each packet
• a=fmtp: media attribute carrying format-specific parameters
The direction attributes (sendrecv, sendonly, recvonly and inactive) control media direction, for example for ‘Call hold’, and can be used at session or media level. They are typically used to update an existing session (e.g. ‘sendrecv’ -> ‘sendonly’ or ‘recvonly’ -> ‘inactive’). If ‘sendonly’ is received in an offer, the answer contains ‘recvonly’ (or ‘inactive’ if the answerer does not want to receive the media).
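The direction rules above amount to a small lookup: each offered direction maps to the mirror-image direction in the answer, and sendrecv is assumed when no direction attribute is present. A minimal sketch; RFC 3264 also permits other answers (for instance inactive in reply to anything), so this table simply picks the most permissive one, and the function names are hypothetical.

```python
# Answer for each offered direction (the most permissive choice RFC 3264 allows).
ANSWER_FOR_OFFER = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


def parse_attribute(value: str) -> tuple:
    """'recvonly' -> ('recvonly', None); 'ptime:20' -> ('ptime', '20')."""
    name, colon, attr_value = value.partition(":")
    return name, (attr_value if colon else None)


def answer_direction(offer_attributes: list) -> str:
    """Choose the answer's direction from the offer's a= values."""
    offered = "sendrecv"              # default when no direction attribute is given
    for attr in offer_attributes:
        name, _ = parse_attribute(attr)
        if name in ANSWER_FOR_OFFER:
            offered = name
    return ANSWER_FOR_OFFER[offered]


# Call hold: the holding party re-offers the stream as sendonly.
print(answer_direction(["rtpmap:0 PCMU/8000", "sendonly"]))   # recvonly
```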

Time (t=):
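t=<start time> <stop time>: the start and stop times of the session, used for scheduling media sessions. The values are NTP timestamps in seconds; ‘t=0 0’ denotes a session that is not bounded in time, which is the usual value for SIP calls.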
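NTP time counts seconds from 1 January 1900, so converting a t= value to a calendar date means subtracting the 2208988800 seconds between 1900 and the Unix epoch. A minimal sketch; returning None for a zero value ("not bounded") is a choice made here for illustration, and the function name is hypothetical.

```python
from datetime import datetime, timezone

NTP_TO_UNIX = 2208988800   # seconds between 1900-01-01 (NTP epoch) and 1970-01-01


def parse_timing(value: str) -> tuple:
    """Return (start, stop) as UTC datetimes; None means 'not bounded'."""
    start, stop = (int(x) for x in value.split())

    def to_datetime(ntp_seconds: int):
        if ntp_seconds == 0:
            return None
        return datetime.fromtimestamp(ntp_seconds - NTP_TO_UNIX, tz=timezone.utc)

    return to_datetime(start), to_datetime(stop)


print(parse_timing("0 0"))   # (None, None): a permanent session, as in SIP calls
```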

Repeat (r=):
This field is optional and not considered necessary; if received, it can be ignored. If used, values are given in seconds by default, but a compact form with a unit suffix (d, h, m or s) can also be used.
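The compact form makes r= lines easier to read, since "7d" means the same as "604800". The sketch below normalizes both forms to seconds. The input is the weekly example from the SDP specification; the function names are illustrative.

```python
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def typed_time(token: str) -> int:
    """'7d' -> 604800, '25h' -> 90000, '3600' -> 3600 (plain numbers are seconds)."""
    if token[-1] in UNIT_SECONDS:
        return int(token[:-1]) * UNIT_SECONDS[token[-1]]
    return int(token)


def parse_repeat(value: str) -> tuple:
    """r=<repeat interval> <active duration> <offsets from start-time>, in seconds."""
    interval, duration, *offsets = (typed_time(t) for t in value.split())
    return interval, duration, offsets


# Weekly, active for one hour, at the t= start time and again 25 hours later.
print(parse_repeat("7d 1h 0 25h"))   # (604800, 3600, [0, 90000])
```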

Media (m=):
m=<media> <port> <transport> <fmt list>.
<media> can be “audio”, “video”, “application”, “data” or “control”
<port> describes the RTP port used for the media session.
<transport> denotes the transport used for the media, for voice calls ‘RTP/AVP’ is used and for fax calls also ‘udptl’ and ‘tcptl’ are supported.
<fmt list> specifies the list of payload formats that can be used for the session.
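An m= line splits into its four parts on whitespace, with everything after the transport forming the format list. A minimal sketch; formats are kept as strings because, as described below, "t38" can appear there instead of a numeric payload type. The class and function names and the sample lines are illustrative.

```python
from typing import NamedTuple


class MediaDescription(NamedTuple):
    media: str
    port: int
    transport: str
    formats: list      # strings: "t38" is not a numeric payload type


def parse_media(value: str) -> MediaDescription:
    """Parse the value of an m= line: <media> <port> <transport> <fmt list>."""
    media, port, transport, *formats = value.split()
    # A port may be written as <port>/<number of ports>; keep the base port only.
    return MediaDescription(media, int(port.split("/")[0]), transport, formats)


print(parse_media("audio 49170 RTP/AVP 0 8 97"))
print(parse_media("image 49172 udptl t38"))
```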

Payload Formats

For voice applications, the following payload formats are supported:
• RTP payload types (static or dynamic numbers identifying codecs)
• Telephone events (“telephone-event”, used to carry DTMF digits in RTP)
• “t38”, which may appear as a string value in the <fmt list>. “t38” is a MIME subtype of the media type “image”, so it will be a single value in the <fmt list>; it is not an RTP/AVP payload type.
When more than one payload format is listed, any of them can be used during the session; the first one is the default.
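Since the first listed format is the default, a simple answerer can walk the offered list in order and keep the first format it also supports. A minimal sketch, assuming payload types are compared by number, which is only safe for static types (dynamic numbers have to be matched through their rtpmap encoding names); the function name and codec sets are illustrative.

```python
from typing import Optional


def choose_format(offered: list, supported: set) -> Optional[str]:
    """Return the first offered format that we also support, or None."""
    for fmt in offered:          # offer order = order of preference
        if fmt in supported:
            return fmt
    return None


# The offer lists G.729 (18) first, but this answerer only has PCMU/PCMA.
print(choose_format(["18", "0", "8"], {"0", "8"}))   # 0
print(choose_format(["18"], {"0", "8"}))             # None
```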

Static & Dynamic Payload Types

There are two kinds of payload types, “static” and “dynamic”.
A static payload type has a fixed assignment (RFC 3551; for example, 0 is PCMU and 8 is PCMA), so the rtpmap media attribute (a=) is not needed in the session description, although it is often included.
A dynamic payload type (96-127) is bound to an encoding only for the session and must be accompanied by an rtpmap attribute line: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding params>]. Examples (34 is static, 103 is dynamic):
Media Attribute (a): rtpmap:34 H263/90000
Media Attribute (a): rtpmap:103 ISAC/16000
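Resolving a payload type therefore means checking the rtpmap lines first and falling back to the static table. The sketch below includes only a handful of static assignments from RFC 3551 and is not the full table; the class and function names are illustrative.

```python
from typing import NamedTuple, Optional

# A few static assignments from RFC 3551: (encoding name, clock rate in Hz).
STATIC_PAYLOAD_TYPES = {0: ("PCMU", 8000), 8: ("PCMA", 8000),
                        18: ("G729", 8000), 34: ("H263", 90000)}


class RtpMap(NamedTuple):
    payload_type: int
    encoding: str
    clock_rate: int
    params: Optional[str] = None     # e.g. number of audio channels


def parse_rtpmap(value: str) -> RtpMap:
    """'rtpmap:103 ISAC/16000' -> RtpMap(103, 'ISAC', 16000, None)."""
    name, _, rest = value.partition(":")
    if name != "rtpmap":
        raise ValueError(f"not an rtpmap attribute: {value!r}")
    payload_type, encoding_spec = rest.split(" ", 1)
    encoding, clock_rate, *params = encoding_spec.split("/")
    return RtpMap(int(payload_type), encoding, int(clock_rate),
                  params[0] if params else None)


def resolve_payload_type(pt: int, rtpmaps: dict) -> tuple:
    """Look up (encoding, clock rate): rtpmap lines first, then the static table."""
    if pt in rtpmaps:
        return rtpmaps[pt].encoding, rtpmaps[pt].clock_rate
    if 96 <= pt <= 127:
        raise ValueError(f"dynamic payload type {pt} needs an rtpmap line")
    return STATIC_PAYLOAD_TYPES[pt]


maps = {m.payload_type: m for m in [parse_rtpmap("rtpmap:103 ISAC/16000")]}
print(resolve_payload_type(103, maps))  # ('ISAC', 16000)
print(resolve_payload_type(0, maps))    # ('PCMU', 8000), no rtpmap line needed
```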


The SIP model, conclusions and recommendations

SIP has been developed with flexibility and extensibility in mind. SIP routes session control messages between clients and can be used to deliver a session description (SDP) or another object to an end system. SDP is typically used for multimedia sessions to describe end points and to negotiate session properties.

SIP differs from most other IP applications in that it uses a client-to-client communication model rather than a client-to-server one: any user can send a request that the other user should respond to. In fact, every endpoint has both client and server functionality; requests are sent by the client part of the terminal and responses by the server part.

SIP also differs from traditional telecom signalling in the way control is distributed. In ISDN, control resides in the exchanges in the network and the terminals contain almost no intelligence; signalling also differs between access and inter-exchange communication. SIP, on the other hand, is an end-to-end protocol: SIP User Agents are intelligent, and the Proxy Servers in the network exercise less control, basically routing the requests.

Article reviewed and approved by: Garais Gabriel
PhD Lecturer at "School of Computer Science for Business Management" - Romanian-American University

Article Author/s:

Mihai Ciobanu