H.264 stream internals
The decoder operates on a sequence of bits received in a specific format. This binary stream is structured and divided into packets. At the top level, the stream is split into NAL-packets and has approximately the following form:
|Figure 1. Stream separation on NAL-packets|
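How the stream is cut into NAL-packets depends on the transport. The sketch below is a minimal one that assumes the Annex B byte-stream format, which is common in raw .h264 files. In that format each NAL-packet is preceded by the start code 0x000001, optionally with an extra leading zero byte (0x00000001). The function name `split_annexb` is hypothetical, and the packet payloads in the usage lines are illustrative and truncated.

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL-packets, dropping start codes.

    Assumes a well-formed stream: inside a NAL-packet the pattern 00 00 01
    never occurs, because the encoder inserts emulation prevention bytes.
    """
    starts = []  # offsets of the first byte after each 00 00 01
    i = 0
    while i + 3 <= len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            starts.append(i + 3)
            i += 3
        else:
            i += 1

    nals = []
    for k, begin in enumerate(starts):
        # A packet ends where the next start code begins (or at end of stream).
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(stream)
        # A NAL-packet never ends in 0x00, so trailing zeros belong to the
        # next 4-byte start code or to zero padding and can be stripped.
        nals.append(stream[begin:end].rstrip(b"\x00"))
    return nals


# Usage: headers of an SPS (0x67), a PPS (0x68) and an IDR slice (0x65).
data = bytes([0, 0, 0, 1, 0x67, 0x42, 0x00, 0x1E,
              0, 0, 1, 0x68, 0xCE, 0x38, 0x80,
              0, 0, 0, 1, 0x65, 0x88, 0x84])
for nal in split_annexb(data):
    print(nal.hex())  # prints 6742001e, 68ce3880, 658884 on separate lines
```

Scanning byte by byte is the simplest correct approach. A real demuxer would typically use a faster search, but the logic stays the same.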
The abbreviation NAL stands for Network Abstraction Layer. The packet structure is shown in Figure 2.
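The first byte of a NAL-packet is a header that contains information about the packet type. Its five least significant bits hold the type itself (nal_unit_type). The remaining three bits are the forbidden_zero_bit and the two-bit nal_ref_idc. The most commonly encountered packet types are listed in Table 1.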
Table 1. NAL types
|nal_unit_type||Content|
|1||Coded slice without data partitioning, non-IDR picture|
|2||Slice data partition A|
|3||Slice data partition B|
|4||Slice data partition C|
|5||Coded slice without data partitioning, IDR picture|
|6||Supplemental enhancement information (SEI)|
|7||Sequence parameter set (SPS)|
|8||Picture parameter set (PPS)|
|9||Access unit delimiter|
|10||End of sequence|
|11||End of stream|
The NAL type defines which data structure the current NAL-packet carries: a slice, a parameter set, filler data, and so on.
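Reading the header byte only requires bit masking. The sketch below is a minimal one that assumes the single-byte header used by the types in Table 1, since some later extension types carry extra header bytes. The field names follow the syntax elements of the ITU-T specification. The helper `parse_nal_header` and the short type names are hypothetical.

```python
NAL_TYPE_NAMES = {
    1: "Coded slice, non-IDR picture",
    2: "Slice data partition A",
    3: "Slice data partition B",
    4: "Slice data partition C",
    5: "Coded slice, IDR picture",
    6: "SEI",
    7: "Sequence parameter set",
    8: "Picture parameter set",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
}


def parse_nal_header(nal: bytes) -> dict:
    """Decode the first byte of a NAL-packet."""
    header = nal[0]
    forbidden_zero_bit = header >> 7      # must be 0 in a conforming stream
    nal_ref_idc = (header >> 5) & 0b11    # 0: not used as a reference
    nal_unit_type = header & 0b11111      # the type from Table 1
    if forbidden_zero_bit:
        raise ValueError("forbidden_zero_bit is set: corrupted NAL-packet")
    return {
        "nal_ref_idc": nal_ref_idc,
        "nal_unit_type": nal_unit_type,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "not listed in Table 1"),
    }


print(parse_nal_header(b"\x67"))
# {'nal_ref_idc': 3, 'nal_unit_type': 7, 'name': 'Sequence parameter set'}
```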
|Figure 2. NAL-packet structure|
As the figure shows, the payload of a NAL-packet is called the RBSP (Raw Byte Sequence Payload). The RBSP is a sequence of bytes that carries an SODB (String Of Data Bits) in a specified order.
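The figure glosses over one detail: the RBSP bytes are not always copied into the packet verbatim. The start code 0x000001 must never appear inside a packet. To guarantee this, the encoder inserts an emulation_prevention_three_byte (0x03) after any pair of zero bytes that would otherwise be followed by a byte in the range 0x00 to 0x03. The decoder removes these bytes to recover the RBSP. The sketch below is a minimal one that assumes a single-byte NAL header. The function name is hypothetical.

```python
def nal_to_rbsp(nal: bytes) -> bytes:
    """Strip the 1-byte NAL header and remove emulation prevention bytes."""
    rbsp = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just copied
    for b in nal[1:]:
        if zeros >= 2 and b == 0x03:
            # 00 00 03: the 0x03 was inserted by the encoder, drop it.
            zeros = 0
            continue
        rbsp.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(rbsp)


print(nal_to_rbsp(bytes([0x65, 0x00, 0x00, 0x03, 0x01, 0xAB])).hex())  # 000001ab
```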
The RBSP is the container for the SODB. According to the ITU-T specification, if the SODB is empty (zero bits long), the RBSP is also empty. Otherwise, the first byte of the RBSP contains the first (most significant, leftmost) eight bits of the SODB, the next byte contains the next eight bits, and so on, until fewer than eight SODB bits remain. The final RBSP byte holds these remaining bits. They are followed by a stop bit equal to 1 (rbsp_stop_one_bit) and then by zero bits (rbsp_alignment_zero_bit) up to the byte boundary (Figure 3).
|Figure 3. Raw Byte Sequence Payload (RBSP)|
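The stop bit is always the last 1 in the RBSP, so a decoder can find the end of the SODB by scanning backwards. The sketch below is a minimal one that assumes the RBSP ends with the trailing bits described above. Any all-zero bytes after them, which can occur in CABAC-coded slices, are skipped. The function name is hypothetical.

```python
def sodb_bit_length(rbsp: bytes) -> int:
    """Return the number of SODB bits in an RBSP (bits before the stop bit)."""
    if not rbsp:
        return 0  # an empty SODB gives an empty RBSP
    i = len(rbsp) - 1
    while i >= 0 and rbsp[i] == 0:
        i -= 1  # skip all-zero bytes after the trailing bits
    if i < 0:
        raise ValueError("rbsp_stop_one_bit not found")
    last = rbsp[i]
    # The stop bit is the lowest set bit of the last nonzero byte;
    # every bit below it is rbsp_alignment_zero_bit padding.
    zeros_after_stop = (last & -last).bit_length() - 1
    # Bits of the final byte above the stop bit still belong to the SODB.
    return i * 8 + (7 - zeros_after_stop)


print(sodb_bit_length(bytes([0xAB, 0x80])))  # 8: the last byte is only trailing bits
print(sodb_bit_length(bytes([0b10110100])))  # 5: SODB 10110, stop bit, two zeros
```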
Now let’s take a closer look at our bitstream:
|Figure 4. Detailed H.264 stream|
Every coded image consists of slices, which in turn are divided into macroblocks. Most often, one coded image corresponds to one slice, but an image can also be split into multiple slices. Slices are of the following types:
Table 2. Slice types
|slice_type||Description|
|0||P-slice. Consists of P-macroblocks (each macroblock partition is predicted from one reference frame) and/or I-macroblocks.|
|1||B-slice. Consists of B-macroblocks (each macroblock partition is predicted from one or two reference frames) and/or I-macroblocks.|
|2||I-slice. Contains only I-macroblocks. Each macroblock is predicted from previously coded blocks of the same slice.|
|3||SP-slice. Consists of P- and/or I-macroblocks and allows switching between encoded streams.|
|4||SI-slice. Consists of a special type of SI-macroblocks and allows switching between encoded streams.|
|5||P-slice (same as 0)|
|6||B-slice (same as 1)|
|7||I-slice (same as 2)|
|8||SP-slice (same as 3)|
|9||SI-slice (same as 4)|
At first glance, slice_type values 5–9 look redundant, since they denote the same five slice types as values 0–4. They are not redundant: a value of 5–9 also tells the decoder that all other slices of the current image are of the same type.
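In code, this rule reduces to a modulo operation. The sketch below is minimal. The function and tuple names are hypothetical.

```python
SLICE_TYPES = ("P", "B", "I", "SP", "SI")


def decode_slice_type(slice_type: int) -> tuple[str, bool]:
    """Map slice_type (0-9) to its name and the 'all slices alike' flag."""
    if not 0 <= slice_type <= 9:
        raise ValueError(f"invalid slice_type {slice_type}")
    # Values 5-9 repeat 0-4; they only add the guarantee that every other
    # slice of the current image has the same type.
    return SLICE_TYPES[slice_type % 5], slice_type >= 5


print(decode_slice_type(7))  # ('I', True)
```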
As you may have noticed, every slice consists of a header and data. The slice header contains the slice type (which determines the macroblock types the slice may contain) and the frame number (frame_num). It also contains the reference frame settings and the quantization parameters. Finally, the slice data holds the macroblocks. This is where our pixels are hiding.
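The first slice header fields are unsigned Exp-Golomb codes, written ue(v) in the specification. A value is coded as N leading zero bits, a 1, and N more bits, and it decodes to 2^N − 1 plus those N bits. The sketch below is a minimal one that reads the first three fields (first_mb_in_slice, slice_type and pic_parameter_set_id) from an RBSP. The fields that follow, such as frame_num, depend on values from the SPS and PPS, so they are not parsed here. The sketch also shows how first_mb_in_slice maps to the luma position of the first macroblock, where a macroblock covers 16×16 luma samples. That mapping assumes progressive frames without MBAFF, and the picture width in macroblocks would come from the SPS. `BitReader` and the helper functions are hypothetical.

```python
class BitReader:
    """Reads an RBSP bit by bit, most significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_slice_header_start(rbsp: bytes) -> dict:
    """Read the first three ue(v) fields of a slice header."""
    reader = BitReader(rbsp)
    return {
        "first_mb_in_slice": reader.read_ue(),
        "slice_type": reader.read_ue(),
        "pic_parameter_set_id": reader.read_ue(),
    }


def mb_luma_origin(mb_addr: int, pic_width_in_mbs: int) -> tuple[int, int]:
    """Top-left luma sample of a macroblock (progressive frame, no MBAFF)."""
    return (mb_addr % pic_width_in_mbs) * 16, (mb_addr // pic_width_in_mbs) * 16


# Illustrative first RBSP bytes of an IDR slice: bits 1 0001000 1 ...
print(parse_slice_header_start(bytes([0x88, 0x84])))
# {'first_mb_in_slice': 0, 'slice_type': 7, 'pic_parameter_set_id': 0}
print(mb_luma_origin(121, 120))  # (16, 16) in a 1920-pixel-wide frame
```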
Macroblocks are the main carriers of information, because they contain the sets of luminance and chrominance components that correspond to individual pixels. Without going into detail, video decoding ultimately comes down to locating and extracting macroblocks from the bitstream. The decoder then restores the pixel colors from their luminance and chrominance components. This is what a single macroblock looks like:
|Figure 5. Macroblock|
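Here we have the macroblock type, the prediction type (the subject of the next article), the Coded Block Pattern (CBP), and the Quantization Parameter delta (mb_qp_delta). The mb_qp_delta is present if the CBP signals coded residual data, and it is always present for Intra 16x16 macroblocks. Finally comes the data itself: the sets of luminance and chrominance components.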
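Once the coded_block_pattern value has been decoded, it splits into a luma part and a chroma part. In CAVLC it is a mapped Exp-Golomb code, me(v), and the mapping table is omitted here. The sketch below is a minimal one that assumes 4:2:0 or 4:2:2 chroma sampling. In that case the lower four bits flag which of the four 8×8 luma blocks carry coefficients. The upper part (0, 1 or 2) indicates whether the chroma blocks carry no coefficients, DC coefficients only, or both DC and AC coefficients. The function name is hypothetical.

```python
def split_cbp(cbp: int) -> tuple[list[bool], int]:
    """Split a decoded coded_block_pattern (4:2:0 / 4:2:2 chroma assumed)."""
    if not 0 <= cbp <= 47:
        raise ValueError(f"coded_block_pattern out of range: {cbp}")
    cbp_luma = cbp % 16     # CodedBlockPatternLuma: bit k -> 8x8 luma block k
    cbp_chroma = cbp // 16  # CodedBlockPatternChroma: 0 none, 1 DC, 2 DC+AC
    luma_8x8_coded = [bool((cbp_luma >> k) & 1) for k in range(4)]
    return luma_8x8_coded, cbp_chroma


print(split_cbp(37))  # ([True, False, True, False], 2)
```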
That is all for now. The next H.264 article will be dedicated to macroblock prediction.
I hope you enjoyed this article. Feel free to comment or ask any questions. Good luck.