SPS in H264:
An h.264 bitstream contains a sequence of Network Abstraction Layer (NAL) units.
The SPS and PPS are both types of NAL units. The SPS NAL unit contains
parameters that apply to a series of consecutive coded video pictures,
referred to as a “coded video sequence” in the h.264 standard. The PPS
NAL unit contains parameters that apply to the decoding of one or more
individual pictures inside a coded video sequence.
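#include <stdint.h>

/* h.264 NAL units in Annex B byte-stream form: a 4-byte start code
   (00 00 00 01) followed by the NAL unit itself. */
const uint8_t sps[] =
{0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2};
const uint8_t pps[] =
{0x00, 0x00, 0x00, 0x01, 0x68, 0xce, 0x38, 0x80};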
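As a minimal sketch of how the first byte after the start code splits into fields, the helper below pulls out forbidden_zero_bit, nal_ref_idc and nal_unit_type. The names nal_header, start_code_length, parse_nal_header and describe_nal are hypothetical, chosen only for this illustration, and the sketch assumes the buffer begins with a 3- or 4-byte start code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The three fields packed into the one-byte NAL unit header. */
typedef struct {
    unsigned forbidden_zero_bit; /* u(1) */
    unsigned nal_ref_idc;        /* u(2) */
    unsigned nal_unit_type;      /* u(5) */
} nal_header;

/* Length of the start code at the beginning of buf (3 or 4), or 0 if none. */
static size_t start_code_length(const uint8_t *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1)
        return 3;
    if (len >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1)
        return 4;
    return 0;
}

/* Split the header byte into its fields, most significant bit first. */
static nal_header parse_nal_header(uint8_t b)
{
    nal_header h;
    h.forbidden_zero_bit = (b >> 7) & 0x01u;
    h.nal_ref_idc        = (b >> 5) & 0x03u;
    h.nal_unit_type      = b & 0x1fu;
    return h;
}

/* Print the header of a start-code-prefixed NAL unit.
   Returns nal_unit_type, or -1 if no header byte could be found. */
static int describe_nal(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    size_t sc = start_code_length(buf, len);
    if (sc == 0 || sc >= len)
        return -1;
    nal_header h = parse_nal_header(buf[sc]);
    printf("forbidden_zero_bit=%u nal_ref_idc=%u nal_unit_type=%u\n",
           h.forbidden_zero_bit, h.nal_ref_idc, h.nal_unit_type);
    return (int)h.nal_unit_type;
}
```

Calling describe_nal(sps, sizeof sps) on the array above should report type 7 (sequence parameter set), and the pps array should report type 8 (picture parameter set).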
Let’s decode these bytes into the named parameters from the spec. The
first thing I did was look at section 7 of the h.264 specification.
I saw that, at a minimum, I had to choose how to fill in the SPS
parameters in the table below. In the table, as in the standard, the
type u(n) indicates an unsigned integer of n bits, and ue(v) indicates
an unsigned Exp-Golomb-coded value of a variable number of bits. The
spec doesn’t seem to define the maximum number of bits anywhere, but the
reference encoder software uses 32. (People wishing to explore the
security of decoder software may find it interesting to violate this
assumption!)
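To make the two descriptors concrete, here is a minimal sketch of a bit reader with u(n) and ue(v) decoding. The bit_reader type and the br_* function names are hypothetical. The limit of 31 leading zero bits is an assumption of this sketch (it keeps every decoded value inside a uint32_t); it is not something taken from the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t size_bits; /* total number of readable bits */
    size_t pos;       /* index of the next bit to read */
    int error;        /* set on overrun or an over-long ue(v) code */
} bit_reader;

static void br_init(bit_reader *br, const uint8_t *data, size_t size_bytes)
{
    br->data = data;
    br->size_bits = size_bytes * 8;
    br->pos = 0;
    br->error = 0;
}

/* u(n): read n bits (n <= 32), most significant bit first. */
static uint32_t br_read_u(bit_reader *br, unsigned n)
{
    assert(n <= 32);
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        if (br->pos >= br->size_bits) {
            br->error = 1;
            return 0;
        }
        unsigned bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* ue(v): count leading zero bits up to the first 1 bit, then read that
   many suffix bits. The value is 2^zeros - 1 + suffix. */
static uint32_t br_read_ue(bit_reader *br)
{
    unsigned zeros = 0;
    for (;;) {
        uint32_t bit = br_read_u(br, 1);
        if (br->error)
            return 0;
        if (bit)
            break;
        if (++zeros > 31) { /* sketch-specific limit, see lead-in */
            br->error = 1;
            return 0;
        }
    }
    uint32_t suffix = br_read_u(br, zeros);
    if (br->error)
        return 0;
    return ((uint32_t)1 << zeros) - 1 + suffix;
}
```

A single 1 bit decodes to 0, the bits 0001000 decode to 7, and 00110 decodes to 5. A decoder that does not bound the number of leading zeros is the kind of code the security remark above is aimed at.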
| Parameter Name | Type | Value | Comments |
|---|---|---|---|
| forbidden_zero_bit | u(1) | 0 | Despite being forbidden, it must be set to 0! |
| nal_ref_idc | u(2) | 3 | 3 means it is “important” (this is an SPS) |
| nal_unit_type | u(5) | 7 | Indicates this is a sequence parameter set |
| profile_idc | u(8) | 66 | Baseline profile |
| constraint_set0_flag | u(1) | 0 | We’re not going to honor constraints |
| constraint_set1_flag | u(1) | 0 | We’re not going to honor constraints |
| constraint_set2_flag | u(1) | 0 | We’re not going to honor constraints |
| constraint_set3_flag | u(1) | 0 | We’re not going to honor constraints |
| reserved_zero_4bits | u(4) | 0 | Better set them to zero |
| level_idc | u(8) | 10 | Level 1, sec A.3.1 |
| seq_parameter_set_id | ue(v) | 0 | We’ll just use id 0. |
| log2_max_frame_num_minus4 | ue(v) | 0 | Let’s have as few frame numbers as possible |
| pic_order_cnt_type | ue(v) | 0 | Keep things simple |
| log2_max_pic_order_cnt_lsb_minus4 | ue(v) | 0 | Fewer is better. |
| num_ref_frames | ue(v) | 0 | We will only send I slices |
| gaps_in_frame_num_value_allowed_flag | u(1) | 0 | We will have no gaps |
| pic_width_in_mbs_minus1 | ue(v) | 7 | SQCIF is 8 macroblocks wide |
| pic_height_in_map_units_minus1 | ue(v) | 5 | SQCIF is 6 macroblocks high |
| frame_mbs_only_flag | u(1) | 1 | We will not do field/frame encoding |
| direct_8x8_inference_flag | u(1) | 0 | Used for B slices. We will not send B slices |
| frame_cropping_flag | u(1) | 0 | We will not do frame cropping |
| vui_parameters_present_flag | u(1) | 0 | We will not send VUI data |
| rbsp_stop_one_bit | u(1) | 1 | Stop bit. I missed this at first and it caused me much trouble. |
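The table can be checked against the bytes at the top of this post by writing the values out bit by bit. The sketch below does that with a small bit writer. The bit_writer type and the bw_* and write_example_sps names are hypothetical. It also assumes the payload never needs emulation prevention bytes, which holds for these particular values but not for arbitrary ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSB-first bit writer over a zero-filled buffer. */
typedef struct {
    uint8_t *data;
    size_t capacity_bits;
    size_t pos; /* index of the next bit to write */
} bit_writer;

static void bw_init(bit_writer *bw, uint8_t *buf, size_t size_bytes)
{
    memset(buf, 0, size_bytes);
    bw->data = buf;
    bw->capacity_bits = size_bytes * 8;
    bw->pos = 0;
}

/* u(n): append the low n bits of value, most significant bit first. */
static void bw_put_u(bit_writer *bw, unsigned n, uint32_t value)
{
    assert(n <= 32 && bw->pos + n <= bw->capacity_bits);
    for (unsigned i = n; i-- > 0; ) {
        if ((value >> i) & 1u)
            bw->data[bw->pos >> 3] |= (uint8_t)(0x80u >> (bw->pos & 7));
        bw->pos++;
    }
}

/* ue(v): write value + 1 in binary, preceded by (bit length - 1) zeros. */
static void bw_put_ue(bit_writer *bw, uint32_t value)
{
    assert(value < UINT32_MAX);
    uint32_t code = value + 1;
    unsigned len = 0;
    for (uint32_t t = code; t != 0; t >>= 1)
        len++;
    bw_put_u(bw, len - 1, 0);
    bw_put_u(bw, len, code);
}

/* Write the SPS from the table (without start code); returns its size. */
static size_t write_example_sps(uint8_t *buf, size_t size_bytes)
{
    bit_writer bw;
    bw_init(&bw, buf, size_bytes);
    bw_put_u(&bw, 1, 0);   /* forbidden_zero_bit */
    bw_put_u(&bw, 2, 3);   /* nal_ref_idc */
    bw_put_u(&bw, 5, 7);   /* nal_unit_type */
    bw_put_u(&bw, 8, 66);  /* profile_idc */
    bw_put_u(&bw, 4, 0);   /* constraint_set0_flag .. constraint_set3_flag */
    bw_put_u(&bw, 4, 0);   /* reserved_zero_4bits */
    bw_put_u(&bw, 8, 10);  /* level_idc */
    bw_put_ue(&bw, 0);     /* seq_parameter_set_id */
    bw_put_ue(&bw, 0);     /* log2_max_frame_num_minus4 */
    bw_put_ue(&bw, 0);     /* pic_order_cnt_type */
    bw_put_ue(&bw, 0);     /* log2_max_pic_order_cnt_lsb_minus4 */
    bw_put_ue(&bw, 0);     /* num_ref_frames */
    bw_put_u(&bw, 1, 0);   /* gaps_in_frame_num_value_allowed_flag */
    bw_put_ue(&bw, 7);     /* pic_width_in_mbs_minus1 */
    bw_put_ue(&bw, 5);     /* pic_height_in_map_units_minus1 */
    bw_put_u(&bw, 1, 1);   /* frame_mbs_only_flag */
    bw_put_u(&bw, 1, 0);   /* direct_8x8_inference_flag */
    bw_put_u(&bw, 1, 0);   /* frame_cropping_flag */
    bw_put_u(&bw, 1, 0);   /* vui_parameters_present_flag */
    bw_put_u(&bw, 1, 1);   /* rbsp_stop_one_bit */
    return (bw.pos + 7) / 8; /* remaining bits of the last byte stay 0 */
}
```

With a 16-byte buffer this produces the 7 bytes 67 42 00 0a f8 41 a2, which match the sps array once its 4-byte start code is removed. Leaving out the final stop bit changes the last byte from a2 to a0, which is the mistake described in the last row of the table.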
A handy tool for decoding h.264 bitstreams, including the SPS, is the
h264bitstream tool. It comes with a command line program that decodes a
bitstream to the parameter names defined in the h.264 specification.
Let’s look at its output for a sample mp4 file I downloaded from
YouTube. First, I extract the h.264 NAL units from the file using
ffmpeg:
ffmpeg.exe -i video.mp4 -vcodec copy -bsf:v h264_mp4toannexb -an out.h264
The NAL units now reside in the file out.h264. I then run the
h264_analyze command from the h264bitstream package to produce the
following output:
h264_analyze out.h264
!! Found NAL at offset 4 (0x0004), size 25 (0x0019)
==================== NAL ====================
forbidden_zero_bit : 0
nal_ref_idc : 3
nal_unit_type : 7 ( Sequence parameter set )
======= SPS =======
profile_idc : 100
constraint_set0_flag : 0
constraint_set1_flag : 0
constraint_set2_flag : 0
constraint_set3_flag : 0
reserved_zero_4bits : 0
level_idc : 31
seq_parameter_set_id : 0
chroma_format_idc : 1
residual_colour_transform_flag : 0
bit_depth_luma_minus8 : 0
bit_depth_chroma_minus8 : 0
qpprime_y_zero_transform_bypass_flag : 0
seq_scaling_matrix_present_flag : 0
log2_max_frame_num_minus4 : 3
pic_order_cnt_type : 0
log2_max_pic_order_cnt_lsb_minus4 : 3
delta_pic_order_always_zero_flag : 0
offset_for_non_ref_pic : 0
offset_for_top_to_bottom_field : 0
num_ref_frames_in_pic_order_cnt_cycle : 0
num_ref_frames : 1
gaps_in_frame_num_value_allowed_flag : 0
pic_width_in_mbs_minus1 : 79
pic_height_in_map_units_minus1 : 44
frame_mbs_only_flag : 1
mb_adaptive_frame_field_flag : 0
direct_8x8_inference_flag : 1
frame_cropping_flag : 0
frame_crop_left_offset : 0
frame_crop_right_offset : 0
frame_crop_top_offset : 0
frame_crop_bottom_offset : 0
vui_parameters_present_flag : 1
=== VUI ===
aspect_ratio_info_present_flag : 1
aspect_ratio_idc : 1
sar_width : 0
sar_height : 0
overscan_info_present_flag : 0
overscan_appropriate_flag : 0
video_signal_type_present_flag : 0
video_signal_type_present_flag : 0
video_format : 0
video_full_range_flag : 0
colour_description_present_flag : 0
colour_primaries : 0
transfer_characteristics : 0
matrix_coefficients : 0
chroma_loc_info_present_flag : 0
chroma_sample_loc_type_top_field : 0
chroma_sample_loc_type_bottom_field : 0
timing_info_present_flag : 1
num_units_in_tick : 100
time_scale : 5994
fixed_frame_rate_flag : 1
nal_hrd_parameters_present_flag : 0
vcl_hrd_parameters_present_flag : 0
low_delay_hrd_flag : 0
pic_struct_present_flag : 0
bitstream_restriction_flag : 1
motion_vectors_over_pic_boundaries_flag : 1
max_bytes_per_pic_denom : 0
max_bits_per_mb_denom : 0
log2_max_mv_length_horizontal : 11
log2_max_mv_length_vertical : 11
num_reorder_frames : 0
max_dec_frame_buffering : 1
=== HRD ===
cpb_cnt_minus1 : 0
bit_rate_scale : 0
cpb_size_scale : 0
initial_cpb_removal_delay_length_minus1 : 0
cpb_removal_delay_length_minus1 : 0
dpb_output_delay_length_minus1 : 0
time_offset_length : 0
The only additional thing I’d like to point out here is that this
particular SPS also contains information about the frame rate of the
video (see timing_info_present_flag and the num_units_in_tick and
time_scale values that follow it). These parameters must be checked
closely when you generate bitstreams to ensure they agree with the
container format that the h.264 will eventually be muxed into. Even a
small error, such as 29.97 fps in one place and 30 fps in another, can
result in severe audio/video synchronization problems.
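As a rough sketch of such a check, the helpers below turn the two timing fields into a frame rate and compare it with a container rate given as a fraction. The function names are hypothetical. The factor of 2 assumes the usual convention for frame-coded video with fixed_frame_rate_flag set, where one frame lasts two ticks; treat that as an assumption to confirm against your own streams.

```c
#include <assert.h>
#include <stdint.h>

/* Frames per second implied by the VUI timing fields, assuming two
   ticks per frame. Returns 0.0 when num_units_in_tick is 0. */
static double vui_frame_rate(uint32_t num_units_in_tick, uint32_t time_scale)
{
    if (num_units_in_tick == 0)
        return 0.0;
    return (double)time_scale / (2.0 * (double)num_units_in_tick);
}

/* Exact test of time_scale / (2 * num_units_in_tick) == fps_num / fps_den,
   done by cross-multiplication to avoid floating-point rounding. */
static int frame_rate_matches(uint32_t num_units_in_tick, uint32_t time_scale,
                              uint32_t fps_num, uint32_t fps_den)
{
    assert(fps_den != 0);
    /* Keeps 2 * num_units_in_tick * fps_num inside 64 bits. */
    assert(num_units_in_tick <= UINT32_MAX / 2);
    uint64_t lhs = (uint64_t)time_scale * fps_den;
    uint64_t rhs = (uint64_t)2 * num_units_in_tick * fps_num;
    return lhs == rhs;
}
```

For the dump above (num_units_in_tick 100, time_scale 5994) this gives 29.97 fps. It matches a container rate of 2997/100, but not 30/1, and not 30000/1001 either, even though that fraction is also commonly written as 29.97.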