H.264 Frame Types
H.264 streams include three types of frames (see Figure 5):
- I-frames: Also known as key frames, I-frames are completely self-referential and don't use information from any other frames. These are the largest frames of the three and the highest quality, but the least efficient from a compression perspective.
- P-frames: P-frames are "predicted" frames. When producing a P-frame, the encoder can look backwards to previous I-frames or P-frames for redundant picture information. P-frames are more efficient than I-frames, but less efficient than B-frames.
- B-frames: B-frames are bi-directional predicted frames. As you can see in Figure 5, this means that when producing B-frames, the encoder can look both forwards and backwards for redundant picture information. This makes B-frames the most efficient frame type of the three. Note that B-frames are not available when producing with H.264's Baseline Profile.
Figure 5. I, P, and B-frames in an H.264-encoded stream
Now that you know the function of each frame type, I'll show you how to optimize their usage.
Working with I-frames
Though I-frames are the least efficient from a compression perspective, they do perform two invaluable functions. First, all playback of an H.264 video file has to start at an I-frame because it's the only frame type that doesn't refer to any other frames during encoding.
Since almost all streaming video may be played interactively, with the viewer dragging a slider around to different sections, you should include regular I-frames to ensure responsive playback. This is true when playing a video streamed from Flash Media Server, or one distributed via progressive download. While there is no magic number, I typically use an I-frame interval of 10 seconds, which means one I-frame every 300 frames when producing at 30 frames per second (and 240 and 150 for 24 fps and 15 fps video, respectively).
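The arithmetic is straightforward; here's a quick sketch of the frame-count conversion (plain Python, just to make the numbers above concrete):

```python
def iframe_interval_frames(fps: float, interval_seconds: float = 10.0) -> int:
    """Convert an I-frame interval expressed in seconds into a frame count."""
    return round(fps * interval_seconds)

for fps in (30, 24, 15):
    print(f"{fps} fps -> one I-frame every {iframe_interval_frames(fps)} frames")
# 30 fps -> 300, 24 fps -> 240, 15 fps -> 150
```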
The other function of an I-frame is to help reset quality at a scene change. Imagine a sharp cut from one scene to another. If the first frame of the new scene is an I-frame, it's encoded at the highest possible quality, which gives all subsequent P-frames and B-frames a better starting point when searching for redundant information. For this reason, most encoding tools offer a feature called "scene change detection" or "natural key frames," which you should always enable.
Figure 6 shows the I-frame related controls from Flash Media Encoding Server. You can see that Enable Scene Change detection is enabled, and that the size of the Coded Video Sequence is 300, as in 300 frames. This would be simpler to understand if it simply said "I-frame interval," but it's easy enough to figure out.
Figure 6. I-frame related controls from Flash Media Encoding Server
Specifically, the Coded Video Sequence refers to a "Group of
Pictures" or GOP , which is the building block of the H.264
stream—that is, each H.264 stream is composed of multiple GOPs. Each
GOP starts with an I-frame and includes all frames up to, but not
including, the next I-frame. By choosing a Coded Video Sequence size of
300, you're telling Flash Media Encoding Server to create a GOP of
300 frames, or basically the same as an I-frame interval of 300.
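If you're working with a command-line encoder rather than Flash Media Encoding Server, the same two controls have rough analogues. Here's a sketch using FFmpeg's libx264 encoder called from Python; the flags are FFmpeg/libx264 options, not Flash Media Encoding Server settings, and the file names are placeholders:

```python
import subprocess

# Rough analogue of a 300-frame Coded Video Sequence (GOP) with scene
# change detection enabled, using FFmpeg's libx264 encoder.
subprocess.run([
    "ffmpeg", "-i", "input.mov",
    "-c:v", "libx264",
    "-g", "300",             # maximum GOP length, i.e., the I-frame interval
    "-sc_threshold", "40",   # scene-change sensitivity; 0 disables scene cuts
    "output.mp4",
], check=True)
```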
IDR frames
I'll describe the Number of B-Pictures setting further on, and I've addressed Entropy Coding Mode already, but I want to explain the Minimum IDR interval and IDR frequency settings here. I'll start by defining an IDR frame.
Briefly, the H.264 specification enables two types of I-frames: normal I-frames and IDR frames. With IDR frames, no frame after the IDR frame can refer back to any frame before the IDR frame. In contrast, with regular I-frames, B-frames and P-frames located after the I-frame can refer back to reference frames located before the I-frame.
In terms of random access within the video stream, playback can always start on an IDR frame because no frame refers to any frames
behind it. However, playback cannot always start on a non-IDR I-frame
because subsequent frames may reference previous frames.
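To make the random-access point concrete, here's a toy sketch of how a conservative player might pick the frame where decoding has to begin after a seek. The frame list and the seek_start helper are hypothetical, not an actual player implementation:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    pict_type: str    # "I", "P", or "B"
    is_idr: bool = False

def seek_start(frames: list[Frame], target: int) -> int:
    """Return the index where decoding must begin to display `target`:
    the closest IDR frame at or before the target frame."""
    for frame in reversed(frames[: target + 1]):
        if frame.is_idr:
            return frame.index
    return 0

# Hypothetical 12-frame clip: an IDR frame at 0 and a plain I-frame at 6.
frames = [Frame(0, "I", is_idr=True)]
frames += [Frame(i, "B" if i % 3 else "P") for i in range(1, 6)]
frames += [Frame(6, "I", is_idr=False)]
frames += [Frame(i, "B" if i % 3 else "P") for i in range(7, 12)]

# Seeking to frame 9 falls back to frame 0, because frames after the
# non-IDR I-frame at 6 may still reference frames before it.
print(seek_start(frames, 9))   # 0
```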
Since one of the key reasons to insert I-frames into your video is to enable interactivity, I use the default IDR frequency setting of 1, which makes every I-frame an IDR frame. If you use a setting of 0, only the first I-frame in the video file will be an IDR frame, which could make the file sluggish during random access. A setting of 2 makes every second I-frame an IDR frame, while a setting of 3 makes every third I-frame an IDR frame, and so on. Again, I just use the default setting of 1.
Minimum IDR interval defines the minimum number of frames in a group of pictures. Though you've set the Size of Coded Video Sequence at 300, you also enabled Scene Change Detection, which allows the encoder to insert an I-frame at scene changes. In a very dynamic MTV-like sequence, this could result in very frequent I-frames, which could degrade overall video quality. For these types of videos, you could experiment with extending the minimum IDR interval to 30–60 frames, to see if this improves quality. For most videos, however, the default interval of 1 provides the encoder with the necessary flexibility to insert frequent I-frames in short, highly dynamic periods, like an opening or closing logo. For this reason, I also use the default option of 1 for this control.
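Whichever settings you choose, it's worth verifying where the encoder actually placed its I-frames. Here's a sketch that inspects an encoded file with ffprobe (assuming FFmpeg's tools are installed; the file name is a placeholder, and note that ffprobe's pict_type doesn't distinguish IDR from non-IDR I-frames):

```python
import subprocess

# List the picture type (I, P, or B) of every video frame in the file.
out = subprocess.run([
    "ffprobe", "-v", "error",
    "-select_streams", "v:0",
    "-show_entries", "frame=pict_type",
    "-of", "csv=p=0",
    "output.mp4",
], capture_output=True, text=True, check=True).stdout

pict_types = out.split()
i_frame_positions = [n for n, t in enumerate(pict_types) if t == "I"]
print("I-frames at frame numbers:", i_frame_positions)
```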
Working with B-frames
B-frames are the most efficient frames because they can search both ways for redundancies. Though controls and control nomenclature vary from encoder to encoder, the most common B-frame-related control is simply the number of B-frames, or "B-Pictures" as shown in Figure 6. Note that the number in Figure 6 actually refers to the number of B-frames between consecutive I-frames or P-frames.
Using the value of 2 found in Figure 6, you would create a GOP that looks like this: IBBPBBPBBPBB... all the way to frame 300. If the number of B-Pictures was 3, the encoder would insert three B-frames between each I-frame and/or P-frame. While there is no magic number, I typically use two sequential B-frames.
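To make the pattern concrete, here's a small sketch that prints the idealized frame pattern for a given GOP size and B-Pictures value (a simplification that ignores adaptive B-frame placement and scene change detection):

```python
def gop_pattern(gop_size: int, num_b_pictures: int) -> str:
    """Build an idealized GOP: an I-frame, then repeating runs of B-frames,
    each run followed by a P-frame, until the GOP is full."""
    pattern = ["I"]
    while len(pattern) < gop_size:
        for _ in range(num_b_pictures):
            if len(pattern) < gop_size:
                pattern.append("B")
        if len(pattern) < gop_size:
            pattern.append("P")
    return "".join(pattern)

print(gop_pattern(12, 2))   # IBBPBBPBBPBB
print(gop_pattern(12, 3))   # IBBBPBBBPBBB
```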
How much can B-frames improve the quality of your video? Figure 7 tells the tale. By way of background, this frame comes at the end of a very-high-motion skateboard sequence and also contains significant detail, particularly in the fencing behind the skater. This combination of high motion and high detail is unusual, and makes this frame very hard to encode. As you can see in the figure, the video file encoded using B-frames retains noticeably more detail than the file produced without B-frames. In short, B-frames do improve quality.
Figure 7. File encoded with B-frames (left) and no B-frames (right)
What's the performance penalty on the decode side? I
ran a battery of cross-platform tests, primarily on older,
lower-power computers, measuring the CPU load required to play back a
file produced with the Baseline Profile (no B-frames), and a
file produced using the High Profile with B-frames. The
maximum differential that I saw was 10 percent, which isn't
enough to affect my recommendation to always use the High
Profile except when producing for devices that support only the Baseline
Profile.
Advanced B-frame options
Adobe Flash Media Encoding Server also includes the B and P-frame related controls shown in Figure 8. Adaptive B-frame placement allows the encoder to override the Number of B-Pictures value when it will enhance the quality of the encoded stream; for instance, when it detects a scene change and substitutes an I-frame for the B. I always enable this setting.
Reference B-Pictures lets the encoder use
B-frames as a reference frame for P frames, while Allow pyramid
B-frame coding lets the encoder use B-frames as references for other
B-frames. I typically don't enable these options because the
quality difference is negligible, and I've noticed that these
options can cause playback to become unstable in some
environments.
Reference frames is the number of frames that the encoder can search for redundancies while encoding, which can impact both encoding time and decoding complexity; that is, when producing a B-frame or P-frame, if you used a setting of 10, the encoder would search until it found up to 10 frames with redundant information, increasing the search time. Moreover, if the encoder found redundancies in 10 frames, each of those frames would have to be decoded and in memory during playback, which increases decode complexity.
Intuitively, for most videos, the vast majority of redundancies are located in the frames most proximate to
the frame being encoded. This means that values in excess of 4 or 5
increase encoding time while providing little value. I
typically use a value of 4.
Finally, though it's not technically related to B-frames, consider the number of Slices per picture, which can be 1, 2, or 4. At a value of 4, the encoder divides each frame into four regions and searches for redundancies in other frames only within the respective region. This can accelerate encoding on multicore computers because the encoder can
assign the regions to different cores. However, since redundant
information may have moved to a different region between frames—say
in a panning or tilting motion—encoding with multiple slices
may miss some redundancies, decreasing the overall quality of
the video.
In contrast, at the default value of 1, the encoder treats each frame as a whole, and searches for redundancies in the entire frame of potential reference frames. Since it's harder to split this task among multiple cores, this setting is slower, but also maximizes quality. Unless you're in a real hurry, I recommend the default value of 1.
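For reference, here's how the recommendations from this section might translate to FFmpeg's libx264 encoder. This is a sketch of analogous command-line options, not the Flash Media Encoding Server controls themselves, and the file names are placeholders:

```python
import subprocess

subprocess.run([
    "ffmpeg", "-i", "input.mov",
    "-c:v", "libx264", "-profile:v", "high",
    "-g", "300",                          # 300-frame GOP (10 s at 30 fps)
    "-bf", "2",                           # two sequential B-frames
    "-b_strategy", "1",                   # adaptive B-frame placement
    "-refs", "4",                         # four reference frames
    "-slices", "1",                       # one slice per picture
    "-x264-params", "b-pyramid=none",     # don't use B-frames as references
    "output.mp4",
], check=True)
```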