H.264 Frame Types
H.264 streams include three types of frames (see Figure 5):
- I-frames: Also known as key frames, I-frames are completely self-referential and don't use information from any other frames. These are the largest frames of the three, and the highest-quality, but the least efficient from a compression perspective.
- P-frames: P-frames are "predicted" frames. When producing a P-frame, the encoder can look backwards to previous I- or P-frames for redundant picture information. P-frames are more efficient than I-frames, but less efficient than B-frames.
- B-frames: B-frames are bi-directional predicted frames. As you can see in Figure 5, this means that when producing B-frames, the encoder can look both forwards and backwards for redundant picture information. This makes B-frames the most efficient frame type of the three. Note that B-frames are not available when producing using H.264's Baseline Profile.

Figure 5. I, P, and B-frames in an H.264-encoded stream
Now that you know the function of each frame type, I'll show you how to optimize their usage.
Working with I-frames
Though I-frames are the least efficient from a compression perspective, they do perform two invaluable functions. First, all playback of an H.264 video file has to start at an I-frame, because it's the only frame type that doesn't refer to any other frames during encoding. Since almost all streaming video may be played interactively, with the viewer dragging a slider around to different sections, you should include regular I-frames to ensure responsive playback. This is true whether the video is streamed from Flash Media Server or distributed via progressive download. While there is no magic number, I typically use an I-frame interval of 10 seconds, which means one I-frame every 300 frames when producing at 30 frames per second (and one every 240 and 150 frames for 24 fps and 15 fps video, respectively).
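As a quick sanity check, the interval arithmetic works out like this (a minimal Python sketch; the function name is mine, not an encoder setting):

```python
def iframe_interval_in_frames(interval_seconds, fps):
    """Convert an I-frame interval expressed in seconds to a frame count."""
    return round(interval_seconds * fps)

# A 10-second interval at common frame rates:
print(iframe_interval_in_frames(10, 30))  # 300
print(iframe_interval_in_frames(10, 24))  # 240
print(iframe_interval_in_frames(10, 15))  # 150
```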
The other function of an I-frame is to help reset quality at a scene change. Imagine a sharp cut from one scene to another. If the first frame of the new scene is an I-frame, it's the best possible frame, which is a better starting point for all subsequent P and B-frames looking for redundant information. For this reason, most encoding tools offer a feature called "scene change detection," or "natural key frames," which you should always enable.
Figure 6 shows the I-frame related controls from Flash Media Encoding Server. You can see that Enable Scene Change detection is enabled, and that the size of the Coded Video Sequence is 300, as in 300 frames. This would be simpler to understand if it simply said "I-frame interval," but it's easy enough to figure out.

Figure 6. I-frame related controls from Flash Media Encoding Server
Specifically, the Coded Video Sequence refers to a "Group of Pictures," or GOP, which is the building block of the H.264 stream—that is, each H.264 stream is composed of multiple GOPs. Each GOP starts with an I-frame and includes all frames up to, but not including, the next I-frame. By choosing a Coded Video Sequence size of 300, you're telling Flash Media Encoding Server to create a GOP of 300 frames, or basically the same as an I-frame interval of 300.
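To make the GOP definition concrete, here's a small sketch (illustrative only, not part of any encoder API) that groups a sequence of decoded frame types into GOPs:

```python
def split_into_gops(frame_types):
    """Split a sequence of frame types into GOPs: each GOP starts at an
    I-frame and runs up to, but not including, the next I-frame."""
    gops = []
    for frame in frame_types:
        if frame == "I" or not gops:  # a new GOP begins at every I-frame
            gops.append([])
        gops[-1].append(frame)
    return gops

print(split_into_gops(list("IBBPBBPIBBP")))
# [['I', 'B', 'B', 'P', 'B', 'B', 'P'], ['I', 'B', 'B', 'P']]
```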
IDR frames
I'll describe the Number of B-Pictures setting further on, and I've addressed Entropy Coding Mode already; but I wanted to explain the Minimum IDR interval and IDR frequency settings. I'll start by defining an IDR frame.
Briefly, the H.264 specification enables two types of I-frames: normal I-frames and IDR frames. With IDR frames, no frame after the IDR frame can refer back to any frame before the IDR frame. In contrast, with regular I-frames, B- and P-frames located after the I-frame can refer back to reference frames located before the I-frame.
In terms of random access within the video stream, playback can always start on an IDR frame because no frame refers to any frame behind it. However, playback cannot always start on a non-IDR I-frame, because subsequent frames may reference previous frames.
Since one of the key reasons to insert I-frames into your video is to enable interactivity, I use the default setting of 1, which makes every I-frame an IDR frame. If you use a setting of 0, only the first I-frame in the video file will be an IDR frame, which could make the file sluggish during random access. A setting of 2 makes every second I-frame an IDR frame, while a setting of 3 makes every third I-frame an IDR frame, and so on. Again, I just use the default setting of 1.
Minimum IDR interval defines the minimum number of frames in a group of pictures. Though you've set the Size of Coded Video Sequence at 300, you also enabled Scene Change Detection, which allows the encoder to insert an I-frame at scene changes. In a very dynamic MTV-like sequence, this could result in very frequent I-frames, which could degrade overall video quality. For these types of videos, you could experiment with extending the minimum IDR interval to 30–60 frames, to see if this improves quality. For most videos, however, the default interval of 1 provides the encoder with the necessary flexibility to insert frequent I-frames in short, highly dynamic periods, like an opening or closing logo. For this reason, I also use the default option of 1 for this control.
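The IDR frequency behavior described above can be sketched as follows (a hypothetical helper for illustration; the real control lives in the encoder's settings dialog):

```python
def idr_flags(iframe_count, idr_frequency):
    """Mark which I-frames in a file are coded as IDR frames.
    A frequency of 0 means only the first I-frame is an IDR frame;
    a frequency of n means every nth I-frame is IDR (1 = all of them)."""
    if idr_frequency == 0:
        return [index == 0 for index in range(iframe_count)]
    return [index % idr_frequency == 0 for index in range(iframe_count)]

print(idr_flags(4, 1))  # [True, True, True, True]
print(idr_flags(4, 2))  # [True, False, True, False]
print(idr_flags(4, 0))  # [True, False, False, False]
```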
Working with B-frames
B-frames are the most efficient frames because they can search both ways for redundancies. Though controls and control nomenclature vary from encoder to encoder, the most common B-frame-related control is simply the number of B-frames, or "B-Pictures" as shown in Figure 6. Note that the number in Figure 6 actually refers to the number of B-frames between consecutive I-frames or P-frames.
Using the value of 2 found in Figure 6, you would create a GOP that looks like this:
IBBPBBPBBPBB... all the way to frame 300.
If the number of B-Pictures were 3, the encoder would insert three B-frames between each I-frame and/or P-frame. While there is no magic number, I typically use two sequential B-frames.
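In display order, that layout could be generated with a sketch like this (a toy illustration; real encoders emit frames in decode order, which differs):

```python
def gop_pattern(gop_size, b_pictures):
    """Build a GOP layout in display order: an I-frame, then repeating
    runs of `b_pictures` B-frames followed by a P-frame."""
    frames = ["I"]
    while len(frames) < gop_size:
        run = ["B"] * b_pictures + ["P"]
        frames.extend(run[:gop_size - len(frames)])
    return "".join(frames)

print(gop_pattern(12, 2))  # IBBPBBPBBPBB
print(gop_pattern(12, 3))  # IBBBPBBBPBBB
```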
How much can B-frames improve the quality of your video? Figure 7 tells the tale. By way of background, this is a frame at the end of a very-high-motion skateboard sequence that also has significant detail, particularly in the fencing behind the skater. This combination of high motion and high detail is unusual, and makes this frame very hard to encode. As you can see in the figure, the video file encoded using B-frames retains noticeably more detail than the file produced without B-frames. In short, B-frames do improve quality.

Figure 7. File encoded with B-frames (left) and no B-frames (right)
What's the performance penalty on the decode side? I ran a battery of cross-platform tests, primarily on older, lower-power computers, measuring the CPU load required to play back a file produced with the Baseline Profile (no B-frames), and a file produced using the High Profile with B-frames. The maximum differential that I saw was 10 percent, which isn't enough to affect my recommendation to always use the High Profile except when producing for devices that support only the Baseline Profile.
Advanced B-frame options
Adobe Flash Media Encoding Server also includes the B- and P-frame related controls shown in Figure 8. Adaptive B-frame placement allows the encoder to override the Number of B-Pictures value when doing so will enhance the quality of the encoded stream; for instance, when it detects a scene change and substitutes an I-frame for the B-frame. I always enable this setting.
Reference B-Pictures lets the encoder use B-frames as reference frames for P-frames, while Allow pyramid B-frame coding lets the encoder use B-frames as references for other B-frames. I typically don't enable these options because the quality difference is negligible, and I've noticed that these options can cause playback to become unstable in some environments.
Reference frames is the number of frames that the encoder can search for redundancies while encoding, which can impact both encoding time and decoding complexity; that is, when producing a B-frame or P-frame, if you used a setting of 10, the encoder would search until it found up to 10 frames with redundant information, increasing the search time. Moreover, if the encoder found redundancies in 10 frames, each of those frames would have to be decoded and in memory during playback, which increases decode complexity.
Intuitively, for most videos, the vast majority of redundancies are located in the frames most proximate to the frame being encoded. This means that values in excess of 4 or 5 increase encoding time while providing little value. I typically use a value of 4.
Finally, though it's not technically related to B-frames, consider the number of Slices per picture, which can be 1, 2, or 4. At a value of 4, the encoder divides each frame into four regions and searches for redundancies in other frames only within the respective region. This can accelerate encoding on multicore computers because the encoder can assign the regions to different cores. However, since redundant information may have moved to a different region between frames—say in a panning or tilting motion—encoding with multiple slices may miss some redundancies, decreasing the overall quality of the video.
In contrast, at the default value of 1, the encoder treats each frame as a whole, and searches for redundancies in the entire frame of potential reference frames. Since it's harder to split this task among multiple cores, this setting is slower, but also maximizes quality. Unless you're in a real hurry, I recommend the default value of 1.
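The slice trade-off is easy to see with a toy model. Assuming four equal-height horizontal slices (the actual slice geometry is encoder-dependent), a detail that moves vertically between frames can cross a slice boundary, putting it out of reach of a slice-restricted search:

```python
def slice_index(row, frame_height, num_slices=4):
    """Return which equal-height horizontal slice a pixel row falls into."""
    return min(row * num_slices // frame_height, num_slices - 1)

# In a 480-line frame split into 4 slices, each slice covers 120 lines.
# A detail at row 110 sits in slice 0; after a 20-line downward pan it
# lands at row 130 in slice 1, so a search confined to slice 0 misses it.
print(slice_index(110, 480))  # 0
print(slice_index(130, 480))  # 1
```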