WAVE PCM soundfile format
The WAVE file format is a subset of Microsoft's RIFF specification for the storage of multimedia files. A RIFF file starts out with a file header followed by a sequence of data chunks. A WAVE file is often just a RIFF file with a single "WAVE" chunk which consists of two sub-chunks -- a "fmt " chunk specifying the data format and a "data" chunk containing the actual sample data. Call this form the "Canonical form". Who knows how it really all works. An almost complete description which seems totally useless unless you want to spend a week looking over it can be found at MSDN(mostly describes the non-PCM, or registered proprietary data formats).
The canonical WAVE format starts with the RIFF header:
0 4 ChunkID Contains the letters "RIFF" in ASCII form
(0x52494646 big-endian form).
4 4 ChunkSize 36 + SubChunk2Size, or more precisely:
4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
This is the size of the rest of the chunk
following this number. This is the size of the
entire file in bytes minus 8 bytes for the
two fields not included in this count:
ChunkID and ChunkSize.
8 4 Format Contains the letters "WAVE"
(0x57415645 big-endian form).
The "WAVE" format consists of two subchunks: "fmt " and "data":
The "fmt " subchunk describes the sound data's format:
12 4 Subchunk1ID Contains the letters "fmt "
(0x666d7420 big-endian form).
16 4 Subchunk1Size 16 for PCM. This is the size of the
rest of the Subchunk which follows this number.
20 2 AudioFormat PCM = 1 (i.e. Linear quantization)
Values other than 1 indicate some
form of compression.
22 2 NumChannels Mono = 1, Stereo = 2, etc.
24 4 SampleRate 8000, 44100, etc.
28 4 ByteRate == SampleRate * NumChannels * BitsPerSample/8
32 2 BlockAlign == NumChannels * BitsPerSample/8
The number of bytes for one sample including
all channels. I wonder what happens when
this number isn't an integer?
34 2 BitsPerSample 8 bits = 8, 16 bits = 16, etc.
2 ExtraParamSize if PCM, then doesn't exist
X ExtraParams space for extra parameters
The "data" subchunk contains the size of the data and the actual sound:
36 4 Subchunk2ID Contains the letters "data"
(0x64617461 big-endian form).
40 4 Subchunk2Size == NumSamples * NumChannels * BitsPerSample/8
This is the number of bytes in the data.
You can also think of this as the size
of the read of the subchunk following this
number.
44 * Data The actual sound data.
Wave File Header - RIFF Type Chunk
Wave file headers follow the standard RIFF file format structure. The first 8 bytes in the file is a standard RIFF chunk header which has a chunk ID of "RIFF" and a chunk size equal to the file size minus the 8 bytes used by the header. The first 4 data bytes in the "RIFF" chunk determines the type of resource found in the RIFF chunk. Wave files always use "WAVE". After the RIFF type comes all of the Wave file chunks that define the audio waveform.
Byte Number | Size | Description | Value |
0-3 | 4 | Chunk ID | "RIFF" (0x52494646) |
4-7 | 4 | Chunk Data Size | (file size) - 8 |
8-11 | 4 | RIFF Type | "WAVE" (0x57415645) |
Format Chunk - "fmt "
The format chunk contains information about how the waveform data is stored and
should be played back including the type of compression used, number of channels,
sample rate, bits per sample and other attributes.
Byte Number Size Description Value 0-3 4 Chunk ID "fmt " (0x666D7420) 4-7 4 Chunk Data Size Length Of format Chunk (always 0x10) 8-9 2 Compression code Always 0x01 10 - 11 2 Channel Numbers 0x01=Mono, 0x02=Stereo 12 - 15 4 Sample Rate Binary, in Hz 16 - 19 4 Bytes Per Second 20 - 21 2 Bytes Per Sample 1=8 bit Mono, 2=8 bit Stereo or 16 bit Mono, 4=16 bit Stereo 22 - 23 2 Bits Per Sample
Data Chunk - "data"
The Wave Data Chunk contains the digital audio sample data which can be decoded using the format and compression method specified in the Wave Format Chunk.
Byte Number Size Description Value 0-3 4 Chunk ID "data" (0x64617461) 4-7 4 Chunk Data Size length of data to follow 8-end Data sound samples
As an example, here are the opening 72 bytes of a WAVE file with bytes shown as hexadecimal numbers:
52 49 46 46 24 08 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 02 00
22 56 00 00 88 58 01 00 04 00 10 00 64 61 74 61 00 08 00 00 00 00 00 00
24 17 1e f3 3c 13 3c 14 16 f9 18 f9 34 e7 23 a6 3c f2 24 f2 11 ce 1a 0d
Here is the interpretation of these bytes as a WAVE soundfile: ![](https://lh3.googleusercontent.com/blogger_img_proxy/AEn0k_vuWX85yAsvkAN_enpnaAs0hhgJs2IzcOZ2S9ghzf7QTa8A7ZZ4C-bDaLZ4Mtou8KRbhRxoPEZleGCy22KmI_0be9Jrrv1Kqw9wEEm36HoBQvfOjZ_VHvhlQA=s0-d)
Example2:
The easiest approach to this file format might be to look at an actual WAV file to see how data is stored. In this case, we examine DING.WAV which is standard with all Windows packages. DING.WAV is an 8-bit, mono, 22.050 KHz WAV file of 11,598 bytes in length. Lets begin by looking at the header of the file (using DEBUG).
The ASCII characters for "WAVE" and "fmt " follow. Next (line 2 above) we find the value 0x00000010 in the first 4 bytes (length of format chunk: always constant at 0x10). The next four bytes are 0x0001 (Always) and 0x0001 (A mono WAV, one channel used).
As expected, the file begins with the ASCII characters "RIFF" identifying it as a WAV file. The next four bytes tell us the length is 0x2D46 bytes (11590 bytes in decimal) which is the length of the entire file minus the 8 bytes for the "RIFF" and length (11598 - 11590 = 8 bytes).
Since this is a 8-bit WAV, the sample rate and the bytes/second are the same at 0x00005622 or 22,050 in decimal. For a 16-bit stereo WAV the bytes/sec would be 4 times the sample rate. The next 2 bytes show the number of bytes per sample to be 0x0001 (8-bit mono) and the number of bits per sample to be 0x0008.
more references:
http://mathmatrix.narod.ru/Wavefmt.html
http://www.onicos.com/staff/iz/formats/wav.html
No comments:
Post a Comment