April 26, 2002

The JPEG2000 in-house simulator currently pads passes with 4 additional bytes in order to ensure correct decoding. This practice is known to be wasteful, so a tighter truncation scheme was implemented for the simulator. This page is devoted to the analysis of the relative benefits of each method.

NOTE:

Optimal truncation as outlined in the verification model and Taubman & Marcellin's book is not possible given our current implementation of the entropy coder. For truly optimal truncation, one must have access to previously encoded bytes and be able to remove them from the codestream. This accounts for the specific instance of repeated 0xFF7F entropy-coded pairs: since they are regenerated by the bit un-stuffer (i.e., the BYTEIN() algorithm), we can sometimes truncate a pass before the end of the marked length Ln. Aside from this very unlikely case, there is the additional issue of withholding the emission of 0xFF bytes from the entropy coder. Under optimal truncation, these bytes are "delayed" pending the end of the pass or the end of the segment. The presence of a delayed byte is indicated by the E and En variables on a byte-by-byte and pass-by-pass basis, respectively. This may then be exploited in an attempt to save a byte here or there in the codestream. It is unlikely that this would contribute significantly to improving our rate/distortion statistics.
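As an illustration of the 0xFF7F case, the tail of a pass could be scanned for such pairs. This is a hypothetical sketch only -- trailing_ff7f_pairs() is not part of the simulator:

```c
#include <stdint.h>

/* Hypothetical sketch (not part of the simulator): count trailing
 * 0xFF 0x7F pairs at the end of a pass.  Such pairs are regenerated by
 * the decoder's bit un-stuffer (BYTEIN()), so in principle they could
 * be dropped from the codestream before the marked length Ln. */
int trailing_ff7f_pairs(const uint8_t *buf, int len)
{
    int pairs = 0;
    while (len >= 2 && buf[len - 2] == 0xFF && buf[len - 1] == 0x7F) {
        pairs++;     /* one removable pair found at the tail */
        len -= 2;    /* step back over it and keep scanning   */
    }
    return pairs;
}
```

As the text notes, this situation is very unlikely in practice, so the savings would be marginal.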

Given the statements in the above paragraph, I cannot truly call the implemented truncation scheme optimal. Hence, I will call it sub-optimal in the sense that it lacks exactly those features listed above which would be required to make it "truly" optimal.

The padding method has no functional requirements other than access to the 4 bytes of arithmetic-coded data immediately following the byte at which one would like to truncate (i.e., to truncate after the 5th pass of entropy coding, one requires the 4 bytes which immediately follow the last byte of entropy-coded data from the 5th pass).

The sub-optimal truncation method requires the entropy coder to save its state variables after each pass. The saved variables are:

UINT  An[NPASSES];   -- The value of the A register.
int   Cn[NPASSES];   -- The value of the C register.
BYTE  Bn[NPASSES];   -- The contents of the byte buffer.
int   En[NPASSES];   -- Boolean; indicates whether the previous value sent to BYTEOUT() was 0xFF. Not necessary for our implementation.
int   Ln[NPASSES];   -- The length in bytes of the codestream, where 0 indicates the start of the current codeblock.
UCHAR E;             -- Boolean; same as En[] above.
int   L;             -- Length of segment; same as Ln[] above.
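Collected in one place, the saved state might look like the following. This is a sketch only; the PassState name and the fixed NPASSES bound are illustrative, not the simulator's actual declarations:

```c
#include <stdint.h>

#define NPASSES 10  /* illustrative bound; matches the 10-pass experiments below */

/* Hypothetical container for the MQ-coder state saved at the end of
 * each coding pass, mirroring the variable list above. */
typedef struct {
    uint32_t An[NPASSES];  /* A (interval) register after each pass        */
    int32_t  Cn[NPASSES];  /* C (code) register after each pass            */
    uint8_t  Bn[NPASSES];  /* byte buffer contents after each pass         */
    int      En[NPASSES];  /* was the last byte sent to BYTEOUT() 0xFF?    */
    int      Ln[NPASSES];  /* codestream length; 0 = start of codeblock    */
    uint8_t  E;            /* per-byte version of En[]                     */
    int      L;            /* current segment length, as Ln[]              */
} PassState;
```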

The actual computational complexity of the truncation scheme is very low: roughly 3 comparisons, 3 bitwise operations (shift, and, etc.), and 1 addition per iteration. (This is a rough estimate based on my current, non-optimized code.) These operations are executed in a loop which runs a worst case of 4 times; each loop iteration corresponds to appending a byte to the truncated pass. According to my experiments, this loop will execute on average about 2.3-2.4 times per codeblock in the image.
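The loop can be sketched roughly as follows. This is an illustrative skeleton only: needs_byte() and truncate_pass() are hypothetical names, and the termination test shown is a placeholder for the real decision, not the simulator's actual logic:

```c
#include <stdint.h>

/* Placeholder for the real test that decides whether the decoder still
 * requires the next byte to finish the pass correctly.  The expression
 * below is a stand-in, NOT the simulator's decision logic. */
int needs_byte(uint32_t c, uint8_t b)
{
    return ((c >> 19) & 0xFF) != b;  /* a few comparisons / bitwise ops */
}

/* Append at most 4 of the bytes following the pass boundary, stopping
 * as soon as the decoder no longer needs them.  Each iteration appends
 * one byte to the truncated pass. */
int truncate_pass(uint32_t c, const uint8_t next4[4], uint8_t out[4])
{
    int n = 0;
    while (n < 4 && needs_byte(c, next4[n])) {  /* worst case: 4 iterations */
        out[n] = next4[n];
        c = (c << 8) & 0x07FFFFFF;              /* consume 8 bits of C      */
        n = n + 1;                              /* one addition per byte    */
    }
    return n;                                   /* bytes actually appended  */
}
```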

Example:

An image with 684 codeblocks could expect approximately 1573-1642 loop executions.

Synthetic contexts and bits were generated for the entropy coder over a 2-pass arithmetic coding scheme. The resulting codestream was subsequently truncated between passes 1 and 2, and statistics were collected on the generated truncated pass lengths. Given a uniform probability distribution on the bits and contexts generated for the entropy coder, it was found experimentally that one would save approximately 1.7 bytes per truncated pass.

The new truncation scheme was integrated with the in-house simulator and tests were run on real images. For the results provided below, the images were encoded such that every codeblock was truncated at a constant pass number (i.e., for 10 passes, passes 0-9 of each codeblock were kept in the codestream). The results are therefore biased, since this effectively bypasses the rate-distortion algorithm. However, they should provide useful insight into the benefits one can expect from sub-optimal truncation.

Coding Parameters

Number of encoded passes = 10

Wavelet Levels = 3

Tilesize = 256x256

Core = 9/7 Irreversible

Image Size | Padded Size (bytes) | Sub-Optimal Size (bytes) | Bitrate Improvement (bpp) | Codeblocks
-----------|---------------------|--------------------------|---------------------------|-----------
1024x768x3 | 748738              | 747607                   | 0.011505                  | 684
1024x768x3 | 779799              | 778679                   | 0.011393                  | 684
1024x768x3 | 637312              | 636171                   | 0.011607                  | 684
1024x768x3 | 494413              | 493245                   | 0.011882                  | 684
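For reference, the "Bitrate Improvement" column is consistent with the bytes saved, converted to bits, divided by the pixel count (1024x768; the three color components are not counted separately). A small check against the first row (bpp_improvement is an illustrative helper, not simulator code):

```c
/* Bitrate improvement as reported in the tables: bytes saved, converted
 * to bits, divided by the pixel count.  Illustrative helper only. */
double bpp_improvement(long padded_bytes, long suboptimal_bytes, long pixels)
{
    return (double)(padded_bytes - suboptimal_bytes) * 8.0 / (double)pixels;
}
```

For the first row: (748738 - 747607) bytes x 8 / (1024 x 768) ≈ 0.011505 bpp, matching the table.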

Coding Parameters

Number of encoded passes = 10

Wavelet Levels = 3

Tilesize = 512x512

Core = 9/7 Irreversible

Image Size | Padded Size (bytes) | Sub-Optimal Size (bytes) | Bitrate Improvement (bpp) | Codeblocks
-----------|---------------------|--------------------------|---------------------------|-----------
1024x768x3 | 656687              | 655700                   | 0.0100402                 | 588
1024x768x3 | 684613              | 683625                   | 0.010050                  | 588

The following experiment allows the number of encoded passes to vary, to check whether the bitrate improvement is invariant to this parameter.

Coding Parameters

File = adi1.ppm

Wavelet Levels = 3

Tilesize = 512x512

Core = 9/7 Irreversible

Image Size | Number of Encoded Passes | Padded Size (bytes) | Sub-Optimal Size (bytes) | Bitrate Improvement (bpp) | Codeblocks
-----------|--------------------------|---------------------|--------------------------|---------------------------|-----------
1024x768x3 | 4                        | 127750              | 126718                   | 0.01049                   | 588
1024x768x3 | 8                        | 544603              | 543599                   | 0.0102132                 | 588
1024x768x3 | 12                       | 972115              | 971245                   | 0.008850                  | 588

Use of sub-optimal truncation should improve our bitrate by approximately 1%. I do not expect "truly" optimal truncation to increase this value significantly.