Deal With Large Data Size for K2 Movie Stack

Chen Xu

$BrandeisEM: 2015-06-14 08:11:10 emdoc-xml/en_US.ISO8859-1/articles/deal-with-k2-movie-data-size/article.xml xuchen Exp $

When we collect many dose-fractionation series, also called movie stacks, we have to consider the size of the data we deal with. In super-resolution mode, a single exposure can generate a stack of a few GB. Keeping such data in long-term storage is a challenge for any of us. Even transferring that amount of data from the K2 computer to other devices can take a significant amount of time.

Therefore, our main goal is to reduce the data size as much as possible without losing information. SerialEM has implemented some features to help with this situation. In this document, I would like to give an example of how to use them.

Most of the information in this document can be found in the SerialEM help file section on Direct Detectors.

You can also get a PDF version of this document here.

Table of Contents
1 Background Information
2 Packing and Compression
3 Keep Gatan Software Gain Reference File
4 Post-Processing: Decompress, Unpack and Apply Gain Reference

1 Background Information

For a Super-resolution exposure, the subframe output AFTER the hardware processors is in 4-bit unsigned integer format. If it is passed to a software layer such as DigitalMicrograph, this 4-bit integer data is first converted into 32-bit floating point numbers, and then the software gain reference, which is also stored as 32-bit floats, is applied. For a Counted mode image, the subframe output from the hardware processors is 16-bit integer. It also has to be converted into floats before the software gain reference is applied, as in the Super-resolution case.

As you can see, in both cases this process not only consumes a significant amount of memory but also generates a larger dataset, because the results are in 32-bit floating point format.

It is worth mentioning that 4-bit unsigned integer means every pixel value in a frame is within the range 0-15. Therefore, we have to set our imaging conditions accordingly. For example, if the beam has a dose rate of 10 electrons per physical pixel per second and we use a frame time of 1.5 seconds or more, then we could reach the limit of 15. In this case, the pixel value will overflow and we lose information. Although we are very unlikely ever to use such conditions, we should be aware of this limitation.
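
The arithmetic behind this example can be sketched in a few lines of Python. This is only a rough estimate under a simplifying assumption: it compares the mean number of electrons per physical pixel per frame with the 4-bit maximum, and ignores counting statistics and how counts are distributed among super-resolution pixels. The function names are hypothetical and are not part of SerialEM or DigitalMicrograph.

```python
FOUR_BIT_MAX = 15  # largest value a 4-bit unsigned integer can hold


def mean_counts_per_frame(dose_rate, frame_time):
    """Mean electrons per physical pixel in one frame (dose rate in e/pixel/s, time in s)."""
    return dose_rate * frame_time


def may_saturate(dose_rate, frame_time, limit=FOUR_BIT_MAX):
    """True if the mean count alone already reaches the 4-bit limit."""
    return mean_counts_per_frame(dose_rate, frame_time) >= limit


# The example from the text: 10 e/pixel/s with a 1.5 s frame time reaches 15.
assert may_saturate(10, 1.5)
# A much shorter frame time (0.2 s, chosen here only for illustration) stays far below it.
assert not may_saturate(10, 0.2)
```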

The fact is that the real information is in these 4-bit or 16-bit integer frames, and applying the software gain reference is necessary for normalization. For K2 Counted and Super-res images, the dose rate is low, usually around 10 electrons per pixel per second or even lower. An image taken at such a low dose rate contains a lot of zeros (~50%), and its mean value is usually just above 1. This kind of image is well suited to lossless compression algorithms such as LZW and ZIP.

So the idea is NOT to apply the software gain reference in the data collection step. Instead, the gain reference file is saved somewhere and applied later during post-processing.
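
To illustrate why such images compress well, here is a minimal Python sketch using the standard zlib module (deflate, which to my understanding is the algorithm family behind the ZIP option in TIFF). The pixel values are synthetic: the value distribution below is made up to give roughly 50% zeros and a mean just above 1, and it is not measured K2 data.

```python
import random
import zlib

random.seed(0)  # fixed seed so the sketch is reproducible

# Made-up distribution: ~50% zeros, mean value about 1.07.
values = [0, 1, 2, 3, 4, 5]
weights = [50, 20, 14, 8, 5, 3]

# One synthetic 256 x 256 frame stored as one byte per pixel.
frame = bytes(random.choices(values, weights=weights, k=256 * 256))

compressed = zlib.compress(frame, 9)
ratio = len(compressed) / len(frame)

# Lossless: decompression gives back exactly the original bytes.
assert zlib.decompress(compressed) == frame
print(f"compressed size is {ratio:.0%} of the original")
```

The same data converted to 32-bit floats and multiplied by a gain reference would have far more distinct values and would compress much less effectively, which is the motivation for saving unnormalized integers.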

For these unnormalized, integer images, SerialEM tries to reduce the file size by Packing and Compression.

2 Packing and Compression

To take advantage of the integer pixel values of Counted and Super-res images, one method SerialEM uses is to "pack" the data. Specifically, for a Counted image, the pixel values are truncated from 16-bit integers to bytes before saving. For a Super-res image, two 4-bit values are packed into one byte. In both cases, this reduces the file size to half of the original.

Another way SerialEM reduces file size is compression. Currently, two compression methods are available, LZW and ZIP. Both are lossless. Compression is only available for TIFF images, and it can be used on top of packed data. In other words, for TIFF format one can choose to have a stack saved both packed and compressed. When the MRC file type is selected, only packing is available. An MRC file can still be compressed, but this has to be done manually as a separate post-processing step.
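
The idea of packing two 4-bit values into one byte can be sketched as follows. This is an illustration only, not SerialEM's code: the function names are hypothetical, and the nibble order (first pixel in the low 4 bits) is an assumption made for this sketch.

```python
def pack_4bit(pixels):
    """Pack values in 0..15 two per byte; ASSUMED order: first pixel in the low nibble."""
    if len(pixels) % 2:
        raise ValueError("this sketch expects an even number of pixels")
    if any(not 0 <= p <= 15 for p in pixels):
        raise ValueError("4-bit values must lie in 0..15")
    return bytes(pixels[i] | (pixels[i + 1] << 4) for i in range(0, len(pixels), 2))


def unpack_4bit(packed):
    """Reverse pack_4bit: expand every byte into two 4-bit values."""
    pixels = []
    for byte in packed:
        pixels.append(byte & 0x0F)  # low nibble: first pixel
        pixels.append(byte >> 4)    # high nibble: second pixel
    return pixels


pixels = [0, 1, 0, 15, 2, 0]
packed = pack_4bit(pixels)

assert len(packed) == len(pixels) // 2  # half the size of one byte per pixel
assert unpack_4bit(packed) == pixels    # packing loses no information
```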

Below is the Camera Parameter Setup window. Pay attention to the lower right corner of the window, where one can define how to save K2 frame files by clicking the "Set File Option" button.

Clicking "Set File Option" brings up the "K2 Frame File Options" dialog, shown below.

As you can see, you can also choose not to rotate/flip frames.

3 Keep Gatan Software Gain Reference File

If you want to use "Pack unnormalized as 4-bit(Super-Res) or 8-bit(Counting)" as the frame option, you have to select "Dark Subtracted" in the Camera Parameter Setup window. In this case, make sure the Gatan software gain reference files are safely copied somewhere so that gain normalization can be applied later in post-processing.

To do that, two lines have to be added to the SerialEM property file, one for Counting mode and one for Super-Res mode.

K2CountingReference C:\ProgramData\Gatan\Reference Images\K2-0001 1 Gain Ref. x1.m2.dm4
K2SuperResReference C:\ProgramData\Gatan\Reference Images\K2-0001 1 Gain Ref. x1.m3.dm4

These two lines define the full path and exact filename of each Gatan reference file. With this, SerialEM will make sure the latest reference files are saved into the frame saving directory. This is important, so you should check it.

4 Post-Processing: Decompress, Unpack and Apply Gain Reference

4.1 Decompress

Decompressing a compressed TIFF file is straightforward. The IMOD program tif2mrc accepts both compressed AND uncompressed TIFF files as input (the compression is handled transparently).

% tif2mrc FrameStack.tif FrameStack.mrc

If it is also packed, you can unpack it like this:

% clip unpack FrameStack.mrc UnpackedFrameStack.mrc

In either case above, the MRC output file has the 8-bit integer data type.

4.2 Apply gain reference to frames with standard orientation

The Gatan software gain reference is not rotated or flipped.

If the box next to "Save frames without rotation/flip to standard orientation" IS checked, the frames are saved as is, with no rotation or flip. In this case, applying the Gatan gain reference file is straightforward.

  1. Convert Gatan reference from dm4 to mrc format.

    % dm2mrc gatanRef.dm4 gatanRef.mrc 
  2. Apply the reference file. Depending on whether the frames are saved as packed or not, use one of the following commands.

    • For packed frames, unpacking and applying the gain reference file can be done with a single command.

      % clip unpack fileWithFrames_packed.mrc gatanRef.mrc normalizedFrames.mrc
    • For unpacked frames,

      % clip mult -n 16 fileWithFrames.mrc gatanRef.mrc normalizedFrames.mrc

      Here, after the gain reference is applied, the result is multiplied by 16 and then rounded to an integer. In theory, any rounding causes information loss. In this case, because the data are scaled up by 16 before rounding, the loss is very small.

      If one does want full precision without any rounding, it is better to use floats as output.

      % clip mult -m 2 fileWithFrames.mrc gatanRef.mrc normalizedFrames.mrc

      From DM 2.31 on, a defects.txt file is also saved along with the gain reference file. These defects can be corrected by adding "-D defects.txt" before fileWithFrames.mrc, as follows:

      % clip mult -m 2 -D defects.txt fileWithFrames.mrc gatanRef.mrc normalizedFrames.mrc
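
The effect of scaling by 16 before rounding, as described above, can be illustrated with a small Python sketch. The raw counts and gain factors here are hypothetical numbers chosen for illustration, and Python's round() stands in for whatever rounding clip actually performs, so this shows only the principle and not clip's implementation.

```python
SCALE = 16  # the same factor as "-n 16" in the clip command above

# Hypothetical raw counts and per-pixel gain factors (not real data).
raw = [0, 1, 3]
gain = [1.0, 1.3, 0.9]

exact = [r * g for r, g in zip(raw, gain)]  # full-precision (float) result

# Integer output with scaling: multiply by SCALE, then round.
scaled = [round(v * SCALE) for v in exact]
recovered = [s / SCALE for s in scaled]

# Integer output without scaling, for comparison.
unscaled = [round(v) for v in exact]

max_err_scaled = max(abs(e - r) for e, r in zip(exact, recovered))
max_err_unscaled = max(abs(e - u) for e, u in zip(exact, unscaled))

# With scaling, the rounding error is at most half a step of 1/SCALE.
assert max_err_scaled <= 0.5 / SCALE
assert max_err_scaled < max_err_unscaled
```

Storing floats, as with the "-m 2" variant above, avoids the rounding step entirely at the cost of larger files.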

4.3 Apply gain reference to frames with rotated/flipped orientation

If the box next to "Save frames without rotation/flip to standard orientation" is NOT checked, the frames are saved with rotation and/or flip. In this case, in order to apply the gain reference to the frames, the Gatan gain reference file has to be converted to MRC first and then ALSO rotated and flipped accordingly. The IMOD commands newstack -rot and clip flipx (or clip flipy) perform the rotation and flip.

If, in the DigitalMicrograph camera configuration window, the rotation is 270 degrees and "flip along Y" is checked, the frames are rotated by 270 degrees and flipped along the Y axis. We need to rotate the Gatan gain reference file in the opposite way.

The following IMOD commands require version 4.6.26 or higher.

Here are the steps to process the Gatan gain reference file. They must be performed in this order.

  1. Convert Gatan reference from dm4 to mrc.

    % dm2mrc gatanRef.dm4 gatanRef.mrc 
  2. Rotate by -270 degrees.

    % newstack -rot -270 gatanRef.mrc gatanRef_rot-270.mrc
  3. Flip image around Y.

    % clip flipy gatanRef_rot-270.mrc gatanRef_rot-270_flipy.mrc

Now the gain reference is ready to be applied to our frame stacks. Use one of the following commands, depending on whether the frames were saved with packing or not.

  1. For packed frames, unpacking and applying the gain reference file can be done with a single command.

    % clip unpack fileWithFrames_packed.mrc gatanRef_rot-270_flipy.mrc normalizedFrames.mrc
  2. For unpacked frames,

    % clip mult -n 16 fileWithFrames.mrc gatanRef_rot-270_flipy.mrc normalizedFrames.mrc
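
The reason the rotation and flip of the gain reference must be done in a fixed order is that the two operations do not commute. The Python sketch below shows this on a tiny array. It is only an illustration of the geometry: the helper names are hypothetical, the array is a list of rows with the first row on top, and which direction and axis correspond to newstack -rot and clip flipy is not asserted here.

```python
def rot90_ccw(image):
    """Rotate a list-of-rows image by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rot90_cw(image):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def mirror_left_right(image):
    """Mirror every row, i.e. reverse the column order."""
    return [row[::-1] for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]

# Rotating then mirroring differs from mirroring then rotating.
rotate_then_mirror = mirror_left_right(rot90_ccw(image))
mirror_then_rotate = rot90_ccw(mirror_left_right(image))
assert rotate_then_mirror != mirror_then_rotate

# A 270-degree rotation one way equals a 90-degree rotation the other way.
assert rot90_cw(rot90_cw(rot90_cw(image))) == rot90_ccw(image)
```

If the normalized frames look wrong, comparing the dimensions and orientation of the processed reference with a single frame is a quick sanity check.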

For more details, please see the SerialEM help file.