The Different Binary Formats of NetCDF

Under the covers, the netCDF library uses several different binary formats for data. It's important for advanced users to understand the different formats, and to use the correct one for their data.

The Classic Format

Originally there was just one format for netCDF files. It is described in detail in the netCDF documentation, but it's not necessary to understand the low-level details in order to use it.

Once we introduced the 64-bit offset format, we needed a name for the original format, so we call it the "classic" format.

The classic format can be read and written by any install of the netCDF library, even if all the bells and whistles are turned off at install.

The classic format has limits which make it hard to use on data files larger than 2 GB.

The 64-bit Offset Format

In order to address the 2 GB limits of the classic format, the 64-bit offset format was contributed by a user, and introduced in netCDF version 3.6.0.

The 64-bit offset format changes the classic format very slightly, changing some numbers from 32-bit to 64-bit. The numbers that are changed are offsets into the data. As a result, the 64-bit offset format can use much larger data sizes.
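To make the size difference concrete, here is a small Python sketch. It does not use the netCDF library; it only assumes that offsets are stored as non-negative, big-endian signed integers, 32 bits wide in the classic format and 64 bits wide in the 64-bit offset format. The helper name fits_in_offset is made up for this illustration.

```python
import struct

def fits_in_offset(offset, bits):
    """Return True if a byte offset fits in a signed big-endian field of the given width.

    Hypothetical helper for illustration only, not a netCDF API.
    """
    if offset < 0:
        return False  # offsets into a file are never negative
    fmt = {32: ">i", 64: ">q"}[bits]  # big-endian signed 32-bit / 64-bit
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        return False

two_gib = 2**31  # 2 GiB expressed in bytes

print(fits_in_offset(two_gib - 1, 32))  # last offset a 32-bit field can hold
print(fits_in_offset(two_gib, 32))      # one byte further no longer fits
print(fits_in_offset(two_gib, 64))      # a 64-bit field has plenty of room
```

An offset field that cannot point past 2^31 - 1 bytes is where the 2 GB figure comes from; widening the field removes that particular limit.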

Like the classic format, the 64-bit offset format is built into the classic library, and it can be written and read by any version of the library starting with 3.6.0.

CDF5

The new CDF5 format is like the classic format, but with all limits blown away. It was developed at Argonne National Laboratory to hold the very large datasets they work with. They also use parallel-netcdf for parallel access to the data, and they don't want to switch to netCDF-4/HDF5.

A good description of the CDF5 format (as well as the classic and 64-bit offset formats) can be found here.

CDF5 supports most of the additional atomic data types of the netCDF-4 enhanced data model: unsigned integers and 64-bit integers. It does not include the string type.

HDF5

In netCDF-4 we introduced the use of the HDF5 format as an additional netCDF binary format. 
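With four binary formats in play, it can be handy to check which one a given file uses. The sketch below looks at the first bytes of a file: the classic family starts with the characters "CDF" followed by a version byte (1 for classic, 2 for 64-bit offset, 5 for CDF5), and HDF5 files start with an 8-byte signature. It uses only the Python standard library, the function names are made up for this example, and it assumes the HDF5 signature sits at the very start of the file (HDF5 also allows it after a user block, which this sketch ignores).

```python
# Version byte after "CDF" identifies the classic-family formats.
CDF_MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "CDF5",
}

# Standard HDF5 signature; an HDF5 file is not necessarily a netCDF-4 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def detect_format(header):
    """Guess the binary format from the first bytes of a file (illustrative helper)."""
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5 (possibly netCDF-4)"
    return CDF_MAGIC.get(header[:4], "unknown")

def detect_file_format(path):
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return detect_format(f.read(8))

print(detect_format(b"CDF\x02" + b"\x00" * 4))
print(detect_format(HDF5_SIGNATURE))
```

In practice the netCDF library works this out for you when it opens a file; the point here is only that the formats are easy to tell apart on disk.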

Writing netCDF-4 was the reason I was hired at Unidata, and I very much enjoyed designing and implementing this large extension to netCDF capabilities.

NetCDF-4 Enhanced Model

The use of HDF5 offered the opportunity to expand the netCDF data model. The classic model includes 6 data types, but the enhanced model includes additional types, including unsigned integers, 64-bit integers, and a string type. The enhanced model also includes compound data types, allowing users to directly store arrays of C (or Fortran) structures.
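To picture what "an array of C structures" means in terms of bytes, here is a sketch using Python's ctypes module. It does not call netCDF at all; the Observation structure and its fields are hypothetical, chosen only to show the kind of information a compound type has to describe (field names, types, and byte offsets within each record).

```python
import ctypes

class Observation(ctypes.Structure):
    # Hypothetical record: a station id plus two measurements.
    _fields_ = [
        ("station_id", ctypes.c_int32),
        ("temperature", ctypes.c_float),
        ("pressure", ctypes.c_double),
    ]

# An array of three structures, laid out contiguously like a C array.
records = (Observation * 3)(
    Observation(1, 12.5, 1013.25),
    Observation(2, 14.0, 1009.5),
    Observation(3, 9.75, 1021.0),
)

# Byte offset of each field inside one record.
field_offsets = {
    name: getattr(Observation, name).offset for name, _ in Observation._fields_
}

record_size = ctypes.sizeof(Observation)
raw = bytes(records)  # the whole array as one block of bytes

print(field_offsets)
print(record_size, len(raw))
```

A compound type in the enhanced model carries this same description alongside the data, so a reader can take the stored bytes and recover each field of each record.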

Here's Russ' diagram of the enhanced netCDF data model:



NetCDF-4 Classic Model

The classic data model includes dimensions, variables, and attributes, and has six data types.

Here's Russ' diagram of the classic model:


OPeNDAP

OPeNDAP is not a data format; it is a way of connecting to data servers and reading subsets of a netCDF file remotely. Only the subsets that you want are actually transmitted to your machine, which is very handy for doing sparse reads of large datasets.

OPeNDAP was introduced in version 4.2.0 and was written by Dennis.
