Posts

Showing posts from August, 2017

The Development Process for NetCDF

What Management Style is used for the NetCDF Project?

NetCDF is a very successful free software package. It's used by NASA, NOAA, the ESA, and the IPCC, to name just a few. It's a global standard for weather and climate data, and is also used in other sciences. As such, the netCDF library is a critical piece of infrastructure for many very important software systems. It has performed reliably and well for more than 20 years, yet it has not remained static. New features in the last 20 years include NetCDF-Java and a whole ecosystem of Java data tools, NetCDF-4 and HDF5 integration, remote data access with OPeNDAP, and the addition of the large-file-capable binary formats 64-bit offset and CDF5. NetCDF continues to grow and evolve with the hardware and software that make up science data processing systems. It is as relevant for cutting-edge computer science today as it was when introduced. You may well wonder about the development process used for this very successful project.

Parallel Access to NetCDF Data

Parallel access to netCDF data files can speed up read/write times from parallel code, that is, code written with the MPI library to run on many cores.

How Much Does it Help?

The amount of improvement you can get from parallel access depends heavily on your hardware. Running tests on a high-powered, multi-core Linux box, I can see improvements of 4X or even greater. On HPC systems with a parallel file system, you should be able to do better. But parallel IO performance (IMHO) does not scale as well as processor performance. On systems with a large number of cores, you will max out the hardware channels to your storage before you are using all the cores for IO. Using a subset of the cores for IO is a good solution, but annoying to program. This functionality is provided by the PIO library (see below).

Building NetCDF with Parallel Access

In order to use parallel access with netCDF, you must build netCDF correctly. The following must be true: All l
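The idea of using only a subset of the cores for IO can be sketched in a few lines. This is only an illustration of the rank-selection step, not PIO's API: the function name `choose_io_ranks` and its parameters are hypothetical, and a real program would still need MPI code to ship data between the compute ranks and the chosen IO ranks.

```python
def choose_io_ranks(num_ranks, num_io_tasks, stride=None):
    """Pick which MPI ranks perform IO (hypothetical helper, not part of PIO).

    IO ranks start at rank 0 and are spaced `stride` apart. When no stride
    is given, the IO ranks are spread evenly across all ranks.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    if not 1 <= num_io_tasks <= num_ranks:
        raise ValueError("num_io_tasks must be between 1 and num_ranks")
    if stride is None:
        stride = num_ranks // num_io_tasks
    # The last IO rank must still be a valid rank number.
    if stride < 1 or (num_io_tasks - 1) * stride >= num_ranks:
        raise ValueError("stride does not fit within num_ranks")
    return [i * stride for i in range(num_io_tasks)]


if __name__ == "__main__":
    # 16 ranks in total, 4 of them doing IO.
    print(choose_io_ranks(16, 4))  # [0, 4, 8, 12]
```

Spreading the IO ranks with a stride, rather than taking the first few ranks, is one plausible way to avoid putting all the IO load on a single node, assuming ranks are placed on nodes in order.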

The Different Binary Formats of NetCDF

Under the covers, the netCDF library uses several different binary formats for data. It's important for advanced users to understand the different formats, and to use the correct one.

The Classic Format

Originally there was just one format for netCDF files. It is described in detail in the netCDF documentation (including here), but it's not necessary to understand the low-level details. Once we introduced the 64-bit offset format, we needed a name for the original format, so we call it the "classic" format. The classic format can be read and written by any install of the netCDF library, even if all the bells and whistles are turned off at install. The classic format has limits that make it hard to use on data files larger than 2 GB.

The 64-bit Offset Format

In order to address the 2 GB limits of the classic format, the 64-bit offset format was contributed by a user, and introduced in netCDF version 3.6.0. The 64-bit offset format changes the classic format very
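To see which binary format a given file uses, one option is to look at its first few bytes. The sketch below uses only the Python standard library; the function name `detect_netcdf_format` is hypothetical. It assumes the classic-family files start with the bytes `CDF` followed by a version byte (1 for classic, 2 for 64-bit offset, 5 for CDF5), and that HDF5-based files carry the HDF5 signature at offset 0. An HDF5 file is not necessarily a netCDF-4 file, so that result is only a hint.

```python
# Version byte that follows the leading b"CDF" in classic-family files.
_CDF_VERSIONS = {
    1: "classic",
    2: "64-bit offset",
    5: "CDF5",
}

# Signature at the start of an HDF5 file (assumed here to sit at offset 0).
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_netcdf_format(path):
    """Guess the binary format of a file from its leading bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(_HDF5_SIGNATURE):
        # Could be netCDF-4, or any other HDF5 file.
        return "HDF5-based (possibly netCDF-4)"
    if len(header) >= 4 and header[:3] == b"CDF":
        return _CDF_VERSIONS.get(header[3], "unknown")
    return "unknown"
```

For example, `detect_netcdf_format("data.nc")` (with `data.nc` standing in for a file of your own) returns one of the strings above. The `ncdump -k` command that ships with netCDF reports the same kind of information and is the more robust choice when the library is installed.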