How to Compile Source Code

Source Code Compilation

Suppose you are compiling code written in Fortran, C, or C++ that will run as a parallel application using MPI for internode communication. This case is straightforward: the standard compiler wrapper scripts bring in all the include files and library paths and set the linker options that you need. Use the wrappers mpif90, mpicc, or mpic++ for Fortran, C, and C++, respectively.

To compile on DC³, execute on the command line: mpif90 -o hello.x hello.f90

If the build needs an extra library such as HDF5, first load it through the module utility. Even with the module loaded, the compiler does not know where to find the files related to the HDF5 library. One way to figure out what to pass to the compiler is to look under the covers of the HDF5 module.

The ml show hdf5-parallel command reveals (most of) what the module actually does when you load it. It defines some environment variables, for example HDF5_INCLUDE, which you can use in your build script or Makefile. Look at the definitions of the HDF5_XXX environment variables: they contain all the include and link options.

Therefore, we can compile with: mpicc -o hd_copy.x hd_copy.c $HDF5_INCLUDE $HDF5_LIB

Compiler Optimizations

These are some common compiler optimizations and the types of code that they work best with.

Vectorization

The registers and arithmetic units on DC³ are capable of performing the same operation on several double-precision operands simultaneously in a SIMD (Single Instruction Multiple Data) fashion. This is often referred to as vectorization because of its similarities to the much larger vector registers and processing units of the Cray systems of the pre-MPP era. Vector optimization is most useful for large loops in which each successive operation has no dependencies on the results of the previous operations. Loops can be vectorized by the compiler or by compiler directives in the source code.
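As a minimal sketch of the dependency rule above, the two loops below contrast a vectorizable case with one that is not. The function names axpy and running_sum are illustrative and do not come from any DC³ code. Whether a given compiler actually vectorizes the first loop depends on the flags and target, so treat this as an illustration rather than a guarantee.

```c
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and y[i], so the
   compiler is free to process several elements per SIMD instruction.
   `restrict` (C99) promises that x and y do not overlap. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependency: s[i] needs s[i - 1] from the previous
   iteration, which generally prevents straightforward vectorization. */
void running_sum(size_t n, const double *x, double *s)
{
    if (n == 0)
        return;
    s[0] = x[0];
    for (size_t i = 1; i < n; ++i)
        s[i] = s[i - 1] + x[i];
}
```

With GCC, for example, adding -fopt-info-vec to the compile line reports which loops were vectorized; other compilers have their own reporting options.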

Inter-procedural Optimization

Inter-procedural optimization means that the compiler optimizes across subroutine, function, or other procedural boundaries. It has many levels, ranging from inlining (the replacement of a function call with the corresponding code at compile time) up to treating the entire program as one routine for the purpose of optimization. It is most suitable when there are function calls embedded within large loops. It can be the most compute-intensive of all optimizations at compile time, particularly for large applications: it can increase the compile time by an order of magnitude or more without any significant speedup, and can even cause a compile to crash. For this reason, none of the DC³ recommended compiler optimization options include any significant inter-procedural optimizations.
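The pattern that benefits most, a function call embedded in a large loop, can be sketched as follows. The names squared_diff and sum_squared_diff are hypothetical. Both functions are shown in one block for brevity; the inter-procedural case arises when the helper lives in a different source file from the loop.

```c
#include <stddef.h>

/* Small helper. If it lived in a separate source file, the compiler could
   inline it into callers only with inter-procedural optimization enabled. */
double squared_diff(double a, double b)
{
    double d = a - b;
    return d * d;
}

/* The call sits inside the loop, so the call overhead is paid n times
   unless the compiler inlines squared_diff into the loop body. */
double sum_squared_diff(size_t n, const double *a, const double *b)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += squared_diff(a[i], b[i]);
    return total;
}
```

Once the helper is inlined, the loop body becomes plain arithmetic, which may also make the loop a candidate for vectorization.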

Relaxation of IEEE Floating-point Precision

Full implementation of IEEE floating-point precision is often very expensive. Many floating-point optimization techniques significantly speed up a code by relaxing some of the IEEE requirements. Since most codes do not require an exact implementation of these rules, all of the DC³ recommended optimizations include relaxed floating-point techniques.
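One rule that relaxed modes typically drop is the fixed order of evaluation: floating-point addition is not associative, and a compiler in relaxed mode may regroup a sum for speed. The sketch below, with the hypothetical names sum_left and sum_right, shows two groupings that are mathematically equal but differ under strict IEEE double arithmetic.

```c
/* Evaluates (a + b) + c exactly in that order under strict IEEE rules. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c). A relaxed floating-point mode may treat this
   and sum_left as interchangeable. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* With a = 1e20, b = -1e20, c = 1.0 and strict IEEE double arithmetic:
   sum_left  gives 1.0, because a + b cancels to 0 before c is added;
   sum_right gives 0.0, because c is lost when added to the huge b. */
```

Codes that rely on a particular summation order, such as compensated summation, are the ones most likely to need the strict behavior.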

Optimization Arguments

This table shows how to invoke these optimizations for each compiler. Some of the options take a numeric level: the higher the number, the more extensive the optimization, and a level of 0 turns the optimization off. For more information about these optimizations, see the compiler's online man pages.

Optimization	Intel	GCC/gfortran	PGI
Vectorization	`-vec`	`-ftree-vectorize`	`-Mvect`
Interprocedural	`-ipo`	`-finline-[opt],-fipa[-opt]`	`-Mipa`
IEEE FP relaxation	`-mno-ieee-fp`	`-ffast-math`	`-Knoieee`