We advise using the EnKF to perform the analysis step only. A separate program is needed to integrate the model ensemble forward; one can write a script or a Fortran control program that alternates between model integration and analysis and takes care of the time parameter.

The EnKF will need the following input files:

- The forecast ensemble file produced by the model. This file can be written as a binary file with each member on a separate record, preceded by a character string that identifies the model and its version. TIP: if the propagation of one member fails for some reason, the record header string will not be recognized and you can reject the file before processing it.
- The observations file. We advise storing, together with each data value, all the information that relates the observation to the model state (see the sketch after this list):
  - An identifier ('SST', 'SLA', 'TEM', 'TB', ...).
  - The measurement error variance.
  - The coordinates x, y, z of the observation.
  - The nearest model grid cell (i, j).
  - Coefficients for interpolating the model values to the observation location (the four bilinear interpolation coefficients, for example).
  - If the observations are given on averaging cells (or "support") larger than the model mesh, the number of model cells covered by the observation should also be given. This may seem cumbersome, but consider for example Reynolds SST data or El Nino indices.
- The topography/bathymetry, in order to compute only wet points.
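
As a concrete illustration, the observation record described above maps naturally onto a Fortran derived type. The following is only a sketch; the field names are illustrative and not necessarily those used in the distributed code:

    module mod_observation
       implicit none
       ! Sketch of one observation record matching the list above.
       type measurement
          character(len=3) :: id             ! identifier ('SST', 'SLA', ...)
          real             :: d              ! the data value itself
          real             :: var            ! measurement error variance
          real             :: x, y, z        ! coordinates of the observation
          integer          :: ipiv, jpiv     ! nearest model grid cell (i,j)
          real             :: a1, a2, a3, a4 ! bilinear interpolation coefficients
          integer          :: ns             ! number of model cells in the support
       end type measurement
    end module mod_observation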

The makefile is written for IBM machines, and some compiler options may need to be changed. It is nevertheless advisable to stick to the file-naming and other conventions used here. mksource.sh creates the list of source files, and mkdepend.pl then works out the dependencies between the routines. Run 'make' to compile and create the EnKF executable.

The following points should be considered when adapting the code to make it suitable for one's own model:

- 1) Specify the model dimensions nx, ny and nz in mod_dimensions.F90 (a sketch of mod_dimensions.F90 and mod_states.F90 follows after this list).
- 2) Identify the prognostic variables of the model. If necessary, modify the type declarations, interfaces, subroutines and functions in mod_states.F90 (mod_states2D.F90).
- 3) m_pseudo2D.F90 contains calls to FFT routines for CRAY, IBM, DEC and SGI machines. An experimental version of this routine using a Numerical Recipes routine is also available.
- 4) Reset the random seed to a different integer each time you run the EnKF executable; this avoids trouble with "extremely time-correlated measurement errors". m_set_random_seed2.F90 is provided in NERSC_implementation/Random/.
- 5) An identifier (cident) is used to check that the files are proper model ensemble files and whether they contain single or double precision data. This identifier may need to be changed (e.g. in m_read_ensemble.F90; see the reading sketch after this list).
- 6) The expected forecast ensemble file is 'ensembleF.uf'.
- 7) The restart file written by the model should be in the format expected by the EnKF.
- 8) analysis.F90 uses the double or single precision BLAS routines dgemm or sgemm.
- 9) analysis.F90 is deliberately not a module (m_analysis.F90): this keeps the compiler from complaining about the implicit conversion from type(states) A(:) to the real-valued array A(:,:). That conversion is not a problem as long as the state vector contains only floating-point values (no integers, no characters, ...).
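
To make points 1 and 2 concrete, here is a minimal sketch of what mod_dimensions.F90 and mod_states.F90 might look like for a model with prognostic variables u, v, t and s. The dimensions and variable names are placeholders for your own model, not the distributed files:

    module mod_dimensions
       implicit none
       integer, parameter :: nx = 100   ! grid size in x (placeholder)
       integer, parameter :: ny = 80    ! grid size in y (placeholder)
       integer, parameter :: nz = 22    ! number of layers (placeholder)
    end module mod_dimensions

    module mod_states
       use mod_dimensions
       implicit none
       ! One ensemble member. Keep only floating-point fields here so that
       ! an array of type(states) can safely be viewed as a real matrix
       ! A(:,:) by analysis.F90 (see point 9).
       type states
          real :: u(nx,ny,nz)   ! zonal velocity
          real :: v(nx,ny,nz)   ! meridional velocity
          real :: t(nx,ny,nz)   ! temperature
          real :: s(nx,ny,nz)   ! salinity
       end type states
    end module mod_states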

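For points 5 and 6, a sketch of reading the forecast ensemble file with a header check, in the spirit of m_read_ensemble.F90; the identifier string, unit number and routine interface are placeholders:

    ! Sketch: read the forecast ensemble and check the identifier (cident)
    ! of every member record before accepting it.
    subroutine read_ensemble(A, nrens)
       use mod_states
       implicit none
       integer, intent(in)       :: nrens      ! ensemble size
       type(states), intent(out) :: A(nrens)   ! forecast ensemble
       character(len=24) :: cident
       integer :: j, ios
       open(10, file='ensembleF.uf', form='unformatted', status='old')
       do j = 1, nrens
          read(10, iostat=ios) cident, A(j)
          if (ios /= 0 .or. cident /= 'MYMODEL v1.0 single prec') &
             stop 'read_ensemble: unrecognized or corrupted member record'
       end do
       close(10)
    end subroutine read_ensemble
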
- 1) The routine m_consistency_A.F90 (in HYCOM_IO/) checks the model state ensemble for extreme or invalid values; appropriate min/max values should be set there.
- 2) Both gridded fields and profile data can be assimilated. Some types of data are already accounted for (see e.g. the list in m_Generate_element.F90 in Obs_operator/).
- 3) m_modstate_point.F90 (in Obs_operator/) identifies the model state variables corresponding to the different types of data and may thus be model dependent; it is the core of the H matrix (see the sketch after this list).
- 4) m_obs_pert.F90 (in Random/) determines the grid size on which random 2D fields are generated; select the range of the 2D perturbations there.
- 5) mkensemble.F90 generates an initial ensemble for HYCOM coupled to an ice model.
- 6) Both the local and the global analysis use almost the same routines; in our implementation the choice between them is made at run time when the file assimilation.in is read.
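
For point 3 above, the core of the H matrix for a gridded surface observation typically reduces to bilinear interpolation of the corresponding model field. A minimal sketch, assuming the measurement and state types sketched earlier:

    ! Sketch: evaluate the model counterpart of one observation by picking
    ! the state variable that matches the identifier and interpolating it
    ! bilinearly to the data location.
    real function modstate_point(mem, obs)
       use mod_states
       use mod_observation
       implicit none
       type(states),      intent(in) :: mem
       type(measurement), intent(in) :: obs
       select case (obs%id)
       case ('SST')   ! sea surface temperature: top layer of t
          modstate_point = obs%a1*mem%t(obs%ipiv  , obs%jpiv  , 1) &
                         + obs%a2*mem%t(obs%ipiv+1, obs%jpiv  , 1) &
                         + obs%a3*mem%t(obs%ipiv+1, obs%jpiv+1, 1) &
                         + obs%a4*mem%t(obs%ipiv  , obs%jpiv+1, 1)
       case default
          stop 'modstate_point: unknown observation identifier'
       end select
    end function modstate_point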

The model should be able to restart several times from the same time step; usually this is just a matter of reading the forcing fields at the right place. Each member can then run with its own random perturbations, independently of the other members; they only meet at analysis time. Remember to reset the random numbers to a different seed every time you restart the executable.
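
A minimal sketch of such a seed reset, in the spirit of m_set_random_seed2.F90 (the distributed routine may differ), deriving the seed from the wall clock so that two runs never reuse the same random sequence:

    subroutine set_random_seed
       implicit none
       integer :: n, clock, i
       integer, allocatable :: seed(:)
       call random_seed(size=n)          ! how many seed integers are needed
       allocate(seed(n))
       call system_clock(count=clock)    ! wall clock as a varying integer
       seed = clock + 37 * [ (i, i = 0, n-1) ]
       call random_seed(put=seed)
       deallocate(seed)
    end subroutine set_random_seed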

That is where the science really starts. Feel free to randomize wherever you think the model has its main error. The ensemble variance, or spread, will indicate whether you have touched a sensitive parameter or not. The assimilation results (in particular the reduction in ensemble spread for each variable) will then show you where the model builds the multivariate links between model variables and measurements. For example, in the TOPAZ experiment the ECMWF atmospheric forcing fields are perturbed by adding 2D random fields (m_pseudo2D.F90 was used for this purpose), and each ensemble member runs HYCOM with its own Richardson mixing parameter (constant in time). The model error can also be made time-correlated; in that case, remember to store its value and read it back at restart time, as in the sketch below.
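
Time-correlated model error is commonly simulated as a first-order autoregressive (red-noise) process. A sketch, where pert is the stored 2D perturbation field, w is a fresh pseudo-random field of the same variance (as produced e.g. by m_pseudo2D.F90), and tau is the chosen decorrelation time; the routine name is a placeholder:

    ! Sketch: variance-preserving AR(1) update of a perturbation field,
    ! giving model error with decorrelation time tau. Store pert in the
    ! restart file and read it back on restart.
    subroutine update_model_error(pert, w, dt, tau)
       implicit none
       real, intent(inout) :: pert(:,:)   ! stored perturbation field
       real, intent(in)    :: w(:,:)      ! fresh random field, same variance
       real, intent(in)    :: dt, tau     ! time step and decorrelation time
       real :: alpha
       alpha = exp(-dt/tau)
       pert  = alpha*pert + sqrt(1.0 - alpha**2)*w
    end subroutine update_model_error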

To use a log-transformation, for example (for positive variables with exceptionally large values), set A = log(A) and d = log(d) at the beginning, specify the measurement errors as a ratio of their actual value, think hard about the observation matrix [can I say log(d) = H log(A)?], and use the same routines as provided here. Back-transform the ensemble at the end.
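
Schematically, and only as a sketch (A is the ensemble, d the observations, rel_err their relative error):

    A   = log(A)        ! transform the ensemble before the analysis
    d   = log(d)        ! transform the observations
    var = rel_err**2    ! error variance of log(d), to first order
    !   ... run the unchanged analysis routines here ...
    A   = exp(A)        ! back-transform the analysed ensemble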

Well... don't believe too much in miracles, nor in "saturation effects".
If your problem is really interesting, then 10 members won't be smart enough
to tell you everything about the trickiest multivariate covariances.
We advise 100 members to get sound, reproducible results; 100 is the ensemble
size that was used during the DIADEM and TOPAZ real-time experiments.
See the paper by Natvik (2002) for how well 20, 40, 60, 80 and 100 members
behave with a coupled physical-ecological model of the North Atlantic.
If you can afford more than 100, then go for more! But remember that 1000
members will not help you if the model has serious biases: all 1000 of them
will miss the solution, and you will feel like beating a dead horse.