Programming Style Manual

The purpose of this Style Manual is to enforce clarity and consistency in coding style for SeqQuest development. Clear code is easier to develop further – rebugging (to add features), and debugging (to find and fix the bugs that inevitably arise in an actively developed and used code) – and a consistency of style is a vital aspect of code clarity. SeqQuest may have many conventions that are idiosyncratic, code that is not optimally efficient, and coding practices that may not reflect the latest thought in computer science. It is, however, very consistent in its application of its idiosyncratic conventions, it is highly portable, scales well, and, key for the future viability of the code, is easily read, understood, and modified. SeqQuest has its own distinctive style, that will be different from any other code, as all other codes will be different from each other. Any changes to SeqQuest should respect the coding conventions that already exist, so that a future developer will continue to have a code as consistent, accessible, and understandable as the current version. Mimic the patterns you see in the code, to the extent possible. This manual will codify many of these conventions, and describe the basic rules to be followed in development of the current version of SeqQuest.

Overview

SeqQuest is a code for electronic structure calculations within the density functional approximation, using pseudopotentials and contracted Gaussian basis sets. SeqQuest is written in Fortran 77, and a very vanilla, portable f77. SeqQuest has a very "flat" structure, with a single main program that does very little computation itself. The main program manages memory and program flow, and calls a sequence of (shallow) subroutines to do the computational work. Data is communicated via passed argument lists, not common blocks. Those familiar with BLAS/LAPACK libraries will recognize the style. There is one large central workspace in the main program, wk(maxwkd), from which all significant storage is taken; maxwkd is set in the parameter file. There is one main input file, one main user output file, and a multitude of larger temporary binary files the program uses to do its work.

The Laws

  1. Consistency and clarity of code are paramount. Only on very rare occasions can clarity be subordinated to other considerations. You will not see those occasions.
  2. Conform to the existing style in the code, not your own, not the latest fad in computer science. Consistency is the key to the long-term viability of any code. Mixing styles is fatal.
  3. Clean up any new code. Changes are NOT done when they work, they are done when they work AND they are clean.
  4. The language is FORTRAN 77, and a very vanilla f77.
    • "fixed form" 72-character lines (i.e. NO "extended source" a la f90)
    • No machine dependent code.
    • No dynamic memory allocations, no pointers, no structures, no "clever" coding where a plainer implementation will serve.
    • Sacrifice performance if necessary.
  5. Variable declarations will be consistent with IMPLICIT DOUBLE PRECISION (a-h,o-z). Apologies to all the strong typing types, but that’s the rules. All integer variable names will begin with i-n, and no variables beginning with i-n will be reals. All real variable names will begin with a-h,o-z, and no variables beginning with a-h,o-z will be integers. Use of IMPLICIT NONE is permitted/encouraged in new routines, but all variable types will conform to the above naming conventions.
  6. NO include statements. SeqQuest has only one include statement, in the main program to bring in dimension parameters, and there will be no other include statements anywhere in the code.
  7. NO creation of O(N**2) arrays anywhere in the program, where the O(N) parameters are: natmd, norbd, nkd, nlatd. Space for all such large arrays will be taken from the central workspace in the main program, an array called wk().
  8. Original declaration of O(N) arrays occurs in the main program only. None of the dimensioning parameters in the parameter file may be used to create arrays in any subroutine, i.e., all O(N) arrays used in subroutines must be passed in from the main program.
  9. NO new common blocks. Data is to be communicated to subroutines via passed arguments.
  10. NO file unit numbers done by explicit integers: all file access shall be via integer variables. E.g. "write(6,fmt)" must be replaced by "write(IWR,fmt)", where IWR is an integer variable that has been set to refer to this file.
  11. NO direct use of Fortran OPEN or CLOSE. All file manipulation will be done using FLxxxx routines (that manage unit numbers).
  12. NO passed real constants. E.g., "call FOO(1.0)" should be written as "call FOO( one )" where "one" is a real variable set earlier in the code. Passing integers is ok.
  13. NO statement functions, beyond the few that already exist.
  14. Comments only on lines beginning with "c", not "!" or other funky characters. White space should begin with "c", i.e. no completely blank lines within a routine.
  15. NO tabs; indenting is done using spaces.
  16. Test thoroughly.
  17. Only those who write the rules can change the rules.

The Basics

Many of the conventions described in this section will be apparent from inspecting the code itself. Hence, the best approach is, to the extent possible, to use existing code as a template/model for new development. This section will highlight some of the less obvious conventions, and reiterate the more important ones. Details will be expanded on in the next section.

  • Document carefully. Describe the purpose of larger modules, and explain complicated coding sections that may not be obvious to someone looking at the code years from now after you have left to go on to bigger and better things. Focus on the why (and why not).
  • Use generic calls to library routines where possible. There are exceptions. Use DERF/ERF explicitly as there are no safe generics for the error function, and IABS for integers and ABS for reals.
  • Avoid using complex variables. Use doubly dimensioned reals instead where practical.
  • Avoid using ENTRY except in special cases.
  • All I/O of longer records (orbital matrices and grid fields) is done through special "big" routines that break the records into smaller pieces. This is because write/read of long records can be faulty on some machines.
  • Do not use BACKSPACE().
  • Avoid using explicit constants in the executable code. Use named constants (e.g. "zero", "half", "one") instead that have been set in declarations via parameter or data statements.
  • Use careful indenting to format source code (two spaces).
  • When writing data to the output listing file, always include a descriptive statement in the output line. Something that can be found with a "grep".
  • Looping is by do … continue or do … enddo only. NO do while loops.
  • Line continuation character is a "$"
  • Labels should be properly sequenced, except for emphasis.
  • All labels are right justified, NOT left justified.
  • Use "call MKZERO( len, arr )" to clear (set to zeroes) an array "arr(len)"
  • Branching by if/elseif/else/endif preferred.
  • Avoid compound calls. E.g., avoid "call FOO( BAR( x ) )"
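
Several of the points above can be seen together in a small example. What follows is only a sketch: the MKZERO body is a stand-in written for this example (the real routine is in the SeqQuest source), SHOWIT is a hypothetical helper, and IWR is simply set to 6 here, where SeqQuest would obtain it from FLGETIWR.

```fortran
      program BASICS
c
c Sketch only: named constants, MKZERO, and no compound calls.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( lenarr = 4 )
      DIMENSION  arr(lenarr)
      DATA  half,one / 0.5d0,1.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would obtain IWR from FLGETIWR, not set it.
      IWR = 6
c
      call MKZERO( lenarr, arr )
      do  i=1,lenarr
        arr(i) = arr(i) + half
      enddo
c
c Not "call SHOWIT( IWR, SQRT( one + one ) )": evaluate first.
      root = SQRT( one + one )
      call SHOWIT( IWR, root )
      write(IWR,9010) 'BASICS: arr(1) =',arr(1)
 9010 format(1x,a,f12.6)
c
      STOP
      END
c
      subroutine MKZERO( len, arr )
c Stand-in for the SeqQuest routine of the same name (assumed body).
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  arr(len)
      DATA  zero / 0.d0 /
      do  i=1,len
        arr(i) = zero
      enddo
      RETURN
      END
c
      subroutine SHOWIT( IWR, val )
c Hypothetical helper: prints one labelled value.
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      write(IWR,9000) 'BASICS: value =',val
 9000 format(1x,a,f12.6)
      RETURN
      END
```

Note that the only numeric constants appear in DATA statements, and that every output line carries a string that can be found with a "grep".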

Conventions

Units

The program does its internal work in:

  • Rydberg energy units (1 Ry = 0.5 Hartree = 13.605 eV)
  • Bohr distance units (1 bohr = 0.52918 Ang = 0.052918 nm)
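
A conversion for output purposes might look like the sketch below. The conversion factors are the ones quoted above; the constant names (ryd2ev, ryd2ha, bohr2a) are made up for this example and are not necessarily the names used in SeqQuest.

```fortran
      program UNITS
c
c Sketch only: convert internal Rydberg/bohr values for output.
c The constant names are illustrative, not taken from SeqQuest.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( ryd2ev = 13.605d0, ryd2ha = 0.5d0 )
      PARAMETER  ( bohr2a = 0.52918d0 )
      DATA  engy,dist / -1.d0,2.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would obtain IWR from FLGETIWR.
      IWR = 6
c
      engyev = engy*ryd2ev
      engyha = engy*ryd2ha
      distan = dist*bohr2a
      write(IWR,9010) 'UNITS: energy (Ry)      =',engy
      write(IWR,9010) 'UNITS: energy (eV)      =',engyev
      write(IWR,9010) 'UNITS: energy (Hartree) =',engyha
      write(IWR,9010) 'UNITS: distance (bohr)  =',dist
      write(IWR,9010) 'UNITS: distance (Ang)   =',distan
 9010 format(1x,a,f14.6)
c
      STOP
      END
```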

Main program

The main program concentrates on memory management and flow control. It calls a sequence of subroutines, controlled by various flags, and manages the memory required by the large arrays used by SeqQuest. The main routine is special, and looks very different from any subroutine.

  • Computation should be in subroutines, not the main routine.
  • do-loops are heartily discouraged in the main routine.
  • write statements are self-documenting (no orphaned numbers in output!).
  • Memory is tightly controlled
  • File open/close done using FLxxxx routines, NOT the fortran primitive OPEN/CLOSE.
  • Unit numbers for file i/o are named integer variables. No undirected write(*,*) and no write(14,*).
  • All data communicated to subroutines through passed arguments

Memory

The principal limiting factor in SeqQuest is usually the amount of memory needed to run a problem, rather than the amount of time it takes to run a problem. I.e., SeqQuest is more memory-bound than cpu-bound. Hence, the use of memory is very tightly controlled in SeqQuest.

All large-scale memory is taken from a single large workspace in wk(maxwkd), where maxwkd is a parameter set in the main parameter file, using pointer-like integers. The routine WKMEM is the most important routine in the code: it partitions memory within wk(), and checks for memory sufficiency FOR THE ENTIRE CODE. Hence, WKMEM should be consulted before attempting any use of space within wk().

The wk() array is sectioned into pieces at the beginning of the code by pointer-like integer variables i01-i12, with spacing dictated by the size of big arrays. The first four, i01-i04, reserve enough space for either orbital matrices (nmat=nk*norb**2) or grid fields (nptr=n1r*n2r*n3r). (NB: this assumption is subject to change, check WKMEM for latest). The last eight spaces, i05-i12, only guarantee enough space for grid fields. Each of these spaces may be used as temporary space, and the intermittent comments "MEM" enumerate the contents of all the active "pointers" at that point in the code.

  • No dynamic memory.
  • Orbital matrices/grid fields must come from main workspace wk() and be given descriptive local names (and documented in main program).
  • O(N) work arrays should be passed from wksmX() in main program.
  • Memory is tightly scheduled – be careful not to overstep boundaries.
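
The idea of pointer-like integers into a single workspace can be illustrated with a toy program. This is a sketch under assumptions, not the SeqQuest scheme itself: the sizes, the two-slot layout, and the helper routines SETFLD and SUMFLD are invented for the example, and the real partitioning and sufficiency checks are done in WKMEM.

```fortran
      program WKDEMO
c
c Sketch only: carve grid-field slots out of one workspace with
c pointer-like integers.  Sizes and names are illustrative; the
c real partitioning and checks live in WKMEM.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( maxwkd = 300 )
      DIMENSION  wk(maxwkd)
      DATA  n1r,n2r,n3r / 4,5,6 /
      DATA  one / 1.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would obtain IWR from FLGETIWR.
      IWR = 6
      nptr = n1r*n2r*n3r
c
c Two grid-field slots, back to back:
      i01 = 1
      i02 = i01 + nptr
      memtop = i02 + nptr - 1
      if( memtop .gt. maxwkd )then
        write(IWR,9130) 'WKDEMO: ERROR - wk too small, need =',memtop
        STOP
      endif
c MEM: i01=rho  i02=free
c
      call SETFLD( nptr, one, wk(i01) )
c -->                     rho-o
      call SUMFLD( IWR, nptr, wk(i01) )
c -->                     rho-i
c
 9130 format(1x,a,i10)
      STOP
      END
c
      subroutine SETFLD( nptr, val, rho )
c Hypothetical helper: fill grid field rho(nptr) with val.
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  rho(nptr)
      do  i=1,nptr
        rho(i) = val
      enddo
      RETURN
      END
c
      subroutine SUMFLD( IWR, nptr, rho )
c Hypothetical helper: print the sum over grid field rho(nptr).
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  rho(nptr)
      DATA  zero / 0.d0 /
      rhosum = zero
      do  i=1,nptr
        rhosum = rhosum + rho(i)
      enddo
      write(IWR,9010) 'WKDEMO: sum over grid field =',rhosum
 9010 format(1x,a,f12.4)
      RETURN
      END
```

The piece of wk() starting at wk(i01) is given a descriptive local name (rho) inside the called routines, and the calling program documents what each passed slot holds.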

File I/O 

The user listing output file should be self-documenting. It is not just the source code, but the output file the user sees which needs to have documentation in order to be readable and understandable by humans.

I/O to all files is carefully structured to try and reduce the size of files, and to make data more easily accessible to the program without churning up disk. The FLxxxx routines have been set up to manage files. They allocate available (free) unit numbers (unit numbers are not to be hard-coded!), connect to correct directory structure (as needed), and complete the file names.

  • Files will be explicitly opened using FLxxxx routines, not a fortran OPEN/CLOSE.
  • I/O is always to named units (IRD,IWR,isetdat,ianalyz, etc.) and not to explicit numbers. I.e. write(6,*) is forbidden. Sole exception: exception/error handling in a deep routine that does not have IWR locally can write diagnostics using "write(*,fmt)".
  • NO "print" statements, use "write" instead.
  • standard input unit is IRD [from call FLGETIRD( IRD )].
  • standard output unit is IWR [from call FLGETIWR( IWR )].
  • Format labels are 8xxx for a read, 9xxx for a write.
  • A write to the user output files should be carefully formatted and documented with a descriptive (preferably uniquely "grep-"able string) comment. For example,
          write(IWR,*) engy
    should be replaced with something more descriptive:
          write(IWR,9020) 'TOTAL energy (Ry) =',engy
     9020 format(1x,a,f20.10)
  • The preference is to do I/O from the top-level program, rather than called routines.
  • Read/write of long records (grid fields, orbital matrices) should be done through a special set of routines in the code (WRITBIG/READBIG/BACKBIG/READSKP) that break up the long record into smaller pieces (The read/write of long records has been a problem on some Alpha-based machines).
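
The idea of breaking a long record into smaller pieces can be sketched as follows. The routines WRTPCS and REDPCS are hypothetical and written only for this example; they are not the SeqQuest WRITBIG/READBIG routines, whose interfaces are not described in this manual. The direct OPEN/CLOSE and the literal unit numbers are there only to keep the example self-contained; in SeqQuest the FLxxxx routines would do that job.

```fortran
      program BIGREC
c
c Sketch only: write and read back a long record in pieces.
c WRTPCS/REDPCS are hypothetical stand-ins; they are NOT the
c SeqQuest WRITBIG/READBIG routines.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( lenbig = 10, lenpc = 4 )
      DIMENSION  aout(lenbig),ain(lenbig)
      DATA  zero / 0.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would get both units from FLxxxx routines;
c the direct OPEN/CLOSE is only to keep this demo self-contained.
      IWR = 6
      iscr = 21
      OPEN( unit=iscr, status='scratch', form='unformatted' )
c
      do  i=1,lenbig
        aout(i) = DBLE( i )
      enddo
      call WRTPCS( iscr, lenpc, lenbig, aout )
      REWIND( unit=iscr )
      call REDPCS( iscr, lenpc, lenbig, ain )
c
      errmax = zero
      do  i=1,lenbig
        diff = ABS( ain(i) - aout(i) )
        if( diff .gt. errmax ) errmax = diff
      enddo
      write(IWR,9010) 'BIGREC: max read-back error =',errmax
 9010 format(1x,a,f12.6)
      CLOSE( unit=iscr )
c
      STOP
      END
c
      subroutine WRTPCS( iunit, lenpc, lenarr, arr )
c Hypothetical: write arr(lenarr) as records of at most lenpc.
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  arr(lenarr)
      do 10 i1=1,lenarr,lenpc
        i2 = MIN( i1 + lenpc - 1, lenarr )
        write(iunit)  (arr(i),i=i1,i2)
   10 continue
      RETURN
      END
c
      subroutine REDPCS( iunit, lenpc, lenarr, arr )
c Hypothetical: read arr(lenarr) back, piece by piece.
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      DIMENSION  arr(lenarr)
      do 10 i1=1,lenarr,lenpc
        i2 = MIN( i1 + lenpc - 1, lenarr )
        read(iunit)  (arr(i),i=i1,i2)
   10 continue
      RETURN
      END
```

The reader must use the same piece length as the writer, which is one reason to keep such I/O inside a matched set of routines.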

Variable names

Consistent naming conventions are important to being able to follow code easily. The style used in SeqQuest is rather old-fashioned fortran: short, but descriptive names, with few underscores and using lower case (very specific exceptions are all upper case). The virtue is that it leads to code, that while quite dense, is rather readable. In general, single character or double character variable names are used only as very local temporary variables (notable exception is that "i" and "j" are used frequently to index basis functions in the big orbital matrices), such as the loop index in a very short do-loop. Keep names short, to preserve code-space on a fortran 72-character line, and to conform to naming conventions used in other routines.

  • All lower case.
  • Restrict to eight characters or less (this convention may be relaxed at a future time, but, for now, keep them short).
  • All variables with names beginning with i-n are integers, and v.v.
    Exception: logicals may begin with a capital "L".
  • Dimensioning parameters end in "d", e.g. nkd, norbd, natmd.
  • For passed variables, use the same name in the subroutine as was used in the calling routine, except for big arrays from main workspace wk(), which should be given a descriptive local name.
  • Local variables will have the same names, where available, as they do in other routines. E.g., iatm and jatm are used to index atoms, not ia and ja, because the former are conventionally used elsewhere in the code. Always best to use existing routines as a model.
  • No underscore characters, except for logicals.
  • Logicals should have names that lead to descriptive use in an "if" statement. E.g. "if( do_neb )then" or "if( EVEN )then".

Upper/lower case

  • Upper case used for:
    • subroutine and functions names (call FOO( x ))
    • file manipulations (e.g.,REWIND)
    • declarations (DIMENSION/DATA/PARAMETER/CHARACTER, etc.)
    • routine exits (STOP/RETURN/END)
    • standard unit numbers (IRD,IWR being standard in/out)
    • first character "L" in logical variable ("Loptall")
  • Lower case used:
    • everything else

Subroutines/function calls

  • Subroutine and function names are all upper case.
  • Subroutine and function names try to be eight characters or less.
  • Use generics for math calls where possible. For the error function, there is not a safe generic, so use ERF or DERF as necessary. Also, use of IABS for integers and ABS for reals, to distinguish the two, is encouraged.
  • Use "call MKZERO( len, arr )" to clear array arr(len).
  • Use explicit arithmetic for 3-vectors and 2-vectors (complex) rather than function/subroutine calls. E.g., prefer this:
          adotb = a(1)*b(1) + a(2)*b(2) + a(3)*b(3)
    rather than
          adotb = DDOT( 3, a,1, b,1 )
  • Communicate data through passed arguments rather than common.
  • Avoid compound calls. E.g., replace "call FOO( BAR(x) )" with "call FOO( barx )" where "barx = BAR( x )".
  • There should be a space on the inside of the parentheses around arguments.

Routine structure

Subroutine and function source code is very highly structured. Use an existing routine as a template if building a new routine, as it is easier to ensure conformance to style that way than to build on the basis of what will be necessarily incomplete instructions about style. The idea is to make all the routines look as similar to one another as possible, and, therefore, by inspection, notice bugs because patterns are violated. Deviation from any pattern should be done only with compelling reason, and should be documented.

  • The typical routine has the following elements:
    • name banner (five comment lines, middle line >>>>> {name})
    • the routine declaration: subroutine/function FOO( …. )
    • main documentation section: purpose/author/revision history
    • more detailed comments/notes/variable descriptor comments
    • IMPLICIT DOUBLE PRECISION (a-h,o-z) or IMPLICIT NONE
    • variable declarations, by kind (input/output/scratch/local).
    • DATA declarations
    • statement functions (HIGHLY discouraged)
    • c >>>> EXECUTABLE CODE:
    • executable part of code
    • c That’s all, Folks!
    • RETURN/END
  • These elements are formatted in detail, use an existing routine for a model.
  • Passed arguments take the same name as in the calling routine, except that big arrays passed from big workspace "wk()" in main routine should be given names descriptive of their use.
  • I/O units should be passed first, dimension variables next, data arrays next, and arrays from workspace wk() last. Try to conform to order seen in other calls. Arrays out of workspace should be documented in calling program as to name and type (i-input, o-output, s-scratch). E.g.:
          call VESSLO( IWR, ibndpot, n1r,n2r,n3r,nptr, weight,
         $ ws(is1),ws(is2),ws(is3),ftarray,
         $ wk(i11), wk(i06), wk(i08) )
    c -->  rhoslo-io espot-2os gvecmag-i
    where i11=slow density, i06=electrostatic pot’l, i08=g-vectors mags.
  • RETURN from anywhere other than the end of a routine is discouraged.
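
The elements listed above might be laid out as in the following sketch. The routine VECNRM and its driver are hypothetical, and the exact form of the banner and documentation block is guessed from the list above; an existing SeqQuest routine remains the authoritative template.

```fortran
      program TMPLT
c Driver for the hypothetical routine below (sketch only).
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( natmd = 2 )
      DIMENSION  vec(3,natmd)
      DATA  vec / 1.d0,2.d0,2.d0, 3.d0,0.d0,4.d0 /
c Assumption: SeqQuest would obtain IWR from FLGETIWR.
      IWR = 6
      natm = natmd
      call VECNRM( IWR, natm, vec, vnorm )
      STOP
      END
c
c
c >>>>> VECNRM
c
c
      subroutine VECNRM( IWR, natm, vec, vnorm )
c---------------------------------------------------------------
c Purpose: hypothetical example routine; finds and prints the
c          length of the longest of natm 3-vectors.
c
c Written: (author), (date)
c
c Revision history:
c  none
c---------------------------------------------------------------
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
c
c Input:
      DIMENSION  vec(3,natm)
c Output: vnorm
c Local:
      DATA  zero / 0.d0 /
c
c >>>> EXECUTABLE CODE:
c
      vnorm = zero
      do 100 iatm=1,natm
        v2 = vec(1,iatm)**2 + vec(2,iatm)**2 + vec(3,iatm)**2
        vlen = SQRT( v2 )
        if( vlen .gt. vnorm ) vnorm = vlen
  100 continue
c
      write(IWR,9100) 'VECNRM: longest vector (bohr) =',vnorm
 9100 format(1x,a,f16.8)
c
c    That's all, Folks!
c
      RETURN
      END
```

The argument list follows the order given above: I/O unit first, then the dimension variable, then the data arrays.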

Routine families

Certain sets of subroutines have similar internal structure, and code is written to emphasize the similarities where possible. Two major distinct families exist. First, the analytic "two-center" (SIJ=overlap, TIJ=kinetic, FRC2CTR=forces) and "three-center" routines (VLOCMAT, VLOCMII, VLOCFRC, NLOCMAT, NLOCFRC) for local and non-local integrals share much internal structure. For example, label numbers are used to emphasize the similarities between routines rather than to have strict numerical ordering within routines. Second, the grid matrix element routines (GRDOVLP, GRIDRHO, VSLOMAT, VSLOFRC, ESLOFRC) share much common internal structure. There are smaller sets of similar routines. If developing a new routine, try to follow the model of an existing routine in a family.

Loops

Looping should be done through do-loops, rather than do-while or if/goto constructs or other code constructs.

  • Loops are via do/continue or do/enddo, not do while
  • Do NOT pass loop index into subroutine, e.g.,
          do  i=1,n
            call VERYBAD( i )
          enddo
    is to be avoided at all costs. Some optimizing compilers will get this wrong.
  • No compound label-sharing loops. E.g.,
          do 10 j=1,n
            do 10 i=1,n
              s1
       10 continue
    should be written as two loops:
          do 20 j=1,n
            do 10 i=1,n
              s1
       10   continue
       20 continue
  • do/enddo used for short loops without jumps to outside of loops.
  • labelled loops otherwise (long loops, branches, complicated). This is fine:
          do  i=1,n
            s1
          enddo
    But, this should be changed:
          do  i=1,n
            if( foobar ) goto 11
          enddo
       11 continue
    to a labelled loop:
          do 10 i=1,n
            if( foobar ) goto 11
       10 continue
       11 continue
  • Note that labels are right-justified, and two-space indenting.
  • Note that exception goto label is indexed 11 to the loop’s 10.
  • Labelled do-loop ends with "continue" statement, not active statement.
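
A complete, if artificial, example of a labelled loop with an exception exit is sketched below. It also shows one possible way to respect the rule about loop indices: the index is copied to a local variable (ipass) and the copy is passed. That workaround is an assumption made for this example, not a documented SeqQuest practice, and SHOWEL is a hypothetical helper.

```fortran
      program LOOPS
c
c Sketch only: labelled loop with an early exit, and a local
c copy of the loop index passed to a routine.  The copy (ipass)
c is one possible workaround, assumed here, not a SeqQuest rule.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( nd = 5 )
      DIMENSION  val(nd)
      DATA  val / 1.d0,2.d0,-3.d0,4.d0,5.d0 /
      DATA  zero / 0.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would obtain IWR from FLGETIWR.
      IWR = 6
      ineg = 0
      do 10 i=1,nd
        if( val(i) .lt. zero )then
          ineg = i
          goto 11
        endif
        ipass = i
        call SHOWEL( IWR, ipass, val(i) )
   10 continue
   11 continue
c
      write(IWR,9010) 'LOOPS: first negative at index =',ineg
 9010 format(1x,a,i4)
c
      STOP
      END
c
      subroutine SHOWEL( IWR, indx, elem )
c Hypothetical helper: prints one indexed element.
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      write(IWR,9000) 'LOOPS: element',indx,' =',elem
 9000 format(1x,a,i4,a,f10.4)
      RETURN
      END
```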

Labels

Use of labels is highly conventional, and there are some special numbers to respect. Code clarity is the goal. "Big numbers" such as 1000,2000 or 100,200, etc. should be used to denote "important" branch points or do-loops, with smaller increments for less important loops/branch points. Use labels to highlight code, not simply to denote sequence.

  • Labels are right-justified, NOT left-justified.
  • Special label numbers reserved for specific purposes:
    • 8xxx reserved for "read" format statements
    • 9xxx reserved for "write" format statements
    • "13" reserved for error/exception handling (e.g., 13xx, xx13).
    • 999 reserved for final jump location before routine exit
    • 99x reserved for exit with some final processing
  • Labels should be in numerical order, but with special treatment of I/O labels, and other exceptional labels.
  • read/write labels 8xxx/9xxx should be in numerical order as well, where the "xxx" suggests the location within routine.
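
The reserved label ranges can be seen together in the short sketch below. The program is invented for illustration: it reads from an internal character variable rather than from IRD so that it is self-contained, and the particular label numbers (8010, 9010, 9130, 1300) are only examples of the ranges described above.

```fortran
      program LABELS
c
c Sketch only: reserved label ranges in one small unit.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      CHARACTER*8  line
      DATA  line / '      42' /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would obtain IWR from FLGETIWR.
      IWR = 6
c
      read(line,8010,err=1300) nval
 8010 format(i8)
      if( nval .lt. 0 ) goto 1300
c
      write(IWR,9010) 'LABELS: value read =',nval
 9010 format(1x,a,i8)
      goto 999
c
c Error/exception handling, label built around "13":
 1300 continue
      write(IWR,9130) 'LABELS: ERROR - bad input line =',line
 9130 format(1x,a,a)
c
c Final jump location before exit:
  999 continue
      STOP
      END
```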

Indenting

Consistent indenting leads to more readable code. Everyone has their own conventions; the convention for SeqQuest is as follows:

  • Code indent is by TWO spaces.
  • NO tabs
  • Open the first line after a "do" statement, close last line before the final continue in a do-loop:
          do 10 i=1,n
            s1
            …
       10 continue
  • Open within the branch points of an if-block:
          if( log1 )then
            s1
          elseif( log2 )then
            s2
          else
            s3
          endif
  • Indent in continuation is one space:
          call FOO( arga,argb, … ,argn,
         $ argo, … ,argz )
  • Note: continuation character is a "$"
  • On (very) rare occasions, loops get deep enough that there is little code space left on a fortran 72-character line. In those cases, forego indenting in top level loops, or, better yet, try to use subroutine calls. Use good judgment, with code clarity being the goal rather than mindless adherence to a rule.

Space

Use of space in source code is highly conventional, and is designed for ease in reading code. While not strictly obeyed, the conventions are followed rather closely barring some compelling reason not to. The following lists common cases, but as always, it is better to inspect existing code and conform to conventions seen in the code.

  • Two space indenting for code, one space for continuation
  • One space inside (but not outside) parentheses around routine arguments: "call FOO( bar )" and not "call FOO (bar)".
  • One space inside (but not outside) if-statements: "if( log )then" and not "if (log) then"
  • However, "if( log ) s1", when not an if-block.
  • Single space around labels in do-statement: "do 10 i=1,n"
  • Two spaces inside no-label do-statement: "do^^i=1,n" (each "^" marks one space)
  • Eliminate internal spaces in fortran terms, i.e.,
    USE: / goto / enddo / elseif / endif /,
    NOT: / go to / end do / else if / end if /.
  • One space on either side of "=", e.g. "a = b", not "a=b" EXCEPT for do-statement, where there are no spaces: "do i=1,n"
  • One space either side of "+" or "-": "a = b + c"
  • No space around "*" or "/", except for emphasis: "a = b*c"
  • Space after read/write: "read(IRD,8000) label"
  • No space around array index: "wk(i01)"
  • When in doubt, check existing code for examples.
  • These rules can be bent when confronted with the 72-character line limit in fortran. However, preference is to break into a continued line rather than trying to squeeze everything onto one line.
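
Most of the spacing rules above appear in the following short sketch. It is an illustration written for this manual, not an excerpt from SeqQuest, and existing code remains the better guide.

```fortran
      program SPACES
c
c Sketch only: spacing conventions in a few lines of code.
c
      IMPLICIT DOUBLE PRECISION (a-h,o-z)
      PARAMETER  ( nd = 3 )
      DIMENSION  a(nd),b(nd)
      LOGICAL  Lbig
      DATA  a / 1.d0,2.d0,3.d0 /
      DATA  half,two / 0.5d0,2.d0 /
c
c >>>> EXECUTABLE CODE:
c
c Assumption: SeqQuest would obtain IWR from FLGETIWR.
      IWR = 6
c Two spaces in a no-label do; no spaces around its "=":
      do  i=1,nd
c One space around "=", "+", "-"; none around "*" or "/":
        b(i) = two*a(i) - half
      enddo
c One space inside the if-parentheses, none outside:
      Lbig = b(nd) .gt. two
      if( Lbig )then
        write(IWR,9010) 'SPACES: large last element =',b(nd)
      else
        write(IWR,9010) 'SPACES: small last element =',b(nd)
      endif
c Continuation character is "$", with a one-space indent:
      sum3 = b(1) + b(2)
     $ + b(3)
      write(IWR,9010) 'SPACES: sum of elements =',sum3
 9010 format(1x,a,f12.6)
c
      STOP
      END
```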