TFLOPS demo system


Compile and run an f77 hello world on cougar compute nodes

You must already have accounts and be behind the firewall. This example assumes we are already logged in to sasn100.sandia.gov.

The preferred method is to compile on sasn100 with the cross-compilers and then switch over to janus, where 'yod' runs your program on the compute nodes. Cross-compiling requires the cross-compiler environment to be set up first. See my .cshrc for one that works on both sasn100 and janus and sets up that environment.

The cougar f77 compiler is named cif77. The cougar C compiler is cicc. The cougar C++ compiler is ciCC. cif77 invokes the Portland Group F77 compiler (pgf77) with the -cougar switch, which sets up the right compile environment for a program that will run on a cougar compute node.

In this example we assume you compile on sasn100 and then jump to a window where you are logged in to janus.

sasn100 92 > cat tst.f
       include 'mpif.h'
       integer*4 ierr,num_pe,my_pe
       call mpi_init(ierr)
       call mpi_comm_size(MPI_COMM_WORLD,num_pe,ierr)
       call mpi_comm_rank(MPI_COMM_WORLD,my_pe,ierr)
       call mpi_barrier(MPI_COMM_WORLD,ierr)
       print *,'hello from node ', my_pe,' of ',num_pe
       call mpi_finalize(ierr)
       call exit
       end




Now to compile it:
sasn100 93 > cif77 tst.f -o tstf -lmpi
Linking:
Note that the underlying compiler is pgf77. 'cif77' is a script that calls pgf77 with the correct syntax and adds the -cougar switch.
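
As a rough illustration of what such a wrapper does, here is a minimal sketch in sh. It is not the real cif77 script, whose contents are not shown on this page. The function name cif77_sketch and the PGF77 variable are hypothetical; PGF77 is only there so the sketch can be dry-run on a machine without the compiler.

```sh
#!/bin/sh
# Hypothetical sketch of a cif77-style wrapper (NOT the real script).
# PGF77 is an assumed override for dry runs; it defaults to pgf77.
cif77_sketch() {
    # Pass every argument through unchanged, with -cougar added in front.
    "${PGF77:-pgf77}" -cougar "$@"
}
```

For a dry run, 'PGF77=echo cif77_sketch tst.f -o tstf -lmpi' prints '-cougar tst.f -o tstf -lmpi', the argument list the compiler would receive.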

To actually launch a job on the compute nodes, you use the 'yod' command. Use 'yod -h' for a quick list of yod options.

We now need to log in to janus (the above assumes you compiled on sasn100). Jump to a window where you are logged in to janus, or do 'ssh janus'.

Now we run the program:

janus 94 > yod -sz 4 tstf  &

 hello from node            0 of            4
 hello from node            2 of            4
 hello from node            1 of            4
 hello from node            3 of            4
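
Each node prints on its own, so the lines can arrive in any order (0, 2, 1, 3 above). The mpi_barrier call synchronizes the nodes but does not order their output. If sorted output is wanted, one option is to post-process it with the standard sort utility. This is a minimal sketch, assuming the line format shown above and assuming yod passes the program's standard output through to a pipe. sort_by_node is a hypothetical helper name.

```sh
# Sort 'hello from node N of M' lines numerically by N (the 4th field).
sort_by_node() {
    sort -n -k4,4
}
```

It would be used as 'yod -sz 4 tstf | sort_by_node'.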

Now we run the showmesh utility to display the activity on the machine.
janus 94 > showmesh

You will see something like the display below (with janus 'big')
janus ~ 101 > /usr/community/bin/shmesh

Current mesh allocation for janus (last booted Thu Jul 01 12:24:10 1999)

                  1         2         3         4         5     
        01234567890123456789012345678901234567890123456789012345
       +--------------------------------------------------------+
     20|ddddddddddddddddddddddddddddddddddddddddddddddddddddDDDD|   75
     96|ddddddddddddddddddddddddddddddddddddddddddddddddddddDDDD|  151
    172|dddddddddddddddddddddddddddddddddddddddddddddddddddddddd|  227
    248|ddddddddddddddddddddddddddddddddd dddddddddddddddddddddd|  303
    324|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  379
    400|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  455
    476|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  531
    552|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  607
    628|dddddddddddddddddddddddddddddddd  ddddddddddddddDDDDdddd|  683
    704|dddddddddddddddddddddddddddddddd  ddddddddddddddDDDDdddd|  759
    780|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  835
    856|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  911
    932|dddddddddddddddddddddddddddddddd  dddddddddddddddddddddd|  987
   1008|ddddddddddddddggffffffffgggggggg  ggggggjjjjjjjjjjjjjjjj| 1063
   1084|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggggjjjjjjjjjjjjjj| 1139
   1160|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggggjjjjjjjjjjjjjj| 1215
   1236|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggSDDDDDDDEEEggjjj| 1291
   1312|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggSDDDDDgDEEEgjjjj| 1367
   1388|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggggiiiiiiiijjjjjj| 1443
   1464|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggggiiiiiiiijjjjjj| 1519
   1540|eeeeeeeeeeeeeeeeffffffffgggggggg  ggggggggiiiiiiiijjjjjj| 1595
   1616|eeeeeeeeeeeeeeeegggggggggggggggg  jjjjjjjjiiiiiiiijjjjjj| 1671
   1692|eeeeeeeeeeeeeeeegggggggggggggggg  jjjjjjjjiiiiiiiijjjjjj| 1747
   1768|eeeeeeeeeeeeeeeegggggggggggggggg  jjjjjjjjiiiiiiiijjjjjj| 1823
   1844|eeeeeeeeeeeeeeeegggggggggggggggg  jjBDDDSjjjjjjjjjjjjjjj| 1899
   1920|eeeeeeeeeeeeeeeegggggggggggggggg  kkDDDDSkkkkkcccccaaaaa| 1975
   1996|eeeeeeeeeeeeeeeegggggggggggggggg  kkSSSSSkkkkkccccccbbbb| 2051
   2072|eeeeeeeeeeeeeeeegggggggggggggggg  kkkkkkkkkkkkccccccbbbb| 2127
   2148|eeeeeeeeeeeeeeeegggggggggggggggg  kkkkkkkkkkkkccccccbbbb| 2203
   2224|eeeeeeeeeeeeeeeekkkkkkkkkkkkkkkk  kkkkkkkkkk||ccccccbbbb| 2279
   2300|hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh  ||||||||||||ccc:::::::| 2355
   2376|hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh  ||||||||||||::::::::::| 2431
       +--------------------------------------------------------+

Legend:
B    boot node                D    TOS disk node (RAID)
     non-existant node        S    TOS service node
T    node booting OS          E    TOS ethernet node
F    other TOS node           X    failed Cougar node(s)
.    free Cougar eagle node   :    free Cougar kestrel nodes
|    free Cougar NQS node(s)

free/total open nodes = 34/140, free/total nqs nodes = 52/3226

Job ID  User      Base OS     Size   Start           Partition_name/Command line
--- --- --------- ---- ------ ------ --------------- ---------------------------
 a  23  wryxxxg   1971 CGR    10     Jul 01 13:56:42 yod -size 10 co 
 b  29  jfdxxxx   2048 CGR    32     Jul 01 14:43:46 yod -comm 2M -sz 32 cayakiote3_mp.x 
 c  30  crxxxxx   1966 CGR    64     Jul 01 14:59:11 yod -sz 64 -comm 1M -stack 1M r2d2_and_c3po
 d  20  ksxxxxx   20   CGR    1414   Jul 01 12:37:09 yod -sz 1414 my-parallel-exe 
 e  26  kexxxxx   1084 CGR    512    Jul 01 12:37:15 yod -sz 512 -stack 2M -comm 2M practiko3d 
 f  27  tvxxxxx   1024 CGR           Jul 01 12:37:19 /cougar/bin/yod quest_for_camelot
 g  22  hmxxxxx   1022 CGR           Jul 01 12:37:11 /cougar/bin/yod -masync -proc 2 -comm 50M mykeep
 h  21  fmxxxxx   2300 CGR    128    Jul 01 12:37:11 yod -sz 128 -masync -fyod 1 do_red_dp3 
 i  25  mkxxxxx   1430 CGR    96     Jul 01 12:37:15 yod -munix -comm 0M -sz 96 btriste
 j  28  hmxxxxx   1048 CGR           Jul 01 13:10:48 /cougar/bin/yod -masync -proc 2 -comm 50M mykup 
 k  24  hmxxxxx   1954 CGR           Jul 01 14:05:22 /cougar/bin/yod -masync -proc 2 -comm 50M yrkup 

The showmesh command still says 'puma' instead of 'cougar' nodes. Each ':' on the mesh is 2 nodes (one board with 2 nodes; each node has 2 CPUs). /usr/community/bin/shmesh drops the empty spot when the middle section is switched to the classified side.
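
The 2-nodes-per-character rule can be checked against the sample display: the last two mesh rows contain 7 + 10 = 17 ':' characters, and 17 x 2 = 34, which matches the 'free/total open nodes = 34/140' line. The sketch below does the same count with awk. It assumes mesh rows look like the ones shown above (a row number, '|', the cells, '|', a row number). count_free_open is a hypothetical helper name, not part of showmesh.

```sh
# Count ':' cells in mesh rows only and double them (2 nodes per cell).
# The pattern skips the legend and header lines, which also contain ':'.
count_free_open() {
    awk '/^ *[0-9]+[|].*[|] *[0-9]+$/ { n += gsub(/:/, ":") }
         END { print 2 * n }'
}
```

It would be used as '/usr/community/bin/shmesh | count_free_open', which for the display above gives 34.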

Note that using 'call exit' instead of 'stop' in the program gets rid of all the 'FORTRAN STOP' messages.

That's it!

janus 97 > logout