Truly Parallel Game-of-Life

** Due:** Tuesday, October 7

** Goals:**

- Introduce the concept of spatial decomposition.
- Inspire further thought about load balancing techniques.
- Demonstrate the utility of ghost cells.
- Use new MPI functionality: data packing, sendrecv.

** Background:**

Despite your best efforts, your thesis advisor isn't impressed. The
previous simulations of the vegetation of the Gila Wilderness proved
to be insufficiently detailed. With the threat of field work looming,
you are getting desperate. While watching reruns of Wrestlemania IX
and Monster Trucks, you realize your previous simulations weren't big
enough to resolve all the subtle features. Unfortunately, a
sufficiently large simulation will consume too much time and memory to
be feasible on a single processor. Instead, you need to parallelize
each instance of the game-of-life. When Dennis Rodman enters the
wrestling arena you go back to your computer.

** Assignment:**

This time you will code a truly parallel version of the game-of-life (GOL). The rules for initializing and updating grid cells remain unchanged, but you will now run on a much larger grid. Your program should divide the simulation grid among processors using a two-dimensional decomposition. It should prompt for the number of processors in each dimension (PX by PY) and give the same answers for any PX and PY, including the ``degenerate'' case of PX and/or PY = 1. Your code should also work when one or both of the grid dimensions NX,NY are not integer multiples of PX,PY. Finally, your code should keep track of how much time is spent in computation and in communication.

** What you hand in:**

- A program listing of your code. Again, only hand in the
routines you change or add. This time you will need to change
the routines that actually perform the GOL.
- Statistical results for 2 problems, each run on various numbers
of processors, including (if your machine supports it) some runs
on non-power-of-two P. Also, for problem (1) show some results
for runs with the same P, but where you vary PX and PY,
including the case where PX or PY = 1.
(1) Fixed-size problem:

NX = NY = 240, PROB = 0.337, N = 50, SEED = 314159(2) Scaled-size problem:

NX = 50*PX, NY = 30*PY, PROB = 0.5, N = 50, SEED = 271828As before, we are looking to see that you get the correct answers and that (for problem 1) your results do not depend on P or on PX or PY.

- A simple table of timings for different P and PX, PY values that
lists the total run-time, computation time, communication time,
and overall speed-up and parallel efficiency.
- A simple log-log plot of total CPU time versus P for both of
these problems. Include lines that show what ``perfect''
speed-up would be, along with data points for your actual
timings. Tell us why you think the two problems scaled as they
did and any ideas you have as to how you might improve the
scalability.
- Brief (!) answers to the Open Questions below.

** Open Questions:**

- Is it better to use a one-dimensional or two-dimensional
decomposition of the grid? Why?
- What if it was more expensive to perform the update calculation
on some grid cells than on others? How might this impact your
code's performance and what could you do to take this into account?
- Imagine your GOL computation wasn't performed on a regular grid,
but on an irregular grid of triangles? By irregular we mean
that each triangle could have a variable number of neighboring
triangles. What parts of your program would have to change to
accommodate this generality?
- (Optional) If Hulk Hogan drove a monster truck through the Gila,
would there be any vegetation left?