libperfmon Tutorial: Example Instructions

First, collect the source files for the examples.

Compile the source code for the first example as follows:

cc -o sample -cougar sample.c -lperfmon

Run the resulting executable on one process:

yod -sz 1 sample

You should see the following output:

 9.3939 MFlops: 200 floating point operations in 0.0000213 seconds
 0.1899 MFlops: 479 floating point operations in 0.0025224 seconds

Explanation of Output:

The first call to printmflops() prints out the expected number of floating point operations. These operations result from converting the integer i into a floating point number to store in the a array.

At first, the second call to printmflops() appears to produce unexpected results, but the results are valid. The first printmflops() call did not stop the MFlops counter or the time counter. It merely computed the rate and displayed it for that moment in execution. The time taken to print the first MFlops figure is therefore included in the result of the second printmflops() call.

Printing is a rather slow operation. Add to that the extremely short execution time of this test program, and the result is that the print time severely perturbs the time of the overall program execution.

Also, it might be expected that the second printmflops() call would print out 400 floating point operations, rather than 479. However, the printmflops() routine itself performs floating point operations, and the printf routine performs some more. Thus, while the printmflops() routine correctly samples the MFlops counter before any of these internal floating point operations are performed, they are still accumulated and added to the overall sum if the counter is not stopped. The second printmflops() call is simply reporting all floating point operations performed since the beginmflops() call.

Typically, a program does a large amount of work between the beginmflops() and printmflops() calls, so the floating point work and output time of the print itself would not be noticeable. For a program as short as this one, the overhead dominates, so sample2.c was rewritten to eliminate the problem by always stopping the counter with the end call before printing.

Compile and run the second sample program as follows to see the expected results:

cc -o sample2 -cougar sample2.c -lperfmon
yod -sz 1 sample2

You should see the following output:

 9.4116 MFlops: 200 floating point operations in 0.0000213 seconds
 19.3339 MFlops: 200 floating point operations in 0.0000103 seconds
