Compile the source code for the first example as follows:

    cc -o sample -cougar sample.c -lperfmon

Run the resulting executable on one process:

    yod -sz 1 sample

You should see the following output:

    9.3939 MFlops: 200 floating point operations in 0.0000213 seconds
    0.1899 MFlops: 479 floating point operations in 0.0025224 seconds

Explanation of Output:
The first call to printmflops() prints out the expected number
of floating point operations. These operations result from converting the
integer i into a floating point number to store in the a array.
At first, the second call to printmflops() appears to produce
unexpected results, but the results are valid.
The first printmflops() call did not stop the MFlops counter or the
time counter. It merely computed the rate and displayed it for that
moment in execution. The time taken to print the first MFlops line is
therefore included in the result of the second printmflops() call.
Printing is a rather slow operation, and this test program runs for an
extremely short time, so the print time severely perturbs the measured
time of the overall program execution.
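
As a rough check on the figures above, the reported rate is simply the operation count divided by the elapsed time, scaled to millions. The sketch below is illustrative C and is not part of the perfmon library; the function names mflops_rate and seconds_between are hypothetical. The times in the sample output are rounded for display, so recomputed rates agree with the printed ones only approximately.

```c
/* Rate in MFlops: operations divided by elapsed seconds, scaled by 1e6.
   Hypothetical helper for checking the sample output; not a perfmon call. */
double mflops_rate(double flop_count, double seconds)
{
    return flop_count / seconds / 1.0e6;
}

/* Elapsed time between two readings of a timer that was never stopped.
   Hypothetical helper; both arguments are in seconds. */
double seconds_between(double first_reading, double second_reading)
{
    return second_reading - first_reading;
}
```

For example, mflops_rate(479.0, 0.0025224) comes to roughly 0.19, in line with the second output line, and seconds_between(0.0000213, 0.0025224) comes to about 0.0025 seconds, which per the explanation above is dominated by the first print.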
Also, it might be expected that the second printmflops() call
would print out 400 floating point operations, rather than 479. However,
the printmflops() routine itself performs floating point operations,
and the printf routine performs some more. Thus, although the
printmflops() routine correctly samples the MFlops counter before any of
these internal floating point operations are performed, they are still
accumulated and added to the overall sum because the counter is not
stopped. The second printmflops() call is simply reporting on all
floating point operations between the beginmflops() and
endmflops() calls.
Typically, a program would be doing large amounts of work between the
beginmflops() and printmflops() calls, so the overhead of the
floating point work and output time in printmflops() would not be
noticeable. Nevertheless, sample2.c was rewritten to eliminate the
problem by always doing the print after stopping the counter with the
endmflops() call.
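
The difference between sampling a running counter and stopping it first can be modelled with a small stand-in. This is a sketch under stated assumptions, not the perfmon implementation: the struct and the counter_* functions are hypothetical names, and the real library reads a hardware counter rather than a field updated by hand.

```c
/* Hypothetical model of a free-running operation counter.
   These names are illustrative only; they are not the perfmon API. */
struct flop_counter {
    long total;   /* operations accumulated so far */
    int running;  /* nonzero while the counter is accumulating */
};

/* Reset the total and start counting. */
void counter_begin(struct flop_counter *c)
{
    c->total = 0;
    c->running = 1;
}

/* Record some work; it is counted only while the counter is running. */
void counter_add(struct flop_counter *c, long ops)
{
    if (c->running)
        c->total += ops;
}

/* Read the total without stopping the counter. */
long counter_sample(const struct flop_counter *c)
{
    return c->total;
}

/* Stop the counter and return the final total. */
long counter_end(struct flop_counter *c)
{
    c->running = 0;
    return c->total;
}
```

In this model, adding 200 operations, sampling, adding the print's own operations, and then adding another 200 gives a second reading larger than 400, because the print's work lands in the running counter. Assuming each loop contributes 200 operations, the 479 in the sample output implies about 79 operations of print overhead. Calling counter_end() before the print work is added leaves each reading at 200, which is the pattern sample2.c follows.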
Compile and run the second sample program as follows to see the
expected results:

    cc -o sample2 -cougar sample2.c -lperfmon
    yod -sz 1 sample2

You should see the following output:

    9.4116 MFlops: 200 floating point operations in 0.0000213 seconds
    19.3339 MFlops: 200 floating point operations in 0.0000103 seconds