p2-cv
Perf Tutorial
Perf is a tool for profiling code. Profiling means dynamically monitoring a program's execution to measure time, space, number of function calls, or usage of particular instructions. It is usually done once a program is complete and correct, in order to analyze its efficiency. This tutorial walks through profiling the EECS 280 image processing project, using the project solution as the example. You can use these results as a starting point for comparison if your project is taking too long.
Do not profile your code until it gives the correct output.
Step 1: Compile code for profiling by using -g. Our Makefile does this by default.
$ make processing_public_tests.exe
Step 2: Use Perf’s record command with the -g flag. Pass in any command line arguments accordingly. The results below may vary. This command should create a file called “perf.data”.
$ perf record -g ./processing_public_tests.exe crabster
Testing crabster rotate left...PASS
Testing crabster rotate right...PASS
Testing crabster energy...PASS
Testing crabster cost...PASS
Testing crabster find seam...PASS
Testing crabster remove seam...PASS
Testing crabster seam carve 50x45...PASS
Testing crabster seam carve 70x35...PASS
crabster tests PASS
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.032 MB perf.data (162 samples) ]
Step 3: Use Perf’s report command to generate a call graph of the execution of the code.
$ perf report
These results show the percentage of execution time for each function. In the above image you can see that execution spends 14.55% of the time in the symbol known as compute_energy_matrix. These percentages will vary between runs. The command that generated this report is processing_public_tests.exe.
Step 4: Navigate through the call trees of functions: use the arrow keys to highlight the function in question, then press enter.
compute_energy_matrix spends the majority of its execution time calling the function Image_get_pixel. This may or may not be problematic. Our job now is to investigate functions that could be “bottlenecks” in the execution of our code.
Step 5: Given the output of Perf, determine which functions are possibly taking too much of the execution time.
Compare these results from the solution with yours. If any functions appear near the top of your report but not near the top of the solution’s, those are good places to start optimizing. Remember that your percentages will likely never match the solution’s exactly. Things to look for are unnecessary loops, unnecessary function calls, and objects passed by copy. Again, don’t use this tool until your code gives the correct output!
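To illustrate why an object passed by copy can show up in a profile, here is a minimal sketch. The Grid type and the functions grid_at, sum_by_copy, and sum_by_ref are hypothetical names made up for this example; they are not the project's Image or Matrix interface. The copy counter exists only to make the extra work visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a large object. This is NOT the project's
// Image or Matrix type; it only illustrates the cost of copying.
struct Grid {
  int width;
  int height;
  std::vector<int> cells;

  static int copies;  // counts how many times a Grid has been copied

  // Build a width x height grid with every cell set to 1.
  Grid(int w, int h)
      : width(w), height(h), cells(static_cast<std::size_t>(w) * h, 1) {}

  // Copy constructor: duplicates the whole cell buffer and records the copy.
  Grid(const Grid &other)
      : width(other.width), height(other.height), cells(other.cells) {
    ++copies;
  }
};

int Grid::copies = 0;

// Read one cell. Taking the Grid by reference to const avoids a copy.
int grid_at(const Grid &g, int row, int col) {
  assert(0 <= row && row < g.height);
  assert(0 <= col && col < g.width);
  return g.cells[static_cast<std::size_t>(row) * g.width + col];
}

// Pass by value: every call copies the entire cell buffer first.
int sum_by_copy(Grid g) {
  int total = 0;
  for (int r = 0; r < g.height; ++r) {
    for (int c = 0; c < g.width; ++c) {
      total += grid_at(g, r, c);
    }
  }
  return total;
}

// Pass by reference to const: same result, no copy is made.
int sum_by_ref(const Grid &g) {
  int total = 0;
  for (int r = 0; r < g.height; ++r) {
    for (int c = 0; c < g.width; ++c) {
      total += grid_at(g, r, c);
    }
  }
  return total;
}
```

Both functions return the same sum, but each call to sum_by_copy duplicates the whole buffer before doing any useful work. If a function like that is called once per pixel, the copying can dominate the report even though the loop itself is fine.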