SoFunction
Updated on 2025-05-20

A detailed guide to performance analysis on Linux with the perf tool

1. Introduction to perf

perf is a performance analysis tool that ships with the Linux kernel source. It collects and analyzes performance data for the whole system and for individual applications. perf is built on the kernel's performance counter subsystem (perf_events), which exposes hardware counters in the CPU as well as software events and tracepoints in the kernel, so it can gather a large amount of information about CPU, memory, I/O, and more. perf supports several modes of analysis, including sampling, tracing, and event counting. Its key features are as follows:

  • CPU performance counting: collects counts such as CPU cycles, instructions, and cache accesses.
  • Call graph analysis: shows how often functions are sampled, their call chains, and where the bottlenecks are.
  • Timing analysis: measures where a program's execution time goes.
  • Memory access: analyzes memory access patterns, such as cache hit rate and memory bandwidth usage.
  • Event tracing: traces events such as system calls and process scheduling.

2. Installing perf

Many Linux distributions package perf, though it is not always installed by default. If perf is missing on an apt-based system such as Ubuntu, you can install it with the following commands:

sudo apt update
sudo apt install linux-tools-common linux-tools-$(uname -r)
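
Before installing, it can help to check whether a working perf is already on the PATH. The snippet below is a minimal sketch; have_tool is a hypothetical helper name, not part of perf.

```shell
# have_tool is a hypothetical helper: it succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# perf --version also fails if a perf binary exists but cannot run,
# so both conditions are checked before declaring perf usable.
if have_tool perf && perf --version; then
    echo "perf is ready to use"
else
    echo "perf is missing or not usable; install it with your package manager"
fi
```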

3. Basic use of perf

3.1. View CPU performance counters

One of the simplest perf commands is perf stat, which runs a command and collects basic counter statistics for it:

perf stat ls

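The above command runs ls and then prints a summary of counters such as task clock, context switches, page faults, cycles, instructions, branches, and branch misses. The sample output below is illustrative: its figures are placeholders rather than measurements of a real ls run, so read it for the layout, not the values.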

 Performance counter stats for 'ls':

          1.615207      task-clock (msec)         #    0.999 CPUs utilized
         1,234,568      context-switches          #    0.764 K/sec
           567,876      CPU-migrations            #    0.351 K/sec
       100,056,789      page-faults               #   61.92 K/sec
     2,456,789,123      cycles                    #    1.517 GHz
     1,234,567,890      instructions              #    0.50  insns per cycle
       345,678,901      branches                  #  213.12 M/sec
       123,456,789      branch-misses             #   35.66% of all branches

       0.001500123 seconds time elapsed

Common statistics include:

  • task-clock: CPU time the task consumed, in milliseconds
  • cycles: number of CPU cycles
  • instructions: number of instructions executed
  • branches: number of branch instructions
  • branch-misses: number of mispredicted branches
  • page-faults: number of page faults
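
Two ratios derived from these counters are often more telling than the raw counts: instructions per cycle (IPC) and the branch misprediction rate. Below is a minimal sketch of the arithmetic, using the counts from the sample output above. ratio and percent are hypothetical helper names, and awk is used only for its floating-point division.

```shell
# ratio A B: print A / B with two decimals (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b == 0) { print "n/a"; exit }
        printf "%.2f\n", a / b
    }'
}

# percent A B: print A as a percentage of B with two decimals (hypothetical helper).
percent() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * a / b
    }'
}

ratio 1234567890 2456789123     # IPC: instructions / cycles
percent 123456789 345678901     # branch miss rate: branch-misses / branches
```

With these inputs the two calls print 0.50 and 35.71. The second result differs slightly from the 35.66% printed in the sample output, whose figures are illustrative.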

3.2. View system calls and events

If you want to view a program's system calls, you can use the perf trace command. For example:

perf trace ./my_program

This command lists the system calls made while my_program runs, similar to strace. Because perf trace is built on kernel tracepoints rather than ptrace, it usually slows the traced program far less than strace does. It typically requires root privileges.

3.3. Call graphs

perf can also record call graphs, which show how functions call one another and where time accumulates along each call chain. Use perf record to sample the program, then view the call graph with perf report:

perf record -g ./my_program
perf report

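The -g option enables call graph recording, and perf record writes its samples to a file named perf.data in the current directory. Running perf report afterwards reads that file and shows the function call graph, where you can look for likely performance bottlenecks. Call chains are unwound using frame pointers by default, so binaries compiled without frame pointers may show incomplete stacks; --call-graph dwarf is an alternative unwinding mode.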

3.4. Analyze hotspot functions

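Suppose we need to find the functions that consume the most CPU time. Sample with perf record and inspect the result with perf report. The example below samples the whole system; to profile a single program instead, replace -a -- sleep 10 with that program's command line: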

perf record -e cycles -a -- sleep 10
perf report

The first command samples the cycles event on all CPUs (-a) for 10 seconds, the lifetime of sleep 10, and writes the samples to perf.data. perf report then shows which functions account for the largest share of sampled CPU cycles.
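
For scripts or quick comparisons, the report can also be printed as plain text with perf report --stdio and trimmed to the top entries. The following is a minimal sketch; top_symbols is a hypothetical helper that assumes comment lines start with # and that every other non-empty line is one report entry, which holds for profiles recorded without -g.

```shell
# top_symbols N: keep the first N report entries read from stdin,
# skipping "#" comment lines and blank lines (hypothetical helper, default N=10).
top_symbols() {
    awk -v n="${1:-10}" '/^#/ || !NF { next } ++c <= n { print }'
}

# Usage, assuming a perf.data file exists in the current directory:
#   perf report --stdio --sort symbol | top_symbols 5
```

Sorting by symbol alone collapses the report to one line per function, which keeps the filtered output short.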

4. Advanced use of perf

4.1. Track specific events

perf supports a variety of hardware and software events, and you can select the events of interest with the -e option. For example, to count instructions, cache references, and cache misses for one command:

perf stat -e instructions,cache-references,cache-misses ls

Common performance events include:

  • instructions: number of instructions executed
  • cycles: number of CPU cycles
  • cache-references: number of cache accesses (typically last-level cache, depending on the CPU)
  • cache-misses: number of cache misses
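
The cache miss rate is cache-misses divided by cache-references. perf stat can emit machine-readable output with -x, which makes this easy to compute in a script. The sketch below assumes the CSV layout has the counter value in field 1 and the event name in field 3, and that event names begin with cache-references and cache-misses; on some CPUs the names carry a prefix and the patterns would need adjusting. cache_miss_rate is a hypothetical helper name.

```shell
# cache_miss_rate: read `perf stat -x,` CSV on stdin and print the miss rate in percent.
# Assumes field 1 is the counter value and field 3 is the event name.
cache_miss_rate() {
    awk -F, '
        $3 ~ /^cache-references/ { refs = $1 }
        $3 ~ /^cache-misses/     { misses = $1 }
        END {
            # "+ 0" forces a numeric comparison even for values like "<not counted>".
            if (refs + 0 > 0) printf "%.2f\n", 100 * misses / refs
            else              print "n/a"
        }'
}

# Usage: perf stat writes its counters to stderr, so stderr is routed into the pipe
# and the output of ls itself is discarded:
#   perf stat -x, -e cache-references,cache-misses ls 2>&1 >/dev/null | cache_miss_rate
```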

4.2. CPU-level performance analysis

Sometimes, performance issues on the CPU can affect the performance of the entire system. perf can help us analyze CPU-level events. For example, view CPU usage, context switching, etc.:

perf stat -e cpu-clock,task-clock,cpu-migrations,context-switches -a

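Because no workload command is given, perf stat counts these events across all CPUs until you interrupt it with Ctrl+C, and only then prints the totals, including context switches and CPU migrations. To print counts periodically while it runs, add the -I option with an interval in milliseconds.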

4.3. Analyze multiple processes

perf can also measure many processes at once. For example, to count events for every process running on the system:

perf stat -a -e cycles,instructions,cache-references,cache-misses

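With the -a option, perf counts on all CPUs, so the totals cover every process in the system rather than listing each process separately. The counts include CPU cycles, instructions, and cache accesses, and they are printed when you stop the command with Ctrl+C.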
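
To measure a specific group of processes instead of the whole system, perf stat accepts -p with a comma-separated list of PIDs. The sketch below only builds that list; join_pids is a hypothetical helper, and the usage line assumes pgrep is installed and that a program named my_program is running.

```shell
# join_pids: turn one PID per line on stdin into "pid1,pid2,..." (hypothetical helper).
join_pids() {
    paste -sd, -
}

# Usage: count events for all running my_program processes until Ctrl+C:
#   perf stat -e cycles,instructions -p "$(pgrep my_program | join_pids)"
```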

5. Interpreting perf output

The output of perf usually contains a lot of details, and understanding this data is crucial for performance analysis. We can analyze the output results from the following aspects:

  • CPU cycles and instructions: dividing instructions by cycles gives instructions per cycle (IPC), a measure of execution efficiency. If instructions are far fewer than cycles, the CPU spends much of its time stalled rather than doing useful work, which may be caused by branch mispredictions, memory latency, and similar problems.
  • Cache hit rate: cache-references and cache-misses together give the cache hit rate (1 minus misses divided by references). A high miss count suggests that the program's memory access pattern is not cache-friendly, which can become a performance bottleneck.
  • Context switches and CPU migrations: frequent context switches and CPU migrations often degrade performance. They may be caused by lock contention, I/O blocking, and similar issues.

6. Summary

perf is a powerful performance analysis tool that helps developers perform performance analysis of systems and applications from multiple dimensions. By mastering the basic commands and advanced functions of perf, developers can more efficiently locate performance bottlenecks and optimize the system's operating efficiency.
