
1 What is GDB

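A debugger that supports many languages, including C and C++
It lets you inspect what a program is doing at a given moment during execution
It can pinpoint the cause of errors such as segmentation faults

To debug a C/C++ program, compile it with the -g option:

gcc -g hello.c -o hello
g++ -g hello.cpp -o hello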

2 How to use GDB

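gdb provides an interactive shell: press ↑ to recall previous commands, press Tab to complete commands, and use help [command] to read the documentation for a command

Ways to enter the gdb shell:

gdb <binary_with_-g>: debug an executable
gdb <binary_with_-g> core.xxx: analyze a core dump
gdb <binary_with_-g> <pid_without_-g>: debug a running process, using the executable as the source of debug information
- <binary> must be compiled with -g; otherwise there is no point in specifying it
- The process <pid> may be running a build compiled without -g, as long as it was built from the same source
gdb -p <pid_with_-g>: debug a running process
- If the process <pid> runs a build compiled with -g, this is roughly equivalent to gdb <binary_with_-g> followed by run

The following demonstrates entering the gdb shell with gdb <binary_with_-g> <pid_without_-g>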

# Source
cat > main.cpp << 'EOF'
#include <iostream>
#include <thread>
#include <chrono>

int main() {
    std::cout << "hello, world!" << std::endl;

    int cnt = 0;
    while (true) {
        ++cnt;
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "cnt=" << cnt << std::endl;
    }
}
EOF

# Build two versions, one with -g and one without
g++ -o main_without_debug main.cpp -std=gnu++11
g++ -o main_with_debug main.cpp -std=gnu++11 -g

# Run the non-debug build in the background (it loops forever)
./main_without_debug &

# Look up the process id
pid=$(ps -ef | grep main_without_debug | grep -v grep | awk '{print $2}')

# Enter the gdb shell
gdb main_with_debug ${pid}

2.1 Symbol Mismatch

Symbol mismatch can occur when you use a binary compiled on one machine and attempt to run it on a different machine, especially if the two machines have different configurations or architectures. Here’s why this can happen:

Library Dependencies: Binaries often rely on dynamic link libraries (shared libraries) or other system libraries. If the target machine doesn’t have the same versions of these libraries or they are missing altogether, you can encounter symbol mismatch errors.
Architecture Differences: If the two machines have different CPU architectures (e.g., x86 vs. ARM), binaries compiled for one architecture may not run on the other. This is a fundamental incompatibility.
Operating System Differences: Even if two machines have the same architecture, they may have different operating systems with different system calls and ABI (Application Binary Interface) specifications. This can lead to symbol mismatches.
Compiler and Compiler Options: The compiler used to build the binary can affect symbol compatibility. Different compiler versions or options might generate different symbol names or behaviors.
Bitness: Some operating systems and architectures have both 32-bit and 64-bit versions. Trying to run a binary compiled for one bitness on a machine of a different bitness can result in symbol mismatches.

To avoid symbol mismatch issues when moving binaries between machines:

Use Static Linking: Consider statically linking libraries into your binary when compiling. This bundles the necessary libraries into the binary, reducing dependencies on external libraries.
Build on the Target Machine: Whenever possible, compile your code on the machine where you intend to run it. This ensures that the binary is built with the correct dependencies and configurations.
Cross-Compilation: If you must build on one machine and run on another, use cross-compilation tools to generate binaries specifically tailored for the target machine’s architecture and operating system.
Package Managers: If you’re working with package managers (e.g., apt, yum, brew), use them to manage library dependencies and ensure compatibility between systems.
Containerization: Consider using containerization technologies like Docker to package your application along with its dependencies, ensuring portability across different environments.

When a core dump file (usually named “core”) is generated on a machine and then debugged with GDB on that same machine, symbol mismatch should not be a significant issue. Here’s why:

Binary Compatibility: The core dump records the state of the program at the moment it crashed or was interrupted, including memory contents and register values. It does not carry the symbol names itself. Because GDB runs on the machine where the binary was executed, the executable and the shared libraries on disk are the same files the crashed process loaded, so their addresses line up with the core dump.
GDB Compatibility: GDB is designed to work with core dump files generated by the same binary or a compatible binary. It uses the debugging information (symbols) embedded in the binary to analyze the core dump. As long as the binary and the core dump are compatible in terms of architecture, compiler options, and library versions, you should be able to use GDB without significant issues.
Symbol Resolution: GDB uses the symbol information present in the binary (if it was compiled with debugging symbols) to resolve symbols during debugging. For frames inside shared libraries it reads the libraries from the local filesystem, which on the same machine are the versions the program actually loaded.

3 Command

3.1 Run Program

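After entering the gdb shell with gdb <binary>, the program does not start immediately; use run or start to begin execution

run: start the program and run until the first breakpoint is hit or the program exits
start: start the program and stop at the first line of main

If the program fails (for example with a segmentation fault), gdb reports useful information such as the line where the error occurred and the arguments of the function

# C++ source file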
cat > segment_fault.cpp << 'EOF'
int main() {
    int *num = nullptr;
    *num = 100;
    return 0;
}
EOF

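# Compile (try it without -g as well)
g++ -o segment_fault segment_fault.cpp -std=gnu++11 -g

# Enter the gdb shell
gdb segment_fault

# Run the program; it dies with a segmentation fault and gdb prints the details
# Without -g the output is much sparser (no line number and no source line)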
(gdb) run

Starting program: xxx/segment_fault

Program received signal SIGSEGV, Segmentation fault.
0x000000000040051d in main () at segment_fault.cpp:3
3	    *num = 100;

3.1.1 set args

The set args command sets or changes the command-line arguments of the program being debugged from within an active GDB session. The new arguments take effect the next time the program is started with run, so you can test different arguments without restarting GDB.

set args [arguments]

Examples:

set args -l a -C abc
set args --gtest_filter=TestXxx.caseX

3.1.2 --args

The --args option passes the program and its arguments on the gdb command line: everything after --args is treated as the program followed by its arguments. This is convenient for debugging programs that require command-line arguments.

gdb --args program [arguments]

Examples:

gdb --args ls -al

3.2 Attach Program

gdb -p 12345

3.3 Break Point

break: set a breakpoint
- break <line_num>
- break <func_name>
- break <file_name>:<line_num>
- break <file_name>:<func_name>
info break: list breakpoints
delete: delete breakpoints
- delete <break_id>: delete the specified breakpoint
- delete: delete all breakpoints
enable: enable a breakpoint
- enable <break_id>
disable: disable a breakpoint
- disable <break_id>

3.3.1 demo

# C++ source file
cat > set_break.cpp << 'EOF'
#include <iostream>

void funcA() {
    std::cout << "invoke funcA()" << std::endl;
}

int main() {
    std::cout << "hello world" << std::endl;

    int num = 0;

    int *num_ptr = &num;

    funcA();

    for(int i=0; i < 10; i++) {
        ++(*num_ptr);
    }

    std::cout << "num: " << *num_ptr << std::endl;

    return 0;
}
EOF

# Compile (try it without -g as well)
g++ -o set_break set_break.cpp -std=gnu++11 -g

# Enter the gdb shell
gdb set_break

# View the source with list
(gdb) list 0

1	#include <iostream>
2
3	void funcA() {
4	    std::cout << "invoke funcA()" << std::endl;
5	}
6
7	int main() {
8	    std::cout << "hello world" << std::endl;
9
10	    int num = 0;

# Press Enter to print the next 10 lines
(gdb)

11
12	    int *num_ptr = &num;
13
14	    funcA();
15
16	    for(int i=0; i < 10; i++) {
17	        ++(*num_ptr);
18	    }
19
20	    std::cout << "num: " << *num_ptr << std::endl;

# Press Enter to print the next 10 lines
(gdb)

21
22	    return 0;
23	}

# Set a breakpoint at line 8
(gdb) break 8
Breakpoint 1 at 0x400848: file set_break.cpp, line 8.

# Set a breakpoint at line 10
(gdb) break set_break.cpp:10
Breakpoint 2 at 0x400864: file set_break.cpp, line 10.

# Set a breakpoint at line 12
(gdb) break 12
Breakpoint 3 at 0x40086b: file set_break.cpp, line 12.

# Set a breakpoint at line 4
(gdb) break 4
Breakpoint 4 at 0x400821: file set_break.cpp, line 4.

# Set a breakpoint at line 17
(gdb) break 17
Breakpoint 5 at 0x400881: file set_break.cpp, line 17.

# Set a breakpoint at line 20
(gdb) break 20
Breakpoint 6 at 0x40089a: file set_break.cpp, line 20.

# Set a breakpoint on funcA; gdb notes that breakpoint 4 is already set at the same address
(gdb) break funcA
Note: breakpoint 4 also set at pc 0x400821.
Breakpoint 7 at 0x400821: file set_break.cpp, line 4.

# List all breakpoints
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000400848 in main() at set_break.cpp:8
2       breakpoint     keep y   0x0000000000400864 in main() at set_break.cpp:10
3       breakpoint     keep y   0x000000000040086b in main() at set_break.cpp:12
4       breakpoint     keep y   0x0000000000400821 in funcA() at set_break.cpp:4
5       breakpoint     keep y   0x0000000000400881 in main() at set_break.cpp:17
6       breakpoint     keep y   0x000000000040089a in main() at set_break.cpp:20
7       breakpoint     keep y   0x0000000000400821 in funcA() at set_break.cpp:4

# Start the program with run; it stops at line 8
(gdb) run
Starting program: xxx/set_break

Breakpoint 1, main () at set_break.cpp:8
8	    std::cout << "hello world" << std::endl;

3.4 Debugging

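continue: resume execution until the program exits or the next breakpoint is hit
step: source-level single step that enters function calls (step into)
next: source-level single step that treats a function call as one step (step over)
stepi: instruction-level single step that enters calls (step into)
nexti: instruction-level single step that treats a call as one step (step over)
until: run until a line past the current one is reached; typically used to leave a loop
finish: run until the current function returns
display <variable>: track a variable and show its value every time the program stops
undisplay <display_id>: stop tracking
watch: set a watchpoint; whenever the watched variable is modified, execution stops and the change is printed
thread <id>: switch the debugger to the given thread
up [<n>]: move up the stack by one frame or n frames
down [<n>]: move down the stack by one frame or n frames
frame: show the current stack frame, including the current source line
frame <n>: select the given stack frame
attach <pid>: attach to a running process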
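To try the stepping commands on something small, a practice target along the following lines may help. It is only a sketch: the function names square and sum_squares are made up for illustration, and the comments mark where each command is likely to be interesting.

```cpp
// Hypothetical practice target; the names are illustrative only.
int square(int x) {
    return x * x;  // `step` at the call site lands here; `finish` returns to the caller
}

int sum_squares(int n) {
    int total = 0;  // `watch total` stops each time this variable is modified
    for (int i = 1; i <= n; ++i) {
        total += square(i);  // `next` steps over square(); `step` enters it
    }
    return total;  // `until` at the end of the loop body runs on to here once the loop is done
}
```

Call sum_squares from a main of your own, compile with g++ -g, set break sum_squares, and then compare what step, next, finish, and display total show on each stop.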

3.5 Display Information

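bt, backtrace, where: show the current call stack
- bt 3: the innermost 3 frames
- bt -3: the outermost 3 frames
disassemble: show assembly instructions
- disassemble: assembly of the current function
- disassemble <function>: assembly of the given function
- set disassembly-flavor intel: use Intel syntax
- set disassembly-flavor att: use AT&T syntax, which is the default
list: show source code
- list: print the 10 lines that follow the previous output
- list -: print the 10 lines that precede the previous output
- list <linenumber>: print 10 lines centered around the given line of the current file
- list <linenumber>, <end_linenumber>: print the given range of lines
- list <function>: print 10 lines centered around the beginning of the given function
- list <filename:linenum>: print 10 lines centered around the given line of the given file
- list <filename:function>: print 10 lines centered around the beginning of the given function in the given file
- set substitute-path /root/starrocks /other/path/starrocks: remap the source path. When the binary was built on machine A or inside a Docker container but the core file is analyzed on machine B, the source paths usually do not match, so use this command to adjust them
info: show various kinds of debugging information
- info break: breakpoints
- info reg: registers
- info all-reg: all registers, including floating-point and vector registers
- info stack: the call stack
- info thread: threads
- info locals: local variables
- info args: function arguments
- info symbol <address>: the symbol at the given memory address, if there is one
print: show the value of a variable or expression
- print <variable>
- print <variable>.<field>
- print (<type>)*<address>: view what the address points to; a cast is required
- print *(<type>*)<address>: view the object the address points to; a cast is required
- print <array>[0]@5: print 5 elements starting at index 0
- Show or change a print setting: show print <property>, set print <property> on/off. Some commonly used settings are listed below (see all of them with show print [tab][tab] or help show print)
  - address: show addresses when printing function information; on by default
  - array: print each array element on its own line; off by default
  - elements: maximum number of array elements to print; elements beyond the limit are not shown, and 0 means unlimited
  - raw-values: print raw contents. On GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1, standard library objects are pretty-printed by default (the elements of a container) rather than shown as the raw fields of the container itself
  - pretty: print structures in a human-readable layout (line breaks, indentation); off by default
x/<count><format><size> <addr>: print memory in the given format
- <count>: positive integer, the number of memory units to display starting at the address; the size of one unit is given by <size>
- <format>: output format for the memory that addr points to
  - o: octal
  - x: hex
  - d: decimal
  - u: unsigned decimal
  - t: binary
  - f: float
  - a: address
  - i: instruction
  - c: char
  - s: string
  - z: hex, zero padded on the left
- <size>: number of bytes that make up one memory unit, 4 by default
  - b: 1 byte
  - h: 2 bytes
  - w: 4 bytes
  - g: 8 bytes
- Examples:
  - x/1ug $rbp-0x4: show the contents at the address obtained by subtracting 0x4 from the value of register rbp
  - x/1xg $rsp: show the contents at the address stored in register rsp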
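To build intuition for how count, format, and size combine, the following sketch imitates in plain C++ what x/<count>xb <addr> displays: count units of one byte each, printed in hex. The helper dump_bytes is a made-up name for illustration and is not part of GDB.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Illustrative helper (not a GDB API): formats `count` one-byte units starting
// at `addr` as hex, roughly what `x/<count>xb <addr>` would display.
std::string dump_bytes(const void* addr, std::size_t count) {
    const unsigned char* p = static_cast<const unsigned char*>(addr);
    std::string out;
    char buf[8];
    for (std::size_t i = 0; i < count; ++i) {
        // One unit of size `b` (1 byte) in format `x` (hex).
        std::snprintf(buf, sizeof(buf), "0x%02x", static_cast<unsigned>(p[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

With a larger unit size such as w or g, gdb combines the same bytes into one value using the machine's byte order, so the digits can appear reversed relative to the byte-by-byte view on a little-endian machine.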

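info reg prints the contents of the registers in two columns: the first is hexadecimal (raw format) and the second is the natural format. The sizes and types of the registers are listed below

For registers of type int64, the natural format is decimal
For registers of type data_ptr and code_ptr, the natural format is still hexadecimal, so the two columns show the same value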

<reg name="rax" bitsize="64" type="int64"/>
<reg name="rbx" bitsize="64" type="int64"/>
<reg name="rcx" bitsize="64" type="int64"/>
<reg name="rdx" bitsize="64" type="int64"/>
<reg name="rsi" bitsize="64" type="int64"/>
<reg name="rdi" bitsize="64" type="int64"/>
<reg name="rbp" bitsize="64" type="data_ptr"/>
<reg name="rsp" bitsize="64" type="data_ptr"/>
<reg name="r8" bitsize="64" type="int64"/>
<reg name="r9" bitsize="64" type="int64"/>
<reg name="r10" bitsize="64" type="int64"/>
<reg name="r11" bitsize="64" type="int64"/>
<reg name="r12" bitsize="64" type="int64"/>
<reg name="r13" bitsize="64" type="int64"/>
<reg name="r14" bitsize="64" type="int64"/>
<reg name="r15" bitsize="64" type="int64"/>

<reg name="rip" bitsize="64" type="code_ptr"/>
<reg name="eflags" bitsize="32" type="i386_eflags"/>
<reg name="cs" bitsize="32" type="int32"/>
<reg name="ss" bitsize="32" type="int32"/>
<reg name="ds" bitsize="32" type="int32"/>
<reg name="es" bitsize="32" type="int32"/>
<reg name="fs" bitsize="32" type="int32"/>
<reg name="gs" bitsize="32" type="int32"/>

3.5.1 Tips for debugging std::vector

print sizeof(*v._M_impl._M_start): Check the element size.
print v._M_impl._M_finish - v._M_impl._M_start: Get the number of elements.
print v._M_impl._M_start[i]: Get the ith element.
print &v._M_impl._M_start[i]: Get the address of the ith element.
print v._M_impl._M_start[i]@j: Print j elements starting at index i.
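These expressions rely on libstdc++ storing a vector as three raw pointers. The sketch below models that layout using only the public std::vector API, so the pointer arithmetic can be checked outside the debugger. The struct VectorPointers and both helper functions are illustrative names, not the library's definitions, and other standard library implementations name their fields differently.

```cpp
#include <cstddef>
#include <vector>

// Simplified model of the three pointers libstdc++ keeps inside std::vector.
// Illustrative only; this is not the library's own definition.
template <typename T>
struct VectorPointers {
    const T* start;           // first element             (_M_start)
    const T* finish;          // one past the last element (_M_finish)
    const T* end_of_storage;  // one past the allocation   (_M_end_of_storage)
};

// Rebuilds the three pointers from the public API.
template <typename T>
VectorPointers<T> pointers_of(const std::vector<T>& v) {
    const T* base = v.data();
    return {base, base + v.size(), base + v.capacity()};
}

// Same arithmetic as `print v._M_impl._M_finish - v._M_impl._M_start`.
template <typename T>
std::size_t element_count(const VectorPointers<T>& p) {
    return static_cast<std::size_t>(p.finish - p.start);
}
```

Pointer subtraction already counts in units of the element type, which is why the gdb expression yields the number of elements rather than a byte count.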

3.5.2 Tips for debugging std::shared_ptr

print *p._M_ptr: Show the pointee through its static type; the output may include something like <vtable for Derive+16>.
print *(<type>*)p._M_ptr: Show details of the derived type.
x/1a p._M_ptr: Show the first word of the object, which for a polymorphic type is its vtable pointer.
x/10a *(void**)p._M_ptr: Show the first 10 entries of the vtable.
- *(void**)p._M_ptr reads the first pointer-sized word of the object, i.e. the vtable pointer, so x/10a dumps ten entries starting at that address.
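The cast to the derived type is needed because the shared_ptr is typically declared with the base type. A minimal sketch of that situation follows; the types Base and Derive and the function make_object are hypothetical, with Derive chosen only to echo the vtable name quoted above.

```cpp
#include <memory>
#include <string>

// Hypothetical types for illustration.
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
    int base_field = 1;
};

struct Derive : Base {
    std::string name() const override { return "Derive"; }
    int derive_field = 2;
};

// The static type of the result is Base; the dynamic type is Derive.
std::shared_ptr<Base> make_object() {
    return std::make_shared<Derive>();
}
```

With p = make_object(), print *p._M_ptr can only describe the Base part, while print *(Derive*)p._M_ptr also shows derive_field. The vtable pointer stored in the object is what reveals the dynamic type.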

3.5.3 demo of print

# C++ source file
cat > print.cpp << 'EOF'
struct Person {
    const char* name;
    const char* phone_num;
    const int age;
};

int main() {
    Person p {"张三", "123456789", 18};
    return 0;
}
EOF

# Compile (try it without -g as well)
g++ -o print print.cpp -std=gnu++11 -g

# Enter the gdb shell
gdb print

# View the source
(gdb) list

1	struct Person {
2	    const char* name;
3	    const char* phone_num;
4	    const int age;
5	};
6
7	int main() {
8	    Person p {"张三", "123456789", 18};
9	    return 0;
10	}

# Set a breakpoint
(gdb) break 9
Breakpoint 1 at 0x400528: file print.cpp, line 9.

# Run the program; it stops at the breakpoint
(gdb) run
Starting program: xxx/print

Breakpoint 1, main () at print.cpp:9
9	    return 0;

# Inspect the variable
(gdb) print p
$1 = {name = 0x4005c0 "张三", phone_num = 0x4005c7 "123456789", age = 18}
(gdb) print p.name
$2 = 0x4005c0 "张三"
(gdb) print p.phone_num
$3 = 0x4005c7 "123456789"
(gdb) print p.age
$4 = 18
(gdb) print &p
$5 = (Person *) 0x7fffffffe0c0

3.6 Load Symbol Table

symbol-file /path/to/binary_file.debuginfo

3.7 Execute external commands

Format: !<command> [params]

(gdb) !pwd
xxx/gdb_tutorial

3.8 Handle Signal

(gdb) handle <signal> <action>

<signal>: The name or number of the signal (e.g., SIGINT, SIGSEGV)
<action>: One or more keywords that specify how GDB should handle the signal:
- nostop: GDB should not stop the program when this signal is received.
- stop: GDB should stop the program when this signal is received.
- noignore: Synonym of pass. GDB lets the program see the signal (the default for most signals).
- ignore: Synonym of nopass. GDB does not let the program see the signal.
- noprint: GDB should not print a message when the program receives this signal.
- print: GDB should print a message when the program receives this signal.
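A small program that signals itself makes the effect of these keywords easy to observe. The sketch below assumes a POSIX system; the function raise_sigusr1 and the choice of SIGUSR1 are illustrative.

```cpp
#include <signal.h>

// Counts deliveries; written only from the signal handler.
static volatile sig_atomic_t g_delivered = 0;

static void on_sigusr1(int) {
    g_delivered = g_delivered + 1;
}

// Installs a handler, raises SIGUSR1 `times` times, and returns how many
// deliveries the handler saw. The name is hypothetical.
int raise_sigusr1(int times) {
    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);

    g_delivered = 0;
    for (int i = 0; i < times; ++i) {
        raise(SIGUSR1);  // with gdb's default settings, each raise stops the program here
    }
    return g_delivered;
}
```

Called from a program running under gdb, each raise should stop execution until you continue. After handle SIGUSR1 nostop noprint pass the loop should run through silently, and with nopass the handler should not run at all because the program never sees the signal.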

3.9 Tips

3.9.1 Redirect source file path

The set substitute-path command is used in GDB (GNU Debugger) to remap source paths. This is useful when the source code was compiled on one machine with a different directory structure and you need to debug it on another machine where the directory structure is different.

(gdb) set substitute-path <original-path> <new-path>

3.9.2 Redirect Thread Info to File

(gdb) set pagination off
(gdb) set logging file /tmp/threads.txt
(gdb) set logging on
(gdb) info threads
(gdb) set logging off

3.9.3 Redirect Thread Stack to File

(gdb) set pagination off
(gdb) set logging file /tmp/threads.txt
(gdb) set logging on
(gdb) thread apply all bt
(gdb) set logging off

3.9.4 Print all Threads Stack

gdb -p <pid> -batch -ex "set pagination 0" -ex "thread apply all bt"

4 Tips

4.1 How to Analyze a Core File

Here are some tips:

info threads: The default thread may not be where the crash occurred.
thread <n>: Switch to the specific thread where there may be something wrong.
bt <n>: List the call stack.
frame <n>: Go to the specific call frame.
info locals, info args: See local variables and arguments.
print: See details of something.

4.2 Debugging an x86 application in Rosetta for Linux

I’m running a centos7.9/amd64 Docker container on my Mac (M3), and debugging a core file fails with the messages below:

$ gdb main core

GNU gdb (GDB) Red Hat Enterprise Linux 10.2-6.el7
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main...

warning: Can't open file /run/rosetta/rosetta during file-backed mapping note processing

warning: core file may not match specified executable file.
[New LWP 17530]

warning: Selected architecture i386:x86-64 is not compatible with reported target architecture aarch64

warning: Architecture rejected target-supplied description

warning: Unexpected size of section `.reg/17530' in core file.

warning: Unexpected size of section `.reg2/17530' in core file.
Core was generated by `/run/rosetta/rosetta ./main ./main'.
Program terminated with signal SIGABRT, Aborted.

warning: Unexpected size of section `.reg/17530' in core file.

warning: Unexpected size of section `.reg2/17530' in core file.
#0  0x0000effff7dfba50 in ?? ()

And there’s a mechanism called Rosetta:

This is a software bridge that allows applications compiled for one instruction set architecture (such as Intel x86) to run on a different architecture (like Apple’s ARM-based processors). Apple has used two versions: Rosetta for the transition from PowerPC to Intel processors, and Rosetta 2 for the transition from Intel to Apple Silicon.

According to Debugging an x86 application in Rosetta for Linux, I can debug the program by:

# This will hang
ROSETTA_DEBUGSERVER_PORT=1234 ./main &

# Enter Gdb
gdb
(gdb) set architecture i386:x86-64
(gdb) file main
(gdb) target remote localhost:1234
(gdb) continue

Or with lldb:

# This will hang
ROSETTA_DEBUGSERVER_PORT=1234 ./main &

# Enter lldb
lldb
(lldb) platform select remote-linux
(lldb) target create ./main
(lldb) gdb-remote localhost:1234
(lldb) continue

4.3 How to ignore interruptions from a specific signal

For gdb:

(gdb) handle SIGSEGV pass noprint nostop
(gdb) handle all pass noprint nostop

For lldb:

-p <boolean> ( --pass <boolean> ): Whether or not the signal should be passed to the process.
-s <boolean> ( --stop <boolean> ): Whether or not the process should be stopped if the signal is received.
-n <boolean> ( --notify <boolean> ): Whether or not the debugger should notify the user if the signal is received.
When debugging a process that uses JNI, you may be interrupted repeatedly by signals raised inside libjvm.so, such as stop reason = signal SIGSEGV: address access protected. The following command keeps lldb from stopping on them:

(lldb) process handle -p true -s false -n true SIGSEGV

(lldb) process handle

4.4 How to print env

Environment variables that were set before the program started can be checked in /proc/<pid>/environ. Environment variables set at runtime can be checked with the following approach:

(gdb) call (char *)getenv("TERM")
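The distinction matters because /proc/<pid>/environ typically reflects only the environment the process was started with. The sketch below, assuming a POSIX system, changes the environment at runtime; GDB_DEMO_VAR and set_and_read_env are made-up names for illustration.

```cpp
#include <stdlib.h>  // setenv (POSIX) and getenv
#include <string>

// Sets a variable at runtime and reads it back the same way the gdb call does.
// GDB_DEMO_VAR is a hypothetical variable name.
std::string set_and_read_env() {
    setenv("GDB_DEMO_VAR", "42", 1);  // runtime change, made after the process started
    const char* value = getenv("GDB_DEMO_VAR");
    return value ? std::string(value) : std::string();
}
```

After this function has run in the debugged process, call (char *)getenv("GDB_DEMO_VAR") in gdb should return the new value, while /proc/<pid>/environ would usually still lack it.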

5 gdb-dashboard

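gdb-dashboard builds on gdb to provide a friendlier, formatted interface

help dashboard: show the help
dashboard thread: enable/disable the thread information (usually disabled in large projects with many threads)
dashboard: refresh; usually needed after inspecting variables with print, to redraw the details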

6 LLDB

Tutorial

LLDB is similar to GDB in most operations, but there are some differences (see help for more details):

There is no start command
frame select <id>/f <id>: select a frame
lldb -c <core> <binary>: Analyze core file
list -<count>: Print previous <count> lines
lldb <binary> -- <args>
