Linux-Kernel

1 How to compile the kernel

Environment: the system installed here is CentOS-7-x86_64-Minimal-1908.iso

Step 1: install the packages needed for building the kernel

yum makecache
yum -y install ncurses-devel make gcc bc openssl-devel
yum -y install elfutils-libelf-devel
yum -y install rpm-build

Step 2: download and extract the kernel source

yum install -y wget
wget -O ~/linux-4.14.134.tar.gz 'http://ftp.sjtu.edu.cn/sites/ftp.kernel.org/pub/linux/kernel/v4.x/linux-4.14.134.tar.gz'
tar -zxvf ~/linux-4.14.134.tar.gz -C ~

Step 3: configure the kernel build options

cd ~/linux-4.14.134
cp -v /boot/config-$(uname -r) .config

# just save and exit
make menuconfig

Step 4: build the kernel

cd ~/linux-4.14.134
make rpm-pkg

Step 5: install the new kernel

# Extract the rpm package to inspect its contents
rpm2cpio kernel-4.14.134.x86_64.rpm | cpio -div

rpm -iUv ~/rpmbuild/RPMS/x86_64/*.rpm
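
After installation, reboot into the new kernel and verify that it is running. The kernel rpm's install scripts normally add the boot entry; if the old kernel still comes up, check your grub configuration:

reboot
# after the machine comes back up:
uname -r    # should now print 4.14.134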

2 Dynamic Kernel Module Loading

Dynamic Kernel Module Loading (DKML) is a mechanism in operating systems, particularly in Unix-like systems such as Linux, that allows for the loading and unloading of kernel modules at runtime. Kernel modules are pieces of code that can be loaded into the kernel to extend its functionality without the need to reboot the system. This feature is particularly useful for adding support for new hardware, filesystems, or other features without requiring a full kernel rebuild or system restart.

Here are the key aspects of Dynamic Kernel Module Loading:

  • Modularity: DKML enhances the modularity of the kernel. Instead of having a monolithic kernel with all functionalities built-in, functionalities can be separated into modules that are loaded as needed.
  • Flexibility: It allows for greater flexibility in managing system resources. Modules can be loaded when their functionality is required and unloaded when they’re no longer needed, freeing up memory and other resources.
  • Ease of Updates and Maintenance: Updating or adding new features to the kernel becomes easier. Instead of recompiling and rebooting the entire kernel, only the relevant modules need to be updated.
  • On-Demand Loading: Many modules are loaded automatically by the system in response to detected hardware or filesystems. This on-demand loading simplifies configuration and ensures that only necessary modules are loaded.
  • Commands for Module Management: In Linux, commands like insmod, rmmod, modprobe, and lsmod are used to insert, remove, manage, and list kernel modules, respectively.
  • Dependencies Handling: The system handles dependencies between modules, loading any required supporting modules automatically.
  • Security Considerations: Loading modules into the kernel space can have security implications, as malicious or faulty modules could affect the stability and security of the system.
  • Performance Impacts: While dynamic loading offers flexibility, it can have performance impacts due to the overhead of loading and unloading modules.
  • Usage in Various Systems: Beyond Linux, other systems like FreeBSD and Solaris also support dynamic kernel module loading, though with different implementations and utilities.

2.1 Tools

In Unix-like operating systems, several command-line tools are used to manage dynamic kernel module loading. These tools allow users to insert, remove, and manage kernel modules while the system is running. Here’s an introduction to some of the most commonly used tools:

insmod:

  • Purpose: This command is used to insert a module into the Linux kernel.
  • Usage: insmod [path_to_module.ko]
  • Details: insmod takes the path to the module object file rather than a bare module name, and it does not resolve dependencies, so any modules the target depends on must be loaded beforehand.

rmmod:

  • Purpose: This command is used to remove a module from the Linux kernel.
  • Usage: rmmod [module_name]
  • Details: It will only remove the module if it is not in use and if no other modules depend on it.

modprobe:

  • Purpose: This command adds or removes modules from the Linux kernel.
  • Usage: To insert a module, use modprobe [module_name]. To remove a module, use modprobe -r [module_name].
  • Details: Unlike insmod, modprobe automatically handles dependencies. It checks the module dependencies listed in /lib/modules/$(uname -r)/modules.dep and loads them as needed.

lsmod:

  • Purpose: This command is used to show the status of modules in the Linux kernel.
  • Usage: lsmod
  • Details: It displays a list of all currently loaded modules, along with module size and information about what other modules are using them.

depmod:

  • Purpose: This tool creates a dependency file for modules.
  • Usage: depmod
  • Details: Generally run automatically when installing new modules, it analyzes the modules and builds a list of dependencies, which is then used by modprobe.

modinfo:

  • Purpose: Provides detailed information about a kernel module.
  • Usage: modinfo [module_name]
  • Details: It displays information such as module description, author, license, and parameters that can be set.
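
Putting these tools together, a typical module round trip looks like this (a sketch using the loop module, which ships as a loadable module on most distributions; any other stock module works the same way):

modprobe loop       # load the module, resolving dependencies automatically
lsmod | grep loop   # confirm that it is loaded
modinfo loop        # inspect its description, license, and parameters
modprobe -r loop    # unload it again (fails if the module is still in use)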

3 systemtap

3.1 How to install

Environment: the system installed here is CentOS-7-x86_64-Minimal-1810.iso

Step 1: install systemtap and the other related dependencies

yum makecache
yum install -y systemtap systemtap-runtime systemtap-devel --enablerepo=base-debuginfo
yum install -y yum-utils
yum install -y gcc

Step 2: download and install the rpm packages that match the current kernel version: kernel-devel-$(uname -r).rpm, kernel-debuginfo-$(uname -r).rpm, and kernel-debuginfo-common-x86_64-$(uname -r).rpm. My kernel version is 3.10.0-957.el7.x86_64

wget "ftp://ftp.pbone.net/mirror/ftp.scientificlinux.org/linux/scientific/7.6/x86_64/os/Packages/kernel-devel-$(uname -r).rpm"
wget "http://debuginfo.centos.org/7/x86_64/kernel-debuginfo-$(uname -r).rpm"
wget "http://debuginfo.centos.org/7/x86_64/kernel-debuginfo-common-x86_64-$(uname -r).rpm"

rpm -ivh kernel-devel-$(uname -r).rpm kernel-debuginfo-$(uname -r).rpm kernel-debuginfo-common-x86_64-$(uname -r).rpm

Step 3: verify

stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'

#-------------------------↓↓↓↓↓↓-------------------------
Pass 1: parsed user script and 473 library scripts using 271776virt/69060res/3500shr/65708data kb, in 680usr/60sys/890real ms.
Pass 2: analyzed script: 1 probe, 1 function, 7 embeds, 0 globals using 436952virt/232648res/4856shr/230884data kb, in 2560usr/760sys/3456real ms.
Pass 3: translated to C into "/tmp/stapYnJEvY/stap_0969603f9a0fb68895de95cd2ffea0a4_2770_src.c" using 436952virt/232904res/5112shr/230884data kb, in 10usr/80sys/86real ms.
Pass 4: compiled C into "stap_0969603f9a0fb68895de95cd2ffea0a4_2770.ko" in 8930usr/1690sys/10746real ms.
Pass 5: starting run.
ERROR: module version mismatch (#1 SMP Tue Oct 30 14:13:26 CDT 2018 vs #1 SMP Thu Nov 8 23:39:32 UTC 2018), release 3.10.0-957.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 20usr/40sys/271real ms.
Pass 5: run failed. [man error::pass5]
#-------------------------↑↑↑↑↑↑-------------------------

The error message is ERROR: module version mismatch (#1 SMP Tue Oct 30 14:13:26 CDT 2018 vs #1 SMP Thu Nov 8 23:39:32 UTC 2018), release 3.10.0-957.el7.x86_64. It is caused by the timestamp in compile.h not matching the one reported by uname -a.

compile.h lives at /usr/src/kernels/$(uname -r)/include/generated/compile.h. Change the timestamp in that file to the one from uname -a: edit compile.h and replace #define UTS_VERSION "#1 SMP Tue Oct 30 14:13:26 CDT 2018" with #define UTS_VERSION "#1 SMP Thu Nov 8 23:39:32 UTC 2018".
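
The edit can also be done with a one-liner. A sketch (the timestamp is the one from this machine's uname -a; substitute your own):

sed -i 's/^#define UTS_VERSION.*/#define UTS_VERSION "#1 SMP Thu Nov 8 23:39:32 UTC 2018"/' /usr/src/kernels/$(uname -r)/include/generated/compile.h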

Try the verification again

stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'

#-------------------------↓↓↓↓↓↓-------------------------
Pass 1: parsed user script and 473 library scripts using 271696virt/69056res/3500shr/65628data kb, in 640usr/40sys/687real ms.
Pass 2: analyzed script: 1 probe, 1 function, 7 embeds, 0 globals using 436944virt/230840res/4804shr/230876data kb, in 1930usr/440sys/2376real ms.
Pass 3: using cached /root/.systemtap/cache/09/stap_0969603f9a0fb68895de95cd2ffea0a4_2770.c
Pass 4: using cached /root/.systemtap/cache/09/stap_0969603f9a0fb68895de95cd2ffea0a4_2770.ko
Pass 5: starting run.
ERROR: module version mismatch (#1 SMP Tue Oct 30 14:13:26 CDT 2018 vs #1 SMP Thu Nov 8 23:39:32 UTC 2018), release 3.10.0-957.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/40sys/259real ms.
Pass 5: run failed. [man error::pass5]
#-------------------------↑↑↑↑↑↑-------------------------

The same error appears again because our change has not yet taken effect: systemtap is reading cached build artifacts. Delete the two cache files mentioned in Pass 3 and Pass 4, then run the command again.

rm -f /root/.systemtap/cache/09/stap_0969603f9a0fb68895de95cd2ffea0a4_2770.c
rm -f /root/.systemtap/cache/09/stap_0969603f9a0fb68895de95cd2ffea0a4_2770.ko

stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'

#-------------------------↓↓↓↓↓↓-------------------------
Pass 1: parsed user script and 473 library scripts using 271696virt/69056res/3500shr/65628data kb, in 660usr/40sys/699real ms.
Pass 2: analyzed script: 1 probe, 1 function, 7 embeds, 0 globals using 436944virt/230052res/4804shr/230876data kb, in 1920usr/400sys/2333real ms.
Pass 3: translated to C into "/tmp/stappTXBiJ/stap_0969603f9a0fb68895de95cd2ffea0a4_2770_src.c" using 436944virt/230316res/5068shr/230876data kb, in 10usr/80sys/88real ms.
Pass 4: compiled C into "stap_0969603f9a0fb68895de95cd2ffea0a4_2770.ko" in 1540usr/370sys/1927real ms.
Pass 5: starting run.
read performed
Pass 5: run completed in 10usr/50sys/327real ms.
#-------------------------↑↑↑↑↑↑-------------------------

3.2 Syntax

3.2.1 Probe Types

  1. begin: fires when the script starts
  2. end: fires when the script ends
  3. kernel.function("sys_open"): the entry of the specified kernel function
  4. syscall.close.return: the return of the close system call
  5. module("ext3").statement(0xdeadbeef): the given address inside the ext3 filesystem driver
  6. timer.ms(200): a timer that fires every 200 milliseconds
  7. timer.profile: fires on every CPU on each clock tick
  8. process("a.out").statement("*@main.c:200"): line 200 of main.c in the binary a.out
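
Several of these probe points can be combined in one script. A minimal sketch that counts calls to the kernel function vfs_read for five seconds, using the begin, end, and timer probes listed above:

stap -e 'global n
probe begin { printf("tracing started\n") }
probe kernel.function("vfs_read") { n++ }
probe timer.ms(5000) { exit() }
probe end { printf("vfs_read was called %d times\n", n) }'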

4 ftrace

This subsection is adapted from the article Linux ftrace框架介绍及运用

In day-to-day work you often need to debug or optimize the kernel. Simple problems can be inspected with dmesg/printk, and optimization can lean on existing tools. But when the problem logic is complex or the optimization scope is broad, it is hard to know where to start: you have to analyze top-down and module by module, and at that point you must turn to the static (trace events) and dynamic (the various tracers) facilities that Linux provides, along with tools or scripts, to narrow the scope and locate the problem. Plain tracepoints are no longer enough, so it is worth spending some effort to sort all of this out.

ftrace stands for function trace. It was originally used to record the execution paths of kernel functions, and as features accumulated it evolved into a tracing framework. It includes static tracepoints, with a per-subsystem directory of switches, as well as various dynamic tracers such as function, function_graph, and wakeup.

The ftrace documentation is under Documentation/trace, the main code under kernel/trace, and the related headers under include/trace.

4.1 ftrace Framework

The ftrace framework can be divided into several parts:

  1. the ftrace core: ties the whole feature together, including the modification of kernel code, tracer registration, ring buffer control, and so on
  2. ring buffer: the carrier for both static and dynamic ftrace data
  3. debugfs: the user-space interface for configuring ftrace
  4. tracepoints: static tracing
    • must be compiled into the kernel ahead of time
    • the printed content can be customized and freely extended
    • the kernel provides tracepoints for the major subsystems
  5. tracers, in the following groups:
    • function tracers: function, function_graph, stack
    • latency tracers: irqsoff, preemptoff, preemptirqsoff, wakeup, wakeup_rt, wakeup_dl
    • others: nop, mmiotrace, blk

4.2 How to use ftrace

The /sys/kernel/debug/tracing directory exposes ftrace's settings and attributes; ftrace is configured by echoing into these files. Understanding what each file does and how to set it helps a great deal in understanding the whole ftrace framework.

The kernel thoughtfully provides a README in this directory; it describes how every file is used and what it means:

cat /sys/kernel/debug/tracing/README

General configuration:

  • available_tracers: the list of tracers compiled into the current kernel; current_tracer must be one of the tracers listed here
  • current_tracer: sets or shows the tracer currently in use. The default at boot is nop; echo a tracer name into this file to enable it, and echo nop to reset the tracer
  • buffer_size_kb: sets the per-CPU trace buffer size. The buffer is a ring buffer, so if too much is traced, older entries are overwritten by newer ones. current_tracer must be set to nop before resizing
  • buffer_total_size_kb: shows the total size of all trace buffers; buffer_size_kb is per CPU, while buffer_total_size_kb is the sum over all CPUs
  • free_buffer: when the process holding this file open closes it, the ring buffer memory is freed and its size shrinks to the minimum
  • hwlat_detector/
  • instances/: creates separate trace buffer instances, so traces can be recorded into different buffers independently
  • tracing_cpumask: restricts tracing to particular CPUs, written as a bitmask
  • per_cpu: per-CPU trace data, including stats, trace, trace_pipe, and trace_pipe_raw
  • printk_formats: a file that lets tools read the raw-format trace
  • saved_cmdlines: caches the comm name for each pid, so that ftrace can show the comm alongside the pid
  • saved_cmdlines_size: the number of entries kept in saved_cmdlines
  • snapshot: a snapshot of the trace
    • echo 0: clears the snapshot buffer and frees its memory
    • echo 1: takes a snapshot of the current trace, allocating memory if necessary
    • echo 2: clears the buffer without freeing or allocating memory
  • trace: the interface for reading the collected trace; echo > trace clears the current ring buffer
  • trace_pipe: outputs the same content as trace, but reading it consumes the ring buffer, which prevents it from overflowing. The output can be saved with cat trace_pipe > trace.txt &
  • trace_clock: shows the clock used for trace timestamps; local is the default
    • local: the default clock; may not be synchronized across CPUs
    • global: synchronized across CPUs, but possibly slower than local
    • counter: a cross-CPU counter; useful when the ordering of events across CPUs must be analyzed
  • trace_marker: writes a marker from user space into the trace, used to correlate user-space actions with kernel events
  • trace_marker_raw: like trace_marker, but for writing binary data into the trace
  • trace_options: controls what the trace prints and tweaks the tracer; many kinds of extra information can be enabled through trace_options
  • options/: one file per trace option, corresponding to trace_options
  • trace_stat/: per-CPU trace statistics
  • tracing_max_latency: records the maximum latency seen by the tracer
  • tracing_on: starts or stops tracing
    • echo 0: stop tracing
    • echo 1: resume tracing
  • tracing_thresh: a latency threshold; a trace is recorded only when the latency exceeds this value (in microseconds). It only takes effect when non-zero
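
As a concrete example, the general files above are enough for a minimal session (run as root):

cd /sys/kernel/debug/tracing
cat current_tracer              # nop by default
echo 8192 > buffer_size_kb      # enlarge the per-CPU ring buffer
echo 1 > tracing_on             # start tracing
echo "hello from user space" > trace_marker
echo 0 > tracing_on             # stop tracing
tail trace                      # the marker appears in the trace output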

events configuration:

  • available_events: lists all trace events available on the system, in two levels separated by a colon
  • events/: the trace events directory; under each event there are enable, filter, and format files. enable is the switch, format describes the event's layout, and filter expressions are written against that format
  • set_event: writing a trace event name into set_event enables that event
  • set_event_pid: restricts event tracing to specific processes
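
For example, scheduler context switches can be watched through the event interface (sched:sched_switch is a standard trace event):

cd /sys/kernel/debug/tracing
echo sched:sched_switch > set_event   # enable the event
head -5 trace_pipe                    # consume a few entries
echo > set_event                      # disable all events again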

function configuration:

  • available_filter_functions: lists the kernel functions that can currently be traced; functions not listed in this file cannot be traced
  • dyn_ftrace_total_info: shows the number of traceable functions, which matches the count in available_filter_functions
  • enabled_functions: shows the names of functions that have callbacks attached
  • function_profile_enabled: when enabled, function statistics are shown under trace_stat
  • set_ftrace_filter: restricts tracing to the functions listed here
  • set_ftrace_notrace: lists functions that are not to be traced; empty by default
  • set_ftrace_pid: restricts function tracing to specific processes
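
Combined with the general files, a filtered function-tracing session might look like this (vfs_read is just an example of a function listed in available_filter_functions):

cd /sys/kernel/debug/tracing
echo vfs_read > set_ftrace_filter   # trace only vfs_read
echo function > current_tracer      # enable the function tracer
sleep 1                             # let some activity accumulate
echo nop > current_tracer           # turn the tracer off
head -20 trace
echo > set_ftrace_filter            # clear the filter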

function graph configuration:

  • max_graph_depth: the maximum function-nesting depth to trace
  • set_graph_function: sets the functions whose call graphs are displayed; used with the function_graph tracer. By default call graphs are generated for all functions
  • set_graph_notrace: excludes the nested calls of particular functions from tracing

Stack trace configuration:

  • stack_max_size: while the stack tracer is active, records the largest stack size observed
  • stack_trace: shows the back trace of the largest stack encountered
  • stack_trace_filter: sets functions that the stack tracer will not check

4.3 trace-cmd

# This command writes a trace.dat file into the current directory
# (note: do not run it under /sys/kernel/debug/tracing, because trace.dat cannot be created there)
trace-cmd record -e irq

# This command analyzes the trace.dat file in the current directory
trace-cmd report

5 dump

5.1 kdump

How do you simulate a kernel crash? Just run the following command:

# After this runs, a dump file is written under /var/crash and the machine reboots
echo c > /proc/sysrq-trigger
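
Before triggering the crash, it is worth confirming that kdump is actually armed; otherwise no vmcore will be written. A quick check on CentOS 7 (the crashkernel parameter must be present on the kernel command line):

systemctl is-active kdump                  # should print "active"
grep -o 'crashkernel=[^ ]*' /proc/cmdline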

Download the rpm packages needed to analyze the crash file:

# First, download the kernel image that carries full debug information (compiled with -g).
# There are two kernel debuginfo packages:
# 1. kernel-debuginfo
# 2. kernel-debuginfo-common

wget http://debuginfo.centos.org/7/x86_64/kernel-debuginfo-common-x86_64-`uname -r`.rpm
wget http://debuginfo.centos.org/7/x86_64/kernel-debuginfo-`uname -r`.rpm

# Install them; afterwards the vmlinux kernel image is available under /lib/debug/lib/modules/`uname -r`
rpm -ivh *.rpm
ll /lib/debug/lib/modules/`uname -r`

How to analyze the system crash file:

crash /lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/127.0.0.1-2021-07-24-22\:59\:34/vmcore

  • bt: backtrace, prints the kernel stack; bt pid prints the stack of the given process
    • the most important piece is the RIP, which identifies the function and offset where the crash happened
  • log: prints the kernel log contained in the vmcore
  • dis: disassembles starting at the code containing a given instruction; dis -l (function+offset), where function+offset can be taken from the RIP shown by bt
    • example: dis -l sysrq_handle_crash+22
  • mod: shows all kernel modules that were loaded at the time of the crash
  • sym: translates an address into symbol information; the address can be taken from the RIP shown by bt
    • example: sym ffffffff8d26d9b6
  • ps: prints the processes that were present when the kernel crashed
  • files: files pid prints the files opened by the given process
  • vm: vm pid prints basic virtual memory information of the given process
  • task: shows the task_struct and thread_info of the current or a given process
  • kmem: shows the system's memory usage at the time of the crash
  • detailed usage of the commands above is available via help <cmd>
  • other commands can be listed with help
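
Putting a few of these commands together, a first pass over the sysrq-triggered vmcore from above might look like this (a sketch; the function+offset and the address come from your own bt output):

crash> bt                               # find the RIP, e.g. sysrq_handle_crash+22
crash> log                              # kernel log leading up to the panic
crash> dis -l sysrq_handle_crash+22     # disassemble around the faulting instruction
crash> sym ffffffff8d26d9b6             # resolve the address back to a symbol
crash> exit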

5.2 coredump

When a program encounters an exception at runtime and exits abnormally, the operating system writes the program's memory state at that moment into a core file; this is called a core dump.

Possible causes of a core dump:

  1. out-of-bounds memory access
  2. using thread-unsafe functions in a multithreaded program
  3. data read and written by multiple threads without lock protection
  4. illegal pointers
  5. dereferencing a null pointer
  6. careless pointer type casts
  7. stack overflow

Configuration related to core dumps:

  • ulimit -c: if it prints 0, core dumps are disabled; enable them with ulimit -c unlimited or ulimit -c <size>
    • alternatively, make the setting permanent by editing /etc/security/limits.conf:
    • echo "* soft core unlimited" >> /etc/security/limits.conf
    • echo "* hard core unlimited" >> /etc/security/limits.conf
  • /proc/sys/kernel/core_pattern: where core dump files are written
    • echo "core" > /proc/sys/kernel/core_pattern: the default is core; a core dump is written to the crashed process's current directory
    • echo "core.%e.%p" > /proc/sys/kernel/core_pattern: also written to the current directory, where %e is the executable (or thread) name and %p is the process id
    • echo "/data/coredump/core.%e.%p" > /proc/sys/kernel/core_pattern: an absolute path can also be used to collect all core dumps in one directory
  • /proc/sys/kernel/core_pipe_limit
  • /proc/sys/kernel/core_uses_pid: if this file contains 1, the process id is appended to the core file name even when core_pattern contains no %p
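
To see these settings in action, the following end-to-end sketch enables core dumps, sets a pattern, and deliberately crashes a tiny program (the file names and paths are arbitrary examples):

ulimit -c unlimited
echo "/tmp/core.%e.%p" > /proc/sys/kernel/core_pattern   # as root

cat > crashme.c <<'EOF'
int main(void) { int *p = 0; return *p; }   /* null pointer dereference */
EOF
gcc -g -o crashme crashme.c
./crashme                            # "Segmentation fault (core dumped)"
ls /tmp/core.crashme.*               # the core file named by the pattern
gdb ./crashme /tmp/core.crashme.*    # load it for analysis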

On macOS, the related settings are:

  • ulimit -c: if it prints 0, core dumps are disabled; enable them with ulimit -c unlimited or ulimit -c <size>
  • sudo sysctl -w kern.corefile=core.%N.%P
    • the default setting, /cores/core.%P, does not work properly

How to analyze:

# The executable <binary> must have been compiled with -g so that it contains debug information; only then can the core dump file be analyzed
gdb <binary> <core dump file>

6 Source Code Analysis

6.1 syscall

System call declarations live in include/linux/syscalls.h, but editors such as VS Code cannot jump to their definitions, because the definitions are constructed from a large number of macros.

How to find a system call's definition: for example, the open system call takes 3 parameters, so search globally for SYSCALL_DEFINE3(open; the openat system call takes 4 parameters, so search globally for SYSCALL_DEFINE4(openat.
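
For example, in a kernel source tree (the definition of open lives in fs/open.c):

grep -rn "SYSCALL_DEFINE3(open," fs/
# fs/open.c:SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)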

6.2 network

6.2.1 tcp

The file_operations object for a socket is socket_file_ops
The proto_ops object for TCP is inet_stream_ops

6.2.1.1 create socket

sys_socket | net/socket.c SYSCALL_DEFINE3(socket
sock_create | net/socket.c
__sock_create | net/socket.c
pf->create | net/socket.c
⬇️ socket --> af_inet
inet_create | net/ipv4/af_inet.c

6.2.1.2 write socket

# syscall
sys_write | fs/read_write.c SYSCALL_DEFINE3(write
vfs_write | fs/read_write.c
do_sync_write | fs/read_write.c
filp->f_op->aio_write
⬇️ file --> socket
# socket
socket_file_ops.aio_write ==> sock_aio_write | net/socket.c
do_sock_write | net/socket.c
__sock_sendmsg | net/socket.c
__sock_sendmsg_nosec | net/socket.c
sock->ops->sendmsg
⬇️ socket --> inet_stream
# inet_stream
inet_stream_ops.sendmsg ==> inet_sendmsg | net/ipv4/af_inet.c
sk->sk_prot->sendmsg
⬇️ inet_stream --> tcp
# tcp
tcp_prot.sendmsg ==> tcp_sendmsg | net/ipv4/tcp.c
tcp_push | net/ipv4/tcp.c
__tcp_push_pending_frames | net/ipv4/tcp_output.c
tcp_write_xmit | net/ipv4/tcp_output.c
tcp_transmit_skb | net/ipv4/tcp_output.c
icsk->icsk_af_ops->queue_xmit
⬇️ tcp -> ip
# ip
ipv4_specific.queue_xmit ==> ip_queue_xmit | net/ipv4/tcp_ipv4.c
ip_local_out | net/ipv4/ip_output.c
dst_output | include/net/dst.h
skb_dst(skb)->output(skb) ==> ip_output | net/ipv4/ip_output.c
ip_finish_output | net/ipv4/ip_output.c
ip_finish_output2 | net/ipv4/ip_output.c
dst_neigh_output | include/net/dst.h
⬇️ ip -> dev(layer 2)
# link
dev_queue_xmit | net/core/dev.c
dev_hard_start_xmit | net/core/dev.c
ops->ndo_start_xmit
⬇️ dev --> driver
# dev driver
e1000_netdev_ops.ndo_start_xmit ==> e1000_xmit_frame | drivers/net/ethernet/intel/e1000/e1000_main.c

6.2.1.3 read socket

Data flows in from the network device:

# link
deliver_skb | net/core/dev.c
pt_prev->func
⬇️ dev --> ip
# ip
ip_packet_type.func ==> ip_rcv | net/ipv4/ip_input.c
ip_rcv_finish | net/ipv4/ip_input.c
dst_input | include/net/dst.h
skb_dst(skb)->input ==> ip_local_deliver | net/ipv4/ip_input.c
ip_local_deliver_finish | net/ipv4/ip_input.c
ipprot->handler
⬇️ ip --> tcp
# tcp
tcp_protocol.handler ==> tcp_v4_rcv | net/ipv4/tcp_ipv4.c
tcp_v4_do_rcv | net/ipv4/tcp_ipv4.c
tcp_rcv_established | net/ipv4/tcp_input.c

Reading the arriving data via a blocking system call:

# syscall
sys_read | fs/read_write.c SYSCALL_DEFINE3(read
vfs_read | fs/read_write.c
do_sync_read | fs/read_write.c
filp->f_op->aio_read
⬇️ file --> socket
# socket
socket_file_ops.aio_read ==> sock_aio_read | net/socket.c
do_sock_read | net/socket.c
__sock_recvmsg | net/socket.c
__sock_recvmsg_nosec | net/socket.c
sock->ops->recvmsg
⬇️ socket --> inet_stream
# inet_stream
inet_stream_ops.recvmsg ==> inet_recvmsg | net/ipv4/af_inet.c
sk->sk_prot->recvmsg
⬇️ inet_stream --> tcp
# tcp
tcp_prot.recvmsg ==> tcp_recvmsg | net/ipv4/tcp.c

6.2.2 ip

net/ipv4/ip_input.c
ip_rcv
NF_HOOK
NF_HOOK_THRESH
nf_hook_thresh

6.3 file

6.3.1 Reference

linux文件系统四 VFS数据读取vfs_read

7 Assorted

7.1 Where to download rpm packages