The truth about the `PATH` environment variable on Linux
On a freshly installed Debian 12 (bookworm) system, running `echo $PATH` prints the following:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
`/usr/bin` contains `/usr/bin/cat`, and `/usr/bin` is in `PATH`, so typing `cat` runs `/usr/bin/cat`. But who exactly performs this lookup?
Running `strace cat` shows the Linux system call that is made:
execve("/usr/bin/cat", ["cat"], 0x7ffdfb2367a0 /* 63 vars */) = 0
The Linux kernel is already handed the complete path (`/usr/bin/cat`), so where did `/usr/bin` come from?
Reading the source
On Debian, shell scripts that use `/bin/sh` run under dash, which interprets and executes the commands. In `main.c`, the file that contains the entry point:
/*
* Main routine. We initialize things, parse the arguments, execute
* profiles if we're a login shell, and then call cmdloop to execute
* commands. The setjmp call sets up the location to jump to when an
* exception occurs. When an exception occurs the variable "state"
* is used to figure out how far we had gotten.
*/
// ...
static int
cmdloop(int top)
{
// ...
for (;;) {
// ...
n = parsecmd(inter);
// ...
i = evaltree(n, 0);
}
}
`evaltree` (in `eval.c`) is responsible for executing commands:
int
evaltree(union node *n, int flags)
{
// ...
case NCMD:
evalfn = evalcommand;
checkexit:
checkexit = EV_TESTED;
goto calleval;
// ...
calleval:
status = evalfn(n, flags);
break;
Then, when the command simply runs a program, `evalcommand` is used as the final step:
STATIC int
// ...
evalcommand(union node *cmd, int flags, struct backcmd *backcmd)
{
// ...
default:
flush_input();
/* Fork off a child process if necessary. */
if (!(flags & EV_EXIT) || have_traps()) {
INTOFF;
jp = vforkexec(cmd, argv, path, cmdentry.u.index);
break;
}
shellexec(argv, path, cmdentry.u.index);
// ...
}
`shellexec` (in `exec.c`) calls `padvance`:
void
shellexec(char **argv, const char *path, int idx)
{
// ...
while (padvance(&path, argv[0]) >= 0) {
cmdname = stackblock();
if (--idx < 0 && pathopt == NULL) {
tryexec(cmdname, argv, envp);
if (errno != ENOENT && errno != ENOTDIR)
e = errno;
}
}
// ...
}
But what is `padvance`? Looking further in `exec.c`:
/*
* Do a path search. The variable path (passed by reference) should be
* set to the start of the path before the first call; padvance will update
* this value as it proceeds. Successive calls to padvance will return
* the possible path expansions in sequence. If an option (indicated by
* a percent sign) appears in the path entry then the global variable
* pathopt will be set to point to it; otherwise pathopt will be set to
* NULL.
*
* If magic is 0 then pathopt recognition will be disabled. If magic is
* 1 we shall recognise %builtin/%func. Otherwise we shall accept any
* pathopt.
*/
const char *pathopt;
int padvance_magic(const char **path, const char *name, int magic)
{
So it is the shell, not the Linux kernel, that searches `PATH` for executables!
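To make the idea concrete, here is a minimal Python sketch of a directory-by-directory `PATH` search. The helper name `find_in_path` is hypothetical, and this is not dash's algorithm: as the `shellexec` excerpt shows, dash attempts the exec on each candidate in turn, whereas this sketch only checks that a candidate exists and is executable.

```python
import os

def find_in_path(name, path_value):
    """Return the first executable candidate for `name`, or None.

    Hypothetical helper: walks PATH entries in order, like a shell does,
    but tests each candidate with os.access instead of attempting execve.
    """
    if "/" in name:
        # A name containing a slash is used as given; PATH is not consulted.
        return name if os.access(name, os.X_OK) else None
    for directory in path_value.split(":"):
        # POSIX treats an empty PATH entry as the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    print(find_in_path("cat", os.environ.get("PATH", "")))
```

The first match wins, which is why the order of entries in `PATH` matters.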
What about other code?
Python's `subprocess` can be used like this:
subprocess.run(["ls", "-l"])
This invokes `/usr/bin/ls`, because `/usr/bin` is in `PATH`. But who performs the path lookup?
CPython's `subprocess` module contains the following code:
# This matches the behavior of os._execvpe().
executable_list = tuple(
os.path.join(os.fsencode(dir), executable)
for dir in os.get_exec_path(env))
This code searches `PATH` directly in Python, before Linux's `execve` is ever called.
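The candidate list that excerpt builds can be reproduced in a few lines. This is only a sketch: the helper name `exec_candidates` and the sample `PATH` value are made up for illustration, and it works on `str` paths where the CPython code works on bytes via `os.fsencode`.

```python
import os

def exec_candidates(executable, env):
    """List the full paths that would be tried for `executable`, in PATH order."""
    return [os.path.join(directory, executable)
            for directory in os.get_exec_path(env)]

if __name__ == "__main__":
    # A made-up PATH value, used only to make the ordering visible.
    sample_env = {"PATH": "/usr/local/bin:/usr/bin:/bin"}
    for candidate in exec_candidates("ls", sample_env):
        print(candidate)
```

Each of these full paths is what eventually reaches `execve`; the kernel never sees the bare name `ls`.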
Go is similar: `lp_unix.go` contains its own implementation of the `PATH` search:
// LookPath searches for an executable named file in the
// directories named by the PATH environment variable.
// If file contains a slash, it is tried directly and the PATH is not consulted.
// Otherwise, on success, the result is an absolute path.
//
// In older versions of Go, LookPath could return a path relative to the current directory.
// As of Go 1.19, LookPath will instead return that path along with an error satisfying
// [errors.Is](err, [ErrDot]). See the package documentation for more details.
func LookPath(file string) (string, error) {
Rust's `Command::spawn` ultimately calls `libc::execvp`, which searches `PATH`:
/* Execute FILE, searching in the `PATH' environment variable if it contains
no slashes, with arguments ARGV and environment from `environ'. */
int
execvp (file, argv)
const char *file;
char *const argv[];
{
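In fact, Linux knows nothing about `PATH` at all! Using a shebang to name the interpreter of an executable text file requires an absolute path, because the kernel does not search `PATH` for it:
#!/bin/sh
This works, but
#!sh
does not. This is also why many programs use the following trick: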
#!/usr/bin/env python
print('Hello world')
It works because `/usr/bin/env` calls `execvp`, which searches `PATH`.
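The same split between a call that takes a path as given and a wrapper that searches `PATH` can be observed from Python, using `os.posix_spawn` and `os.posix_spawnp`. The sketch below is illustrative only: the helper name `spawn_status` is hypothetical, and it assumes a Unix system with a reasonably recent libc and a `true` program somewhere on `PATH`. It runs from an empty temporary directory so that the bare name `true` cannot resolve relative to the current directory.

```python
import os
import tempfile

def spawn_status(spawn, name):
    """Run `name` through `spawn` from an empty directory.

    Returns the child's exit code, or the OSError raised when spawning fails.
    """
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as empty_dir:
        os.chdir(empty_dir)
        try:
            pid = spawn(name, [name], os.environ)
            _, status = os.waitpid(pid, 0)
            return os.WEXITSTATUS(status)
        except OSError as exc:
            return exc
        finally:
            # Restore the working directory before the temp dir is removed.
            os.chdir(old_cwd)

if __name__ == "__main__":
    # The path is used exactly as given, so the bare name should not be found.
    print(repr(spawn_status(os.posix_spawn, "true")))
    # libc walks PATH first, so the same bare name should resolve.
    print(repr(spawn_status(os.posix_spawnp, "true")))
```

Under those assumptions, the first call should fail with a `FileNotFoundError` and the second should succeed, even though both receive the same name and environment.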