On a fresh install of Debian 12 (bookworm), running echo $PATH prints:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

/usr/bin contains /usr/bin/cat, and /usr/bin is in PATH, so typing cat runs /usr/bin/cat. But who exactly performs this lookup?

Running strace cat shows the Linux system call:

execve("/usr/bin/cat", ["cat"], 0x7ffdfb2367a0 /* 63 vars */) = 0

The Linux kernel is already handed the full path (/usr/bin/cat), so where did /usr/bin come from?

Reading the source

On Debian, shell scripts that use /bin/sh are run by dash, which interprets and executes their commands. In main.c, the file that contains the entry point:

/*
 * Main routine. We initialize things, parse the arguments, execute
 * profiles if we're a login shell, and then call cmdloop to execute
 * commands. The setjmp call sets up the location to jump to when an
 * exception occurs. When an exception occurs the variable "state"
 * is used to figure out how far we had gotten.
 */
// ...
static int
cmdloop(int top)
{
  // ...
  for (;;) {
    // ...
    n = parsecmd(inter);
    // ...
    i = evaltree(n, 0);
  }
}

evaltree (in eval.c) is responsible for evaluating commands:

int
evaltree(union node *n, int flags)
{
  // ...
  case NCMD:
  		evalfn = evalcommand;
checkexit:
  		checkexit = EV_TESTED;
  		goto calleval;
  // ...
calleval:
  		status = evalfn(n, flags);
  		break;

Then, when the command simply runs a program, evalcommand handles the final step:

STATIC int
// ...
evalcommand(union node *cmd, int flags, struct backcmd *backcmd)
{
  // ...
  default:
  		flush_input();
  		/* Fork off a child process if necessary. */
  		if (!(flags & EV_EXIT) || have_traps()) {
  			INTOFF;
  			jp = vforkexec(cmd, argv, path, cmdentry.u.index);
  			break;
  		}
  		shellexec(argv, path, cmdentry.u.index);
  // ...
}
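
The branch above follows the familiar fork-then-exec pattern: unless the shell itself is about to exit, it forks a child, and the child replaces itself with the program. A minimal Python sketch of that pattern follows, assuming a POSIX system and Python 3.9 or later. run_in_child is a hypothetical helper, not part of dash, and it uses os.fork where dash uses vfork.

```python
import os
import sys


def run_in_child(path, argv):
    """Fork, exec `path` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image. execv only returns on failure.
        try:
            os.execv(path, argv)
        except OSError:
            # Simplification: report every exec failure as 127.
            os._exit(127)
    # Parent: wait for the child, as a shell does for a foreground command.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Note that run_in_child takes a full path: os.execv performs no PATH search, so a call such as run_in_child(sys.executable, [sys.executable, "-c", "pass"]) has to name the program's location explicitly.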

shellexec (in exec.c) calls padvance:

void
shellexec(char **argv, const char *path, int idx)
{
  // ...
  while (padvance(&path, argv[0]) >= 0) {
			cmdname = stackblock();
			if (--idx < 0 && pathopt == NULL) {
				tryexec(cmdname, argv, envp);
				if (errno != ENOENT && errno != ENOTDIR)
					e = errno;
			}
		}
  // ...
}
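
The loop above can be sketched in Python as follows. This is a simplified illustration, not a port of dash: path_candidates is a hypothetical stand-in for padvance that ignores the %builtin/%func options, shell_exec is a hypothetical name, and the exec_fn parameter exists only so the loop can be exercised without replacing the current process.

```python
import errno
import os


def path_candidates(path, name):
    """Yield each PATH entry joined with `name`, in order.

    Simplified stand-in for padvance: no %option handling.
    An empty entry means the current directory.
    """
    for entry in path.split(":"):
        yield os.path.join(entry, name) if entry else name


def shell_exec(argv, path, env, exec_fn=os.execve):
    """Try each candidate in turn; return the errno to report if all fail."""
    saved = errno.ENOENT
    for candidate in path_candidates(path, argv[0]):
        try:
            exec_fn(candidate, argv, env)  # does not return on success
        except OSError as exc:
            # Like the C loop, remember only failures other than
            # "no such file" and "not a directory".
            if exc.errno not in (errno.ENOENT, errno.ENOTDIR):
                saved = exc.errno
    return saved
```

With the default exec_fn, the first candidate that exists and is executable replaces the calling process, so the function returns only when every candidate has failed.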

But what is padvance? Looking further in exec.c:

/*
 * Do a path search. The variable path (passed by reference) should be
 * set to the start of the path before the first call; padvance will update
 * this value as it proceeds. Successive calls to padvance will return
 * the possible path expansions in sequence. If an option (indicated by
 * a percent sign) appears in the path entry then the global variable
 * pathopt will be set to point to it; otherwise pathopt will be set to
 * NULL.
 *
 * If magic is 0 then pathopt recognition will be disabled. If magic is
 * 1 we shall recognise %builtin/%func. Otherwise we shall accept any
 * pathopt.
 */
const char *pathopt;
int padvance_magic(const char **path, const char *name, int magic)
{

So it is the shell, not the Linux kernel, that searches PATH for executables!
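
One way to check this claim is to hand the kernel a bare name and watch it fail. The sketch below assumes a POSIX system. execve_bare_name is a hypothetical helper that calls execve from a freshly created empty directory, so the name cannot accidentally match a file in the current directory.

```python
import errno
import os
import tempfile


def execve_bare_name(name):
    """Call execve with a bare name from an empty directory.

    Returns the symbolic errno of the failure. The kernel resolves the
    name relative to the current directory and never consults PATH.
    """
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as empty_dir:
        os.chdir(empty_dir)
        try:
            os.execve(name, [name], dict(os.environ))
        except OSError as exc:
            return errno.errorcode[exc.errno]
        finally:
            # Leave the temporary directory before it is removed.
            os.chdir(old_cwd)
```

Even with /usr/bin in PATH, execve_bare_name("cat") reports ENOENT, because PATH is just another environment string as far as execve is concerned.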

What about other code?

Python's subprocess can be used like this:

subprocess.run(["ls", "-l"])

This runs /usr/bin/ls, because /usr/bin is in PATH. But who performs the path lookup?

CPython's subprocess module contains the following code:

# This matches the behavior of os._execvpe().
executable_list = tuple(
  os.path.join(os.fsencode(dir), executable)
  for dir in os.get_exec_path(env))

This code builds the list of PATH candidates in Python itself, before Linux's execve is ever called.
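
The same candidate list can be reproduced with public APIs. The sketch below is an approximation, assuming a POSIX system: exec_candidates is a hypothetical helper, and it works on str paths where the CPython excerpt works on bytes.

```python
import os


def exec_candidates(executable, env=None):
    """Return the paths a PATH search would try for `executable`, in order."""
    if os.path.dirname(executable):
        # A name containing a slash is used as-is; PATH is not consulted.
        return [executable]
    # os.get_exec_path reads PATH from `env` (or os.environ when env is None).
    return [os.path.join(d, executable) for d in os.get_exec_path(env)]
```

For example, exec_candidates("ls", {"PATH": "/a:/b"}) yields /a/ls followed by /b/ls, which is the order in which the candidates would be attempted.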

Go is similar: lp_unix.go contains its own implementation of the PATH search:

// LookPath searches for an executable named file in the
// directories named by the PATH environment variable.
// If file contains a slash, it is tried directly and the PATH is not consulted.
// Otherwise, on success, the result is an absolute path.
//
// In older versions of Go, LookPath could return a path relative to the current directory.
// As of Go 1.19, LookPath will instead return that path along with an error satisfying
// [errors.Is](err, [ErrDot]). See the package documentation for more details.
func LookPath(file string) (string, error) {
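
The lookup rules in that comment can be tried out with shutil.which, which is Python's closest analogue and not Go's implementation. The sketch assumes a POSIX system whose temporary directory allows executable files; demo_lookup and the program name hello-tool are made up for illustration.

```python
import os
import shutil
import tempfile


def demo_lookup():
    """Create a throwaway executable and look it up in three ways."""
    with tempfile.TemporaryDirectory() as bindir:
        tool = os.path.join(bindir, "hello-tool")  # hypothetical program
        with open(tool, "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        os.chmod(tool, 0o755)

        # Bare name: searched for in the given directory list.
        by_name = shutil.which("hello-tool", path=bindir)
        # Name with a slash: checked directly, the search path is ignored.
        by_path = shutil.which(tool, path="/nonexistent")
        # Bare name that exists nowhere on the search path.
        missing = shutil.which("no-such-tool", path=bindir)
        return by_name == tool, by_path == tool, missing
```

Here demo_lookup() returns (True, True, None): the first two lookups resolve to the same file, and the third finds nothing.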

Rust's Command::spawn eventually calls libc::execvp, a C library function that searches PATH:

/* Execute FILE, searching in the `PATH' environment variable if it contains
  no slashes, with arguments ARGV and environment from `environ'. */
int
execvp (file, argv)
   const char *file;
   char *const argv[];
{

In fact, Linux knows nothing about PATH! A shebang line that names the interpreter for an executable text file needs an absolute path:

#!/bin/sh

works, but

#!sh

does not. This is also why many programs use the following trick:

#!/usr/bin/env python
print('Hello world')

because /usr/bin/env calls execvp, which searches PATH.
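
The shebang behaviour can be reproduced with a small experiment. The sketch below assumes a Linux system with /bin/sh and a temporary directory that allows executing files; run_with_shebang is a hypothetical helper. The kernel resolves the interpreter named in "#!sh" relative to the current directory, so the script is run from a directory that contains no sh.

```python
import errno
import os
import subprocess
import tempfile


def run_with_shebang(shebang):
    """Write a script starting with `shebang`, execute it, report the outcome.

    Returns the exit status on success, or the symbolic errno if the
    kernel refuses to execute the script.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "script")
        with open(script, "w") as f:
            f.write(shebang + "\nexit 0\n")
        os.chmod(script, 0o755)
        try:
            # The script path contains a slash, so no PATH search happens here;
            # cwd=workdir keeps a stray ./sh from satisfying "#!sh".
            return subprocess.run([script], cwd=workdir).returncode
        except OSError as exc:
            return errno.errorcode[exc.errno]
```

Under those assumptions, run_with_shebang("#!/bin/sh") returns 0, while run_with_shebang("#!sh") returns "ENOENT" even though sh is on PATH.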