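A Look at Linux Fundamentals: Process ID Numbers

This article covers how Linux handles process ID numbers (PIDs): which identifiers a process carries, how the kernel represents them, and how new ones are allocated.

The code in this article is taken from Linux kernel version 5.15.13.

Every Linux process is assigned a number that uniquely identifies it within its namespace. This number is called the process ID, or PID for short. Each process created with fork or clone is automatically assigned a new, unique PID by the kernel.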

1. Process IDs

1.1 Other IDs

Besides the PID, each process carries several other IDs. The following types exist:

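1. All processes in a thread group share the same thread group ID (TGID). A thread group consists of the different execution contexts of one process, created by calling clone with the CLONE_THREAD flag, as discussed later. If a process does not use threads, its PID and TGID are identical. The main process of a thread group is called the group leader. The group_leader member of the task_struct of every thread created with clone points to the task_struct instance of the group leader.

2. Independent processes can also be combined into a process group (using the setpgrp system call). All members of a process group share the same process group ID, which is the PID of the process group leader. In the 5.15 code shown later, this ID is reached through task->signal->pids[PIDTYPE_PGID] rather than through a pgrp member of task_struct. Process groups make it easy to send a signal to every member of the group, which is useful in various system programming applications (see the system programming literature, for example [SR05]). Note that processes connected by a pipe are placed in the same process group.

3. Several process groups can be combined into a session. All processes in a session share the same session ID (SID); in the 5.15 code it is reached through task->signal->pids[PIDTYPE_SID] rather than through a session member of task_struct. The SID can be set with the setsid system call. Sessions are used in terminal programming.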
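The ID types listed above can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc; the struct task_ids type and the helper names current_ids and is_group_leader are made up for illustration. It uses the gettid system call to obtain the per-thread value that the kernel calls pid, while getpid returns what the kernel calls tgid.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: the IDs of the calling task as seen from user space. */
struct task_ids {
	pid_t tid;   /* kernel "pid": unique per thread */
	pid_t tgid;  /* thread group ID: the value getpid() returns */
	pid_t pgid;  /* process group ID */
	pid_t sid;   /* session ID */
};

/* Collect the four IDs of the calling thread. */
struct task_ids current_ids(void)
{
	struct task_ids ids;

	ids.tid  = (pid_t)syscall(SYS_gettid);
	ids.tgid = getpid();
	ids.pgid = getpgid(0);   /* 0 means "the calling process" */
	ids.sid  = getsid(0);
	return ids;
}

/* Non-zero if the caller is the thread group leader (PID == TGID). */
int is_group_leader(void)
{
	struct task_ids ids = current_ids();

	return ids.tid == ids.tgid;
}
```

Called from the initial thread of a program, is_group_leader() should return non-zero; called from a thread created with CLONE_THREAD (for example through pthread_create), tid and tgid differ.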

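1.2 Global and Local IDs

Namespaces make PID management more complex. PID namespaces are organized hierarchically. When a new namespace is created, all PIDs in it are visible to the parent namespace, but the child namespace cannot see the PIDs of the parent. This means that some processes have several PIDs: every namespace that can see the process assigns it a PID of its own. The data structures must reflect this, so we have to distinguish between local and global IDs.

1. A global ID is a unique ID number within the kernel itself and within the initial namespace, the namespace to which the init process started at boot time belongs. For each ID type there is a global ID that is guaranteed to be unique across the whole system.

2. A local ID belongs to one specific namespace and has no global validity. For each ID type, a local ID is valid inside the namespace it belongs to, but an ID of the same type and with the same value may appear in a different namespace.

1.3 How IDs Are Implemented

The global PID and TGID are stored directly in task_struct, in its pid and tgid members, defined in sched.h: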

struct task_struct {
	...
	pid_t pid;
	pid_t tgid;
	...
};

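Both members have type pid_t, which is defined as __kernel_pid_t; that type in turn is defined by each architecture. It is usually an int, so in principle up to 2^32 distinct ID values can be represented.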
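The arithmetic behind that figure can be checked with a small sketch. It assumes a platform where pid_t is a signed integer type, as on Linux with glibc; the function names are made up for illustration. Since pid_t is signed and valid PIDs are positive, only the positive half of the range is usable, and the kernel restricts it further with pid_max, as the range check in alloc_pid later in this article shows.

```c
#include <limits.h>
#include <sys/types.h>

/* Width of pid_t in bits on the platform this is compiled for. */
int pid_t_bits(void)
{
	return (int)(sizeof(pid_t) * CHAR_BIT);
}

/* Largest positive value that fits in a signed integer of that width. */
long long pid_t_max_value(void)
{
	return (1LL << (pid_t_bits() - 1)) - 1;
}
```

With a 32-bit pid_t, the type has 2^32 bit patterns, of which 2^31 - 1 are positive values.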

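2. Managing PIDs

A small subsystem known as the PID allocator speeds up the allocation of new IDs. In addition, the kernel must provide helper functions that find the task_struct of a process from an ID and its type, and functions that convert between the kernel's internal representation of an ID and the numeric value visible in user space.

2.1 How PID Namespaces Are Represented

pid_namespace.h contains the following definition: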

struct pid_namespace {
	struct idr idr;
	struct rcu_head rcu;
	unsigned int pid_allocated;
	struct task_struct *child_reaper;
	struct kmem_cache *pid_cachep;
	unsigned int level;
	struct pid_namespace *parent;
#ifdef CONFIG_BSD_PROCESS_ACCT
	struct fs_pin *bacct;
#endif
	struct user_namespace *user_ns;
	struct ucounts *ucounts;
	int reboot;	/* group exit code if this pidns was rebooted */
	struct ns_common ns;
} __randomize_layout;

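Every PID namespace has one process that plays the role the global init process plays for the whole system. One of init's jobs is to call wait4 for orphaned processes, and the namespace-local init variant must do the same. child_reaper stores a pointer to the task_struct of this process.

parent points to the parent namespace, and level gives the depth of the current namespace in the namespace hierarchy. The initial namespace has level 0, its child namespaces have level 1, their children have level 2, and so on. The level matters because IDs in namespaces with a higher level are visible in namespaces with a lower level. From a given level the kernel can therefore infer how many IDs a process is associated with.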
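The relationship between parent, level, and the number of IDs can be sketched in user-space C. This is a toy model, not kernel code: struct toy_ns keeps only the two fields just discussed, and all function names are hypothetical.

```c
#include <stddef.h>

/* Toy stand-in for struct pid_namespace: only level and parent. */
struct toy_ns {
	unsigned int level;     /* depth in the hierarchy, 0 = initial namespace */
	struct toy_ns *parent;  /* NULL for the initial namespace */
};

/* Initialise child as a namespace nested directly inside parent. */
void toy_ns_init_child(struct toy_ns *child, struct toy_ns *parent)
{
	child->parent = parent;
	child->level = parent ? parent->level + 1 : 0;
}

/* A task created in ns is visible in ns and in every ancestor of ns. */
unsigned int toy_ids_per_task(const struct toy_ns *ns)
{
	return ns->level + 1;
}

/* Non-zero if viewer is ns itself or one of its ancestors. */
int toy_ns_can_see(const struct toy_ns *viewer, const struct toy_ns *ns)
{
	for (; ns; ns = ns->parent)
		if (ns == viewer)
			return 1;
	return 0;
}
```

A task created two levels below the initial namespace therefore carries three IDs of each type, one for each namespace on the path up to the root.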

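2.2 PID Management

2.2.1 PID Data Structures

PID management revolves around two data structures: struct pid is the kernel's internal representation of a PID, while struct upid holds the information that is visible in one specific namespace. Both are defined in pid.h: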

/*
 * What is struct pid?
 *
 * A struct pid is the kernel's internal notion of a process identifier.
 * It refers to individual tasks, process groups, and sessions.  While
 * there are processes attached to it the struct pid lives in a hash
 * table, so it and then the processes that it refers to can be found
 * quickly from the numeric pid value.  The attached processes may be
 * quickly accessed by following pointers from struct pid.
 *
 * Storing pid_t values in the kernel and referring to them later has a
 * problem.  The process originally with that pid may have exited and the
 * pid allocator wrapped, and another process could have come along
 * and been assigned that pid.
 *
 * Referring to user space processes by holding a reference to struct
 * task_struct has a problem.  When the user space process exits
 * the now useless task_struct is still kept.  A task_struct plus a
 * stack consumes around 10K of low kernel memory.  More precisely
 * this is THREAD_SIZE + sizeof(struct task_struct).  By comparison
 * a struct pid is about 64 bytes.
 *
 * Holding a reference to struct pid solves both of these problems.
 * It is small so holding a reference does not consume a lot of
 * resources, and since a new struct pid is allocated when the numeric pid
 * value is reused (when pids wrap around) we don't mistakenly refer to new
 * processes.
 */

/*
 * struct upid is used to get the id of the struct pid, as it is
 * seen in particular namespace. Later the struct pid is found with
 * find_pid_ns() using the int nr and struct pid_namespace *ns.
 */
struct upid {
	int nr;
	struct pid_namespace *ns;
};

struct pid
{
	refcount_t count;
	unsigned int level;
	spinlock_t lock;
	/* lists of tasks that use this pid */
	struct hlist_head tasks[PIDTYPE_MAX];
	struct hlist_head inodes;
	/* wait queue for pidfd notifications */
	wait_queue_head_t wait_pidfd;
	struct rcu_head rcu;
	struct upid numbers[1];
};

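For struct upid, nr is the numeric value of the ID and ns points to the namespace the ID belongs to. Older kernels kept all upid instances in a hash table and chained collisions through a pid_chain member; the 5.15 definition shown above has no such member, and its header comment states that a struct pid is found with find_pid_ns() from nr and ns. struct pid begins with a reference counter, count. tasks is an array of hlist heads, one per ID type. This is necessary because one ID can be used by several processes: all task_struct instances that share a given ID are linked on that list. PIDTYPE_MAX is the number of ID types: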

enum pid_type
{
	PIDTYPE_PID,
	PIDTYPE_TGID,
	PIDTYPE_PGID,
	PIDTYPE_SID,
	PIDTYPE_MAX,
};

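2.2.2 Connecting PIDs and Processes

A process can be visible in several namespaces, with a different local ID in each. level is the depth, in the namespace hierarchy, of the namespace the process was created in, so the process is visible in level + 1 namespaces. numbers is an array of upid instances, one per namespace. Formally the array has only one entry, which is all that is needed if the process lives only in the global namespace. Because the array sits at the end of the structure, further entries can be added simply by allocating more memory.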
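The trailing-array technique and the per-namespace lookup can be sketched in user space. This is a toy model under stated assumptions: the mini_* types and functions are hypothetical, the numbers are supplied by the caller instead of being allocated, and a C99 flexible array member is used where the kernel structure above declares numbers[1]. The lookup is loosely modelled on how the kernel indexes numbers by namespace level.

```c
#include <stddef.h>
#include <stdlib.h>

struct mini_ns {
	unsigned int level;
	struct mini_ns *parent;
};

struct mini_upid {
	int nr;
	struct mini_ns *ns;
};

struct mini_pid {
	unsigned int level;
	struct mini_upid numbers[];  /* one entry per level, 0..level */
};

/* Allocate a pid for a task created in ns; nrs[i] is its number at level i. */
struct mini_pid *mini_pid_alloc(struct mini_ns *ns, const int *nrs)
{
	struct mini_pid *pid;
	int i;

	/* Header plus level + 1 trailing entries in one allocation. */
	pid = malloc(sizeof(*pid) + (ns->level + 1) * sizeof(pid->numbers[0]));
	if (!pid)
		return NULL;

	pid->level = ns->level;
	for (i = (int)ns->level; i >= 0; i--) {
		pid->numbers[i].nr = nrs[i];
		pid->numbers[i].ns = ns;
		ns = ns->parent;  /* move one level up */
	}
	return pid;
}

/* Number of pid as seen from ns, or 0 if pid is not visible there. */
int mini_pid_nr_ns(const struct mini_pid *pid, const struct mini_ns *ns)
{
	if (ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
		return pid->numbers[ns->level].nr;
	return 0;
}
```

The namespace check in mini_pid_nr_ns matters: a sibling namespace at the same level has the same index into numbers but is a different namespace, so the lookup reports the process as not visible.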

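Because all task_struct instances that share an ID are linked on a list whose head lives in struct pid, struct task_struct needs a list element for each ID type. The definition of the process structure in sched.h contains: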

struct task_struct {
	...
	/* PID/PID hash table linkage. */
	struct pid			*thread_pid;
	struct hlist_node		pid_links[PIDTYPE_MAX];
	struct list_head		thread_group;
	struct list_head		thread_node;
	...
};

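The pid_links entries are the nodes that link the task_struct into the lists whose heads are the tasks[type] entries of struct pid.

2.2.3 Looking Up PIDs

Suppose a new instance of struct pid has been allocated and set up for a given ID type. It is attached to the task_struct as follows, in kernel/pid.c: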

static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
{
	return (type == PIDTYPE_PID) ?
		&task->thread_pid :
		&task->signal->pids[type];
}

/*
 * attach_pid() must be called with the tasklist_lock write-held.
 */
void attach_pid(struct task_struct *task, enum pid_type type)
{
	struct pid *pid = *task_pid_ptr(task, type);
	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
}

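This establishes a two-way connection. From a task_struct, the pid instance is reached through the pointer that task_pid_ptr returns: task->thread_pid for PIDTYPE_PID, and task->signal->pids[type] for the other types. From the pid instance, walking the tasks[type] list leads back to the task_struct instances. hlist_add_head_rcu is the kernel's standard helper for inserting an element at the head of an hlist in an RCU-safe way.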
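The two directions of this link can be sketched with a toy model in user space. All toy_* names are hypothetical; the sketch uses a plain NULL-terminated singly linked list where the kernel uses an hlist, it has no locking or RCU, and unlike attach_pid it takes the pid as an argument and stores the pointer in the task itself.

```c
#include <stddef.h>

enum toy_pid_type { TOY_PID, TOY_TGID, TOY_PGID, TOY_SID, TOY_TYPE_MAX };

struct toy_task;

struct toy_pid {
	struct toy_task *tasks[TOY_TYPE_MAX];  /* list heads, one per ID type */
};

struct toy_task {
	const char *comm;
	struct toy_pid *pids[TOY_TYPE_MAX];    /* task -> pid direction */
	struct toy_task *next[TOY_TYPE_MAX];   /* list linkage, like pid_links */
};

/* Link task into pid's list for the given type, inserting at the head. */
void toy_attach_pid(struct toy_task *task, enum toy_pid_type type,
		    struct toy_pid *pid)
{
	task->pids[type] = pid;
	task->next[type] = pid->tasks[type];
	pid->tasks[type] = task;
}

/* pid -> task direction: count the tasks that use pid as an ID of this type. */
int toy_count_tasks(const struct toy_pid *pid, enum toy_pid_type type)
{
	const struct toy_task *t;
	int n = 0;

	for (t = pid->tasks[type]; t; t = t->next[type])
		n++;
	return n;
}
```

Attaching two tasks to the same toy_pid under TOY_PGID models two processes in one process group: both tasks point at the same pid, and the pid's TOY_PGID list contains both of them, with the most recently attached task at the head.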

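3. Generating Unique PIDs

Besides managing PIDs, the kernel must provide a mechanism for generating unique ones. To do so it has to track which PIDs are allocated and which are still free. A classic way to do this is a large bitmap in which each PID is represented by one bit, so that the PID value follows from the position of the bit. Allocating a free PID then amounts to finding the first bit with value 0 and setting it to 1, and freeing a PID means switching the bit from 1 back to 0. The 5.15 code shown here performs this bookkeeping through the idr member of struct pid_namespace (idr_alloc and idr_alloc_cyclic) instead of manipulating a bitmap directly. When a new process is created, it may be visible in several namespaces, and a local PID must be generated for each of them. This is handled in alloc_pid, in kernel/pid.c: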

struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
		      size_t set_tid_size)
{
	struct pid *pid;
	enum pid_type type;
	int i, nr;
	struct pid_namespace *tmp;
	struct upid *upid;
	int retval = -ENOMEM;

	/*
	 * set_tid_size contains the size of the set_tid array. Starting at
	 * the most nested currently active PID namespace it tells alloc_pid()
	 * which PID to set for a process in that most nested PID namespace
	 * up to set_tid_size PID namespaces. It does not have to set the PID
	 * for a process in all nested PID namespaces but set_tid_size must
	 * never be greater than the current ns->level + 1.
	 */
	if (set_tid_size > ns->level + 1)
		return ERR_PTR(-EINVAL);

	pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
	if (!pid)
		return ERR_PTR(retval);

	tmp = ns;
	pid->level = ns->level;

	for (i = ns->level; i >= 0; i--) {
		int tid = 0;

		if (set_tid_size) {
			tid = set_tid[ns->level - i];

			retval = -EINVAL;
			if (tid < 1 || tid >= pid_max)
				goto out_free;
			/*
			 * Also fail if a PID != 1 is requested and
			 * no PID 1 exists.
			 */
			if (tid != 1 && !tmp->child_reaper)
				goto out_free;
			retval = -EPERM;
			if (!checkpoint_restore_ns_capable(tmp->user_ns))
				goto out_free;
			set_tid_size--;
		}

		idr_preload(GFP_KERNEL);
		spin_lock_irq(&pidmap_lock);

		if (tid) {
			nr = idr_alloc(&tmp->idr, NULL, tid,
				       tid + 1, GFP_ATOMIC);
			/*
			 * If ENOSPC is returned it means that the PID is
			 * alreay in use. Return EEXIST in that case.
			 */
			if (nr == -ENOSPC)
				nr = -EEXIST;
		} else {
			int pid_min = 1;
			/*
			 * init really needs pid 1, but after reaching the
			 * maximum wrap back to RESERVED_PIDS
			 */
			if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
				pid_min = RESERVED_PIDS;

			/*
			 * Store a null pointer so find_pid_ns does not find
			 * a partially initialized PID (see below).
			 */
			nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
					      pid_max, GFP_ATOMIC);
		}
		spin_unlock_irq(&pidmap_lock);
		idr_preload_end();

		if (nr < 0) {
			retval = (nr == -ENOSPC) ? -EAGAIN : nr;
			goto out_free;
		}

		pid->numbers[i].nr = nr;
		pid->numbers[i].ns = tmp;
		tmp = tmp->parent;
	}

	/*
	 * ENOMEM is not the most obvious choice especially for the case
	 * where the child subreaper has already exited and the pid
	 * namespace denies the creation of any new processes. But ENOMEM
	 * is what we have exposed to userspace for a long time and it is
	 * documented behavior for pid namespaces. So we can't easily
	 * change it even if there were an error code better suited.
	 */
	retval = -ENOMEM;

	get_pid_ns(ns);
	refcount_set(&pid->count, 1);
	spin_lock_init(&pid->lock);
	for (type = 0; type < PIDTYPE_MAX; ++type)
		INIT_HLIST_HEAD(&pid->tasks[type]);

	init_waitqueue_head(&pid->wait_pidfd);
	INIT_HLIST_HEAD(&pid->inodes);

	upid = pid->numbers + ns->level;
	spin_lock_irq(&pidmap_lock);
	if (!(ns->pid_allocated & PIDNS_ADDING))
		goto out_unlock;
	for ( ; upid >= pid->numbers; --upid) {
		/* Make the PID visible to find_pid_ns. */
		idr_replace(&upid->ns->idr, pid, upid->nr);
		upid->ns->pid_allocated++;
	}
	spin_unlock_irq(&pidmap_lock);

	return pid;

out_unlock:
	spin_unlock_irq(&pidmap_lock);
	put_pid_ns(ns);

out_free:
	spin_lock_irq(&pidmap_lock);
	while (++i <= ns->level) {
		upid = pid->numbers + i;
		idr_remove(&upid->ns->idr, upid->nr);
	}

	/* On failure to allocate the first pid, reset the state */
	if (ns->pid_allocated == PIDNS_ADDING)
		idr_set_cursor(&ns->idr, 0);

	spin_unlock_irq(&pidmap_lock);

	kmem_cache_free(ns->pid_cachep, pid);
	return ERR_PTR(retval);
}
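The core of alloc_pid, leaving aside set_tid, locking, and reference counting, is a loop that allocates one number per namespace from the deepest level up to the root and undoes the partial work on failure. The following user-space sketch imitates that control flow. It is a toy model under stated assumptions: all sim_* names and the tiny limits are made up, and a byte array stands in for the kernel's IDR. The cyclic search mimics the behaviour suggested by the listing: continue from a cursor, and after the cursor has passed the reserved range, wrap around to the reserved boundary rather than to 1.

```c
#include <stddef.h>

#define SIM_PID_MAX   8  /* tiny limit so wrap-around is easy to trigger */
#define SIM_RESERVED  3  /* stand-in for RESERVED_PIDS */
#define SIM_MAX_LEVEL 4  /* callers must keep ns->level below this */

struct sim_ns {
	unsigned int level;
	struct sim_ns *parent;
	unsigned char used[SIM_PID_MAX];  /* used[n] != 0: number n is taken */
	int cursor;                       /* next number to try */
};

/* Cyclic search for a free number in [min, SIM_PID_MAX); -1 if none. */
int sim_alloc_cyclic(struct sim_ns *ns)
{
	int min = ns->cursor > SIM_RESERVED ? SIM_RESERVED : 1;
	int n = ns->cursor < min ? min : ns->cursor;
	int tries;

	if (n >= SIM_PID_MAX)
		n = min;
	for (tries = 0; tries < SIM_PID_MAX - min; tries++) {
		if (!ns->used[n]) {
			ns->used[n] = 1;
			ns->cursor = n + 1;
			return n;
		}
		if (++n >= SIM_PID_MAX)
			n = min;  /* wrap around */
	}
	return -1;
}

/* Allocate one number per level, deepest first; undo everything on failure. */
int sim_alloc_pid(struct sim_ns *ns, int numbers[SIM_MAX_LEVEL])
{
	struct sim_ns *tmp = ns;
	int i, j, nr;

	for (i = (int)ns->level; i >= 0; i--) {
		nr = sim_alloc_cyclic(tmp);
		if (nr < 0)
			goto out_free;
		numbers[i] = nr;
		tmp = tmp->parent;
	}
	return 0;

out_free:
	/* Levels i + 1 .. ns->level succeeded; release their numbers again. */
	tmp = ns;
	for (j = (int)ns->level; j > i; j--) {
		tmp->used[numbers[j]] = 0;
		tmp = tmp->parent;
	}
	return -1;
}
```

In this model, a task created in a level-1 namespace receives two numbers, numbers[1] in its own namespace and numbers[0] in the root. If the root namespace is exhausted, the number already taken in the child is released again, which corresponds to the out_free path of the listing.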