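[Original] xenomai kernel analysis: the xenomai real-time thread creation flow

Copyright notice: this is an original article by the blog author and may not be reproduced without permission. Corrections are welcome. Blog address: https://www.cnblogs.com/wsg1100/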

Problem overview
1 Calling non-real-time POSIX interfaces in libcobalt
2 Stage 1: linux thread creation
3 Stage 2: thread creation in the Cobalt kernel
4 Summary

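Problem overview

Three years ago, in the article "[Original] xenomai kernel analysis--dual-kernel system calls (1)", we raised two questions:

When the two kernels coexist, how is a system call issued by an application identified as a xenomai kernel call or a linux kernel call?

A xenomai real-time task can call both xenomai kernel services and linux kernel services. How is this achieved?

That article said: "This article answers question 1 by analyzing the source code. Question 2 involves scheduling between the two kernels and is not covered here; a later article will give the answer."

Question 1 was answered back then. This article goes on to explore scheduling between the two kernels, focusing on the underlying implementation of the pthread_create() interface. A xenomai task can run in the Cobalt kernel as well as in the linux kernel, which requires both kernels to have a scheduling entity that manages the task. So how does pthread_create() create a task with this dual identity? Let's lift the veil.

Note: this article is a record of source code analysis done a few years ago. It is a running log written while reading the source and has not been carefully organized or revised, so the quality may be uneven. If you only want the conclusion, skip to the last section. I hope it is useful to you.

The following articles are related to this one. After reading them you should have an overall picture of xenomai task management:

"[Original] xenomai kernel analysis--dual-kernel system calls (1)": taking the x86 processor as an example, it walks through the flow of a xenomai kernel call when an application issues a kernel system call.

"[Original] xenomai kernel analysis--dual-kernel system calls (2)--how an application distinguishes xenomai/linux system calls or services": it analyzes how, once an application issues a kernel system call, an interface is routed to either linux or xenomai for service.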

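1 Calling non-real-time POSIX interfaces in libcobalt

xenomai derives its real-time threads from tasks created through the standard POSIX API, so a xenomai thread inherits the Linux task's ability to call regular Linux services in non-time-critical mode.
When promoted to a real-time application, the Linux task is attached to a special xenomai extension called a real-time shadow. A real-time shadow allows the xenomai co-kernel to schedule the paired Linux task while it runs in real-time mode.

Take a standard POSIX function as an example: pthread_create() is not a system call. It is implemented by NPTL (Native POSIX Threads Library, the modern Linux threading implementation developed by Ulrich Drepper and Ingo Molnar to replace LinuxThreads). NPTL takes care of creating the user-space stack, allocating memory and initializing a user thread, and cooperates with the linux kernel to complete thread creation. Each thread maps to a separate kernel scheduling entity (KSE, Kernel Scheduling Entity), and the kernel schedules each thread individually. Thread synchronization operations are implemented through kernel system calls.

xenomai Cobalt is the scheduler for real-time tasks, so every real-time thread needs a corresponding Cobalt scheduling entity. If creating a real-time thread required the same deep integration that NPTL has with the linux kernel, the implementation of Cobalt and libcobalt would become very complex. Instead, xenomai lets NPTL create the thread entity and then attaches some attributes to the ordinary thread, so that its kernel entity can be scheduled by the real-time kernel. This is why the real-time thread creation function pthread_create in libcobalt still ends up calling NPTL's pthread_create: xenomai only extends the thread created by NPTL pthread_create so that it is scheduled by the real-time kernel.

When a real-time thread is created, the application calls the pthread_create function implemented by libcobalt, which does some initial work and finally calls NPTL's pthread_create to create the thread. All of these are named pthread_create, so how are they told apart? The following sections go through it step by step.

We start the analysis of the Cobalt kernel thread creation flow from pthread_create, which is declared in pthread.h as follows: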

COBALT_DECL(int, pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg));

The COBALT_DECL macro is defined in wrappers.h as shown below. Expanding the macro above generates three function declarations for pthread_create:

#define __WRAP(call)		__wrap_ ## call
#define __STD(call)		__real_ ## call
#define __COBALT(call)		__cobalt_ ## call
#define __RT(call)		__COBALT(call)
#define COBALT_DECL(T, P)	\
	__typeof__(T) __RT(P);	\
	__typeof__(T) __STD(P); \
	__typeof__(T) __WRAP(P)
	
int __cobalt_pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg);
int __wrap_pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg);
int __real_pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg);

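The three kinds of pthread_create function are:

__RT(P): __cobalt_pthread_create, the POSIX function implemented by Cobalt.

__STD(P): __real_pthread_create, the original POSIX function (implemented by glibc). The cobalt library uses it internally to call the original POSIX function (glibc NPTL).

__WRAP(P): __wrap_pthread_create, a weak alias of __cobalt_pthread_create that can be overridden.

For the last kind, an external library may provide its own __wrap_pthread_create() implementation to override the Cobalt version of pthread_create(). The original Cobalt implementation can still be referenced as __COBALT(pthread_create). The alias is defined by the COBALT_IMPL macro: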
#define COBALT_IMPL(T, I, A)								\
__typeof__(T) __wrap_ ## I A __attribute__((alias("__cobalt_" __stringify(I)), weak));	\
__typeof__(T) __cobalt_ ## I A

Finally, the body of the cobalt library function pthread_create is (xenomai3.0.8\lib\cobalt\thread.c):

COBALT_IMPL(int, pthread_create, (pthread_t *ptid_r,
				  const pthread_attr_t *attr,
				  void *(*start) (void *), void *arg))
{
	pthread_attr_ex_t attr_ex;
	......
	return pthread_create_ex(ptid_r, &attr_ex, start, arg);
}

COBALT_IMPL defines the function __cobalt_pthread_create together with a weak alias for it, __wrap_pthread_create. Calling either name executes the same function body.
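To make the weak-alias mechanism concrete, here is a minimal sketch, assuming GCC or Clang on an ELF platform such as Linux. The names DEMO_IMPL, demo_impl_add_one and demo_wrap_add_one are invented for illustration and are not part of libcobalt; only the use of the alias and weak attributes mirrors what COBALT_IMPL does.

```c
#include <assert.h>

#define DEMO_STRINGIFY_1(x)	#x
#define DEMO_STRINGIFY(x)	DEMO_STRINGIFY_1(x)

/*
 * Hypothetical analogue of COBALT_IMPL: declare demo_wrap_<I> as a weak
 * alias of demo_impl_<I>, then open the definition of demo_impl_<I>.
 */
#define DEMO_IMPL(T, I, A)						\
	__typeof__(T) demo_wrap_ ## I A					\
		__attribute__((alias("demo_impl_" DEMO_STRINGIFY(I)), weak)); \
	__typeof__(T) demo_impl_ ## I A

/* Defines demo_impl_add_one() and its weak alias demo_wrap_add_one(). */
DEMO_IMPL(int, add_one, (int x))
{
	return x + 1;
}

/* Both names resolve to the same function body. */
void demo_alias_self_check(void)
{
	assert(demo_wrap_add_one(41) == demo_impl_add_one(41));
}
```

Because the alias is weak, a strong definition of demo_wrap_add_one in another object file would normally take precedence at link time, while demo_impl_add_one would still reach the original body.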

The NPTL function pthread_create is exposed inside the Cobalt library as __real_pthread_create. It is only a wrapper: __real_pthread_create calls NPTL pthread_create directly. The implementation in lib\cobalt\wrappers.c is:

__weak
int __real_pthread_create(pthread_t *ptid_r,
			  const pthread_attr_t * attr,
			  void *(*start) (void *), void *arg)
{
	return pthread_create(ptid_r, attr, start, arg);
}

When libcobalt needs NPTL's pthread_create to create the thread, it simply uses the __STD macro:

ret = __STD(pthread_create(&lptid, &attr, cobalt_thread_trampoline, &iargs));
if (ret) {
	__STD(sem_destroy(&iargs.sync));
	return ret;
}

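2 Stage 1: linux thread creation

pthread_create is not a system call; here it is a function of the real-time thread library libcobalt. To keep things distinct, for a POSIX function func, the libcobalt implementation is written __RT(func) and the libc implementation is written __STD(func) (xenomai3.0.8\lib\cobalt\thread.c):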

COBALT_IMPL(int, pthread_create, (pthread_t *ptid_r,
				  const pthread_attr_t *attr,
				  void *(*start) (void *), void *arg))
{
	pthread_attr_ex_t attr_ex;
	struct sched_param param;
	int policy;

	if (attr == NULL)
		attr = &default_attr_ex.std;

	memcpy(&attr_ex.std, attr, sizeof(*attr));
	pthread_attr_getschedpolicy(attr, &policy);
	attr_ex.nonstd.sched_policy = policy;
	pthread_attr_getschedparam(attr, &param);
	attr_ex.nonstd.sched_param.sched_priority = param.sched_priority;
	attr_ex.nonstd.personality = 0; /* Default: use Cobalt. */

	return pthread_create_ex(ptid_r, &attr_ex, start, arg);
}

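The thread attribute argument attr is handled first. If no attribute is passed in, the default is used.

attr_ex holds the attributes of a Cobalt thread and is an extension of pthread_attr_t.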

typedef struct pthread_attr_ex {
	pthread_attr_t std;
	struct {
		int personality;
		int sched_policy;
		struct sched_param_ex sched_param;
	} nonstd;
} pthread_attr_ex_t;

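The scheduling policy is read from attr and recorded as the non-standard policy used by Cobalt; the scheduling parameters are handled the same way. Both are stored in attr_ex.nonstd. Setting attr_ex.nonstd.personality to 0 selects Cobalt.

Once the extended attr_ex has been derived from attr, pthread_create_ex is called for further processing, and from here on only attr_ex is used. The standard pthread_attr_t is kept in attr_ex.std, because the user-space thread is still created by NPTL's pthread_create, which needs attr.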
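The attribute-extension step can be sketched with plain POSIX calls. This is a simplified illustration, not the libcobalt code: struct demo_attr_ex and demo_attr_extend are hypothetical names, and the extended parameters are reduced to a single priority field.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical, reduced counterpart of pthread_attr_ex_t. */
struct demo_attr_ex {
	pthread_attr_t std;		/* kept for the later NPTL call */
	struct {
		int personality;	/* 0 selects the default personality */
		int sched_policy;
		int sched_priority;
	} nonstd;
};

/* Copy attr and mirror its policy/priority into the extended part. */
int demo_attr_extend(const pthread_attr_t *attr, struct demo_attr_ex *out)
{
	struct sched_param param;
	int policy, ret;

	assert(attr != NULL && out != NULL);

	ret = pthread_attr_getschedpolicy(attr, &policy);
	if (ret)
		return ret;

	ret = pthread_attr_getschedparam(attr, &param);
	if (ret)
		return ret;

	/* Shallow copy, as in the original listing. */
	memcpy(&out->std, attr, sizeof(*attr));
	out->nonstd.sched_policy = policy;
	out->nonstd.sched_priority = param.sched_priority;
	out->nonstd.personality = 0;

	return 0;
}
```

A caller would initialize a pthread_attr_t, set SCHED_FIFO and a priority on it, and then read the same values back from the nonstd part of the extended structure.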

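To make the rest easier to follow, here is how xenomai uses __STD(pthread_create) to create a thread scheduled by Cobalt. First, __STD(pthread_create) creates an ordinary thread, but its thread function is not the start function passed to __RT(pthread_create); it is cobalt_thread_trampoline, which xenomai provides. Once __STD(pthread_create) and linux have created the thread and it gets to run, it executes cobalt_thread_trampoline. cobalt_thread_trampoline then issues the Cobalt kernel system call sc_cobalt_thread_create to complete the creation of the Cobalt real-time thread, which is then scheduled on the real-time kernel. When the system call returns, execution really begins at the start function.

In pthread_create_ex(), the structure used to pass arguments to cobalt_thread_trampoline is struct pthread_iargs iargs.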

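struct pthread_iargs {
	struct sched_param_ex param_ex;
	int policy; /* scheduling policy */
	int personality;
	void *(*start)(void *); /* thread entry function */
	void *arg; /* pointer to the entry function's argument */
	int parent_prio; /* priority of the parent (creating) thread */
	sem_t sync; /* semaphore signalling that thread creation has completed */
	int ret;
};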

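Before __STD(pthread_create) is called, the main job is to fill in the members of iargs. First, a system call fetches the extended scheduling policy of the current thread in the Cobalt kernel.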

pthread_getschedparam_ex(pthread_self(), &iargs.policy, &iargs.param_ex);

int pthread_getschedparam_ex(pthread_t thread,
			     int *restrict policy_r,
			     struct sched_param_ex *restrict param_ex)
{
	struct sched_param short_param;
	int ret;

	ret = -XENOMAI_SYSCALL3(sc_cobalt_thread_getschedparam_ex,
				thread, policy_r, param_ex);
	if (ret == ESRCH) {
		ret = __STD(pthread_getschedparam(thread, policy_r, &short_param));
		if (ret == 0)
			param_ex->sched_priority = short_param.sched_priority;
	}

	return ret;
}

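If the thread issuing the creation is already a Cobalt real-time thread, the system call sc_cobalt_thread_getschedparam_ex copies out that task's scheduling parameters. Otherwise the task is just an ordinary linux task, and the parameters have to be fetched through NPTL's pthread_getschedparam.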

	iargs.start = start;
	iargs.arg = arg;
	iargs.ret = EAGAIN;
	__STD(sem_init(&iargs.sync, 0, 0));

	ret = __STD(pthread_create(&lptid, &attr, cobalt_thread_trampoline, &iargs)); /* __STD calls the standard library function */
	if (ret) {
		__STD(sem_destroy(&iargs.sync));
		return ret;
	}
	__STD(clock_gettime(CLOCK_REALTIME, &timeout));
	timeout.tv_sec += 5;
	timeout.tv_nsec = 0;

	for (;;) {
		ret = __STD(sem_timedwait(&iargs.sync, &timeout)); /* wait for the real-time thread creation to complete */
		if (ret && errno == EINTR)
			continue;
		if (ret == 0) {
			ret = iargs.ret;
			if (ret == 0)
				*ptid_r = lptid; /* pass the thread ID back to the caller */
			break;
		} else if (errno == ETIMEDOUT) {
			ret = EAGAIN;
			break;
		}
		ret = -errno;
		panic("regular sem_wait() failed with %s", symerror(ret));
	}

	__STD(sem_destroy(&iargs.sync)); /* destroy the semaphore */

	cobalt_thread_harden(); /* May fail if regular thread. */
	return ret;

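The synchronization semaphore iargs.sync is initialized first. After __STD(pthread_create) is called, the parent thread keeps running and waits on the semaphore until the real-time thread has been created. When creation completes, the new thread posts iargs.sync and passes the result back through iargs.ret.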
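The creator/trampoline handshake described above can be sketched with ordinary pthreads and a POSIX semaphore. This is a simplified illustration under stated assumptions: demo_iargs, demo_trampoline, demo_create and demo_double are hypothetical names, and demo_kernel_setup is a stand-in for the sc_cobalt_thread_create system call that always succeeds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical counterpart of struct pthread_iargs, reduced to the handshake. */
struct demo_iargs {
	void *(*start)(void *);
	void *arg;
	sem_t sync;
	int ret;
};

/* Stand-in for the kernel-side setup; always succeeds in this sketch. */
static int demo_kernel_setup(void)
{
	return 0;
}

static void *demo_trampoline(void *p)
{
	struct demo_iargs *iargs = p;
	/* Copy what we need first: iargs lives on the creator's stack. */
	void *(*start)(void *) = iargs->start;
	void *arg = iargs->arg;
	int ret = demo_kernel_setup();

	iargs->ret = ret;
	sem_post(&iargs->sync);	/* iargs must not be touched after this */
	if (ret)
		return NULL;

	return start(arg);	/* run the real entry point */
}

int demo_create(pthread_t *tid, void *(*start)(void *), void *arg)
{
	struct demo_iargs iargs = { .start = start, .arg = arg, .ret = EAGAIN };
	struct timespec timeout;
	pthread_t lptid;
	int ret;

	assert(tid != NULL && start != NULL);

	sem_init(&iargs.sync, 0, 0);
	ret = pthread_create(&lptid, NULL, demo_trampoline, &iargs);
	if (ret) {
		sem_destroy(&iargs.sync);
		return ret;
	}

	clock_gettime(CLOCK_REALTIME, &timeout);
	timeout.tv_sec += 5;

	for (;;) {
		if (sem_timedwait(&iargs.sync, &timeout) == 0) {
			ret = iargs.ret;
			if (ret == 0)
				*tid = lptid;
			break;
		}
		if (errno == EINTR)
			continue;
		ret = errno == ETIMEDOUT ? EAGAIN : errno;
		break;
	}

	sem_destroy(&iargs.sync);
	return ret;
}

/* Sample entry point used to exercise demo_create(). */
void *demo_double(void *arg)
{
	int *value = arg;

	*value *= 2;
	return arg;
}
```

The important design point is that the trampoline copies start and arg before posting the semaphore, because the creator may return and reuse that stack space as soon as it is released.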

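__STD(pthread_create(&lptid, &attr, cobalt_thread_trampoline, &iargs)) first allocates the thread stack in user space and then issues the linux clone system call to create the kernel-side scheduling entity. After creation the kernel schedules, and when the new thread gets to run it starts executing cobalt_thread_trampoline. The figure that originally followed here (sourced from the internet) compared the linux thread and process creation flows.

3 Stage 2: thread creation in the Cobalt kernel

I-pipe lets the real-time kernel manage things at fine granularity, per thread rather than per process. For this reason the real-time kernel should at least implement a mechanism that converts a regular task into a real-time thread with extended capabilities and binds it to Cobalt.

Now the real-time thread scheduling entity is created in the Cobalt kernel. Once the ordinary thread has been created, cobalt_thread_trampoline gets to run and, based on the iargs passed in, issues a Cobalt kernel system call. Because the call is issued from the root domain, it enters through the I-pipe slow system call entry ipipe_syscall_hook(), which checks the call's permissions (this call is allowed directly from a non-real-time task in the linux domain) and then runs the Cobalt kernel function that creates the real-time thread's scheduling entity, cobalt_thread. For more on system calls, see "7. Linux kernel system calls and real-time kernel system calls".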

ipipe_handle_syscall()
	__ipipe_notify_syscall()
		ipipe_syscall_hook()
			handle_head_syscall()
				cobalt_search_process()/**/
		ipipe_syscall_hook()
			CoBaLt_thread_create()
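/*
policy: scheduling policy
param_ex: extended parameters
    struct sched_param_ex {
        int sched_priority;   // priority
        union {
            struct __sched_ss_param ss; // parameters of the SPORADIC scheduling class
            struct __sched_rr_param rr; // parameters of the rr scheduling class
            struct __sched_tp_param tp; // parameters of the tp scheduling class
            struct __sched_quota_param quota; // parameters of the quota scheduling class
        } sched_u;
    };
personality: cobalt
*/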
ret = -XENOMAI_SYSCALL5(sc_cobalt_thread_create, ptid,
				policy, &param_ex, personality, &u_winoff);

This system call is implemented in kernel\xenomai\posix\thread.c:

COBALT_SYSCALL(thread_create, init,
	       (unsigned long pth, int policy,
		struct sched_param_ex __user *u_param,
		int xid,
		__u32 __user *u_winoff))
{
	struct sched_param_ex param_ex;

	ret = cobalt_copy_from_user(&param_ex, u_param, sizeof(param_ex));
	......

	return __cobalt_thread_create(pth, policy, &param_ex, xid, u_winoff);
}

The scheduling parameters are copied from user space into param_ex, then __cobalt_thread_create is called to do the creation.

int __cobalt_thread_create(unsigned long pth, int policy,
			   struct sched_param_ex *param_ex,
			   int xid, __u32 __user *u_winoff)
{
	struct cobalt_thread *thread = NULL;
	struct task_struct *p = current;
	struct cobalt_local_hkey hkey;
	int ret;
	/*
	 * We have been passed the pthread_t identifier the user-space
	 * Cobalt library has assigned to our caller; we'll index our
	 * internal pthread_t descriptor in kernel space on it.
	 */
	hkey.u_pth = pth;
	hkey.mm = p->mm;

	ret = pthread_create(&thread, policy, param_ex, p); /* create the Cobalt thread */
......
	ret = cobalt_map_user(&thread->threadbase, u_winoff); /* build the shadow thread context on top of the user task */
......
	if (!thread_hash(&hkey, thread, task_pid_vnr(p))) {
		goto fail;
	}

	thread->hkey = hkey;

	if (xid > 0 && cobalt_push_personality(xid) == NULL) {
		goto fail;
	}

	return xnthread_harden();
fail:
	xnthread_cancel(&thread->threadbase);

	return ret;
}

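The system call is issued by the new thread itself, so in the kernel current points to that thread's task_struct. hkey first records the thread's user-space thread ID (pthread_t) and its memory management structure current->mm. A pthread_t is only unique within one process, so pairing it with mm gives a key that does not repeat across the whole system: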

struct cobalt_local_hkey {
	/** pthread_t from userland. */
	unsigned long u_pth;
	/** kernel mm context.*/
	struct mm_struct *mm; 
};

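hkey is used for hash lookups: it quickly locates the corresponding real-time thread entity cobalt_thread. As an example, take a simple requirement: a real-time thread is running on the real-time kernel and needs to change its name. If it calls the non-real-time pthread_setname_np, then when the system call is issued I-pipe sees that it is a linux system call that needs a linux service, which triggers a migration between the two kernels. The thread first migrates to the linux kernel, the linux service modifies comm in task_struct, and then the thread migrates back to the real-time kernel. The whole process is very expensive.

To avoid this, the real-time kernel implements the system call sc_cobalt_thread_setname, with a matching __RT(pthread_setname_np) in libcobalt. libcobalt passes the thread ID as the first argument of sc_cobalt_thread_setname. The context is real-time both before and after the call, so no switch between kernels is needed: the real-time kernel uses hkey to quickly find its scheduling entity cobalt_thread, obtains host_task from it, and modifies the comm member of host_task.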

//xenomai3.0.8\lib\cobalt\thread.c
COBALT_IMPL(int, pthread_setname_np, (pthread_t thread, const char *name))
{
return -XENOMAI_SYSCALL2(sc_cobalt_thread_setname, thread, name);
}

COBALT_SYSCALL(thread_setname, current,
	       (unsigned long pth, const char __user *u_name))
{
	struct cobalt_local_hkey hkey;
	struct cobalt_thread *thread;
	char name[XNOBJECT_NAME_LEN];
	struct task_struct *p;
......
	if (cobalt_strncpy_from_user(name, u_name,
				     sizeof(name) - 1) < 0)
......
	name[sizeof(name) - 1] = '\0';
	hkey.u_pth = pth;
	hkey.mm = current->mm;
......
	thread = thread_lookup(&hkey);
......
	p = xnthread_host_task(&thread->threadbase);
......
	knamecpy(p->comm, name);
......
	return 0;
}
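The idea behind the hkey lookup used by thread_setname above can be sketched in user space with a small chained hash table keyed on the (pthread_t, mm) pair. Everything here is hypothetical: the demo_ names, the bucket count, and the XOR-fold hash were chosen for brevity and are not what the Cobalt kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUCKETS 64

/* Hypothetical counterpart of struct cobalt_local_hkey. */
struct demo_hkey {
	unsigned long u_pth;	/* pthread_t value from user space */
	const void *mm;		/* stands in for struct mm_struct * */
};

struct demo_slot {
	struct demo_hkey key;
	void *thread;		/* stands in for struct cobalt_thread * */
	struct demo_slot *next;
};

static struct demo_slot *demo_table[DEMO_BUCKETS];

static unsigned int demo_hash(const struct demo_hkey *key)
{
	uintptr_t h = (uintptr_t)key->u_pth ^ (uintptr_t)key->mm;

	h ^= h >> 16;
	return (unsigned int)(h % DEMO_BUCKETS);
}

/* Return the thread registered under key, or NULL if there is none. */
void *demo_thread_lookup(const struct demo_hkey *key)
{
	struct demo_slot *s;

	for (s = demo_table[demo_hash(key)]; s != NULL; s = s->next)
		if (s->key.u_pth == key->u_pth && s->key.mm == key->mm)
			return s->thread;

	return NULL;
}

/* Register thread under key using caller-provided storage; -1 if taken. */
int demo_thread_hash(struct demo_slot *slot, const struct demo_hkey *key,
		     void *thread)
{
	unsigned int bucket;

	assert(slot != NULL && key != NULL && thread != NULL);

	if (demo_thread_lookup(key) != NULL)
		return -1;

	bucket = demo_hash(key);
	slot->key = *key;
	slot->thread = thread;
	slot->next = demo_table[bucket];
	demo_table[bucket] = slot;

	return 0;
}
```

Two processes can hold the same pthread_t value, and the mm member of the key is what keeps their entries apart.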

3.1 Initializing cobalt_thread->threadbase

Next, pthread_create(&thread, policy, param_ex, p) is called to create the real-time kernel scheduling entity cobalt_thread.

static int pthread_create(struct cobalt_thread **thread_p,
			  int policy,
			  const struct sched_param_ex *param_ex,
			  struct task_struct *task)
{
	struct xnsched_class *sched_class;
	union xnsched_policy_param param;
	struct xnthread_init_attr iattr;
	struct cobalt_thread *thread;
	xnticks_t tslice;
	int ret, n;
	spl_t s;

	thread = xnmalloc(sizeof(*thread));
......
	tslice = cobalt_time_slice; /*1000us *1000 */
	sched_class = cobalt_sched_policy_param(&param, policy,
						param_ex, &tslice); /* pick the scheduling class from the arguments and set the scheduling parameters */
......

	iattr.name = task->comm;
	iattr.flags = XNUSER|XNFPU;
	iattr.personality = &cobalt_personality;   /* cobalt thread */
	iattr.affinity = CPU_MASK_ALL;
	ret = xnthread_init(&thread->threadbase, &iattr, sched_class, &param); /* initialize the xnthread */

	thread->magic = COBALT_THREAD_MAGIC;
	xnsynch_init(&thread->monitor_synch, XNSYNCH_FIFO, NULL);

	xnsynch_init(&thread->sigwait, XNSYNCH_FIFO, NULL);
	sigemptyset(&thread->sigpending);
	for (n = 0; n < _NSIG; n++)
		INIT_LIST_HEAD(thread->sigqueues + n);

	xnthread_set_slice(&thread->threadbase, tslice); /* set the thread's time slice */
	cobalt_set_extref(&thread->extref, NULL, NULL);

	/*
	 * We need an anonymous registry entry to obtain a handle for
	 * fast mutex locking.
	*/
	ret = xnthread_register(&thread->threadbase, "");
    
	xnlock_get_irqsave(&nklock, s);
	list_add_tail(&thread->next, &cobalt_thread_list); /* add to the cobalt_thread_list list */
	xnlock_put_irqrestore(&nklock, s);

	thread->hkey.u_pth = 0;
	thread->hkey.mm = NULL;

	*thread_p = thread;

	return 0;
}

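First a cobalt_thread is allocated from cobalt_heap, a memory region managed by the Cobalt kernel that xenomai obtains from linux at initialization. cobalt_heap is analyzed later.

Next the scheduling class is chosen from the policy and parameters supplied by the user. By default only xnsched_class_rt is available; the other scheduling classes have to be enabled when the kernel is built. See section "11.2 Scheduling policies and scheduling classes".

The assignments to iattr (name, flags, personality, affinity) set the thread's initialization attributes first: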

struct xnthread_init_attr {
	struct xnthread_personality *personality;
	cpumask_t affinity;
	int flags;
	const char *name;
};

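The members of this structure are:

name: an ASCII string holding the symbolic name of the thread.
flags: a set of creation flags affecting the operation. The following flags can be part of this bitmask:
- XNSUSP creates the thread in a suspended state. In that case, in addition to calling xnthread_start() for it, the thread must be explicitly resumed with the xnthread_resume() service before it starts running. This flag can also be given as the start mode when calling xnthread_start().
- XNUSER should be set if the thread will be mapped onto an existing user-space task. Otherwise a new kernel task is created.
- XNFPU (enable FPU) tells Cobalt that the new thread may use the floating-point unit. XNFPU is implicitly assumed for user-space threads even if not set.
affinity: the processor affinity of this thread. Passing CPU_MASK_ALL allows the kernel to run it on any CPU. Passing an empty set is invalid.

xnthread_init->__xnthread_init() mainly initializes the member variables of the xnthread embedded in cobalt_thread.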

int __xnthread_init(struct xnthread *thread,
		    const struct xnthread_init_attr *attr,
		    struct xnsched *sched,
		    struct xnsched_class *sched_class,
		    const union xnsched_policy_param *sched_param)
{
	int flags = attr->flags, ret, gravity;
    ......
	thread->personality = attr->personality;/* xenomai_personality */
	cpumask_and(&thread->affinity, &attr->affinity, &cobalt_cpu_affinity);
	thread->sched = sched;
	thread->state = flags;/*(XNROOT | XNFPU)*//*XNUSER|XNFPU*/
	thread->info = 0;
	thread->local_info = 0;
	thread->lock_count = 0;
	thread->rrperiod = XN_INFINITE;//0
	thread->wchan = NULL;
	thread->wwake = NULL;
	thread->wcontext = NULL;
	thread->res_count = 0;
	thread->handle = XN_NO_HANDLE;
	memset(&thread->stat, 0, sizeof(thread->stat));
	thread->selector = NULL;
	INIT_LIST_HEAD(&thread->claimq);
	INIT_LIST_HEAD(&thread->glink);
	/* These will be filled by xnthread_start() */
	thread->entry = NULL;
	thread->cookie = NULL;
	init_completion(&thread->exited);
	memset(xnthread_archtcb(thread), 0, sizeof(struct xnarchtcb));

	/* initialize the timers embedded in the xnthread */
	gravity = flags & XNUSER ? XNTIMER_UGRAVITY : XNTIMER_KGRAVITY;
	xntimer_init(&thread->rtimer, &nkclock, timeout_handler,
		     sched, gravity);   /* create the thread timer */
	xntimer_set_name(&thread->rtimer, thread->name);
	xntimer_set_priority(&thread->rtimer, XNTIMER_HIPRIO);
	xntimer_init(&thread->ptimer, &nkclock, periodic_handler,
		     sched, gravity);   /* create the thread's periodic timer */
	xntimer_set_name(&thread->ptimer, thread->name);
	xntimer_set_priority(&thread->ptimer, XNTIMER_HIPRIO); /* set the timer priority */

	thread->base_class = NULL; /* xnsched_set_policy() will set it. */
	ret = xnsched_init_thread(thread);/**/
	if (ret)
		goto err_out;

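This initializes sched to the xnsched of the current CPU, affinity from attr->affinity, and the state flags to XNUSER|XNFPU, together with the two xntimers. Scheduling-related initialization comes next.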

ret = xnsched_set_policy(thread, sched_class, sched_param);

/* Must be called with nklock locked, interrupts off. */
int xnsched_set_policy(struct xnthread *thread,
		       struct xnsched_class *sched_class,
		       const union xnsched_policy_param *p)
{
	int ret;
	/*
	 * Declaring a thread to a new scheduling class may fail, so
	 * we do that early, while the thread is still a member of the
	 * previous class. However, this also means that the
	 * declaration callback shall not do anything that might
	 * affect the previous class (such as touching thread->rlink
	 * for instance).
	 */
	if (sched_class != thread->base_class) {
		ret = xnsched_declare(sched_class, thread, p);
		......
	}
	/*
	 * As a special case, we may be called from __xnthread_init()
	 * with no previous scheduling class at all.
	 */
	if (likely(thread->base_class != NULL)) {
		if (xnthread_test_state(thread, XNREADY))
			xnsched_dequeue(thread);

		if (sched_class != thread->base_class)
			xnsched_forget(thread);
	}

	thread->sched_class = sched_class;
	thread->base_class = sched_class;
	xnsched_setparam(thread, p);
	thread->bprio = thread->cprio;
	thread->wprio = thread->cprio + sched_class->weight;

	if (xnthread_test_state(thread, XNREADY))
		xnsched_enqueue(thread);

	if (!xnthread_test_state(thread, XNDORMANT))
		xnsched_set_resched(thread->sched);

	return 0;
}

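If the requested sched_class differs from base_class, the thread is first declared to the new sched_class. Then, if the thread already belongs to a scheduling class (base_class is not NULL) and is in the ready state, it is removed from the ready queue of base_class; and if sched_class differs from base_class, xnsched_forget is called to remove the thread from its old scheduling class. Once the thread has been removed from base_class, the assignments to thread->sched_class and thread->base_class install the new sched_class.

xnsched_setparam() then sets the thread's new priority and weighted priority according to the new sched_class. Unless the thread is currently boosted (XNBOOST), it also sets the XNWEAK state bit when the priority is 0 and clears it otherwise.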

static inline void xnsched_setparam(struct xnthread *thread,
				    const union xnsched_policy_param *p)
{
	struct xnsched_class *sched_class = thread->sched_class;

	if (sched_class != &xnsched_class_idle)
		__xnsched_rt_setparam(thread, p);
	else
		__xnsched_idle_setparam(thread, p);

	thread->wprio = thread->cprio + sched_class->weight;
}

static inline void __xnsched_rt_setparam(struct xnthread *thread,
					 const union xnsched_policy_param *p)
{
	thread->cprio = p->rt.prio;
	if (!xnthread_test_state(thread, XNBOOST)) {
		if (thread->cprio)
			xnthread_clear_state(thread, XNWEAK);
		else
			xnthread_set_state(thread, XNWEAK);
	}
}
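The priority bookkeeping performed by xnsched_set_policy() and __xnsched_rt_setparam() can be condensed into a small stand-alone model. This is only a sketch: the demo_ types, the flag values, and any class weight a caller passes in are hypothetical and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_XNWEAK	0x1u	/* hypothetical bit values */
#define DEMO_XNBOOST	0x2u

struct demo_sched_class {
	int weight;		/* offset added to form the weighted priority */
};

struct demo_thread {
	const struct demo_sched_class *sched_class;
	const struct demo_sched_class *base_class;
	int cprio;		/* current priority */
	int bprio;		/* base priority */
	int wprio;		/* weighted priority: cprio + class weight */
	unsigned int state;
};

/* Install a class and priority the way the listings above do. */
void demo_set_policy(struct demo_thread *t,
		     const struct demo_sched_class *sched_class, int prio)
{
	assert(t != NULL && sched_class != NULL);

	t->sched_class = sched_class;
	t->base_class = sched_class;

	t->cprio = prio;
	if (!(t->state & DEMO_XNBOOST)) {
		if (prio)
			t->state &= ~DEMO_XNWEAK;	/* non-zero priority */
		else
			t->state |= DEMO_XNWEAK;	/* priority 0 is "weak" */
	}

	t->bprio = t->cprio;
	t->wprio = t->cprio + sched_class->weight;
}
```

The class weight means that a thread in a heavier class always compares above every thread in a lighter class, whatever their individual priorities are.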

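Once this is done, and provided the thread is not in the XNDORMANT state, xnsched_set_resched() sets the XNRESCHED flag on the xnsched that the thread belongs to.

Back in pthread_create(), the signal-related members sigpending and sigwait of cobalt_thread are initialized next, along with the synchronization resource (xnsynch) monitor_synch; synchronization resources are analyzed in detail in "13 xenomai inter-thread synchronization". Then the default time slice is set and the round-robin timer rrbtimer is started.

Finally the cobalt_thread is added to the global list cobalt_thread_list.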

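3.2 Creating the shadow thread context for the user task

The kernel-side pthread_create function has essentially finished initializing the real-time scheduling entity, but it is not yet linked to the linux scheduling entity. In other words, the scheduling entity exists in the real-time kernel, but the real-time kernel knows nothing about where the user code of the real-time program lives. In addition, while the real-time task runs on the real-time kernel, its run state has to be reflected into linux space so that users can query it.

The entity scheduled by Cobalt is called a shadow of the task in linux space. The cobalt_map_user function maps a Xenomai thread onto a regular Linux task running in user space. The priority and scheduling class of the underlying Linux task are not affected.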

int cobalt_map_user(struct xnthread *thread, __u32 __user *u_winoff)

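The function takes two arguments. thread is the address of the descriptor of the new shadow thread to map onto current, that is, the xnthread; it must have been initialized beforehand by a call to xnthread_init(). u_winoff receives the offset, relative to the start of the global memory pool (cobalt_kernel_ppd.umm.heap), of the "u_window" structure associated with thread (for xenomai xnheap see "14 xenomai memory pool management"). libcobalt maps the start address of cobalt_kernel_ppd.umm.heap in the kernel to cobalt_umm_shared in user space, so user space can reach the thread's kernel-side "u_window" structure at cobalt_umm_shared + u_winoff. The thread state information visible from user space is obtained through this "u_window" structure over shared memory.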

	if (!xnthread_test_state(thread, XNUSER))
		return -EINVAL;

	if (xnthread_current() || xnthread_test_state(thread, XNMAPPED))
		return -EBUSY;

	if (!access_wok(u_winoff, sizeof(*u_winoff)))
		return -EFAULT;

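First the function checks that the thread is a user thread and returns an error if not. Then it checks whether the caller is already a Xenomai thread or thread is already mapped onto a task, since mapping twice is not allowed. Finally it checks that the user-space address u_winoff is writable and returns an error otherwise.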

    umm = &cobalt_kernel_ppd.umm;
	u_window = cobalt_umm_alloc(umm, sizeof(*u_window));
	if (u_window == NULL)
		return -ENOMEM;

	thread->u_window = u_window;
	__xn_put_user(cobalt_umm_offset(umm, u_window), u_winoff);

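A u_window structure is allocated from umm, the memory region managed by cobalt_kernel_ppd and shared with user space. Its address is stored in thread->u_window, its offset from the base address of umm is computed, and that offset is written to the user-space address u_winoff. The task_struct is handled next.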
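The base-plus-offset scheme can be illustrated with a toy pool. This is a sketch under loud assumptions: demo_pool stands in for the shared heap, demo_umm_alloc is a trivial bump allocator rather than the real heap allocator, struct demo_user_window has made-up fields, and the "user" and "kernel" views share one address here, whereas the real mappings live at different addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the per-thread user window. */
struct demo_user_window {
	uint32_t state;
	uint32_t info;
};

/* Toy shared pool; the real heap is mapped into user space by libcobalt. */
static _Alignas(max_align_t) unsigned char demo_pool[4096];
static size_t demo_pool_used;

/* "Kernel side": bump-allocate size bytes from the pool. */
void *demo_umm_alloc(size_t size)
{
	size_t align = _Alignof(max_align_t);
	size_t start = (demo_pool_used + align - 1) & ~(align - 1);

	if (size > sizeof(demo_pool) - start)
		return NULL;

	demo_pool_used = start + size;
	return demo_pool + start;
}

/* "Kernel side": offset of p from the pool base, as handed to user space. */
uint32_t demo_umm_offset(const void *p)
{
	const unsigned char *b = p;

	assert(b >= demo_pool && b < demo_pool + sizeof(demo_pool));
	return (uint32_t)(b - demo_pool);
}

/* "User side": turn (mapped base, offset) back into a usable pointer. */
struct demo_user_window *demo_window_from_offset(void *user_base,
						 uint32_t winoff)
{
	return (struct demo_user_window *)((unsigned char *)user_base + winoff);
}
```

Passing an offset instead of a pointer is what makes the scheme work across address spaces: the same bytes are reachable from each side by adding the offset to that side's own base address.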

	xnthread_init_shadow_tcb(thread);

xnthread_init_shadow_tcb(thread) saves the relevant fields of the task_struct managed by linux into thread->tcb. The tcb structures are as follows:

struct xntcb {
	struct task_struct *host_task; /* points to the task_struct managed by linux */
	struct thread_struct *tsp; /* task_struct->thread: registers to switch on a thread switch */
	struct mm_struct *mm; 		/* memory management of the user-space task, task_struct->mm */
	struct mm_struct *active_mm;
	struct thread_struct ts;
#ifdef CONFIG_XENO_ARCH_WANT_TIP
	struct thread_info *tip;   /* thread_info */
#endif
#ifdef CONFIG_XENO_ARCH_FPU
	struct task_struct *user_fpu_owner; /* floating-point context */
#endif
};

struct xnarchtcb {
	struct xntcb core;
#if LINUX_VERSION_CODE < KERNEL_VERSION(4,8,0)
	unsigned long sp;	
	unsigned long *spp;	
	unsigned long ip;
	unsigned long *ipp;
#endif  
#ifdef IPIPE_X86_FPU_EAGER
	struct fpu *kfpu;
#else
	x86_fpustate *fpup;
	unsigned int root_used_math: 1;
	x86_fpustate *kfpu_state;
#endif
	unsigned int root_kfpu: 1;
	struct {
		unsigned long ip;
		unsigned long ax;
	} mayday;
};

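task_struct has a member variable called thread, which holds the registers that must be changed when switching tasks. core.host_task points to the task_struct, core.tsp points to the thread member of task_struct, core.active_mm and core.mm both point to the mm of task_struct, and core.tip points to the thread_info of the task.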

	xnthread_suspend(thread, XNRELAX, XN_INFINITE, XN_RELATIVE, NULL);
	init_uthread_info(thread);
	xnthread_set_state(thread, XNMAPPED); /* XNMAPPED: the thread is mapped to a linux task */
	xnthread_run_handler(thread, map_thread);/*cobalt_thread_map*/
	ipipe_enable_notifier(current); /* sets TIP_NOTIFY in thread_info->flags */

This sets the TIP_NOTIFY bit in thread_info->flags.

Next the thread is started by calling xnthread_start(thread, &attr).

int xnthread_start(struct xnthread *thread,
		   const struct xnthread_start_attr *attr)
{
	spl_t s;
....
	thread->entry = attr->entry;
	thread->cookie = attr->cookie;
   .......
	if (xnthread_test_state(thread, XNUSER))
		enlist_new_thread(thread); /* add to the nkthreadq list */

	xnthread_resume(thread, XNDORMANT);
	xnsched_run();
	return 0;
}

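This sets the thread entry point entry and its argument cookie and adds the thread to the global queue nkthreadq. xnthread_resume() and xnsched_run() are then called, but given the current state flags neither of them does anything.

Control returns to __cobalt_thread_create(), which carries on.

3.3 Binding to the Cobalt kernel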

	if (!thread_hash(&hkey, thread, task_pid_vnr(p))) {
		ret = -EAGAIN;
		goto fail;
	}

	thread->hkey = hkey; /* includes the kernel mm */

	return xnthread_harden();

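hkey is added to local_thread_hash and global_thread_hash, and it is also saved in cobalt_thread->hkey.

At this point everything is initialized and the thread can be scheduled in the xenomai domain. Since it is a real-time thread with higher priority than linux, it should run first once created, so xnthread_harden() is called to migrate it to the head domain. This is analyzed in detail in "12 Task migration between the two kernels".

After xnthread_harden() returns, control goes back to user space and continues in the libcobalt function cobalt_thread_trampoline.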

ret = -XENOMAI_SYSCALL5(sc_cobalt_thread_create, ptid,
				policy, &param_ex, personality, &u_winoff);
	if (ret == 0)
		cobalt_set_tsd(u_winoff);
	/*
	 * We must access anything we'll need from *iargs before
	 * posting the sync semaphore, since our released parent could
	 * unwind the stack space onto which the iargs struct is laid
	 * on before we actually get the CPU back.
	*/
sync_with_creator:
	iargs->ret = ret;
	__STD(sem_post(&iargs->sync));
	if (ret)
		return (void *)ret;

	/*
	 * If the parent thread runs with the same priority as we do,
	 * then we should yield the CPU to it, to preserve the
	 * scheduling order.
	 */
	if (param_ex.sched_priority == parent_prio)
		__STD(sched_yield());

	cobalt_thread_harden();

	retval = start(arg); /* start running the real user thread function */

	pthread_setmode_np(PTHREAD_WARNSW, 0, NULL);

	return retval;
}

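A return value of 0 from the system call means the real-time thread was created successfully; cobalt_set_tsd then sets the thread-specific data (TSD, Thread-Specific Data).

A single-threaded program has two basic kinds of data: global variables and local variables. A multithreaded program has a third kind: thread-specific data. It behaves much like a global variable: inside a thread every function can use it as if it were global, but it is invisible to other threads. A familiar example is the variable errno, which carries standard error information. It clearly cannot be a local variable, since almost every function needs to be able to access it.

cobalt_set_tsd uses the system call sc_cobalt_get_current to obtain xnthread.handle from the kernel and sets the TSD from it together with u_winoff; the details are not expanded here. Note that at this point the thread is in the head domain.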
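For readers unfamiliar with TSD, here is a minimal sketch using the standard POSIX key API. It only illustrates the general mechanism; the demo_ names are hypothetical and nothing here claims to reproduce how cobalt_set_tsd stores its data.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, whichever thread gets here first. */
static void demo_make_key(void)
{
	int ret = pthread_key_create(&demo_key, NULL);

	assert(ret == 0);
	(void)ret;
}

/* Record a per-thread pointer, e.g. to that thread's own state window. */
void demo_set_current(void *window)
{
	pthread_once(&demo_once, demo_make_key);
	pthread_setspecific(demo_key, window);
}

/* Read back the calling thread's pointer; NULL if it never set one. */
void *demo_get_current(void)
{
	pthread_once(&demo_once, demo_make_key);
	return pthread_getspecific(demo_key);
}

/* Thread body: store its argument as TSD and return what it reads back. */
void *demo_worker(void *arg)
{
	demo_set_current(arg);
	return demo_get_current();
}
```

Each thread sees only the value it stored itself, so the main thread and a worker can use the same accessor functions without interfering with each other.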

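Next glibc's sem_post is called, which issues a linux system call and posts iargs->sync so that the blocked parent thread can continue. Calling a linux system service triggers a head->root migration; once the call has run and returned from Linux, the thread is in the root domain.

Because the thread is now in the root domain, cobalt_thread_harden() is called next. It issues the Cobalt kernel system call sc_cobalt_migrate (a dedicated system call exposed by the real-time kernel) to switch the thread back to Cobalt scheduling (binding it to the Cobalt kernel). The thread is now fully created. Once Cobalt schedules it, it returns to user space and starts executing the user-specified thread function start(arg) as a Cobalt thread.

User code may well call linux services, so many more head->root->head migrations will happen.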

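4 Summary

The main flow of Cobalt thread creation is therefore:

First, a linux task is created through the standard pthread_create(), with cobalt_thread_trampoline() as its entry point;
cobalt_thread_trampoline() issues a Cobalt kernel system call to create the Cobalt scheduling entity for the task;
cobalt_thread_harden() migrates the task to Cobalt kernel scheduling;
the real user task entry point, the start() function, is executed.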