MIT6.828學習筆記3(Lab3)

2022-12-08 06:00:22

Lab 3: User Environments

在這個lab中我們需要建立一個使用者環境(UNIX中的程序,它們的介面和實現不同),載入一個程式並執行,並使核心能夠處理一些常用的中斷請求。

Part A: User Environments and Exception Handling

kern/env.c中可以找到核心維護的三個全域性變數:

struct Env *envs = NULL;		// All environments
struct Env *curenv = NULL;		// The current env
static struct Env *env_free_list;	// Free environment list
  • envs是一個陣列,每個元素都是一個Env結構體。這個陣列儲存了所有的環境對應的結構體。
  • curenv指的是當前環境對應的Env
  • env_free_list儲存空閒的環境,可以通過env_free_list->env_link來存取下一項。

Environment State

inc/env.h中可以找到struct Env的定義:

struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run

	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};
  • env_tf當此環境未執行時,它儲存此環境的暫存器的值,在恢復執行時恢復暫存器的值。
  • env_link指向下一個空閒的環境。
  • env_id在環境建立時分配的獨有的識別符號。
  • env_parent_id在一個環境中建立一個新的環境時,當前環境就視為新環境的「parent」,新環境的env_parent_id就等於其「parent」的env_id
  • env_type環境的型別,大多數是ENV_TYPE_USER,在之後的lab中會出現更多型別。
  • env_status環境的狀態。
  • env_runs環境執行的次數。
  • env_pgdir環境對應的頁目錄的核心虛擬地址。
env_type:

ENV_FREE:
    Indicates that the Env structure is inactive, and therefore on the env_free_list.
ENV_RUNNABLE:
    Indicates that the Env structure represents an environment that is waiting to run on the processor.
ENV_RUNNING:
    Indicates that the Env structure represents the currently running environment.
ENV_NOT_RUNNABLE:
    Indicates that the Env structure represents a currently active environment, but it is not currently ready to run: for example, because it is waiting for an interprocess communication (IPC) from another environment.
ENV_DYING:
    Indicates that the Env structure represents a zombie environment. A zombie environment will be freed the next time it traps to the kernel. We will not use this flag until Lab 4. 

Allocating the Environments Array

envs陣列分配空間。
kern/pmap.c中:

	// Map the 'envs' array read-only by the user at linear address UENVS
	// (ie. perm = PTE_U | PTE_P).
	// Permissions:
	//    - the new image at UENVS  -- kernel R, user R
	//    - envs itself -- kernel RW, user NONE
	// LAB 3: Your code here.
	boot_map_region(kern_pgdir, UENVS, NENV * sizeof(struct Env), PADDR(envs), PTE_U | PTE_P);

Creating and Running Environments

由於當前還沒有檔案系統,因此二進位制檔案是以ELF可執行映像格式嵌入在核心中的。
我們需要建立好使用者環境,載入映像檔案並執行。

kern/env.c:

env_init()
    Initialize all of the Env structures in the envs array and add them to the env_free_list. Also calls env_init_percpu, which configures the segmentation hardware with separate segments for privilege level 0 (kernel) and privilege level 3 (user).
env_setup_vm()
    Allocate a page directory for a new environment and initialize the kernel portion of the new environment's address space.
region_alloc()
    Allocates and maps physical memory for an environment
load_icode()
    You will need to parse an ELF binary image, much like the boot loader already does, and load its contents into the user address space of a new environment.
env_create()
    Allocate an environment with env_alloc and call load_icode to load an ELF binary into it.
env_run()
    Start a given environment running in user mode. 

env_init:

// Mark all environments in 'envs' as free, set their env_ids to 0,
// and insert them into the env_free_list.
// Make sure the environments are in the free list in the same order
// they are in the envs array (i.e., so that the first call to
// env_alloc() returns envs[0]).
//
void
env_init(void)
{
	// Set up envs array
	// LAB 3: Your code here.
	env_free_list = NULL;
	for (int i = NENV - 1; i >= 0; i--) {
		envs[i].env_id = 0;
		envs[i].env_link = env_free_list;
		env_free_list = &envs[i];
	}
	// Per-CPU part of the initialization
	env_init_percpu();
}

初始化envs中所有的元素,注意連結串列的方向。

env_setup_vm:

// Initialize the kernel virtual memory layout for environment e.
// Allocate a page directory, set e->env_pgdir accordingly,
// and initialize the kernel portion of the new environment's address space.
// Do NOT (yet) map anything into the user portion
// of the environment's virtual address space.
//
// Returns 0 on success, < 0 on error.  Errors include:
//	-E_NO_MEM if page directory or table could not be allocated.
//
static int
env_setup_vm(struct Env *e)
{
	int i;
	struct PageInfo *p = NULL;

	// Allocate a page for the page directory
	if (!(p = page_alloc(ALLOC_ZERO)))
		return -E_NO_MEM;

	// Now, set e->env_pgdir and initialize the page directory.
	//
	// Hint:
	//    - The VA space of all envs is identical above UTOP
	//	(except at UVPT, which we've set below).
	//	See inc/memlayout.h for permissions and layout.
	//	Can you use kern_pgdir as a template?  Hint: Yes.
	//	(Make sure you got the permissions right in Lab 2.)
	//    - The initial VA below UTOP is empty.
	//    - You do not need to make any more calls to page_alloc.
	//    - Note: In general, pp_ref is not maintained for
	//	physical pages mapped only above UTOP, but env_pgdir
	//	is an exception -- you need to increment env_pgdir's
	//	pp_ref for env_free to work correctly.
	//    - The functions in kern/pmap.h are handy.

	// LAB 3: Your code here.
	p->pp_ref += 1;
	e->env_pgdir = (pde_t *)page2kva(p);
	memcpy(e->env_pgdir, kern_pgdir, PGSIZE);
	// UVPT maps the env's own page table read-only.
	// Permissions: kernel R, user R
	e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;

	return 0;
}

為環境的頁目錄分配一個記憶體頁。然後增加該記憶體頁的參照,將該頁的虛擬地址作為環境的頁目錄地址。然後初始化該頁目錄。UVPT對映到當前環境的頁目錄起始地址處。

region_alloc:

// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
	// LAB 3: Your code here.
	// (But only if you need it for load_icode.)
	//
	struct PageInfo *pp = NULL;
	void *up = ROUNDUP(va + len, PGSIZE);
	void *down = ROUNDDOWN(va, PGSIZE);
	while (down < up) {
		if((pp = page_alloc(0)) != NULL) {
			if(page_insert(e->env_pgdir, pp, down, PTE_U | PTE_W) < 0){
				panic("page_insert failed\n");
			}
		}
		else {
			panic("page_alloc failed\n");
		}
		down += PGSIZE;
	}
	// Hint: It is easier to use region_alloc if the caller can pass
	//   'va' and 'len' values that are not page-aligned.
	//   You should round va down, and round (va + len) up.
	//   (Watch out for corner-cases!)
}

為環境e從虛擬地址va起分配len位元組空間,需要注意的是地址對齊。

load_icode:

// Set up the initial program binary, stack, and processor flags
// for a user process.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
//
// This function loads all loadable segments from the ELF binary image
// into the environment's user memory, starting at the appropriate
// virtual addresses indicated in the ELF program header.
// At the same time it clears to zero any portions of these segments
// that are marked in the program header as being mapped
// but not actually present in the ELF file - i.e., the program's bss section.
//
// All this is very similar to what our boot loader does, except the boot
// loader also needs to read the code from disk.  Take a look at
// boot/main.c to get ideas.
//
// Finally, this function maps one page for the program's initial stack.
//
// load_icode panics if it encounters problems.
//  - How might load_icode fail?  What might be wrong with the given input?
//
static void
load_icode(struct Env *e, uint8_t *binary)
{
	// Hints:
	//  Load each program segment into virtual memory
	//  at the address specified in the ELF segment header.
	//  You should only load segments with ph->p_type == ELF_PROG_LOAD.
	//  Each segment's virtual address can be found in ph->p_va
	//  and its size in memory can be found in ph->p_memsz.
	//  The ph->p_filesz bytes from the ELF binary, starting at
	//  'binary + ph->p_offset', should be copied to virtual address
	//  ph->p_va.  Any remaining memory bytes should be cleared to zero.
	//  (The ELF header should have ph->p_filesz <= ph->p_memsz.)
	//  Use functions from the previous lab to allocate and map pages.
	//
	//  All page protection bits should be user read/write for now.
	//  ELF segments are not necessarily page-aligned, but you can
	//  assume for this function that no two segments will touch
	//  the same virtual page.
	//
	//  You may find a function like region_alloc useful.
	//
	//  Loading the segments is much simpler if you can move data
	//  directly into the virtual addresses stored in the ELF binary.
	//  So which page directory should be in force during
	//  this function?
	//
	//  You must also do something with the program's entry point,
	//  to make sure that the environment starts executing there.
	//  What?  (See env_run() and env_pop_tf() below.)

	// LAB 3: Your code here.
	struct Elf *ELFHDR = (struct Elf *)binary;
	struct Proghdr *ph = NULL, *eph = NULL;
	if (ELFHDR->e_magic != ELF_MAGIC) {
		panic("ELFHDR->e_magic != ELF_MAGIC\n");
	}
	ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
	eph = ph + ELFHDR->e_phnum;
	lcr3(PADDR(e->env_pgdir));
	for(; ph < eph; ph++){
		if(ph->p_type == ELF_PROG_LOAD){
			if(ph->p_filesz > ph->p_memsz){
				panic("ph->filesz > ph->memsz\n");
			}
			region_alloc(e, (void *)ph->p_va, ph->p_memsz);
			memcpy((void *)ph->p_va, (void *)(binary + ph->p_offset), ph->p_filesz);
			memset((void *)(ph->p_va + ph->p_filesz), 0, ph->p_memsz - ph->p_filesz);
		}

	}
	lcr3(PADDR(kern_pgdir));
	e->env_tf.tf_eip = ELFHDR->e_entry;
	// Now map one page for the program's initial stack
	// at virtual address USTACKTOP - PGSIZE.

	// LAB 3: Your code here.
	region_alloc(e, (void *)USTACKTOP - PGSIZE, PGSIZE);
}	

給定二進位制檔案的起始地址,我們需要將它載入進當前環境中。主要參考boot/main.c的實現,先讀取ELF標頭檔案,計算每個段的偏移量並讀取它們,因為是按頁讀取的,所以需要將多餘的部分重新初始化一下。 lcr3(PADDR(e->env_pgdir));設定使用使用者的線性地址空間,此時使用核心的也可以,因為當前環境與核心的頁目錄只有一項不同。之後將環境的執行起始地址設為可執行檔案的第一條指令。然後為程式分配一個棧。

env_create:

// Allocates a new env with env_alloc, loads the named elf
// binary into it with load_icode, and sets its env_type.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
// The new env's parent ID is set to 0.
//
void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	struct Env *e = NULL;
	if(env_alloc(&e, 0) == 0)
	{
		e->env_type = type;
		load_icode(e, binary);
	}
}

給定一個二進位制檔案和一個環境型別,我們要建立並初始化好這個環境。
直接調函數即可。env_alloc的實現也在kern/env.c中。該函數將建立好的環境儲存在e中。我們直接設定環境型別並載入檔案即可。

env_run:

// Context switch from curenv to env e.
// Note: if this is the first call to env_run, curenv is NULL.
//
// This function does not return.
//
void
env_run(struct Env *e)
{
	// Step 1: If this is a context switch (a new environment is running):
	//	   1. Set the current environment (if any) back to
	//	      ENV_RUNNABLE if it is ENV_RUNNING (think about
	//	      what other states it can be in),
	//	   2. Set 'curenv' to the new environment,
	//	   3. Set its status to ENV_RUNNING,
	//	   4. Update its 'env_runs' counter,
	//	   5. Use lcr3() to switch to its address space.
	// Step 2: Use env_pop_tf() to restore the environment's
	//	   registers and drop into user mode in the
	//	   environment.

	// Hint: This function loads the new environment's state from
	//	e->env_tf.  Go back through the code you wrote above
	//	and make sure you have set the relevant parts of
	//	e->env_tf to sensible values.

	// LAB 3: Your code here.
	if (curenv != NULL && curenv->env_status == ENV_RUNNING) {
		curenv->env_status = ENV_RUNNABLE;
	}
	curenv = e;
	e->env_status = ENV_RUNNING;
	e->env_runs += 1;
	lcr3(PADDR(e->env_pgdir));
	env_pop_tf(&(e->env_tf));
	// panic("env_run not yet implemented");
}

從當前環境切換到環境e。設定當前環境和環境e的狀態,增加環境e的執行次數,更新全域性變數curenv,載入環境e的地址空間。恢復儲存的暫存器值。

env_pop_tf:

// Restores the register values in the Trapframe with the 'iret' instruction.
// This exits the kernel and starts executing some environment's code.
//
// This function does not return.
//
void
env_pop_tf(struct Trapframe *tf)
{
	asm volatile(
		"\tmovl %0,%%esp\n"
		"\tpopal\n"
		"\tpopl %%es\n"
		"\tpopl %%ds\n"
		"\taddl $0x8,%%esp\n" /* skip tf_trapno and tf_errcode */
		"\tiret\n"
		: : "g" (tf) : "memory");
	panic("iret failed");  /* mostly to placate the compiler */
}

這個函數接受一個struct Trapframe為引數,這個結構的定義在inc/trap.h中:

struct PushRegs {
	/* registers as pushed by pusha */
	uint32_t reg_edi;
	uint32_t reg_esi;
	uint32_t reg_ebp;
	uint32_t reg_oesp;		/* Useless */
	uint32_t reg_ebx;
	uint32_t reg_edx;
	uint32_t reg_ecx;
	uint32_t reg_eax;
} __attribute__((packed));

struct Trapframe {
	struct PushRegs tf_regs;
	uint16_t tf_es;
	uint16_t tf_padding1;
	uint16_t tf_ds;
	uint16_t tf_padding2;
	uint32_t tf_trapno;
	/* below here defined by x86 hardware */
	uint32_t tf_err;
	uintptr_t tf_eip;
	uint16_t tf_cs;
	uint16_t tf_padding3;
	uint32_t tf_eflags;
	/* below here only when crossing rings, such as from user to kernel */
	uintptr_t tf_esp;
	uint16_t tf_ss;
	uint16_t tf_padding4;
} __attribute__((packed));

再看看這個函數的組合程式碼:

f010399a:	55                   	push   %ebp
f010399b:	89 e5                	mov    %esp,%ebp
f010399d:	53                   	push   %ebx
f010399e:	83 ec 08             	sub    $0x8,%esp
f01039a1:	e8 c1 c7 ff ff       	call   f0100167 <__x86.get_pc_thunk.bx>
f01039a6:	81 c3 7a 96 08 00    	add    $0x8967a,%ebx
	asm volatile(
f01039ac:	8b 65 08             	mov    0x8(%ebp),%esp
f01039af:	61                   	popa   
f01039b0:	07                   	pop    %es
f01039b1:	1f                   	pop    %ds
f01039b2:	83 c4 08             	add    $0x8,%esp
f01039b5:	cf                   	iret   
		"\tpopl %%es\n"
		"\tpopl %%ds\n"
		"\taddl $0x8,%%esp\n" /* skip tf_trapno and tf_errcode */
		"\tiret\n"
		: : "g" (tf) : "memory");
	panic("iret failed");  /* mostly to placate the compiler */
f01039b6:	8d 83 e9 95 f7 ff    	lea    -0x86a17(%ebx),%eax
f01039bc:	50                   	push   %eax
f01039bd:	68 dd 01 00 00       	push   $0x1dd
f01039c2:	8d 83 6a 95 f7 ff    	lea    -0x86a96(%ebx),%eax
f01039c8:	50                   	push   %eax
f01039c9:	e8 e3 c6 ff ff       	call   f01000b1 <_panic>

tf_regs儲存了當前環境的暫存器的值,因為儲存的暫存器為Trapframe的第一項,因此直接將tf設當前的棧指標,然後通過popa指令恢復通用暫存器的值。然後恢復暫存器esds的值。通過add $0x8,%esp跳過tf_errtf_eip。然後執行iret指令,該指令會載入cs,eflags,esp,ss等的值。
我們可以通過GDB偵錯檢視一下執行iret執行前後各個暫存器的值:

(gdb) b *0xf01039b5
Breakpoint 1 at 0xf01039b5: file kern/env.c, line 469.
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0xf01039b5 <env_pop_tf+27>:	iret   

Breakpoint 1, 0xf01039b5 in env_pop_tf (
    tf=<error reading variable: Unknown argument list address for `tf'.>)
    at kern/env.c:469
469		asm volatile(
(gdb) info registers
eax            0x0	0
ecx            0x0	0
edx            0x0	0
ebx            0x0	0
esp            0xf01d2030	0xf01d2030
ebp            0x0	0x0
esi            0x0	0
edi            0x0	0
eip            0xf01039b5	0xf01039b5 <env_pop_tf+27>
eflags         0x96	[ PF AF SF ]
cs             0x8	8
ss             0x10	16
ds             0x23	35
es             0x23	35
fs             0x23	35
gs             0x23	35
(gdb) si
=> 0x800020:	cmp    $0xeebfe000,%esp
0x00800020 in ?? ()
(gdb) info registers
eax            0x0	0
ecx            0x0	0
edx            0x0	0
ebx            0x0	0
esp            0xeebfe000	0xeebfe000
ebp            0x0	0x0
esi            0x0	0
edi            0x0	0
eip            0x800020	0x800020
eflags         0x2	[ ]
cs             0x1b	27
ss             0x23	35
ds             0x23	35
es             0x23	35
fs             0x23	35
gs             0x23	35

可以看到,執行iret後cs,eflags,esp,ss的值已經改變了,他們的值定義在env_alloc函數裡面:

……
	// Set up appropriate initial values for the segment registers.
	// GD_UD is the user data segment selector in the GDT, and
	// GD_UT is the user text segment selector (see inc/memlayout.h).
	// The low 2 bits of each segment register contains the
	// Requestor Privilege Level (RPL); 3 means user mode.  When
	// we switch privilege levels, the hardware does various
	// checks involving the RPL and the Descriptor Privilege Level
	// (DPL) stored in the descriptors themselves.
	e->env_tf.tf_ds = GD_UD | 3;
	e->env_tf.tf_es = GD_UD | 3;
	e->env_tf.tf_ss = GD_UD | 3;
	e->env_tf.tf_esp = USTACKTOP;
	e->env_tf.tf_cs = GD_UT | 3;
……

執行完iret指令後,eip的值變為0x800020,這是hello程式第一條指令的位置。之後就是在使用者環境裡執行剛剛載入的程式了。

Handling Interrupts and Exceptions

hello程式裡,我們會遇到int $0x30這樣一條指令。該指令是一個系統呼叫,將字元顯示到控制檯。

我們現在需要處理使用者發出的系統呼叫請求。

前置芝士:Chapter 9 Exceptions and Interrupts

Basics of Protected Control Transfer

異常和中斷都是受保護的控制轉移,處理器從使用者模式轉向核心模式。中斷通常是由外部裝置引發的,異常則是由當前執行的程式碼產生的。為了使這兩種控制轉移確實是受保護的(受限的),x86提供了兩種機制:

  • The Interrupt Descriptor Table。產生中斷或異常後,處理器執行的指令不能是隨意的,應當由核心規定,並且使用者無法修改。不同的中斷訊號所對應的中斷向量號不同,每個都具有獨特的處理程式。這個中斷向量號可以作為IDT的索引。在描述符中有處理程式的地址和cs暫存器的值。
  • The Task State Segment。在進入處理程式之前,我們需要儲存當前環境的狀態。而且儲存的環境狀態不能受到其他使用者環境的干擾,否則可能會危害到核心。因此在從使用者模式轉到核心模式時,我們會使用核心的棧來儲存環境狀態。

A structure called the task state segment (TSS) specifies the segment selector and address where this stack lives. The processor pushes (on this new stack) SS, ESP, EFLAGS, CS, EIP, and an optional error code. Then it loads the CS and EIP from the interrupt descriptor, and sets the ESP and SS to refer to the new stack.

Types of Exceptions and Interrupts

x86可以產生的同步異常的中斷向量號為 0-31。對應IDT的第0-31項。

Nested Exceptions and Interrupts

在核心模式出現異常或者中斷時,不需要進行棧的切換,也就不需要儲存當前環境的SS和ESP暫存器。

 The processor can take exceptions and interrupts both from kernel and user mode. It is only when entering the kernel from user mode, however, that the x86 processor automatically switches stacks before pushing its old register state onto the stack and invoking the appropriate exception handler through the IDT. If the processor is already in kernel mode when the interrupt or exception occurs (the low 2 bits of the CS register are already zero), then the CPU just pushes more values on the same kernel stack. In this way, the kernel can gracefully handle nested exceptions caused by code within the kernel itself. This capability is an important tool in implementing protection, as we will see later in the section on system calls.

If the processor is already in kernel mode and takes a nested exception, since it does not need to switch stacks, it does not save the old SS or ESP registers. For exception types that do not push an error code, the kernel stack therefore looks like the following on entry to the exception handler:

                     +--------------------+ <---- old ESP
                     |     old EFLAGS     |     " - 4
                     | 0x00000 | old CS   |     " - 8
                     |      old EIP       |     " - 12
                     +--------------------+             

For exception types that push an error code, the processor pushes the error code immediately after the old EIP, as before.

There is one important caveat to the processor's nested exception capability. If the processor takes an exception while already in kernel mode, and cannot push its old state onto the kernel stack for any reason such as lack of stack space, then there is nothing the processor can do to recover, so it simply resets itself. Needless to say, the kernel should be designed so that this can't happen. 

Setting Up the IDT

trapentry.S有兩個宏:

/* TRAPHANDLER defines a globally-visible function for handling a trap.
 * It pushes a trap number onto the stack, then jumps to _alltraps.
 * Use TRAPHANDLER for traps where the CPU automatically pushes an error code.
 *
 * You shouldn't call a TRAPHANDLER function from C, but you may
 * need to _declare_ one in C (for instance, to get a function pointer
 * during IDT setup).  You can declare the function with
 *   void NAME();
 * where NAME is the argument passed to TRAPHANDLER.
 */
#define TRAPHANDLER(name, num)						\
	.globl name;		/* define global symbol for 'name' */	\
	.type name, @function;	/* symbol type is function */		\
	.align 2;		/* align function definition */		\
	name:			/* function starts here */		\
	pushl $(num);							\
	jmp _alltraps

/* Use TRAPHANDLER_NOEC for traps where the CPU doesn't push an error code.
 * It pushes a 0 in place of the error code, so the trap frame has the same
 * format in either case.
 */
#define TRAPHANDLER_NOEC(name, num)					\
	.globl name;							\
	.type name, @function;						\
	.align 2;							\
	name:								\
	pushl $0;							\
	pushl $(num);							\
	jmp _alltraps

這兩個宏向棧內壓入一個數位num,然後跳轉到_alltraps。TRPHANDLER_NOEC會向棧內多壓一個0,這是為那些沒有error code的中斷和異常準備的,使所有中斷和異常具有相同的格式,方便後續處理。

我們現在需要設定IDT表。在trapentry.S中利用這兩個宏來定義我們的處理程式:

TRAPHANDLER_NOEC(DIVIDE_HANDLER, T_DIVIDE);
TRAPHANDLER_NOEC(DEBUG_HANDLER, T_DEBUG);
TRAPHANDLER_NOEC(NMI_HANDLER, T_NMI);
TRAPHANDLER_NOEC(BRKPT_HANDLER, T_BRKPT);
TRAPHANDLER_NOEC(OFLOW_HANDLER, T_OFLOW);
TRAPHANDLER_NOEC(BOUND_HANDLER, T_BOUND);
TRAPHANDLER_NOEC(ILLOP_HANDLER, T_ILLOP);
TRAPHANDLER_NOEC(DEVICE_HANDLER, T_DEVICE);
TRAPHANDLER(DBLFLT_HANDLER, T_DBLFLT);
/* reserved */
TRAPHANDLER(TSS_HANDLER, T_TSS);
TRAPHANDLER(SEGNP_HANDLER, T_SEGNP);
TRAPHANDLER(STACK_HANDLER, T_STACK);
TRAPHANDLER(GPFLT_HANDLER, T_GPFLT);
TRAPHANDLER(PGFLT_HANDLER, T_PGFLT);
/* reserved */
TRAPHANDLER_NOEC(FPERR_HANDLER, T_FPERR);
TRAPHANDLER(ALIGN_HANDLER, T_ALIGN);
TRAPHANDLER_NOEC(MCHK_HANDLER, T_MCHK);
TRAPHANDLER_NOEC(SIMDERR_HANDLER, T_SIMDERR);

然後在trap.c中定義我們的處理程式,然後使用SETGATE載入IDT。

……
	// LAB 3: Your code here.
	void DIVIDE_HANDLER();
	void DEBUG_HANDLER();
	void NMI_HANDLER();
	void BRKPT_HANDLER();
	void OFLOW_HANDLER();
	void BOUND_HANDLER();
	void ILLOP_HANDLER();
	void DEVICE_HANDLER();
	void DBLFLT_HANDLER();
	/* T_COPROC 9 reserved */
	void TSS_HANDLER();
	void SEGNP_HANDLER();
	void STACK_HANDLER();
	void GPFLT_HANDLER();
	void PGFLT_HANDLER();
	/* T_RES 15 reserved */
	void FPERR_HANDLER();
	void ALIGN_HANDLER();
	void MCHK_HANDLER();
	void SIMDERR_HANDLER();


	SETGATE(idt[T_DIVIDE], 0, GD_KT, DIVIDE_HANDLER, 0);
	SETGATE(idt[T_DEBUG], 0, GD_KT, DEBUG_HANDLER, 0);
	SETGATE(idt[T_NMI], 0, GD_KT, NMI_HANDLER, 0);
	SETGATE(idt[T_BRKPT], 0, GD_KT, BRKPT_HANDLER, 0);
	SETGATE(idt[T_OFLOW], 0, GD_KT, OFLOW_HANDLER, 0);
	SETGATE(idt[T_BOUND], 0, GD_KT, BOUND_HANDLER, 0);
	SETGATE(idt[T_ILLOP], 0, GD_KT, ILLOP_HANDLER, 0);
	SETGATE(idt[T_DEVICE], 0, GD_KT, DEVICE_HANDLER, 0);
	SETGATE(idt[T_DBLFLT], 0, GD_KT, DBLFLT_HANDLER, 0);
	/* reserved */
	SETGATE(idt[T_TSS], 0, GD_KT, TSS_HANDLER, 0);
	SETGATE(idt[T_SEGNP], 0, GD_KT, SEGNP_HANDLER, 0);
	SETGATE(idt[T_STACK], 0, GD_KT, STACK_HANDLER, 0);
	SETGATE(idt[T_GPFLT], 0, GD_KT, GPFLT_HANDLER, 0);
	SETGATE(idt[T_PGFLT], 0, GD_KT, PGFLT_HANDLER, 0);
	/* reserved */
	SETGATE(idt[T_FPERR], 0, GD_KT, FPERR_HANDLER, 0);
	SETGATE(idt[T_ALIGN], 0, GD_KT, ALIGN_HANDLER, 0);
	SETGATE(idt[T_MCHK], 0, GD_KT, MCHK_HANDLER, 0);
	SETGATE(idt[T_SIMDERR], 0, GD_KT, SIMDERR_HANDLER, 0);
……

現在在trapentry.S中實現_alltraps:

Your _alltraps should:

  • push values to make the stack look like a struct Trapframe
  • load GD_KD into %ds and %es
  • pushl %esp to pass a pointer to the Trapframe as an argument to trap()
  • call trap (can trap ever return?)
_alltraps:
	pushl %ds
	pushl %es
	pushal
	movw $GD_KD, %ax 
	movw %ax, %ds
	movw %ax, %es
	pushl %esp
	call trap

如果是從使用者模式轉向核心模式,那麼此時棧內的元素已經有ss,esp,eflags,cs,eip和可選的err。我們需要手動儲存ds和es,然後儲存通用暫存器。這個和env_pop_tf的順序剛好相反。然後呼叫trap函數進行處理。


Part B: Page Faults, Breakpoints Exceptions, and System Calls

Handling Page Faults

static void
trap_dispatch(struct Trapframe *tf)
{
	// Handle processor exceptions.
	// LAB 3: Your code here.
	switch(tf->tf_trapno) {
		case T_PGFLT:
			page_fault_handler(tf);
			return;
		default:
			break;
	}
	// Unexpected trap: The user process or the kernel has a bug.
	print_trapframe(tf);
	if (tf->tf_cs == GD_KT)
		panic("unhandled trap in kernel");
	else {
		env_destroy(curenv);
		return;
	}
}

根據tf->tf_trapno來選擇對應的處理程式。

The Breakpoint Exception

		case T_BRKPT:
			monitor(tf);
			return;

在switch中新增一項即可。修改 SETGATE(idt[T_BRKPT], 0, GD_KT, BRKPT_HANDLER, 0);的最後一項引數為3。因為該函數的最後一項引數為描述符的特權等級,此處應該設定為使用者所在的等級。

System calls

增加一項系統呼叫對應的處理程式。需要在trapentry.S中新增:

TRAPHANDLER_NOEC(SYSCALL_HANDLER, T_SYSCALL);

trap_init中新增:

	void SYSCALL_HANDLER();
	SETGATE(idt[T_SYSCALL], 0, GD_KT, SYSCALL_HANDLER, 3);

在switch中新增:

case T_SYSCALL:
			tf->tf_regs.reg_eax = syscall(tf->tf_regs.reg_eax, 
										tf->tf_regs.reg_edx, 
										tf->tf_regs.reg_ecx,
										tf->tf_regs.reg_ebx, 
										tf->tf_regs.reg_edi,
					 					tf->tf_regs.reg_esi);
			return;

要注意系統呼叫的返回值儲存在eax暫存器中。

syscall:

// Dispatches to the correct kernel function, passing the arguments.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	// Call the function corresponding to the 'syscallno' parameter.
	// Return any appropriate return value.
	// LAB 3: Your code here.

	// panic("syscall not implemented");
	int32_t result;
	switch (syscallno) {
		case SYS_cgetc :
			result = sys_cgetc();
			break;
		case SYS_cputs :
			sys_cputs((char *)a1, a2);
			result = 0;
			break;
		case SYS_env_destroy :
			result = sys_env_destroy((envid_t)a1);
			break;
		case SYS_getenvid :
			result = sys_getenvid();
			break;
		default:
			result = -E_INVAL;
	}
	return result;
}

User-mode startup

void
libmain(int argc, char **argv)
{
	// set thisenv to point at our Env structure in envs[].
	// LAB 3: Your code here.
	thisenv = &envs[ENVX(sys_getenvid())];
	// save the name of the program so that panic() can use it
	if (argc > 0)
		binaryname = argv[0];

	// call user main routine
	umain(argc, argv);

	// exit gracefully
	exit();
}

Page faults and memory protection

page_fault_handler

void
page_fault_handler(struct Trapframe *tf)
{
	uint32_t fault_va;

	// Read processor's CR2 register to find the faulting address
	fault_va = rcr2();

	// Handle kernel-mode page faults.
	if ((tf->tf_cs & 3) == 0) {
		panic("page-fault in kernel!\n");
	}
	// LAB 3: Your code here.

	// We've already handled kernel-mode exceptions, so if we get here,
	// the page fault happened in user mode.

	// Destroy the environment that caused the fault.
	cprintf("[%08x] user fault va %08x ip %08x\n",
		curenv->env_id, fault_va, tf->tf_eip);
	print_trapframe(tf);
	env_destroy(curenv);
}

根據cs的低2位來判斷,00為核心模式。
user_mem_check

// Check that an environment is allowed to access the range of memory
// [va, va+len) with permissions 'perm | PTE_P'.
// Normally 'perm' will contain PTE_U at least, but this is not required.
// 'va' and 'len' need not be page-aligned; you must test every page that
// contains any of that range.  You will test either 'len/PGSIZE',
// 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
//
// A user program can access a virtual address if (1) the address is below
// ULIM, and (2) the page table gives it permission.  These are exactly
// the tests you should implement here.
//
// If there is an error, set the 'user_mem_check_addr' variable to the first
// erroneous virtual address.
//
// Returns 0 if the user program can access this range of addresses,
// and -E_FAULT otherwise.
//
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
	// LAB 3: Your code here.
	int result = 0;
	uint32_t cva = (uint32_t)va;
	void *down = ROUNDDOWN((void *)cva, PGSIZE);
	void *up = ROUNDUP((void *)cva + len, PGSIZE);
	for (; down < up; down += PGSIZE) {
		if((uint32_t)down >= ULIM) {
			user_mem_check_addr = (uint32_t)down;
			if((uint32_t)down < cva) user_mem_check_addr = cva;
			result = -E_FAULT;
			break;
		}
		pte_t *pte = pgdir_walk(env->env_pgdir, down, 0);
		if(!pte || ((*pte) & (perm | PTE_P)) != (perm | PTE_P)){
			user_mem_check_addr = (uint32_t)down;
			if((uint32_t)down < cva) user_mem_check_addr = cva;
			result = -E_FAULT;
			break;
		}

	}
	return result;
}

根據註釋要求寫即可。

// Print a string to the system console.
// The string is exactly 'len' characters long.
// Destroys the environment on memory errors.
static void
sys_cputs(const char *s, size_t len)
{
	// Check that the user has permission to read memory [s, s+len).
	// Destroy the environment if not.

	// LAB 3: Your code here.
	user_mem_assert(curenv, s, len, PTE_U);
	// Print the string supplied by the user.
	cprintf("%.*s", len, s);
}

make grade

divzero: OK (1.3s) 
softint: OK (2.0s) 
badsegment: OK (1.0s) 
Part A score: 30/30

faultread: OK (2.0s) 
faultreadkernel: OK (1.1s) 
faultwrite: OK (1.9s) 
faultwritekernel: OK (1.7s) 
breakpoint: OK (1.3s) 
testbss: OK (2.0s) 
hello: OK (1.9s) 
buggyhello: OK (1.1s) 
buggyhello2: OK (1.0s) 
evilhello: OK (1.6s) 
Part B score: 50/50

Score: 80/80