MIT 6.S081 Lab3: Page tables

1. Print a page table

1.1 Description

Define a function called vmprint(). It should take a pagetable_t argument, and print that pagetable in the format described below. Insert if(p->pid==1) vmprint(p->pagetable) in exec.c just before the return argc, to print the first process’s page table. You receive full credit for this assignment if you pass the pte printout test of make grade.

1.2 Implementation

  • vmprint函数添加到kernel/vm.c文件中并在kernel/defs.h文件中添加该函数的声明。使用格式符%p打印16进制数,使用kernel/riscv.h文件中定义的宏, 参考freewalk函数以递归的形式完成。实现该函数方便接下来的调试, 可以在gdb中测试其正确性。
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    void vmprint(pagetable_t pagetable, int depth) {
    if (depth == 0)
    printf("page table %p\n", pagetable);
    if (depth == 3) // terminate condition.
    return;
    for (int i = 0; i < 512; ++i) {
    pte_t pte = pagetable[i];
    if (pte & PTE_V) {
    uint64 pa = PTE2PA(pte);
    for (int j = 0; j < depth; ++j) {
    printf(".. ");
    }
    printf("..%d: pte %p pa %p\n", i, pte, pa);
    vmprint((pagetable_t)pa, ++depth); // recursive.
    pagetable[i] = 0;
    }
    }
    }

2. A kernel page table per process

2.1 Description

Your first job is to modify the kernel so that every process uses its own copy of the kernel page table when executing in the kernel. Modify struct proc to maintain a kernel page table for each process, and modify the scheduler to switch kernel page tables when switching processes. For this step, each per-process kernel page table should be identical to the existing global kernel page table. You pass this part of the lab if usertests runs correctly.

  • Add a field to struct proc for the process’s kernel page table.

  • A reasonable way to produce a kernel page table for a new process is to implement a modified version of kvminit that makes a new page table instead of modifying kernel_pagetable. You’ll want to call this function from allocproc.

  • Make sure that each process’s kernel page table has a mapping for that process’s kernel stack. In unmodified xv6, all the kernel stacks are set up in procinit. You will need to move some or all of this functionality to allocproc.

  • Modify scheduler() to load the process’s kernel page table into the core’s satp register (see kvminithart for inspiration). Don’t forget to call sfence_vma() after calling w_satp().

  • scheduler() should use kernel_pagetable when no process is running.

  • Free a process’s kernel page table in freeproc.

  • You’ll need a way to free a page table without also freeing the leaf physical memory pages.

  • vmprint may come in handy to debug page tables.

  • It’s OK to modify xv6 functions or add new functions; you’ll probably need to do this in at least kernel/vm.c and kernel/proc.c. (But, don’t modify kernel/vmcopyin.c, kernel/stats.c, user/usertests.c, and user/stats.c.)

  • A missing page table mapping will likely cause the kernel to encounter a page fault. It will print an error that includes sepc=0x00000000XXXXXXXX. You can find out where the fault occurred by searching for XXXXXXXX in kernel/kernel.asm.

    2.2 Implementation

  • proc.h进程的结构体中加入内核页表的属性pkpagetable

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    struct proc {
    struct spinlock lock;

    // p->lock must be held when using these:
    enum procstate state; // Process state
    struct proc *parent; // Parent process
    void *chan; // If non-zero, sleeping on chan
    int killed; // If non-zero, have been killed
    int xstate; // Exit status to be returned to parent's wait
    int pid; // Process ID

    // these are private to the process, so p->lock need not be held.
    uint64 kstack; // Virtual address of kernel stack
    uint64 sz; // Size of process memory (bytes)
    pagetable_t pagetable; // User page table
    pagetable_t pkpagetable; // process's kernel page table
    struct trapframe *trapframe; // data page for trampoline.S
    struct context context; // swtch() here to run process
    struct file *ofile[NOFILE]; // Open files
    struct inode *cwd; // Current directory
    char name[16]; // Process name (debugging)
    };
  • 实现kvminit函数的另一个版本来初始化进程的内核页表。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    // create processes's kernel table
    pagetable_t
    ukvmcreate()
    {
    pagetable_t kpagetable = (pagetable_t) kalloc();
    memset(kpagetable, 0, PGSIZE);
    uvmmap(kpagetable, UART0, UART0, PGSIZE, PTE_R | PTE_W);
    uvmmap(kpagetable, VIRTIO0, VIRTIO0, PGSIZE, PTE_R | PTE_W);
    uvmmap(kpagetable, CLINT, CLINT, 0x10000, PTE_R | PTE_W);
    uvmmap(kpagetable, PLIC, PLIC, 0x400000, PTE_R | PTE_W);
    uvmmap(kpagetable, KERNBASE, KERNBASE, (uint64)etext-KERNBASE, PTE_R | PTE_X);
    uvmmap(kpagetable, (uint64)etext, (uint64)etext, PHYSTOP-(uint64)etext, PTE_R | PTE_W);
    uvmmap(kpagetable, TRAMPOLINE, (uint64)trampoline, PGSIZE, PTE_R | PTE_X);
    return kpagetable;
    }
  • 以及专门将映射加入进程的内核页表的函数uvmmap

    1
    2
    3
    4
    5
    6
    7
    8
    // add a mapping to processes's kernel
    // page table
    void
    uvmmap(pagetable_t pagetable, uint64 va, uint64 pa, uint64 sz, int perm)
    {
    if(mappages(pagetable, va, sz, pa, perm) != 0)
    panic("uvmmap");
    }
  • 确保每个进程的内核页表中包含该进程所使用到的内核栈的映射。可以将原来在boot time进程初始化procinit函数中全局内核页表映射内核栈的代码注释掉。应题目要求在allocproc函数中实现映射当前进程所对应的内核栈。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    // Allocate a page for the process's kernel stack.
    // Map it high in memory, followed by an invalid
    // guard page.
    char *pa = kalloc(); // allocate physical memory per page.
    if(pa == 0)
    panic("kalloc");
    uint64 va = KSTACK((int) (p - proc)); // virtual address.
    // make sure each process's kernel page table has a mapping
    // for that process's kernel stack.
    uvmmap(p->pkpagetable, va, (uint64)pa, PGSIZE, PTE_R | PTE_W);
    p->kstack = va;
  • scheduler函数内实现,模仿kvminithart函数在调度时切换页表,即将进程的内核页表的地址放入satp寄存器,相应地刷新TLB,当进程没有在运行时,调度器切换回全局内核页表。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    // Per-CPU process scheduler.
    // Each CPU calls scheduler() after setting itself up.
    // Scheduler never returns. It loops, doing:
    // - choose a process to run.
    // - swtch to start running that process.
    // - eventually that process transfers control
    // via swtch back to the scheduler.
    void
    scheduler(void)
    {
    struct proc *p;
    struct cpu *c = mycpu();

    c->proc = 0;
    for(;;){
    // Avoid deadlock by ensuring that devices can interrupt.
    intr_on();

    int found = 0;
    for(p = proc; p < &proc[NPROC]; p++) {
    acquire(&p->lock);
    if(p->state == RUNNABLE) {
    // Switch to chosen process. It is the process's job
    // to release its lock and then reacquire it
    // before jumping back to us.
    p->state = RUNNING;
    c->proc = p;

    // load the process's kernel page table into the core's satp register.
    w_satp(MAKE_SATP(p->pkpagetable));
    sfence_vma(); // flush the TLB.

    swtch(&c->context, &p->context);
    // use kernel_pagetable when no process is running
    kvminithart();

    // Process is done running for now.
    // It should have changed its p->state before coming back.
    c->proc = 0;

    found = 1;
    }
    release(&p->lock);
    }
    #if !defined (LAB_FS)
    if(found == 0) {
    intr_on();
    asm volatile("wfi"); // wait for interrupt.
    }
    #else
    ;
    #endif
    }
    }
  • freeproc函数中完成释放进程的内核页表操作, 同时实现函数freeukpagetable, 在释放页表时不释放掉leaf

    1
    2
    3
    if(p->pkpagetable) {
    freeukpagetable(p->pkpagetable);
    }
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    void
    freeukpagetable(pagetable_t pagetable) {
    // there are 2^9 = 512 PTEs in a page table.
    for(int i = 0; i < 512; i++){
    pte_t pte = pagetable[i];
    if (pte & PTE_V) {
    pagetable[i] = 0;
    if ((pte & (PTE_R|PTE_W|PTE_X)) == 0){ // not leaf.
    // this PTE points to a lower-level page table.
    uint64 child = PTE2PA(pte);
    freeukpagetable((pagetable_t)child);
    }
    }
    }
    kfree((void*)pagetable);
    }

    3. Simplify

    3.1 Description

    Your job in this part of the lab is to add user mappings to each process’s kernel page table (created in the previous section) that allow copyin (and the related string function copyinstr) to directly dereference user pointers. Replace the body of copyin in kernel/vm.c with a call to copyin_new (defined in kernel/vmcopyin.c); do the same for copyinstr and copyinstr_new. Add mappings for user addresses to each process’s kernel page table so that copyin_new and copyinstr_new work. You pass this assignment if usertests runs correctly and all the make grade tests pass.

  • Replace copyin() with a call to copyin_new first, and make it work, before moving on to copyinstr.

  • At each point where the kernel changes a process’s user mappings, change the process’s kernel page table in the same way. Such points include fork(), exec(), and sbrk().

  • Don’t forget that to include the first process’s user page table in its kernel page table in userinit.

  • What permissions do the PTEs for user addresses need in a process’s kernel page table? (A page with PTE_U set cannot be accessed in kernel mode.)
    Don’t forget about the above-mentioned PLIC limit.

    3.2 Implementation

  • copy_new函数替代copyin

    1
    2
    3
    4
    5
    6
    7
    8
    // Copy from user to kernel.
    // Copy len bytes to dst from virtual address srcva in a given page table.
    // Return 0 on success, -1 on error.
    int
    copyin(pagetable_t pagetable, char *dst, uint64 srcva, uint64 len)
    {
    return copyin_new(pagetable, dst, srcva, len);
    }
  • copyinstr_new函数替代copyinstr

    1
    2
    3
    4
    5
    6
    7
    8
    9
    // Copy a null-terminated string from user to kernel.
    // Copy bytes to dst from virtual address srcva in a given page table,
    // until a '\0', or max.
    // Return 0 on success, -1 on error.
    int
    copyinstr(pagetable_t pagetable, char *dst, uint64 srcva, uint64 max)
    {
    return copyinstr_new(pagetable, dst, srcva, max);
    }
  • fork()中针对用户页表映射修改,相应进程的内核页表做出的改动。并实现将用户页表copy到内核页表并清空相应PTE_U标志的函数copyupttokpt

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    int i, pid;
    struct proc *np;
    struct proc *p = myproc();

    // Allocate process.
    if((np = allocproc()) == 0){
    return -1;
    }

    // Copy user memory from parent to child.
    if(uvmcopy(p->pagetable, np->pagetable, p->sz) < 0){
    freeproc(np);
    release(&np->lock);
    return -1;
    }

    np->sz = p->sz;

    np->parent = p;

    copyupttokpt(np->pkpagetable, np->pagetable, 0, np->sz);


    // copy saved user registers.
    *(np->trapframe) = *(p->trapframe);

    // Cause fork to return 0 in the child.
    np->trapframe->a0 = 0;
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    void
    copyupttokpt(pagetable_t ker, pagetable_t user, uint64 old_size, uint64 new_size) {
    pte_t *pte_from, *pte_to;
    uint64 pa, i;
    uint flags;
    old_size = PGROUNDUP(old_size);
    for(i = old_size; i < new_size; i += PGSIZE){
    if((pte_from = walk(user, i, 0)) == 0)
    panic("copyupttokpt: walk");
    if((pte_to = walk(ker, i, 1)) == 0)
    panic("copyupttokpt: walk");
    pa = PTE2PA(*pte_from);
    // clear PTE_U bit because of kernel page table.
    flags = PTE_FLAGS(*pte_from) & (~PTE_U);
    *pte_to = PA2PTE(pa) | flags;
    }
    }
  • 函数exec()中,替换当前进程的用户页表时,内核页表也随之释放,并copy新用户页表的内容。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    // Save program name for debugging.
    for(last=s=path; *s; s++)
    if(*s == '/')
    last = s+1;
    safestrcpy(p->name, last, sizeof(p->name));

    // Commit to the user image.
    oldpagetable = p->pagetable;
    p->pagetable = pagetable;
    p->sz = sz;

    uvmunmap(p->pkpagetable, 0, PGROUNDUP(oldsz)/PGSIZE, 0);
    copyupttokpt(p->pkpagetable, p->pagetable, 0, p->sz);

    p->trapframe->epc = elf.entry; // initial program counter = main
    p->trapframe->sp = sp; // initial stack pointer
    proc_freepagetable(oldpagetable, oldsz);
  • 函数sbrk(),相应的增加和减少内存时变更用户页表,需要同时更新进程的内核页表。需要判断用户分配地址空间的限制。如果增加的内存大于PLIC则返回-1

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    int
    growproc(int n)
    {
    uint sz;
    struct proc *p = myproc();

    sz = p->sz;
    if(n > 0){
    if ((sz + n) >= PLIC) return -1; // test overflow.
    // mapping the n bytes in process's kernel table.
    if((sz = uvmalloc(p->pagetable, sz, sz + n)) == 0) {
    return -1;
    }
    copyupttokpt(p->pkpagetable, p->pagetable, sz-n, sz);
    } else if(n < 0){
    sz = uvmdealloc(p->pagetable, sz, sz + n);
    // ummaping the n bytes in process's kernel table.
    uvmunmap(p->pkpagetable, PGROUNDUP(sz), (PGROUNDUP(p->sz) - PGROUNDUP(sz))/PGSIZE, 0);
    }
    p->sz = sz;

    return 0;
    }
  • userinit,初始化第一个process时也需要更新进程的内核页表。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    // Set up first user process.
    void
    userinit(void)
    {
    struct proc *p;

    p = allocproc();
    initproc = p;

    // allocate one user page and copy init's instructions
    // and data into it.
    uvminit(p->pagetable, initcode, sizeof(initcode));
    p->sz = PGSIZE;
    copyupttokpt(p->pkpagetable, p->pagetable, 0, PGSIZE);

    // prepare for the very first "return" from kernel to user.
    p->trapframe->epc = 0; // user program counter
    p->trapframe->sp = PGSIZE; // user stack pointer

    safestrcpy(p->name, "initcode", sizeof(p->name));
    p->cwd = namei("/");

    p->state = RUNNABLE;

    release(&p->lock);
    }

总结

简单的写一下lab 3的总结吧,做之前最好把概念理清楚,确实蛮有难度的。

打印三级也页表页表项的虚拟地址以及物理地址,利用递归就可以完成,leaf pte的判断的判断条件即递归结束条件,可以参考vm.c中部分函数的源代码。leaf pte可以通过pte中的flag进行判断,事实上,也就只有leaf pte才含有PTE_W,PTE_X,PTE_R。

由于历史原因部分os将内核和用户进程独立分为两个page table,本lab的初衷是为了利用好进程之间的isolation,为每个用户进程分配一个内核页表(初始时的映射和全局内核页表相同, 其中此内核页表中含有用户页表的映射),释放进程的同时也要将进程的内核页表释放,注意不将叶子pte的物理内存释放,初始化时也是同样的。注意到procinit函数里已经为每个用户进程通过全局内核页表分配好了内核栈,hint中要求把该部分迁移到分配用户进程是时分配栈,另外kvmpa函数中也有个坑,需要将全局内核页表替换为当前用户进程的内核页表。

将用户进程页表载入到用户的内核页表后,系统调用传参时所用到的copyin或者copyinstr就不需要再间接的将该参数的虚拟地址通过用户页表转换为物理地址之后再处理了,简化了一步walk的操作,直接dereference即可。copy页表映射时可以参考vmcopy函数,fork中有相关页表复制的内容,但不是全部参考,可能会出现remap的情况,因此考虑不使用mappages函数即可。注意sbrk时相关函数growproc的overflow检测,以及虚拟地址空间减少时要从用户内核页表中unmap掉相关内容,增加时正常复制即可。exec函数中copy前需要将用户内核页表的映射清空。其他都按hint去做就行了,The devil is in the details.

补充

强调一下walk函数的实现,在xv6中即为遍历三级页表, 很清晰地刻画了真实的页表实现。PTE2PA宏将当前PTE的地址(物理地址)右移10位将10个有效位置0,左移12位加上偏移量(因为页大小为4kb, $2^{12}=4096=4k$, 所以偏移量为12, byte为寻址能力大小), 最后转化为物理地址。
记住页目录存在于物理地址空间中。虚拟地址空间中的L2, L1, L0字段确定每一级页目录中的Entry非叶子节点PPN(Physical page number)连结上全0的Offset为下一级页目录的物理地址
Page table

  • walk
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    pte_t *
    walk(pagetable_t pagetable, uint64 va, int alloc)
    {
    if(va >= MAXVA)
    panic("walk");

    for(int level = 2; level > 0; level--) {
    pte_t *pte = &pagetable[PX(level, va)]; // 根据虚拟地址的L字段获取页目录中的Entry
    if(*pte & PTE_V) {
    // 解引用获取当前Entry的地址, 将PPN(物理页号)和Offset(全0)连接后更下一级页目录的物理地址
    pagetable = (pagetable_t)PTE2PA(*pte);
    } else {
    if(!alloc || (pagetable = (pde_t*)kalloc()) == 0)
    return 0;
    memset(pagetable, 0, PGSIZE);
    *pte = PA2PTE(pagetable) | PTE_V;
    }
    }
    return &pagetable[PX(0, va)]; // 叶子节点处PPN+虚拟地址的Offset为最终转换的物理地址
    }
  • 1
    2
    3
    4
    5
    #define PGSHIFT 12  // bits of offset within a page
    #define PTE2PA(pte) (((pte) >> 10) << 12)
    #define PXMASK 0x1FF // 9 bits
    #define PXSHIFT(level) (PGSHIFT+(9*(level)))
    #define PX(level, va) ((((uint64) (va)) >> PXSHIFT(level)) & PXMASK)