MIT6.828-Lab5

Introduction

In this lab, you will implement spawn, a library call that loads and runs on-disk executables. You will then flesh out your kernel and library operating system enough to run a shell on the console. These features need a file system, and this lab introduces a simple read/write file system.

File system preliminaries

This file system will provide the basic features: creating, reading, writing, and deleting files organized in a hierarchical directory structure. But does not support the UNIX notions of file owenership or permissions, and also doe not support hard links, symbolic links, time stamps, or special device files like most UNIX file systems do.

On-Disk File System Structure

Since our file system will not support hard links, we do not need this level of indirection and therefore can make a convenient simplification: our file system will not use inodes at all and instead will simply store all of a file’s (or sub-directory’s) meta-data within the (one and only) directory entry describing that file.
The file system environment handles all modifications to directories internally as a part of performing actions such as file creation and deletion. Our file system does allow user environments to read directory meta-data directly (e.g., with read), which means that user environments can perform directory scanning operations themselves (e.g., to implement the ls program).

Sectors and Blocks

Most disks cannot perform reads and writes at byte granularity and instead perform reads and writes in units of sectors. Be wary of the distinction between the two terms: sector size is a property of the disk hardware, whereas block size is an aspect of the operating system using the disk.
Most modern file systems use a larger block size, however, because storage space has gotten much cheaper and it is more efficient to manage storage at larger granularities. Our file system will use a block size of 4096 bytes, conveniently matching the processor’s page size.

Superblocks

Block 0 is typically reserved to hold boot loaders and partition tables, so file systems generally do not use the very first disk block. Many “real” file systems maintain multiple superblocks, replicated throughout several widely-spaced regions of the disk, so that if one of them is corrupted or the disk develops a media error in that region, the other superblocks can still be found and used to access the file system.

File Meta-data

1
2
3
4
5
6
7
8
9
10
11
12
13
14
struct File {
char f_name[MAXNAMELEN]; // filename
off_t f_size; // file size in bytes
uint32_t f_type; // file type

// Block pointers.
// A block is allocated if its value is != 0.
uint32_t f_direct[NDIRECT]; // direct blocks
uint32_t f_indirect; // indirect block

// Pad out to 256 bytes; must do arithmetic in case we're compiling
// fsformat on a 64-bit machine.
uint8_t f_pad[256 - MAXNAMELEN - 8 - 4*NDIRECT - 4];
} __attribute__((packed)); // required only on some 64-bit machines

The File System

The goal for this lab is not to have you implement the entire file system, but for you to implement only certain key components. In particular, you will be responsible for reading blocks into the block cache and flushing them back to disk; allocating disk blocks; mapping file offsets to disk blocks; and implementing read, write, and open in the IPC interface.

Disk Access

In JOS, instead of taking the conventional “monolithic” operating system strategy of adding an IDE disk driver to the kernel along with the necessary system calls to allow the file system to access disk, we instead implement the IDE disk driver as part of the user-level file system environment.
So we have two strategies to implement disk access in user space, one is relyong on polling, “programmed I/O” (PIO)-based disk access and do not use disk interrupts, the other is implementing interrupt-driven device drivers in user mode, it needs the kernel field device interrupts and dispatch them to the correct user-mode environment. JOS chooses the former.
The x86 processor uses the IOPL bits in the EFLAGS register to determine whether protected-mode code is allowed to perform special device I/O instructions such as the IN and OUT instructions. Since all of the IDE disk registers we need to access are located in the x86’s I/O space rather than being memory-mapped, giving “I/O privilege” to the file system environment is the only thing we need to do in order to allow the file system to access these registers. In effect, the IOPL bits in the EFLAGS register provides the kernel with a simple “all-or-nothing” method of controlling whether user-mode code can access I/O space. In our case, we want the file system environment to be able to access I/O space, but we do not want any other environments to be able to access I/O space at all.

Details of IOPL

resoluton here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
void
env_create(uint8_t *binary, enum EnvType type)
{
// LAB 3: Your code here.

// If this is the file server (type == ENV_TYPE_FS) give it I/O privileges.
// LAB 5: Your code here.
struct Env *new_e;
env_alloc(&new_e, 0);
load_icode(new_e, binary);
new_e->env_type = type;
if (type == ENV_TYPE_FS) {
new_e->env_tf.tf_eflags |= FL_IOPL_MASK;
}
}

The Block Cache

In our file system, we will implement a simple “buffer cache” (really just a block cache) with the help of the processor’s virtual memory system.
We reserve a large, fixed 3GB region of the file system environment’s address space, from 0x10000000 (DISKMAP) up to 0xD0000000 (DISKMAP+DISKMAX), as a “memory mapped” version of the disk.
Of course, it would take a long time to read the entire disk into memory, so instead we’ll implement a form of demand paging, wherein we only allocate pages in the disk map region and read the corresponding block from the disk in response to a page fault in this region.

resolution here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
void
flush_block(void *addr)
{
uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
int r;

if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
panic("flush_block of bad va %08x", addr);

// LAB 5: Your code here.
addr = (void *) ROUNDDOWN(addr, BLKSIZE);

if ((r = ide_write(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
panic("in flush_block, ide_write: %e", r);

if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
panic("in flush_block, sys_page_map: %e", r);
}

static void
bc_pgfault(struct UTrapframe *utf)
{
void *addr = (void *) utf->utf_fault_va;
uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
int r;

// Check that the fault was within the block cache region
if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
panic("page fault in FS: eip %08x, va %08x, err %04x",
utf->utf_eip, addr, utf->utf_err);

// Sanity check the block number.
if (super && blockno >= super->s_nblocks)
panic("reading non-existent block %08x\n", blockno);

// Allocate a page in the disk map region, read the contents
// of the block from the disk into that page.
// Hint: first round addr to page boundary. fs/ide.c has code to read
// the disk.
//
// LAB 5: you code here:
addr = ROUNDDOWN(addr, BLKSIZE);

if ((r = sys_page_alloc(0, addr, PTE_W | PTE_U)) < 0)
panic("in bc_pgfault, sys_page_alloc: %e", r);

if ((r = ide_read(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
panic("in bc_pgfault, ide_read: %e", r);

// Clear the dirty bit for the disk block page since we just read the
// block from disk
if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
panic("in bc_pgfault, sys_page_map: %e", r);

// Check that the block we read was allocated. (exercise for
// the reader: why do we do this *after* reading the block
// in?)
// 如果块包含旧数据,这样做可以避免数据的不一致性
if (bitmap && block_is_free(blockno))
panic("reading free block %08x\n", blockno);
}

The Block Bitmap

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
int
alloc_block(void)
{
// The bitmap consists of one or more blocks. A single bitmap block
// contains the in-use bits for BLKBITSIZE blocks. There are
// super->s_nblocks blocks in the disk altogether.

// LAB 5: Your code here.
size_t i;
// bitmap 1表示是空闲的,0表示的是占用的
for (i = 1; i < super->s_nblocks; i++) {
if (block_is_free(i)) {
bitmap[i / 32] &= ~(1 << (i % 32));
flush_block(&bitmap[i / 32]); // 将bitmap在硬盘中刷新
return i;
}
}
return -E_NO_DISK;
}

File Operations

file_block_walk and file_get_block are the workhorses of the file system. For example, file_read and file_write are little more than the bookkeeping atop file_get_block necessary to copy bytes between scattered blocks and a sequential buffer.

Solution here, note that *ppdiskbno modified by file_block_walk may be zero, and remeber to flush_block after clear block cache.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
static int
file_block_walk(struct File *f, uint32_t filebno, uint32_t **ppdiskbno, bool alloc)
{
// LAB 5: Your code here.
if (filebno >= NDIRECT + NINDIRECT)
return -E_INVAL;

if (filebno < NDIRECT) {
*ppdiskbno = &(f->f_direct[filebno]);
return 0;
}
else {
if (f->f_indirect == 0 && alloc == true) {
f->f_indirect = alloc_block();
if (f->f_indirect < 0) return f->f_indirect;
memset(diskaddr(f->f_indirect), 0, BLKSIZE);
flush_block(diskaddr(f->f_indirect));
}
else if (f->f_indirect == 0 && alloc == false) {
return -E_NOT_FOUND;
}
*ppdiskbno = (uint32_t*)diskaddr(f->f_indirect) + filebno - NDIRECT;
return 0;
}
}
int
file_get_block(struct File *f, uint32_t filebno, char **blk)
{
// LAB 5: Your code here.
if (filebno > f->f_size / BLKSIZE)
return -E_INVAL;

uint32_t *bno;
if (file_block_walk(f, filebno, &bno, true) < 0)
return -E_NO_DISK;

if (*bno == 0) {
if ((*bno = alloc_block()) < 0)
return -E_NO_DISK;
memset(diskaddr(*bno), 0, BLKSIZE);
flush_block(diskaddr(*bno));
}

*blk = (char *)diskaddr(*bno);
return 0;
}

The file system interface

Now that we have the necessary functionality within the file system environment itself, we must make it accessible to other environments that wish to use the file system. Since other environments can’t directly call functions in the file system environment, we’ll expose access to the file system environment via a remote procedure call, or RPC, abstraction, built atop JOS’s IPC mechanism. Graphically, here’s what a call to the file system server (say, read) looks like:

Exercise 5 & 6

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
// fs/serv.c
// read不应该管count大于buf大小的问题
int
serve_read(envid_t envid, union Fsipc *ipc)
{
struct Fsreq_read *req = &ipc->read;
struct Fsret_read *ret = &ipc->readRet;
struct OpenFile *o;
int r;

if (debug)
cprintf("serve_read %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

// Lab 5: Your code here:
if ((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
if ((r = file_read(o->o_file, ret->ret_buf, req->req_n, o->o_fd->fd_offset)) < 0)
return r;

o->o_fd->fd_offset += r;
return r;
}
int
serve_write(envid_t envid, struct Fsreq_write *req)
{
struct OpenFile *o;
size_t sz;
int r;

if (debug)
cprintf("serve_write %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

// LAB 5: Your code here.
if ((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
if ((r = file_write(o->o_file, req->req_buf, req->req_n, o->o_fd->fd_offset)) < 0)
return r;

o->o_fd->fd_offset += r;
return r;
}
// lib/file.c
static ssize_t
devfile_write(struct Fd *fd, const void *buf, size_t n)
{
// Make an FSREQ_WRITE request to the file system server. Be
// careful: fsipcbuf.write.req_buf is only so large, but
// remember that write is always allowed to write *fewer*
// bytes than requested.
// LAB 5: Your code here
fsipcbuf.write.req_fileid = fd->fd_file.id;
fsipcbuf.write.req_n = MIN(n, sizeof(fsipcbuf.write.req_buf));
memmove(fsipcbuf.write.req_buf, buf, fsipcbuf.write.req_n);
return fsipc(FSREQ_WRITE, NULL);
}

Spawning Process

code here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
// LAB 5: Your code here.
// Remember to check whether the user has supplied us with a good
// address!
struct Env *env;
int r;
if ( (r = envid2env(envid, &env, 1)) < 0)
return r;

user_mem_assert(env, tf, sizeof(struct Trapframe), PTE_U);
env->env_tf = *tf;
env->env_tf.tf_cs |= 0x3;
env->env_tf.tf_eflags &= (~FL_IOPL_MASK);
env->env_tf.tf_eflags |= FL_IF;
return 0;
}

Sharing library state across fork and spawn

File descriptor state is kept in user-space memory. On fork, this memory will be marked copy-on-write, so the state will be duplicated rather than shared. On spawn, the memory will be left behind, not copied at all.
We will change fork and spawn to know that certain regions of memory are used by the “library operating system” and should always be shared.

My code here:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
// Copy the mappings for shared pages into the child address space.
static int
copy_shared_pages(envid_t child)
{
// LAB 5: Your code here.
unsigned pn;
for (pn = PGNUM(UTEXT); pn < PGNUM(USTACKTOP); pn++) {
if ((uvpd[pn >> 10] & PTE_P) && (uvpt[pn] & PTE_P) && (uvpt[pn] & PTE_SHARE)) {
void *addr = (void *)(pn * PGSIZE);
if(sys_page_map(0, addr, child, addr, uvpt[pn] & PTE_SYSCALL) < 0)
panic("duppage: single sys_page_map failed.");
}
}
return 0;
}
static int
duppage(envid_t envid, unsigned pn)
{
int r;

// LAB 4: Your code here.

void *addr = (void *)(pn * PGSIZE);
if (uvpt[pn] & PTE_SHARE) {
if(sys_page_map(0, addr, envid, addr, uvpt[pn] & PTE_SYSCALL) < 0)
panic("duppage: single sys_page_map failed.");
}
else if((uvpt[pn] & PTE_W) || (uvpt[pn] & PTE_COW)) {
if(sys_page_map(0, addr, envid, addr, PTE_U | PTE_COW | PTE_P) < 0)
panic("duppage: parent->child sys_page_map failed.");
if(sys_page_map(0, addr, 0, addr, PTE_U | PTE_COW | PTE_P) < 0)
panic("duppage: self sys_page_map failed.");
}
else {
if(sys_page_map(0, addr, envid, addr, PTE_P | PTE_U) < 0)
panic("duppage: single sys_page_map failed.");
}
return 0;
}

The keyboard interface

Use the keyboard and serial drivers already implemented, to handle trap keyboard interrupt and serial interrupt.

1
2
3
4
5
6
7
8
9
10
11
// Handle keyboard and serial interrupts.
// LAB 5: Your code here.
if (tf->tf_trapno == IRQ_OFFSET + IRQ_KBD) {
kbd_intr();
return;
}

if (tf->tf_trapno == IRQ_OFFSET + IRQ_SERIAL) {
serial_intr();
return;
}

MIT6.828-Lab5
http://bugeater.space/2024/03/25/MIT6-828-Lab5/
Author
BugEater
Posted on
March 25, 2024
Licensed under