
Chapter 3

Introduction

We want to place multiple apps at different memory addresses and run them in turn, instead of loading one, running it, and clearing it before the next.

First, every app is placed at its own isolated address; given our kernel's restrictions, the load addresses are generated with build.py.


Task

A Task is one run-through of an application's workflow.

Each time slice a Task receives is called a Task Slice.

Switching the CPU between apps is called Task switching.

To switch safely we must save and restore the Task Context.

Design: __switch

We will store these registers in the context:

// os/src/task/context.rs

pub struct TaskContext {
ra: usize,      // return address __switch will return to
sp: usize,      // kernel stack pointer of this task
s: [usize; 12], // callee-saved registers s0..s11
}
.altmacro
.macro SAVE_SN n
sd s\n, (\n+2)*8(a0)
.endm
.macro LOAD_SN n
ld s\n, (\n+2)*8(a1)
.endm
.section .text
.globl __switch
__switch:
...
# save the current task's ra/sp/s0-s11 through a0, then load the next task's through a1

Expose to Rust

// os/src/task/switch.rs

global_asm!(include_str!("switch.S"));

use super::TaskContext;

extern "C" {
pub fn __switch(
current_task_cx_ptr: *mut TaskContext,
next_task_cx_ptr: *const TaskContext
);
}

We will design TaskManager:

  • Store an array of per-app state plus the index of the currently running app (sketched below).
  • Each entry holds the app's TaskContext and its TaskStatus (ready, running, exited, ...).
  • At init, make every task ready by pointing its context at __restore.
  • At run time, switch contexts whenever needed.
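
A minimal sketch of that state, with field names following this tutorial's style (UPSafeCell and MAX_APP_NUM are assumed from the surrounding chapters; details elided):

// os/src/task/mod.rs (sketch)
pub struct TaskManager {
    num_app: usize,
    inner: UPSafeCell<TaskManagerInner>,
}

struct TaskManagerInner {
    tasks: [TaskControlBlock; MAX_APP_NUM],
    current_task: usize,
}

// os/src/task/task.rs (sketch)
pub struct TaskControlBlock {
    pub task_status: TaskStatus,
    pub task_cx: TaskContext,
}

With this in place, running the next task boils down to marking it Running and handing __switch the two context pointers: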
let current_task_cx_ptr = &mut inner.tasks[current].task_cx as *mut TaskContext;
let next_task_cx_ptr = &inner.tasks[next].task_cx as *const TaskContext;
drop(inner);
// before this, we should drop local variables that must be dropped manually
unsafe {
__switch(
current_task_cx_ptr,
next_task_cx_ptr,
);
}

Dispatch Design

Collaboration

We manually design a yield interface that an app calls to give up the CPU voluntarily:

pub fn sys_yield() -> isize {
syscall(SYSCALL_YIELD, [0, 0, 0])
}

// user/src/lib.rs

pub fn yield_() -> isize { sys_yield() }

But cooperation can be inefficient: an app that has finished its useful work may be reluctant to yield or exit, wasting the rest of its time slice.

Preemptive

We design a clock interrupt that fires at a fixed interval and forces a switch between apps.

  • Design the timer and a way to read the current time
  • Set the timer for the next trigger
  • Enable the timer interrupt and handle the SupervisorTimer cause in the trap handler

The pieces you need to know:

const SBI_SET_TIMER: usize = 0;

pub fn set_timer(timer: usize) {
sbi_call(SBI_SET_TIMER, timer, 0, 0);
}

// os/src/timer.rs

use crate::config::CLOCK_FREQ;
const TICKS_PER_SEC: usize = 100;

pub fn set_next_trigger() {
set_timer(get_time() + CLOCK_FREQ / TICKS_PER_SEC);
}

// os/src/trap/mod.rs

match scause.cause() {
Trap::Interrupt(Interrupt::SupervisorTimer) => {
set_next_trigger();
suspend_current_and_run_next();
}
}

// enable supervisor clock
// os/src/trap/mod.rs

use riscv::register::sie;

pub fn enable_timer_interrupt() {
unsafe { sie::set_stimer(); }
}
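
For orientation, these pieces are wired together during boot roughly as follows (a sketch; module names as used in this chapter, and the loader call is an assumption about the multi-app loader):

// os/src/main.rs (sketch)
trap::init();                    // install the trap entry
loader::load_apps();             // assumption: load every app to its own address
trap::enable_timer_interrupt();  // allow S-level timer interrupts
timer::set_next_trigger();       // arm the first 10 ms tick
task::run_first_task();          // hand the CPU to the first task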

Chapter 2

Introduction


This maps onto the RISC-V privilege levels:

  • S (Supervisor) for the kernel, guaranteed by the Supervisor Execution Environment provided by RustSBI
  • U (User) for applications, running in the Application Execution Environment we build in this chapter

Reasons:

  • Safety: prevent apps from accessing kernel memory and privileged state
  • Recoverability: a misbehaving app can be stopped without bringing down the system

Workflow:

  • Start the application with a user-mode context
  • Trap into the system level to handle a system call or fault
    • If it goes wrong: kill the app
    • If it finishes: move on to the next one
  • Restore the user-mode context and continue

RISC-V provides the following CSRs (Control and Status Registers) to handle this:


Begin Trap:

  • sstatus: its SPP field records the privilege level the CPU was in when the Trap occurred.
  • sepc: the address execution resumes at once the Trap finishes.
  • scause/stval: the Trap cause and additional information.
  • stvec: holds the entry address of the Trap handler

stvec is a 64-bit CSR, with:

  • MODE(Direct/Vectored) [1:0](read from right to left): 2-bits
  • BASE [63:2]: 62-bits

Finally, the Trap returns via the sret instruction, which switches the privilege level back and jumps to the address in sepc.

Construct Trap

Design:

  • The general registers are shared by U level and S level, so they must be saved and restored around a Trap.
  • The CSRs must be kept in a reasonable state.
  • U-level and S-level control flow use separate stacks.

Construct:

  • Build a KernelStack and a UserStack for that separation.
  • On the KernelStack we store a TrapContext; assembly plus Rust code dispatch and handle the Trap, and the entry code's address is written into stvec.
  • To launch an app we push a new context referring to the UserStack and restore the registers from it.

build stack and push context:

// stack struct ...

// the stack grows downward, so the initial sp is the top of the data array
fn get_sp(&self) -> usize {
self.data.as_ptr() as usize + KERNEL_STACK_SIZE
}
pub fn push_context(&self, cx: TrapContext) -> &'static mut TrapContext {
let cx_ptr = (self.get_sp() - core::mem::size_of::<TrapContext>()) as *mut TrapContext;
unsafe {
*cx_ptr = cx;
}
unsafe { cx_ptr.as_mut().unwrap() }
}
// os/src/trap/context.rs

#[repr(C)]
pub struct TrapContext {
pub x: [usize; 32], // General register
pub sstatus: Sstatus,
pub sepc: usize,
}
/// set stack pointer to x_2 reg (sp)
pub fn set_sp(&mut self, sp: usize) {
self.x[2] = sp;
}
/// init app context
pub fn app_init_context(entry: usize, sp: usize) -> Self {
let mut sstatus = sstatus::read(); // CSR sstatus
sstatus.set_spp(SPP::User); //previous privilege mode: user mode
let mut cx = Self {
x: [0; 32],
sstatus,
sepc: entry, // entry point of app
};
cx.set_sp(sp); // app's user stack pointer
cx // return initial Trap Context of app
}

We implement __alltraps and __restore in assembly, with the rest of the control flow in Rust:

.altmacro
.macro SAVE_GP n
sd x\n, \n*8(sp)
.endm
.macro LOAD_GP n
ld x\n, \n*8(sp)
.endm
.align 2
__alltraps:
...
# set input argument of trap_handler(cx: &mut TrapContext)
mv a0, sp # sp is point to TrapContext in kernel stack
call trap_handler # (&mut TrapContext)

__restore:
...

To manipulate the Trap CSRs we use the riscv crate:

// os/Cargo.toml

[dependencies]
riscv = { git = "https://github.com/rcore-os/riscv", features = ["inline-asm"] }

// os/src/trap/mod.rs
global_asm!(include_str!("trap.S"));

pub fn init() {
extern "C" { fn __alltraps(); }
// write to stvec
unsafe {
stvec::write(__alltraps as usize, TrapMode::Direct);
}
}

#[no_mangle]
pub fn trap_handler(cx: &mut TrapContext) -> &mut TrapContext {
...
}

restore operation:

extern "C" { fn __restore(cx_addr: usize); }
unsafe {
__restore(KERNEL_STACK.push_context(
TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp())
// This context store the ptr to UserStack for restoration
) as *const _ as usize);
}

Construct User App

  • Link the app binaries into the kernel with a specified memory layout
  • Read that layout and let an AppManager record and maintain it
  • Load an app from the layout by copying it to APP_BASE_ADDRESS (we cannot yet load apps at arbitrary addresses)
  • The AppManager runs each app in turn
# os/src/link_app.S

.align 3
.section .data
.global _num_app # read from the ptr
_num_app:
.quad 5
.quad app_0_start
.quad app_1_start
.quad app_2_start
.quad app_3_start
.quad app_4_start
.quad app_4_end

...

Design it!

// os/src/batch.rs

struct AppManager {
num_app: usize,
current_app: usize,
app_start: [usize; MAX_APP_NUM + 1],
}

// part of read in static init of AppManager
let num_app_ptr = _num_app as usize as *const usize;
let num_app = num_app_ptr.read_volatile();
let mut app_start: [usize; MAX_APP_NUM + 1] = [0; MAX_APP_NUM + 1];
let app_start_raw: &[usize] = core::slice::from_raw_parts(
num_app_ptr.add(1), num_app + 1
);
app_start[..=num_app].copy_from_slice(app_start_raw);

Load App:

// part of code of copying to kernel
asm!("fence.i");
// clear app area
core::slice::from_raw_parts_mut(
APP_BASE_ADDRESS as *mut u8,
APP_SIZE_LIMIT
).fill(0);
let app_src = core::slice::from_raw_parts(
self.app_start[app_id] as *const u8,
self.app_start[app_id + 1] - self.app_start[app_id]
);
let app_dst = core::slice::from_raw_parts_mut(
APP_BASE_ADDRESS as *mut u8,
app_src.len()
);
app_dst.copy_from_slice(app_src);

Run each app!

// os/src/batch.rs

pub fn run_next_app() -> ! {
let mut app_manager = APP_MANAGER.exclusive_access();
let current_app = app_manager.get_current_app();
unsafe {
app_manager.load_app(current_app);
}
app_manager.move_to_next_app();
drop(app_manager);
// before this we have to drop local variables related to resources manually
// and release the resources
extern "C" { fn __restore(cx_addr: usize); }
unsafe {
__restore(KERNEL_STACK.push_context(
TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp())
) as *const _ as usize);
}
panic!("Unreachable in batch::run_current_app!");
}

// main logic:
// os/src/main.rs

// ... code above ...
// install the trap entry
trap::init();
// load and run the first app
batch::run_next_app();

Chapter 5

Introduction

Having run code one task after another, we now introduce the Process: a full space-time description of the execution of a binary file in the OS. For one process, this means holding independent resources while it executes.

As OSes grew, Threads and Coroutines were developed after the Process. They differ in the resources they own: threads usually live inside one process, each with its own independent stack and flow of control; coroutines live inside one thread, each with only its own flow of control.


Design

Every process needs an independent memory layout and can be scheduled by the CPU; this functionality builds on the Task. Beyond that, each process can fork its own child processes, so parent and child proceed along different timelines. A finished process's resources cannot all be recycled immediately because of its children, so we mark it as a Zombie Process until its parent collects it.

To tell parent from child and identify each isolated process, we mark every process with a PID (Process Identifier). Note that when we fork a process, the child is identical to the parent except for a0, the return-value register: the parent receives the child's new PID, while the child receives 0 since it has no child of its own.

  • fork: copy the current process state (sp and the rest) into a new child process.
  • waitpid: wait for a child to become a zombie, then recycle all of its resources.
  • exec: clear the current process state and load a new executable file (a user-side sketch follows this list).
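
A user-side sketch of how these calls combine (it assumes the fork/exec/wait/println wrappers of this tutorial's user library):

// user program (sketch)
pub fn main() -> i32 {
    if fork() == 0 {
        // child: replace this process image with another program
        exec("hello_world\0");
    } else {
        // parent: wait for any child to become a zombie, then recycle it
        let mut exit_code: i32 = 0;
        let pid = wait(&mut exit_code);
        println!("child {} exited with code {}", pid, exit_code);
    }
    0
}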

Data Construction

A PidAllocator hands out pids and recycles released ones, so we never need to worry about clashing with a previously used pid.

pub struct PidHandle(pub usize);

struct PidAllocator {
current: usize,
recycled: Vec<usize>,
}

// impl PidAllocator
pub fn alloc(&mut self) -> PidHandle {
if let Some(pid) = self.recycled.pop() {
PidHandle(pid)
} else {
self.current += 1;
PidHandle(self.current - 1)
}
}
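
The matching dealloc is symmetric; a sketch assuming the same two fields:

// impl PidAllocator (sketch)
pub fn dealloc(&mut self, pid: usize) {
    assert!(pid < self.current);
    assert!(
        !self.recycled.iter().any(|&p| p == pid),
        "pid {} has been deallocated!", pid
    );
    self.recycled.push(pid);
}

Dropping a PidHandle can then call dealloc, so a released pid immediately becomes available for reuse.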

Since a recycled pid means its associated memory can be reused, we derive each process's KernelStack address directly from its pid.

// os/src/task/pid.rs

pub struct KernelStack {
pid: usize,
}

pub fn kernel_stack_position(app_id: usize) -> (usize, usize) {
let top = TRAMPOLINE - app_id * (KERNEL_STACK_SIZE + PAGE_SIZE);
let bottom = top - KERNEL_STACK_SIZE;
(bottom, top)
}

// impl KernelStack
pub fn push_on_top<T>(&self, value: T) -> *mut T where
T: Sized, {
let kernel_stack_top = self.get_top();
let ptr_mut = (kernel_stack_top - core::mem::size_of::<T>()) as *mut T;
unsafe { *ptr_mut = value; }
ptr_mut
}
pub fn get_top(&self) -> usize {
let (_, kernel_stack_top) = kernel_stack_position(self.pid);
kernel_stack_top
}

impl Drop for KernelStack {
fn drop(&mut self) {
let (kernel_stack_bottom, _) = kernel_stack_position(self.pid);
let kernel_stack_bottom_va: VirtAddr = kernel_stack_bottom.into();
KERNEL_SPACE
.exclusive_access()
.remove_area_with_start_vpn(kernel_stack_bottom_va.into());
}
}

We extend the TaskControlBlock so parent and child processes can reference each other.

pub struct TaskControlBlockInner {
...
// new:
pub parent: Option<Weak<TaskControlBlock>>,
pub children: Vec<Arc<TaskControlBlock>>,
pub exit_code: i32,
}

// impl TaskControlBlockInner
pub fn is_zombie(&self) -> bool {
self.get_status() == TaskStatus::Zombie
}

TaskManager used to manage both the tasks and CPU dispatch; we now narrow it down to task management only (a ready queue).

// os/src/task/manager.rs

pub struct TaskManager {
ready_queue: VecDeque<Arc<TaskControlBlock>>,
}

lazy_static! {
pub static ref TASK_MANAGER: UPSafeCell<TaskManager> = unsafe {
UPSafeCell::new(TaskManager::new())
};
}

pub fn add_task(task: Arc<TaskControlBlock>) {
TASK_MANAGER.exclusive_access().add(task);
}

pub fn fetch_task() -> Option<Arc<TaskControlBlock>> {
TASK_MANAGER.exclusive_access().fetch()
}

CPU dispatch moves into the newly introduced Processor.
We also introduce an idle control flow whose only job is to pick the next process and switch to it.

  • Why not let the previous process switch directly to the next one, instead of going through an idle flow?
  • Keeping the idle flow separate gives the scheduler its own context, so scheduling data never lives inside any user process and the dispatch step stays invisible to each process's Trap handling.

The whole workflow would be:

  • the idle flow fetches a task and switches to it
  • the task runs until its time slice ends or it finishes
  • the task switches back to the idle flow
  • repeat…
pub struct Processor {
current: Option<Arc<TaskControlBlock>>,
// idle process of current cpu
idle_task_cx: TaskContext,
}

lazy_static! {
pub static ref PROCESSOR: UPSafeCell<Processor> = unsafe {
UPSafeCell::new(Processor::new())
};
}

// run_tasks()
// loop to fetch task and switch possible task
if let Some(task) = fetch_task() {
let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
let mut task_inner = task.inner_exclusive_access();
let next_task_cx_ptr = &task_inner.task_cx as *const TaskContext;
task_inner.task_status = TaskStatus::Running;
drop(task_inner);
processor.current = Some(task);
drop(processor);
unsafe {
__switch(
idle_task_cx_ptr,
next_task_cx_ptr,
);
}
}

// switch to idle process if one task run out of its time.
pub fn schedule(switched_task_cx_ptr: *mut TaskContext) {
let mut processor = PROCESSOR.exclusive_access();
let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
drop(processor);
unsafe {
__switch(
switched_task_cx_ptr,
idle_task_cx_ptr,
);
}
}

Dispatch Construction

Previously we used suspend_current_and_run_next to pause the current task and switch to the next; now we adapt it to the process design.

// os/src/task/mod.rs

pub fn suspend_current_and_run_next() {
let task = take_current_task().unwrap();

// ---- access current TCB exclusively
let mut task_inner = task.inner_exclusive_access();
let task_cx_ptr = &mut task_inner.task_cx as *mut TaskContext;
task_inner.task_status = TaskStatus::Ready;
drop(task_inner);
// ---- stop exclusively accessing current PCB

// add to task deque
add_task(task);
// change current to idle process
schedule(task_cx_ptr);
}

In previous case, task won’t be created by its parent task, but process will. So, if its TrapContext has been recycled, we need to refactor our trap_handler for such case.

// fn trap_handler() -> !
Trap::Exception(Exception::UserEnvCall) => {
// jump to next instruction anyway
let mut cx = current_trap_cx();
cx.sepc += 4;
// syscall may create new process and change trap context.
let result = syscall(cx.x[17], [cx.x[10], cx.x[11], cx.x[12]]);
// whether cx changed or not, we re-fetch it
cx = current_trap_cx();
cx.x[10] = result as usize;
}

Now we construct fork, exec, waitpid, and exit.

Fork

Fork copies the parent's whole memory layout and task state, then allocates a new pid and kernel stack for the child.

// impl MemorySet
pub fn from_existed_user(user_space: &MemorySet) -> MemorySet {
let mut memory_set = Self::new_bare();
// map trampoline
memory_set.map_trampoline();
// copy data sections/trap_context/user_stack
for area in user_space.areas.iter() {
let new_area = MapArea::from_another(area);
memory_set.push(new_area, None);
// copy data from another space
for vpn in area.vpn_range {
let src_ppn = user_space.translate(vpn).unwrap().ppn();
let dst_ppn = memory_set.translate(vpn).unwrap().ppn();
dst_ppn.get_bytes_array().copy_from_slice(src_ppn.get_bytes_array());
}
}
memory_set
}

// impl TaskControlBlock
// fn fork
let trap_cx_ppn = memory_set
.translate(VirtAddr::from(TRAP_CONTEXT).into())
.unwrap()
.ppn();
// alloc a pid and a kernel stack in kernel space
let pid_handle = pid_alloc();
let kernel_stack = KernelStack::new(&pid_handle);
let kernel_stack_top = kernel_stack.get_top();
let task_control_block = Arc::new(TaskControlBlock {
pid: pid_handle,
kernel_stack,
inner: unsafe { UPSafeCell::new(TaskControlBlockInner {
trap_cx_ppn,
base_size: parent_inner.base_size,
task_cx: TaskContext::goto_trap_return(kernel_stack_top),
task_status: TaskStatus::Ready,
memory_set,
parent: Some(Arc::downgrade(self)),
children: Vec::new(),
exit_code: 0,
})},
});
// add child
parent_inner.children.push(task_control_block.clone());
// modify kernel_sp in trap_cx
// **** access children PCB exclusively
let trap_cx = task_control_block.inner_exclusive_access().get_trap_cx();
trap_cx.kernel_sp = kernel_stack_top;

Finally, implement sys_fork

pub fn sys_fork() -> isize {
let current_task = current_task().unwrap();
let new_task = current_task.fork();
let new_pid = new_task.pid.0;
let trap_cx = new_task.inner_exclusive_access().get_trap_cx();

// for child process, fork returns 0
trap_cx.x[10] = 0; //x[10] is a0 reg

add_task(new_task);
new_pid as isize
}

When trap_handler dispatches sys_fork in the parent, the parent's x[10] (a0) is set to new_pid as the return value, while the child's a0 was already set to 0 above.

Exec

To execute a task by name, we first need the app names available as strings at load time.

// os/build.rs

writeln!(f, r#"
.global _app_names
_app_names:"#)?;
for app in apps.iter() {
writeln!(f, r#" .string "{}""#, app)?;
}
# link_app.S

...
_app_names:
.string "exit"
.string "fantastic_text"
...

Construct APP_NAMES as global state in OS.

// os/src/loader.rs

// APP_NAMES: Vec<&'static str> {...
for _ in 0..num_app {
let mut end = start;
while end.read_volatile() != '\0' as u8 {
end = end.add(1);
}
let slice = core::slice::from_raw_parts(start, end as usize - start as usize);
let str = core::str::from_utf8(slice).unwrap();
v.push(str);
start = end.add(1);
}
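
With APP_NAMES in place, looking an app up by name is a simple scan; a sketch assuming the get_num_app/get_app_data helpers from the earlier loader:

// os/src/loader.rs (sketch)
pub fn get_app_data_by_name(name: &str) -> Option<&'static [u8]> {
    let num_app = get_num_app();
    (0..num_app)
        .find(|&i| APP_NAMES[i] == name)
        .map(get_app_data)
}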

To execute a new binary, we read its data, build fresh state from it, and replace the process's original state.

// os/src/task/task.rs

impl TaskControlBlock {
pub fn exec(&self, elf_data: &[u8]) {
let (memory_set, user_sp, entry_point) = MemorySet::from_elf(elf_data);
let trap_cx_ppn = memory_set
.translate(VirtAddr::from(TRAP_CONTEXT).into())
.unwrap()
.ppn();

// **** access inner exclusively
let mut inner = self.inner_exclusive_access();
inner.memory_set = memory_set;
inner.trap_cx_ppn = trap_cx_ppn;

let trap_cx = inner.get_trap_cx();
*trap_cx = TrapContext::app_init_context(
entry_point,
user_sp,
KERNEL_SPACE.exclusive_access().token(),
self.kernel_stack.get_top(),
trap_handler as usize,
);
// **** stop exclusively accessing inner automatically
}
}

sys_exec reads the path string through a user-space pointer and replaces the current task's state:

// os/src/syscall/process.rs

pub fn sys_exec(path: *const u8) -> isize {
let token = current_user_token();
let path = translated_str(token, path);
if let Some(data) = get_app_data_by_name(path.as_str()) {
let task = current_task().unwrap();
task.exec(data);
0
} else {
-1
}
}

Exit

When a task exits it carries an exit code: set by the app itself on a normal exit, or by the kernel when the app is killed for misbehaving.

// fn exit_current_and_run_next(exit_code:i32)
inner.task_status = TaskStatus::Zombie;
inner.exit_code = exit_code;

// move all its children to the initial process
// ++++++ access initproc TCB exclusively
{
let mut initproc_inner = INITPROC.inner_exclusive_access();
for child in inner.children.iter() {
child.inner_exclusive_access().parent = Some(Arc::downgrade(&INITPROC));
initproc_inner.children.push(child.clone());
}
}
// ++++++ stop exclusively accessing parent PCB

// clear all memory.
inner.children.clear();
inner.memory_set.recycle_data_pages();
drop(inner);
// **** stop exclusively accessing current PCB
// drop task manually to maintain rc correctly
drop(task);
// use _unused replace original context, which will be recycled by rust.
let mut _unused = TaskContext::zero_init();
schedule(&mut _unused as *mut _);

// os/src/syscall/process.rs

pub fn sys_exit(exit_code: i32) -> ! {
exit_current_and_run_next(exit_code);
panic!("Unreachable in sys_exit!");
}

WaitPid

waitpid returns -1 if no child with the given pid exists, and -2 if such a child exists but is still running; otherwise the child has finished, so waitpid recycles it, writes the exit code back through the user pointer, and returns the child's pid.

// sys_waitpid(pid: isize, exit_code_ptr: *mut i32) -> isize

// -1 case
// search task manager and find (task_block)
|p| {pid == -1 || pid as usize == p.getpid()}
return -1;

let pair = // search task managers and find (idx,task_block)
p.inner_exclusive_access().is_zombie() && (pid == -1 || pid as usize == p.getpid())

if let Some((idx,_)) = pair {
let child = inner.children.remove(idx);
// confirm that child will be deallocated after removing from children list
assert_eq!(Arc::strong_count(&child), 1);
let found_pid = child.getpid();
// ++++ temporarily access child TCB exclusively
let exit_code = child.inner_exclusive_access().exit_code;
// ++++ stop exclusively accessing child PCB
*translated_refmut(inner.memory_set.token(), exit_code_ptr) = exit_code;
found_pid as isize
} else {
// pid process is running
-2
}

// user/src/lib.rs

pub fn wait(exit_code: &mut i32) -> isize {
loop {
match sys_waitpid(-1, exit_code as *mut _) {
-2 => { yield_(); }
// -1 or a real pid
exit_pid => return exit_pid,
}
}
}

Chapter 6-1

Introduction

The demand for persistent storage grew along with computers. So far we can only keep data in mapped memory, which exists only while the system runs (internal storage); now we want to keep it on external storage that persists.

Concept

Regular File

cd os/src/
stat main.rs

File: main.rs
Size: 940 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 4975 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ oslab) Gid: ( 1000/ oslab)
Access: 2021-02-28 23:32:50.289925450 +0800
Modify: 2021-02-28 23:32:50.133927136 +0800
Change: 2021-02-28 23:32:50.133927136 +0800
Birth: -

Besides the usual info: if a file is not regular, it is usually a block device file or a character device file, and its major/minor IDs are shown instead.

  • Links: number of hard links (alias names) of the file
  • Inode: the underlying index-node id used to locate the file
  • Uid: id of the user the file belongs to
  • Gid: id of the group the file belongs to
  • Blocks: number of blocks the file occupies (the IO block size reported here is 4096 bytes)

Dir

stat os
File: os
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 801h/2049d Inode: 4982 Links: 5
Access: (0755/drwxr-xr-x) Uid: ( 1000/ oslab) Gid: ( 1000/ oslab)
Access: 2021-02-28 23:32:50.133927136 +0800
Modify: 2021-02-28 23:32:50.129927180 +0800
Change: 2021-02-28 23:32:50.129927180 +0800
Birth: -
  • Access:
    • d: dir
    • r: allowed to list the files and subdirectories inside
    • w: allowed to create and delete files and subdirectories inside
    • x: allowed to enter ("pass through") this dir

File System

A file system maps the directory tree structure onto persistent storage. Examples: FAT/NTFS on Windows; Ext3/Ext4/Btrfs on Linux. Because many such systems coexist, a VFS (Virtual File System) layer is needed to enforce one unified interface.


Design

Simplification

  • flatten: only root dir /
  • permission: only user and no restriction on file access
  • no timestamp
  • no soft/hard link
  • many…

Cache

Persistent external storage is divided into a basic storage unit called the sector (usually 512 bytes or 4 KiB). A file system defines its own unit on top of that, the block, which may differ from the sector size; in this implementation we set the block to 512 bytes, the same as the sector.

A basic interface from device is:

// easy-fs/src/block_dev.rs

pub trait BlockDevice : Send + Sync + Any {
fn read_block(&self, block_id: usize, buf: &mut [u8]);
fn write_block(&self, block_id: usize, buf: &[u8]);
}

Going to the device for every read and write would be slow, so we build a BlockCache that keeps blocks we have read in memory. All block caches are then unified under a manager with a bounded size, which handles their allocation and eviction.

// easy-fs/src/lib.rs

pub const BLOCK_SZ: usize = 512;

// easy-fs/src/block_cache.rs

pub struct BlockCache {
cache: [u8; BLOCK_SZ],
block_id: usize,
block_device: Arc<dyn BlockDevice>,
modified: bool,
}

impl BlockCache {
pub fn read<T, V>(&self, offset: usize, f: impl FnOnce(&T) -> V) -> V {
f(self.get_ref(offset))
}

pub fn modify<T, V>(&mut self, offset:usize, f: impl FnOnce(&mut T) -> V) -> V {
f(self.get_mut(offset))
}
}

// easy-fs/src/block_cache.rs

const BLOCK_CACHE_SIZE: usize = 16;

use alloc::collections::VecDeque;

pub struct BlockCacheManager {
queue: VecDeque<(usize, Arc<Mutex<BlockCache>>)>,
}

impl BlockCacheManager {
pub fn new() -> Self {
Self { queue: VecDeque::new() }
}
}

// impl BlockCacheManager
// fn get_block_cache(&mut self, block_id:usize, block_device: Arc<dyn BlockDevice>)
// if reach limit size:
if self.queue.len() == BLOCK_CACHE_SIZE {
// from front to tail
if let Some((idx, _)) = self.queue
.iter()
.enumerate()
.find(|(_, pair)| Arc::strong_count(&pair.1) == 1) {
self.queue.drain(idx..=idx);
} else {
panic!("Run out of BlockCache!");
}
}
// load block into mem and push back
let block_cache = Arc::new(Mutex::new(
BlockCache::new(block_id, Arc::clone(&block_device))
));
self.queue.push_back((block_id, Arc::clone(&block_cache)));
block_cache
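
Callers reach the manager through one global entry point; a sketch assuming lazy_static and a spin Mutex:

// easy-fs/src/block_cache.rs (sketch)
lazy_static! {
    pub static ref BLOCK_CACHE_MANAGER: Mutex<BlockCacheManager> =
        Mutex::new(BlockCacheManager::new());
}

pub fn get_block_cache(
    block_id: usize,
    block_device: Arc<dyn BlockDevice>,
) -> Arc<Mutex<BlockCache>> {
    BLOCK_CACHE_MANAGER
        .lock()
        .get_block_cache(block_id, block_device)
}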

Block Layout


We now design the overall on-disk layout that organizes all the blocks.

First comes the Super Block, a header that describes everything else; magic is a magic number used to check the integrity of the structure.

// easy-fs/src/layout.rs

#[repr(C)]
pub struct SuperBlock {
magic: u32,
pub total_blocks: u32,
pub inode_bitmap_blocks: u32,
pub inode_area_blocks: u32,
pub data_bitmap_blocks: u32,
pub data_area_blocks: u32,
}

pub fn is_valid(&self) -> bool {
self.magic == EFS_MAGIC
}

A bitmap is a convenient structure for allocation bookkeeping. Each bitmap block is 512 bytes (4096 bits), and each bit records the allocation state of one inode or data block (1 = allocated, 0 = free).

// easy-fs/src/bitmap.rs

// notice it only store the id of start block and the len of blocks in its range.
pub struct Bitmap {
start_block_id: usize,
blocks: usize,
}

// the true structure to map
type BitmapBlock = [u64; 64];

// equal: 4096 bits
const BLOCK_BITS: usize = BLOCK_SZ * 8;

impl Bitmap {
pub fn alloc(&self, block_device: &Arc<dyn BlockDevice>) -> Option<usize> {
for block_id in 0..self.blocks {
let pos = get_block_cache(
block_id + self.start_block_id as usize,
Arc::clone(block_device),
)
.lock()
.modify(0, |bitmap_block: &mut BitmapBlock| {
// core!
if let Some((bits64_pos, inner_pos)) = bitmap_block
.iter()
.enumerate()
.find(|(_, bits64)| **bits64 != u64::MAX)
.map(|(bits64_pos, bits64)| {
(bits64_pos, bits64.trailing_ones() as usize)
}) {
// modify cache
bitmap_block[bits64_pos] |= 1u64 << inner_pos;
Some(block_id * BLOCK_BITS + bits64_pos * 64 + inner_pos as usize)
} else {
None
}
});
if pos.is_some() {
return pos;
}
}
None
}
}
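
The total number of bits a bitmap manages, i.e. how many inodes or data blocks it can allocate, is used later by EasyFileSystem::create; a one-line sketch:

// impl Bitmap (sketch)
pub fn maximum(&self) -> usize {
    self.blocks * BLOCK_BITS
}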

With this structure we can distinguish Inodes from Data Blocks: not every block stores real file data, since some must act as guidance (index) blocks, and the bitmaps record where and how all of them are allocated. Now we look at the Inode.

To let one inode govern many data blocks, we give it layered indexing. Besides the direct indices, it stores a level-1 and a level-2 indirect index pointing to index blocks (stored just like data blocks), which in turn route to the real data blocks. One block holds 512 bytes, i.e. 512/4 = 128 u32 entries, so a level-1 index block routes 128 * 512 B = 64 KiB, and the level-2 index routes up to 128 * 64 KiB = 8 MiB.

// easy-fs/src/layout.rs

const INODE_DIRECT_COUNT: usize = 28;
// 128
const INODE_INDIRECT1_COUNT: usize = BLOCK_SZ / 4;

#[repr(C)]
pub struct DiskInode {
pub size: u32,
pub direct: [u32; INODE_DIRECT_COUNT],
pub indirect1: u32,
pub indirect2: u32,
type_: DiskInodeType,
}

#[derive(PartialEq)]
pub enum DiskInodeType {
File,
Directory,
}

Such a DiskInode occupies 128 bytes, so one block holds exactly 4 of them; we deliberately size the structure so it fits a block evenly. Now we design the routing method.

// 28 + 128
const INDIRECT1_BOUND: usize = DIRECT_BOUND + INODE_INDIRECT1_COUNT;
type IndirectBlock = [u32; BLOCK_SZ / 4];

impl DiskInode {
pub fn get_block_id(&self, inner_id: u32, block_device: &Arc<dyn BlockDevice>) -> u32 {
let inner_id = inner_id as usize;
if inner_id < INODE_DIRECT_COUNT {
self.direct[inner_id]
} else if inner_id < INDIRECT1_BOUND {
get_block_cache(self.indirect1 as usize, Arc::clone(block_device))
.lock()
.read(0, |indirect_block: &IndirectBlock| {
indirect_block[inner_id - INODE_DIRECT_COUNT]
})
} else {
let last = inner_id - INDIRECT1_BOUND;
let indirect1 = get_block_cache(
self.indirect2 as usize,
Arc::clone(block_device)
)
.lock()
.read(0, |indirect2: &IndirectBlock| {
indirect2[last / INODE_INDIRECT1_COUNT]
});
get_block_cache(
indirect1 as usize,
Arc::clone(block_device)
)
.lock()
.read(0, |indirect1: &IndirectBlock| {
indirect1[last % INODE_INDIRECT1_COUNT]
})
}
}
}

Now we design the data block, which is simple: to the file system, file contents are just bytes.

// easy-fs/src/layout.rs

// BLOCK_SZ = 512
type DataBlock = [u8; BLOCK_SZ];

// impl DiskInode
// pub fn read_at(
// &self,
// offset: usize,
// buf: &mut [u8],
// block_device: &Arc<dyn BlockDevice>,
loop {
// calculate end of current block
let mut end_current_block = (start / BLOCK_SZ + 1) * BLOCK_SZ;
end_current_block = end_current_block.min(end);
// read and update read size
let block_read_size = end_current_block - start;
let dst = &mut buf[read_size..read_size + block_read_size];
get_block_cache(
self.get_block_id(start_block as u32, block_device) as usize,
Arc::clone(block_device),
)
.lock()
.read(0, |data_block: &DataBlock| {
let src = &data_block[start % BLOCK_SZ..start % BLOCK_SZ + block_read_size];
dst.copy_from_slice(src);
});
read_size += block_read_size;
// move to next block
if end_current_block == end { break; }
start_block += 1;
start = end_current_block;
}

File System

Thanks to the consecutive layout, we only need to record the bitmaps and the start block of each area, and then build one unified object that controls allocation and routing. We call it the file system.

// easy-fs/src/efs.rs

pub struct EasyFileSystem {
pub block_device: Arc<dyn BlockDevice>,
pub inode_bitmap: Bitmap,
pub data_bitmap: Bitmap,
inode_area_start_block: u32,
data_area_start_block: u32,
}

impl EasyFileSystem {
pub fn create(
block_device: Arc<dyn BlockDevice>,
total_blocks: u32,
inode_bitmap_blocks: u32,
) -> Arc<Mutex<Self>> {
// calculate block size of areas & create bitmaps
let inode_bitmap = Bitmap::new(1, inode_bitmap_blocks as usize);
let inode_num = inode_bitmap.maximum();
let inode_area_blocks =
((inode_num * core::mem::size_of::<DiskInode>() + BLOCK_SZ - 1) / BLOCK_SZ) as u32;
let inode_total_blocks = inode_bitmap_blocks + inode_area_blocks;
let data_total_blocks = total_blocks - 1 - inode_total_blocks;
let data_bitmap_blocks = (data_total_blocks + 4096) / 4097;
let data_area_blocks = data_total_blocks - data_bitmap_blocks;
let data_bitmap = Bitmap::new(
(1 + inode_bitmap_blocks + inode_area_blocks) as usize,
data_bitmap_blocks as usize,
);
...

Using the bitmaps plus the recorded start blocks, we can finally turn an inode id or a data block id into a concrete block position.

// easy-fs/src/efs.rs

impl EasyFileSystem {
pub fn get_disk_inode_pos(&self, inode_id: u32) -> (u32, usize) {
let inode_size = core::mem::size_of::<DiskInode>();
let inodes_per_block = (BLOCK_SZ / inode_size) as u32;
let block_id = self.inode_area_start_block + inode_id / inodes_per_block;
(block_id, (inode_id % inodes_per_block) as usize * inode_size)
}

pub fn get_data_block_id(&self, data_block_id: u32) -> u32 {
self.data_area_start_block + data_block_id
}
}

DiskInode is aimed at the on-disk layer, not at users, so we add an in-memory Inode as its interface: it stores the block id and offset where the DiskInode lives and routes operations to it.

// easy-fs/src/vfs.rs

pub struct Inode {
block_id: usize,
block_offset: usize,
fs: Arc<Mutex<EasyFileSystem>>,
block_device: Arc<dyn BlockDevice>,
}

// easy-fs/src/vfs.rs

impl Inode {
fn read_disk_inode<V>(&self, f: impl FnOnce(&DiskInode) -> V) -> V {
get_block_cache(
self.block_id,
Arc::clone(&self.block_device)
).lock().read(self.block_offset, f)
}

fn modify_disk_inode<V>(&self, f: impl FnOnce(&mut DiskInode) -> V) -> V {
get_block_cache(
self.block_id,
Arc::clone(&self.block_device)
).lock().modify(self.block_offset, f)
}
}

All methods exposed to the user go through the root inode.

// easy-fs/src/efs.rs

impl EasyFileSystem {
pub fn root_inode(efs: &Arc<Mutex<Self>>) -> Inode {
let block_device = Arc::clone(&efs.lock().block_device);
// acquire efs lock temporarily
let (block_id, block_offset) = efs.lock().get_disk_inode_pos(0);
// release efs lock
Inode::new(
block_id,
block_offset,
Arc::clone(efs),
block_device,
)
}
}

However, directories need one special kind of data block content: the DirEntry, which pairs a name with an inode_number so that a name can be routed to an inode. A DirEntry takes 32 bytes, so each 512-byte block holds 16 of them. This is how a &str path component is resolved to an inode.

Notice that DiskInode carries a type distinguishing directories from files: some inodes hold directory data and others hold file data, and we reach a file's inode through the inode of its directory.

// easy-fs/src/layout.rs

const NAME_LENGTH_LIMIT: usize = 27;

#[repr(C)]
pub struct DirEntry {
name: [u8; NAME_LENGTH_LIMIT + 1],
inode_number: u32,
}

pub const DIRENT_SZ: usize = 32;
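
The find routine below relies on a few small DirEntry helpers; a sketch of what they look like:

// impl DirEntry (sketch)
pub fn empty() -> Self {
    Self {
        name: [0u8; NAME_LENGTH_LIMIT + 1],
        inode_number: 0,
    }
}
pub fn name(&self) -> &str {
    // the name is NUL-terminated inside the fixed-size array
    let len = (0usize..).find(|i| self.name[*i] == 0).unwrap();
    core::str::from_utf8(&self.name[..len]).unwrap()
}
pub fn inode_number(&self) -> u32 {
    self.inode_number
}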

First, we implement find on an inode: it scans the directory's entries and returns the child inode matching the given name.

// easy-fs/src/vfs.rs

impl Inode {
pub fn find(&self, name: &str) -> Option<Arc<Inode>> {
let fs = self.fs.lock();
self.read_disk_inode(|disk_inode| {
self.find_inode_id(name, disk_inode)
.map(|inode_id| {
let (block_id, block_offset) = fs.get_disk_inode_pos(inode_id);
Arc::new(Self::new(
block_id,
block_offset,
self.fs.clone(),
self.block_device.clone(),
))
})
})
}

fn find_inode_id(
&self,
name: &str,
disk_inode: &DiskInode,
) -> Option<u32> {
// assert it is a directory
assert!(disk_inode.is_dir());
let file_count = (disk_inode.size as usize) / DIRENT_SZ;
let mut dirent = DirEntry::empty();
for i in 0..file_count {
// note assert_eq! has side effect: read data to dirent.
assert_eq!(
disk_inode.read_at(
DIRENT_SZ * i,
dirent.as_bytes_mut(),
&self.block_device,
),
DIRENT_SZ,
);
if dirent.name() == name {
return Some(dirent.inode_number() as u32);
}
}
None
}
}

Usually, the workflow of create or delete, read or write would be:

  • read/write
    • get root inode which is dir type
    • read/write closure of disk inode through root inode
    • resize specified inode
  • create/clear
    • allocation/deallocation: alloc/dealloc inode by bitmap and get its index
    • initialization/clear by get its block cache by its index
    • resize root inode

Chapter 4

Introduction

We no longer want apps bound to fixed physical addresses; instead we want a unified, abstract interface over a dynamic memory layout for each app. We call it the Virtual Address.

Safety: every app is allocated its own virtual address space, so apps cannot interfere with each other.

Efficiency: apps can coexist in memory at the same time, without reading from external peripherals to swap apps in and out (made practical by growing memory sizes).

We need the MMU (Memory Management Unit) to perform Address Translation from virtual to physical addresses.

Several designs are possible.


Design

Segment Design


Each app occupies one fixed slot treated as a segment $[0, \text{bound})$, linearly mapped by the MMU.

Problem: Wasteful and inflexible

We could instead give each app several linear maps, one per region (code, data, heap, ...), and dispatch memory at a finer granularity. But that still does not solve the problem: even with dynamically sized slots, leftover gaps too small to reuse accumulate, causing External Fragmentation, whereas the fixed-slot scheme suffers from Internal Fragmentation.

Paging Design

The other approach is to invert the idea: instead of carving variable-size slots out of memory, we let the MMU map fixed-size slots, called Pages, onto physical memory. Physical memory is sliced into Frames of the same size, and each app records which virtual page number maps to which frame in its own Page Table.

Each mapping also carries protection bits: r for read, w for write, x for execute.

Page Design


SV39 uses only the lower 39 bits of an address, even though registers are 64 bits wide (2^39 bytes is already a fairly large space!).

Within an address, bits $[11:0]$ are the Page Offset and bits $[38:12]$ are the VPN (virtual page number); the VPN locates the page and the offset selects a byte inside that 4 KiB page.

We must write satp to turn paging on for S-level and U-level memory accesses.

const PAGE_SIZE: usize = 4096;
const PAGE_SIZE_BIT: usize = 12;

The page size and offset are used to slice a physical address into a page number and an in-page offset:

// os/src/mm/address.rs

impl PhysAddr {
pub fn floor(&self) -> PhysPageNum { PhysPageNum(self.0 / PAGE_SIZE) }
pub fn ceil(&self) -> PhysPageNum { PhysPageNum((self.0 + PAGE_SIZE - 1) / PAGE_SIZE) }
}
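
A quick check of the arithmetic (this assumes the tuple-struct wrappers PhysAddr(usize)/PhysPageNum(usize) used in this chapter, with PAGE_SIZE = 4096 = 0x1000):

assert_eq!(PhysAddr(0x8020_1234).floor(), PhysPageNum(0x80201));
assert_eq!(PhysAddr(0x8020_1234).ceil(), PhysPageNum(0x80202));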

A page table entry bundles the permission flags with the physical page number:

pub fn ppn(&self) -> PhysPageNum {
(self.bits >> 10 & ((1usize << 44) - 1)).into()
}
pub fn flags(&self) -> PTEFlags {
PTEFlags::from_bits(self.bits as u8).unwrap()
}

A naive page table would be a single linear array indexed from a base address. That costs a huge amount of memory, because every app would need an entry for every possible page, used or not.

A finer design is a trie. We split the VPN into 9-bit index slices (512 entries of 8 bytes each fill one 4 KiB page exactly) and link the levels into a tree; SV39 uses a 3-level page index, 27 bits in total.

A virtual address therefore reserves 27 bits for the index slices plus 12 bits for the in-page offset, 39 bits in total.

To translate a virtual address we walk those index slices level by level.
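
A sketch of decomposing a VirtPageNum into its three index slices (assuming VirtPageNum wraps a usize, as elsewhere in this chapter):

// impl VirtPageNum (sketch): split the 27-bit VPN into three 9-bit index slices
pub fn indexes(&self) -> [usize; 3] {
    let mut vpn = self.0;
    let mut idx = [0usize; 3];
    for i in (0..3).rev() {
        idx[i] = vpn & 511; // 9 bits per level
        vpn >>= 9;
    }
    idx
}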

Page Management Design

A simple allocator for physical frames (allocation, as distinct from management) works stack-style:

pub struct StackFrameAllocator {
current: usize, // start physical page number of free memory
end: usize, // end physical page number of free memory
recycled: Vec<usize>,
}
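
Allocation pops a recycled page first and only then bumps current; a sketch of both methods under the fields above:

// impl StackFrameAllocator (sketch)
fn alloc(&mut self) -> Option<PhysPageNum> {
    if let Some(ppn) = self.recycled.pop() {
        Some(ppn.into())
    } else if self.current == self.end {
        None // no free frame left
    } else {
        self.current += 1;
        Some((self.current - 1).into())
    }
}
fn dealloc(&mut self, ppn: PhysPageNum) {
    let ppn = ppn.0;
    // a frame must have been handed out and not yet recycled
    assert!(ppn < self.current && !self.recycled.contains(&ppn));
    self.recycled.push(ppn);
}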

Based on this allocator, we can design the page table.

// os/src/mm/frame_allocator.rs

pub struct FrameTracker {
pub ppn: PhysPageNum,
}

impl FrameTracker {
pub fn new(ppn: PhysPageNum) -> Self {
// page cleaning
let bytes_array = ppn.get_bytes_array();
for i in bytes_array {
*i = 0;
}
Self { ppn }
}
}

// os/src/mm/page_table.rs

pub struct PageTable {
root_ppn: PhysPageNum,
frames: Vec<FrameTracker>,
}

That is, besides the root, the PageTable keeps every frame it allocates for intermediate nodes in a Vec of FrameTracker, representing the real frames it owns; they are freed automatically when the table is dropped.

Next we design the walk from a virtual address down to its page table entry:

// for each idxs represent index slices of virtual addr.
for i in 0..3 {
let pte = &mut ppn.get_pte_array()[idxs[i]];
if i == 2 {
result = Some(pte);
break;
}
if !pte.is_valid() {
let frame = frame_alloc().unwrap();
// create or return None
*pte = PageTableEntry::new(frame.ppn, PTEFlags::V);
self.frames.push(frame);
}
ppn = pte.ppn();
}

Page Map Design

Building on these abstractions, MapArea describes a mapping from a contiguous range of virtual addresses (whatever frames end up backing them) to physical memory.

// os/src/mm/memory_set.rs

pub struct MapArea {
vpn_range: VPNRange,
data_frames: BTreeMap<VirtPageNum, FrameTracker>,
map_type: MapType,
map_perm: MapPermission,
}

By combining several contiguous areas, we can describe an app's whole, possibly discontiguous, address space:

// os/src/mm/memory_set.rs

pub struct MemorySet {
page_table: PageTable,
areas: Vec<MapArea>,
}

// impl MemorySet
fn push(&mut self, mut map_area: MapArea, data: Option<&[u8]>) {
// map range of virtual addr in allocation in page_table
map_area.map(&mut self.page_table);
if let Some(data) = data {
map_area.copy_data(&mut self.page_table, data);
}
self.areas.push(map_area);
}

Whenever a MapArea establishes a virtual-to-physical mapping for its range, the corresponding entries (and any frames needed for intermediate nodes) are installed in the PageTable as well.


Allocation Space

To enable SV39 we write the page-table token into satp:

// os/src/mm/page_table.rs

pub fn token(&self) -> usize {
8usize << 60 | self.root_ppn.0
}

// os/src/mm/memory_set.rs

impl MemorySet {
pub fn activate(&self) {
let satp = self.page_table.token();
unsafe {
satp::write(satp);
asm!("sfence.vma" :::: "volatile");
// sfence.vma is a special instruction to clear `Translaton Lookaside Buffer` which is used for quick search for memory addr to reduce performance expenses.
}
}
}

Writing this token installs the root physical page number and activates paging.

Note that the instructions around enabling SV39 must remain contiguous across the switch: the kernel identity-maps its own code, so the instruction fetched right after writing satp still resolves to the correct physical address.

Kernel

Initiation for Kernel memory layout.

let mut memory_set = Self::new_bare();
// map trampoline
memory_set.map_trampoline();
memory_set.push(MapArea::new(
(stext as usize).into(),
(etext as usize).into(),
MapType::Identical,
MapPermission::R | MapPermission::X,
), None);
println!("mapping .rodata section");
// other layout ...

Initiation for App memory layout.

Previously we cut the ELF headers off the app binaries; now we keep the ELF and parse it to extract useful information, e.g. each segment's permissions.

MemorySet allocates the executable segments with their permissions, and places the user stack and the TrapContext near the top of the layout.

from_elf returns the MemorySet, the user stack top, and the entry point address.

let mut max_end_vpn = VirtPageNum(0);
// ... (the loop over the ELF program headers updates max_end_vpn) ...
let max_end_va: VirtAddr = max_end_vpn.into();
let mut user_stack_bottom: usize = max_end_va.into();
// guard page
user_stack_bottom += PAGE_SIZE;
let user_stack_top = user_stack_bottom + USER_STACK_SIZE;
memory_set.push(MapArea::new(
TRAP_CONTEXT.into(),
TRAMPOLINE.into(),
MapType::Framed,
MapPermission::R | MapPermission::W,
), None);
// ...

App

A problem: separating kernel and app address spaces also isolates the Trap path. Entering the kernel needs information the app's address space cannot see (the kernel's satp, its kernel stack, the handler address), so we need a way to carry it across. We do this by storing that information in the TrapContext ahead of time.

(There are no spare registers to carry it without clobbering state; sscratch is already dedicated to locating the TrapContext.)

// os/src/trap/context.rs

#[repr(C)]
pub struct TrapContext {
pub x: [usize; 32],
pub sstatus: Sstatus,
pub sepc: usize,
// new:
pub kernel_satp: usize,
pub kernel_sp: usize,
pub trap_handler: usize,
}
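
app_init_context from Chapter 2 is extended accordingly to fill the three new fields; a sketch:

// os/src/trap/context.rs (sketch)
pub fn app_init_context(
    entry: usize,
    sp: usize,
    kernel_satp: usize,
    kernel_sp: usize,
    trap_handler: usize,
) -> Self {
    let mut sstatus = sstatus::read();
    sstatus.set_spp(SPP::User); // trap returns to user mode
    let mut cx = Self {
        x: [0; 32],
        sstatus,
        sepc: entry,
        kernel_satp,
        kernel_sp,
        trap_handler,
    };
    cx.set_sp(sp);
    cx
}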

Notice that the assembly below must also switch satp, loading the kernel's or the user's root physical page number at the right moment to change address space.

# __alltraps:

# load kernel_satp into t0
ld t0, 34*8(sp)
# load trap_handler into t1
ld t1, 36*8(sp)
# move to kernel_sp
ld sp, 35*8(sp)
# switch to kernel space
csrw satp, t0
sfence.vma
# jump to trap_handler
jr t1

# __restore:
# a0: *TrapContext in user space(Constant); a1: user space token
# switch to user space
csrw satp, a1
sfence.vma
csrw sscratch, a0

To keep instruction fetch consistent across the satp switch, we add a trampoline that sits at the same virtual address in both U-level and S-level address spaces (one shared trap entry for every app). It is placed at the highest virtual page.

# os/src/linker.ld

stext = .;
.text : {
*(.text.entry)
+ . = ALIGN(4K);
+ strampoline = .;
+ *(.text.trampoline);
+ . = ALIGN(4K);
*(.text .text.*)
}

We also have to adapt the Rust-side handler and restore paths: with virtual addressing, we must jump to the trampoline's virtual address, not to the link-time address of the code.

#[no_mangle]
pub fn trap_handler() -> ! {
set_kernel_trap_entry();
let cx = current_trap_cx();
let scause = scause::read();
let stval = stval::read();
match scause.cause() {
...
}
trap_return();
}

#[no_mangle]
pub fn trap_return() -> ! {
set_user_trap_entry();
let trap_cx_ptr = TRAP_CONTEXT;
let user_satp = current_user_token();
extern "C" {
fn __alltraps();
fn __restore();
}
let restore_va = __restore as usize - __alltraps as usize + TRAMPOLINE;
unsafe {
asm!(
"fence.i",
"jr {restore_va}",
restore_va = in(reg) restore_va,
in("a0") trap_cx_ptr,
in("a1") user_satp,
options(noreturn)
);
}
panic!("Unreachable in back_to_user!");
}

Then we map the trampoline's virtual page onto the physical frame that actually holds its code, so both spaces agree:

// os/src/config.rs

pub const TRAMPOLINE: usize = usize::MAX - PAGE_SIZE + 1;

// os/src/mm/memory_set.rs

impl MemorySet {
/// Mention that trampoline is not collected by areas.
fn map_trampoline(&mut self) {
self.page_table.map(
VirtAddr::from(TRAMPOLINE).into(),
PhysAddr::from(strampoline as usize).into(),
PTEFlags::R | PTEFlags::X,
);
}
}

For the same reason we adjust the TaskControlBlock to record where each task's TrapContext lives:

// os/src/task/task.rs

pub struct TaskControlBlock {
pub task_cx: TaskContext,
pub task_status: TaskStatus,
pub memory_set: MemorySet,
pub trap_cx_ppn: PhysPageNum,
pub base_size: usize,
}

Creating a task now parses the ELF data, looks up the physical page of its TrapContext, computes its kernel stack bottom and top, and then initializes the TrapContext.

Here the part of task control initiation:

impl TaskContext {
pub fn goto_trap_return(kstack_ptr: usize) -> Self {
Self {
ra: trap_return as usize,
sp: kstack_ptr,
s: [0; 12],
}
}
}

// fn new(elf_data: &[u8], app_id: usize) -> Self
let task_control_block = Self {
task_status,
task_cx: TaskContext::goto_trap_return(kernel_stack_top),
memory_set,
trap_cx_ppn,
base_size: user_sp,
};
// prepare TrapContext in user space
let trap_cx = task_control_block.get_trap_cx();
*trap_cx = TrapContext::app_init_context(
entry_point,
user_sp,
KERNEL_SPACE.exclusive_access().token(),
kernel_stack_top,
trap_handler as usize,
);
task_control_block

Chapter 7-1

Introduction

We abstract Stdin and Stdout as files and put them into the file descriptor table, which in turn lets us support pipes and I/O redirection between processes.

Everything Is a File

The design philosophy of "everything is a file" generalizes all resources to files in terms of their I/O operations, while ignoring the concrete semantics of their contents.

Abstraction of IO hardware:

  • read-only: e.g. a keyboard
  • write-only: e.g. a screen
  • read-write: e.g. a serial device

Abstraction of IO operations(based on file descriptor):

  • open: a process opens a file, taking possession of a handle to it.
  • close: the process releases that handle.
  • read: read the file's contents into memory.
  • write: write data from memory into the file.

When a process is created, it owns three open files by convention:

  • 0: Stdin
  • 1: Stdout
  • 2: Stderr(which we will merge with Stdout)
impl TaskControlBlock {
pub fn new(elf_data: &[u8]) -> Self {
...
let task_control_block = Self {
pid: pid_handle,
kernel_stack,
inner: Mutex::new(TaskControlBlockInner {
// ...
fd_table: vec![
// 0 -> stdin
Some(Arc::new(Stdin)),
// 1 -> stdout
Some(Arc::new(Stdout)),
// 2 -> stderr
Some(Arc::new(Stdout)),
],
}),
};
...
}
}

Pipe

In a typical shell, | is the pipe symbol: it feeds the output of the command on its left into the command on its right. Since everything is a file (just like Stdin and Stdout), a pipe is a file too: it has a read end and a write end, one process (often the child) writes into one end, another reads from the other, and the data is transferred underneath.

We already use file descriptors to refer to files, so pipes get the same operations.

sys_pipe receives a pointer to an array of length 2 and writes the descriptors of the pipe's read end and write end into it.

// user/src/syscall.rs

const SYSCALL_PIPE: usize = 59;

pub fn sys_pipe(pipe: &mut [usize]) -> isize {
syscall(SYSCALL_PIPE, [pipe.as_mut_ptr() as usize, 0, 0])
}

So what is the basic design of a pipe?

Both ends must share the same data and keep track of how much has been read and written. We build a ring buffer for this: the Pipe owns a buffer that records data between a head index and a tail index. Why not just use two separate copies of the data, or a growable queue?

Because a ring buffer needs no copying and fits our constraints: we read from the head and advance it, and append at the tail of a fixed-size array, with no dynamic allocation as a queue would require.

// os/src/fs/pipe.rs

pub struct Pipe {
readable: bool,
writable: bool,
buffer: Arc<Mutex<PipeRingBuffer>>,
}

const RING_BUFFER_SIZE: usize = 32;

#[derive(Copy, Clone, PartialEq)]
enum RingBufferStatus {
FULL,
EMPTY,
NORMAL,
}

pub struct PipeRingBuffer {
arr: [u8; RING_BUFFER_SIZE],
head: usize, // head index of ring buffer
tail: usize, // tail index of ring buffer
status: RingBufferStatus,
write_end: Option<Weak<Pipe>>,
}

impl PipeRingBuffer {
pub fn set_write_end(&mut self, write_end: &Arc<Pipe>) {
self.write_end = Some(Arc::downgrade(write_end));
}
}

/// Return (read_end, write_end)
pub fn make_pipe() -> (Arc<Pipe>, Arc<Pipe>) {
let buffer = Arc::new(Mutex::new(PipeRingBuffer::new()));
let read_end = Arc::new(
Pipe::read_end_with_buffer(buffer.clone())
);
let write_end = Arc::new(
Pipe::write_end_with_buffer(buffer.clone())
);
buffer.lock().set_write_end(&write_end);
(read_end, write_end)
}

impl PipeRingBuffer {
pub fn read_byte(&mut self) -> u8 {
self.status = RingBufferStatus::NORMAL;
let c = self.arr[self.head];
// move forward
self.head = (self.head + 1) % RING_BUFFER_SIZE;
if self.head == self.tail {
self.status = RingBufferStatus::EMPTY;
}
c
}
pub fn available_read(&self) -> usize {
if self.status == RingBufferStatus::EMPTY {
0
} else {
// data from head to tail!
if self.tail > self.head {
self.tail - self.head
} else {
self.tail + RING_BUFFER_SIZE - self.head
}
}
}
pub fn all_write_ends_closed(&self) -> bool {
self.write_end.as_ref().unwrap().upgrade().is_none()
}
}
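
The write side is symmetric; a sketch of the two counterparts:

// impl PipeRingBuffer (sketch)
pub fn write_byte(&mut self, byte: u8) {
    self.status = RingBufferStatus::NORMAL;
    self.arr[self.tail] = byte;
    self.tail = (self.tail + 1) % RING_BUFFER_SIZE;
    if self.tail == self.head {
        self.status = RingBufferStatus::FULL;
    }
}
pub fn available_write(&self) -> usize {
    if self.status == RingBufferStatus::FULL {
        0
    } else {
        RING_BUFFER_SIZE - self.available_read()
    }
}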

A single read may not be able to fetch everything requested at once; in that case the reader suspends and lets other tasks run until the writers have produced more data or every write end has been closed.

// os/src/fs/pipe.rs

impl File for Pipe {
fn read(&self, buf: UserBuffer) -> usize {
assert!(self.readable());
let want_to_read = buf.len();
let mut buf_iter = buf.into_iter();
let mut already_read = 0usize;
loop {
let mut ring_buffer = self.buffer.lock();
let loop_read = ring_buffer.available_read();
if loop_read == 0 {
if ring_buffer.all_write_ends_closed() {
return already_read;
}
drop(ring_buffer);
suspend_current_and_run_next();
continue;
}
for _ in 0..loop_read {
if let Some(byte_ref) = buf_iter.next() {
unsafe {
*byte_ref = ring_buffer.read_byte();
}
already_read += 1;
if already_read == want_to_read {
return want_to_read;
}
} else {
return already_read;
}
}
}
}
}

Arguments

We now combine this with our shell.

First, parse the command line into arguments and push a null pointer at the end to mark the end of the list.

// user/src/bin/user_shell.rs

let args: Vec<_> = line.as_str().split(' ').collect();
// keep the NUL-terminated strings alive while their pointers are in use
let args_copy: Vec<String> = args
.iter()
.map(|&arg| {
let mut s = arg.to_string();
s.push('\0');
s
})
.collect();
let mut args_addr: Vec<*const u8> = args_copy.iter().map(|s| s.as_ptr()).collect();
args_addr.push(core::ptr::null());

A task now accepts a list of arguments rather than a single string, so sys_exec becomes:

// os/src/syscall/process.rs

pub fn sys_exec(path: *const u8, mut args: *const usize) -> isize {
let token = current_user_token();
let path = translated_str(token, path);
let mut args_vec: Vec<String> = Vec::new();
// args would be a ptr of array contains ptr of string.
loop {
let arg_str_ptr = *translated_ref(token, args);
if arg_str_ptr == 0 {
break;
}
args_vec.push(translated_str(token, arg_str_ptr as *const u8));
unsafe { args = args.add(1); }
}
if let Some(app_inode) = open_file(path.as_str(), OpenFlags::RDONLY) {
let all_data = app_inode.read_all();
let task = current_task().unwrap();
let argc = args_vec.len();
task.exec(all_data.as_slice(), args_vec);
// return argc because cx.x[10] will be covered with it later
argc as isize
} else {
-1
}
}

Now we actually lay these arguments out on the new user stack:

// os/src/task/task.rs

impl TaskControlBlock {
// notice exec will allocate a new memory set!
pub fn exec(&self, elf_data: &[u8], args: Vec<String>) {
// ...
// first allocate memory for ptr of strings.
user_sp -= (args.len() + 1) * core::mem::size_of::<usize>();
let argv_base = user_sp;
// allocate new memory in user stack addr as a vector of strings
let mut argv: Vec<_> = (0..=args.len())
.map(|arg| {
translated_refmut(
memory_set.token(),
(argv_base + arg * core::mem::size_of::<usize>()) as *mut usize
)
})
.collect();
*argv[args.len()] = 0;
for i in 0..args.len() {
// allocate for strings themselves.
user_sp -= args[i].len() + 1;
*argv[i] = user_sp;
let mut p = user_sp;
for c in args[i].as_bytes() {
*translated_refmut(memory_set.token(), p as *mut u8) = *c;
p += 1;
}
*translated_refmut(memory_set.token(), p as *mut u8) = 0;
}
// make the user_sp aligned to 8B for k210 platform
user_sp -= user_sp % core::mem::size_of::<usize>();

// **** hold current PCB lock
let mut inner = self.acquire_inner_lock();
// substitute memory_set
inner.memory_set = memory_set;
// update trap_cx ppn
inner.trap_cx_ppn = trap_cx_ppn;
// initialize trap_cx
let mut trap_cx = TrapContext::app_init_context(
entry_point,
user_sp,
KERNEL_SPACE.lock().token(),
self.kernel_stack.get_top(),
trap_handler as usize,
);
// a[0] be args len
trap_cx.x[10] = args.len();
// a[1] be args base addr
trap_cx.x[11] = argv_base;
*inner.get_trap_cx() = trap_cx;
// **** release current PCB lock
}
}

Finally, _start on the user side receives argc and argv, rebuilds the &str argument slice from the NUL-terminated strings the kernel placed on the user stack, and passes it to main:

// user/src/lib.rs

#[no_mangle]
#[link_section = ".text.entry"]
pub extern "C" fn _start(argc: usize, argv: usize) -> ! {
unsafe {
HEAP.lock()
.init(HEAP_SPACE.as_ptr() as usize, USER_HEAP_SIZE);
}
let mut v: Vec<&'static str> = Vec::new();
for i in 0..argc {
let str_start = unsafe {
((argv + i * core::mem::size_of::<usize>()) as *const usize).read_volatile()
};
let len = (0usize..).find(|i| unsafe {
((str_start + *i) as *const u8).read_volatile() == 0
}).unwrap();
v.push(
core::str::from_utf8(unsafe {
core::slice::from_raw_parts(str_start as *const u8, len)
}).unwrap()
);
}
exit(main(argc, v.as_slice()));
}

Redirection

Redirection is usually expressed with > for output and < for input.

If we want to redirect I/O, we combine the user_shell with sys_dup.

First, sys_dup duplicates a file descriptor that is already open in the current process.

Then we parse the user's arguments: if they contain > or <, we fork a child process, open the target file, close the child's corresponding Stdin or Stdout descriptor, and use dup so that the file itself takes that descriptor's place! Then we exec with the originally parsed arguments and receive the result in the parent process, as sketched below.
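
A sketch of the child-side handling of > in user_shell; output, args_copy and args_addr are illustrative names from the surrounding argument parsing, and the < case mirrors this on descriptor 0:

// user/src/bin/user_shell.rs (sketch of the '>' case)
let pid = fork();
if pid == 0 {
    // child: redirect Stdout to the file before exec
    if let Some(output) = output {
        let fd = open(output.as_str(), OpenFlags::CREATE | OpenFlags::WRONLY);
        if fd == -1 {
            println!("Error when opening file {}", output);
            exit(-4);
        }
        close(1);                        // free descriptor 1 (Stdout)
        assert_eq!(dup(fd as usize), 1); // the opened file now occupies fd 1
        close(fd as usize);              // drop the extra descriptor
    }
    exec(args_copy[0].as_str(), args_addr.as_slice());
} else {
    // parent: wait for the child and collect its exit code
    let mut exit_code: i32 = 0;
    waitpid(pid as usize, &mut exit_code);
}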

Chapter 6-2

Introduction

We are going to load our easy-fs into the kernel. First, we need to know the block device interface. Second, our Inode should be wrapped by the OS as OSInode for extended functionality. Then we implement sys_read/sys_write on top of it.

MMIO (Memory-Mapped I/O)

Device registers of peripherals can be accessed through specific physical memory addresses. The VirtIO MMIO physical address range for the peripheral bus is 4KiB starting at 0x10001000.

// os/src/config.rs

#[cfg(feature = "board_qemu")]
pub const MMIO: &[(usize, usize)] = &[
(0x10001000, 0x1000),
];

OS Inode

We only restrict operations through the readable and writable bits derived from OpenFlags. The offset plus the Arc is a simple way to handle access from multiple processes.

// os/src/fs/inode.rs

bitflags! {
pub struct OpenFlags: u32 {
const RDONLY = 0;
const WRONLY = 1 << 0;
const RDWR = 1 << 1;
const CREATE = 1 << 9;
const TRUNC = 1 << 10;
}
}

impl OpenFlags {
/// Do not check validity for simplicity
/// Return (readable, writable)
pub fn read_write(&self) -> (bool, bool) {
if self.is_empty() {
(true, false)
} else if self.contains(Self::WRONLY) {
(false, true)
} else {
(true, true)
}
}
}

pub struct OSInode {
readable: bool,
writable: bool,
inner: Mutex<OSInodeInner>,
}

pub struct OSInodeInner {
offset: usize,
inode: Arc<Inode>,
}

impl OSInode {
pub fn new(
readable: bool,
writable: bool,
inode: Arc<Inode>,
) -> Self {
Self {
readable,
writable,
inner: Mutex::new(OSInodeInner {
offset: 0,
inode,
}),
}
}
}

File Descriptor Table

Now we need to connect file operations with processes: each process needs a descriptor table (which manages many open files!) to record its open-file information.

// os/src/task/task.rs

pub struct TaskControlBlockInner {
// ...
pub fd_table: Vec<Option<Arc<dyn File + Send + Sync>>>,
}

Remember our previous root inode? We load it directly for easy manipulation. The workflow for the current process is then: allocate a descriptor slot, store the opened file there, and return the descriptor.

// os/src/fs/inode.rs

lazy_static! {
pub static ref ROOT_INODE: Arc<Inode> = {
let efs = EasyFileSystem::open(BLOCK_DEVICE.clone());
Arc::new(EasyFileSystem::root_inode(&efs))
};
}

pub fn open_file(name: &str, flags: OpenFlags) -> Option<Arc<OSInode>> {
let (readable, writable) = flags.read_write();
if flags.contains(OpenFlags::CREATE) {
if let Some(inode) = ROOT_INODE.find(name) {
// clear size
inode.clear();
Some(Arc::new(OSInode::new(
readable,
writable,
inode,
)))
} else {
// create file
ROOT_INODE.create(name)
.map(|inode| {
Arc::new(OSInode::new(
readable,
writable,
inode,
))
})
}
} else {
ROOT_INODE.find(name)
.map(|inode| {
if flags.contains(OpenFlags::TRUNC) {
inode.clear();
}
Arc::new(OSInode::new(
readable,
writable,
inode
))
})
}
}
// os/src/syscall/fs.rs

pub fn sys_open(path: *const u8, flags: u32) -> isize {
let task = current_task().unwrap();
let token = current_user_token();
let path = translated_str(token, path);
if let Some(inode) = open_file(
path.as_str(),
OpenFlags::from_bits(flags).unwrap()
) {
let mut inner = task.acquire_inner_lock();
let fd = inner.alloc_fd();
inner.fd_table[fd] = Some(inode);
fd as isize
} else {
-1
}
}

// os/src/syscall/fs.rs

pub fn sys_close(fd: usize) -> isize {
let task = current_task().unwrap();
let mut inner = task.inner_exclusive_access();
if fd >= inner.fd_table.len() {
return -1;
}
if inner.fd_table[fd].is_none() {
return -1;
}
inner.fd_table[fd].take();
0
}
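
sys_open and sys_close rely on alloc_fd to find a free slot in the descriptor table. A minimal sketch of it as a method on the task's inner block:

// os/src/task/task.rs

impl TaskControlBlockInner {
    pub fn alloc_fd(&mut self) -> usize {
        // reuse the lowest free slot, or grow the table by one
        if let Some(fd) = (0..self.fd_table.len()).find(|fd| self.fd_table[*fd].is_none()) {
            fd
        } else {
            self.fd_table.push(None);
            self.fd_table.len() - 1
        }
    }
}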

sys_read and sys_write follow the same pattern: look up the file in the fd_table, check whether it is readable/writable, and delegate to it.
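
For reference, a sketch of sys_write. It assumes the File trait exposes write/writable and that translated_byte_buffer plus UserBuffer turn the user pointer into kernel-accessible slices; sys_read mirrors it with read/readable:

// os/src/syscall/fs.rs

pub fn sys_write(fd: usize, buf: *const u8, len: usize) -> isize {
    let token = current_user_token();
    let task = current_task().unwrap();
    let inner = task.acquire_inner_lock();
    if fd >= inner.fd_table.len() {
        return -1;
    }
    if let Some(file) = &inner.fd_table[fd] {
        if !file.writable() {
            return -1;
        }
        let file = file.clone();
        // release the TCB lock before the (possibly blocking) file operation
        drop(inner);
        file.write(UserBuffer::new(translated_byte_buffer(token, buf, len))) as isize
    } else {
        -1
    }
}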

Chapter 7-2

Introduction

If a process wants to notify another process with event semantics, this one-way mechanism is called a Signal: a process that receives a specific event pauses and performs the corresponding operation to handle the notification.

For example, a program can receive the stop event sent by Ctrl+C and stop itself.

The abstraction for handling a signal:

  • ignore: keep doing its own work and ignore the signal
  • trap: call the handler corresponding to the received signal
  • stop: stop itself

Now, beyond this rough idea, we want to back the abstraction with concrete data.


Signal Data

First, we define a raw number for each possible event.

// user/src/lib.rs

pub const SIGDEF: i32 = 0; // Default signal handling
pub const SIGHUP: i32 = 1;
pub const SIGINT: i32 = 2;
pub const SIGQUIT: i32 = 3;
pub const SIGILL: i32 = 4;
pub const SIGTRAP: i32 = 5;
pub const SIGABRT: i32 = 6;
pub const SIGBUS: i32 = 7;
pub const SIGFPE: i32 = 8;
pub const SIGKILL: i32 = 9;
...

So what should a process do if it wants to ignore a signal? We introduce a Mask in bit-flag form: each signal occupies one bit, so a whole set of signals to block can be stored and tested in a single integer.

// user/src/lib.rs

bitflags! {
pub struct SignalFlags: i32 {
const SIGDEF = 1; // Default signal handling
const SIGHUP = 1 << 1;
const SIGINT = 1 << 2;
const SIGQUIT = 1 << 3;
const SIGILL = 1 << 4;
const SIGTRAP = 1 << 5;
...
const SIGSYS = 1 << 31;
}
}

A task control block should record its signal mask, the set of pending signals, and the action for each signal, so we need a fixed array of entries, each holding a handler pointer and a mask. We also need to record which signal is currently being handled.

// user/src/lib.rs

/// Action for a signal
#[repr(C, align(16))]
#[derive(Debug, Clone, Copy)]
pub struct SignalAction {
pub handler: usize,
pub mask: SignalFlags,
}

// os/src/task/signal.rs

pub const MAX_SIG: usize = 31;

// os/src/task/action.rs

#[derive(Clone)]
pub struct SignalActions {
pub table: [SignalAction; MAX_SIG + 1],
}

// os/src/task/task.rs

pub struct TaskControlBlockInner {
...
pub handling_sig: isize,
// priority allowed
pub signals: SignalFlags,
// priority forbidden
pub signal_mask: SignalFlags,
pub signal_actions: SignalActions,
...
}

Then our task knows which signals should be handled and which should be ignored.


Signal Handle

Recall that each process may receive signals handled at different levels: some in S-level by the kernel, some in U-level by a user handler. Some signals are severe enough that the task should be killed or frozen to wait. Before jumping to a user handler, we back up the trap_ctx, because the handler runs in a different environment.

// os/src/task/task.rs

pub struct TaskControlBlockInner {
...
pub killed: bool,
pub frozen: bool,
pub handling_sig: isize,
pub trap_ctx_backup: Option<TrapContext>,
...
}

// os/src/task/mod.rs

// Some signals are severe and handled by kernel.
fn call_kernel_signal_handler(signal: SignalFlags) {
let task = current_task().unwrap();
let mut task_inner = task.inner_exclusive_access();
match signal {
SignalFlags::SIGSTOP => {
task_inner.frozen = true;
task_inner.signals ^= SignalFlags::SIGSTOP;
}
SignalFlags::SIGCONT => {
if task_inner.signals.contains(SignalFlags::SIGCONT) {
task_inner.signals ^= SignalFlags::SIGCONT;
task_inner.frozen = false;
}
}
_ => {
// println!(
// "[K] call_kernel_signal_handler:: current task sigflag {:?}",
// task_inner.signals
// );
task_inner.killed = true;
}
}
}

// Some signals are normal and handled by user.
fn call_user_signal_handler(sig: usize, signal: SignalFlags) {
let task = current_task().unwrap();
let mut task_inner = task.inner_exclusive_access();

let handler = task_inner.signal_actions.table[sig].handler;
if handler != 0 {
// register signal into task
task_inner.handling_sig = sig as isize;
task_inner.signals ^= signal;

// backup
let mut trap_ctx = task_inner.get_trap_cx();
task_inner.trap_ctx_backup = Some(*trap_ctx);

// modify current trap for our event handler
trap_ctx.sepc = handler;
trap_ctx.x[10] = sig;
} else {
// default action
println!("[K] task/call_user_signal_handler: default action: ignore it or kill process");
}
}

Based on this, we can check the pending signals one by one, taking into account the task's signal_mask and the mask of the action for the signal currently being handled.

// os/src/task/mod.rs

fn check_pending_signals() {
for sig in 0..(MAX_SIG + 1) {
let task = current_task().unwrap();
let task_inner = task.inner_exclusive_access();
let signal = SignalFlags::from_bits(1 << sig).unwrap();
if task_inner.signals.contains(signal) && (!task_inner.signal_mask.contains(signal)) {
let mut masked = true;
let handling_sig = task_inner.handling_sig;
if handling_sig == -1 {
masked = false;
} else {
let handling_sig = handling_sig as usize;
if !task_inner.signal_actions.table[handling_sig]
.mask
.contains(signal)
{
masked = false;
}
}
if !masked {
drop(task_inner);
drop(task);
if signal == SignalFlags::SIGKILL
|| signal == SignalFlags::SIGSTOP
|| signal == SignalFlags::SIGCONT
|| signal == SignalFlags::SIGDEF
{
// signal is a kernel signal
call_kernel_signal_handler(signal);
} else {
// signal is a user signal
call_user_signal_handler(sig, signal);
return;
}
}
}
}
}

Then we write a loop that handles signals repeatedly while the task's state changes.

// os/src/task/mod.rs

pub fn handle_signals() {
loop {
check_pending_signals();
let (frozen, killed) = {
let task = current_task().unwrap();
let task_inner = task.inner_exclusive_access();
(task_inner.frozen, task_inner.killed)
};
if !frozen || killed {
break;
}
suspend_current_and_run_next();
}
}

System Operation

Finally, we design the system calls that make up the interface.

  • procmask: set the signal mask of the current process
  • sigaction: set the handler for a signal of the current process and write the old handler back through the old_action pointer
  • kill: the current process sends a signal to another process
  • sigreturn: clear the current signal and return to the original trap context

We will construct it one by one.

procmask is simple: we just set the mask directly and return the original one.

// os/src/process.rs

pub fn sys_sigprocmask(mask: u32) -> isize {
if let Some(task) = current_task() {
let mut inner = task.inner_exclusive_access();
let old_mask = inner.signal_mask;
if let Some(flag) = SignalFlags::from_bits(mask) {
inner.signal_mask = flag;
old_mask.bits() as isize
} else {
-1
}
} else {
-1
}
}

sigaction is a bit harder but still easy; however, note that the pointers may be null.

// os/src/syscall/process.rs

fn check_sigaction_error(signal: SignalFlags, action: usize, old_action: usize) -> bool {
if action == 0
|| old_action == 0
|| signal == SignalFlags::SIGKILL
|| signal == SignalFlags::SIGSTOP
{
true
} else {
false
}
}

pub fn sys_sigaction(
signum: i32,
action: *const SignalAction,
old_action: *mut SignalAction,
) -> isize {
let token = current_user_token();
let task = current_task().unwrap();
let mut inner = task.inner_exclusive_access();
if signum as usize > MAX_SIG {
return -1;
}
if let Some(flag) = SignalFlags::from_bits(1 << signum) {
if check_sigaction_error(flag, action as usize, old_action as usize) {
return -1;
}
let prev_action = inner.signal_actions.table[signum as usize];
*translated_refmut(token, old_action) = prev_action;
inner.signal_actions.table[signum as usize] = *translated_ref(token, action);
0
} else {
-1
}
}

kill is simple: we look up the task by its pid and insert the signal flag into it, provided that flag is not already set. A sketch follows.
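
A minimal sketch of sys_kill, assuming a pid2task helper that maps a pid to its task control block:

// os/src/syscall/process.rs

pub fn sys_kill(pid: usize, signum: i32) -> isize {
    if let Some(task) = pid2task(pid) {
        if let Some(flag) = SignalFlags::from_bits(1 << signum) {
            let mut task_ref = task.inner_exclusive_access();
            if task_ref.signals.contains(flag) {
                // the same signal is already pending
                return -1;
            }
            task_ref.signals.insert(flag);
            0
        } else {
            -1
        }
    } else {
        -1
    }
}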


sigreturn is simple, because we only need to restore the backed-up trap context.

// os/src/syscall/process.rs

pub fn sys_sigreturn() -> isize {
if let Some(task) = current_task() {
let mut inner = task.inner_exclusive_access();
inner.handling_sig = -1;
// restore the trap context
let trap_ctx = inner.get_trap_cx();
*trap_ctx = inner.trap_ctx_backup.unwrap();
0
} else {
-1
}
}

Phew! We finish our Signal mechanism!

Chapter 8-1

Introduction

As the OS grows, the unit of scheduling can be divided into smaller pieces for more efficient operation. A process alone no longer satisfies our demands: we want parts of a program to run in parallel. So we introduce the Thread.

The process then becomes a container of threads; each thread has its own id, state, instruction pointer, registers, and stack, but all threads share the data (the same memory and addresses) of their process. Accordingly, we will also develop an accompanying exclusion mechanism for parallel operations among threads.

Design Data

Now, let us clarify which resources belong to one thread:

Immutable:

  • kernel stack

Mutable:

  • thread id
  • user stack
  • trap context
  • trap status
  • exit code

Every task is a thread unit contained in one process, so a process is now really a process rather than a single task: it can own many tasks.

// os/src/task/process.rs

pub struct ProcessControlBlock {
// immutable
pub pid: PidHandle,
// mutable
inner: UPSafeCell<ProcessControlBlockInner>,
}

pub struct ProcessControlBlockInner {
// ...
pub task_res_allocator: RecycleAllocator,
pub tasks: Vec<Option<Arc<TaskControlBlock>>>,
}

Note that the user stack and the kernel stack should be separated; we should not allocate them with the same logic. The kernel stack is immutable, and we only need its top to place the trap context.

Because every thread shares the same memory set, each thread's user stack and trap context page are located by its thread id. We encapsulate these in the TaskUserRes struct.

Since many structures need id allocation, we can design a general id allocator.

// os/src/task/id.rs

pub struct RecycleAllocator {
current: usize,
recycled: Vec<usize>,
}

impl RecycleAllocator {
pub fn new() -> Self {
RecycleAllocator {
current: 0,
recycled: Vec::new(),
}
}
pub fn alloc(&mut self) -> usize {
if let Some(id) = self.recycled.pop() {
id
} else {
self.current += 1;
self.current - 1
}
}
pub fn dealloc(&mut self, id: usize) {
assert!(id < self.current);
assert!(
!self.recycled.iter().any(|i| *i == id),
"id {} has been deallocated!",
id
);
self.recycled.push(id);
}
}

Kernel Stack Allocation

// os/src/task/id.rs

lazy_static! {
static ref KSTACK_ALLOCATOR: UPSafeCell<RecycleAllocator> =
unsafe { UPSafeCell::new(RecycleAllocator::new()) };
}

pub struct KernelStack(pub usize);

/// Return (bottom, top) of a kernel stack in kernel space.
pub fn kernel_stack_position(kstack_id: usize) -> (usize, usize) {
let top = TRAMPOLINE - kstack_id * (KERNEL_STACK_SIZE + PAGE_SIZE);
let bottom = top - KERNEL_STACK_SIZE;
(bottom, top)
}

pub fn kstack_alloc() -> KernelStack {
let kstack_id = KSTACK_ALLOCATOR.exclusive_access().alloc();
let (kstack_bottom, kstack_top) = kernel_stack_position(kstack_id);
KERNEL_SPACE.exclusive_access().insert_framed_area(
kstack_bottom.into(),
kstack_top.into(),
MapPermission::R | MapPermission::W,
);
KernelStack(kstack_id)
}

impl Drop for KernelStack {
fn drop(&mut self) {
let (kernel_stack_bottom, _) = kernel_stack_position(self.0);
let kernel_stack_bottom_va: VirtAddr = kernel_stack_bottom.into();
KERNEL_SPACE
.exclusive_access()
.remove_area_with_start_vpn(kernel_stack_bottom_va.into());
}
}

We will do the same for user stack:

// os/src/config.rs

pub const TRAMPOLINE: usize = usize::MAX - PAGE_SIZE + 1;
pub const TRAP_CONTEXT_BASE: usize = TRAMPOLINE - PAGE_SIZE;

// os/src/task/id.rs

fn trap_cx_bottom_from_tid(tid: usize) -> usize {
TRAP_CONTEXT_BASE - tid * PAGE_SIZE
}

fn ustack_bottom_from_tid(ustack_base: usize, tid: usize) -> usize {
ustack_base + tid * (PAGE_SIZE + USER_STACK_SIZE)
}

Then TaskUserRes can allocate its user stack and trap context page with these helpers.

// impl TaskUserRes
pub fn alloc_user_res(&self) {
let process = self.process.upgrade().unwrap();
let mut process_inner = process.inner_exclusive_access();
// alloc user stack
let ustack_bottom = ustack_bottom_from_tid(self.ustack_base, self.tid);
let ustack_top = ustack_bottom + USER_STACK_SIZE;
process_inner.memory_set.insert_framed_area(
ustack_bottom.into(),
ustack_top.into(),
MapPermission::R | MapPermission::W | MapPermission::U,
);
// alloc trap_cx
let trap_cx_bottom = trap_cx_bottom_from_tid(self.tid);
let trap_cx_top = trap_cx_bottom + PAGE_SIZE;
process_inner.memory_set.insert_framed_area(
trap_cx_bottom.into(),
trap_cx_top.into(),
MapPermission::R | MapPermission::W,
);
}

Now, combine all things together:

// os/src/task/task.rs

pub struct TaskControlBlock {
// immutable
pub process: Weak<ProcessControlBlock>,
pub kstack: KernelStack,
// mutable
inner: UPSafeCell<TaskControlBlockInner>,
}

pub struct TaskControlBlockInner {
pub res: Option<TaskUserRes>,
pub trap_cx_ppn: PhysPageNum,
pub task_cx: TaskContext,
pub task_status: TaskStatus,
pub exit_code: Option<i32>,
}

Design Data Operation

We still schedule individual tasks rather than processes, because the task is the smallest schedulable unit. However, we also need interfaces to look up a process by its id.

pub fn add_task(task: Arc<TaskControlBlock>);
pub fn remove_task(task: Arc<TaskControlBlock>);
pub fn fetch_task() -> Option<Arc<TaskControlBlock>>;

pub fn pid2process(pid: usize) -> Option<Arc<ProcessControlBlock>>;
pub fn insert_into_pid2process(pid: usize, process: Arc<ProcessControlBlock>);
pub fn remove_from_pid2process(pid: usize);

Actually, much of the code stays the same; for example:

// os/src/task/process.rs

impl ProcessControlBlock {
pub fn new(elf_data: &[u8]) -> Arc<Self> {
// ...
let pid_handle = pid_alloc();
let process = ...;
let task = Arc::new(TaskControlBlock::new(
Arc::clone(&process),
ustack_base,
true,
));
// initiation of task...

let mut process_inner = process.inner_exclusive_access();
process_inner.tasks.push(Some(Arc::clone(&task)));
drop(process_inner);
insert_into_pid2process(process.getpid(), Arc::clone(&process));
// add main thread to scheduler
add_task(task);
process
}
}

If we fork a process, we only copy its first task, i.e. the calling thread itself; no other threads are copied.

pub fn fork(self: &Arc<Self>) -> Arc<Self> {
let child = ...;
parent.children.push(Arc::clone(&child));
let task = Arc::new(TaskControlBlock::new(
Arc::clone(&child),
parent
.get_task(0)
.inner_exclusive_access()
.res
.as_ref()
.unwrap()
.ustack_base(),
// here we do not allocate trap_cx or ustack again
// but mention that we allocate a new kstack here
false,
));
let mut child_inner = child.inner_exclusive_access();
child_inner.tasks.push(Some(Arc::clone(&task)));
drop(child_inner);
...
}

Design System Operation

If we want to create a thread, naively we only need the function entry address and an argument. That's it!

// os/src/syscall/thread.rs

pub fn sys_thread_create(entry: usize, arg: usize) -> isize {
let task = current_task().unwrap();
let process = task.process.upgrade().unwrap();
// create a new thread
let new_task = Arc::new(TaskControlBlock::new(
Arc::clone(&process),
task.inner_exclusive_access().res.as_ref().unwrap().ustack_base,
true,
));
// add new task to scheduler
add_task(Arc::clone(&new_task));
let new_task_inner = new_task.inner_exclusive_access();
let new_task_res = new_task_inner.res.as_ref().unwrap();
let new_task_tid = new_task_res.tid;
let mut process_inner = process.inner_exclusive_access();
// add new thread to current process
let tasks = &mut process_inner.tasks;
while tasks.len() < new_task_tid + 1 {
tasks.push(None);
}
tasks[new_task_tid] = Some(Arc::clone(&new_task));
let new_task_trap_cx = new_task_inner.get_trap_cx();
*new_task_trap_cx = TrapContext::app_init_context(
entry,
new_task_res.ustack_top(),
kernel_token(),
new_task.kstack.get_top(),
trap_handler as usize,
);
(*new_task_trap_cx).x[10] = arg;
new_task_tid as isize
}

Now sys_exit receives an exit_code and recycles the thread's resources. Note that if tid == 0, i.e. the main thread exits, the whole process exits: its child processes are moved to the init process and the resources of all its threads are collected.

// pub fn exit_current_and_run_next(exit_code: i32) {
// ...
{
let mut initproc_inner = INITPROC.inner_exclusive_access();
for child in process_inner.children.iter() {
child.inner_exclusive_access().parent = Some(Arc::downgrade(&INITPROC));
initproc_inner.children.push(child.clone());
}
}

let mut recycle_res = Vec::<TaskUserRes>::new();
for task in process_inner.tasks.iter().filter(|t| t.is_some()) {
let task = task.as_ref().unwrap();
remove_inactive_task(Arc::clone(&task));
let mut task_inner = task.inner_exclusive_access();
if let Some(res) = task_inner.res.take() {
recycle_res.push(res);
}
}

sys_waittid checks a thread's state and recycles its resources if possible, returning -2 if it has not exited yet. Why do we need it? Because a thread cannot recycle all of its own resources in sys_exit (unless it is the main thread exiting the whole process); another thread calls waittid to remove it from the tasks vector, after which Rust drops what remains.

// os/src/syscall/thread.rs

/// thread does not exist, return -1
/// thread has not exited yet, return -2
/// otherwise, return thread's exit code
pub fn sys_waittid(tid: usize) -> i32 {
let task = current_task().unwrap();
let process = task.process.upgrade().unwrap();
let task_inner = task.inner_exclusive_access();
let mut process_inner = process.inner_exclusive_access();
// a thread cannot wait for itself
if task_inner.res.as_ref().unwrap().tid == tid {
return -1;
}
let mut exit_code: Option<i32> = None;
let waited_task = process_inner.tasks[tid].as_ref();
if let Some(waited_task) = waited_task {
if let Some(waited_exit_code) = waited_task.inner_exclusive_access().exit_code {
exit_code = Some(waited_exit_code);
}
} else {
// waited thread does not exist
return -1;
}
if let Some(exit_code) = exit_code {
// dealloc the exited thread
process_inner.tasks[tid] = None;
exit_code
} else {
// waited thread has not exited
-2
}
}
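
A user-level sketch of how these calls fit together; thread_create, waittid, yield_ and exit are the user-library wrappers of the syscalls above, the -2 check follows the semantics just described, and the file name is illustrative:

// user/src/bin/threads_demo.rs (illustrative)

pub fn worker(arg: usize) -> ! {
    println!("worker got arg {}", arg);
    exit(0)
}

#[no_mangle]
pub fn main() -> i32 {
    // spawn a thread starting at `worker` with argument 42
    let tid = thread_create(worker as usize, 42) as usize;
    // poll until the thread exits, then collect its exit code
    let mut exit_code = waittid(tid);
    while exit_code == -2 {
        yield_();
        exit_code = waittid(tid);
    }
    println!("thread {} exited with code {}", tid, exit_code);
    0
}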

Chapter 8-2

Introduction

We will now develop the exclusion mechanism mentioned earlier.

Besides the construction itself, we need to model the situations that arise from data sharing. The usual naive scenario: a thread wants to modify something, but because of a thread switch the data has already been changed underneath it, and we get a wrong result. Based on this, we want an operation to be Atomic, meaning that while it runs it excludes all others. We can then relax this restriction and generalize it.

Generalization:

  • Allow a finite number of threads (possibly more than one) to join one atomic operation.
  • Allow the atomic operation to depend on a condition.

Before generalizing, we need a way to represent an atomic operation. We call the content of such an operation a Critical Section, and operations by multiple threads in an indeterminate order a Race Condition. The basic problem of data sharing therefore pushes us to coordinate the different operations of different threads: we cannot restrict the data itself, because the problem lies in how threads modify it, so we need to Lock the operations!

So if there is a lock shared by threads, each thread can declare "Lock it!", and no other thread can enter the locked section until it is released.

Now, back to our generalization. If the lock has a bound on the number of holders, several threads may enter until the bound is reached; that is also a reasonable design, and we call it a Semaphore. If the lock has a signal that one thread can send to allow others to proceed, that is also a reasonable design, and we call it a Condition Variable.

If the real minimal shared thing is the Lock rather than the data, we can set aside the so-called data problem and focus on the lock itself: within the lock, each thread can do anything while excluding the others.

Design

Whatever kind of lock it is, it is shared among the threads of a process.

pub struct ProcessControlBlock {
// immutable
pub pid: PidHandle,
// mutable
inner: UPSafeCell<ProcessControlBlockInner>,
}
pub struct ProcessControlBlockInner {
...
pub lock_list: Vec<Option<Arc<Lock>>>,
}

pub struct Lock {
pub inner: UPSafeCell<LockInner>,
}
pub struct LockInner {
pub data: ...
pub wait_queue: VecDeque<Arc<TaskControlBlock>>,
}

In this design, a lock can push a thread onto its wait_queue to stop it and pop from the front to resume it. The data field is a placeholder generalizing over the various kinds of locks.

Then a process owns many locks used under various conditions; each one can be seen as standing in for a piece of shared data (even though no real data is attached to it).

Basic Lock

Now we want to construct a basic lock that supports simple lock and unlock operations.

pub trait Mutex: Sync + Send {
fn lock(&self);
fn unlock(&self);
}

Usually there are U-level, M-level, and S-level implementations. We first try the U-level one, then look at the heuristic design at M-level, and finally extend the basic idea to S-level.


U-level

A naive approach is to declare a global boolean indicating whether the lock is held: lock waits while the boolean is true and then tries to set it to true, and unlock sets it back to false to release.

static mut mutex :i32 = 0;

fn lock(mutex: i32) {
while (mutex);
mutex = 1;
}

fn unlock(mutex: i32){
mutex = 0;
}

However, that's wrong! We cannot build a lock out of the very thing we want to protect: a thread switch can land between any two instructions and break it. Does that mean we cannot do it at U-level? Think about the real situation more carefully: imagine two threads modifying one thing at nearly the same time. If we can set two pieces of global state that exclude each other (for example, one state set to 1 while the other is set to 0), then only one of the operations can really take effect, and we can check this condition and allow that thread to take the lock.

static mut flag : [i32;2] = [0,0]; // which thread wants the lock?
static mut turn : i32 = 0;         // whose turn is it? (thread 0 or 1?)

fn lock() {
    flag[self] = 1;  // mark that this thread (id = self) wants the lock
    turn = 1 - self; // let the other thread go first
    while ((flag[1-self] == 1) && (turn == 1 - self)); // busy wait
}

fn unlock() {
    flag[self] = 0;  // give up the lock
}

Analyzing the code: whichever flag is 1 (or both are), indicating that some thread wants the lock, turn acts as a state excluding flag. Even if the other thread modifies turn at the same time, turn can only end up holding one of the two values, so only one thread can leave the busy-wait and take the lock.


M-level

Is there a predefined atomic operation among the instructions that we can build a lock on? The answer is yes; in RISC-V they are:

  • AMO: Atomic memory operation
  • LR/SC: Load Reserved/Store Conditional

AMO: atomically reads the value in memory, writes a new value, and stores the old value into the target register (e.g. amoadd.w rd, rs2, (rs1)).

LR/SC: LR reads memory into the target register and records a reservation on that address; SC checks the reservation and, if it still holds, writes data to that address, producing a success/failure code (0/1) in the target register (e.g. lr.w rd, (rs1); sc.w rd, rs2, (rs1)).

We can use them to implement an atomic test-and-set (TAS):

# RISC-V sequence for implementing a TAS  at (s1)
li t2, 1 # t2 <-- 1
Try: lr t1, s1 # t1 <-- mem[s1] (load reserved)
bne t1, x0, Try # if t1 != 0, goto Try:
sc t0, s1, t2 # mem[s1] <-- t2 (store conditional)
bne t0, x0, Try # if t0 !=0 ('sc' Instr failed), goto Try:
Locked:
... # critical section
Unlock:
sw x0,0(s1) # mem[s1] <-- 0

The logic of Try: mem[s1] is zero when unlocked and non-zero when locked. Try compares t1 with x0, i.e. mem[s1] with 0; if it is zero, it tries to store t2 into (s1). The result of sc in t0 is then compared with x0, i.e. the success code with 0; if the store failed (t0 != 0) we jump back to Try. Only when both the reserved load and the conditional store succeed do we fall through into the critical section.

To unlock, we store x0 to (s1), setting mem[s1] back to zero, which is the unlocked state.

S-level

Then we can bring this idea into Rust and package it. A simple refinement: instead of spinning in the retry loop, we yield and give the CPU to other tasks.
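
A sketch of such a yielding spin lock in the kernel, assuming the UPSafeCell wrapper and suspend_current_and_run_next used elsewhere in this kernel:

// os/src/sync/mutex.rs

pub struct MutexSpin {
    locked: UPSafeCell<bool>,
}

impl MutexSpin {
    pub fn new() -> Self {
        Self {
            locked: unsafe { UPSafeCell::new(false) },
        }
    }
}

impl Mutex for MutexSpin {
    fn lock(&self) {
        loop {
            let mut locked = self.locked.exclusive_access();
            if *locked {
                // the lock is held: give up the CPU and retry later
                drop(locked);
                suspend_current_and_run_next();
                continue;
            } else {
                *locked = true;
                return;
            }
        }
    }
    fn unlock(&self) {
        let mut locked = self.locked.exclusive_access();
        *locked = false;
    }
}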


Now, whatever kind of lock it is, we can plug it into our structure.

First, when we create a lock, we either fill an empty slot in the list or push a new entry.

// os/src/syscall/sync.rs
pub fn sys_mutex_create(blocking: bool) -> isize {
let process = current_process();
let mutex: Option<Arc<dyn Mutex>> = if !blocking {
Some(Arc::new(MutexSpin::new()))
} else {
Some(Arc::new(MutexBlocking::new()))
};
let mut process_inner = process.inner_exclusive_access();
if let Some(id) = process_inner
.mutex_list
.iter()
.enumerate()
.find(|(_, item)| item.is_none())
.map(|(id, _)| id)
{
process_inner.mutex_list[id] = mutex;
id as isize
} else {
process_inner.mutex_list.push(mutex);
process_inner.mutex_list.len() as isize - 1
}
}

When we call lock, we pass the corresponding lock id. If it is already locked, the current task is pushed onto the wait_queue; otherwise we take the lock and carry on.

// os/src/syscall/sync.rs
pub fn sys_mutex_lock(mutex_id: usize) -> isize {
let process = current_process();
let process_inner = process.inner_exclusive_access();
let mutex = Arc::clone(process_inner.mutex_list[mutex_id].as_ref().unwrap());
drop(process_inner);
drop(process);
mutex.lock();
0
}
// os/src/sync/mutex.rs
impl Lock for MutexBlocking {
fn lock(&self) {
let mut mutex_inner = self.inner.exclusive_access();
if ... {
mutex_inner.wait_queue.push_back(current_task().unwrap());
// ... other operations
drop(mutex_inner);
block_current_and_run_next();
} else {
// ... other operations
}
}
}

The reverse operation is symmetric:

// os/src/syscall/sync.rs
pub fn sys_mutex_unlock(mutex_id: usize) -> isize {
let process = current_process();
let process_inner = process.inner_exclusive_access();
let mutex = Arc::clone(process_inner.mutex_list[mutex_id].as_ref().unwrap());
drop(process_inner);
drop(process);
mutex.unlock();
0
}
// os/src/sync/mutex.rs
impl Mutex for MutexBlocking {
fn unlock(&self) {
let mut mutex_inner = self.inner.exclusive_access();
// ... other operation
if ... {
if let Some(waking_task) = mutex_inner.wait_queue.pop_front() {
add_task(waking_task);
}
}
}
}

Semaphore

A semaphore is simple: we only need to switch the boolean to a counter and check it against a bound. The initial count is the bound; when a thread enters it decrements the count, and when it releases it increments it. We only need to check whether the count is positive or negative.

Apply our structure:

pub fn up(&self) {
let mut inner = self.inner.exclusive_access();
inner.count += 1;
if inner.count <= 0 {
if let Some(task) = inner.wait_queue.pop_front() {
add_task(task);
}
}
}

pub fn down(&self) {
let mut inner = self.inner.exclusive_access();
inner.count -= 1;
if inner.count < 0 {
inner.wait_queue.push_back(current_task().unwrap());
drop(inner);
block_current_and_run_next();
}
}

If the initial count equals 1, we are back to a mutex, which means only a single thread may enter!

Actually, we can also use a semaphore for synchronization: set the count to 0, so the first thread to do down is blocked; another thread then does up, adding one to the count, so the original thread can finally proceed. The thread doing up therefore always gets its work done before the blocked thread continues.

In the example below, first always finishes its work before second continues.

const SEM_SYNC: usize = 0; // semaphore ID
unsafe fn first() -> ! {
    sleep(10);
    println!("First work and wakeup Second");
    semaphore_up(SEM_SYNC); // V operation on the semaphore
    exit(0)
}
unsafe fn second() -> ! {
    println!("Second want to continue, but need to wait first");
    semaphore_down(SEM_SYNC); // P operation on the semaphore
    println!("Second can work now");
    exit(0)
}

Condition Variable

If we want one thread to be able to release the lock for others waiting on a condition, we need the CondVar. It also dispatches through a wait_queue: signal pops a thread off the queue, effectively telling it "you are free!", while wait pushes the current thread onto the queue. The unlock/lock pair around the block is important: releasing the mutex during wait allows other threads to modify the condition, but it must happen after the push; if the signal could arrive before the push, we might never receive it. We do not encapsulate the condition check inside the CondVar, because the condition should be designed by the user; we only expose the interface.

pub fn signal(&self) {
let mut inner = self.inner.exclusive_access();
if let Some(task) = inner.wait_queue.pop_front() {
add_task(task);
}
}
pub fn wait(&self, mutex: Arc<dyn Mutex>) {
let mut inner = self.inner.exclusive_access();
inner.wait_queue.push_back(current_task().unwrap());
drop(inner);
mutex.unlock();
block_current_and_run_next();
mutex.lock();
}

However, since the condition check is left to the user, data sharing could still corrupt the condition itself, so we usually wrap this section in a mutex as well.

static mut A: usize = 0; // global variable

const CONDVAR_ID: usize = 0;
const MUTEX_ID: usize = 0;

unsafe fn first() -> ! {
    sleep(10);
    println!("First work, Change A --> 1 and wakeup Second");
    mutex_lock(MUTEX_ID);
    A = 1;
    condvar_signal(CONDVAR_ID);
    mutex_unlock(MUTEX_ID);
    ...
}
unsafe fn second() -> ! {
    println!("Second want to continue, but need to wait A=1");
    mutex_lock(MUTEX_ID);
    while A == 0 {
        condvar_wait(CONDVAR_ID, MUTEX_ID);
    }
    mutex_unlock(MUTEX_ID);
    ...
}

We can see that once A == 1, second no longer loops in wait and proceeds.