
Summary

The intensity of Stage 2 ramped up gradually. Since I work during the day and could not receive SMS verification codes early on, I could not keep up with the weekly live classes; I mainly studied through the tutorial documentation, plus recordings and slides after class. This stage covered memory management, process management, the file system, and threads systematically. The chapters on memory mapping and the file system in particular taught me a lot. As the amount of code grew, drawing UML diagrams while studying also helped greatly in understanding the code and the overall framework.

Preface

This blog is my study log for Stages 1 and 2 of the Open-Source OS Training Camp 2025S, briefly summarizing what I learned and coded in those two stages. It is kept as a memento, and I hope it can help others too.


2025 Spring/Summer Open-Source OS Training Camp: Three-Stage Summary

Stage 1 Summary

This was my first time writing Rust, and the ownership system wrecked me. After endless fights with the compiler and endless interrogation of AI, I finally stumbled through. I was used to C++'s free-floating references and pointers; Rust really is safer to write than C or C++, because the compiler's ownership mechanism guarantees it.

Stage 2 Summary

Having done NJUPA before (highly recommended), I already had some notion of context switching. rcore-os switches contexts in an xv6-like way: the kernel switches control flow internally to change which user program runs. For the locks an OS commonly needs, rCore's single-core design leans on Rust's ownership mechanism: taking the lock hands ownership to the function, so a second acquisition attempt triggers a Rust ownership error, avoiding kernel deadlock. Cargo's build-and-link workflow is genuinely more convenient than the GNU toolchain, and Rust's attribute syntax is much better than C++'s.

The address-space and page-table code is full of structs representing the pieces of an address, which shows off Rust's object-oriented style and the into conversions and makes the code much more readable; in C-family code this would all be macros. rCore also uses an extra BTreeMap to support fast page-table queries. The exercises were mostly solvable by calling existing functions.

The file system was new territory, since I had barely touched one before. Its implementation is layer nested upon layer, abstraction upon abstraction, and one feature often requires changes across several layers, which makes it hard to follow.

The concurrency exercise uses the method proposed by Coffman for deadlock detection. I initially mistook it for the banker's algorithm, could not make sense of the exercise hints, and was stuck for a long time until a hint from the group chat got me through. Later, in my OS course, I found it in the textbook and realized it was Coffman's deadlock detection algorithm (the book's description is in fact much like the exercise's).

Stage 3

Stage 3 switches to a kernel model called the unikernel. I am not yet used to this componentized style of OS and still prefer the traditional monolithic structure.

print_with_color is simple: just understand the axstd module that arceos provides as the runtime environment for kernel applications.

For support_hash_map, once you understand that Rust lets you plug in a custom heap allocator, the only remaining problem is how to obtain the hash of a hashable value in Rust and then do a simple mapping; the tests do not demand high efficiency.
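For illustration, a minimal sketch of hashing a key and mapping it to a bucket. SimpleHasher here is a hypothetical FNV-1a-style hasher, not arceos's hasher nor my actual solution:

use core::hash::{Hash, Hasher};

// hypothetical FNV-1a-style hasher, for illustration only
struct SimpleHasher(u64);

impl Hasher for SimpleHasher {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }
}

// map any hashable key to one of num_buckets slots
fn bucket_index<K: Hash>(key: &K, num_buckets: usize) -> usize {
    let mut h = SimpleHasher(0xcbf29ce484222325);
    key.hash(&mut h);
    (h.finish() as usize) % num_buckets
}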

alt_alloc is mainly about implementing the required interfaces and learning how to add your own allocator to arceos; the bump allocator is arguably the simplest allocator there is.
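A minimal bump-allocator sketch under assumed field names (illustrative, not arceos's actual trait surface):

struct BumpAllocator {
    start: usize, // kept so the allocator could be reset wholesale
    next: usize,  // grows upward; nothing is ever freed individually
    end: usize,
}

impl BumpAllocator {
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        // align must be a power of two
        let addr = (self.next + align - 1) & !(align - 1);
        if addr + size > self.end {
            return None; // out of memory
        }
        self.next = addr + size;
        Some(addr)
    }
}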

For ramfs_rename, you must point the Cargo dependency at the crate inside arceos (otherwise the file-system crate is forever pulled from the network). Then find the implementations of the traits VfsOps and VfsNodeOps, work out the data structure that records files, and implement rename.

For sys_map, likewise, once you understand the interface exposed by task_ext you can just call into the library. My first attempt forgot that when the input address is NULL the kernel must pick an address itself, whereupon I discovered a handy little tool called find_free_area.

simple_hv was the easiest one. The same trick of having the OS execute certain instructions on a program's behalf can also help user programs load unaligned data (when the processor does not support it).

Stage 2 Summary: Reflections on and Understanding of Operating Systems

After this stage's systematic study of and practice with the core content of operating systems, I have a deeper understanding of the essence of an OS, its key mechanisms, and the advantages of pairing it with Rust. My main takeaways follow.


1. The Essence and Core Tasks of an Operating System

An operating system is the foundational software that manages hardware resources and provides abstraction and isolation to the applications above it. Its core tasks include:

  • Process and thread management: processes (the unit of resource allocation) and threads (the unit of scheduling) provide multitasking concurrency, preserving responsiveness and resource utilization. The PCB (process control block) and TCB (thread control block) are the OS's central data structures for scheduling and management.
  • Memory management: virtual memory, paging, and permission control provide process isolation and efficient allocation and reclamation of memory. Page tables, the TLB, lazy allocation, copy-on-write, and similar mechanisms greatly improve safety and performance.
  • Process scheduling: various algorithms (round-robin time slices, priorities, MLFQ, and so on) balance fairness and efficiency in CPU allocation. Context switching and scheduling policy directly affect throughput and response time.
  • System call interface: a controlled channel through which user programs access hardware and kernel resources, with safe switching between user and kernel mode.
  • Hardware abstraction and performance optimization: caches, the TLB, ASIDs, and the like speed up accesses, while interrupts and exceptions enable efficient event handling.


I already had some theoretical grounding in operating systems; the biggest challenges the rcore and arceos projects posed for me were:

  1. RISC-V architecture knowledge, especially the privileged architecture. This is essential for understanding the code around traps, context_switch, and address spaces.
  2. How the arceos project is organized and built. At the bottom sits axhal, which abstracts the hardware architecture and the platform; above it are the modules such as axtask; above those, axapi and then ulib. This componentized design plays to the strengths of the Rust language and makes construction much easier.

A unikernel architecture has no privilege-level switching; applications also run in S-mode. At first I did not study the slides carefully, and this caused me quite a bit of confusion.

For the hashmap exercise I did not hand-write the code but pulled in the hashbrown crate instead. Hand-rolling it would have been better training, though.

Beyond that, the hypervisor gave me considerable difficulty; I only got through by consulting other students' experience.

Chapter 3

Introduction

We need to place multiple apps at multiple memory addresses so they can run in turn, rather than running one app, clearing memory, and loading the next.

First, we want to place each app at its own isolated address; given our kernel's restrictions, we load them with build.py.


Task

Task: one flow of a program's execution.

Define each time slice of a Task as a Task Slice.

Define the switch between apps as Task switching.

We need to store the Task Context.

Design: __switch

We will store these registers in the context:

// os/src/task/context.rs

pub struct TaskContext {
    ra: usize,
    sp: usize,
    s: [usize; 12],
}
.altmacro
.macro SAVE_SN n
    sd s\n, (\n+2)*8(a0)
.endm
.macro LOAD_SN n
    ld s\n, (\n+2)*8(a1)
.endm
    .section .text
    .globl __switch
__switch:
    ...
    # logic: store the previous context, then load the new one into the current registers
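The elided body saves ra, sp, and s0–s11 of the current task, then loads those of the next. A sketch consistent with the macros above (paraphrased, not necessarily the verbatim source):

__switch:
    # a0: *mut TaskContext (current), a1: *const TaskContext (next)
    sd sp, 8(a0)        # save current kernel stack pointer
    sd ra, 0(a0)        # save current return address
    .set n, 0
    .rept 12            # save s0..s11
        SAVE_SN %n
        .set n, n + 1
    .endr
    ld ra, 0(a1)        # load next return address
    .set n, 0
    .rept 12            # load s0..s11
        LOAD_SN %n
        .set n, n + 1
    .endr
    ld sp, 8(a1)        # load next kernel stack pointer
    ret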

Expose to Rust

// os/src/task/switch.rs

global_asm!(include_str!("switch.S"));

use super::TaskContext;

extern "C" {
    pub fn __switch(
        current_task_cx_ptr: *mut TaskContext,
        next_task_cx_ptr: *const TaskContext,
    );
}

We will design TaskManager:

  • Store an array of per-app states plus the index of the currently running app.
  • Each state stores a TaskContext and a TaskState (Running, Exited, etc.).
  • At init, make every task ready by pointing its TaskContext at __restore.
  • At run time, switch contexts when needed.
let current_task_cx_ptr = &mut inner.tasks[current].task_cx as *mut TaskContext;
let next_task_cx_ptr = &inner.tasks[next].task_cx as *const TaskContext;
drop(inner);
// before this, we should drop local variables that must be dropped manually
unsafe {
    __switch(
        current_task_cx_ptr,
        next_task_cx_ptr,
    );
}

Dispatch Design

Cooperative

We manually provide a yield interface for apps to call:

pub fn sys_yield() -> isize {
    syscall(SYSCALL_YIELD, [0, 0, 0])
}

// user/src/lib.rs

pub fn yield_() -> isize { sys_yield() }

But this can be inefficient: an app that has already finished its useful work may still hog the CPU if it is reluctant to yield or exit.

Preemptive

We design a clock interrupt that fires at a fixed interval to force switches between apps.

  • Design the timer: set it and read the current time
  • Set the timer to trigger the next interrupt
  • Enable the timer and handle the Timer interrupt cause in the trap handler

Some prerequisite knowledge:

const SBI_SET_TIMER: usize = 0;

pub fn set_timer(timer: usize) {
    sbi_call(SBI_SET_TIMER, timer, 0, 0);
}

// os/src/timer.rs

use crate::config::CLOCK_FREQ;
const TICKS_PER_SEC: usize = 100;

pub fn set_next_trigger() {
    set_timer(get_time() + CLOCK_FREQ / TICKS_PER_SEC);
}

// os/src/trap/mod.rs

match scause.cause() {
    Trap::Interrupt(Interrupt::SupervisorTimer) => {
        set_next_trigger();
        suspend_current_and_run_next();
    }
}

// enable the supervisor clock interrupt
// os/src/trap/mod.rs

use riscv::register::sie;

pub fn enable_timer_interrupt() {
    unsafe { sie::set_stimer(); }
}
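set_next_trigger relies on a get_time helper; in the tutorial this just reads the mtime counter through the riscv crate's time register:

// os/src/timer.rs

use riscv::register::time;

pub fn get_time() -> usize {
    // current value of the mtime counter, in CLOCK_FREQ ticks per second
    time::read()
}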

Chapter 2

Introduction

It corresponds to two RISC-V privilege levels:

  • S (Supervisor), guaranteed by the Supervisor Execution Environment of RustSBI
  • U (User), constructed in this chapter as the Application Execution Environment

Reasons:

  • Safety (prevent apps from accessing the kernel)
  • Recoverability

Workflow:

  • Start the application in its user-mode context
  • Trap (on a system-level call) to handle the request
    • Goes wrong? Kill it!
    • Finished? Next!
  • Restore the user-mode context

RISC-V provides the following CSRs (Control and Status Registers) to handle this:


When a Trap begins:

  • sstatus: its SPP field records the privilege level the CPU came from.
  • sepc: the address to resume at once the Trap finishes.
  • scause/stval: the Trap cause and additional info.
  • stvec: holds the entry address of the Trap handler.

stvec is a 64-bit CSR, with:

  • MODE (Direct/Vectored), bits [1:0] (read from right to left): 2 bits
  • BASE, bits [63:2]: 62 bits

Finally, the handler returns with the sret instruction, which switches the privilege level and jumps to sepc.

Construct Trap

Design:

  • General registers are shared by U-level and S-level.
  • Maintain a reasonable CSR state.
  • Separate the U-level and S-level workflows with separate stacks.

Construct:

  • Build a KernelStack and a UserStack for the separation.
  • On the KernelStack we store the TrapContext; asm plus Rust controls dispatch and handling, and the entry code's address is written into stvec as the Trap entry.
  • Restore registers for the UserStack by pushing a new context that refers to the UserStack.

build stack and push context:

// stack struct ...

// bottom to top
fn get_sp(&self) -> usize {
    self.data.as_ptr() as usize + KERNEL_STACK_SIZE
}
pub fn push_context(&self, cx: TrapContext) -> &'static mut TrapContext {
    let cx_ptr = (self.get_sp() - core::mem::size_of::<TrapContext>()) as *mut TrapContext;
    unsafe {
        *cx_ptr = cx;
    }
    unsafe { cx_ptr.as_mut().unwrap() }
}
// os/src/trap/context.rs

#[repr(C)]
pub struct TrapContext {
    pub x: [usize; 32], // general registers
    pub sstatus: Sstatus,
    pub sepc: usize,
}

impl TrapContext {
    /// set stack pointer to the x_2 reg (sp)
    pub fn set_sp(&mut self, sp: usize) {
        self.x[2] = sp;
    }
    /// init app context
    pub fn app_init_context(entry: usize, sp: usize) -> Self {
        let mut sstatus = sstatus::read(); // CSR sstatus
        sstatus.set_spp(SPP::User); // previous privilege mode: user mode
        let mut cx = Self {
            x: [0; 32],
            sstatus,
            sepc: entry, // entry point of app
        };
        cx.set_sp(sp); // app's user stack pointer
        cx // return initial TrapContext of app
    }
}

We design __alltraps and __restore, written in asm with parts in Rust:

.altmacro
.macro SAVE_GP n
    sd x\n, \n*8(sp)
.endm
.macro LOAD_GP n
    ld x\n, \n*8(sp)
.endm
    .align 2
__alltraps:
    ...
    # set input argument of trap_handler(cx: &mut TrapContext)
    mv a0, sp # sp points to the TrapContext on the kernel stack
    call trap_handler # (&mut TrapContext)

__restore:
    ...

To handle the trap context, we use the riscv crate:

// os/Cargo.toml

[dependencies]
riscv = { git = "https://github.com/rcore-os/riscv", features = ["inline-asm"] }

// os/src/trap/mod.rs
global_asm!(include_str!("trap.S"));

pub fn init() {
    extern "C" { fn __alltraps(); }
    // write the entry address to stvec
    unsafe {
        stvec::write(__alltraps as usize, TrapMode::Direct);
    }
}

#[no_mangle]
pub fn trap_handler(cx: &mut TrapContext) -> &mut TrapContext {
    ...
}

restore operation:

extern "C" { fn __restore(cx_addr: usize); }
unsafe {
    __restore(KERNEL_STACK.push_context(
        TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp())
        // this context stores the ptr to the UserStack for restoration
    ) as *const _ as usize);
}

Construct User App

  • Link the app binaries into the kernel with a specified memory layout
  • Read that layout, and use AppManager to maintain and store it
  • Load each app from the layout, copying it consecutively to APP_BASE_ADDRESS (we cannot yet read load addresses dynamically)
  • AppManager runs each app in turn
# os/src/link_app.S

    .align 3
    .section .data
    .global _num_app # read from this ptr
_num_app:
    .quad 5
    .quad app_0_start
    .quad app_1_start
    .quad app_2_start
    .quad app_3_start
    .quad app_4_start
    .quad app_4_end

...

Design it!

// os/src/batch.rs

struct AppManager {
    num_app: usize,
    current_app: usize,
    app_start: [usize; MAX_APP_NUM + 1],
}

// part of the read in AppManager's static init
let num_app_ptr = _num_app as usize as *const usize;
let num_app = num_app_ptr.read_volatile();
let mut app_start: [usize; MAX_APP_NUM + 1] = [0; MAX_APP_NUM + 1];
let app_start_raw: &[usize] = core::slice::from_raw_parts(
    num_app_ptr.add(1), num_app + 1
);
app_start[..=num_app].copy_from_slice(app_start_raw);

Load App:

// part of the code copying an app into kernel space
asm!("fence.i");
// clear app area
core::slice::from_raw_parts_mut(
    APP_BASE_ADDRESS as *mut u8,
    APP_SIZE_LIMIT
).fill(0);
let app_src = core::slice::from_raw_parts(
    self.app_start[app_id] as *const u8,
    self.app_start[app_id + 1] - self.app_start[app_id]
);
let app_dst = core::slice::from_raw_parts_mut(
    APP_BASE_ADDRESS as *mut u8,
    app_src.len()
);
app_dst.copy_from_slice(app_src);

Run each app!

// os/src/batch.rs

pub fn run_next_app() -> ! {
    let mut app_manager = APP_MANAGER.exclusive_access();
    let current_app = app_manager.get_current_app();
    unsafe {
        app_manager.load_app(current_app);
    }
    app_manager.move_to_next_app();
    drop(app_manager);
    // before this we have to drop local variables related to resources manually
    // and release the resources
    extern "C" { fn __restore(cx_addr: usize); }
    unsafe {
        __restore(KERNEL_STACK.push_context(
            TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp())
        ) as *const _ as usize);
    }
    panic!("Unreachable in batch::run_current_app!");
}

// main logic:
// os/src/main.rs

// ...code above
// install the trap entry
trap::init();
// run the apps
batch::run_next_app();

Chapter 4

Introduction

We don’t want a fixed physical address for each allocation; rather, we want a unified abstract interface to a dynamic memory layout for storing apps. We call it the Virtual Address.

Safety: every app is allocated its own virtual memory space, so no app can interfere with the others.

Efficiency: apps can coexist in memory at the same time, with no need to read outer peripherals to swap apps in and out (enabled by the growth of memory sizes).

We need the MMU (Memory Management Unit) to perform Address Translation from virtual to physical addresses.

Several different designs exist.


Design

Segment Design

Each app occupies one fixed slot, a segment $[0, bound)$, mapped linearly by the MMU.

Problem: wasteful and inflexible.

We might instead give each app several linear maps, one per region (heap, data, code, and so on), dispatching memory at a finer grain. But that still does not solve the problem: even when slots are dynamically allocated, free gaps too small to reuse accumulate. This is External Fragmentation, in contrast to the Internal Fragmentation caused by fixed slots.

Paging Design

We can set protection bits on each mapping: r for read, w for write, x for execute.

Another way is to invert our thinking: rather than taking a slot in memory, take slots in the MMU. Virtual memory is divided into Pages and physical memory into Frames of the same size; each app maps its page numbers to frames through its own Page Table, which records the mapping.

Page Design

SV39 uses only the lower 39 bits of an address, even though the register width is 64 bits (39 bits is already a fairly large amount!).

In an address, bits $[11:0]$ are the Page Offset and bits $[38:12]$ are the VPN; the VPN locates the page, and the offset points within it (4 KiB per page).

We must write satp to enable paging for S- and U-level memory accesses.

const PAGE_SIZE: usize = 4096;
const PAGE_SIZE_BITS: usize = 12;

Page size and offset let us slice a physical address:

// os/src/mm/address.rs

impl PhysAddr {
    pub fn floor(&self) -> PhysPageNum { PhysPageNum(self.0 / PAGE_SIZE) }
    pub fn ceil(&self) -> PhysPageNum { PhysPageNum((self.0 + PAGE_SIZE - 1) / PAGE_SIZE) }
}

A page table entry bundles permissions with a physical page number:

impl PageTableEntry {
    pub fn ppn(&self) -> PhysPageNum {
        (self.bits >> 10 & ((1usize << 44) - 1)).into()
    }
    pub fn flags(&self) -> PTEFlags {
        PTEFlags::from_bits(self.bits as u8).unwrap()
    }
}
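Going the other way, an entry is built by packing the PPN into bits [53:10] alongside the flag bits; a sketch matching the accessors above:

impl PageTableEntry {
    pub fn new(ppn: PhysPageNum, flags: PTEFlags) -> Self {
        PageTableEntry {
            // PPN occupies bits [53:10], flags the low byte
            bits: ppn.0 << 10 | flags.bits as usize,
        }
    }
}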

Naively, a page table could be one linear array indexed from a base address. But that takes a huge amount of memory, because every app must reserve entries for its whole address range regardless of what it actually maps.

A finer design borrows from the trie: split the VPN into 9-bit index slices (each level then has 512 entries, and 512 entries of 8 bytes fill a 4 KiB page exactly) and link the levels into a tree. SV39 uses a 3-level page index, 27 bits in total.

A virtual page number thus consists of 27 bits of index slices, plus 12 bits of in-page offset for the chosen page: 39 bits in total.

To translate a virtual address into a physical one, we walk these index levels in turn.
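Concretely, splitting a VPN into its three 9-bit slices can look like this (a sketch in the shape of the tutorial's VirtPageNum::indexes):

impl VirtPageNum {
    pub fn indexes(&self) -> [usize; 3] {
        let mut vpn = self.0;
        let mut idx = [0usize; 3];
        for i in (0..3).rev() {
            idx[i] = vpn & 511; // lower 9 bits select one of 512 entries
            vpn >>= 9;
        }
        idx
    }
}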

Page Management Design

A simple frame allocator (allocation rather than full management!) is stack-style:

pub struct StackFrameAllocator {
    current: usize, // first physical page number of free memory
    end: usize,     // last physical page number of free memory
    recycled: Vec<usize>,
}
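A sketch of the matching alloc/dealloc logic, following the tutorial's FrameAllocator trait (exact signatures may differ):

impl StackFrameAllocator {
    pub fn alloc(&mut self) -> Option<PhysPageNum> {
        if let Some(ppn) = self.recycled.pop() {
            Some(ppn.into())
        } else if self.current == self.end {
            None // free memory exhausted
        } else {
            self.current += 1;
            Some((self.current - 1).into())
        }
    }
    pub fn dealloc(&mut self, ppn: PhysPageNum) {
        let ppn = ppn.0;
        // the frame must have been handed out and not recycled twice
        assert!(ppn < self.current && !self.recycled.iter().any(|&v| v == ppn));
        self.recycled.push(ppn);
    }
}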

Based on the allocator, we can design the PageTable.

// os/src/mm/frame_allocator.rs

pub struct FrameTracker {
    pub ppn: PhysPageNum,
}

impl FrameTracker {
    pub fn new(ppn: PhysPageNum) -> Self {
        // page cleaning
        let bytes_array = ppn.get_bytes_array();
        for i in bytes_array {
            *i = 0;
        }
        Self { ppn }
    }
}

// os/src/mm/page_table.rs

pub struct PageTable {
    root_ppn: PhysPageNum,
    frames: Vec<FrameTracker>,
}

That is, after the root, every table node the page table allocates is recorded in the vector of FrameTrackers, each standing for a real physical Frame.

We also need the transformation from virtual to physical addresses.

// idxs holds the index slices of the virtual addr
for i in 0..3 {
    let pte = &mut ppn.get_pte_array()[idxs[i]];
    if i == 2 {
        result = Some(pte);
        break;
    }
    if !pte.is_valid() {
        let frame = frame_alloc().unwrap();
        // create the next level (or return None in the find-only variant)
        *pte = PageTableEntry::new(frame.ppn, PTEFlags::V);
        self.frames.push(frame);
    }
    ppn = pte.ppn();
}

Page Map Design

Based on our abstraction, we design MapArea to map a contiguous range of virtual addresses (whatever pages they fall on!) to physical addresses.

// os/src/mm/memory_set.rs

pub struct MapArea {
    vpn_range: VPNRange,
    data_frames: BTreeMap<VirtPageNum, FrameTracker>,
    map_type: MapType,
    map_perm: MapPermission,
}

On top of contiguous maps, we can define a discontiguous layout for one app:

// os/src/mm/memory_set.rs

pub struct MemorySet {
    page_table: PageTable,
    areas: Vec<MapArea>,
}

// impl MemorySet
fn push(&mut self, mut map_area: MapArea, data: Option<&[u8]>) {
    // map the area's range of virtual addrs in the page_table
    map_area.map(&mut self.page_table);
    if let Some(data) = data {
        map_area.copy_data(&mut self.page_table, data);
    }
    self.areas.push(map_area);
}

Whenever a MapArea allocates a virtual-to-physical pair (a key-value entry in data_frames), it installs the same mapping into the PageTable, frame by frame. A sketch of the per-page step follows.
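This is roughly the tutorial's MapArea::map_one (sketch; MapType and PTEFlags as used elsewhere in this chapter):

pub fn map_one(&mut self, page_table: &mut PageTable, vpn: VirtPageNum) {
    let ppn: PhysPageNum = match self.map_type {
        // identity mapping: the frame number equals the page number
        MapType::Identical => PhysPageNum(vpn.0),
        // framed mapping: allocate a frame and keep its tracker alive
        MapType::Framed => {
            let frame = frame_alloc().unwrap();
            let ppn = frame.ppn;
            self.data_frames.insert(vpn, frame);
            ppn
        }
    };
    let pte_flags = PTEFlags::from_bits(self.map_perm.bits).unwrap();
    page_table.map(vpn, ppn, pte_flags);
}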


Allocation Space

To enable SV39, we write the satp register:

// os/src/mm/page_table.rs

pub fn token(&self) -> usize {
    8usize << 60 | self.root_ppn.0
}

// os/src/mm/memory_set.rs

impl MemorySet {
    pub fn activate(&self) {
        let satp = self.page_table.token();
        unsafe {
            satp::write(satp);
            asm!("sfence.vma");
            // sfence.vma flushes the Translation Lookaside Buffer, the cache used
            // to speed up address translation; stale entries must not survive the switch.
        }
    }
}

Thus activate() fills satp with the physical page number of the current root page table, switching translation on.

Note that the instructions around enabling SV39 must be mapped identically in physical and virtual memory, to bridge the gap in address interpretation just before and just after the switch.

Kernel

Initializing the kernel memory layout:

let mut memory_set = Self::new_bare();
// map trampoline
memory_set.map_trampoline();
memory_set.push(MapArea::new(
    (stext as usize).into(),
    (etext as usize).into(),
    MapType::Identical,
    MapPermission::R | MapPermission::X,
), None);
println!("mapping .rodata section");
// other sections of the layout ...

Initializing an app's memory layout:

Previously we cut the ELF headers off the app binaries; now we load the ELF itself and extract useful info from it, such as permissions.

MemorySet allocates storage for the executable code with its permissions, and places the user stack and the trap context at the top of the memory layout.

It outputs the MemorySet, the user stack top, and the entry point address.

let mut max_end_vpn = VirtPageNum(0);
let max_end_va: VirtAddr = max_end_vpn.into();
let mut user_stack_bottom: usize = max_end_va.into();
// guard page
user_stack_bottom += PAGE_SIZE;
let user_stack_top = user_stack_bottom + USER_STACK_SIZE;
memory_set.push(MapArea::new(
    TRAP_CONTEXT.into(),
    TRAMPOLINE.into(),
    MapType::Framed,
    MapPermission::R | MapPermission::W,
), None);
// ...

App

A problem: separating the kernel's and each app's address spaces also isolates the Trap path. The Trap must carry info from app to kernel, but the app's space cannot see the kernel's. We therefore need a hand-over, which we achieve by storing the related info in the TrapContext.

(There are no spare registers to hold these values without clobbering state; sscratch is already reserved for the kernel stack.)

// os/src/trap/context.rs

#[repr(C)]
pub struct TrapContext {
    pub x: [usize; 32],
    pub sstatus: Sstatus,
    pub sepc: usize,
    // new:
    pub kernel_satp: usize,
    pub kernel_sp: usize,
    pub trap_handler: usize,
}

Note that we must also modify the assembly below to switch satp, supplying the physical page number of the corresponding space (U-level or S-level satp) when changing address spaces.

# __alltraps:

    # load kernel_satp into t0
    ld t0, 34*8(sp)
    # load trap_handler into t1
    ld t1, 36*8(sp)
    # move to kernel_sp
    ld sp, 35*8(sp)
    # switch to kernel space
    csrw satp, t0
    sfence.vma
    # jump to trap_handler
    jr t1

# __restore:
    # a0: *TrapContext in user space (constant); a1: user space token
    # switch to user space
    csrw satp, a1
    sfence.vma
    csrw sscratch, a0

To keep instruction fetch contiguous across the switch, we adjust the memory layout to add a trampoline that sits at the same location in both U-level and S-level spaces (shared by all apps for trapping!). It is placed at the highest virtual page.

# os/src/linker.ld

stext = .;
.text : {
    *(.text.entry)
+   . = ALIGN(4K);
+   strampoline = .;
+   *(.text.trampoline);
+   . = ALIGN(4K);
    *(.text .text.*)
}

We must also fix up the raw handler and restore entry points: running on virtual addresses now, we must jump to the trampoline's virtual addresses rather than the addresses where the code was linked (those are stale!).

#[no_mangle]
pub fn trap_handler() -> ! {
    set_kernel_trap_entry();
    let cx = current_trap_cx();
    let scause = scause::read();
    let stval = stval::read();
    match scause.cause() {
        ...
    }
    trap_return();
}

#[no_mangle]
pub fn trap_return() -> ! {
    set_user_trap_entry();
    let trap_cx_ptr = TRAP_CONTEXT;
    let user_satp = current_user_token();
    extern "C" {
        fn __alltraps();
        fn __restore();
    }
    let restore_va = __restore as usize - __alltraps as usize + TRAMPOLINE;
    unsafe {
        asm!(
            "fence.i",
            "jr {restore_va}",
            restore_va = in(reg) restore_va,
            in("a0") trap_cx_ptr,
            in("a1") user_satp,
            options(noreturn)
        );
    }
    panic!("Unreachable in back_to_user!");
}

Then we map the trampoline's virtual address to its physical address, identically in every address space:

// os/src/config.rs

pub const TRAMPOLINE: usize = usize::MAX - PAGE_SIZE + 1;

// os/src/mm/memory_set.rs

impl MemorySet {
    /// Note that the trampoline is not collected by areas.
    fn map_trampoline(&mut self) {
        self.page_table.map(
            VirtAddr::from(TRAMPOLINE).into(),
            PhysAddr::from(strampoline as usize).into(),
            PTEFlags::R | PTEFlags::X,
        );
    }
}

We adjust TaskControlBlock for the same reason, recording where each task's trap context lives.

// os/src/task/task.rs

pub struct TaskControlBlock {
    pub task_cx: TaskContext,
    pub task_status: TaskStatus,
    pub memory_set: MemorySet,
    pub trap_cx_ppn: PhysPageNum,
    pub base_size: usize,
}

It reads the data extracted from the ELF, gets the trap context's PPN and the task's kernel stack bottom and top, and then initializes the trap context.

Here is part of the task-control-block initialization:

impl TaskContext {
    pub fn goto_trap_return(kstack_ptr: usize) -> Self {
        Self {
            ra: trap_return as usize,
            sp: kstack_ptr,
            s: [0; 12],
        }
    }
}

// fn new(elf_data: &[u8], app_id: usize) -> Self
let task_control_block = Self {
    task_status,
    task_cx: TaskContext::goto_trap_return(kernel_stack_top),
    memory_set,
    trap_cx_ppn,
    base_size: user_sp,
};
// prepare TrapContext in user space
let trap_cx = task_control_block.get_trap_cx();
*trap_cx = TrapContext::app_init_context(
    entry_point,
    user_sp,
    KERNEL_SPACE.exclusive_access().token(),
    kernel_stack_top,
    trap_handler as usize,
);
task_control_block

Chapter 1

Execution Environment

The execution environment is defined by the Target Triplet, which specifies the platform, CPU architecture, and library required for the build. For example: x86_64-unknown-linux-gnu.

Components of the Target Triplet:

  • Platform: The specific operating system or runtime environment.
  • CPU Architecture: The underlying hardware architecture (e.g., x86_64, ARM).
  • Library: The standard library or runtime support required.

If the target platform has no std or any syscall support, it is called bare-metal. Rust provides the core library, which is independent of any platform support.

If we change .cargo/config as follows:

# os/.cargo/config
[build]
target = "riscv64gc-unknown-none-elf"

it is called cross compilation, because the platform the binary runs on differs from the platform that builds it.

No Std and No Main

The basics that std and the start semantics provide are the panic_handler and the main entry point.

We opt out of both with:

#![no_std]
#![no_main]

RISCV

On RISC-V things get tougher: we must provide our own entry point, exit path, and basic functionality such as print/println.

First, we define the linker script and the entry code that sets up the stack.

Linker:

# os/src/linker.ld
OUTPUT_ARCH(riscv)
ENTRY(_start)              /* entry point */
BASE_ADDRESS = 0x80200000; /* base addr for the entry */

SECTIONS
...

Stack Space:

# os/src/entry.asm
    .section .text.entry
    .globl _start
_start:
    la sp, boot_stack_top
    call rust_main # call the rust_main function as the entry

    .section .bss.stack
    .globl boot_stack_lower_bound
boot_stack_lower_bound:
    .space 4096 * 16
    .globl boot_stack_top
boot_stack_top:

On RISC-V we rely on RustSBI (a Rust implementation of the RISC-V SBI, the interface between the supervisor and machine-mode firmware).

After implementing sbi_call, we can construct console_putchar:

const SBI_CONSOLE_PUTCHAR: usize = 1;

fn sbi_call(...) -> usize {
    ...
}

pub fn console_putchar(c: usize) {
    sbi_call(SBI_CONSOLE_PUTCHAR, c, 0, 0);
}
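The elided sbi_call body is an ecall with the arguments in a0–a2 and the call number in a7; a sketch using modern inline asm and the legacy SBI calling convention:

#[inline(always)]
fn sbi_call(which: usize, arg0: usize, arg1: usize, arg2: usize) -> usize {
    let mut ret;
    unsafe {
        core::arch::asm!(
            "ecall",
            inlateout("x10") arg0 => ret, // a0: first argument and return value
            in("x11") arg1,               // a1
            in("x12") arg2,               // a2
            in("x17") which,              // a7: SBI call number
        );
    }
    ret
}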

With a formal Write implementation:

struct Stdout;

impl Write for Stdout {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for c in s.chars() {
            console_putchar(c as usize);
        }
        Ok(())
    }
}

pub fn print(args: fmt::Arguments) {
    Stdout.write_fmt(args).unwrap();
}

With that, println's basic functionality is in place; you can handle panic_handler and the rest in the same spirit…
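For instance, a minimal panic_handler sketch: it just reports the panic location and spins (a real kernel would shut down through SBI instead of looping):

use core::panic::PanicInfo;

#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    // report the panic location if available, then spin forever
    if let Some(loc) = info.location() {
        println!("Panicked at {}:{}", loc.file(), loc.line());
    }
    loop {}
}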

Chapter 5

Introduction

Moving past running code strictly one binary after another, we introduce the Process: a complete description, in space and time, of the OS executing a binary file. A process must hold independent resources in order to execute.

After the process, the Thread and the Coroutine also developed as OSes grew. They differ in the resources they occupy: typically threads live inside one process and hold their own independent stacks and flows of control, while coroutines live inside one thread and hold only their own independent flow of control.


Design

Every process needs an independent memory layout and is dispatched by the CPU. This builds on Task; beyond that, each process can Fork its own child processes, creating workflows that overlap in time. A terminated process's resources cannot always be recycled immediately because of its children, so we mark it a Zombie Process.

To clarify which process is child and which is parent, and to tell isolated processes apart, we mark each with a PID (Process Identifier). Note that a forked child is identical to its parent except for a0, the return-value register: the parent receives the new child's PID, while the child receives 0 (it has no fork result of its own).

  • fork: copy a process's state (sp and so on) into a new child process.
  • waitpid: wait for a child to become a zombie and recycle all of its resources.
  • exec: clear a process's state and load an executable file. (A user-side sketch of the three calls follows this list.)
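Putting the three together, a user program might look like this (a sketch in the style of the tutorial's user library; "hello_world\0" is an illustrative, null-terminated app name):

pub fn main() -> i32 {
    let pid = fork();
    if pid == 0 {
        // child: replace this image; exec only returns on failure
        exec("hello_world\0");
        -1
    } else {
        // parent: block until the child becomes a zombie, then reap it
        let mut exit_code: i32 = 0;
        let exit_pid = wait(&mut exit_code);
        assert_eq!(pid, exit_pid);
        println!("child {} exited with code {}", exit_pid, exit_code);
        0
    }
}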

Data Construction

PidAllocator hands out PIDs and recycles freed ones, so there is no need to worry about previously used PIDs.

pub struct PidHandle(pub usize);

struct PidAllocator {
    current: usize,
    recycled: Vec<usize>,
}

// impl PidAllocator
pub fn alloc(&mut self) -> PidHandle {
    if let Some(pid) = self.recycled.pop() {
        PidHandle(pid)
    } else {
        self.current += 1;
        PidHandle(self.current - 1)
    }
}

Hence when a PID is recycled, the memory tied to it can be reused; we define a process's KernelStack address in terms of its PID.

// os/src/task/pid.rs

pub struct KernelStack {
    pid: usize,
}

pub fn kernel_stack_position(app_id: usize) -> (usize, usize) {
    let top = TRAMPOLINE - app_id * (KERNEL_STACK_SIZE + PAGE_SIZE);
    let bottom = top - KERNEL_STACK_SIZE;
    (bottom, top)
}

// impl KernelStack
pub fn push_on_top<T>(&self, value: T) -> *mut T
where
    T: Sized,
{
    let kernel_stack_top = self.get_top();
    let ptr_mut = (kernel_stack_top - core::mem::size_of::<T>()) as *mut T;
    unsafe { *ptr_mut = value; }
    ptr_mut
}
pub fn get_top(&self) -> usize {
    let (_, kernel_stack_top) = kernel_stack_position(self.pid);
    kernel_stack_top
}

impl Drop for KernelStack {
    fn drop(&mut self) {
        let (kernel_stack_bottom, _) = kernel_stack_position(self.pid);
        let kernel_stack_bottom_va: VirtAddr = kernel_stack_bottom.into();
        KERNEL_SPACE
            .exclusive_access()
            .remove_area_with_start_vpn(kernel_stack_bottom_va.into());
    }
}

We extend the TaskControlBlock to link parent and child processes.

pub struct TaskControlBlockInner {
    ...
    // new:
    pub parent: Option<Weak<TaskControlBlock>>,
    pub children: Vec<Arc<TaskControlBlock>>,
    pub exit_code: i32,
}

// impl TaskControlBlockInner
pub fn is_zombie(&self) -> bool {
    self.get_status() == TaskStatus::Zombie
}

TaskManager used to manage both the tasks and CPU dispatch; we now narrow TaskManager down to task management only.

// os/src/task/manager.rs

pub struct TaskManager {
    ready_queue: VecDeque<Arc<TaskControlBlock>>,
}

lazy_static! {
    pub static ref TASK_MANAGER: UPSafeCell<TaskManager> = unsafe {
        UPSafeCell::new(TaskManager::new())
    };
}

pub fn add_task(task: Arc<TaskControlBlock>) {
    TASK_MANAGER.exclusive_access().add(task);
}

pub fn fetch_task() -> Option<Arc<TaskControlBlock>> {
    TASK_MANAGER.exclusive_access().fetch()
}

CPU dispatch moves into the newly introduced Processor.
We introduce an idle control flow whose only job is to switch into other tasks.

  • Why not have the outgoing task switch directly to its successor, instead of going through the idle flow?
  • Separating the idle flow gives scheduling its own context: dispatch data never lives inside any task, and the scheduling step stays invisible to each task's Trap flow.

The whole workflow would be:

  • the idle flow fetches a task and switches to it
  • the task runs until its time slice ends or it finishes
  • the task switches back to the idle flow
  • repeat…
pub struct Processor {
    current: Option<Arc<TaskControlBlock>>,
    // idle control flow of the current cpu
    idle_task_cx: TaskContext,
}

lazy_static! {
    pub static ref PROCESSOR: UPSafeCell<Processor> = unsafe {
        UPSafeCell::new(Processor::new())
    };
}

// run_tasks()
// loop: fetch a task and switch to it when possible
if let Some(task) = fetch_task() {
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    let mut task_inner = task.inner_exclusive_access();
    let next_task_cx_ptr = &task_inner.task_cx as *const TaskContext;
    task_inner.task_status = TaskStatus::Running;
    drop(task_inner);
    processor.current = Some(task);
    drop(processor);
    unsafe {
        __switch(
            idle_task_cx_ptr,
            next_task_cx_ptr,
        );
    }
}

// switch back to the idle flow when a task runs out of its time
pub fn schedule(switched_task_cx_ptr: *mut TaskContext) {
    let mut processor = PROCESSOR.exclusive_access();
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    drop(processor);
    unsafe {
        __switch(
            switched_task_cx_ptr,
            idle_task_cx_ptr,
        );
    }
}

Dispatch Construction

Previously we used suspend_current_and_run_next to pause a task and switch to the next; now we adapt it to the process design.

// os/src/task/mod.rs

pub fn suspend_current_and_run_next() {
    let task = take_current_task().unwrap();

    // ---- access current TCB exclusively
    let mut task_inner = task.inner_exclusive_access();
    let task_cx_ptr = &mut task_inner.task_cx as *mut TaskContext;
    task_inner.task_status = TaskStatus::Ready;
    drop(task_inner);
    // ---- stop exclusively accessing current TCB

    // add to the ready deque
    add_task(task);
    // switch to the idle control flow
    schedule(task_cx_ptr);
}

Previously, tasks were never created by other tasks, but processes are. Moreover, a syscall such as exec replaces the address space, so the old TrapContext may have been recycled; we refactor trap_handler for that case.

// fn trap_handler() -> !
Trap::Exception(Exception::UserEnvCall) => {
    // jump to the next instruction anyway
    let mut cx = current_trap_cx();
    cx.sepc += 4;
    // the syscall may create a new process and change the trap context,
    let result = syscall(cx.x[17], [cx.x[10], cx.x[11], cx.x[12]]);
    // so whether cx changed or not, we refetch it
    cx = current_trap_cx();
    cx.x[10] = result as usize;
}

Now we construct fork, exec, waitpid, and exit.

Fork

We must copy the entire memory layout and task state, then allocate a new kernel stack for the child.

// impl MemorySet
pub fn from_existed_user(user_space: &MemorySet) -> MemorySet {
    let mut memory_set = Self::new_bare();
    // map trampoline
    memory_set.map_trampoline();
    // copy data sections/trap_context/user_stack
    for area in user_space.areas.iter() {
        let new_area = MapArea::from_another(area);
        memory_set.push(new_area, None);
        // copy data from the other space
        for vpn in area.vpn_range {
            let src_ppn = user_space.translate(vpn).unwrap().ppn();
            let dst_ppn = memory_set.translate(vpn).unwrap().ppn();
            dst_ppn.get_bytes_array().copy_from_slice(src_ppn.get_bytes_array());
        }
    }
    memory_set
}

// impl TaskControlBlock
// fn fork
let trap_cx_ppn = memory_set
    .translate(VirtAddr::from(TRAP_CONTEXT).into())
    .unwrap()
    .ppn();
// alloc a pid and a kernel stack in kernel space
let pid_handle = pid_alloc();
let kernel_stack = KernelStack::new(&pid_handle);
let kernel_stack_top = kernel_stack.get_top();
let task_control_block = Arc::new(TaskControlBlock {
    pid: pid_handle,
    kernel_stack,
    inner: unsafe { UPSafeCell::new(TaskControlBlockInner {
        trap_cx_ppn,
        base_size: parent_inner.base_size,
        task_cx: TaskContext::goto_trap_return(kernel_stack_top),
        task_status: TaskStatus::Ready,
        memory_set,
        parent: Some(Arc::downgrade(self)),
        children: Vec::new(),
        exit_code: 0,
    })},
});
// add child
parent_inner.children.push(task_control_block.clone());
// modify kernel_sp in trap_cx
// **** access child TCB exclusively
let trap_cx = task_control_block.inner_exclusive_access().get_trap_cx();
trap_cx.kernel_sp = kernel_stack_top;

Finally, implement sys_fork

pub fn sys_fork() -> isize {
    let current_task = current_task().unwrap();
    let new_task = current_task.fork();
    let new_pid = new_task.pid.0;
    let trap_cx = new_task.inner_exclusive_access().get_trap_cx();

    // for the child process, fork returns 0
    trap_cx.x[10] = 0; // x[10] is the a0 reg

    add_task(new_task);
    new_pid as isize
}

We can see that when trap_handler invokes sys_fork, the parent's x[10] ends up holding new_pid as the return value, while the child's was set to 0 above.

Exec

To execute an app by its name, we first need the app-name strings emitted at app-load time.

// os/build.rs

writeln!(f, r#"
    .global _app_names
_app_names:"#)?;
for app in apps.iter() {
    writeln!(f, r#"    .string "{}""#, app)?;
}
# link_app.S

...
_app_names:
    .string "exit"
    .string "fantastic_text"
...

Construct APP_NAMES as global state in the OS:

// os/src/loader.rs

// APP_NAMES: Vec<&'static str> {...
for _ in 0..num_app {
    let mut end = start;
    while end.read_volatile() != '\0' as u8 {
        end = end.add(1);
    }
    let slice = core::slice::from_raw_parts(start, end as usize - start as usize);
    let str = core::str::from_utf8(slice).unwrap();
    v.push(str);
    start = end.add(1);
}
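With APP_NAMES in place, looking an app up by name is a linear scan; a sketch matching the tutorial's get_app_data_by_name, assuming the existing get_num_app/get_app_data helpers:

pub fn get_app_data_by_name(name: &str) -> Option<&'static [u8]> {
    let num_app = get_num_app();
    // scan the name table; return the matching app's binary slice
    (0..num_app)
        .find(|&i| APP_NAMES[i] == name)
        .map(get_app_data)
}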

To execute a new binary, we read it and extract all its state to replace the current task's.

// os/src/task/task.rs

impl TaskControlBlock {
    pub fn exec(&self, elf_data: &[u8]) {
        let (memory_set, user_sp, entry_point) = MemorySet::from_elf(elf_data);
        let trap_cx_ppn = memory_set
            .translate(VirtAddr::from(TRAP_CONTEXT).into())
            .unwrap()
            .ppn();

        // **** access inner exclusively
        let mut inner = self.inner_exclusive_access();
        inner.memory_set = memory_set;
        inner.trap_cx_ppn = trap_cx_ppn;

        let trap_cx = inner.get_trap_cx();
        *trap_cx = TrapContext::app_init_context(
            entry_point,
            user_sp,
            KERNEL_SPACE.exclusive_access().token(),
            self.kernel_stack.get_top(),
            trap_handler as usize,
        );
        // **** stop exclusively accessing inner automatically
    }
}

We read the input string through a user-space pointer and replace the current task's state:

// os/src/syscall/process.rs

pub fn sys_exec(path: *const u8) -> isize {
    let token = current_user_token();
    let path = translated_str(token, path);
    if let Some(data) = get_app_data_by_name(path.as_str()) {
        let task = current_task().unwrap();
        task.exec(data);
        0
    } else {
        -1
    }
}

Exit

When a task exits, it carries an exit code: assigned by the app on success, or by the kernel on an anomaly.

// fn exit_current_and_run_next(exit_code: i32)
inner.task_status = TaskStatus::Zombie;
inner.exit_code = exit_code;

// move all its children to the initial process
// ++++++ access initproc TCB exclusively
{
    let mut initproc_inner = INITPROC.inner_exclusive_access();
    for child in inner.children.iter() {
        child.inner_exclusive_access().parent = Some(Arc::downgrade(&INITPROC));
        initproc_inner.children.push(child.clone());
    }
}
// ++++++ stop exclusively accessing initproc TCB

// clear all memory
inner.children.clear();
inner.memory_set.recycle_data_pages();
drop(inner);
// **** stop exclusively accessing current TCB
// drop the task manually to keep the refcount correct
drop(task);
// replace the original context with _unused, which rust will recycle
let mut _unused = TaskContext::zero_init();
schedule(&mut _unused as *mut _);

// os/src/syscall/process.rs

pub fn sys_exit(exit_code: i32) -> ! {
    exit_current_and_run_next(exit_code);
    panic!("Unreachable in sys_exit!");
}

WaitPid

waitpid returns -1 if no child process with the specified pid exists, and -2 if the child is still running; otherwise it recycles the finished (zombie) child and returns its pid.

// sys_waitpid(pid: isize, exit_code_ptr: *mut i32) -> isize

// -1 case: search this task's children for a child with
//     |p| { pid == -1 || pid as usize == p.getpid() }
// if none is found:
return -1;

// otherwise look for a finished child: (idx, task_block) with
//     p.inner_exclusive_access().is_zombie() && (pid == -1 || pid as usize == p.getpid())
let pair = ...;

if let Some((idx, _)) = pair {
    let child = inner.children.remove(idx);
    // confirm that the child will be deallocated after being removed from the children list
    assert_eq!(Arc::strong_count(&child), 1);
    let found_pid = child.getpid();
    // ++++ temporarily access child TCB exclusively
    let exit_code = child.inner_exclusive_access().exit_code;
    // ++++ stop exclusively accessing child TCB
    *translated_refmut(inner.memory_set.token(), exit_code_ptr) = exit_code;
    found_pid as isize
} else {
    // the matching child is still running
    -2
}

// user/src/lib.rs

pub fn wait(exit_code: &mut i32) -> isize {
    loop {
        match sys_waitpid(-1, exit_code as *mut _) {
            -2 => { yield_(); }
            // -1 or a real pid
            exit_pid => return exit_pid,
        }
    }
}