rcore-os Blog

2025年春夏开源操作系统训练营三阶段总结-自由

Posted on 2025-04-29 Edited on 2025-12-04 In report

2025年春夏开源操作系统训练营三阶段总结

第一阶段总结

第一次写rust，被所有权干烂了，不停和编译器搏斗+不停拷问AI终于磕磕绊绊写完了，以前习惯于C++的引用和指针乱飘了，写rust确实比C和C++安全，因为有编译器的所有权机制保障。

第二阶段总结

因为以前写过NJUPA(极其推荐)，所以对上下文切换有一点概念。rcore-os的上下文切换采用了xv6类似的方式，在内核内部做控制流切换来切换执行的用户程序。对于操作系统中常用的锁，rcore采用了单核中的使用rust所有权机制来实现，上锁的时候将所有权转交给函数，尝试二次获取的时候就会因为所有权引发rust错误，避免了内核的死锁的情况。cargo的编译链接功能确实比GNU那一套方便，rust的属性语法支持也比C++要好很多。

地址空间和页表中极多的用来表示地址组成的结构体，充分体现了rust面向对象的特性和into的语法，增加了代码的可读性，以前用C系的写也就使用各种宏。rcore用了额外的BTreeMap来支持快速查询页表。作业题的实现也大多是调用现有的函数就差不多可以了。

到文件系统这里比较新鲜，因为以前没有怎么接触过，文件系统的实现层层嵌套，层层抽象，一个功能实现往往要修改多层，理解起来比较困难。

并发的作业是用Coffman提出的方法做死锁检测，开始的时候理解成了银行家算法，读作业提示的时候也没有读懂，卡了很长时间，在群u的提示下才通过。后来在上操作系统课的时候翻到书上才知道这是Coffman的死锁检测算法(书上的描述其实和作业里写得差不多)。

第三阶段

第三阶段换了个内核模式叫unikernel,不太习惯这样的组件化操作系统，还是习惯以前的传统宏内核结构。

print_with_color很简单，理解一下arceos提供的axstd模块，为内核应用提供了运行时环境。

support_hash_map在理解了rust可以自定义堆分配器后，唯一要解决的就是rust中如何得到一个hashable值的hash值，然后做简单的映射即可，测试不需要特别追求效率。

alt_alloc主要是实现一下需要的接口，了解如何向arceos添加自己的分配器，bump allocator可以说是最简单的分配器了。

ramfs_rename要把cargo里的包换成arceos下面的(不然文件系统的包永远是网上拉下来的)，找到对traitVfsOps和VfsNodeOps的实现之后，搞清楚记录文件的数据结构，实现rename就可以了。

sys_map同样搞清楚task_ext提供的接口就可以直接调库了，第一次实现忘记输入地址是NULL的话要由内核自己找地址，遂发现了一个叫做find_free_area的妙妙工具。

simple_hv最简单的一次，同样的由操作系统代为执行一些指令的手法可以用来帮用户程序加载一些未对齐的数据（如果处理器不支持）。

初探操作系统

Posted on 2025-04-28 Edited on 2025-12-04 In 感悟 , 总结

第二阶段总结：操作系统学习的感悟与理解

经过这一阶段对操作系统核心内容的系统学习和实践，我对操作系统的本质、关键机制以及与 Rust 结合的优势有了更深刻的理解。以下是我的主要体会和总结：

一、操作系统的本质与核心任务

操作系统是管理硬件资源、为上层应用提供抽象和隔离的基础软件。它的核心任务包括：

进程与线程管理：通过进程（资源分配单位）和线程（调度单位）实现多任务并发，保障系统的响应性和资源利用率。PCB（进程控制块）和 TCB（线程控制块）是操作系统调度和管理的核心数据结构。
内存管理：通过虚拟内存、分页机制和权限控制，实现进程隔离、内存高效分配与回收。页表、TLB、懒分配、写时复制等机制极大提升了系统的安全性和性能。
进程调度：采用多种调度算法（如时间片轮转、优先级、MLFQ等）实现公平与高效的 CPU 分配。上下文切换和调度策略直接影响系统吞吐和响应速度。
系统调用接口：为用户程序提供受控访问硬件和内核资源的通道，实现用户态与内核态的安全切换。
硬件抽象与性能优化：通过缓存、TLB、ASID 等机制优化访问速度，利用中断和异常机制实现高效的事件响应。

OrangeQi

Posted on 2025-04-28 Edited on 2025-12-04

之前对操作系统有一定理论基础，rcore 和 arceos 项目对我最大的挑战主要包括：

risc-v 体系结构的知识，尤其是特权架构。这对理解 trap、context_switch、地址空间相关的代码极其重要。
arceos 项目的组织构建。最底层是 axhal，抽象了硬件架构和运行平台，往上是各个 module 例如 axtask 等，再向上是 axapi 乃至 ulib。这种组件化的设计思想充分利用的 rust 语言的优势，极大方便构建。

unikernel 架构是没有特权级切换的，应用程序也运行在 s 态。刚开始没有仔细理解 ppt，给我造成了挺大的困扰。

hashmap 的实验我并没有自己手写代码，而是直接引入了 hashbrown 库。但手撕一下代码应该能更加锻炼能力。

此外，hypervisor 给我带来了挺大的困难，参考其他同学的经验我才得以通过。

rcore-handnote-1

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 1

Execution Environment

The execution environment is defined by the Target Triplet, which specifies the platform, CPU architecture, and library required for the build. For example: x86_64-unknown-linux-gnu.

Components of the Target Triplet:

Platform: The specific operating system or runtime environment.
CPU Architecture: The underlying hardware architecture (e.g., x86_64, ARM).
Library: The standard library or runtime support required.

If the target platform contains no std or any support syscall, such platform called bare-metal, Rust contains a core lib independent of any platform support.

If we change .cargo/config s.t.:

1
2
3

# os/.cargo/config
[build]
target = "riscv64gc-unknown-none-elf"

it called cross compile because the running platform is different form execution platform.

No Std and No Main

The basic functionality provided by std and start semantic is panic_handler and main entry.

To toggle it off with:

1 2	#![no_std] #![no_main]

RISCV

As for riscv, thing will be tough in here, we need to complete our own entry point, exit, and basic functionality like print/println.

First, we need to define linker and entry for stack allocation.

Linker:

# os/src/linker.ld
OUTPUT_ARCH(riscv)
ENTRY(_start) # entry point
BASE_ADDRESS = 0x80200000; # base addr for entry

SECTIONS
...

Stack Space:

# os/src/entry.asm
    .section .text.entry
    .globl _start
_start:
    la sp, boot_stack_top
    call rust_main # call rust_main function as entry 

    .section .bss.stack
    .globl boot_stack_lower_bound
boot_stack_lower_bound:
    .space 4096 * 16
    .globl boot_stack_top
boot_stack_top:

For riscv, we need to call RustSBI(a underlying specification for rust in riscv).

After implement sbi_call, we could construct put_char:

const SBI_CONSOLE_PUTCHAR: usize = 1;

fn sbi_call(...) -> usize {
	...
}

pub fn console_putchar(c:usize) {
	sbi_call(SBI_CONSOLE_PUTCHAR,c,0,0)
}

With a formal interface for write:

struct Stdout;

impl Write for Stdout {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for c in s.chars() {
            console_putchar(c as usize);
        }
        Ok(())
    }
}

pub fn print(args: fmt::Arguments) {
    Stdout.write_fmt(args).unwrap();
}

Now we construct basic functionality in println, you could also handle panic_handler and others…

rcore-handnote-2

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 2

Introduction

It corresponds to riscv:

Privilege for S(guaranteed by Supervisor Execution Environment of RustSBI)
User for U(constructed in current chapter as Application Execution Environment)

Reason:

Safety(Prevent app from accessing kernel)
Recoverable

Workflow:

Start application and user-mode context
Trap(Called by system level) to handle system
- Goes wrong! Kill it!
- Finish! Next!
Restore to user-mode context

riscv designs following CSR(Control and Status Register) to handle this:

CSR

Begin Trap:

sstatus: SPP seg to the current level of CPU.
sepc: next addr after Trap finished.
scause/stval: Trap cause and additional info.
stvec: storage of entry addr of Trap

stvec is a 64-bit CSR, with:

MODE(Direct/Vectored) [1:0](read from right to left): 2-bits

BASE [63:2]: 62-bits

finally, it will return by instruction sret which will change level and jump by sepc.

Construct Trap

Design:

General register will be shared by U-level and S-level.
Maintain a reasonable state of CSR.
Separate workflow of U-level and S-level by stack

Construct:

build KernelStack and UserStack for separation
in KernelStack, we store TrapContext in it, by asm and rust to control dispatch and handle, then store the code to stvec as the entry of Trap.
restore register for UserStack by push a new context refer to UserStack.

build stack and push context:

// stack struct ...

// buttom to top
fn get_sp(&self) -> usize {
    self.data.as_ptr() as usize + KERNEL_STACK_SIZE
}
pub fn push_context(&self, cx: TrapContext) -> &'static mut TrapContext {
    let cx_ptr = (self.get_sp() - core::mem::size_of::<TrapContext>()) as *mut TrapContext;
    unsafe {
        *cx_ptr = cx;
    }
    unsafe { cx_ptr.as_mut().unwrap() }
}

// os/src/trap/context.rs

#[repr(C)]
pub struct TrapContext {
    pub x: [usize; 32], // General register
    pub sstatus: Sstatus,
    pub sepc: usize,
}
/// set stack pointer to x_2 reg (sp)
pub fn set_sp(&mut self, sp: usize) {
    self.x[2] = sp;
}
/// init app context
pub fn app_init_context(entry: usize, sp: usize) -> Self {
    let mut sstatus = sstatus::read(); // CSR sstatus
    sstatus.set_spp(SPP::User); //previous privilege mode: user mode
    let mut cx = Self {
        x: [0; 32],
        sstatus,
        sepc: entry, // entry point of app
    };
    cx.set_sp(sp); // app's user stack pointer
    cx // return initial Trap Context of app
}

We will design __alltrap and __restore for operation by asm and part of rust:

.altmacro
.macro SAVE_GP n
    sd x\n, \n*8(sp)
.endm
.macro LOAD_GP n
    ld x\n, \n*8(sp)
.endm
.align_2
__alltraps:
    ...
# set input argument of trap_handler(cx: &mut TrapContext)
    mv a0, sp # sp is point to TrapContext in kernel stack
    call trap_handler # (&mut TrapContext)

--restore:
    ...

To handle Trap context, we will use riscv lib:

// os/Cargo.toml

[dependencies]
riscv = { git = "https://github.com/rcore-os/riscv", features = ["inline-asm"] }

// os/src/trap/mod.rs
global_asm!(include_str!("trap.S"));

pub fn init() {
    extern "C" { fn __alltraps(); }
    // write to stvec
    unsafe {
        stvec::write(__alltraps as usize, TrapMode::Direct);
    }
}

#[no_mangle]
pub fn trap_handler(cx: &mut TrapContext) -> &mut TrapContext {
    ...
}

restore operation:

extern "C" { fn __restore(cx_addr: usize); }
unsafe {
    __restore(KERNEL_STACK.push_context(
        TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp())
// This context store the ptr to UserStack for restoration
    ) as *const _ as usize);
}

Construct User App

Link app binary to kernel with specify memory layout
Read the layout, use AppManager to maintain and store
Load app from memory layout, copy consecutively to APP_BASE_ADDRESS(Currently we have no ability to dynamically read address)
AppManager will run each app

# os/src/link_app.S

    .align 3
    .section .data
    .global _num_app # read from the ptr
_num_app:
    .quad 5
    .quad app_0_start
    .quad app_1_start
    .quad app_2_start
    .quad app_3_start
    .quad app_4_start
    .quad app_4_end`

...

Design it!

// os/src/batch.rs

struct AppManager {
    num_app: usize,
    current_app: usize,
    app_start: [usize; MAX_APP_NUM + 1],
}

// part of read in static init of AppManager
let num_app_ptr = _num_app as usize as *const usize;
let num_app = num_app_ptr.read_volatile();
let mut app_start: [usize; MAX_APP_NUM + 1] = [0; MAX_APP_NUM + 1];
let app_start_raw: &[usize] =  core::slice::from_raw_parts(
    num_app_ptr.add(1), num_app + 1
);
app_start[..=num_app].copy_from_slice(app_start_raw);

Load App:

// part of code of copying to kernel
asm!("fence.i");
// clear app area
core::slice::from_raw_parts_mut(
    APP_BASE_ADDRESS as *mut u8,
    APP_SIZE_LIMIT
).fill(0);
let app_src = core::slice::from_raw_parts(
    self.app_start[app_id] as *const u8,
    self.app_start[app_id + 1] - self.app_start[app_id]
);
let app_dst = core::slice::from_raw_parts_mut(
    APP_BASE_ADDRESS as *mut u8,
    app_src.len()
);
app_dst.copy_from_slice(app_src);

Run each app!

// os/src/batch.rs

pub fn run_next_app() -> ! {
    let mut app_manager = APP_MANAGER.exclusive_access();
    let current_app = app_manager.get_current_app();
    unsafe {
        app_manager.load_app(current_app);
    }
    app_manager.move_to_next_app();
    drop(app_manager);
    // before this we have to drop local variables related to resources manually
    // and release the resources
    extern "C" { fn __restore(cx_addr: usize); }
    unsafe {
        __restore(KERNEL_STACK.push_context(
            TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp())
        ) as *const _ as usize);
    }
    panic!("Unreachable in batch::run_current_app!");
}

// main logic:
// os/src/main.rs

...above code
// load entry for trap
trap::init()
// load app
batch::run_next_app()

rcore-handnote-3

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 3

Introduction

We need to place multiple app to multiple memory address to run app in cycle. Rather run once and clear for next.

First We want to place each app to each isolated addr, due to our kernel restriction, we need to load it with build.py.

Task

Task: a workflow process

Define every time slice of Task as Task Slice

Define the switch between app as Task switching

We need to store Task Context

Design:

We will store these register in ctx:

// os/src/task/context.rs

pub struct TaskContext {
    ra: usize,
    sp: usize,
    s: [usize; 12],
}

.altmacro
.macro SAVE_SN n
    sd s\n, (\n+2)*8(a0)
.endm
.macro LOAD_SN n
    ld s\n, (\n+2)*8(a1)
.endm
    .section .text
    .globl __switch
__switch:
...
# logic by store previous context and take out the new one to current register

Expose to Rust

// os/src/task/switch.rs

global_asm!(include_str!("switch.S"));

use super::TaskContext;

extern "C" {
    pub fn __switch(
        current_task_cx_ptr: *mut TaskContext,
        next_task_cx_ptr: *const TaskContext
    );
}

We will design TaskManager:

Store each App state array and current running app.
each state store TaskContext and TaskState for running or exited etc…
Init and ready by store the __restore ctx to TaskContext
Run for switch cx if needed.

let current_task_cx_ptr = &mut inner.tasks[current].task_cx as *mut TaskContext;
let next_task_cx_ptr = &inner.tasks[next].task_cx as *const TaskContext;
drop(inner);
// before this, we should drop local variables that must be dropped manually
unsafe {
	__switch(
		current_task_cx_ptr,
		next_task_cx_ptr,
	);
}rust

Dispatch Design

Collaboration

Manually design interface yield for App to use

pub fn sys_yield() -> isize {
    syscall(SYSCALL_YIELD, [0, 0, 0])
}

// user/src/lib.rs

pub fn yield_() -> isize { sys_yield() }

But it can be inefficient for some case that app already done its work but reluctant to exit.

Preemptive

We will design interrupt clock in a fixed time bound to force switch between app.

Set timer design and get time
Set timer for trigger
enable timer and handle the interrupt cause of Timer in ecall

You should know this as a pre-knowledge:

const SBI_SET_TIMER: usize = 0;

pub fn set_timer(timer: usize) {
    sbi_call(SBI_SET_TIMER, timer, 0, 0);
}

// os/src/timer.rs

use crate::config::CLOCK_FREQ;
const TICKS_PER_SEC: usize = 100;

pub fn set_next_trigger() {
    set_timer(get_time() + CLOCK_FREQ / TICKS_PER_SEC);
}

// os/src/trap/mod.rs

match scause.cause() {
    Trap::Interrupt(Interrupt::SupervisorTimer) => {
        set_next_trigger();
        suspend_current_and_run_next();
    }
}

// enable supervisor clock
// os/src/trap/mod.rs

use riscv::register::sie;

pub fn enable_timer_interrupt() {
    unsafe { sie::set_stimer(); }
}

rcore-handnote-4

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 4

Introduction

We don’t want a fixed physical addr for allocation, rather, we want a unified abstract interface for dynamic memory layout for app storage. We call it Virtual Address

Safety: Every app will be allocated in its own virtual memory space, so each can’t interfere others.

Efficiency: Every app can coexist in same time without demand of reading outer peripherals to switch app.(With development of memory size)

We need MMU(Memory Management Unit) to achieve Address Translation for interview from virtual to physical.

Different designs occur.

Design

Segment Design

Each app exist in one fixed slot for one segment as $[0,bound)$, with a linear map by MMU.

Problem: Wasteful and inflexible

We may want a different linear map for each app， for example, its allocation for heap, data, code etc… So we can dispatch memory in more finer style, but it can’t resolve the problem because now even slot is dynamically allocated, it may still exist some free memory too small to reuse, cause the External Fragment rather than Internal Fragment which is the problem due to fixed slot.

Paging Design

We could set a Protection bit as r for read, w for write, x for execution.

Another way is to inverse our mind, rather take a slot on memory, we take slot on MMU, it can map its slot(now we call it Page) for real physical memory layout. To adjust, we can take slice for Page to form Frame which is a smaller unit to suit physical memory layout, each app can take many Page Number for a Page Table, record the map.

Page Design

SV39 only use lower 39 bits rather whole 64 bits in design even bit width is 64 bits(it’s a fairly large amount!)

In a address, $[11:0]$ called Page Offset, and $[38:12]$ is the VPN which will be used for location of page and use offset to direct in one page(with 4KiB in one page).

We should modify satp to open Paging for S and U-level memory.

1 2	const PAGE_SIZE: usize = 4096 const PAGE_SIZE_BIT: usize = 12

Page Size and offset to slice physical addr.

// os/src/mm/address.rs

impl PhysAddr {
    pub fn floor(&self) -> PhysPageNum { PhysPageNum(self.0 / PAGE_SIZE) }
    pub fn ceil(&self) -> PhysPageNum { PhysPageNum((self.0 + PAGE_SIZE - 1) / PAGE_SIZE) }
}

Page Entry to bundle permission and physical page.

pub fn ppn(&self) -> PhysPageNum {
	(self.bits >> 10 & ((1usize << 44) - 1)).into()
}
pub fn flags(&self) -> PTEFlags {
	PTEFlags::from_bits(self.bits as u8).unwrap()
}

Usually, a simple design for page manager would be a linear map from base addr and follow up. But actually it will take a huge amount of memory due to the storage of offset by base addr for each app.

A finer design is from Trie. We will take index slice for each 8 bits(it will fit in to 4KB just right!), it means for each slice has 512 states, and link those state up, form a tree. Usually with 3-level for Page Index. Total 27 bits.

Virtual Page will reserve 27 bits for the index slice and 12 bits for offset for certain page. Total 39 bits.

When we transform a virtual addr to physical one, we will do the following exposition reversely.

Page Management Design

A simple allocation for page(rather management!) is stack style.

pub struct StackFrameAllocator {
    current: usize,  //空闲内存的起始物理页号
    end: usize,      //空闲内存的结束物理页号
    recycled: Vec<usize>,
}

Based on Allocation, we can design Page Table.

// os/src/mm/frame_allocator.rs

pub struct FrameTracker {
    pub ppn: PhysPageNum,
}

impl FrameTracker {
    pub fn new(ppn: PhysPageNum) -> Self {
        // page cleaning
        let bytes_array = ppn.get_bytes_array();
        for i in bytes_array {
            *i = 0;
        }
        Self { ppn }
    }
}

// os/src/mm/page_table.rs

pub struct PageTable {
    root_ppn: PhysPageNum,
    frames: Vec<FrameTracker>,
}

It means for one physical page, we will record each allocation by vector of FrameTracker as a representation of real Frame after the root.

We should design transformation from virtual addr to physical addr.

// for each idxs represent index slices of virtual addr.
for i in 0..3 {
	let pte = &mut ppn.get_pte_array()[idxs[i]];
	if i == 2 {
		result = Some(pte);
		break;
	}
	if !pte.is_valid() {
		let frame = frame_alloc().unwrap();
		// create or return None
		*pte = PageTableEntry::new(frame.ppn, PTEFlags::V);
		self.frames.push(frame);
	}
	ppn = pte.ppn();
}

Page Map Design

Based on our abstraction, we need a design for MapArea to given a map from continuous virtual address(no matter their corresponding page!) to physical address.

// os/src/mm/memory_set.rs

pub struct MapArea {
    vpn_range: VPNRange,
    data_frames: BTreeMap<VirtPageNum, FrameTracker>,
    map_type: MapType,
    map_perm: MapPermission,
}

Based on continuous virtual address map, we can define discontinuous map for one app.

// os/src/mm/memory_set.rs

pub struct MemorySet {
    page_table: PageTable,
    areas: Vec<MapArea>,
}    

// impl MemorySet
fn push(&mut self, mut map_area: MapArea, data: Option<&[u8]>) {
	// map range of virtual addr in allocation in page_table
	map_area.map(&mut self.page_table);
	if let Some(data) = data {
		map_area.copy_data(&mut self.page_table, data);
	}
	self.areas.push(map_area);
}

In Each MapArea allocated for some key-value for virtual-physical addr, it will allocate the same for PageTable for Frame.

Allocation Space

To open SV39, we should write instruction for satp:

// os/src/mm/page_table.rs

pub fn token(&self) -> usize {
    8usize << 60 | self.root_ppn.0
}

// os/src/mm/memory_set.rs

impl MemorySet {
    pub fn activate(&self) {
        let satp = self.page_table.token();
        unsafe {
            satp::write(satp);
            asm!("sfence.vma" :::: "volatile");
			// sfence.vma is a special instruction to clear `Translaton Lookaside Buffer` which is used for quick search for memory addr to reduce performance expenses.
        }
    }
}

Therefore, it will fill current root of physical page number as activation.

Notice that we should make instruction contigeous for SV39 open in physical address to amend the gap of transformation of address before and after open.

Kernel

Initiation for Kernel memory layout.

let mut memory_set = Self::new_bare();
// map trampoline
memory_set.map_trampoline();
memory_set.push(MapArea::new(
	(stext as usize).into(),
	(etext as usize).into(),
	MapType::Identical,
	MapPermission::R | MapPermission::X,
), None);
println!("mapping .rodata section");
// other layout ...

Initiation for App memory layout.

Previously, we will cut off elf part of binary of apps, now we load it and extract useful infos, s.t. Permissions.

MemorySet should allocate storage of execution code with its permissions, allocate user stack and trap context at top of the memory layout.

Output MemorySet, user stack top, entry point addr.

let mut max_end_vpn = VirtPageNum(0);
let max_end_va: VirtAddr = max_end_vpn.into();
let mut user_stack_bottom: usize = max_end_va.into();
// guard page
user_stack_bottom += PAGE_SIZE;
let user_stack_top = user_stack_bottom + USER_STACK_SIZE;
memory_set.push(MapArea::new(
	TRAP_CONTEXT.into(),
	TRAMPOLINE.into(),
	MapType::Framed,
	MapPermission::R | MapPermission::W,
), None);
// ...

App

A problem is that separation of Kernel and App will also isolate Trap, the process need info from App to Kernel but App can’t see it. Therefore, we need a transfer operation. We achieve this by storing related info in TrapContext.

(Because there’s no more register to store these without breaking state like sscratch designed for kernel stack.)

// os/src/trap/context.rs

#[repr(C)]
pub struct TrapContext {
    pub x: [usize; 32],
    pub sstatus: Sstatus,
    pub sepc: usize,
	// new:
    pub kernel_satp: usize, 
    pub kernel_sp: usize,
    pub trap_handler: usize,
}

Notice that we also need to modify below to trigger satp and specify corresponding(U-level, S-level satp) physical page number to change state.

# __alltraps:

# load kernel_satp into t0
ld t0, 34*8(sp)
# load trap_handler into t1
ld t1, 36*8(sp)
# move to kernel_sp
ld sp, 35*8(sp)
# switch to kernel space
csrw satp, t0
sfence.vma
# jump to trap_handler
jr t1

# __restore:
# a0: *TrapContext in user space(Constant); a1: user space token
# switch to user space
csrw satp, a1
sfence.vma
csrw sscratch, a0

To amend the problem of contigeous instructions after switch, we need to adjust memory layout for trampoline which is in the same location in U-level and S-level(unified for all app to trap!). It will be placed at highest virtual page.

# os/src/linker.ld

    stext = .;
    .text : {
        *(.text.entry)
+        . = ALIGN(4K);
+        strampoline = .;
+        *(.text.trampoline);
+        . = ALIGN(4K);
        *(.text .text.*)
    }

We modify rather raw handler and restore, due to virtual address, we need to adjust it for trampoline rather the address we had code!(it’s virtual!)

#[no_mangle]
pub fn trap_handler() -> ! {
    set_kernel_trap_entry();
    let cx = current_trap_cx();
    let scause = scause::read();
    let stval = stval::read();
    match scause.cause() {
        ...
    }
    trap_return();
}

#[no_mangle]
pub fn trap_return() -> ! {
    set_user_trap_entry();
    let trap_cx_ptr = TRAP_CONTEXT;
    let user_satp = current_user_token();
    extern "C" {
        fn __alltraps();
        fn __restore();
    }
    let restore_va = __restore as usize - __alltraps as usize + TRAMPOLINE;
    unsafe {
        asm!(
            "fence.i",
            "jr {restore_va}",
            restore_va = in(reg) restore_va,
            in("a0") trap_cx_ptr,
            in("a1") user_satp,
            options(noreturn)
        );
    }
    panic!("Unreachable in back_to_user!");
}

Then map the virtual address for it up to the physical address for unifying.

// os/src/config.rs

pub const TRAMPOLINE: usize = usize::MAX - PAGE_SIZE + 1;

// os/src/mm/memory_set.rs

impl MemorySet {
    /// Mention that trampoline is not collected by areas.
    fn map_trampoline(&mut self) {
        self.page_table.map(
            VirtAddr::from(TRAMPOLINE).into(),
            PhysAddr::from(strampoline as usize).into(),
            PTEFlags::R | PTEFlags::X,
        );
    }
}

We should adjust TaskControlBlock for the same reason, record each Trap.

// os/src/task/task.rs

pub struct TaskControlBlock {
    pub task_cx: TaskContext,
    pub task_status: TaskStatus,
    pub memory_set: MemorySet,
    pub trap_cx_ppn: PhysPageNum,
    pub base_size: usize,
}

It will read data getting from elf, get trap contexr ppn, its kernel stack bottom and top, and then initiate trap context.

Here the part of task control initiation:

impl TaskContext {
    pub fn goto_trap_return() -> Self {
        Self {
            ra: trap_return as usize,
            s: [0; 12],
        }
    }
}

// fn new(elf_data: &[u8], app_id: usize) -> Self 
let task_control_block = Self {
	task_status,
	task_cx: TaskContext::goto_trap_return(kernel_stack_top),
	memory_set,
	trap_cx_ppn,
	base_size: user_sp,
};
// prepare TrapContext in user space
let trap_cx = task_control_block.get_trap_cx();
*trap_cx = TrapContext::app_init_context(
	entry_point,
	user_sp,
	KERNEL_SPACE.exclusive_access().token(),
	kernel_stack_top,
	trap_handler as usize,
);
task_control_block

rcore-handnote-5

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 5

Introduction

After the demand of code execution one by one, we want to introduce Process which will be a full space-time description of execution process of binary file in OS. It means for one process, it should hold independent resources to be executed.

After Process, Thread and Coroutine are also developed in growth of OS. They are different in resources taken up, usually, Thread will be in one process and hold their own independent stack and workflow; Coroutine will be in one Thread and hold their own independent workflow.

Design

Every process need independent memory layout, can be dispatch by cpu. It’s the functionality based on Task, after that, each process can Fork their own children processes, so there’s a workflow in time discrepancy. Its resource can’t be recycled in time due to children processes, we need to mark it as Zombie Process.

To clarify which is children, which is parent, and each isolated process, we mark each with PID-Process Identifier. Notice if we fork a process, it will be same as parent only a0 which is the register called for return will be different, parent process return new PID of child process, child process return 0 as none of fork.

fork: copy a process state(like sp etc…) as its child process.
waitpid: wait a child become zombie and recycle all resources.
exec: clear a process state and load a execution file.

Data Constructon

We will recycle all presumed pid by PidAllocator, No need to worry about previous pid used.

pub struct PidHandle(pub usize);

struct PidAllocator {
    current: usize,
    recycled: Vec<usize>,
}

// impl PidAllocator
pub fn alloc(&mut self) -> PidHandle {
	if let Some(pid) = self.recycled.pop() {
		PidHandle(pid)
	} else {
		self.current += 1;
		PidHandle(self.current - 1)
	}
}

Therefore, if one pid recycled, it deallocated memory can be reused, we will define its KernelStack addr by pid.

// os/src/task/pid.rs

pub struct KernelStack {
    pid: usize,
}

pub fn kernel_stack_position(app_id: usize) -> (usize, usize) {
    let top = TRAMPOLINE - app_id * (KERNEL_STACK_SIZE + PAGE_SIZE);
    let bottom = top - KERNEL_STACK_SIZE;
    (bottom, top)
}

// impl KernelStack
pub fn push_on_top<T>(&self, value: T) -> *mut T where
	T: Sized, {
	let kernel_stack_top = self.get_top();
	let ptr_mut = (kernel_stack_top - core::mem::size_of::<T>()) as *mut T;
	unsafe { *ptr_mut = value; }
	ptr_mut
}
pub fn get_top(&self) -> usize {
	let (_, kernel_stack_top) = kernel_stack_position(self.pid);
	kernel_stack_top
}

impl Drop for KernelStack {
    fn drop(&mut self) {
        let (kernel_stack_bottom, _) = kernel_stack_position(self.pid);
        let kernel_stack_bottom_va: VirtAddr = kernel_stack_bottom.into();
        KERNEL_SPACE
            .exclusive_access()
            .remove_area_with_start_vpn(kernel_stack_bottom_va.into());
    }
}

We need to construct TaskControlBlock for parent and children process.

pub struct TaskControlBlockInner {
	...
	// new:
    pub parent: Option<Weak<TaskControlBlock>>,
    pub children: Vec<Arc<TaskControlBlock>>,
    pub exit_code: i32,
}

// impl TaskControlBlockInner
pub fn is_zombie(&self) -> bool {
	self.get_status() == TaskStatus::Zombie
}

TaskManager manage all tasks and cpu dispatch, we will separate only tasks management for TaskManager.

// os/src/task/manager.rs

pub struct TaskManager {
    ready_queue: VecDeque<Arc<TaskControlBlock>>,
}

lazy_static! {
    pub static ref TASK_MANAGER: UPSafeCell<TaskManager> = unsafe {
        UPSafeCell::new(TaskManager::new())
    };
}

pub fn add_task(task: Arc<TaskControlBlock>) {
    TASK_MANAGER.exclusive_access().add(task);
}

pub fn fetch_task() -> Option<Arc<TaskControlBlock>> {
    TASK_MANAGER.exclusive_access().fetch()
}

And cpu dispatch for newly introduced Processor.
We introduce a idle process that used to call other process.

Why not direct call next by previous one? rather use idle process?
Separate idle process for start and others for its own, then dispatch data won’t occur in other process and make the dispatch process invisible for Trap for each process.

The whole workflow would be:

idle process fetch task and switch
task run out of its time or finish
task switch to idle process
repeat…

pub struct Processor {
    current: Option<Arc<TaskControlBlock>>,
	// idle process of current cpu
    idle_task_cx: TaskContext,
}

lazy_static! {
    pub static ref PROCESSOR: UPSafeCell<Processor> = unsafe {
        UPSafeCell::new(Processor::new())
    };
}

// run_tasks() 
// loop to fetch task and switch possible task
if let Some(task) = fetch_task() {
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    let mut task_inner = task.inner_exclusive_access();
    let next_task_cx_ptr = &task_inner.task_cx as *const TaskContext;
    task_inner.task_status = TaskStatus::Running;
    drop(task_inner);
    processor.current = Some(task);
    drop(processor);
    unsafe {
        __switch(
            idle_task_cx_ptr,
            next_task_cx_ptr,
        );
    }
}

// switch to idle process if one task run out of its time.
pub fn schedule(switched_task_cx_ptr: *mut TaskContext) {
    let mut processor = PROCESSOR.exclusive_access();
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    drop(processor);
    unsafe {
        __switch(
            switched_task_cx_ptr,
            idle_task_cx_ptr,
        );
    }
}

Dispatch Construction

Previously, we use suspend_current_and_run_next to pause task and switch to next, now we need to adapt it to process design.

// os/src/task/mod.rs

pub fn suspend_current_and_run_next() {
    let task = take_current_task().unwrap();

    // ---- access current TCB exclusively
    let mut task_inner = task.inner_exclusive_access();
    let task_cx_ptr = &mut task_inner.task_cx as *mut TaskContext;
    task_inner.task_status = TaskStatus::Ready;
    drop(task_inner);
    // ---- stop exclusively accessing current PCB

    // add to task deque
    add_task(task);
    // change current to idle process
    schedule(task_cx_ptr);
}

In previous case, task won’t be created by its parent task, but process will. So, if its TrapContext has been recycled, we need to refactor our trap_handler for such case.

// fn trap_handler() -> !
Trap::Exception(Exception::UserEnvCall) => {
    // jump to next instruction anyway
    let mut cx = current_trap_cx();
    cx.sepc += 4;
    // syscall may create new process and change trap context.
    let result = syscall(cx.x[17], [cx.x[10], cx.x[11], cx.x[12]]);
    // wether cx is changed or not, we will refetch it.
    cx = current_trap_cx();
    cx.x[10] = result as usize;
}

Now we will construct fork, exec, waitpid,exit.

Fork

We need to copy all memory layout and its task state. Then reallocate new kernel stack for it.

// impl MemorySet
pub fn from_existed_user(user_space: &MemorySet) -> MemorySet {
    let mut memory_set = Self::new_bare();
    // map trampoline
    memory_set.map_trampoline();
    // copy data sections/trap_context/user_stack
    for area in user_space.areas.iter() {
        let new_area = MapArea::from_another(area);
        memory_set.push(new_area, None);
        // copy data from another space
        for vpn in area.vpn_range {
            let src_ppn = user_space.translate(vpn).unwrap().ppn();
            let dst_ppn = memory_set.translate(vpn).unwrap().ppn();
            dst_ppn.get_bytes_array().copy_from_slice(src_ppn.get_bytes_array());
        }
    }
    memory_set
}

// impl TaskControlBlock
// fn fork
let trap_cx_ppn = memory_set
    .translate(VirtAddr::from(TRAP_CONTEXT).into())
    .unwrap()
    .ppn();
// alloc a pid and a kernel stack in kernel space
let pid_handle = pid_alloc();
let kernel_stack = KernelStack::new(&pid_handle);
let kernel_stack_top = kernel_stack.get_top();
let task_control_block = Arc::new(TaskControlBlock {
    pid: pid_handle,
    kernel_stack,
    inner: unsafe { UPSafeCell::new(TaskControlBlockInner {
        trap_cx_ppn,
        base_size: parent_inner.base_size,
        task_cx: TaskContext::goto_trap_return(kernel_stack_top),
        task_status: TaskStatus::Ready,
        memory_set,
        parent: Some(Arc::downgrade(self)),
        children: Vec::new(),
        exit_code: 0,
    })},
});
// add child
parent_inner.children.push(task_control_block.clone());
// modify kernel_sp in trap_cx
// **** access children PCB exclusively
let trap_cx = task_control_block.inner_exclusive_access().get_trap_cx();
trap_cx.kernel_sp = kernel_stack_top;

Finally, implement sys_fork

pub fn sys_fork() -> isize {
    let current_task = current_task().unwrap();
    let new_task = current_task.fork();
    let new_pid = new_task.pid.0;
    let trap_cx = new_task.inner_exclusive_access().get_trap_cx();

    // for child process, fork returns 0
    trap_cx.x[10] = 0;  //x[10] is a0 reg

    add_task(new_task);
    new_pid as isize
}

We can see that if trap_handler call sys_fork, parent process x[10] would be new_pid as return value.

Exec

If we want to execute a task by its name, we need to first load string in app load.

// os/build.rs

writeln!(f, r#"
.global _app_names
_app_names:"#)?;
for app in apps.iter() {
    writeln!(f, r#"    .string "{}""#, app)?;
}

# link_app.S

...
_app_names:
    .string "exit"
    .string "fantastic_text"
...

Construct APP_NAMES as global state in OS.

// os/src/loader.rs

// APP_NAMES: Vec<&'static str> {...
for _ in 0..num_app {
    let mut end = start;
    while end.read_volatile() != '\0' as u8 {
        end = end.add(1);
    }
    let slice = core::slice::from_raw_parts(start, end as usize - start as usize);
    let str = core::str::from_utf8(slice).unwrap();
    v.push(str);
    start = end.add(1);
}

When execute a new binary file, we need to read it and extract all state to replace original one.

// os/src/task/task.rs

impl TaskControlBlock {
    pub fn exec(&self, elf_data: &[u8]) {
        let (memory_set, user_sp, entry_point) = MemorySet::from_elf(elf_data);
        let trap_cx_ppn = memory_set
            .translate(VirtAddr::from(TRAP_CONTEXT).into())
            .unwrap()
            .ppn();

        // **** access inner exclusively
        let mut inner = self.inner_exclusive_access();
        inner.memory_set = memory_set;
        inner.trap_cx_ppn = trap_cx_ppn;

        let trap_cx = inner.get_trap_cx();
        *trap_cx = TrapContext::app_init_context(
            entry_point,
            user_sp,
            KERNEL_SPACE.exclusive_access().token(),
            self.kernel_stack.get_top(),
            trap_handler as usize,
        );
        // **** stop exclusively accessing inner automatically
    }
}

We will read input str as a ptr and replace current task state.

// os/src/syscall/process.rs

pub fn sys_exec(path: *const u8) -> isize {
    let token = current_user_token();
    let path = translated_str(token, path);
    if let Some(data) = get_app_data_by_name(path.as_str()) {
        let task = current_task().unwrap();
        task.exec(data);
        0
    } else {
        -1
    }
}

Exit

When a task exit, it will return a exit code assigned by app if successfully or kernel if anomaly.

// fn exit_current_and_run_next(exit_code:i32)
    inner.task_status = TaskStatus::Zombie;
    inner.exit_code = exit_code;

    // move all its children to the initial process
    // ++++++ access initproc TCB exclusively
    {
        let mut initproc_inner = INITPROC.inner_exclusive_access();
        for child in inner.children.iter() {
            child.inner_exclusive_access().parent = Some(Arc::downgrade(&INITPROC));
            initproc_inner.children.push(child.clone());
        }
    }
    // ++++++ stop exclusively accessing parent PCB

    // clear all memory.
    inner.children.clear();
    inner.memory_set.recycle_data_pages();
    drop(inner);
    // **** stop exclusively accessing current PCB
    // drop task manually to maintain rc correctly
    drop(task);
    // use _unused replace original context, which will be recycled by rust.
    let mut _unused = TaskContext::zero_init();
    schedule(&mut _unused as *mut _);

// os/src/syscall/process.rs

pub fn sys_exit(exit_code: i32) -> ! {
    exit_current_and_run_next(exit_code);
    panic!("Unreachable in sys_exit!");
}

WaitPid

WaitPid will return -1 if there’s no specified pid process exist, if it’s running, return -2, finally, if it finished, recycle it and return 0.

// sys_waitpid(pid:uisze, exit_code_ptr:*mut i32) -> isize

// -1 case
// search task manager and find (task_block)
|p| {pid == -1 || pid as usize == p.getpid()}
return -1;

let pair = // search task managers and find (idx,task_block)
p.inner_exclusive_access().is_zombie() && (pid == -1 || pid as usize == p.getpid())

if let Some((idx,_)) = pair {
    let child = inner.children.remove(idx);
    // confirm that child will be deallocated after removing from children list
    assert_eq!(Arc::strong_count(&child), 1);
    let found_pid = child.getpid();
    // ++++ temporarily access child TCB exclusively
    let exit_code = child.inner_exclusive_access().exit_code;
    // ++++ stop exclusively accessing child PCB
    *translated_refmut(inner.memory_set.token(), exit_code_ptr) = exit_code;
    found_pid as isize
} else {
    // pid process is running
    -2
}

// user/src/lib.rs

pub fn wait(exit_code: &mut i32) -> isize {
    loop {
        match sys_waitpid(-1, exit_code as *mut _) {
            -2 => { yield_(); }
            // -1 or a real pid
            exit_pid => return exit_pid,
        }
    }
}

rcore-handnote-6-1

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 6-1

Introduction

The demand of persistent storage is with growth of computer. Currently we can only store things on our map memory, but it’s only exist in running time, which is Internal Storage, now we want to store it to External Storage.

Concept

Regular File

cd os/src/
stat main.rs

File: main.rs
Size: 940           Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d  Inode: 4975        Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/   oslab)   Gid: ( 1000/   oslab)
Access: 2021-02-28 23:32:50.289925450 +0800
Modify: 2021-02-28 23:32:50.133927136 +0800
Change: 2021-02-28 23:32:50.133927136 +0800
Birth: -

Beside usual info, if one file is not regular, it’s usually a block device file or character device file, whose major/minor ID will be shown.

Links: alias name for one file
Inode: the underneath id used to route
Uid: the file belonged user id.
Gid: the file begloned group id.
Blocks: take amount of blocks(in linux, it’s 4096KB).

Dir

stat os
File: os
Size: 4096          Blocks: 8          IO Block: 4096   directory
Device: 801h/2049d  Inode: 4982        Links: 5
Access: (0755/drwxr-xr-x)  Uid: ( 1000/   oslab)   Gid: ( 1000/   oslab)
Access: 2021-02-28 23:32:50.133927136 +0800
Modify: 2021-02-28 23:32:50.129927180 +0800
Change: 2021-02-28 23:32:50.129927180 +0800
Birth: -

Access:
- d: dir
- r: allowed to read files and subdir
- w: allowed to create and delete files and subdir
- x: allowed to “pass” this dir.

File System

Play the role of mapping the given dir tree structure to persistent storage. For example: windows-FAT/NTPS; linux-Ext3/Ext4/Btrfs. Therefore, construct a VFS-Virtual File System is necessary to restrict unified interface.

Design

Simplification

flatten: only root dir /
permission: only user and no restriction on file access
no timestamp
no soft/hard link
many…

Cache

For a persistent external storage, it will separate file in basic storage unit. Which is called sector(usually 512 bytes, 4KB), rather, file system will set its own storage unit which is called block, usually different from sector, but in this implementation, we set it as 512 bytes, same as sector.

A basic interface from device is:

// easy-fs/src/block_dev.rs

pub trait BlockDevice : Send + Sync + Any {
    fn read_block(&self, block_id: usize, buf: &mut [u8]);
    fn write_block(&self, block_id: usize, buf: &[u8]);
}

File will not read and write directly often which will slow down speed, we will construct Block Cache to store read data. Then we unify all block cache in to a manager with limit size and used for allocation and deallocation.

// easy-fs/src/lib.rs

pub const BLOCK_SZ: usize = 512;

// easy-fs/src/block_cache.rs

pub struct BlockCache {
    cache: [u8; BLOCK_SZ],
    block_id: usize,
    block_device: Arc<dyn BlockDevice>,
    modified: bool,
}

impl BlockCache {
    pub fn read<T, V>(&self, offset: usize, f: impl FnOnce(&T) -> V) -> V {
        f(self.get_ref(offset))
    }

    pub fn modify<T, V>(&mut self, offset:usize, f: impl FnOnce(&mut T) -> V) -> V {
        f(self.get_mut(offset))
    }
}

// easy-fs/src/block_cache.rs

const BLOCK_CACHE_SIZE: usize = 16;

use alloc::collections::VecDeque;

pub struct BlockCacheManager {
    queue: VecDeque<(usize, Arc<Mutex<BlockCache>>)>,
}

impl BlockCacheManager {
    pub fn new() -> Self {
        Self { queue: VecDeque::new() }
    }
}

// impl BlockCacheManager
// fn get_block_cache(&mut self, block_id:usize, block_device: Arc<dyn BlockDevice>)
// if reach limit size:
if self.queue.len() == BLOCK_CACHE_SIZE {
    // from front to tail
    if let Some((idx, _)) = self.queue
        .iter()
        .enumerate()
        .find(|(_, pair)| Arc::strong_count(&pair.1) == 1) {
        self.queue.drain(idx..=idx);
    } else {
        panic!("Run out of BlockCache!");
    }
}
// load block into mem and push back
let block_cache = Arc::new(Mutex::new(
    BlockCache::new(block_id, Arc::clone(&block_device))
));
self.queue.push_back((block_id, Arc::clone(&block_cache)));
block_cache

Block Layout

We will design a whole map structure to control block caches.

First is Super Block which is a head to control everything, notice magic is magic number in mathematics to check the integrity of structure.

// easy-fs/src/layout.rs

#[repr(C)]
pub struct SuperBlock {
    magic: u32,
    pub total_blocks: u32,
    pub inode_bitmap_blocks: u32,
    pub inode_area_blocks: u32,
    pub data_bitmap_blocks: u32,
    pub data_area_blocks: u32,
}

pub fn is_valid(&self) -> bool {
    self.magic == EFS_MAGIC
}

Bit Map is a nice structure to handle mapping operations, we set each block as 512 bytes(4KB), each bits represent a state of allocation(1/0 for allocated/deallocated).

// easy-fs/src/bitmap.rs

// notice it only store the id of start block and the len of blocks in its range.
pub struct Bitmap {
    start_block_id: usize,
    blocks: usize,
}

// the true structure to map
type BitmapBlock = [u64; 64];

// equal: 4096 bits
const BLOCK_BITS: usize = BLOCK_SZ * 8;

impl Bitmap {
    pub fn alloc(&self, block_device: &Arc<dyn BlockDevice>) -> Option<usize> {
        for block_id in 0..self.blocks {
            let pos = get_block_cache(
                block_id + self.start_block_id as usize,
                Arc::clone(block_device),
            )
            .lock()
            .modify(0, |bitmap_block: &mut BitmapBlock| {
                // core!
                if let Some((bits64_pos, inner_pos)) = bitmap_block
                    .iter()
                    .enumerate()
                    .find(|(_, bits64)| **bits64 != u64::MAX)
                    .map(|(bits64_pos, bits64)| {
                        (bits64_pos, bits64.trailing_ones() as usize)
                    }) {
                    // modify cache
                    bitmap_block[bits64_pos] |= 1u64 << inner_pos;
                    Some(block_id * BLOCK_BITS + bits64_pos * 64 + inner_pos as usize)
                } else {
                    None
                }
            });
            if pos.is_some() {
                return pos;
            }
        }
        None
    }
}

Based on such structure, we could exposit what is Inode and Data Block, not all block will store real data because some of them need to be used as guidance. However, we also need to know where and how these route blocks be allocated. That’s the reason of Bit Map! Now we delve into Inode.

To make one inode control many data blocks, we will design layer of route for it. Beside direct index, it also store the index of layer 1 and layer 2 to route other index block(which is considered same as data block), and route to real data block. Notice one block contains 512 bytes, which is 512 u8, so it contains 512/4 = 128 u32, so one index block can route 128 * 512 bytes = 128 * 0.5 KB = 64 KB in one layer. In second layer, it can route as much as 128 * 64 KB = 64 MB.

// easy-fs/src/layout.rs

const INODE_DIRECT_COUNT: usize = 28;
// 128
const INODE_INDIRECT1_COUNT: usize = BLOCK_SZ / 4;

#[repr(C)]
pub struct DiskInode {
    pub size: u32,
    pub direct: [u32; INODE_DIRECT_COUNT],
    pub indirect1: u32,
    pub indirect2: u32,
    type_: DiskInodeType,
}

#[derive(PartialEq)]
pub enum DiskInodeType {
    File,
    Directory,
}

Such Inode take 128 bytes, so in one block, it could contains 4 inodes. We should make a data structure could be fit exactly into a block size. Now we design route method.

// 28 + 128
const INDIRECT1_BOUND: usize = DIRECT_BOUND + INODE_INDIRECT1_COUNT;
type IndirectBlock = [u32; BLOCK_SZ / 4];

impl DiskInode {
    pub fn get_block_id(&self, inner_id: u32, block_device: &Arc<dyn BlockDevice>) -> u32 {
        let inner_id = inner_id as usize;
        if inner_id < INODE_DIRECT_COUNT {
            self.direct[inner_id]
        } else if inner_id < INDIRECT1_BOUND {
            get_block_cache(self.indirect1 as usize, Arc::clone(block_device))
                .lock()
                .read(0, |indirect_block: &IndirectBlock| {
                    indirect_block[inner_id - INODE_DIRECT_COUNT]
                })
        } else {
            let last = inner_id - INDIRECT1_BOUND;
            let indirect1 = get_block_cache(
                self.indirect2 as usize,
                Arc::clone(block_device)
            )
            .lock()
            .read(0, |indirect2: &IndirectBlock| {
                indirect2[last / INODE_INDIRECT1_COUNT]
            });
            get_block_cache(
                indirect1 as usize,
                Arc::clone(block_device)
            )
            .lock()
            .read(0, |indirect1: &IndirectBlock| {
                indirect1[last % INODE_INDIRECT1_COUNT]
            })
        }
    }
}

Now we design Data block, which is simple. Because for file system, any data are just bytes.

// easy-fs/src/layout.rs

// BLOCK_SZ = 512
type DataBlock = [u8; BLOCK_SZ];

// impl DiskInode
    // pub fn read_at(
    //     &self,
    //     offset: usize,
    //     buf: &mut [u8],
    //     block_device: &Arc<dyn BlockDevice>,
loop {
    // calculate end of current block
    let mut end_current_block = (start / BLOCK_SZ + 1) * BLOCK_SZ;
    end_current_block = end_current_block.min(end);
    // read and update read size
    let block_read_size = end_current_block - start;
    let dst = &mut buf[read_size..read_size + block_read_size];
    get_block_cache(
        self.get_block_id(start_block as u32, block_device) as usize,
        Arc::clone(block_device),
    )
    .lock()
    .read(0, |data_block: &DataBlock| {
        let src = &data_block[start % BLOCK_SZ..start % BLOCK_SZ + block_read_size];
        dst.copy_from_slice(src);
    });
    read_size += block_read_size;
    // move to next block
    if end_current_block == end { break; }
    start_block += 1;
    start = end_current_block;
}

File System

Due to our consecutive layout, we will store bitmap and start block, then initiate a unified system to control allocation and route. We call it File System.

// easy-fs/src/efs.rs

pub struct EasyFileSystem {
    pub block_device: Arc<dyn BlockDevice>,
    pub inode_bitmap: Bitmap,
    pub data_bitmap: Bitmap,
    inode_area_start_block: u32,
    data_area_start_block: u32,
}

impl EasyFileSystem {
    pub fn create(
        block_device: Arc<dyn BlockDevice>,
        total_blocks: u32,
        inode_bitmap_blocks: u32,
    ) -> Arc<Mutex<Self>> {
        // calculate block size of areas & create bitmaps
        let inode_bitmap = Bitmap::new(1, inode_bitmap_blocks as usize);
        let inode_num = inode_bitmap.maximum();
        let inode_area_blocks =
            ((inode_num * core::mem::size_of::<DiskInode>() + BLOCK_SZ - 1) / BLOCK_SZ) as u32;
        let inode_total_blocks = inode_bitmap_blocks + inode_area_blocks;
        let data_total_blocks = total_blocks - 1 - inode_total_blocks;
        let data_bitmap_blocks = (data_total_blocks + 4096) / 4097;
        let data_area_blocks = data_total_blocks - data_bitmap_blocks;
        let data_bitmap = Bitmap::new(
            (1 + inode_bitmap_blocks + inode_area_blocks) as usize,
            data_bitmap_blocks as usize,
        );
...

Use Bit Map, we finally know which is Inode and Data.

// easy-fs/src/efs.rs

impl EasyFileSystem {
    pub fn get_disk_inode_pos(&self, inode_id: u32) -> (u32, usize) {
        let inode_size = core::mem::size_of::<DiskInode>();
        let inodes_per_block = (BLOCK_SZ / inode_size) as u32;
        let block_id = self.inode_area_start_block + inode_id / inodes_per_block;
        (block_id, (inode_id % inodes_per_block) as usize * inode_size)
    }

    pub fn get_data_block_id(&self, data_block_id: u32) -> u32 {
        self.data_area_start_block + data_block_id
    }
}

Our Disk Inode is aims for underneath system, not for user, so we need a real Inode as a interface for Disk Inode to route, which store its id and offset.

// easy-fs/src/vfs.rs

pub struct Inode {
    block_id: usize,
    block_offset: usize,
    fs: Arc<Mutex<EasyFileSystem>>,
    block_device: Arc<dyn BlockDevice>,
}

// easy-fs/src/vfs.rs

impl Inode {
    fn read_disk_inode<V>(&self, f: impl FnOnce(&DiskInode) -> V) -> V {
        get_block_cache(
            self.block_id,
            Arc::clone(&self.block_device)
        ).lock().read(self.block_offset, f)
    }

    fn modify_disk_inode<V>(&self, f: impl FnOnce(&mut DiskInode) -> V) -> V {
        get_block_cache(
            self.block_id,
            Arc::clone(&self.block_device)
        ).lock().modify(self.block_offset, f)
    }
}

All methods exposed to user will be root inode.

// easy-fs/src/efs.rs

impl EasyFileSystem {
    pub fn root_inode(efs: &Arc<Mutex<Self>>) -> Inode {
        let block_device = Arc::clone(&efs.lock().block_device);
        // acquire efs lock temporarily
        let (block_id, block_offset) = efs.lock().get_disk_inode_pos(0);
        // release efs lock
        Inode::new(
            block_id,
            block_offset,
            Arc::clone(efs),
            block_device,
        )
    }
}

However, we still need one special data block which is DirEntry, as directory which store inode_number to route inode, DirEntry takes 32 bytes, so each block can store 4 DirEntry. Thus we can route to inode by &str.

Notice disk_inode contains type for dir and file. So some of inodes will store dir data and some of inodes will store file data, we can get inode of data by inode of dir.

// easy-fs/src/layout.rs

const NAME_LENGTH_LIMIT: usize = 27;

#[repr(C)]
pub struct DirEntry {
    name: [u8; NAME_LENGTH_LIMIT + 1],
    inode_number: u32,
}

pub const DIRENT_SZ: usize = 32;

First, we will

// easy-fs/src/vfs.rs

impl Inode {
    pub fn find(&self, name: &str) -> Option<Arc<Inode>> {
        let fs = self.fs.lock();
        self.read_disk_inode(|disk_inode| {
            self.find_inode_id(name, disk_inode)
            .map(|inode_id| {
                let (block_id, block_offset) = fs.get_disk_inode_pos(inode_id);
                Arc::new(Self::new(
                    block_id,
                    block_offset,
                    self.fs.clone(),
                    self.block_device.clone(),
                ))
            })
        })
    }

    fn find_inode_id(
        &self,
        name: &str,
        disk_inode: &DiskInode,
    ) -> Option<u32> {
        // assert it is a directory
        assert!(disk_inode.is_dir());
        let file_count = (disk_inode.size as usize) / DIRENT_SZ;
        let mut dirent = DirEntry::empty();
        for i in 0..file_count {
            // note assert_eq! has side effect: read data to dirent.
            assert_eq!(
                disk_inode.read_at(
                    DIRENT_SZ * i,
                    dirent.as_bytes_mut(),
                    &self.block_device,
                ),
                DIRENT_SZ,
            );
            if dirent.name() == name {
                return Some(dirent.inode_number() as u32);
            }
        }
        None
    }
}

Usually, the workflow of create or delete, read or write would be:

read/write
- get root inode which is dir type
- read/write closure of disk inode through root inode
- resize specified inode
create/clear
- allocation/deallocation: alloc/dealloc inode by bitmap and get its index
- initialization/clear by get its block cache by its index
- resize root inode

rcore-handnote-6-2

Posted on 2025-04-27 Edited on 2025-12-04 In handnote

Chapter 6-2

Introduction

We gonna load our easy-fs to kernel. First, we need to know our device interface. Second, our Inode should be wrapped in OS as OSInode for extended functionality. Then we implement sys_read/write for it.

MMIO-Memory-Mapped I/O

The device registers of peripherals can be accessed through specific physical memory addresses, VirtIO MMIO physical address range for the peripheral bus is 4KiB starting at 0X10001000.

// os/src/config.rs

#[cfg(feature = "board_qemu")]
pub const MMIO: &[(usize, usize)] = &[
    (0x10001000, 0x1000),
];

OS Inode

We only take restriction on our operations with readable and writable by OpenFlags. offset and Arc is a way to tackle simple situation of multi processes.

// os/src/fs/inode.rs

bitflags! {
    pub struct OpenFlags: u32 {
        const RDONLY = 0;
        const WRONLY = 1 << 0;
        const RDWR = 1 << 1;
        const CREATE = 1 << 9;
        const TRUNC = 1 << 10;
    }
}

impl OpenFlags {
    /// Do not check validity for simplicity
    /// Return (readable, writable)
    pub fn read_write(&self) -> (bool, bool) {
        if self.is_empty() {
            (true, false)
        } else if self.contains(Self::WRONLY) {
            (false, true)
        } else {
            (true, true)
        }
    }
}

pub struct OSInode {
    readable: bool,
    writable: bool,
    inner: Mutex<OSInodeInner>,
}

pub struct OSInodeInner {
    offset: usize,
    inode: Arc<Inode>,
}

impl OSInode {
    pub fn new(
        readable: bool,
        writable: bool,
        inode: Arc<Inode>,
    ) -> Self {
        Self {
            readable,
            writable,
            inner: Mutex::new(OSInodeInner {
                offset: 0,
                inode,
            }),
        }
    }
}

File Descriptor Table

Now we need to connect file operations with process, each process need a descriptors table(which manage many files!) to indicate file record infos.

// os/src/task/task.rs

pub struct TaskControlBlockInner {
	// ...
    pub fd_table: Vec<Option<Arc<dyn File + Send + Sync>>>,
}

Remember our previous root inode? We load it directly for easy manipulation. So our workflow would be for current process, push a descriptor of allocation, and return the ptr of this allocation.

// os/src/fs/inode.rs

lazy_static! {
    pub static ref ROOT_INODE: Arc<Inode> = {
        let efs = EasyFileSystem::open(BLOCK_DEVICE.clone());
        Arc::new(EasyFileSystem::root_inode(&efs))
    };
}

pub fn open_file(name: &str, flags: OpenFlags) -> Option<Arc<OSInode>> {
    let (readable, writable) = flags.read_write();
    if flags.contains(OpenFlags::CREATE) {
        if let Some(inode) = ROOT_INODE.find(name) {
            // clear size
            inode.clear();
            Some(Arc::new(OSInode::new(
                readable,
                writable,
                inode,
            )))
        } else {
            // create file
            ROOT_INODE.create(name)
                .map(|inode| {
                    Arc::new(OSInode::new(
                        readable,
                        writable,
                        inode,
                    ))
                })
        }
    } else {
        ROOT_INODE.find(name)
            .map(|inode| {
                if flags.contains(OpenFlags::TRUNC) {
                    inode.clear();
                }
                Arc::new(OSInode::new(
                    readable,
                    writable,
                    inode
                ))
            })
    }
}

// os/src/syscall/fs.rs

pub fn sys_open(path: *const u8, flags: u32) -> isize {
    let task = current_task().unwrap();
    let token = current_user_token();
    let path = translated_str(token, path);
    if let Some(inode) = open_file(
        path.as_str(),
        OpenFlags::from_bits(flags).unwrap()
    ) {
        let mut inner = task.acquire_inner_lock();
        let fd = inner.alloc_fd();
        inner.fd_table[fd] = Some(inode);
        fd as isize
    } else {
        -1
    }
}

// os/src/syscall/fs.rs

pub fn sys_close(fd: usize) -> isize {
    let task = current_task().unwrap();
    let mut inner = task.inner_exclusive_access();
    if fd >= inner.fd_table.len() {
        return -1;
    }
    if inner.fd_table[fd].is_none() {
        return -1;
    }
    inner.fd_table[fd].take();
    0
}

The implementation is same for sys_read/write.