The execution environment is defined by the Target Triplet, which specifies the platform, CPU architecture, and library required for the build. For example: x86_64-unknown-linux-gnu.
Components of the Target Triplet:
Platform: The specific operating system or runtime environment.
CPU Architecture: The underlying hardware architecture (e.g., x86_64, ARM).
Library: The standard library or runtime support required.
If the target platform provides no std and no syscall support, it is called bare-metal. Rust provides the core library, which is independent of any platform support.
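For instance, a crate can opt out of std entirely and rely only on core; a minimal sketch (the entry point and target configuration are omitted here):

```rust
// A minimal sketch of a bare-metal crate: no std, no OS entry point.
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// core leaves panic behavior up to us, so we must define it ourselves.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```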
extern"C" { fn__restore(cx_addr: usize); } unsafe { __restore(KERNEL_STACK.push_context( TrapContext::app_init_context(APP_BASE_ADDRESS, USER_STACK.get_sp()) // This context store the ptr to UserStack for restoration ) as *const _ asusize); }
Construct User App
Link the app binaries into the kernel with a specified memory layout.
Read the layout; use AppManager to maintain and store it.
Load each app from the memory layout by copying it consecutively to APP_BASE_ADDRESS (currently we have no ability to load at dynamically determined addresses).
```rust
// part of the static init of AppManager
let num_app_ptr = _num_app as usize as *const usize;
let num_app = num_app_ptr.read_volatile();
let mut app_start: [usize; MAX_APP_NUM + 1] = [0; MAX_APP_NUM + 1];
let app_start_raw: &[usize] =
    core::slice::from_raw_parts(num_app_ptr.add(1), num_app + 1);
app_start[..=num_app].copy_from_slice(app_start_raw);
```
Load App:
```rust
// part of the code that copies an app into the kernel
asm!("fence.i");
// clear app area
core::slice::from_raw_parts_mut(APP_BASE_ADDRESS as *mut u8, APP_SIZE_LIMIT).fill(0);
let app_src = core::slice::from_raw_parts(
    self.app_start[app_id] as *const u8,
    self.app_start[app_id + 1] - self.app_start[app_id],
);
let app_dst = core::slice::from_raw_parts_mut(APP_BASE_ADDRESS as *mut u8, app_src.len());
app_dst.copy_from_slice(app_src);
```
```rust
pub fn run_next_app() -> ! {
    let mut app_manager = APP_MANAGER.exclusive_access();
    let current_app = app_manager.get_current_app();
    unsafe {
        app_manager.load_app(current_app);
    }
    app_manager.move_to_next_app();
    // before this we have to drop local variables related to resources manually
    // and release the resources
    drop(app_manager);
    extern "C" {
        fn __restore(cx_addr: usize);
    }
    unsafe {
        __restore(KERNEL_STACK.push_context(TrapContext::app_init_context(
            APP_BASE_ADDRESS,
            USER_STACK.get_sp(),
        )) as *const _ as usize);
    }
    panic!("Unreachable in batch::run_current_app!");
}
```
```asm
.altmacro
.macro SAVE_SN n
    sd s\n, (\n+2)*8(a0)
.endm
.macro LOAD_SN n
    ld s\n, (\n+2)*8(a1)
.endm
    .section .text
    .globl __switch
__switch:
    # ... store the previous task's context, then load the next one
    # into the current registers
```
Store an array of per-app task states and the index of the currently running app.
Each state stores a TaskContext and a TaskStatus (Ready, Running, Exited, etc.).
Initialize each task as ready by storing the __restore context into its TaskContext.
At run time, switch contexts whenever needed.
```rust
let current_task_cx_ptr = &mut inner.tasks[current].task_cx as *mut TaskContext;
let next_task_cx_ptr = &inner.tasks[next].task_cx as *const TaskContext;
// before this, we should drop local variables that must be dropped manually
drop(inner);
unsafe {
    __switch(current_task_cx_ptr, next_task_cx_ptr);
}
```
We don’t want a fixed physical address for allocation; rather, we want a unified abstract interface behind which each app's memory layout can be arranged dynamically. We call it the Virtual Address.
Safety: every app is allocated its own virtual memory space, so no app can interfere with another.
Efficiency: apps can coexist in memory at the same time, with no need to swap through outer peripherals when switching apps (made practical by the growth of memory sizes).
We need the MMU (Memory Management Unit) to perform Address Translation from virtual to physical addresses.
Different designs have emerged.
Design
Segment Design
Each app lives in one fixed slot, a segment $[0, \text{bound})$, with a linear map applied by the MMU.
Problem: Wasteful and inflexible
We may want a different linear map for each part of an app, e.g., for its heap, data, and code, so that memory can be dispatched at a finer granularity. But this does not solve the problem: even though slots are now dynamically allocated, some free regions may remain that are too small to reuse. This causes External Fragmentation, as opposed to the Internal Fragmentation caused by fixed-size slots.
Paging Design
We can attach protection bits to each mapping: r for read, w for write, x for execute.
Another way is to invert our thinking: rather than taking a slot in memory, we take a slot in the MMU, which maps each slot (now called a Page) onto the real physical memory layout. To match that layout, physical memory is sliced into Frames, the units that pages map onto; each app records the mapping of its page numbers in a Page Table.
Page Design
SV39 uses only the lower 39 bits of an address rather than all 64, even though the register width is 64 bits (it is still a fairly large amount!).
Within an address, bits $[11:0]$ are called the Page Offset, and bits $[38:12]$ are the VPN, which locates the page; the offset then points inside that page (4 KiB per page).
We modify satp to turn on paging for S-level and U-level memory accesses.
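A sketch of how the satp value is composed under SV39 (the field layout is from the RISC-V privileged spec; the helper name is illustrative):

```rust
// satp (RV64) layout: MODE [63:60] | ASID [59:44] | PPN [43:0]
// MODE = 8 selects SV39 paging; PPN holds the root page table's
// physical page number.
fn satp_sv39(root_ppn: usize) -> usize {
    (8usize << 60) | root_ppn
}
```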
Usually, a simple design for the page manager would be a linear map starting from a base address and going up. But that actually costs a huge amount of memory, because it stores an entry for every offset from the base address for each app.
A finer design comes from the trie. We split the VPN into 9-bit index slices (each slice selects among 512 entries, and 512 entries × 8 bytes fills a 4 KiB page exactly!), and link the slices up to form a tree, usually with a 3-level page index. 27 bits in total.
A virtual address thus reserves 27 bits for the index slices and 12 bits for the offset within a page. 39 bits in total.
To translate a virtual address into a physical one, we walk this decomposition in reverse: each 9-bit slice indexes one level of the page table until the leaf yields the physical page number, to which the 12-bit offset is appended.
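As a sketch of the decomposition itself (a hypothetical helper, not the tutorial's API):

```rust
// Split an SV39 virtual address into its three 9-bit page indices
// and the 12-bit in-page offset.
fn decompose(va: usize) -> ([usize; 3], usize) {
    let offset = va & 0xfff; // bits [11:0]
    let vpn = va >> 12;      // bits [38:12]
    let idxs = [
        (vpn >> 18) & 0x1ff, // level-0 index (root)
        (vpn >> 9) & 0x1ff,  // level-1 index
        vpn & 0x1ff,         // level-2 index (leaf)
    ];
    (idxs, offset)
}
```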
Page Management Design
A simple design for page allocation (rather than management!) is stack-style.
Each allocated physical frame is represented by a FrameTracker, and a page table keeps a vector of FrameTrackers for the frames holding its nodes below the root.
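A minimal sketch of such a stack-style allocator, with illustrative names (the real allocator also validates deallocations):

```rust
// Stack-style physical frame allocator: bump a watermark forward,
// and reuse freed frames from a recycle stack first.
pub struct StackFrameAllocator {
    current: usize,       // next never-allocated frame number
    end: usize,           // one past the last usable frame number
    recycled: Vec<usize>, // frames freed and ready for reuse
}

impl StackFrameAllocator {
    pub fn alloc(&mut self) -> Option<usize> {
        if let Some(ppn) = self.recycled.pop() {
            Some(ppn)
        } else if self.current < self.end {
            self.current += 1;
            Some(self.current - 1)
        } else {
            None // physical memory exhausted
        }
    }

    pub fn dealloc(&mut self, ppn: usize) {
        self.recycled.push(ppn);
    }
}
```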
We then implement the translation from virtual to physical addresses.
```rust
// idxs holds the three index slices of the virtual page number
for i in 0..3 {
    let pte = &mut ppn.get_pte_array()[idxs[i]];
    if i == 2 {
        result = Some(pte);
        break;
    }
    if !pte.is_valid() {
        let frame = frame_alloc().unwrap(); // create, or return None
        *pte = PageTableEntry::new(frame.ppn, PTEFlags::V);
        self.frames.push(frame);
    }
    ppn = pte.ppn();
}
```
Page Map Design
Based on our abstraction, we design MapArea to provide a map from contiguous virtual addresses (no matter which pages they fall in!) to physical addresses.
```rust
impl MemorySet {
    pub fn activate(&self) {
        let satp = self.page_table.token();
        unsafe {
            satp::write(satp);
            // sfence.vma flushes the Translation Lookaside Buffer (TLB),
            // the cache of recent translations that makes address lookup
            // cheap.
            asm!("sfence.vma");
        }
    }
}
```
Activation therefore writes the physical page number of the current root page table into satp.
Note that the instructions around enabling SV39 must be identity-mapped, i.e., placed at equal virtual and physical addresses, to bridge the gap between how addresses translate before and after paging is switched on.
A problem: separating kernel and app address spaces also isolates the trap path. Trapping must carry information from app to kernel, but the app cannot see kernel memory. We therefore need a transfer mechanism, which we implement by storing the related information in the TrapContext.
(There is no spare register to hold this information without clobbering user state, since sscratch is already reserved for the kernel stack.)
```asm
# load kernel_satp into t0
ld t0, 34*8(sp)
# load trap_handler into t1
ld t1, 36*8(sp)
# move to kernel_sp
ld sp, 35*8(sp)
# switch to kernel space
csrw satp, t0
sfence.vma
# jump to trap_handler
jr t1
```
```asm
# __restore:
# a0: *TrapContext in user space (constant); a1: user space token
# switch to user space
csrw satp, a1
sfence.vma
csrw sscratch, a0
```
To fix the contiguous-instruction problem across the satp switch, we adjust the memory layout to add a trampoline: a page mapped at the same virtual location in both U-level and S-level space (unified across all apps for trapping!). It is placed at the highest virtual page.
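The placement can be expressed with constants along these lines (assuming 4 KiB pages; names follow the tutorial's config style):

```rust
pub const PAGE_SIZE: usize = 0x1000;
// the highest virtual page is reserved for the trampoline...
pub const TRAMPOLINE: usize = usize::MAX - PAGE_SIZE + 1;
// ...and the page just below it holds the TrapContext
pub const TRAP_CONTEXT: usize = TRAMPOLINE - PAGE_SIZE;
```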
Beyond running binaries one after another, we introduce the Process: a complete space-time description of the execution of a binary file in the OS. A process holds its own independent resources for execution.
After the process, Threads and Coroutines also appeared as the OS evolved. They differ in the resources they occupy: a thread usually lives inside one process and holds its own independent stack and flow of control; a coroutine lives inside one thread and holds only its own flow of control.
Design
Every process needs an independent memory layout and can be dispatched by the CPU; this functionality builds on the Task. Beyond that, each process can fork its own child processes, so workflows diverge in time. Because of its children, a process's resources cannot all be recycled the moment it exits, so we mark it as a Zombie Process.
To tell parent from child and keep processes isolated, we tag each with a PID (Process Identifier). Note that a forked child is identical to its parent except for a0, the return-value register: the parent receives the new child's PID, while the child receives 0 (see the sketch after the list below).
fork: copy a process's state (sp, etc.) into a new child process.
waitpid: wait for a child to become a zombie, then recycle all of its resources.
exec: clear a process's state and load an executable file.
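A sketch of how these combine in user code (assuming the tutorial's user-library wrappers fork, exec, and waitpid):

```rust
let pid = fork();
if pid == 0 {
    // child: fork returned 0; replace our image with a new program
    exec("hello_world\0");
} else {
    // parent: fork returned the child's PID; wait for it to become a zombie
    let mut exit_code: i32 = 0;
    waitpid(pid as usize, &mut exit_code);
}
```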
Data Construction
All freed PIDs are recycled by a PidAllocator, so we never need to worry about clashing with a previously used PID.
CPU dispatch is handled by the newly introduced Processor. We add an idle control flow whose job is to fetch and call other processes.
Why not have the previous process directly call the next one, rather than using an idle control flow?
Keeping the idle control flow separate means the dispatch data never lives inside any process's context, and dispatching stays invisible to each process's trap handling.
```rust
// run_tasks(): loop to fetch a task and switch to it when possible
if let Some(task) = fetch_task() {
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    let mut task_inner = task.inner_exclusive_access();
    let next_task_cx_ptr = &task_inner.task_cx as *const TaskContext;
    task_inner.task_status = TaskStatus::Running;
    drop(task_inner);
    processor.current = Some(task);
    drop(processor);
    unsafe {
        __switch(idle_task_cx_ptr, next_task_cx_ptr);
    }
}
```
```rust
// switch to the idle control flow when one task runs out of its time slice
pub fn schedule(switched_task_cx_ptr: *mut TaskContext) {
    let mut processor = PROCESSOR.exclusive_access();
    let idle_task_cx_ptr = processor.get_idle_task_cx_ptr();
    drop(processor);
    unsafe {
        __switch(switched_task_cx_ptr, idle_task_cx_ptr);
    }
}
```
Dispatch Construction
Previously, we used suspend_current_and_run_next to pause a task and switch to the next one; now we adapt it to the process design.
```rust
// os/src/task/mod.rs
pub fn suspend_current_and_run_next() {
    let task = take_current_task().unwrap();

    // ---- access current TCB exclusively
    let mut task_inner = task.inner_exclusive_access();
    let task_cx_ptr = &mut task_inner.task_cx as *mut TaskContext;
    task_inner.task_status = TaskStatus::Ready;
    drop(task_inner);
    // ---- stop exclusively accessing current TCB

    // add to the task deque
    add_task(task);
    // switch to the idle control flow
    schedule(task_cx_ptr);
}
```
Previously, a task was never created by another task, but a process can be created by its parent. Because a syscall such as fork or exec may replace the TrapContext, we need to refactor trap_handler for that case.
```rust
// fn trap_handler() -> !
Trap::Exception(Exception::UserEnvCall) => {
    // jump to the next instruction anyway
    let mut cx = current_trap_cx();
    cx.sepc += 4;
    // syscall may create a new process and change the trap context.
    let result = syscall(cx.x[17], [cx.x[10], cx.x[11], cx.x[12]]);
    // whether cx changed or not, we refetch it.
    cx = current_trap_cx();
    cx.x[10] = result as usize;
}
```
Now we construct fork, exec, waitpid, and exit.
Fork
We copy the entire memory layout and the task state, then allocate a new kernel stack for the child.
```rust
// impl MemorySet
pub fn from_existed_user(user_space: &MemorySet) -> MemorySet {
    let mut memory_set = Self::new_bare();
    // map trampoline
    memory_set.map_trampoline();
    // copy data sections / trap_context / user_stack
    for area in user_space.areas.iter() {
        let new_area = MapArea::from_another(area);
        memory_set.push(new_area, None);
        // copy data from the parent space
        for vpn in area.vpn_range {
            let src_ppn = user_space.translate(vpn).unwrap().ppn();
            let dst_ppn = memory_set.translate(vpn).unwrap().ppn();
            dst_ppn
                .get_bytes_array()
                .copy_from_slice(src_ppn.get_bytes_array());
        }
    }
    memory_set
}
```
```rust
// impl TaskControlBlock, fn fork
let trap_cx_ppn = memory_set
    .translate(VirtAddr::from(TRAP_CONTEXT).into())
    .unwrap()
    .ppn();
// alloc a pid and a kernel stack in kernel space
let pid_handle = pid_alloc();
let kernel_stack = KernelStack::new(&pid_handle);
let kernel_stack_top = kernel_stack.get_top();
let task_control_block = Arc::new(TaskControlBlock {
    pid: pid_handle,
    kernel_stack,
    inner: unsafe {
        UPSafeCell::new(TaskControlBlockInner {
            trap_cx_ppn,
            base_size: parent_inner.base_size,
            task_cx: TaskContext::goto_trap_return(kernel_stack_top),
            task_status: TaskStatus::Ready,
            memory_set,
            parent: Some(Arc::downgrade(self)),
            children: Vec::new(),
            exit_code: 0,
        })
    },
});
// add child
parent_inner.children.push(task_control_block.clone());
// modify kernel_sp in trap_cx
// **** access child PCB exclusively
let trap_cx = task_control_block.inner_exclusive_access().get_trap_cx();
trap_cx.kernel_sp = kernel_stack_top;
```
Finally, implement sys_fork
```rust
pub fn sys_fork() -> isize {
    let current_task = current_task().unwrap();
    let new_task = current_task.fork();
    let new_pid = new_task.pid.0;
    let trap_cx = new_task.inner_exclusive_access().get_trap_cx();
    // for the child process, fork returns 0 (x[10] is the a0 register)
    trap_cx.x[10] = 0;
    add_task(new_task);
    new_pid as isize
}
```
Note that when trap_handler calls sys_fork, the parent's x[10] receives new_pid as the return value.
Exec
To execute a task by its name, we first need the app names embedded as strings when the apps are loaded.
```rust
// os/build.rs
writeln!(f, r#"
    .global _app_names
_app_names:"#)?;
for app in apps.iter() {
    writeln!(f, r#"    .string "{}""#, app)?;
}
```
Exit
When a process exits, all of its children are moved to the initial process, and its resources are recycled:

```rust
// exit_current_and_run_next:
// move all children to the initial process
// ++++++ access initproc TCB exclusively
{
    let mut initproc_inner = INITPROC.inner_exclusive_access();
    for child in inner.children.iter() {
        child.inner_exclusive_access().parent = Some(Arc::downgrade(&INITPROC));
        initproc_inner.children.push(child.clone());
    }
}
// ++++++ stop exclusively accessing initproc TCB

// clear all memory
inner.children.clear();
inner.memory_set.recycle_data_pages();
drop(inner);
// **** stop exclusively accessing current PCB
// drop task manually to maintain rc correctly
drop(task);
// use _unused to replace the original context, which will be recycled by Rust
let mut _unused = TaskContext::zero_init();
schedule(&mut _unused as *mut _);
```
```rust
// os/src/syscall/process.rs
pub fn sys_exit(exit_code: i32) -> ! {
    exit_current_and_run_next(exit_code);
    panic!("Unreachable in sys_exit!");
}
```
WaitPid
waitpid returns -1 if no child process with the specified pid exists; -2 if the child is still running; and finally, if the child has finished, it recycles the child and returns the child's pid.
```rust
// search the children and find a zombie child matching pid
let pair = inner.children.iter().enumerate().find(|(_, p)| {
    p.inner_exclusive_access().is_zombie() && (pid == -1 || pid as usize == p.getpid())
});
if let Some((idx, _)) = pair {
    let child = inner.children.remove(idx);
    // confirm that the child will be deallocated after removal from the children list
    assert_eq!(Arc::strong_count(&child), 1);
    let found_pid = child.getpid();
    // ++++ temporarily access child TCB exclusively
    let exit_code = child.inner_exclusive_access().exit_code;
    // ++++ stop exclusively accessing child TCB
    *translated_refmut(inner.memory_set.token(), exit_code_ptr) = exit_code;
    found_pid as isize
} else {
    // the pid process is still running
    -2
}
```
```rust
// user/src/lib.rs
pub fn wait(exit_code: &mut i32) -> isize {
    loop {
        match sys_waitpid(-1, exit_code as *mut _) {
            -2 => {
                yield_();
            }
            // -1 or a real pid
            exit_pid => return exit_pid,
        }
    }
}
```
The demand for persistent storage has grown alongside computing. So far we can only keep data in mapped memory, which exists only while the system runs: this is Internal Storage. Now we want to save data to External Storage.
A file system plays the role of mapping a directory-tree structure onto persistent storage. For example: Windows has FAT/NTFS; Linux has Ext3/Ext4/Btrfs. Constructing a VFS (Virtual File System) is therefore necessary to enforce a unified interface.
Design
Simplification
flat layout: only the root directory /
permissions: a single user, with no restriction on file access
no timestamps
no soft/hard links
and many more…
Cache
Persistent external storage is divided into basic storage units called sectors (usually 512 bytes or 4 KiB). A file system, in turn, defines its own storage unit called the block, whose size may differ from the sector's; in this implementation we set it to 512 bytes, the same as a sector.
Reading and writing the device directly on every access would slow things down, so we construct a Block Cache to hold data that has been read. All block caches are then unified under a manager with a size limit, which handles their allocation and deallocation.
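Underneath the cache, the device interface can be as small as two methods; a sketch in the spirit of this design:

```rust
// The whole device is addressed block by block; the cache sits on top
// of this trait and batches the actual reads and writes.
pub trait BlockDevice: Send + Sync {
    fn read_block(&self, block_id: usize, buf: &mut [u8]);
    fn write_block(&self, block_id: usize, buf: &[u8]);
}
```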
A bitmap is a nice structure for allocation bookkeeping: each bitmap block is 512 bytes (4096 bits), and each bit represents the allocation state of one block (1/0 for allocated/deallocated).
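A sketch of allocation within one bitmap block, viewing its 512 bytes as 64 u64 words (names illustrative):

```rust
// One 512-byte bitmap block seen as 64 x 64 = 4096 bits.
type BitmapBlock = [u64; 64];

fn alloc_in_block(bits: &mut BitmapBlock) -> Option<usize> {
    for (word_pos, word) in bits.iter_mut().enumerate() {
        if *word != u64::MAX {
            // find the lowest zero bit and mark it allocated
            let bit_pos = word.trailing_ones() as usize;
            *word |= 1u64 << bit_pos;
            return Some(word_pos * 64 + bit_pos);
        }
    }
    None // this block is full
}
```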
With this structure in place, we can explain the Inode and the Data Block: not every block stores real data, because some must serve as guides (routing) to others, and we need to know where and how those routing blocks are allocated. That is what the bitmaps are for! Now we delve into the Inode.
To let one inode control many data blocks, we design layered routing for it. Besides direct indices, the inode stores a layer-1 and a layer-2 index that route through index blocks (treated the same as data blocks) to the real data blocks. Note that one block holds 512 bytes, i.e. 512 u8, which is 512 / 4 = 128 u32; so one index block routes 128 × 512 B = 64 KiB in one layer, and the second layer routes up to 128 × 64 KiB = 8 MiB.
Such a DiskInode takes 128 bytes, so one block holds exactly 4 inodes; the data structure is deliberately sized to fit a block exactly. Now we design the routing method.
```rust
// impl DiskInode
// pub fn read_at(
//     &self,
//     offset: usize,
//     buf: &mut [u8],
//     block_device: &Arc<dyn BlockDevice>,
// )
loop {
    // calculate the end of the current block
    let mut end_current_block = (start / BLOCK_SZ + 1) * BLOCK_SZ;
    end_current_block = end_current_block.min(end);
    // read and update the read size
    let block_read_size = end_current_block - start;
    let dst = &mut buf[read_size..read_size + block_read_size];
    get_block_cache(
        self.get_block_id(start_block as u32, block_device) as usize,
        Arc::clone(block_device),
    )
    .lock()
    .read(0, |data_block: &DataBlock| {
        let src = &data_block[start % BLOCK_SZ..start % BLOCK_SZ + block_read_size];
        dst.copy_from_slice(src);
    });
    read_size += block_read_size;
    // move on to the next block
    if end_current_block == end {
        break;
    }
    start_block += 1;
    start = end_current_block;
}
```
File System
Thanks to the consecutive layout, we store the bitmaps and the start blocks of each region, then build a unified system that controls allocation and routing. We call it the File System.
Our DiskInode is aimed at the underlying system, not the user, so we need a real Inode as an interface that routes to a DiskInode by storing its block id and offset.
However, we still need one special kind of data block, the DirEntry: a directory entry that stores a name and an inode_number to route to an inode. A DirEntry takes 32 bytes, so each block stores 512 / 32 = 16 of them. Thus we can route to an inode by a &str name.
Note that a DiskInode carries a type: directory or file. Some inodes hold directory data and some hold file data, and from a directory's inode we can reach the inodes of the data it lists.
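A sketch of a 32-byte directory entry consistent with the numbers above (a 27-byte name plus a trailing NUL, then a u32 inode number):

```rust
const NAME_LENGTH_LIMIT: usize = 27;

#[repr(C)]
pub struct DirEntry {
    name: [u8; NAME_LENGTH_LIMIT + 1], // 28 bytes: name + trailing NUL
    inode_number: u32,                 // 4 bytes: 32 bytes in total
}
```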
```rust
fn find_inode_id(&self, name: &str, disk_inode: &DiskInode) -> Option<u32> {
    // assert it is a directory
    assert!(disk_inode.is_dir());
    let file_count = (disk_inode.size as usize) / DIRENT_SZ;
    let mut dirent = DirEntry::empty();
    for i in 0..file_count {
        // note: assert_eq! has a side effect here, reading data into dirent
        assert_eq!(
            disk_inode.read_at(DIRENT_SZ * i, dirent.as_bytes_mut(), &self.block_device),
            DIRENT_SZ,
        );
        if dirent.name() == name {
            return Some(dirent.inode_number() as u32);
        }
    }
    None
}
```
Usually, the workflow of create/clear and read/write is:
read/write
get the root inode, which has directory type
run the read/write closure on the disk inode through the root inode
resize the specified inode as needed
create/clear
allocation/deallocation: alloc/dealloc an inode via the bitmap and get its index
initialize or clear it by fetching its block cache via that index
In the usual shell, | is the pipe symbol: it takes input from the left and sends output to the right. If we abstract everything as a file, such as Stdin or Stdout, then so is a Pipe: it has a read end and a write end, and a user can read from one end what was written (often by a child process) into the other, transferring the data underneath.
We already have the file descriptor as the handle to a file; we implement the same operations for pipes.
sys_pipe receives a pointer to an array of length 2 and writes the read-end and write-end descriptors of the pipe into it.
The write end and the read end share the same data, plus bookkeeping about reads and writes on that data. We construct a RingBuffer to achieve this: the pipe owns a buffer that controls reads and writes, recording data between a head and a tail index. Why can't we just use two separate pieces of data, or a Queue?
Because a ring buffer involves no copying and suits our constraints! We read from the head and move it forward, and push data at the tail, all within a fixed array, instead of allocating as a Queue would.
```rust
pub struct PipeRingBuffer {
    arr: [u8; RING_BUFFER_SIZE],
    head: usize, // head index of the ring buffer
    tail: usize, // tail index of the ring buffer
    status: RingBufferStatus,
    write_end: Option<Weak<Pipe>>,
}
```
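A sketch of the single-byte operations on this buffer (FULL/EMPTY in RingBufferStatus disambiguate the head == tail case):

```rust
impl PipeRingBuffer {
    pub fn write_byte(&mut self, byte: u8) {
        self.status = RingBufferStatus::NORMAL;
        self.arr[self.tail] = byte;
        self.tail = (self.tail + 1) % RING_BUFFER_SIZE;
        if self.tail == self.head {
            // the tail caught up with the head: buffer is full
            self.status = RingBufferStatus::FULL;
        }
    }

    pub fn read_byte(&mut self) -> u8 {
        self.status = RingBufferStatus::NORMAL;
        let c = self.arr[self.head];
        self.head = (self.head + 1) % RING_BUFFER_SIZE;
        if self.head == self.tail {
            // the head caught up with the tail: buffer is empty
            self.status = RingBufferStatus::EMPTY;
        }
        c
    }
}
```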
```rust
impl TaskControlBlock {
    // notice exec allocates a new memory set!
    pub fn exec(&self, elf_data: &[u8], args: Vec<String>) {
        // ...
        // first allocate memory for the pointers to the strings
        user_sp -= (args.len() + 1) * core::mem::size_of::<usize>();
        let argv_base = user_sp;
        // view the new user-stack memory as a vector of string pointers
        let mut argv: Vec<_> = (0..=args.len())
            .map(|arg| {
                translated_refmut(
                    memory_set.token(),
                    (argv_base + arg * core::mem::size_of::<usize>()) as *mut usize,
                )
            })
            .collect();
        *argv[args.len()] = 0;
        for i in 0..args.len() {
            // allocate for the strings themselves
            user_sp -= args[i].len() + 1;
            *argv[i] = user_sp;
            let mut p = user_sp;
            for c in args[i].as_bytes() {
                *translated_refmut(memory_set.token(), p as *mut u8) = *c;
                p += 1;
            }
            *translated_refmut(memory_set.token(), p as *mut u8) = 0;
        }
        // make user_sp aligned to 8B for the k210 platform
        user_sp -= user_sp % core::mem::size_of::<usize>();

        // **** hold current PCB lock
        let mut inner = self.acquire_inner_lock();
        // substitute memory_set
        inner.memory_set = memory_set;
        // update trap_cx ppn
        inner.trap_cx_ppn = trap_cx_ppn;
        // initialize trap_cx
        let mut trap_cx = TrapContext::app_init_context(
            entry_point,
            user_sp,
            KERNEL_SPACE.lock().token(),
            self.kernel_stack.get_top(),
            trap_handler as usize,
        );
        // a0 holds the argument count
        trap_cx.x[10] = args.len();
        // a1 holds the base address of argv
        trap_cx.x[11] = argv_base;
        *inner.get_trap_cx() = trap_cx;
        // **** release current PCB lock
    }
}
```
Now we provide the receiving side in _start, so that main can use the arguments the first time around: the S-level reads the data and passes it to U-level:
```rust
#[no_mangle]
#[link_section = ".text.entry"]
pub extern "C" fn _start(argc: usize, argv: usize) -> ! {
    unsafe {
        HEAP.lock()
            .init(HEAP_SPACE.as_ptr() as usize, USER_HEAP_SIZE);
    }
    let mut v: Vec<&'static str> = Vec::new();
    for i in 0..argc {
        let str_start = unsafe {
            ((argv + i * core::mem::size_of::<usize>()) as *const usize).read_volatile()
        };
        // scan for the terminating NUL to find the string length
        let len = (0usize..)
            .find(|i| unsafe { ((str_start + *i) as *const u8).read_volatile() == 0 })
            .unwrap();
        v.push(
            core::str::from_utf8(unsafe {
                core::slice::from_raw_parts(str_start as *const u8, len)
            })
            .unwrap(),
        );
    }
    exit(main(argc, v.as_slice()));
}
```
Redirection
Redirection is usually written with > for output and < for input.
If we really want to redirect IO, we combine user_shell with sys_dup.
First, sys_dup duplicates a file descriptor already open in the current process.
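A sketch of sys_dup consistent with this design (assuming the fd_table and alloc_fd helpers on the task's inner control block):

```rust
pub fn sys_dup(fd: usize) -> isize {
    let task = current_task().unwrap();
    let mut inner = task.inner_exclusive_access();
    // reject a descriptor that is out of range or not open
    if fd >= inner.fd_table.len() || inner.fd_table[fd].is_none() {
        return -1;
    }
    // allocate the lowest free slot and point it at the same file
    let new_fd = inner.alloc_fd();
    inner.fd_table[new_fd] = Some(Arc::clone(inner.fd_table[fd].as_ref().unwrap()));
    new_fd as isize
}
```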
Then we parse the user's arguments. If > or < appears, we fork a child process, open the file, close the corresponding Stdin or Stdout descriptor, and use dup so the file itself takes that descriptor's place. Then we exec with the originally parsed arguments, and the parent receives the result.
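A sketch of the child-side setup for output redirection in user_shell (assuming the user library's open, close, and dup wrappers and OpenFlags):

```rust
// (in the forked child, before exec)
if !output.is_empty() {
    let output_fd = open(output.as_str(), OpenFlags::CREATE | OpenFlags::WRONLY);
    if output_fd == -1 {
        println!("Error when opening file {}", output);
        return -4;
    }
    let output_fd = output_fd as usize;
    close(1); // free descriptor 1, i.e. Stdout
    assert_eq!(dup(output_fd) as usize, 1); // the file now occupies fd 1
    close(output_fd); // drop the temporary descriptor
}
```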