After finishing the third stage of the 2024 Autumn-Winter Open Source Operating System Training Camp, I would like to share my experience and a summary of what I learned.
There are three system structures to compare:
- Unikernel
  - Single-Privilege Physical-Address-Space Combination-of-Kernel-and-UserApps
- Macrokernel Mode
  - Multi-Privilege Paging-Address-Space Isolation-between-Kernel-and-UserApps
- Hypervisor (Virtual) Mode
  - Isolation-between-Host-and-Guest
The key to the component-based design is the Feature. Features are configured in the Cargo.toml file and through Makefile environment variables, and they decide which content gets compiled and linked. It feels like an advanced `#if` - `#endif` switch option.
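As a rough illustration of how a feature works as a conditional-compilation switch, here is a minimal sketch. The feature name `paging` and the function `memory_mode` are made up for this example and are not taken from the camp's code base.

```rust
// Assumes Cargo.toml declares a hypothetical feature:
//
//   [features]
//   paging = []
//
// and that the build enables it, e.g. `cargo build --features paging`.

/// Compiled only when the `paging` feature is enabled.
#[cfg(feature = "paging")]
pub fn memory_mode() -> &'static str {
    "paging"
}

/// Fallback, compiled only when the feature is disabled.
#[cfg(not(feature = "paging"))]
pub fn memory_mode() -> &'static str {
    "physical"
}
```

Only one of the two functions ends up in the binary, which is the same effect an `#if` - `#endif` pair gives in C.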
Unikernel
We use three stages to describe how the system advances from a bare-metal program to a component-based system.
- Bare-metal program
  - Firmware + Bootloader
  - Initialize the special registers
  - Initialize the MMU (for paging)
  - Initialize the stack
  - Initialize the interrupt vector
  - Initialize peripherals / devices
  - Transfer to the `main` program
- Layer / Hierarchy
  - Firmware + Bootloader (Layer)
  - Hardware Initialization (Layer): the special registers, MMU (for paging), stack, interrupt vector
  - Peripherals (Layer)
  - Transfer to the `main` program (Layer)
- Component-based
  - Firmware + Bootloader
  - Hardware Initialization (HAL)
  - Runtime Environment (RTE)
  - Transfer to the `main` program
void Reset_Handler(void) { ... }
The code above is taken from the HAL code of the STM32MP13. It is the initialization and reset handler: it sets up the stack, the interrupt vector and the critical registers, and finally transfers control to the `main` program.
It follows logic similar to rCore's, and helps us better understand MCUs, MPUs and CPUs.
As in rCore, we need to implement the memory interfaces for heap operations ourselves, taking care to avoid memory leaks and limit fragmentation.
This lets us manage memory more efficiently and makes future expansion easier.
There are two kinds of memory allocation functions: one works at page granularity (`palloc`), and the other at byte granularity (`balloc`). The byte allocator (“ByteAlloc”) is built on top of the page allocator (“PageAlloc”).
If “PageAlloc” were built on top of “ByteAlloc” instead, page alignment would be difficult to guarantee.
Algorithms for Memory Allocation
- TLSF, Two-Level Segregated Fit
- Buddy
- Slab
- Bump
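To make the layering concrete, here is a minimal sketch of the Bump strategy from the list above, with a byte allocator that refills from a page allocator. The names `PageBump` and `ByteBump` are hypothetical, addresses are plain integers rather than real memory, and nothing is ever freed; this is not the allocator used in the camp's code.

```rust
const PAGE_SIZE: usize = 4096;

/// Rounds `addr` up to a multiple of `align` (`align` must be a power of two).
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Page-granularity bump allocator over the address range [start, end).
pub struct PageBump {
    next: usize,
    end: usize,
}

impl PageBump {
    pub fn new(start: usize, end: usize) -> Self {
        PageBump { next: align_up(start, PAGE_SIZE), end: end }
    }

    /// Returns the base address of `count` contiguous pages, or None if exhausted.
    pub fn alloc_pages(&mut self, count: usize) -> Option<usize> {
        let base = self.next;
        let new_next = base + count * PAGE_SIZE;
        if new_next > self.end {
            return None;
        }
        self.next = new_next;
        Some(base)
    }
}

/// Byte-granularity bump allocator that takes its memory from a `PageBump`.
pub struct ByteBump {
    next: usize,
    end: usize,
}

impl ByteBump {
    pub fn new() -> Self {
        ByteBump { next: 0, end: 0 }
    }

    /// Allocates `size` bytes; `align` must be a power of two no larger than PAGE_SIZE.
    pub fn alloc(&mut self, pages: &mut PageBump, size: usize, align: usize) -> Option<usize> {
        let mut addr = align_up(self.next, align);
        if self.end == 0 || addr + size > self.end {
            // Current chunk is missing or too small: refill from the page allocator.
            // Whatever was left in the old chunk is simply abandoned.
            let count = (size + PAGE_SIZE - 1) / PAGE_SIZE;
            let count = if count == 0 { 1 } else { count };
            let base = pages.alloc_pages(count)?;
            self.end = base + count * PAGE_SIZE;
            addr = base; // page-aligned, so any `align` <= PAGE_SIZE is satisfied
        }
        self.next = addr + size;
        Some(addr)
    }
}
```

Because every chunk handed to `ByteBump` starts on a page boundary, alignment falls out for free; doing it the other way round would force the page allocator to fight the byte allocator's arbitrary offsets.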
How to enable the paging mechanism:
- Early in kernel startup, use a predefined linear mapping for part of memory: 0xffff_ffc0_8000_0000 ~ 0xffff_ffc0_C000_FFFF $\rightarrow$ 0x8000_0000 ~ 0xC000_0000
- Note that some address-related registers, such as SP, need to be changed to the linearly mapped (virtual) addresses
- Then, if the paging feature is specified, rebuild the complete page-table mapping.
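The early mapping above amounts to adding or subtracting a fixed offset. A minimal sketch follows; the offset is derived from the two ranges quoted above, while the constant and function names are chosen for this example.

```rust
/// Offset between the linearly mapped virtual range and physical memory,
/// derived from the ranges above: 0xffff_ffc0_8000_0000 - 0x8000_0000.
pub const PHYS_VIRT_OFFSET: u64 = 0xffff_ffc0_0000_0000;

/// Converts a physical address into its linearly mapped virtual address.
pub fn phys_to_virt(paddr: u64) -> u64 {
    paddr + PHYS_VIRT_OFFSET
}

/// Converts a linearly mapped virtual address back into a physical address.
pub fn virt_to_phys(vaddr: u64) -> u64 {
    vaddr - PHYS_VIRT_OFFSET
}
```

A register such as SP that still holds a physical address after the MMU is turned on would be fixed up in the same way, conceptually `sp = phys_to_virt(sp)`.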
Task switching:
Task switching swaps the task currently being executed with a task from the ready queue.
On a single-core CPU, multitasking can only be concurrent, not parallel.
Task States
- Running
  - The number of running tasks is equal to the number of processor cores
  - SWITCH-TO: Ready, Blocked or Exited
- Ready
  - Ready to be scheduled at any time
  - SWITCH-TO: Running
- Blocked
  - Waiting for an event, or for a resource to satisfy a condition
  - SWITCH-TO: Ready
- Exited
  - The task has finished and is waiting to be recycled
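The SWITCH-TO rules listed above can be written down as a small state machine. This is only a sketch; the enum and the function `can_switch` are names invented for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskState {
    Running,
    Ready,
    Blocked,
    Exited,
}

/// Returns true if the transition `from` -> `to` is one of the SWITCH-TO rules above.
pub fn can_switch(from: TaskState, to: TaskState) -> bool {
    match (from, to) {
        // Running may be preempted, may block, or may finish.
        (TaskState::Running, TaskState::Ready) => true,
        (TaskState::Running, TaskState::Blocked) => true,
        (TaskState::Running, TaskState::Exited) => true,
        // Ready can only be picked by the scheduler.
        (TaskState::Ready, TaskState::Running) => true,
        // Blocked becomes Ready once its event or resource arrives.
        (TaskState::Blocked, TaskState::Ready) => true,
        // Everything else, including any move out of Exited, is rejected.
        _ => false,
    }
}
```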
Task switching: first, save the context of the current task. Then, restore the context of the new task. Finally, transfer control to the new task.
Note that interrupts should be disabled on the processor performing the switch for the duration of the switch (for example, CLI on x86).
If necessary, a spinlock or mutex should be used to avoid race conditions (SMP?).
There are two usual preemption conditions: one is that the task's time slice is exhausted, and the other is an interrupt source, such as the clock (timer). The privilege level may be used to decide whether a trap is nested or re-entered.
Scheduling Algorithms
- Cooperative scheduling algorithm (FIFO, fair)
- Preemptive scheduling algorithm (Privileged)
  - ROUND_ROBIN
  - CFS (Completely Fair Scheduler)
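As an illustration of the ROUND_ROBIN entry, here is a minimal sketch of a ready queue with fixed time slices. The types `Task` and `RoundRobin` are hypothetical, and "running" a task is modelled as simply subtracting from its remaining work.

```rust
use std::collections::VecDeque;

pub struct Task {
    pub id: usize,
    /// Remaining work, measured in timer ticks.
    pub remaining: u32,
}

pub struct RoundRobin {
    ready: VecDeque<Task>,
    time_slice: u32,
}

impl RoundRobin {
    pub fn new(time_slice: u32) -> Self {
        RoundRobin { ready: VecDeque::new(), time_slice: time_slice }
    }

    pub fn add(&mut self, task: Task) {
        self.ready.push_back(task);
    }

    /// Runs the task at the head of the queue for at most one time slice and
    /// returns its id. An unfinished task goes back to the tail of the queue.
    pub fn run_next(&mut self) -> Option<usize> {
        let mut task = self.ready.pop_front()?;
        let ran = if task.remaining < self.time_slice {
            task.remaining
        } else {
            self.time_slice
        };
        task.remaining -= ran;
        let id = task.id;
        if task.remaining > 0 {
            // Time slice exhausted: the task is preempted and re-queued.
            self.ready.push_back(task);
        }
        Some(id)
    }
}
```

A cooperative FIFO scheduler would look the same except that a task is only re-queued when it yields voluntarily, never because a slice ran out.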
Devices usually fall into kinds such as `net`, `block`, `display` and so on.
How to discover and initialize devices
- At startup, axruntime discovers each device and initializes it with the appropriate driver
- axdriver carries out the process of discovering and initializing devices
- A two-level loop discovers the devices
  - Level 1: traverse all virtio_mmio address ranges, which are determined by the platform's physical memory layout, and perform the page mapping translation
  - Level 2: enumerate devices with the for_each_drivers macro, and probe each virtio device with probe_mmio
- probe discovers devices based on the bus, matches drivers against them one by one, and initializes them
- Buses connecting devices
  - PCI
  - MMIO
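The two-level discovery loop described above can be sketched as follows. Everything here is hypothetical: the `Driver` trait, `FakeBlockDriver` and `probe_all` only imitate the shape of the process and are not the axdriver API, and the MMIO addresses used with them are made up.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeviceKind {
    Net,
    Block,
    Display,
}

/// One MMIO window taken from the platform's memory layout.
pub struct MmioRange {
    pub base: usize,
    pub size: usize,
}

/// Hypothetical driver interface: a driver inspects a range and claims it or not.
pub trait Driver {
    fn name(&self) -> &'static str;
    fn probe_mmio(&self, range: &MmioRange) -> Option<DeviceKind>;
}

/// Stand-in driver that "recognizes" a device at one fixed base address.
pub struct FakeBlockDriver {
    pub expected_base: usize,
}

impl Driver for FakeBlockDriver {
    fn name(&self) -> &'static str {
        "fake-block"
    }
    fn probe_mmio(&self, range: &MmioRange) -> Option<DeviceKind> {
        if range.base == self.expected_base {
            Some(DeviceKind::Block)
        } else {
            None
        }
    }
}

/// Level 1 walks the address ranges; level 2 tries every driver on each range.
pub fn probe_all(
    ranges: &[MmioRange],
    drivers: &[&dyn Driver],
) -> Vec<(usize, &'static str, DeviceKind)> {
    let mut found = Vec::new();
    for range in ranges {
        for driver in drivers {
            if let Some(kind) = driver.probe_mmio(range) {
                found.push((range.base, driver.name(), kind));
                break; // the first matching driver claims the device
            }
        }
    }
    found
}
```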
A file system is the mechanism an operating system uses to manage files and data on storage devices such as hard disks, SSDs and flash memory.
(In Linux, most devices are also exposed as one or more files.)
File System
- RAMFS: a memory-based virtual file system
  - For temporary data storage; fast, but the data is easily lost.
- DEVFS: device file system
  - For managing and accessing hardware devices, simplifying the development and access of device drivers.
- ProcFS: process file system
  - Provides system process and status information for system monitoring and management.
- SysFS: system file system
  - Exports kernel objects and properties for viewing and managing hardware devices and drivers.
Macrokernel
Compared with the Unikernel, two things are added:
- Privilege Level
- Address space
So we need to:
- Map user/kernel space (address space)
  - Usual method
    - The high end of the page table is used as kernel space
    - The low end is used as user application space, because user programs mostly start from low addresses
    - Kernel space is shared, while each application gets its own independent user space
- Add system calls (which cross both the privilege level and the address space)
A user application switches between two privilege levels:
- Task Context: User Level, executes the application logic
- Trap Context: Kernel Level, handles system calls and exceptions
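Once a trap has switched to the kernel, the system call is usually dispatched on its number. Below is a minimal sketch of that step. The `TrapContext` layout and `handle_syscall` are hypothetical; the numbers 64 (write) and 93 (exit) are the ones used by Linux's generic syscall table (for example on riscv64), and 38 is Linux's ENOSYS. The handlers themselves are stubs.

```rust
// Syscall numbers from the Linux generic syscall table (e.g. riscv64).
pub const SYS_WRITE: usize = 64;
pub const SYS_EXIT: usize = 93;
/// Linux error number for "function not implemented".
const ENOSYS: isize = 38;

/// Hypothetical, heavily reduced trap context: just what the dispatcher needs.
pub struct TrapContext {
    pub syscall_id: usize,
    pub args: [usize; 3],
    /// Value handed back to user space when the trap returns.
    pub ret: isize,
}

/// Runs at kernel level: picks a handler by number and stores the result.
pub fn handle_syscall(ctx: &mut TrapContext) {
    ctx.ret = match ctx.syscall_id {
        // Stub: pretend every requested byte (args[2]) was written.
        SYS_WRITE => ctx.args[2] as isize,
        // Stub: a real kernel would tear the task down here.
        SYS_EXIT => 0,
        // Unknown calls fail with -ENOSYS, as Linux does.
        _ => -ENOSYS,
    };
}
```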
Address-space area mapping back end: a Backend maps a specific area within an address space.
- Linear
  - Case: the target physical address range already exists, so the mapping is established directly
  - Can be used for mapping MMIO areas and special shared address areas
  - The corresponding physical page frames must be contiguous
- Alloc (Lazy)
  - Case: rely on the page-fault exception and map only when a missing page is accessed (like mending the fold after the sheep are lost)
  - Mapped page by page; the corresponding physical page frames are usually not contiguous
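The difference between the two back ends can be shown with a toy model in which the page table is just a map from virtual page to physical frame. All names here (`Backend`, `AddrSpace`, `handle_page_fault`, `translate`) are chosen for the sketch, and the free-frame list stands in for a real frame allocator.

```rust
use std::collections::BTreeMap;

const PAGE_SIZE: u64 = 4096;

pub enum Backend {
    /// Physical address = virtual address - pa_va_offset; mapped eagerly.
    Linear { pa_va_offset: u64 },
    /// Frames are allocated one page at a time, when a page fault occurs.
    Alloc,
}

/// Toy address space holding a single area [start, end).
pub struct AddrSpace {
    start: u64,
    end: u64,
    backend: Backend,
    page_table: BTreeMap<u64, u64>, // virtual page -> physical frame
    free_frames: Vec<u64>,          // stand-in for a frame allocator
}

impl AddrSpace {
    pub fn new(start: u64, size: u64, backend: Backend, free_frames: Vec<u64>) -> Self {
        let mut space = AddrSpace {
            start: start,
            end: start + size,
            backend: backend,
            page_table: BTreeMap::new(),
            free_frames: free_frames,
        };
        // Linear areas are mapped up front; the frames are contiguous by construction.
        if let Backend::Linear { pa_va_offset } = space.backend {
            let mut vaddr = start;
            while vaddr < space.end {
                space.page_table.insert(vaddr, vaddr - pa_va_offset);
                vaddr += PAGE_SIZE;
            }
        }
        space
    }

    /// Lazily maps the faulting page of an Alloc area. Returns true if resolved.
    pub fn handle_page_fault(&mut self, vaddr: u64) -> bool {
        let page = vaddr & !(PAGE_SIZE - 1);
        let lazy = match self.backend {
            Backend::Alloc => true,
            Backend::Linear { .. } => false,
        };
        if !lazy || vaddr < self.start || vaddr >= self.end || self.page_table.contains_key(&page) {
            return false;
        }
        match self.free_frames.pop() {
            Some(frame) => {
                self.page_table.insert(page, frame);
                true
            }
            None => false,
        }
    }

    /// Looks up the physical address for `vaddr`, if its page is mapped.
    pub fn translate(&self, vaddr: u64) -> Option<u64> {
        let page = vaddr & !(PAGE_SIZE - 1);
        self.page_table.get(&page).map(|frame| frame + (vaddr - page))
    }
}
```

With `Alloc`, consecutive virtual pages receive whatever frames happen to be free, which is why the resulting physical frames are usually not contiguous.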
To be compatible with Linux applications, we need to implement compatible system calls, file systems and other system resources. This requires us to follow the POSIX standard.
POSIX allows developers to write applications against a standard set of APIs without worrying about differences in the underlying operating system.
Apart from Windows, many modern operating systems, such as Linux, macOS (based on Unix), and many embedded systems (RTOS?), are partially or fully POSIX compliant. This allows applications developed on one of these systems to be ported to the others with little or no change.
When loading an ELF-formatted application, `VirtAddr` and `MemSiz` are used to place each segment at its target location in virtual memory. Beware that some regions, such as .bss, are all zeros: their data is not stored in the ELF file, so the loader has to allocate that space itself and fill it with zeros.
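The loading rule above can be sketched as a copy followed by a zero fill. The `Segment` struct mirrors the Offset, VirtAddr, FileSiz and MemSiz columns of an ELF program header; `load_segment` and the flat `memory` slice that stands in for the target address space are assumptions made for this example.

```rust
/// The program-header fields needed to load one segment.
pub struct Segment {
    pub offset: usize,    // where the segment's bytes start in the ELF file
    pub virt_addr: usize, // VirtAddr: target address in virtual memory
    pub file_size: usize, // FileSiz: bytes actually stored in the file
    pub mem_size: usize,  // MemSiz: bytes occupied in memory (>= file_size)
}

/// Copies one segment into `memory`, which models the address space starting
/// at virtual address `mem_base`.
pub fn load_segment(elf: &[u8], seg: &Segment, memory: &mut [u8], mem_base: usize) {
    let dst = seg.virt_addr - mem_base;
    // Part 1: bytes that exist in the file are copied as they are.
    let src = &elf[seg.offset..seg.offset + seg.file_size];
    memory[dst..dst + seg.file_size].copy_from_slice(src);
    // Part 2: the rest of the segment (e.g. .bss) is not in the file, so zero it.
    for byte in &mut memory[dst + seg.file_size..dst + seg.mem_size] {
        *byte = 0;
    }
}
```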
An important piece of support is to implement the hosted environment expected by the `main` function.
We should provide:
- Parameter information (`argc` and `argv`): the number of parameters and an array of pointers to the parameter strings
- Environment information (`envv`): an array of pointers to the environment variable strings
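One common way to hand this information to a new program is to write it onto the initial user stack. The sketch below follows the Linux-style order (argc, the argv pointers, a NULL, the environment pointers, a NULL) but leaves out the auxiliary vector. It assumes a 64-bit little-endian target, and the functions `build_initial_stack` and `read_word` are invented for the example.

```rust
/// Copies `s` to the address in `cursor`, advances the cursor past the
/// terminating NUL, and returns the address the string was placed at.
fn place(image: &mut [u8], sp: u64, cursor: &mut u64, s: &str) -> u64 {
    let addr = *cursor;
    let off = (addr - sp) as usize;
    image[off..off + s.len()].copy_from_slice(s.as_bytes());
    *cursor += s.len() as u64 + 1; // the NUL byte is already zero in `image`
    addr
}

/// Builds the initial stack contents. Returns the initial stack pointer and
/// the bytes that belong at [sp, stack_top).
pub fn build_initial_stack(stack_top: u64, args: &[&str], envs: &[&str]) -> (u64, Vec<u8>) {
    // NUL-terminated strings sit at the very top of the stack.
    let str_len: u64 = args.iter().chain(envs.iter()).map(|s| s.len() as u64 + 1).sum();
    let str_base = stack_top - str_len;
    // Below them: argc, argv[..], NULL, envv[..], NULL (one 8-byte word each).
    let n_words = (args.len() + envs.len() + 3) as u64;
    let sp = (str_base - n_words * 8) & !0xf; // keep sp 16-byte aligned
    let mut image = vec![0u8; (stack_top - sp) as usize];

    let mut words: Vec<u64> = vec![args.len() as u64]; // argc
    let mut cursor = str_base;
    for s in args {
        let addr = place(&mut image, sp, &mut cursor, s);
        words.push(addr); // argv[i]
    }
    words.push(0); // argv terminator
    for s in envs {
        let addr = place(&mut image, sp, &mut cursor, s);
        words.push(addr); // envv[i]
    }
    words.push(0); // envv terminator

    for (i, w) in words.iter().enumerate() {
        image[i * 8..i * 8 + 8].copy_from_slice(&w.to_le_bytes());
    }
    (sp, image)
}

/// Reads the `index`-th 8-byte word of the stack image.
pub fn read_word(image: &[u8], index: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&image[index * 8..index * 8 + 8]);
    u64::from_le_bytes(bytes)
}
```

The program's entry code can then read `argc` from the word at `sp`, treat the following words as `argv`, and find the environment array right after the first NULL.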
Hypervisor
Each virtual machine has its own independent virtual hardware, backed by the physical computer. Each virtual machine runs its own operating system, which believes it is alone in its execution environment and cannot tell whether that environment is a physical machine or a virtual machine. (This is the ideal state.)
Is this similar to virtualization software such as VMware?
Does the virtual 8086 mode on x86 follow the same or a similar principle?
The difference between a Hypervisor and an Emulator is whether the architecture of the virtual running environment is the same as that of the physical running environment that supports it.
This seems to be related to how KVM accelerates emulation.
Layers of resource objects supported by the Hypervisor:
- VM: manages the address space
- vCPU: virtualizes computing resources and represents the flow of execution on VMs
- vMem: virtualizes memory according to the physical address space layout of VMs
- vDevice: virtualizes devices, including direct mapping and emulation
- vUtilities: interrupt virtualization and bus device operations
A Hypervisor is usually implemented with the following pieces:
- run_guest is responsible for entering the Guest environment
- guest_exit is responsible for exiting the Guest environment
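Around these two entry points, a hypervisor typically runs a loop: enter the guest, wait for it to exit, handle the reason, and enter again. The sketch below only simulates that loop. `run_guest` here replays a scripted list of exits instead of switching to a real guest, the low-level guest_exit path is folded into `run_guest` returning, and `ExitReason`, `VCpu` and `handle_exit` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ExitReason {
    Hypercall(u64),
    PageFault(u64),
    Shutdown,
}

/// Simulated vCPU: instead of real guest code it replays a scripted list of exits.
pub struct VCpu {
    script: Vec<ExitReason>,
    pub log: Vec<&'static str>,
}

impl VCpu {
    pub fn new(mut script: Vec<ExitReason>) -> Self {
        script.reverse(); // so that pop() yields the exits in their original order
        VCpu { script: script, log: Vec::new() }
    }

    /// Stand-in for entering the guest; returns once the guest has exited.
    fn run_guest(&mut self) -> ExitReason {
        self.script.pop().unwrap_or(ExitReason::Shutdown)
    }

    /// Handles one exit in the host. Returns false when the VM should stop.
    fn handle_exit(&mut self, reason: ExitReason) -> bool {
        match reason {
            ExitReason::Hypercall(_) => {
                self.log.push("hypercall");
                true
            }
            ExitReason::PageFault(_) => {
                self.log.push("page-fault");
                true
            }
            ExitReason::Shutdown => {
                self.log.push("shutdown");
                false
            }
        }
    }

    /// The main loop: enter the guest, handle its exit, repeat.
    pub fn run(&mut self) {
        loop {
            let reason = self.run_guest();
            if !self.handle_exit(reason) {
                break;
            }
        }
    }
}
```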