migrate eBPF to rCore tutorial
This is a step-by-step tutorial of how to migrate eBPF to rCore. You might also refer to zCore implmentaion for comparsion.
Environment Setup
Note that rCore tutorial only provides rust user programs but their is no rust support for eBPF yet. In that case, we need to use uCore-test to compile the user programs because uCore and rCore share the same ABI.
First, clone the repo
git clone https://github.com/cubele/rCore-Tutorial-Code-2022A rcore-ebpf
cd rcore-ebpf
git submodule update --init --recursive
Install dependencies
We need rust, llvm, clang, qemu and musl toolchains.
You may need to install toolchains from musl.cc, qemu version >= 6.0.0, clang version = 12, and add them to your PATH.
make env # this will install rust dependencies
Change config
You need to mannually change the path config in user/ebpf/build.sh
, you basically just need to change the prefix of builddir, ucoredir, targetdir
, use absolute path. Then,
cd os8
make run
You should see that rCore is started and the application is running. Run ebpf_user_loadprogextest
and ebpf_user_kernmaptest
for a glimpse.
Migration
In to the section overview, we now needs to add some os-dependent code to make it work. We only needs to add the syscall interface and implement the osutils.rs
to provide necessary kernel functions.
We should always bear in mind that rCore kernel has its own page table, rather than those share with user space, like Linux and zCore.
Add syscall interface
We need to add the following syscall interface to syscall.rs
:
First, add the syscall number to syscall/mod.rs
:
//! syscall/mod.rs
const SYSCALL_BPF: usize = 280;
// ...
pub fn syscall(syscall_id: usize, args: [usize; 4]) -> isize {
match syscall_id {
// ...
SYSCALL_BPF => sys_bpf(args[0] as isize, args[1] as usize, args[2] as usize),
}
Then, implement sys_bpf
//! syscall/ebpf.rs
pub fn sys_bpf(cmd: isize, bpf_attr: usize , size: usize) -> isize {
let ptr = bpf_attr as *const u8;
let cmd = cmd as i32;
use crate::ebpf::BpfCommand::*;
if let Ok(bpf_cmd) = BpfCommand::try_from(cmd) {
let ret = match bpf_cmd {
BPF_MAP_CREATE => sys_bpf_map_create(ptr, size),
BPF_MAP_LOOKUP_ELEM => sys_bpf_map_lookup_elem(ptr, size),
BPF_MAP_UPDATE_ELEM => sys_bpf_map_update_elem(ptr, size),
BPF_MAP_DELETE_ELEM => sys_bpf_map_delete_elem(ptr, size),
BPF_MAP_GET_NEXT_KEY => sys_bpf_map_get_next_key(ptr, size),
BPF_PROG_LOAD => todo!(),
BPF_PROG_ATTACH => sys_bpf_program_attach(ptr, size),
BPF_PROG_DETACH => sys_bpf_program_detach(ptr, size),
BPF_PROG_LOAD_EX => sys_preprocess_bpf_program_load_ex(ptr, size),
};
if ret < 0 {
-1
} else {
ret as isize
}
} else {
-1
}
We just need to add the syscall interface and use a match to call the corresponding function. We pass the attr
as argument and let the function to handle it. Note that BpfCommand
is an enum that represents the command type following the Linux API.
And we are finished with the syscall interface. This is quite simple, isn't it?
Implement osutils.rs
In osutils.rs
, we need to implement the following parts:
- A trait
ThreadLike
that is an analog to Linux thread. It should haveget_pid
,get_tid
andget_name
methods. And the functionos_current_thread
that returns that thread-like object.
In rCore, we know that TaskControlBlock
is the thread-like object. So we can implement the trait for TaskControlBlock
and return the current thread.
impl ThreadLike for TaskControlBlock {
fn get_pid(&self) -> u64 {
let proc = self.process.upgrade().unwrap();
return proc.pid.0 as u64;
}
fn get_tid(&self) -> u64 {
return 0; // no viable in rcore tutor
}
fn get_name(&self) -> String {
return String::from("not viable in rcore tutorial")
}
}
Note that we don't have a tid
in rCore. So just return 0. And we don't have a process name. So we just return a dummy string. Also, we need to implement the os_current_thread
function.
pub fn os_current_thread() -> Arc<dyn ThreadLike> {
if let Some(thread) = crate::task::current_task() {
thread
} else {
panic!("cannot get current thread!")
}
}
os_current_time
that returns the current time in nanoseconds.os_get_current_cpu
that returns the current cpu id.os_console_write_str
that writes a string to the console. Those are used in helper functions.
pub fn os_current_time() -> u128 {
crate::timer::get_time_us() as u128 * 1000
}
pub fn os_get_current_cpu() -> u8 {
0 // not viable
}
pub fn os_console_write_str(s: &str) {
crate::console::Stdout.write_str(s).unwrap();
}
Again, rCore is a single-core system. So we just return 0 for os_get_current_cpu
. And we use Stdout
to write to the console.
os_copy_from_user
andos_copy_to_user
that copy data from/to user space. Because some OS use virtual memory, we need to translate the address. This two functions are primitives for other memory related functions likeget_generic_from_user
and all other syscall branches.
pub fn os_copy_from_user(usr_addr: usize, kern_buf: *mut u8, len: usize) -> i32 {
use crate::mm::translated_byte_buffer;
use crate::task::current_user_token;
let t = translated_byte_buffer(current_user_token(), usr_addr as *const u8, len);
let mut all = vec![];
for i in t {
all.extend(i.to_vec());
}
copy(kern_buf, all.as_ptr() as *const u8, len);
0
}
pub fn os_copy_to_user(usr_addr: usize, kern_buf: *const u8, len: usize) -> i32 {
use crate::mm::translated_byte_buffer;
use crate::task::current_user_token;
let dst = translated_byte_buffer(current_user_token(), usr_addr as *const u8, len);
let mut ptr = kern_buf;
let mut total_len = len as i32;
for seg in dst {
let cur_len = seg.len();
total_len -= cur_len as i32;
unsafe {
core::ptr::copy_nonoverlapping(ptr, seg.as_mut_ptr(), cur_len);
ptr = ptr.add(cur_len);
}
}
assert_eq!(total_len, 0);
0
}
// You don't need to change this two functions
pub fn copy(dst: *mut u8, src: *const u8, len: usize) {
let from = unsafe { from_raw_parts(src, len) };
let to = unsafe { from_raw_parts_mut(dst, len) };
to.copy_from_slice(from);
}
pub fn memcmp(u: *const u8, v: *const u8, len: usize) -> bool {
return unsafe {
from_raw_parts(u, len) == from_raw_parts(v, len)
}
}
This part is very tricky, since rCore use different page table for kernel and user space. So for every pointer from user space, we needs to translate and copy it to kernel buffer and vice versa. We use translated_byte_buffer
to translate the address, which is provided by mm
.
It is worth note that for zCore, which use the same page table for kernel and user space, the two function is just a memcpy
.
And that is all for osutils.rs
! You can refer to the source code to see we use this two primitives to build get_generic_from_user
, and use that to get the attr
from user space. And all other syscall branches are done. That is because their logic is os-independent, we only needs to provide a memory interface.
BPF map operations
We don't need to change any code here, but I want to show that how different page table effect the migration.
Nte that functions like bpf_map_lookup_elem
can be called both from kernel space and user space. It is nothing important in zCore, since zCore use the same page table for kernel and user space. But in rCore, we need to be especially careful.
When a user program wants to access a map, it needs to call bpf_map_lookup_elem
in user space. And the syscall interface will call sys_bpf_map_lookup_elem
in kernel space. In that case, address are from user space, and we need the additional translation.
But a loaded BPF program can access maps as well and it runs in kernel space. In that case, the address is from kernel space, and we don't need the translation.
So, we need to pass a flag to tell the function whether the address is from user space or not. Argument from_user: bool
in that case. For user program, it calls through sys_bpf_map_...
, which passes from_user = true
. For BPF program, it calls bpf_helper_map_...
, which passes from_user = false
.
Let's take a look at the code of bpf_map_ops
, in which the actual map operations are done.
pub fn bpf_map_ops(fd: u32, op: BpfMapOp, key: *const u8, value: *mut u8, flags: u64, from_user: bool) -> BpfResult {
let bpf_objs = BPF_OBJECTS.lock();
let obj = bpf_objs.get(&fd).ok_or(ENOENT)?;
let shared_map = obj.is_map().ok_or(ENOENT)?;
let mut map = shared_map.lock();
if from_user {
let key_size = map.get_attr().key_size;
let value_size = map.get_attr().value_size;
let mut key_kern_buf = alloc::vec![0 as u8; key_size];
let kptr = key_kern_buf.as_mut_ptr();
os_copy_from_user(key as usize, kptr, key_size);
let mut value_kern_buf = alloc::vec![0 as u8; value_size];
let vptr = value_kern_buf.as_mut_ptr();
match op {
BpfMapOp::LookUp => {
let ret = map.lookup(kptr, vptr);
os_copy_to_user(value as usize, vptr, value_size);
ret
},
BpfMapOp::Update => {
os_copy_from_user(value as usize, vptr, value_size);
let ret = map.update(kptr, vptr, flags);
ret
},
BpfMapOp::Delete => map.delete(kptr),
BpfMapOp::GetNextKey => {
let ret = map.next_key(kptr, vptr);
os_copy_to_user(value as usize, vptr, value_size);
ret
}
_ => Err(EINVAL),
}
} else {
match op {
BpfMapOp::LookUp => map.lookup(key, value),
BpfMapOp::Update => map.update(key, value, flags),
BpfMapOp::Delete => map.delete(key),
BpfMapOp::GetNextKey => map.next_key(key, value),
_ => Err(EINVAL),
}
}
}
We can see that we need extra copy when from_user
is true. And we use os_copy_from_user
and os_copy_to_user
to do the copy. For OS like zCore, those steps are redundant. An extra copy will make it run slower, but does not harm the correctness. This is the fundamental trade off between generality and performance.
Add user program
Now we need add user program that test our eBPF utility. There are 4 user tests.
ebpf_user_naivetest.c
is a naive test that just callsys_bpf
to test the syscall interface.ebpf_user_maptest.c
is a test that test the map operations from user program.ebpf_user_loadprogextest.c
is a test that load a BPF kernel programcontext.c
from user program and print out the registers. This tests the kprobe and some helper functions.ebpf_user_kernmaptest.c
is a test that load a BPF kernel programmap.c
from user program and test the map operations from kernel program. This tests the bpf map in kernel space.
You can run them using make run
.
Add more user programs
If you want to add more user programs, you need to add the source file in ucore/src
just like the 4 tests above.
If you want to add kernel BPF programs and load them, you need to add the kernel source file in user/ebpf/kern
just like map.c
and context.c
, and do the following thing:
# suppose you add map2.c as kernal program
cd user/ebpf/kern
make
python3 hex2char.py
# copy the output, which is the length of kern prog
# copy map2.dump to your user program, take ebpf_user_loadprogextest.c as example
That is because rCore tutorial does not support mmap
or malloc
, we have to mannually copy the byte code into a char []
. hex2char.py
reads an ELF and print out its hex value byte by byte. We also have to mannually set the length of the program as argument. You can refer to zCore test program if you have implmented mmap
or malloc
and fstat
for kernel.
Don't forget to change build.sh
before you type make run
Details about Makefile
This section can be skipped if you are not interested in the details. But to do it from scratch, you need to know the whole procedure.
We need to compile the user program and link it with our library. We just use uCore-Test's makefile because its compatitble with rCore. But for eBPF programs, we need a user library bpf.h
and bpf.c
, in ucore/include
, ucore/lib
, respectively.
When we call make
in /ucore
, it will compile the user program and the ELF is in ucore/build/riscv64
.
Then we need to copy the elf to rCore's filesystem, which is done by user/ebpf/build.sh
. In the script, we touch a empty file called [testcase].rs
and copy the elf to target directory. Then we run make run
in os8
.
You can refer to makefile fs-img
and fn easy_fs_pack
for the reason. Those steps are quiet stupid and you might find better solution.
Further work
Currently, the symbol table is not supported, thus the kprobe attach target is hardcoded in tracepoint.rs::resolve_symbol
as sys_open
. You can try to support it by adding the symbol table utility. You may refer to zCore's repo and this blog