WEBVTT

00:00.000 --> 00:25.000
I'm just an ordinary guy, and when I was a child, my mother bought me a book, Principles of Operating Systems.

00:25.000 --> 00:33.000
The story of how I got it is very long. It's a book from the 60s, written by Per Brinch Hansen.

00:33.000 --> 00:39.000
So instead of political priests, I've been reading about ancient principles of operating systems.

00:39.000 --> 00:50.000
Then, in 1999, I spotted an article called QNX operating system on a single floppy, which was my first contact with microkernels.

00:50.000 --> 00:55.000
But unfortunately, back then, I wasn't amused by the fact that it's a microkernel.

00:55.000 --> 00:59.000
I was amused by the fact that it runs from a single floppy.

00:59.000 --> 01:06.000
Then I started my software developer career and moved from industry to industry.

01:06.000 --> 01:11.000
And I ended up in automotive, where I got into contact with hard embedded.

01:11.000 --> 01:27.000
And what I saw there wasn't very amusing, because the general state of hard embedded, of low-power embedded, is very poor regarding cybersecurity.

01:27.000 --> 01:35.000
The bulk of popular operating systems run entirely unprotected.

01:35.000 --> 01:47.000
And while some of them offer some solution to get memory protection running, even their own documentation states it's complicated.

01:47.000 --> 01:56.000
Like, if you want to separate two user space processes in Zephyr, it's like 150 lines of code or something like that.

01:56.000 --> 02:01.000
So, in general, developers avoid it.

02:01.000 --> 02:21.000
So, my question was, is it possible to come up with a way to bring memory protection to microcontrollers in a way which will be simple to use and still be usable for real world use cases?

02:21.000 --> 02:29.000
And for this to happen, I came up with one concept.

02:29.000 --> 02:36.000
If you want memory protection, you have to have some way to isolate processes, to keep things apart.

02:36.000 --> 02:39.000
So, I retrofitted the concept of a process.

02:39.000 --> 02:50.000
It's nothing new for the general microkernel or operating system community, but it's almost nonexistent in the realm of, let's say, Zephyr.

02:50.000 --> 03:01.000
And the way I did it was that I statically mapped parts of a process onto memory protection unit regions.

03:01.000 --> 03:11.000
The reason why I did it statically is that any other way of mapping, which would be dynamic, would compromise real-time properties.

03:11.000 --> 03:25.000
Because the very last thing you want to happen when you are serving, let's say, a motor control routine is to have a page fault.
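
NOTE
A minimal sketch of the static mapping described above, assuming ARMv7-M style MPU regions (power-of-two size of at least 32 bytes, base aligned to the size).
The struct, enum, and function names are hypothetical and are not taken from CMRX; the choice of code, data, BSS, and stack as the mapped parts is also an assumption.
```c
#include <assert.h>
#include <stdint.h>
/* Hypothetical per-process table: one fixed MPU region per part of the process. */
struct mpu_region { uint32_t base; uint32_t size; };
enum { REGION_CODE, REGION_DATA, REGION_BSS, REGION_STACK, REGION_COUNT };
struct process { struct mpu_region regions[REGION_COUNT]; };
/* ARMv7-M encodes a region size of 2^(SIZE+1) bytes; the smallest region is 32 bytes. */
static uint32_t mpu_size_field(uint32_t size)
{
    assert(size >= 32u && (size & (size - 1u)) == 0u); /* power of two only */
    uint32_t field = 0;
    while ((2u << field) < size) field++;
    return field;
}
/* A region is usable only if its size is a power of two and its base is aligned to it. */
static int mpu_region_valid(const struct mpu_region *r)
{
    return r->size >= 32u && (r->size & (r->size - 1u)) == 0u && (r->base & (r->size - 1u)) == 0u;
}
```
Because every region is fixed at build time, a context switch only has to reload a small table; nothing is mapped on demand, so nothing can fault in the middle of a real-time routine.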

03:25.000 --> 03:33.000
And another factor is that this is trivial to do in a fully automatic way.

03:33.000 --> 03:41.000
For example, here one process is considered to be everything which goes into one static library.

03:41.000 --> 03:44.000
So the developer doesn't have to do anything.

03:44.000 --> 03:55.000
You simply think of it as one library equals one process. Simple, well defined, but they cannot communicate with each other.

03:55.000 --> 04:00.000
You could say, okay, let's use the kernel for that purpose.

04:00.000 --> 04:02.000
Here's the problem.

04:02.000 --> 04:05.000
Microcontrollers only have memory protection units.

04:05.000 --> 04:15.000
And these make sharing memory, which you need for implementing syscalls, an incredibly hard task.

04:15.000 --> 04:22.000
You almost can't share memory. You can't do copy-on-write or anything like that.

04:22.000 --> 04:31.000
So, this also kind of prevents message passing and message-passing-based microkernels.

04:31.000 --> 04:39.000
So, I had to come up with some other primitive to allow any communication at all.

04:39.000 --> 04:45.000
And this primitive is remote procedure calling.

04:45.000 --> 04:57.000
Why? The way CMRX implements remote procedure calling is that when a client thread performs a remote procedure call,

04:57.000 --> 05:05.000
the thread itself is migrated into the server process, along with its access to the client process memory.

05:05.000 --> 05:11.000
This avoids the problem of sharing memory because the thread still has access to its original memory.

05:11.000 --> 05:16.000
So, you can execute server routines on top of client memory.

05:16.000 --> 05:22.000
Another characteristic here is that this RPC mechanism is object-oriented.

05:22.000 --> 05:25.000
So, you are not calling functions directly.

05:25.000 --> 05:31.000
Rather, you are referring to a method within an interface, which can be supplied by a third party.

05:31.000 --> 05:34.000
And the server implements these methods.

05:35.000 --> 05:42.000
It turns out that all of this can be checked at compile time by the compiler.

05:42.000 --> 05:47.000
No need for interface definition language. Everything is defined entirely in C.

05:47.000 --> 05:54.000
You can check if the RPC call is calling a valid method within a valid interface.

05:54.000 --> 05:57.000
You can type check arguments.

05:57.000 --> 06:04.000
You can type check if the implementation is implementing the interface it states it implements.

06:04.000 --> 06:08.000
And you can also type check arguments there.
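
NOTE
A minimal sketch of an interface defined entirely in C and checked by the compiler, as described above.
All names (counter_vtable, counter, counter_rpc_add, counter_rpc_get) are hypothetical and not CMRX's API; the wrappers stand in for a real RPC entry point and perform only a direct call.
```c
#include <assert.h>
#include <stddef.h>
/* Hypothetical interface: a table of method pointers, defined in plain C. */
struct counter_vtable {
    int (*add)(void *self, int amount);
    int (*get)(void *self);
};
/* A service object starts with a pointer to the interface it implements. */
struct counter {
    const struct counter_vtable *vtable;
    int value;
};
static int counter_add(void *self, int amount) { struct counter *c = self; c->value += amount; return c->value; }
static int counter_get(void *self) { struct counter *c = self; return c->value; }
/* The compiler diagnoses this initializer if a signature does not match the interface. */
static const struct counter_vtable counter_impl = { .add = counter_add, .get = counter_get };
static_assert(offsetof(struct counter, vtable) == 0, "vtable pointer must come first");
/* Stand-ins for the RPC call: method name and argument types are checked at compile time. */
static int counter_rpc_add(struct counter *obj, int amount) { return obj->vtable->add(obj, amount); }
static int counter_rpc_get(struct counter *obj) { return obj->vtable->get(obj); }
```
Calling a method that the interface does not declare, or passing an argument of the wrong type, fails at build time, so no interface definition language and no runtime type check is needed in this sketch.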

06:08.000 --> 06:11.000
So, there is zero runtime overhead here.

06:11.000 --> 06:16.000
The only thing you need to check is if this object reference is valid.

06:16.000 --> 06:25.000
And that the client process didn't make it up to try to forge some malware or something like that.

06:25.000 --> 06:29.000
And this still has one problem.

06:29.000 --> 06:38.000
If the client takes all its memory to the server, how will this RPC, for example, access the server's own memory?

06:38.000 --> 06:49.000
For this, RPC was extended in a way that the client doesn't take all its memory.

06:49.000 --> 06:54.000
It only takes whatever is in the shared block.

06:54.000 --> 07:00.000
Therefore, the client has the option to choose what it wants to share.

07:00.000 --> 07:08.000
And the remaining memory accessible during the remote procedure call comes from the server.

07:08.000 --> 07:13.000
This way, you can copy data between the server and the client.

07:13.000 --> 07:16.000
And the only overhead is the copy operation itself.

07:16.000 --> 07:20.000
And you only have the overhead when you actually use it.

07:25.000 --> 07:30.000
It has quite good, not the best but quite good, granularity.

07:30.000 --> 07:38.000
Because the shared data are available only to the [inaudible] forced by the hardware to implement is low-level

07:38.000 --> 07:46.000
interrupt request line management, because this cannot be done in user space.

07:46.000 --> 07:51.000
Everything else is run as a user space server.

07:51.000 --> 07:57.000
Not just one; you may have as many user space servers as you want.

07:57.000 --> 08:08.000
And for example, we don't have the bunch of calls you find in an RTOS, or features like queues, mutexes.

08:08.000 --> 08:13.000
We don't have anything of this in the kernel because everything can be implemented as a server.

08:13.000 --> 08:17.000
So for example, we have a queue server which provides queues.

08:17.000 --> 08:19.000
It's completely in user space.

08:19.000 --> 08:25.000
We don't even have mutexes because this is provided by the hardware itself.

08:25.000 --> 08:35.000
And this is pretty much all we have right now because it turns out we don't need anything else for the task we use this operating system for.

08:35.000 --> 08:38.000
And this makes this kernel extremely portable.

08:38.000 --> 08:47.000
Because for ARM you only need like maybe 10 objects out of the CMSIS headers.

08:47.000 --> 08:51.000
And this is provided by pretty much any HAL out there.

08:51.000 --> 08:56.000
So as long as you have these headers, you are able to port CMRX there.

08:56.000 --> 09:05.000
Currently it runs on pretty much anything from Cortex-M0+ to Cortex-M33.

09:05.000 --> 09:10.000
And I mentioned that it is a microkernel.

09:10.000 --> 09:16.000
So drivers also have to run in user space, and it turns out it is possible.

09:16.000 --> 09:28.000
You can give a process another region or segment and define the addresses it has access to.

09:28.000 --> 09:38.000
And if this memory region happens to be the memory region of some peripheral, voilà, you have a driver.

09:38.000 --> 09:46.000
As of now with most of the hardware all the code of the driver can reside inside the process.

09:46.000 --> 09:49.000
The only exception is interrupt service handlers.

09:49.000 --> 09:57.000
And the reason for that is that if we dispatched them it would compromise latency.

09:58.000 --> 10:14.000
You have this mechanism, for example, in Zephyr, but it still has an option to take a short path, because sometimes this dispatching increases latency to an unacceptable level.

10:14.000 --> 10:21.000
And as for the API of the driver it is just another RPC service.

10:21.000 --> 10:31.000
So we tested this, and we were able to port TinyUSB to run almost entirely in user space.

10:31.000 --> 10:39.000
And we only needed like five lines of modification inside the TinyUSB code base, aside from the operating system abstraction layer.

10:39.000 --> 10:48.000
So I consider this to be a huge success.

10:48.000 --> 10:52.000
And you already met a driver for CMRX.

10:52.000 --> 10:57.000
This one line turns an ordinary process into the driver.

10:57.000 --> 11:07.000
The addresses in this line happen to be the GPIO peripheral in the RP2040 microcontroller.

11:07.000 --> 11:13.000
And so the Blinky needs this access to actually be able to blink with an LED.

11:13.000 --> 11:16.000
So Blinky is actually a driver.

11:16.000 --> 11:23.000
A very poor design of the driver but it's a driver.

11:23.000 --> 11:31.000
And as for use cases, the use cases on the right side are those which are already in production.

11:31.000 --> 11:41.000
So naturally if you have a mixed criticality system where you have two parts.

11:41.000 --> 11:47.000
One part may be communicating with the outside world and another part drives something critical.

11:47.000 --> 11:50.000
You may want to isolate them.

11:50.000 --> 11:53.000
This works quite well in CMRX.

11:53.000 --> 12:00.000
Components are easy to implement also due to the properties of the RPC mechanism.

12:00.000 --> 12:16.000
And a new use case, which emerged as the system was developed and is not entirely written yet, is the use of CMRX as a hypervisor.

12:16.000 --> 12:24.000
Because the kernel itself is so small, under 8 kilobytes, that you could actually use it as a hypervisor.

12:24.000 --> 12:29.000
And you may say that a process is actually a virtual machine.

12:29.000 --> 12:34.000
And stuff another operating system into the virtual machine.

12:34.000 --> 12:41.000
You have two modes. One mode would be paravirtualization, because you will have the same problem as with drivers.

12:41.000 --> 12:49.000
That interrupt service handlers of the virtual machine would run with the privileges of the hypervisor.

12:49.000 --> 12:53.000
And that's not a nice thing, but TrustZone can help here.

12:53.000 --> 12:58.000
I want to replicate what QNX does.

12:58.000 --> 13:03.000
But there is more low-level work to be done yet.

13:03.000 --> 13:05.000
So this is in the pipeline.

13:05.000 --> 13:07.000
Question?

13:07.000 --> 13:13.000
It's not exactly clear to me how remapping things works with RPCs.

13:13.000 --> 13:15.000
If you can go back to that slide.

13:15.000 --> 13:20.000
So how remapping works with the RPC?

13:21.000 --> 13:24.000
You have a slide?

13:24.000 --> 13:25.000
One slide?

13:25.000 --> 13:29.000
One more. You have shared and the stack.

13:29.000 --> 13:30.000
Yes.

13:30.000 --> 13:37.000
Are all RPCs synchronous, and you must carry the stack of the client to the server, and the server can stomp on the client stack?

13:37.000 --> 13:39.000
Yes.

13:39.000 --> 13:49.000
So when you do this RPC call, the thread essentially runs with the rights of the server process.

13:49.000 --> 14:03.000
And for this, among the MPU regions, the stack is carried over because you still have to be able to return.

14:03.000 --> 14:09.000
And there's one other region here called shared, and this is also carried over.

14:09.000 --> 14:13.000
And the data and BSS are replaced by those of the server process.
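
NOTE
A minimal sketch of the region swap described in this answer: stack and shared are carried over from the client, while data and BSS come from the server.
The slot names follow the talk; the types, the function rpc_enter, and the slot layout are hypothetical and not CMRX's actual code.
```c
#include <assert.h>
/* Hypothetical region slots; names follow the talk, the layout is an assumption. */
enum slot { SLOT_DATA, SLOT_BSS, SLOT_SHARED, SLOT_STACK, SLOT_COUNT };
struct region { unsigned base; unsigned size; };
struct process { struct region slots[SLOT_COUNT]; };
/* Build the region set a thread sees while it executes inside the server. */
static void rpc_enter(const struct process *client, const struct process *server, struct region active[SLOT_COUNT])
{
    assert(client && server && active);
    active[SLOT_DATA] = server->slots[SLOT_DATA];     /* replaced by the server */
    active[SLOT_BSS] = server->slots[SLOT_BSS];       /* replaced by the server */
    active[SLOT_SHARED] = client->slots[SLOT_SHARED]; /* carried over from the client */
    active[SLOT_STACK] = client->slots[SLOT_STACK];   /* carried over so the call can return */
}
```
Returning from the call would restore the client's own data and BSS slots; because the stack slot never changes, the call is synchronous and the server code can touch the whole client stack.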

14:13.000 --> 14:18.000
So to answer your question, it is synchronous.

14:18.000 --> 14:25.000
And yes, the server process has full access to the client stack.

14:25.000 --> 14:28.000
It's one of the downsides of not having an MMU.

14:28.000 --> 14:31.000
Yeah, it's a constraint.

14:31.000 --> 14:36.000
It's a result of constraints of the hardware.

14:36.000 --> 14:40.000
So then if this is, you don't have your client and your server having separate stacks.

14:40.000 --> 14:44.000
It's a single stack that you would be pushing about to when you call it wrong.