WEBVTT

00:00.000 --> 00:19.320
So I'd like to start with three numbers. 26.9 million, as reported by Wikipedia, is the number

00:19.320 --> 00:25.400
of software engineers that were in the world in 2022.

00:25.960 --> 00:34.280
10174 is the number of commits to the GCC code base between GCC 13.1 and GCC 14.1.

00:36.120 --> 00:43.960
And 49 is the number of contributors to that work who contributed more than once a week on average.

00:45.400 --> 00:49.720
So each of them is responsible for about half a million software engineers.

00:49.720 --> 00:57.720
I'm Jeremy Bennett, I run Embecosm, and in this talk

00:57.720 --> 01:00.720
I'm going to show you about one aspect of GCC development.

01:00.720 --> 01:07.720
In the hope that in time you'll be able to add to that number and the load on those 49 can be shared a bit wider.

01:09.720 --> 01:18.720
I hope when you get to the end of this talk, you'll have an idea about built-in functions and how you could add them to GCC to improve GCC's performance.

01:20.720 --> 01:31.720
So just to give the context, here's all the bits that make up a GNU tool chain, but we're going to worry about the compiler today.

01:31.720 --> 01:41.720
And we're going to use as reference support for a derivative of RISC-V, CORE-V, which comes from the OpenHW Foundation.

01:41.720 --> 01:49.720
And that adds eight custom instruction extensions adding a few hundred instructions to the architecture.

01:49.720 --> 01:56.720
And we're going to look at how we initially support some of those using built-in functions.

01:58.720 --> 02:01.720
So first of all, what's a built-in function?

02:01.720 --> 02:10.720
And the answer is it looks like a regular C function, but it's important to understand it isn't an actual function.

02:10.720 --> 02:15.720
It's actually built into the compiler, the compiler understands it.

02:15.720 --> 02:22.720
And it's represented not by writing it in C code, but as patterns within the pattern matcher of GCC.

02:22.720 --> 02:31.720
And they're used for a number of reasons to access unique functionality, which doesn't map to a programming language.

02:31.720 --> 02:34.720
And we'll look at initially an example like that.

02:35.720 --> 02:52.720
But in the short term, particularly with RISC-V, when you add something to the hardware, it's a quick way of exposing that functionality before you get to the full-blown generation of the code automatically from the C programming language.

02:52.720 --> 02:55.720
And as I say, they're called built-in functions.

02:55.720 --> 03:01.720
They are not functions. You can't take their address or pass them around as pointers or whatever.

03:05.720 --> 03:09.720
Here's an example, the built-in square root function.

03:09.720 --> 03:12.720
Okay, that's a standard part of C; every architecture provides it.

03:12.720 --> 03:19.720
And it's designed to do square root, not using the standard library way of doing it, which is well established works on a broad range of architectures.

03:19.720 --> 03:23.720
But it says, please do square root as efficiently as you can on my architecture.

03:23.720 --> 03:31.720
And if your architecture is something like an old VAX that has a square root instruction, then you can do it in a single instruction.

03:31.720 --> 03:33.720
Okay.

03:33.720 --> 03:37.720
So there's an example of a built-in function.

03:37.720 --> 03:41.720
And it's built-in, so you don't need a header.
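
NOTE
A minimal sketch of the "no header" point, assuming GCC or Clang as the host compiler. It uses __builtin_popcount rather than the square root built-in purely so that the example links without the maths library; __builtin_sqrtf(x) is called in the same way. The wrapper name count_set_bits is invented for this note.
```c
/* No #include is needed for the built-in itself: the compiler already knows it. */
static int count_set_bits(unsigned int x)
{
    /* Standard GCC built-in; on a target with a population-count
       instruction the compiler is free to use that instruction. */
    return __builtin_popcount(x);
}
```
Calling count_set_bits(0xF0u) counts the four set bits in 0xF0.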

03:42.720 --> 03:48.720
The standard built-ins you can find inside the GCC source in the builtins.def file.

03:48.720 --> 03:51.720
And here we are, there's the spec of built-in square root function.

03:51.720 --> 03:55.720
It's got a bit saying what the arguments are and what its return is.

03:55.720 --> 03:59.720
It's BT_FN_FLOAT_FLOAT; the float float means it takes a float argument.

03:59.720 --> 04:04.720
It returns a float, and it's got some attributes, something to do with maths rounding there.

04:04.720 --> 04:09.720
Custom built-ins go in your architecture-specific configuration.

04:09.720 --> 04:12.720
So that will be in config/riscv for RISC-V.

04:12.720 --> 04:14.720
So we've got config/riscv/riscv-builtins,

04:14.720 --> 04:17.720
actually .cc.

04:17.720 --> 04:20.720
That's the C code, but there are .def files as well.

04:20.720 --> 04:22.720
Okay.

04:22.720 --> 04:26.720
Why don't you just use inline assembly code?

04:26.720 --> 04:28.720
And here's an example.

04:28.720 --> 04:29.720
I've picked one instruction.

04:29.720 --> 04:31.720
This is from CORE-V.

04:31.720 --> 04:36.720
It's the external load word.

04:36.720 --> 04:40.720
It looks like a load, but it's used to load between cores.

04:40.720 --> 04:41.720
You're trying to do synchronization.

04:41.720 --> 04:44.720
It's actually a synchronization instruction.

04:44.720 --> 04:45.720
Okay.

04:45.720 --> 04:49.720
And the problem with the assembly function is it's a bit of a black box.

04:49.720 --> 04:51.720
The compiler knows to put it in here.

04:51.720 --> 04:56.720
You tell it a bit about where the arguments are expected to go.

04:56.720 --> 04:58.720
And anything that might get clobbered.

04:58.720 --> 05:03.720
But it really does spoil your data flow through the program because you don't know much else.

05:03.720 --> 05:05.720
And the compiler must take a cautious view.
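
NOTE
A small sketch of why inline assembly is a black box, assuming GCC-style extended asm (GCC or Clang). The asm body is deliberately empty so it runs on any host; through_black_box is a name invented for this note. All the compiler knows is what the operand and clobber lists declare.
```c
static int through_black_box(int x)
{
    /* "+r"(x): x is read and written in a register.
       "memory": the compiler must assume any memory may have changed. */
    __asm__ volatile("" : "+r"(x) : : "memory");
    /* From here on the compiler cannot assume anything about how x relates to its old value. */
    return x;
}
```
With a real instruction in the asm string, the compiler would still see only these declarations, which is why it has to take the cautious view.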

05:05.720 --> 05:06.720
Okay.

05:06.720 --> 05:11.720
When you do inline assembly, it's a black box; that instruction will appear,

05:11.720 --> 05:12.720
Come what may.

05:12.720 --> 05:13.720
Okay.

05:13.720 --> 05:14.720
There's nothing to be done about it.

05:14.720 --> 05:19.720
Built-in functions offer some advantages because they do understand the data flow properly.

05:19.720 --> 05:23.720
And the patterns they have can be recognized and used elsewhere in GCC.

05:23.720 --> 05:25.720
That's unlikely to be so for this instruction.

05:25.720 --> 05:29.720
But for other instructions you may see, oh, that's a pattern I can fit it in somewhere else.

05:29.720 --> 05:31.720
So you get a bit of code generation coming out.

05:31.720 --> 05:33.720
And they're generally available to be optimized.

05:33.720 --> 05:36.720
And we'll look at an example of that in a moment.

05:36.720 --> 05:41.720
So let's look at implementing a simple built-in.

05:41.720 --> 05:45.720
So ELW is event load word, not external load.

05:45.720 --> 05:46.720
Event load word.

05:46.720 --> 05:50.720
So it's a synchronization load that you use to synchronize multiple cores in a multi-core system.

05:50.720 --> 05:52.720
So you'd never generate it from C.

05:52.720 --> 05:55.720
So you need to have a built-in function to generate it.

05:55.720 --> 05:59.720
And it looks like a typical load function.

05:59.720 --> 06:05.720
You've got a destination register and a constant index on a base register.

06:05.720 --> 06:10.720
And here's how you'd write that built-in in the source code.

06:10.720 --> 06:17.720
You'd say, I'd call __builtin_riscv_cv_elw_elw and then the address, which is presumably a semaphore.
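
NOTE
A hedged sketch of how the call might look in source code. The built-in name follows the naming convention described in the talk; HAVE_CV_ELW and wait_on_semaphore are hypothetical names invented for this note, and the fallback branch exists only so the sketch also compiles and runs on a host without the CORE-V tool chain.
```c
#include <stdint.h>
/* HAVE_CV_ELW is a hypothetical macro: define it only when building with a
   CORE-V tool chain that has the ELW extension enabled. */
static uint32_t wait_on_semaphore(volatile uint32_t *sem)
{
#ifdef HAVE_CV_ELW
    /* Takes a void pointer and returns a 32-bit value, as described in the talk. */
    return __builtin_riscv_cv_elw_elw((void *)sem);
#else
    /* Host fallback: an ordinary volatile load, with no synchronization semantics. */
    return *sem;
#endif
}
```
On a host build the function is just a load; only the CORE-V build would emit the event load instruction.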

06:17.720 --> 06:18.720
Okay.

06:18.720 --> 06:23.720
And with -O0, it would generate that code.

06:23.720 --> 06:26.720
You may as well have used inline assembly.

06:27.720 --> 06:32.720
However, if I turn on -O2, the compiler understands what's going on in the patterns.

06:32.720 --> 06:36.720
It's able to optimize around it and give me much more efficient code.

06:36.720 --> 06:41.720
If I was doing this with inline assembly, it would struggle to do that.

06:41.720 --> 06:46.720
Naming built-ins, there's a convention for naming them.

06:46.720 --> 06:51.720
If you're doing for your architecture something that's already used as a standard built-in,

06:51.720 --> 06:53.720
across all architectures, use that.

06:53.720 --> 06:59.720
But typically, you name it in an architecture specific way to avoid confusing things.

06:59.720 --> 07:04.720
So for RISC-V, the convention is __builtin_riscv_ and then the vendor name.

07:04.720 --> 07:06.720
So in that case, cv for CORE-V.

07:06.720 --> 07:11.720
And then, in our case, the ISA extension, because we've got multiple ISA extensions, and then the name.

07:11.720 --> 07:18.720
So hence elw_elw, because the ISA extension is ELW and the instruction is ELW.
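
NOTE
The naming convention can be sketched as a simple string format. This is only an illustration of the convention as described in the talk; riscv_builtin_name is a helper invented for this note and is not part of GCC.
```c
#include <stdio.h>
/* Builds "__builtin_riscv_<vendor>_<extension>_<name>" into buf.
   Returns the number of characters the full name needs (as snprintf does). */
static int riscv_builtin_name(char *buf, size_t len, const char *vendor,
                              const char *extension, const char *name)
{
    return snprintf(buf, len, "__builtin_riscv_%s_%s_%s", vendor, extension, name);
}
```
With vendor "cv", extension "elw" and name "elw" this produces __builtin_riscv_cv_elw_elw, the built-in used in the example.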

07:18.720 --> 07:26.720
And there we are, and specification is that, and there's the example we saw earlier.

07:26.720 --> 07:32.720
So, let's go and look inside riscv-builtins.cc.

07:32.720 --> 07:38.720
And we've got a macro called RISCV_BUILTIN, which is sitting on top of some standard stuff here.

07:38.720 --> 07:44.720
And we're going to tell it, the pattern in the machine description file,

07:44.720 --> 07:46.720
and we'll come to that in a moment.

07:46.720 --> 07:50.720
The name of the built-in, which we've already talked about, and the built-in type:

07:50.720 --> 07:59.720
is it a built-in direct, or there's a whole load of them there.

07:59.720 --> 08:01.720
You can have there.

08:01.720 --> 08:03.720
And the return type and argument types.

08:03.720 --> 08:06.720
So in this case, it's a void return.

08:06.720 --> 08:07.720
No, it isn't a void.

08:07.720 --> 08:10.720
It returns a value, but it takes a void pointer argument.

08:10.720 --> 08:12.720
Okay, and the name of the availability predicate.

08:12.720 --> 08:18.720
That's because this instruction is only available with 32-bit RISC-V, not 64-bit.

08:18.720 --> 08:21.720
So you have a general availability, is it available?

08:21.720 --> 08:25.720
So, that's what we've got there.

08:25.720 --> 08:27.720
This is a direct built-in.

08:27.720 --> 08:28.720
Okay?

08:28.720 --> 08:31.720
Our insn is the important one, and this is what I'm going to focus on.

08:31.720 --> 08:33.720
This is the instruction pattern.

08:33.720 --> 08:37.720
This is how the whole of GCC is driven through these patterns,

08:37.720 --> 08:41.720
written in a Lisp-like language.

08:41.720 --> 08:44.720
And you basically have define_insn, where you define an instruction,

08:44.720 --> 08:46.720
say what the name of that instruction is.

08:46.720 --> 08:48.720
That's the name we put in the macro earlier.

08:48.720 --> 08:53.720
An RTL template that tells it what it's looking for and trying to fit to; any conditions;

08:53.720 --> 08:57.720
An output template to tell it how to generate the assembler and any attributes.

08:57.720 --> 09:04.720
And they are, for CORE-V, inside the riscv config, in corev.md, .md for machine description.

09:04.720 --> 09:07.720
So here we've got here's the event load word built-in.

09:07.720 --> 09:10.720
Let's take you through the various bits there.

09:10.720 --> 09:11.720
Okay?

09:11.720 --> 09:13.720
What happened there?

09:13.720 --> 09:14.720
There we go there.

09:14.720 --> 09:15.720
There we go.

09:15.720 --> 09:16.720
But the name.

09:16.720 --> 09:17.720
So there's the name.

09:17.720 --> 09:20.720
There's the RTL template, we'll come back to that in a minute.

09:20.720 --> 09:22.720
There's a condition.

09:22.720 --> 09:26.720
So that extension had better be enabled, and it's not 64-bit.

09:26.720 --> 09:28.720
There's the output template.

09:28.720 --> 09:30.720
That does look a bit like an assembly instruction.

09:30.720 --> 09:32.720
And there's the attributes at the bottom.

09:32.720 --> 09:34.720
So let's go into a bit more detail.

09:34.720 --> 09:36.720
The RTL template.

09:36.720 --> 09:37.720
Okay?

09:37.720 --> 09:42.720
So this is saying, this is logically what's going on.

09:42.720 --> 09:43.720
It's a set instruction.

09:43.720 --> 09:45.720
I'm setting something from memory.

09:45.720 --> 09:46.720
Okay?

09:46.720 --> 09:50.720
It's, and I'm trying to match the different operands.

09:50.720 --> 09:54.720
So the first one I've got is set something where the address is.

09:54.720 --> 09:59.720
Something to store into, and then what you want to store there.

09:59.720 --> 10:00.720
Feel it.

10:00.720 --> 10:02.720
Go through and look a bit.

10:02.720 --> 10:03.720
Okay?

10:03.720 --> 10:07.720
So first of all, this is where we're saying where we want to store.

10:07.720 --> 10:08.720
Okay?

10:08.720 --> 10:11.720
We're saying set, and the thing I'm going to set, it's SI, single integer.

10:11.720 --> 10:12.720
It's 32 bits wide.

10:12.720 --> 10:14.720
SI is what GCC uses.

10:14.720 --> 10:15.720
It's operand zero.

10:15.720 --> 10:17.720
They count from zero.

10:17.720 --> 10:20.720
It's a register_operand, because it has to be a register.

10:20.720 --> 10:22.720
And it's "=r"; what does that mean?

10:22.720 --> 10:25.720
The r is it's a register operand; the equals before it says it's being written.

10:25.720 --> 10:26.720
Okay?

10:26.720 --> 10:27.720
Okay?

10:27.720 --> 10:34.720
And then we'll come back to the aspect of volatile in a minute.

10:34.720 --> 10:38.720
But then we're talking about where are we going to get the second operand.

10:38.720 --> 10:39.720
So it's operand one.

10:39.720 --> 10:41.720
It's 32 bits.

10:41.720 --> 10:42.720
It's an integer, but it's an address.

10:42.720 --> 10:45.720
Remember we showed it was indexed on a base register.

10:45.720 --> 10:48.720
And it's type "p", and that's for a pointer.

10:48.720 --> 10:49.720
P for a pointer.

10:49.720 --> 10:50.720
Okay?

10:50.720 --> 10:54.720
And the use of that mem in front of SI says dereference it.

10:54.720 --> 10:56.720
It's an rvalue, not an lvalue.

10:56.720 --> 11:02.720
And lastly, we've got, let's come back to this, unspec_volatile.

11:02.720 --> 11:06.720
unspec_volatile indicates a side effect.

11:06.720 --> 11:10.720
Now, of course, in this case the side effect is to do with synchronizing architectures.

11:10.720 --> 11:11.720
Okay?

11:11.720 --> 11:14.720
And we put at the end of it a bit more detail.

11:14.720 --> 11:19.720
And that just allows us to divide these up for particular architectures.

11:19.720 --> 11:20.720
Okay?

11:20.720 --> 11:24.720
There's no real attempt, because it has side effects.

11:24.720 --> 11:30.720
GCC won't try and say, oh, I've spotted this sort of pattern in general code, and put it in there.

11:30.720 --> 11:31.720
Okay?

11:31.720 --> 11:33.720
So now let's have a look at the condition.

11:33.720 --> 11:34.720
That's very simple.

11:34.720 --> 11:37.720
The condition just says, I've got to have this extension enabled.

11:37.720 --> 11:40.720
And I'm not interested in 64 bit architectures.

11:40.720 --> 11:43.720
That's just a C expression.
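
NOTE
The condition described here is an ordinary boolean C expression. A sketch of its shape, where struct target, its fields and elw_available are hypothetical stand-ins invented for this note rather than GCC's own names:
```c
#include <stdbool.h>
/* Hypothetical stand-ins for the target flags that the real condition tests. */
struct target {
    bool elw_extension;  /* is the ELW ISA extension enabled? */
    bool is_64bit;       /* are we compiling for a 64-bit target? */
};
/* The pattern applies only if the extension is on and the target is not 64-bit. */
static bool elw_available(const struct target *t)
{
    return t->elw_extension && !t->is_64bit;
}
```
The same shape of test is what the availability predicate discussed later checks.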

11:43.720 --> 11:44.720
Okay?

11:44.720 --> 11:46.720
Those are #defined for RISC-V.

11:46.720 --> 11:47.720
Okay?

11:47.720 --> 11:51.720
And then we come to, oh.

11:51.720 --> 11:53.720
The output template.

11:53.720 --> 11:54.720
Okay?

11:54.720 --> 11:59.720
And that's just a string, or a fragment of code returning a string.

11:59.720 --> 12:03.720
And the operands, remember, we numbered those operands: nought for the thing I was setting,

12:03.720 --> 12:05.720
One for the actual address.

12:05.720 --> 12:06.720
I was getting it from.

12:06.720 --> 12:08.720
Those are referred to with percent.

12:08.720 --> 12:09.720
Okay?

12:09.720 --> 12:10.720
So percent zero.

12:10.720 --> 12:13.720
That's the thing I'm trying to write to.

12:13.720 --> 12:15.720
And percent one.

12:15.720 --> 12:16.720
That's the operand.

12:16.720 --> 12:19.720
But I've qualified it with an a; that says it's an address operand.

12:19.720 --> 12:22.720
So build something that looks like an address operand.

12:23.720 --> 12:24.720
Okay?

12:25.720 --> 12:29.720
And then the attributes; they're mostly for other effects.

12:29.720 --> 12:32.720
You don't really need to worry about the attributes.

12:32.720 --> 12:34.720
I've put some places.

12:34.720 --> 12:38.720
The mode is SI, so 32-bit behavior, and it's a load type.

12:38.720 --> 12:43.720
Honestly, for an initial built-in you don't have to worry about that.

12:45.720 --> 12:50.720
So they are more for optimization passes, which we're not going to go into today.

12:50.720 --> 12:55.560
So, and then we come back and you see we put everything together and that's where we started

12:55.560 --> 12:56.560
off.

12:56.560 --> 13:01.960
That's the macro we call, and it's referring to this insn and the associated instruction.

13:01.960 --> 13:07.040
So, as I say, the two big ones are RISCV_BUILTIN_DIRECT and RISCV_BUILTIN_DIRECT_NO_TARGET.

13:07.040 --> 13:12.760
Direct no target is effectively for void functions.

13:12.760 --> 13:22.360
So, the return and argument types, the built in.

13:22.360 --> 13:27.320
So, these are built up by a complex set of macros where what you're trying to do is just

13:27.320 --> 13:31.840
get a set of macros that generate to say what the type of each argument is and they construct.

13:31.840 --> 13:42.160
So, if you look here, if we can go back, yeah, we're trying to get these function type

13:42.240 --> 13:44.760
is the return type and arguments, okay.

13:44.760 --> 13:53.360
So, if we go on through, there we are, you'll see we've defined a type here which is, we've

13:53.360 --> 14:01.280
got types here for having unsigned single integer, then a void return and unsigned integer,

14:01.280 --> 14:04.680
unsigned integer, void pointer and so forth.

14:04.760 --> 14:15.400
So, lastly, we talked about the availability predicate and we've got some examples here,

14:15.400 --> 14:21.360
cvelw, which is the target extension, and it's not 64-bit.

14:21.360 --> 14:23.720
So, we end up there.

14:23.720 --> 14:32.120
So, here is the constructed type, the constructed type is it's USI, unsigned 32 bit and the

14:32.120 --> 14:39.120
function type is it takes, so that's the return and it takes a void pointer as argument.

14:39.120 --> 14:43.480
It's rather clunky, but it's sort of living within the framework we've got in GCC.

14:43.480 --> 14:49.040
All you're trying to say is it returns a 32-bit integer and it takes a void pointer.

14:49.040 --> 14:55.080
And the predicate, we've put there cvelw; that's the check that it's got the ISA extension enabled

14:55.080 --> 14:57.280
and that it's not 64 bit.

14:57.960 --> 15:06.200
So, that's actually all you need to do, and all this stuff is freely available in the CORE-V upstream.

15:06.200 --> 15:12.680
And the biggest problem there is going to be getting the hang of the patterns that you're putting together.

15:12.680 --> 15:16.680
Let's show you a bit more about those patterns.

15:16.680 --> 15:25.280
What happens here? I've got two instructions in the assembler, okay, and what they do

15:25.280 --> 15:28.800
is scalar add across a SIMD register, okay.

15:28.800 --> 15:38.040
One where it's add, you've got RS1, it's a SIMD, so potentially four bytes and you can

15:38.040 --> 15:42.560
either add a constant from RS2 or you can add an immediate constant, so two instructions.
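
NOTE
A plain C model of a scalar add across byte lanes, to make the operation concrete. It assumes four 8-bit lanes in a 32-bit word and that each lane wraps independently; those semantics, and the name add_scalar_bytes, are assumptions made for this note and are not taken from the CORE-V specification.
```c
#include <stdint.h>
/* Add scalar s to each of the four byte lanes of v, with no carry between lanes. */
static uint32_t add_scalar_bytes(uint32_t v, uint8_t s)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint8_t b = (uint8_t)(v >> (8 * lane));            /* extract one lane */
        out |= (uint32_t)(uint8_t)(b + s) << (8 * lane);   /* wrap within the lane */
    }
    return out;
}
```
The two hardware instructions differ only in where s comes from: a register, or an immediate encoded in the instruction.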

15:42.560 --> 15:49.920
But we don't have two built-ins, we've got a single built-in,

15:49.960 --> 15:55.200
which says here's the thing I want to add using this built in and here's this argument.

15:55.200 --> 16:00.400
Now, is that something I need to put in a register or is it something that I can do as an

16:00.400 --> 16:01.400
immediate?

16:01.400 --> 16:05.040
And the answer is if you call this built in, remember it's not really a function, I can look

16:05.040 --> 16:06.880
at exactly what's being put there.

16:06.880 --> 16:12.680
If you give me a constant argument, that will fit into a six bit signed integer, then I'll

16:12.680 --> 16:16.280
generate the second instruction, but if it's not, I'll put it in a register and use the

16:16.280 --> 16:19.880
first instruction, okay.

16:19.880 --> 16:22.760
So how do we do that?

16:22.760 --> 16:26.160
Well, the answer is we've got the same defining the built in, let's skip over that.

16:26.160 --> 16:30.920
But the answer is in the pattern here, because you'll see the register operand, now I've

16:30.920 --> 16:35.240
got not just a register, but a register comma register, okay.

16:35.240 --> 16:42.720
So actually, I'm going to specify a pair here and then on the operand, I've said the

16:42.720 --> 16:48.360
first operand, operand one, that's a register register, so that's my input.

16:48.360 --> 16:58.880
And then my constant is either a six-bit constant that I can use, and CV6 is a constraint

16:58.880 --> 17:04.280
spec, I've done it because there isn't a standard one for it, or it's a register, okay.

17:04.280 --> 17:07.520
And this pattern matcher will look at the code as it comes through as it's going through

17:07.520 --> 17:16.240
the compiler and say, oh, does it match a six-bit constant for that second operand, if so,

17:16.240 --> 17:20.600
I will treat this as though it's R, R, CV6, and if it doesn't, I'm going to have to put

17:20.600 --> 17:23.680
in, I'll treat it as R, R, R, okay.

17:23.680 --> 17:27.920
And then going down when we look at the pattern, you'll see that we've got two patterns

17:27.920 --> 17:28.920
there.

17:28.920 --> 17:33.800
If I match the first of the pairs, I'll take the first pattern, and if I match the second

17:33.800 --> 17:37.600
of the pairs, I'll take the second pattern. So the first pair will give me the constant

17:37.600 --> 17:41.920
version, the second pair will give me the load-into-a-register, or register,

17:41.920 --> 17:42.920
version.

17:42.920 --> 17:43.920
Okay.

17:43.920 --> 17:46.920
So, as I say, there we are, let's go through.

17:46.920 --> 17:48.920
No, sorry.

17:48.920 --> 17:51.920
So, there we are.

17:51.920 --> 17:57.480
Now, you'll notice here, I don't like, I need to say something, well, this could be a six-bit

17:57.480 --> 17:58.480
integer or a register.

17:58.480 --> 18:02.760
So, I need a special thing to say, is it a six-bit, it can be either, that's in the

18:02.760 --> 18:05.560
success operand, and as I've got the CV6.

18:05.560 --> 18:12.320
So, what we find is in the success, so it's either a constant or it's a register, and

18:12.320 --> 18:16.920
the constant we define it as, actually, it's got to be a constant integer, and it's got

18:16.920 --> 18:17.920
to be in that range.

18:17.920 --> 18:23.360
So that's how you write a custom constraint there.
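
NOTE
The range test behind a six-bit signed constant can be sketched in plain C. A six-bit signed integer covers -32 to 31 inclusive; the function names below are invented for this note, and the real check lives in the machine description rather than in user code.
```c
#include <stdbool.h>
/* True if v can be encoded as a six-bit signed immediate (-32..31 inclusive). */
static bool fits_signed_6bit(long v)
{
    return v >= -32 && v <= 31;
}
/* Mirrors the choice described in the talk: use the immediate form only when the
   argument is a compile-time constant in range, otherwise use the register form. */
static const char *choose_form(bool is_constant, long v)
{
    return (is_constant && fits_signed_6bit(v)) ? "immediate" : "register";
}
```
So a constant argument of 28 selects the immediate form, while a variable, or a constant such as 100, selects the register form.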

18:23.360 --> 18:30.280
And, as I say, the output template, there's two, so the first one for the first set

18:30.280 --> 18:35.120
of things, and the second for the second set of constraints.

18:35.120 --> 18:40.080
And so, here's an example here, the first one here, I've got a variable x.

18:40.080 --> 18:44.160
It's got to generate the register version of the operand, okay.

18:44.160 --> 18:48.760
The second version, I've just given it a constant argument, and it goes, oh, that's 28,

18:48.760 --> 18:54.720
that fits in a CV6, a six-bit signed constant, so I can generate the six-bit here.

18:54.720 --> 18:59.160
And remember, this is sitting in the infrastructure of GCC; I didn't have to do anything about

18:59.160 --> 19:01.320
telling it, put the value in registers.

19:01.320 --> 19:04.600
It knows it's got to do that; the register allocator will sort that all out.

19:04.600 --> 19:09.520
So putting it in a register all happens for me automatically.

19:09.520 --> 19:20.440
Reusing arguments: the multiply-accumulate instruction actually uses its first operand;

19:20.440 --> 19:24.280
it's both the destination and one of the operands, because it's doing a multiply,

19:24.280 --> 19:25.760
it's accumulating.

19:25.760 --> 19:33.440
So you notice here, in this example here, oh, sorry, I've got operand 0, operand 1, the second

19:33.440 --> 19:38.680
operand 2, because it's a three-operand instruction, the third operand, though, you'll notice

19:38.680 --> 19:44.400
its type is not anything, it's just 0, it's the same as operand 0, and that's how you specify

19:44.400 --> 19:46.400
duplication.

19:46.400 --> 19:52.280
And then the pattern is 0, 1, 2, and the actual first operand, in that instruction,

19:52.280 --> 19:58.840
is used, both to supply the base thing you're adding to and is the result.
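
NOTE
GCC's extended inline asm uses the same digit notation for a matching constraint, so the idea of tying an input to operand 0 can be tried on any host with GCC or Clang. This is only an analogy to the machine description pattern in the talk: the asm body is empty, the multiply-accumulate is modelled in plain C, and mac_model is a name invented for this note.
```c
#include <stdint.h>
static int32_t mac_model(int32_t acc, int32_t a, int32_t b)
{
    int32_t d;
    /* "0" ties this input to output operand 0, so acc and d share one register,
       just as the accumulator is both a source and the destination. */
    __asm__("" : "=r"(d) : "0"(acc));
    return d + a * b;   /* the multiply-accumulate itself, modelled in C */
}
```
For example, mac_model(10, 3, 4) accumulates 3 * 4 onto 10.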

19:58.840 --> 20:01.600
Okay, so what next?

20:01.600 --> 20:06.400
So if you want to look at this, because in 25 minutes you have to go through quite fast,

20:06.400 --> 20:09.120
I've given you some basic things.

20:09.120 --> 20:12.880
The source code's there in the OpenHW Group Git repositories; much of that's now

20:12.880 --> 20:16.080
going upstream, so you can find it in upstream.

20:16.080 --> 20:19.560
The OpenHW Group, which is now the OpenHW Foundation, but that should still

20:19.560 --> 20:23.400
redirect, because it's become a part of Eclipse this month.

20:23.400 --> 20:26.120
And lastly, this is your Bible.

20:26.120 --> 20:32.920
All of this is in the GCC internals manual, and that will tell you how to do it all.

20:32.920 --> 20:37.880
Getting involved: as a user, you can just download the tool chains and use them. As a developer,

20:37.880 --> 20:42.320
if you want to work with the OpenHW Group, you can join the tools channel; there's a monthly

20:42.320 --> 20:48.720
training meeting for engineers, and you can submit your patches against the development

20:48.720 --> 20:52.200
branch, if it's out of tree, some of it's now upstream, you can then just put them in

20:52.200 --> 20:55.480
as ordinary GCC patches.

20:55.480 --> 21:01.920
And lastly, acknowledgements, I didn't do this all on my own, there's a team behind me who's

21:01.920 --> 21:02.920
done this.

21:02.920 --> 21:08.600
There is something very unusual about this slide, can anyone spot what it is?

21:08.600 --> 21:16.240
Well, a Charlie might be upset about that, but you're quite right, yeah, Charlie is actually

21:16.240 --> 21:19.680
not a girl Charlie, it's a boy Charlie, but yeah, exactly. So the team that

21:19.680 --> 21:23.160
worked on this, as it happened, they weren't chosen for this purpose, but they happened

21:23.160 --> 21:31.200
to be almost all not male, so thank you very much, and I will take any questions.

21:31.200 --> 21:32.200
Yes.

21:32.200 --> 21:38.200
If I have a hypothetical gaming console that has a floating-point dot product approximation,

21:38.200 --> 21:39.200
yes.

21:39.200 --> 21:40.200
What?

21:40.200 --> 21:45.200
It takes them from fixed registers with certain numbers; would that be a problem?

21:45.200 --> 21:52.200
So, if you have a hypothetical gaming console that had a floating-point instruction,

21:52.200 --> 21:58.000
I can't remember what the instruction was, dot product instruction, but actually the

21:58.000 --> 22:04.720
operands had to be in specific registers, that shouldn't be a problem, because if you look

22:04.720 --> 22:12.720
back here, when I did my custom constraints, where did I get it here, the CV6, now there

22:12.720 --> 22:17.280
I was constraining a register or a particular constant, I think I will look round because

22:17.280 --> 22:20.880
people here know this probably better than me, you could write your own constraint

22:20.880 --> 22:25.600
to say what the registers were, the permitted registers were there, and hopefully the

22:25.600 --> 22:29.720
register allocator would then try and get your operands into the right place, so you didn't

22:29.720 --> 22:33.360
actually end up with a blow-up on the register allocation, but yeah, you should just be able

22:33.360 --> 22:34.360
to constrain.

22:34.360 --> 22:35.360
Yeah.

22:35.360 --> 22:40.200
You could also force it into an actual register directly.

22:40.200 --> 22:47.200
So, you can just say, it only works in this register and just say, it's that register.

22:47.200 --> 22:54.200
No, it's not, yeah, it's nice if you can let the compiler do all the hard work, yeah.

22:54.200 --> 22:56.200
Here, over here.

22:56.200 --> 23:02.760
For the same dot product instruction, how would you be able to say that the result register

23:02.760 --> 23:16.600
can not be used for the next 15 cycles or something like that, so the question is,

23:16.600 --> 23:21.160
the dot product instruction, the hypothetical we've just heard about, how would you then extend

23:21.160 --> 23:24.680
the pattern to say, and by the way, having used it, you cannot use this register again

23:24.680 --> 23:26.680
for another 15 cycles.

23:27.000 --> 23:31.000
You can make a scheduling description for that.

23:31.000 --> 23:35.720
Right, so it can be done, and the answer is, you need to make a scheduling description, and

23:35.720 --> 23:41.640
then we get into the whole question of how GCC does scheduling, and that's in, that's

23:41.640 --> 23:49.720
in the next tutorial about about 27, but there are people in this room who can point

23:49.720 --> 23:50.720
you in the right direction.

23:51.600 --> 23:55.600
Well, I'll get the door, right?

23:59.600 --> 24:02.080
Any more questions?

24:02.080 --> 24:10.080
Yes, I did have another one, about whether the built-in is used: if it's an approximation,

24:10.080 --> 24:16.320
do you typically have to have fast math enabled for, if you choose the built-in, and you

24:16.320 --> 24:20.880
don't have fast math, would you expect it to use an approximate built-in?

24:20.880 --> 24:26.320
So the built-in is, so the question is, are there constraints that say you must have something

24:26.320 --> 24:32.480
for the built-in to be run, and if you look at the built-in here, when I call the built-in

24:32.480 --> 24:37.440
it's a particular name; if I say I want to use that built-in, that built-in will be used.

24:37.440 --> 24:42.480
If you've got an exact version and an approximate version, then either it's a very complex

24:42.480 --> 24:46.720
built-in that maybe is able to work out which to use, a bit like I worked out whether to do

24:46.720 --> 24:51.920
the constant or the register, which will be run possibly, or you have two built-ins built-in

24:51.920 --> 24:59.360
exact and built-in approximate. You could do it either way round, I think it would be there.

25:00.400 --> 25:02.560
Any other questions?

25:02.560 --> 25:08.880
How would you go about implementing not an instruction-level built-in, but something that affects

25:09.200 --> 25:16.560
the more the language for that, so you've got to do a constant feature that's a special point of time?

25:17.600 --> 25:21.200
All right, I think those are probably, so the question is, how would you go about doing something

25:21.200 --> 25:25.920
to do much higher level in a built-in, perhaps looking at the front end and the different

25:25.920 --> 25:31.760
function type? I think you're out of built-in technology there, but you can have built-ins that are

25:31.840 --> 25:39.120
quite high level, that may be quite complex instruction generation. I think if you're looking at

25:39.120 --> 25:43.120
changing the type system, you're probably looking at other parts of the compiler,

25:43.120 --> 25:52.240
you're probably not doing it with a built-in, that would be my, and very likely attributes are

25:52.240 --> 25:55.040
probably your friend in this case, okay?

25:55.040 --> 26:05.040
Yeah, yeah, it's a fair comment, it is a front-end thing, and built-ins are a back-end technology,

26:05.040 --> 26:07.040
okay?

26:07.040 --> 26:13.600
So you want global pointers that are, have the addresses included because of the relative

26:13.600 --> 26:19.600
of addresses and so, absolutely addresses, which are definitely also needed to touch something

26:19.600 --> 26:24.880
in a little bit. So you want global pointers, but how you generate them depends?

26:26.160 --> 26:30.400
Position-independent pointers. Position-independent pointers, I think that sounds like using

26:31.280 --> 26:37.360
type constraints, all the stuff that's in the embedded extensions to C,

26:38.320 --> 26:42.240
sort of helps you in that space, and that's sort of all front-end stuff. I don't think the

26:42.240 --> 26:47.120
built-ins, built-ins may then use that when it percolates through to choose what code they generate,

26:47.120 --> 26:52.000
but I don't think you can do it just with built-ins. And I've run out of time. Thank you all very much.

26:52.960 --> 27:03.120
These tutorials get improved regularly, so don't hesitate to send me email and say,

27:03.120 --> 27:07.280
you could make it better by doing something, and then next time I give it, it will get better.

