Memory Safety by Design: How Emerging Hardware Blocks Bugs
Hardware for IT/OT Runs C Code, Blocks Exploitation of Numerous Vulnerability Types

Mathew J. Schwartz (euroinfosec) • March 18, 2024

What if the world had access to memory-safe hardware designed to run existing C code and to block numerous types of vulnerabilities from being exploited?
Enter Britain's Digital Security by Design, sponsored by government-funded UK Research and Innovation and backed by many organizations - including chip designer Arm, the University of Cambridge, Google, Microsoft and the National Cyber Security Centre.
"We're changing the instructions that a computer uses to run software," said John Goodacre, the director of DSbD, who is also a professor of computer architectures at the University of Manchester.
The initiative includes building and delivering memory-safe hardware that uses CHERI, or Capability Hardware Enhanced RISC Instructions, and is designed to prevent unsafe C or C++ patterns from running and to mitigate security vulnerabilities. The initiative also includes developing training for the hardware and maturing the required tools.
"We are now starting to see the first commercial adoptions - mostly in the operational technology rather than the IT area," Goodacre said, adding that existing development tools are being upgraded. As a result, they can be used not just to compile the code but also to debug existing code that runs on non-memory-safe hardware. Debuting soon: A "Tested on Morello" program will enable developers to take their code, compile it for the platform, run it on this type of computer and "it will then feed back where your problems are, so you then deploy that less buggy software on the traditional systems," he said.
In this video interview with Information Security Media Group, Goodacre also discussed:
- Immediate applications for memory-safe hardware in IT and OT environments;
- How memory safety reduces the attack surface - potentially by up to 70% - while adding performance improvements and increased developer productivity, likely for little or no additional cost;
- How memory-safe hardware complements the use of memory-safe programming languages.
Goodacre spent 17 years as director of technology and systems at Arm, where he defined and introduced the first multicore processors and other widely deployed technologies. His research interests include new processing paradigms, web-scale servers, exascale-efficient systems, and secure and ubiquitous computing.
Transcript
This transcript has been edited for clarity.
Mathew Schwartz: How can we make hardware and software harder to hack? Hi, I'm Mathew Schwartz, executive editor with Information Security Media Group. Joining me to discuss this question is John Goodacre, a professor of computer architectures at the University of Manchester. He's also director of the Digital Security by Design Challenge, sponsored by UK Research and Innovation. Thank you so much for being here today.
John Goodacre: Thanks for having me on today.
Mathew Schwartz: I'm excited because we're going to be talking about some really interesting-sounding initiatives revolving around the fact that so many of the attacks we see today against digital systems involve exploitable vulnerabilities. Theoretically, I suppose, any software we code now or in the future - and definitely in the past - is always going to have bugs. I know that to help combat this, the Digital Security by Design Challenge is testing the use of purpose-built microprocessors designed to prevent and mitigate memory safety issues, such as we see in C and C++. That is a very long-winded intro to try to get some of the baseline discussion points down. And so my open question for you, if it's okay, would be: What is required to re-architect existing code to work with these new processors that you've been pursuing or experimenting with?
John Goodacre: Yes, so let me just back up a little bit, because what is actually changing here is the definition of how a computer - how its processor - actually runs software. Now, obviously, there's lots of ways of running software, but there's a dominant one that has the trillions and trillions of lines of existing code written against it at the moment. So if you want to make how computers run software more secure, so they can't be exploited, etc., you can't throw all that software away. That's part of the design characteristics - and the technology we're talking about was researched in a program called CHERI at the University of Cambridge, if people want to search for CHERI.
Mathew Schwartz: C-H-E-R-I.
John Goodacre: Exactly, that's the one. So the CHERI architecture, what it does is it looks at the existing architecture and the existing way that software runs on a computer, and then modifies it for minimal impact to the software itself. So, for example, with well-written code - and I'll come back to what I mean by that in a moment - so well-written C code, it's actually just a recompilation. You just recompile it. In other words, rather than saying "run on CPU type one," you say "run on CPU type two." And there you go, you've got memory-safe C code. So there's lots of talk about writing in memory-safe languages. But clearly - I think they estimate that as of 2020, there were just over 2 trillion lines of C code in active use. You're not going to be rewriting that anytime soon.
If the minimal entry point is just recompilation, life is good. Now, as you mentioned in your introduction, unfortunately, there's no such thing as perfect software. And we could probably spend all of our allocated time on what that means. But in essence, even working code can have issues in the way that it's been written, and those do need changing. Just to give a data point: We had a small company work with one of our programs, and their tech challenge was, I'm going to make the whole stack of an operating system called BSD - it's like Linux, a slightly different version - from the kernel all the way up to a full graphical desktop with a browser, memory safe. And this one engineer did it in three months. So if you think that that's 10,000, 20,000, 30,000 packages of software that one engineer could convert - basically a fraction of a percent of the code needed changing, undoing issues that were inherent in the code anyway, which in some cases were actually exploitable vulnerable bugs in the shipping software - that's the entry point.

Now, the technology also has the ability to bring some performance and finer-grained protections as well. And at that stage, the architect will be going: I'm using processes at the moment; they're very heavyweight, they really slow my system down, so I can only have so many. At that point, they're having to refactor or reprogram the architecture of their application design. So: simplest case, recompile good code, it's now protected.
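To make the "just recompile" point concrete, here is a minimal sketch in plain C - not code from the DSbD program, just an illustration of the class of bug being discussed. The source is identical for both targets; only the compilation target changes. The trap behavior noted in the comments reflects how CheriBSD is documented to handle capability violations; other CHERI platforms may report the fault differently.

```c
/* A classic heap overflow. Compiled for a conventional CPU, the
 * out-of-bounds write silently corrupts whatever follows the
 * allocation. Recompiled unchanged for a CHERI target, the pointer
 * returned by malloc is a capability bounded to the 8 bytes
 * requested, so the same write faults at the instruction that goes
 * out of bounds (on CheriBSD, the process receives SIGPROT). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *name = malloc(8);
    if (name == NULL)
        return 1;

    /* 13 bytes (including the NUL) into an 8-byte allocation. */
    strcpy(name, "AAAAAAAAAAAA");

    puts(name); /* never reached on CHERI hardware */
    free(name);
    return 0;
}
```

Instead of a silent corruption that surfaces much later, the failure happens predictably at the faulting instruction, which is also what makes such bugs faster to find and fix.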
The more challenging case - and this is where, obviously, we've been investing in a lot of projects - is: Where can you change the code to actually see the performance benefits and the higher levels of protection, beyond just memory safety?
Mathew Schwartz: So you do need to have access, for your ideal scenario here, to an actively maintained code base. You might not be able to grab your COBOL or something if it's deprecated?
John Goodacre: That's right. If you're running a system that's got what you would call legacy binaries on it, those will still run; this system is backwards compatible. But the things around it may be protected. The other thing that we're seeing people doing is putting that code into boxes. That legacy code is put in a protected box, so even if there's something unpleasant in it, it can't get out. So there are additional protections that such systems can have. But to have the protection within the code of the application itself, it is a developer activity: deciding, one, to run a platform that's got this technology in it, and then recompiling the code to use it.
Mathew Schwartz: You also mentioned the ability to do some more sophisticated types of things. As I understand it - and I might get the terminology wrong here - the processors that you're working with give you the ability to put certain kinds of functionality, or perhaps certain kinds of data, into more protected sections of the processor?
John Goodacre: Yes, that's one way of looking at it. So what we're talking about is not really a change to a specific processor - it's not like, oh, the processor in my phone is going to have this. What we're doing is changing the instructions that a computer uses to run software. In essence, today there are really only three such architectures: the Arm architecture that a lot of people know from their mobile phones and things, the x86 architecture that people see in their desktops and things, and the evolving architecture called RISC-V. So if you change those three architectures, then all the chips change. It's not a specific processor; it's more: change the architecture, and then the implementations follow for all chips.
Going back to the specific question, then: Yes, today, really what you've got is, if you have a vulnerability in your code - and this is what we've seen all the time; I think this week there were some really serious ones in VMware, and out of the four, three of them were of a class that would have been stopped and would not exist if the software had been running on this technology. Those were memory safety issues. Now, the other one was where there's a problem in the design - an architectural problem or a design implementation issue - where one flaw basically lets an attacker go and do something with much larger influence.
We call it the blast radius. So if an attacker has a single entry point today, they can really get across pretty much all of your computer and potentially your network, because they have that one entry point.
The new capability is called compartmentalization. You can compartmentalize today: You can put things in separate processes, which helps, or you can split user space and kernel space - but in that case there's only two compartments, or three with a virtual machine. In essence, there are not many compartments.
Now imagine if we look at something like the last slew of bugs that we've had on our mobile phones, where somebody could send you an image in an advert or in a text message, and if you ever view the image, they exploit a vulnerability in the image decoder, and bang - they've got through to the root of your machine and you're in a malware situation. For those entry points, what you can do is start putting boundaries around little functions of code. If you don't trust this third-party code quite as much as you might want to, you can put it in a box. And the beauty of that is, you can do it without paying the performance cost of today's isolation. You can actually gain performance where people are currently putting boundaries in via processes or virtual machines.
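To illustrate the shape of that "box," here is a hedged C sketch. The compartment_load and compartment_call functions are hypothetical, invented for this example rather than taken from any real CHERI framework - CheriBSD's library compartmentalization and the CHERIoT RTOS compartment model each have their own APIs - but the structure is the same: the untrusted decoder only ever sees capabilities bounded to the buffers it is handed.

```c
/* Hedged sketch of the "put the decoder in a box" idea. The
 * compartment_* API below is hypothetical, invented for this
 * illustration; real CHERI systems have their own mechanisms,
 * but the design shape is the same. */
#include <stddef.h>
#include <stdint.h>

typedef struct compartment compartment_t; /* hypothetical handle */

/* Hypothetical: load untrusted code into its own compartment. */
compartment_t *compartment_load(const char *library);

/* Hypothetical: call a function inside the compartment. The callee
 * receives only the capabilities passed here - bounded views of the
 * two buffers - so even a fully exploited decoder cannot reach the
 * rest of the process's memory, let alone the root of the machine. */
int compartment_call(compartment_t *c, const char *fn,
                     const uint8_t *in, size_t in_len,
                     uint8_t *out, size_t out_len);

int decode_image_safely(const uint8_t *img, size_t img_len,
                        uint8_t *pixels, size_t pixels_len)
{
    compartment_t *decoder = compartment_load("libimagedecoder.so");
    if (decoder == NULL)
        return -1;

    /* A bug in the decoder now faults inside the box: the blast
     * radius is one function call, not the whole machine. */
    return compartment_call(decoder, "decode",
                            img, img_len, pixels, pixels_len);
}
```

Frameworks differ in how compartments are declared and loaded, but the design choice Goodacre describes is the same: shrink the blast radius to a single call, at far lower cost than a separate process or virtual machine.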
So we've had people, for example, changing the cloud stack. If you're running a cloud service, you've got some containerized software - Docker and Kubernetes stuff - and you've got virtual machines with Dockers in them. They said: none of that, that's too heavyweight, we'll just put it in one of these boxes. And they were managing to spin them up - the microkernels - 300 times faster. That could be quite a significant impact. But again, that's changing the software, so it's still what I'd class as research-level. But the idea that we could see orders-of-magnitude performance uplift from something that's more secure - it could actually be a win-win for the developers as well as the end users, who are obviously suffering at the moment.
Mathew Schwartz: That's a bit of a holy grail in security land to be actually saying, look, there's a really strong business case, and by the way, the security is going to improve. Everyone goes: do it now.
John Goodacre: Yes, and for any of your financial people, I can throw another one in, which is: If you ask for it, it probably won't cost you any more either. Because it's just an inherent way of how software is run on a chip - it can run it this way, or it can run it that way. This is part of the problem, and why the program that I'm running exists: There's a market failure, because there's not a way of charging for the benefit. The benefit will just arrive, and if you asked for it to be in your solution, then it will be the same cost. It's not a product that will need support and maintenance; it's not something you will buy to add. It's just an inflection - a change in the way computers work.
So the biggest challenge now is to make people change their mindset. You're not buying a new firewall or a new bit of solution; you're just asking: Are you sending me a solution that can actually stop memory vulnerabilities? It's that kind of question: Well, how are you going to stop this? We didn't really talk about it at the beginning, but Google, Microsoft and Cisco - the big players on the IT side; we can talk later about the OT side if you like - have found that about 70% of the vulnerabilities that have been happening in the last five years would not have occurred. Now, Microsoft actually went a bit further and said: Of the patches that we issued in the last five years, there would have been 70% fewer of them.
So now think about the cost of deploying Patch Tuesdays - checking and implementing. Take 70% of that out, just by recompiling and running on the new hardware, and you can see that there's an immediate operational benefit to a business. They couldn't care less what the technology is; they just know that they're not having to spend all this time testing as many patches. Or if you're a provider running a service - worst-case scenario, you have to get in a car to go patch a meter at the end of the road. Terrible. Seventy percent less often? Now you're starting to see that it's actually an economic benefit as well, even though it's just the choice of technology you use when you're doing the procurement.
Mathew Schwartz: It becomes an always-on option that you can tap. What about from a development perspective? Would there be some kind of education needed, or would the compilation software have to change, in order for organizations to make best use of this?
John Goodacre: Yes. So the program started as a bid that we made in 2018, and it started to be funded in 2019. The government at that stage invested £70 million, and large businesses invested another £200 million, so all told we're running at about a £300 million program. What that program has been doing is making sure we've started to build the skill sets and the understanding - for those who want to go further than just recompilation - and making sure that tools are available.
We are now starting to see the first commercial adoptions - mostly in the operational technology rather than the IT area. The idea that you can now start either building chips or using chips that have this in, and that the tools are available? That's now bringing the commercial tool upgrades, so that you can securely debug your system while you're doing it as well.
As well as the availability of tools, what we're finding is there's a developer productivity benefit as well. Quite a few decades ago now, I used to be a developer. And if I had a memory vulnerability that just randomly scribbled somewhere, occasionally, on some interrupt-timing thing, it could take months to find. Whereas now what you've got is a predictable failure - potentially at compilation time, definitely at runtime. A case example coming out of one of our demonstrator projects: It was a cloud provider whose virus monitoring software kept failing and stopping. Now, the last thing you want is to have - not your firewall, but your wall of protection - failing periodically. They ran it on our test systems. Within two minutes, it hit the error; fixed, and now the thing's been running in production.
So one of the other things that we'll be announcing shortly - can I preannounce something? I guess I can - is a program that's badly named, but it's called Tested on Morello. What it is, is you can take your existing code, compile it on this platform and run it on a model of this type of computer, and it will then feed back where your problems are; you then deploy that less buggy software on the traditional systems.
So that is something we'll be announcing shortly, generally available to all developers, just to raise their productivity and awareness of this capability.
Mathew Schwartz: Fantastic. This sounds very exciting, and like there might also be opportunities coming up that haven't even been unlocked yet. But you mentioned OT - operational technology - environments, and I wanted to make sure we touch on that. The timeline for the technology in some of these environments can be so long, so anything you can do to better secure it when you deploy it in the first place - in case it's getting used for 20 or 30 or more years - sounds like a huge upside. You mentioned that there's a push to adopt this for OT environments. What does that push look like? What are they hoping to use it for?
John Goodacre: Yes. So if I were to do a very broad-stroke classification of the semiconductor market, you've got these very small embedded systems. Microsoft actually developed a RISC-V core called CHERIoT that fills that space for their internal use, but they also open-sourced it, and now we're seeing commercial adoption of that open-source RISC-V core. That will allow what I would call the sensor-actuator part of your network to adopt that microcontroller. Also announced recently is a company called Codasip; they address the embedded real-time operating system, Linux-y type of market - so we're now talking anything from gateways to lower-intelligence, somewhat operational devices. They're an IP vendor, so again, their customers will be able to build those chips.
What we then need is obviously for the people who buy chips to be able to find those more secure chips, put them into their system integrations and boards, and give them to developers. Really, that's where we're at in terms of adoption. We're obviously working on accelerating that and helping people make the move, hopefully in the not-too-distant future as well.
Mathew Schwartz: So I know it can be difficult to predict the future. But you are a few years into this program. Are there any timelines that you foresee? I mean, people must ask this all the time: When can I have this? When can I see this in production, so that I can play with it and, hopefully, roll it out?
John Goodacre: Let's do the easy one first: When can I play with it? The program has a website called dsbd.tech - Digital Security by Design dot tech - and there's a menu that says "request a board." There's one board available at the moment, which is the Morello board; that's an Arm IT-class machine. And there's a project that's also building a microcontroller, OT-level board as well, and that should be getting put up on the website in the next month, I would have thought.
That one will actually also be available through places like Mouser and the like. So it will be generally available - but those are evaluation systems, not products.
There are, again, a couple of companies that haven't announced yet - so I can't tell you who - that will have the actual silicon parts later this year. That will then allow the system integrators and the people who want to put them into their smart meters and their heat pumps and whatever other embedded OT kinds of controllers to start making that choice.
For the rest of the market - because obviously the vision of the whole program is that every computer that runs software in the world will, in the future, have this in it. Now, clearly, that's going to be a 20- or 30-year initiative, but there's nothing to stop any critical sectors, any sectors that have long lifetimes or long churns - anybody who wants to make sure it's right for the next 10 or 15 years - from asking their supply chain now: How can we get this in, on whatever that timeframe is? The idea that the hardware has been proven, and the software and the tools, and the fact that it runs? Tick. We're now in the commercial adoption challenge: How do we change something that hasn't changed for 50 years? Basically, the actual vulnerability in the way software runs on hardware has been there since the '40s; it was first documented by the American government in the '70s as a "this isn't good." And now we're at, what, 30,000 vulnerabilities last year, something like that?
It's just getting ridiculous - unsustainable for today's CISOs. We've got to take that 70% of noise out and then fix the blast radius. There's no silver bullet for everything, because unfortunately there's always going to be a way in. But if you can make it so much more difficult - more steps: Put everything in a box, so an attacker has to find a different bug every time, rather than one bug and you're in, which is the state of attacks today - it'll be a much more trustworthy environment in which we can prosper.
Mathew Schwartz: Very well said. Yes - the more we force attackers to chain vulnerabilities together to try to get onto a system, the better. My final question for you: There's been a lot of chat about memory-safe programming languages lately. That's a whole separate discussion, and it isn't going to magically replace all of the code that's already out there. As developers potentially adopt those languages, though, do you see this as a complementary strategy for what you're doing?
John Goodacre: Clearly, if you're writing a new solution today, and you've got the skills and the availability, etc. - there are a lot of challenges around that - then definitely use a memory-safe language.
Now, the reality is, the reason that software does so many good functions for us is that it's layered on legacy. Even a good language like Rust has this section called "unsafe," or it might be linking with a library to do the image decoding or the decryption stuff. As soon as you've got those holes, it's not going to be safe anymore. So really, a good way of thinking of CHERI is that it makes existing code in C memory safe - it effectively turns C into a memory-safe language.
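To picture one of those "holes," consider a sketch of a native helper of the kind a Rust or Java program might call through its foreign-function interface. The function, its name and its bug are all invented for illustration; the point is that the safe language's guarantees end at the boundary, while CHERI's bounds still apply on the C side.

```c
/* Sketch of an FFI "hole": a native helper that a memory-safe
 * language might call for decoding. The caller's language
 * guarantees stop at this boundary - but on CHERI hardware, `out`
 * arrives as a capability bounded to the caller's buffer, so the
 * oversized memcpy below traps instead of smashing the caller's
 * heap. */
#include <stdint.h>
#include <string.h>

int decode_header(const uint8_t *packet, size_t packet_len,
                  uint8_t *out, size_t out_len)
{
    if (packet_len < 4)
        return -1;

    /* Attacker-controlled 32-bit length field read from the input. */
    uint32_t claimed = (uint32_t)packet[0] << 24 |
                       (uint32_t)packet[1] << 16 |
                       (uint32_t)packet[2] << 8  |
                       (uint32_t)packet[3];

    (void)out_len; /* BUG: `claimed` is never checked against out_len. */

    /* On a conventional CPU this write silently overflows the
     * caller's buffer; on CHERI it faults at the first byte that
     * exceeds the capability's bounds. */
    memcpy(out, packet + 4, claimed);
    return 0;
}
```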
Obviously, there have been fairly significant, vocal publications out of the U.S. recently - there was an article out of the White House just this week, actually - again saying: Look, we recommend that you really start thinking about memory safety. And by the way, did you know that with CHERI you can do it for your existing code? So in that regard, it's complementary. In other words, it can help protect the bits that memory-safe languages can't protect. It helps protect the legacy. Java is a memory-safe language, but the Java runtime itself is mostly written in C, so the environment itself is an issue. And then there are the other parts: A memory-safe language just gives you the bounds checking and things like that - what about the blast radius?
What we're finding is that by putting compartmentalization into a memory-safe language - where the unsafe parts are also made safe - it's actually complementary. But we're talking future; Rust you can do now, so work at it.
Within two of our projects now, we've actually got two versions of the Rust compiler that have been made memory safe for the unsafe parts, and we're also working on what it means in Rust to be able to do these boxes of high protection.
Mathew Schwartz: So, beyond programming with memory-safe languages, there has been a huge push by the UK government and others for "secure by design." How does this help with those initiatives, and with getting to where we need to be in building things more securely?
John Goodacre: Yes. Well, there are two things here: There's secure by design, the term, but there's also another emerging one, which started off as secure by default.
What I've been doing in the program is trying to describe today's cybersecurity via what I call the reverse pyramid. We've got cybersecurity spread right across the billions of people at the top - the people that use digital systems are responsible for their own cybersecurity, aren't they? As we come down to the people that build systems, they are the ones giving you the footprint of your vulnerability, so you're trying to reduce it.
By default, you want your software vendors to reduce your attack surface. They can do it through configuration; they can do it through programs. So that, for me, is by default: The product you're using is more secure because the vendor that built it didn't give you a large footprint of attack.
Now, by design, for me, is a technique where you're actually actively protecting yourself against the inevitable bugs. In other words, if something bad happens, it has a null effect. So in the CHERI technology, what we say is we block the vulnerabilities from exploitation. By design, even if there is a vulnerability, you cannot exploit it.
So it could be, for example: If I learn the password for your web server, it doesn't mean I can get into your database server. By design of the system, you've protected and constrained the elements. That's more what I mean by "by design." I think a lot of people say "by design" when they mean "by default." So I think it's interesting whether people will actually grab hold of that distinction - from what cybersecurity is generally - in the protection of software.
Mathew Schwartz: I salute our compartmentalized future here, this is great news, and thank you so much for all of the effort that you've been putting into this and taking the time today to share your insights with us.
John Goodacre: Well, thank you very much for your time and patience with me. Thank you, Mathew.
Mathew Schwartz: I've been speaking with Professor John Goodacre. I'm Mathew Schwartz with ISMG. Thank you very much for joining us.