Sandboxing Technology: Why Good vs. Bad Isn’t Enough
Here’s the backstory on Mimecast’s new acquisition, Solebit.
Editor's note: Earlier this year, Mimecast acquired Solebit, a company whose technology detects malware through static file analysis. As explained here, Mimecast has already integrated Solebit’s technology into Targeted Threat Protection – Attachment Protect, improving the performance and efficacy of the offering.
To give some background on what Solebit does and how its technology differs from others, Cyber Resilience Insights recently sat down with Solebit co-founder Meni Farjon, Mimecast’s Chief Scientist for Advanced Threat Detection.
What’s your background in cybersecurity, and how did Solebit get started?
My professional background starts with the Israeli army. I was recruited to a central cybersecurity unit of the Israeli army, with an emphasis on security and vulnerability research and exploits, mostly around Windows platforms and apps.
This is where I met Boris Vaynberg, who was co-founder and CEO of Solebit. We were in the same unit and he was the chief. Since we shared the same ideas and concepts in the army, we said, “Why don’t we try to do something on our own?” Boris introduced me to Yossi Sara, our third co-founder.
As most of our experience was in the vulnerability and exploitation field, we were thinking of an idea around exploit prevention. We wanted to build a proof of concept (POC) and develop something around that. We tried to come up with something, and we also tried to figure out all the reasons why something like this might not work. About three months later we had a POC running that proved it worked pretty much as we had thought it would at the beginning. In fact, it covered more than we thought.
We were thinking about one solution to combat all exploits. Back then, in 2014, the leading security concept was sandboxing. Every big company was acquiring some kind of sandboxing solution. We knew the difficulties around sandboxing. We knew where the sandbox was not so accurate from our time in the army, because we used to bypass those.
So, it was appealing to us to do that in a different way: not just build another sandbox, but build something that doesn’t have the drawbacks of the sandbox and has clear advantages over it. First and foremost, that was cost effectiveness: the ability to give a verdict that is as accurate as the sandbox’s or better, but in milliseconds. For a lot of the sandboxes at the time it would take two minutes, three, five. Some even took 15 minutes or half an hour, because the technology wasn’t that mature. And we said, “let’s do it in milliseconds to produce the same verdict.” We then challenged the idea with different attacks for different environments and operating systems, because sandboxes can only stretch to some applications and one or two operating systems. We wanted to produce a prevention system that goes beneath that, so it is independent of operating system or client application.
Then we worked on producing an even better verdict. We found that since we were using a very different technology, the bypass techniques that used to work against the sandbox didn’t work against this.
What made your malware detection technology so different?
What we noticed was that exploits were taking advantage of common data files such as Microsoft Excel, Word and PowerPoint documents, images and videos. These are the kinds of things you would trust when you get them in an email.
Modern attacks use exploits inside those data files. You open up a Word document and, without you knowing, code runs on your system that gains privileges and downloads an additional Trojan that sniffs network activity or steals passwords and, in later stages starting in 2016 or 2017, infects the system with ransomware. So, that was one common thing we noted.
And then as we were diving into it, the differentiator was that data files should contain data objects and data binaries, basically data-only bytes. You do not expect a data file to contain any executable code. So, what we said was, “let’s try to get a system that can differentiate between data and code.” Every exploit uses data files and injects code into the file. If we can detect any code inside any piece of data, then we can stop every possible exploit without the need to understand what the exploit is and what it is targeting. We do it differently because we do it statically. You expect a file to contain text or images or videos but you don’t expect it to contain any executable code. This is where attackers are taking advantage of these vulnerabilities.
So, for example, text might look like: “Hello my name is.. Nice to meet you…”, but executable machine code will look like: push eax; xor ebx, ebx, or in hex: 50 31 DB.
Sounds too simple, right? It proves a little more difficult to produce this system because a lot of data objects look like code. The way to figure out if something really is code is to disassemble it and try to understand whether or not it’s logically possible. Everything can be disassembled to code. But the question is, does that code make sense? If you are about to cross the street, you look to the right and the left, and then instead of going forward you start going back, then something doesn’t make sense. Code is like that.
At first, we had a lot of false positives because a lot of things look like code, but they aren’t. So, we built the system to tell whether the flow of execution makes sense—not a single machine instruction but a series of machine instructions.
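Editor's illustration: the idea of judging a series of instructions rather than a single one can be sketched in a few lines of Python. This is a toy under stated assumptions, not Solebit's implementation. The hypothetical `decode_one` knows only a handful of 32-bit x86 opcodes, and "makes sense" is reduced to an invented rule: at least four instructions, never popping more than was pushed, and ending in `ret` with the stack balanced. All function names and the rule itself are made up for this example.

```python
# Toy sketch only: a tiny subset of 32-bit x86, not a real disassembler
# and not Solebit's actual algorithm.

def decode_one(buf: bytes, i: int):
    """Return (mnemonic, length) for the instruction at buf[i], or None."""
    op = buf[i]
    if 0x50 <= op <= 0x57:          # push r32
        return ("push", 1)
    if 0x58 <= op <= 0x5F:          # pop r32
        return ("pop", 1)
    if op == 0x90:                  # nop
        return ("nop", 1)
    if op == 0xC3:                  # ret
        return ("ret", 1)
    # Register-to-register xor/mov: opcode plus a ModRM byte with mod == 0b11.
    if op in (0x31, 0x89) and i + 1 < len(buf) and buf[i + 1] >= 0xC0:
        return ("xor" if op == 0x31 else "mov", 2)
    # mov r32, imm32: opcode plus a 4-byte immediate.
    if 0xB8 <= op <= 0xBF and i + 4 < len(buf):
        return ("mov", 5)
    return None


def plausible_code_run(buf: bytes, start: int, min_len: int = 4) -> bool:
    """Assumed rule: the flow from `start` must decode cleanly, keep the
    stack depth non-negative, and reach a `ret` with a balanced stack."""
    depth = 0
    count = 0
    i = start
    while i < len(buf):
        decoded = decode_one(buf, i)
        if decoded is None:
            return False            # flow runs into bytes that are not code
        name, length = decoded
        count += 1
        if name == "push":
            depth += 1
        elif name == "pop":
            depth -= 1
            if depth < 0:
                return False        # popping what was never pushed
        elif name == "ret":
            return depth == 0 and count >= min_len
        i += length
    return False                    # ran off the end without returning


def contains_plausible_code(buf: bytes, min_len: int = 4) -> bool:
    """True if any offset in the buffer starts a plausible instruction flow."""
    return any(plausible_code_run(buf, s, min_len) for s in range(len(buf)))


# push ebp; mov ebp, esp; xor ebx, ebx; pop ebp; ret
stub = bytes.fromhex("55 89 e5 31 db 5d c3")
plain_text = b"Hello my name is.. Nice to meet you"
lookalike = b"PXPX report"          # 'P' and 'X' decode as push eax / pop eax
infected = b"Quarterly report " + stub

for label, data in [("text", plain_text), ("lookalike", lookalike),
                    ("infected", infected)]:
    print(label, contains_plausible_code(data))
```

The `lookalike` bytes decode as valid `push` and `pop` instructions one at a time, which is the kind of false positive a single-instruction check would raise. Only the sequence-level rule rejects them, while the embedded stub in `infected` is flagged without any judgment about whether it is good or bad.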
It doesn’t mean we say it’s malicious, and we don’t need to. This is where other solutions fail, because when you try to figure out the good from the bad, someone is going to try and hide behind something that’s legitimate. This is how sandbox and anti-virus solutions get bypassed. So, it’s not about good or bad. It’s about the presence of anything that can do something. If a data file can make anything happen, it’s bad. If you find executable code in a data file, and you can validate that this code can run, you should not open this file!
We are not in the race with attackers. We don’t need to differentiate between good and bad. That’s going to make our technology much more difficult for anyone to try and bypass and it’s a huge differentiator for us.