When Edward Snowden worked with the press to release classified documents in 2013, it was a shock to both technical and non-technical people. While the general public began questioning how much their government was monitoring them, many in the security community were taken aback by technical aspects: the NSA seemingly was bypassing security measures with ease and at an unimaginable scale, exploiting weaknesses in systems we rely on every day. When asked if there was any way regular people could communicate privately, Snowden answered:
Encryption works. Properly implemented strong crypto systems are one of the few things that you can rely on. Unfortunately, endpoint security is so terrifically weak that NSA can frequently find ways around it.
The key take-away from this quote is that while we know how to use cryptography to protect information, the mathematics of cryptography doesn't just sit there on its own. It is programmed in software (which must implement the cryptographic algorithms properly) and it is used as part of larger systems that sit at the endpoints of any secure communication. I found the properly implemented part of that statement particularly interesting — if the algorithms are clearly stated, then surely they would be implemented properly by talented programmers. Right? In the few months following this statement, critical bugs were found in key cryptographic software in Apple systems, in the most widely-used web servers, in Linux systems (and again), and in Microsoft systems. In other words, serious implementation errors (producing security vulnerabilities) were found in the cryptographic software on pretty much every computing platform in the world. Talk about a wake-up call!
Clearly, we need to find ways to make security software more secure, so how can we do that? There are two approaches I have been looking in to: techniques to locate bugs/vulnerabilities in programs, and techniques to prove that there aren't any bugs in a given program (so called "Certified Software"). While my administrative responsibilities haven't given me much time to produce research results in either area, I have done what academics often do when entering a new area: I created and taught some special topics classes, and I have worked with graduate students on projects that explore the topics. Information on the approaches I'm interested in, and things that I have done, is below.
Static analysis is a powerful technique that allows for analysis at the source code level, without executing the program. While software testing can show whether software works on specific input values, static analysis can reason about all values along particular execution paths. Since there are some common types of bugs that programmers tend to make over and over, static analysis tools that look for these patterns can uncover security vulnerabilities before the program is ever run! A perfect programmer would not make common mistakes, such as code with buffer overflow vulnerabilities, but – alas – perfect programmers do not exist. Some of the things I have done related to vulnerability detection and static analysis are:
Failing to detect vulnerabilities with static analysis is not the same thing as determining the absence of vulnerabilities in the software. In the past 10 years, advances in both algorithms and computer speed have made "provably correct" software something that is increasingly within reach. While any attempt to prove correctness of software must fail in some cases (because of Rice's Theorem), we can certainly prove interesting properties about a wide range of real software. This is particularly true of cryptographic software, because it generally deals with very structured and inherently bounded computations on fixed-size values. The research community has seen a lot of advances in this area lately, and one of my favorite is a 2015 paper in which researchers from Princeton, Harvard, and MIT collaborated to prove the correctness and security of the OpenSSL HMAC implementation — the proof was computationally verified all the way from the high-level security properties of the algorithm all the way down through the correctness of the machine code generated by the compiler. This is amazing!
My own work is not so spectacular, obviously, but includes the following activities.