A little backstory: you're probably aware of the recent /proc/pid/mem local privilege escalation vulnerability (CVE-2012-0056). It was an unfortunate example of upstream security failures, from Linus attempting the silent fix, to the incorrect information regarding the effectiveness of ASLR on non-grsec systems against exploitation of the vulnerability, to the failure of SELinux to prevent exploitation, to most distros still shipping with non-PIE suid root binaries (what kind of performance issues can they claim in this case?).
In grsecurity, we had several features with the potential to effectively prevent exploitation, though these were generally rendered useless by lazy userland hardening on the part of distributions. The nuances of the applicable grsecurity/PaX features aren't widely known, so I'll discuss each of them here.
Under PaX's MPROTECT feature, mappings that are currently non-writable have both their VM_WRITE and VM_MAYWRITE flags removed. This means that not only can the mapping not be written to, but it cannot have its protection changed later to allow it to be written to. This happens to be important in the case of CVE-2012-0056 because the /proc/pid/mem code uses the same internal routines as ptrace for writing to a target process' memory. The routines depend on VM_MAYWRITE and not VM_WRITE for determining whether a given mapping can be modified. Thus, whereas on a vanilla kernel the code in the suid root binary could be directly targeted and modified (as was done in the released exploits), on a grsec kernel either some data-only attack would have to be feasible or a writable function pointer would have to be modified, combined with some reuse of existing code in the image, for exploitation to be successful. This in itself isn't enough to stop a skilled/determined attacker, but it's an interesting near side effect of the consistency in implementation of MPROTECT.
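To make the VM_WRITE versus VM_MAYWRITE distinction concrete, here is a toy Python model (not kernel code). The class and function names are invented for illustration, and the hardening rule is simply the behavior as described in the paragraph above, not a transcription of the PaX source.

```python
from dataclasses import dataclass

# Two distinct flag bits; only their distinctness matters in this model.
VM_WRITE = 0x2
VM_MAYWRITE = 0x20


@dataclass
class Mapping:
    name: str
    flags: int


def normal_write_allowed(m: Mapping) -> bool:
    """A store issued by the process itself needs VM_WRITE."""
    return bool(m.flags & VM_WRITE)


def forced_write_allowed(m: Mapping) -> bool:
    """ptrace-style writes (shared by /proc/pid/mem) consult VM_MAYWRITE."""
    return bool(m.flags & VM_MAYWRITE)


def apply_mprotect_hardening(m: Mapping) -> Mapping:
    """Model of the rule above: a non-writable mapping also loses VM_MAYWRITE."""
    if not (m.flags & VM_WRITE):
        return Mapping(m.name, m.flags & ~(VM_WRITE | VM_MAYWRITE))
    return m


# Code of a suid binary: not writable, but still "may be made writable".
text_vanilla = Mapping("suid .text", VM_MAYWRITE)
text_hardened = apply_mprotect_hardening(text_vanilla)
# Writable data keeps both flags, so it stays a possible target.
data_hardened = apply_mprotect_hardening(Mapping(".data", VM_WRITE | VM_MAYWRITE))
```

In this model the forced write into `text_vanilla` is permitted while the one into `text_hardened` is not, which is why an attacker is pushed toward writable data or function pointers.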
PaX's ASLR feature in combination with grsecurity's GRKERNSEC_BRUTE feature deters exploitation not only by introducing more entropy into the mapping layout than vanilla's ASLR, but also by making ASLR actually useful against local attackers through a strong disincentive for bruteforcing. Upon crashing a suid root binary, the attacker will have their attempt logged, all of their processes terminated, and be banned from the system for 15 minutes. This response occurs for any similar subsequent attempt. This is in stark contrast to ASLR's usefulness against a local attacker on a vanilla system, where it essentially provides no protection: the same exploit can be run against the same binary repeatedly in a short amount of time to exhaust the low amount of randomization. The reduced entropy associated with the use of an ascii-armor range by some distros also comes into play here.
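A back-of-the-envelope calculation shows why the lockout matters more than the raw entropy. The sketch below is only arithmetic: the entropy figures and the per-attempt time are hypothetical placeholders chosen to show the scale of the effect, not measurements of any kernel.

```python
def expected_bruteforce_seconds(entropy_bits: int, seconds_per_attempt: float) -> float:
    """Average time to guess a layout that is re-randomized on every exec.

    Each attempt succeeds with probability 2**-entropy_bits, so the expected
    number of attempts is 2**entropy_bits.
    """
    return (2 ** entropy_bits) * seconds_per_attempt


LOCKOUT_SECONDS = 15 * 60  # the 15-minute ban described above

# Hypothetical figures: low entropy with instant retries versus
# higher entropy where every failed attempt costs a full lockout.
no_lockout = expected_bruteforce_seconds(8, 0.01)
with_lockout = expected_bruteforce_seconds(16, LOCKOUT_SECONDS)

print(f"no lockout:   {no_lockout:.2f} s on average")
print(f"with lockout: {with_lockout / 86400:.0f} days on average")
```

Under these made-up figures the unthrottled attack averages a few seconds, while the throttled one averages well over a year, with every failed attempt logged along the way.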
While everyone else was focused on the normal patch specific vuln/update/forget cycle, our focus with these high-profile vulnerabilities has always been to look at tangential issues that are unlikely to be resolved upstream: exploitation techniques that either made certain strategies easier or possible in the first place. In the case of CVE-2012-0056, that issue revealed itself during a discussion on the full-disclosure mailing list on how to reliably exploit systems that changed the permission of the suid root binaries to deny reading. While such a permission change prevented the use of objdump in initial exploits, it was mentioned that a ptrace followed by an exec of the suid root binary allows one to effectively read the contents of the mapped binary. This might be surprising, as a ptrace of an existing suid root process would be denied. When execing a privileged binary while ptracing though, the binary is run without the extra privileges. When the goal is reading out the binary, however, this is irrelevant.
So I set out to eliminate this inconsistent behavior, the result being GRKERNSEC_PTRACE_READEXEC. Its configuration help currently reads as follows:
If you say Y here, unprivileged users will not be able to ptrace unreadable
binaries. This option is useful in environments that
remove the read bits (e.g. file mode 4711) from suid binaries to
prevent infoleaking of their contents. This option adds
consistency to the use of that file mode, as the binary could normally
be read out when run without privileges while ptracing.
If the sysctl option is enabled, a sysctl option with name "ptrace_readexec"
is created.
Note that the ability to read the targeted binary was a key component in released exploits because, regardless of whether ASLR was enabled on the system or not, distros were still shipping non-PIE suid root binaries. If we could guarantee that all suid root apps (including those from third parties) were PIE, there wouldn't be as much of a need for this feature. Unfortunately, others' lack of attention to detail in this regard continues years after the demonstrated benefits of full ASLR, so part of our mission is to identify and fill such gaps in security.
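For readers unfamiliar with the file mode mentioned in the help text, the following small Python helper spells out what mode 4711 means. The function name is made up for this example, and it only inspects mode bits; it says nothing about how the kernel feature itself decides.

```python
import stat


def denies_unprivileged_read(mode: int) -> bool:
    """True if the mode is setuid and grants no read bit to group or other."""
    is_suid = bool(mode & stat.S_ISUID)
    group_or_other_read = bool(mode & (stat.S_IRGRP | stat.S_IROTH))
    return is_suid and not group_or_other_read


# 4711: setuid, rwx for the owner, execute-only for everyone else.
hardened_mode = 0o4711
# 4755: the more common setuid mode, readable by everyone.
common_mode = 0o4755

print(stat.filemode(stat.S_IFREG | hardened_mode), denies_unprivileged_read(hardened_mode))
print(stat.filemode(stat.S_IFREG | common_mode), denies_unprivileged_read(common_mode))
```

A binary with the first mode cannot be opened for reading by an unprivileged user, which is exactly the restriction that the ptrace-then-exec trick sidesteps.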
One down. Let's go back a bit to discuss a bug class and oddity that's surprising to many. Credentials in Linux are not per-process entities -- they are per-thread. Therefore, it's perfectly valid for one thread in a process started by root to call setuid(1), which then runs with real uid 1, while all other threads in the process continue to run as uid 0 -- even though all threads are sharing the same memory! Arbitrary code execution in the non-privileged thread can immediately lead to privileged arbitrary code execution. But if you were to write some code of your own on the average system to verify this behavior, you'd think I was lying. After the one call to setuid, all other threads will report that same uid. What's going on here? The answer is that glibc is performing some trickery behind the scenes -- when the one thread calls setuid, the glibc wrapper signals to the other threads in the process which then issue their own setuid calls, giving the impression of process-wide uids. The key point to note here is that it's glibc performing this operation silently. Other libcs may not (and don't). This has also plagued programs in other languages, like Google Go.
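The difference between the kernel's per-thread view and the process-wide impression glibc creates can be modeled in a few lines of Python. This is a simulation with invented names (`Thread`, `raw_setuid`, `glibc_style_setuid`); it does not call the real setuid and does not reproduce glibc's actual signalling mechanism.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    uid: int = 0  # every thread starts as root in this model


def raw_setuid(thread: Thread, uid: int) -> None:
    """Kernel view: only the calling thread's credentials change."""
    thread.uid = uid


def glibc_style_setuid(threads: list, uid: int) -> None:
    """Wrapper view: every thread in the process ends up issuing its own setuid."""
    for t in threads:
        raw_setuid(t, uid)


# A libc that passes the call straight through leaves siblings privileged.
bare = [Thread(1), Thread(2), Thread(3)]
raw_setuid(bare[0], 1)

# A glibc-like wrapper makes the change look process-wide.
wrapped = [Thread(1), Thread(2), Thread(3)]
glibc_style_setuid(wrapped, 1)

print([t.uid for t in bare])     # threads 2 and 3 are still uid 0
print([t.uid for t in wrapped])  # all threads report uid 1
```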
To remove any possibility of inconsistency here, and urged by Andrew Griffiths (andrewg) who has reported several security issues related to this in the past, I created GRKERNSEC_SETXID. The configuration help for the feature currently reads as follows:
If you say Y here, a change from a root uid to a non-root uid
in a multithreaded application will cause the resulting uids,
gids, supplementary groups, and capabilities in that thread
to be propagated to the other threads of the process. In most
cases this is unnecessary, as glibc will emulate this behavior
on behalf of the application. Other libcs do not act in the
same way, allowing the other threads of the process to continue
running with root privileges. If the sysctl option is enabled,
a sysctl option with name "consistent_setxid" is created.
On the implementation side, I queue up a copy of the credentials of a thread making a root->nonroot transition, mark all the other threads in the process as needing a reschedule, and add a hook to install the new credentials when the thread is scheduled. The downside to the technique is that rescheduling occurs on syscall exit, so it's possible for at most one more syscall to execute with privilege in each thread active on a CPU after the initial thread performs its setuid. This shouldn't be a problem in real life, as the other threads in the process should be sleeping in the kernel at this point and will be intercepted on their return back down to userland. If you have other threads executing code concurrently, all bets would be off even in the glibc case as you couldn't interrupt/prevent the continued execution of kernel-level critical sections in the other threads.
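The queue-then-install approach, including the window of at most one more privileged syscall, can be sketched as a toy Python simulation. All names here are hypothetical, and the real hook lives in the kernel's scheduling path rather than in a method call; this only mirrors the sequence of events described above.

```python
class Thread:
    def __init__(self, tid, uid=0):
        self.tid = tid
        self.uid = uid
        self.pending_uid = None  # queued credentials, if any
        self.log = []            # (syscall name, uid it ran with)

    def syscall(self, name):
        # The syscall body runs with whatever credentials are current.
        self.log.append((name, self.uid))
        self._syscall_exit()

    def _syscall_exit(self):
        # Stand-in for the reschedule hook: install queued credentials.
        if self.pending_uid is not None:
            self.uid = self.pending_uid
            self.pending_uid = None


def setuid_consistent(threads, caller, uid):
    """Root -> non-root transition in `caller`, queued for its siblings."""
    caller.uid = uid
    for t in threads:
        if t is not caller:
            t.pending_uid = uid  # models "mark as needing a reschedule"


a, b = Thread(1), Thread(2)
setuid_consistent([a, b], a, 1000)
b.syscall("write")  # may still run as uid 0 in this model
b.syscall("open")   # runs with the propagated uid
print(b.log)
```

The first syscall in the sibling thread is the one that can still execute with privilege; everything after it sees the new credentials.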
The last feature I'll discuss is an enhancement to GRKERNSEC_PROC_MEMMAP, a feature designed to close up information leaks of ASLR in processes (not just privileged processes) via /proc and other areas. It was revealed in the wake of CVE-2012-0056 that an issue still existed where some suid binaries could be tricked into outputting their own /proc/self/maps contents, effectively defeating ASLR. Whether that information could then be abused or not is situation-specific, but I wanted to remove it from being something to worry about. The PaX Team and I both came up with potential solutions to this problem, though after talking each of them through, I decided on my own idea for its simplicity and ease of maintenance.
The basic idea was implemented in a few minutes and can be seen here. It's been modified since this initial idea to add logging and fix some bugs. The idea of the patch might not be obvious: I aim to ensure that the opener of a "sensitive" /proc/pid entry is the same task as the one performing the subsequent read/write of that entry. I do this by introducing a global 64-bit atomic counter, incremented and assigned to the task on an exec. Upon opening a "sensitive" /proc/pid entry, I copy the task's counter value to a new entry in the seq_file struct created to track the file. This seq_file struct is available when further operations are performed against the entry, which means we can compare the value in the struct against the current task's counter value. If the task changed via an exec, the counter won't match and the read/write will be ignored, preventing the information leakage. You can look at the published exploit here to think about how it prevents the attack. Note that this vector has yet to be addressed by anyone upstream.
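As a rough illustration of the counter comparison, here is a toy Python model. `Task` and `SensitiveProcFile` are invented stand-ins for the task and seq_file structures, and `itertools.count` stands in for the global 64-bit atomic counter; the actual patch is kernel C and differs in detail.

```python
import itertools

# Stand-in for the global atomic counter.
_exec_counter = itertools.count(1)


class Task:
    def __init__(self):
        self.exec_id = next(_exec_counter)

    def execve(self):
        """A successful exec assigns a fresh counter value to the task."""
        self.exec_id = next(_exec_counter)


class SensitiveProcFile:
    """Toy stand-in for the seq_file behind an entry such as /proc/pid/maps."""

    def __init__(self, opener, contents):
        self.opener_exec_id = opener.exec_id  # copied at open time
        self._contents = contents

    def read(self, current):
        if current.exec_id != self.opener_exec_id:
            return ""  # mismatch: the read is ignored
        return self._contents


task = Task()
maps = SensitiveProcFile(task, "address layout")
before_exec = maps.read(task)  # same task image: read succeeds
task.execve()                  # e.g. exec of a suid binary inheriting the fd
after_exec = maps.read(task)   # counter changed: nothing is returned
```

The attack relies on the descriptor being opened before the exec and read after it, which is precisely the case where the two counter values differ.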
---Update after initial post (Feb 16th 2012), with the correct name this time! (thanks to longld)---
VNSecurity released details today on how they exploited the recent sudo format string vulnerability on Fedora 16, bypassing both the ASLR implementation used by Fedora 16 and the FORTIFY_SOURCE protection created by Red Hat. If anyone from VNSecurity happens to be reading this, I'd be interested in seeing the changes required to the exploit to make it work reliably against a proper ASLR implementation (preferably with a one-shot as we don't want those pesky remote logs, but if that's not possible I'd still like to see how many tries it takes fighting against the 15-minute lockout from GRKERNSEC_BRUTE). Notably, PaX is immune to the weakening from prelink, and the ulimit -s unlimited trick has no effect. With regard to the known technique of exhausting stack entropy by filling up argv/env with predictable data, I introduced tonight (in patches available right now) a further enhancement to GRKERNSEC_PROC_MEMMAP which limits the total size of strings copied for argv+env to 1MB for suid/sgid binaries.
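The argv+env cap can be pictured with a short Python model. The function names are hypothetical, and two details are assumptions made only for this sketch: that each string's NUL terminator counts toward the total, and that a total of exactly 1MB is still accepted.

```python
MAX_SUID_ARG_BYTES = 1024 * 1024  # the 1MB limit described above


def arg_strings_size(argv, envp):
    """Total bytes copied for argv+env, counting one NUL per string (assumed)."""
    return sum(len(s) + 1 for s in list(argv) + list(envp))


def exec_allowed(argv, envp, is_suid_or_sgid):
    """Model: only suid/sgid executions are subject to the cap."""
    if not is_suid_or_sgid:
        return True
    return arg_strings_size(argv, envp) <= MAX_SUID_ARG_BYTES


# An environment stuffed with predictable padding, about 2MB in total.
padding_env = [b"A" * (128 * 1024)] * 16

print(exec_allowed([b"sudo"], padding_env, is_suid_or_sgid=True))   # rejected
print(exec_allowed([b"ls"], padding_env, is_suid_or_sgid=False))    # unaffected
```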
Hopefully this was a useful look into some new grsecurity features and our mindset and motivations in creating them.
If you see a benefit to this kind of research or you're using grsecurity at your company, please consider talking with your company about sponsoring this work. Basic information (and our current sponsor list) is available here. For any other questions, feel free to contact me at firstname.lastname@example.org.
Until next time!