Process freezes with numad or kernel option numa_balancing=e

Post by chrroessner » Sat Aug 22, 2015 5:14 am

Hi,

I am using grsecurity on Gentoo Linux with Gentoo's already-patched hardened-sources kernels (stable releases). I have been seeing interesting problems with kernels from 3.14 up to 4.1.4. My server is an HP ProLiant SE316M1-R2 (aka DL360), which is a two-node NUMA system. Each Xeon L5520 CPU has access to 24GB RAM spread over all three of its memory channels, so the two CPUs have a total of 48GB RAM.
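For reference, the per-node split described above can be checked from userspace: `numactl --hardware` prints it, and the kernel exposes per-node counters in `/sys/devices/system/node/nodeN/meminfo`. The Python sketch below reads those files directly. It assumes a Linux host with NUMA support; the helper names are hypothetical.

```python
import glob
import os
import re

# Per-node lines look like "Node 0 MemTotal:       24676300 kB";
# a few counters (e.g. HugePages_Total) carry no "kB" unit.
_LINE = re.compile(r"^Node\s+(\d+)\s+([\w()]+):\s+(\d+)(?:\s+kB)?\s*$")


def parse_node_meminfo(text):
    """Return {field: value} for one node's meminfo text (values in kB where applicable)."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            fields[match.group(2)] = int(match.group(3))
    return fields


def node_memory(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: fields} for every NUMA node directory under sysfs_root."""
    result = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "meminfo")):
        node_id = int(os.path.basename(os.path.dirname(path))[len("node"):])
        with open(path) as f:
            result[node_id] = parse_node_meminfo(f.read())
    return result


if __name__ == "__main__":
    for node, info in sorted(node_memory().items()):
        print(f"node {node}: total {info.get('MemTotal', 0) // 1024} MiB, "
              f"free {info.get('MemFree', 0) // 1024} MiB, "
              f"file pages {info.get('FilePages', 0) // 1024} MiB")
```

On a two-node box like this one it should list two entries of roughly 24GB each. A node whose `MemFree` runs low while the other still has plenty is one thing that may be worth correlating with the freezes.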

Months ago I tried the numad userspace daemon. At that time I was using this server mainly for monitoring, so I had also installed ntopng, which consumes a lot of memory. From time to time, as free memory shrank (caches!), processes suddenly seemed to hang. Typing a command on the command line could lead to a full freeze, as if the connection had been cut. Seconds or even minutes later the process worked again. This was so damn problematic that I disabled numad and even uninstalled ntopng.

Currently I use the server for Zabbix monitoring and also run 7 KVM virtual guests. Three weeks ago it had only 24GB RAM, and Zabbix told me that I was running out of memory. I did not believe it, as there were nearly 6GB of cached data, so I ignored the warning. Days later, I got a call in the morning that people could not deliver mail. I investigated and saw that the virtual machine running my mail system was no longer running. So I thought: well, maybe Zabbix was right and I should buy more memory. I did, and now the server has 48GB RAM.
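Zabbix's warning and the ~6GB of "cached" data need not contradict each other: `Cached` in `/proc/meminfo` also counts pages that cannot simply be dropped (shared memory/tmpfs, dirty or mapped pages), while `MemAvailable` (kernels 3.14 and later) is the kernel's own estimate of what can be allocated without swapping. A minimal Python sketch, assuming a Linux host and using hypothetical helper names, that puts the numbers side by side:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value} (values in kB where applicable)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_summary(fields):
    """Summarise total, free, cached and kernel-estimated available memory in MiB."""
    def mib(kb):
        return kb // 1024

    return {
        "total": mib(fields["MemTotal"]),
        "free": mib(fields["MemFree"]),
        # Page cache, including shmem/tmpfs pages that cannot simply be dropped.
        "cached": mib(fields.get("Cached", 0)),
        # MemAvailable (3.14+) estimates memory obtainable without swapping;
        # fall back to MemFree on older kernels that lack the field.
        "available": mib(fields.get("MemAvailable", fields["MemFree"])),
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(memory_summary(parse_meminfo(f.read())))
    except FileNotFoundError:
        print("/proc/meminfo not found (not a Linux host?)")
```

If `available` stays small while `cached` looks large, the cache is not a safety margin, and an alert like the one from Zabbix is probably worth taking seriously.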

Yesterday I wrote a script that performs live backups with libvirt. It creates an external snapshot and copies the base disk image to a different location. After that, it merges the changes from the snapshot back into the base image and removes the snapshot. While the script was merging my mail server, the guest died. And again I found that the server had all 48GB RAM in use. I guess the copy command used all the memory for caching.
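The backup cycle described above (external snapshot, copy the base image, merge back, drop the overlay) can be sketched as follows. This is not the poster's script, which was not posted: the `virsh` subcommands and flags are standard libvirt ones, but the domain name `mail`, the disk target `vda`, the overlay path and the helper names are hypothetical. Because the copy is suspected of filling memory with cache, the copy helper swaps in one possible mitigation, evicting copied pages with `posix_fadvise(POSIX_FADV_DONTNEED)`, which is only an advisory hint to the kernel.

```python
import os
import tempfile


def backup_plan(domain, disk, overlay):
    """Return the virsh commands (argv lists) that bracket a copy of the base image."""
    # Redirect guest writes to an external overlay so the base image stops changing.
    snapshot = ["virsh", "snapshot-create-as", "--domain", domain,
                "--name", "backup-overlay", "--diskspec", f"{disk},file={overlay}",
                "--disk-only", "--atomic", "--no-metadata"]
    # --active commits the top (overlay) layer of a running guest; --pivot
    # switches the guest back to the base image once the commit has finished.
    commit = ["virsh", "blockcommit", domain, disk,
              "--active", "--wait", "--pivot", "--verbose"]
    return snapshot, commit


def copy_dropping_cache(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy src to dst in chunks, asking the kernel to evict the copied pages."""
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        offset = 0
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            fout.flush()
            # Dirty pages cannot be dropped until written back, so sync first.
            os.fsync(fout.fileno())
            if hasattr(os, "posix_fadvise"):  # not available on every platform
                for fd in (fin.fileno(), fout.fileno()):
                    os.posix_fadvise(fd, offset, len(chunk), os.POSIX_FADV_DONTNEED)
            offset += len(chunk)
    return offset


if __name__ == "__main__":
    # Hypothetical domain, disk target and overlay path.
    snapshot, commit = backup_plan("mail", "vda", "/var/lib/libvirt/images/mail.overlay")
    print(" ".join(snapshot))
    print(" ".join(commit))
    # Demonstrate the copy on a throwaway file instead of a real disk image.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = os.path.join(tmp, "base.img"), os.path.join(tmp, "copy.img")
        with open(src, "wb") as f:
            f.write(os.urandom(3 * 1024 * 1024))
        print(copy_dropping_cache(src, dst, chunk_size=1024 * 1024), "bytes copied")
```

The intended order is: run `snapshot`, copy the base image with `copy_dropping_cache`, run `commit`, then delete the overlay file yourself. With `--no-metadata` there is no libvirt snapshot object left to remove. Whether dropping the cache actually relieves the memory pressure on this particular host is an assumption to verify.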

I was also running a kernel with numa_balancing enabled automatically at boot.
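Whether automatic NUMA balancing is active can be read from the `kernel.numa_balancing` sysctl (`/proc/sys/kernel/numa_balancing`). "On at boot" usually means the kernel was built with `CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y` or booted with `numa_balancing=enable`. A small Python sketch with hypothetical helper names:

```python
import os

# Runtime switch for automatic NUMA balancing: "0" = off, "1" = on
# (newer kernels accept additional modes, treated here as "on").
NUMA_BALANCING = "/proc/sys/kernel/numa_balancing"


def numa_balancing_enabled(path=NUMA_BALANCING):
    """Return True/False, or None if the knob is missing (e.g. no CONFIG_NUMA_BALANCING)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip() != "0"


def boot_param(cmdline, name="numa_balancing"):
    """Return the value of name=... in a kernel command line string, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


if __name__ == "__main__":
    print("numa_balancing active:", numa_balancing_enabled())
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("numa_balancing= on command line:", boot_param(f.read()))
```

If the command-line check returns `None` but the sysctl reads as enabled, the default most likely comes from the kernel configuration rather than from the boot loader.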

Could it be that some part of grsecurity has a problem when a process is moved from one NUMA node to another?

Which options could cause such a problem? Is this related to grsecurity, to the hardware, or is it some other Linux problem? Are my thoughts right, or am I heading in totally the wrong direction?

Thanks in advance

Christian
chrroessner
 
Posts: 5
Joined: Sat Aug 22, 2015 4:59 am

Re: Process freezes with numad or kernel option numa_balanci

Post by PaX Team » Sat Aug 22, 2015 10:27 am

in general, we don't intend to change NUMA or node balancing code. since this seems to be reproducible, you could test a vanilla kernel first and see what happens. you can also try to disable all grsec options and see if that improves the behaviour, and then we can try to narrow it down further.
PaX Team
 
Posts: 2310
Joined: Mon Mar 18, 2002 4:35 pm

Re: Process freezes with numad or kernel option numa_balanci

Post by chrroessner » Sat Aug 22, 2015 11:05 am

OK, I will disable grsecurity for testing and report back later.

Re: Process freezes with numad or kernel option numa_balanci

Post by chrroessner » Sat Aug 22, 2015 7:16 pm

I have been running tests for several hours now. The problem reappeared with grsecurity turned off, so this is not grsecurity related. Thanks anyway.

Re: Process freezes with numad or kernel option numa_balanci

Post by PaX Team » Sat Aug 22, 2015 7:35 pm

turning off grsec (while it's still patched in) is not enough; you'll have to try an unpatched vanilla kernel too. this is because we have changes that are not under .config control.

Re: Process freezes with numad or kernel option numa_balanci

Post by chrroessner » Sun Aug 23, 2015 3:28 am

Thanks for your reply. I just installed vanilla-sources 4.1.6 and started the (hourly) tests again. I will wait through the day, check the logs, and come back later…

Re: Process freezes with numad or kernel option numa_balanci

Post by PaX Team » Sun Aug 23, 2015 7:50 am

can you test the same kernel version as well? it's just that even if 4.1.6 vanilla works, we won't know whether that's because a fix went in after 4.1.4.
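The point here is confounding: comparing a patched 4.1.4 against a vanilla 4.1.6 changes two variables at once. The decision logic can be written out as a tiny Python function. The function and its return labels are purely illustrative and not part of any tool.

```python
def attribute(grsec, vanilla):
    """Attribute a reproducible bug, given (kernel_version, reproduces) for each build."""
    (g_version, g_bad), (v_version, v_bad) = grsec, vanilla
    if g_bad and v_bad:
        # Vanilla shows the bug as well, so the patch is not needed to trigger it.
        return "upstream"
    if not g_bad and not v_bad:
        return "not reproduced"
    if g_version != v_version:
        # Different versions: an upstream fix or regression between them could
        # explain the difference just as well as the patch could.
        return "inconclusive"
    return "patch" if g_bad else "vanilla-only"
```

A freezing grsecurity 4.1.4 next to a clean vanilla 4.1.6 yields `"inconclusive"`, which is exactly why the same version has to be tested on both sides. Two freezing 4.1.6 builds, patched and vanilla, yield `"upstream"`.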

Re: Process freezes with numad or kernel option numa_balanci

Post by chrroessner » Sun Aug 23, 2015 10:40 am

I compared 4.1.6 vanilla with 4.1.6 grsec-patched. Unfortunately the problem appears with the vanilla kernel as well, so I think we can stop looking at the grsec side. I have just turned off numa_balancing and ksmd and am retrying. It's a real pain: while my backup script runs, a guest machine suddenly gets destroyed, with no damn logging anywhere.
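For anyone repeating this step: both features can be switched off at runtime through standard kernel knobs, `/proc/sys/kernel/numa_balancing` (write `0`) and `/sys/kernel/mm/ksm/run` (`0` stops ksmd, `2` stops it and unmerges shared pages). Below is a Python sketch with hypothetical helper names. It must run as root, and the settings do not survive a reboot.

```python
import os

NUMA_BALANCING = "/proc/sys/kernel/numa_balancing"
KSM_RUN = "/sys/kernel/mm/ksm/run"  # 0 = stop ksmd, 1 = run, 2 = stop and unmerge


def write_knob(path, value):
    """Write value to a sysctl/sysfs knob; return False if the knob does not exist."""
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return True


def disable_numa_balancing_and_ksm(unmerge=False,
                                   numa_path=NUMA_BALANCING, ksm_path=KSM_RUN):
    """Turn off automatic NUMA balancing and ksmd (requires root)."""
    return {
        "numa_balancing": write_knob(numa_path, 0),
        # 2 also splits already-merged pages back apart, which raises memory use.
        "ksm": write_knob(ksm_path, 2 if unmerge else 0),
    }
```

Unmerging (`unmerge=True`) gives every guest its own copy of previously shared pages, so on a host that is already short of memory it can make matters worse; the default only stops ksmd. To keep NUMA balancing off across reboots, the `numa_balancing=disable` boot parameter can be used.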

I can only guess that something is wrong with memory handling: not the physical memory, but libvirt/qemu and KVM. But that would be off-topic here. Thanks for your friendly support.

