XFS corruption on XEN platform

Postby mdcja » Sun Apr 05, 2015 12:34 am

Hello, I have been plagued by this error for a while now. I am running several servers in the cloud using XEN virtualization. I have compiled Linux 3.14.37 with the grsec patchset (as well as the previous 3.14.36 kernels and patches), and every single time I try to boot the kernel I get this error (full error log at http://pastebin.com/kQCEuazG ):

Code:
XFS (xvda1): Internal error xfs_agi_read_verify at line 1580 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffffa013e4c6
CPU: 0 PID: 295 Comm: kworker/0:1H Not tainted 3.14.37-grsec #1
...
XFS (xvda1): Corruption detected. Unmount and run xfs_repair
XFS (xvda1): metadata I/O error: block 0x7ffc02 ("xfs_trans_read_buf_map") error 117 numblks 1
...
BUG: unable to handle kernel NULL pointer dereference at


I originally compiled using the automatic configuration with the guest/xen option. I have since tried various custom configurations with grsec, but each one results in this error. If I don't enable any of the grsec patches, the kernel boots successfully. Any ideas?
mdcja
 
Posts: 3
Joined: Tue Mar 31, 2015 12:06 am

Re: XFS corruption on XEN platform

Postby spender » Tue Apr 07, 2015 7:53 am

Hi,

Could you try disabling GRKERNSEC_RANDSTRUCT and see if that fixes the issue?

Thanks,
-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Re: XFS corruption on XEN platform

Postby mdcja » Fri Apr 17, 2015 1:30 am

Hi, thanks for the reply, and sorry about the delay!

Unfortunately, disabling GRKERNSEC_RANDSTRUCT didn't fix the issue.

The error always occurs after "Starting Create Volatile Files and Directories", without fail:
Code:
[  OK  ] Reached target Local File Systems.
         Starting Trigger Flushing of Journal to Persistent Storage...
         Starting Security Auditing Service...
         Starting Create Volatile Files and Directories...
[    5.526325] ffff88002e445040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.531550] ffff88002e445050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.536209] ffff88002e445060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.540818] ffff88002e445070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.545740] XFS (xvda1): Internal error xfs_agi_read_verify at line 1580 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffffa013e3c5
[    5.552146] CPU: 0 PID: 295 Comm: kworker/0:1H Not tainted 3.14.37-grsec #1

(Full log: http://pastebin.com/2ED8ih9m)

I was able to get the system to boot successfully once with the randomization off; however, I haven't been able to reproduce that since.
mdcja
 
Posts: 3
Joined: Tue Mar 31, 2015 12:06 am

Re: XFS corruption on XEN platform

Postby spender » Fri Apr 17, 2015 7:15 am

Hi,

Could you try the latest PaX patch alone with the same configuration you used for the last grsec kernel you tested?
https://grsecurity.net/~paxguy1/pax-lin ... st41.patch

This will help eliminate a number of features as the culprit, as we'll now have to bisect to find out what's at fault (since neither of us directly modifies any XFS code).

Thanks,
-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Re: XFS corruption on XEN platform

Postby mdcja » Wed Apr 22, 2015 4:46 am

Hi,

Applying and enabling only the latest PaX patch on the 3.14.18 kernel lets me boot without any problems. With the grsec configuration I still experience the same issue.

Thanks,
- Julian
mdcja
 
Posts: 3
Joined: Tue Mar 31, 2015 12:06 am

Re: XFS corruption on XEN platform

Postby spender » Wed Apr 22, 2015 7:20 am

Can you provide your kernel .config? I'll give you back a number of configurations I'd like you to test. Could you also provide the .config for the PaX kernel you used?

Thanks,
-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Re: XFS corruption on XEN platform

Postby kamm » Tue Jul 28, 2015 6:39 pm

I had similar problems with XEN+XFS in domUs with various xen+kernel+grsec versions.
It took some time to track down. I never would have guessed, but it is definitely caused by turning on PAX_MEMORY_SANITIZE.
(Seems like PAX_MEMORY_STACKLEAK doesn't have any effect)
Tested with 4.1.3 + grsecurity-3.1-4.1.3-201507261932.patch
kamm
 
Posts: 1
Joined: Tue Jul 28, 2015 6:00 pm

Re: XFS corruption on XEN platform

Postby PaX Team » Wed Jul 29, 2015 2:20 pm

if it's SANITIZE then there's likely a use-after-free bug underneath somewhere. if anyone has a reproducible/debuggable test case please let the upstream xfs developers know as we won't be able to figure this out easily i'm afraid.
PaX Team
 
Posts: 2310
Joined: Mon Mar 18, 2002 4:35 pm

Re: XFS corruption on XEN platform

Postby minipli » Thu Jul 30, 2015 1:46 pm

If it's PAX_MEMORY_SANITIZE related, can you please try booting with the following kernel command line option: pax_sanitize_slab=0? That disables the slab-based sanitization but still leaves the page-based sanitization enabled. If this still triggers the bug, it's related to the page-based sanitization. If not, it's probably the slab-based one.
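
The test described above boils down to a two-way decision, which can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the only parameter taken from the post is pax_sanitize_slab=0, and the rule that a later occurrence on the command line overrides an earlier one is an assumption.

```python
def slab_sanitize_setting(cmdline: str):
    """Return the raw value of pax_sanitize_slab on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "pax_sanitize_slab" and sep:
            value = val  # assumption: a later occurrence overrides an earlier one
    return value


def next_suspect(cmdline: str, bug_reproduced: bool) -> str:
    """Apply the decision rule from the post: the result only means something with slab sanitization off."""
    if slab_sanitize_setting(cmdline) != "0":
        return "inconclusive: boot with pax_sanitize_slab=0 first"
    if bug_reproduced:
        return "page-based sanitization"
    return "slab-based sanitization"
```

On a running guest the command line could be taken from /proc/cmdline, for example `next_suspect(open("/proc/cmdline").read(), bug_reproduced=True)`.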
minipli
 
Posts: 21
Joined: Mon Jan 03, 2011 6:39 pm

Re: XFS corruption on XEN platform

Postby minipli » Thu Jul 30, 2015 3:42 pm

Well, I guess I found something. The XFS code handles its inodes in an RCU-like fashion. It uses a constructor that should be run only once and handles RCU-delayed free()s by marking the inodes with invalid numbers so they won't match on searches, but it relies on the rest of the object (spinlocks, flags, etc.) staying intact all the time. The PaX slab sanitization, however, violates both invariants. It sanitizes the object at kmem_cache_free() time (destroying spinlocks, flags, etc.) and calls the constructor afterwards. That makes the object valid again, but opens up a race window during which the object is invalid. PaX's slab sanitization normally ignores RCU slabs, but this one isn't marked as such and therefore falls through the cracks.
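
To make the race window concrete, here is a deliberately simplified, single-threaded toy model in Python. It is not kernel code: the class, the field names and the `observe` callback are all hypothetical, and using 0 as the "invalid number" is an assumption. The callback stands in for a lockless reader that may look at the object at any step of the free path.

```python
from dataclasses import dataclass


@dataclass
class ToyInode:
    lock_initialized: bool = False  # stands in for spinlocks, flags, etc.
    ino: int = 0                    # 0 is used here as the "invalid number"


def ctor(obj):
    """Constructor: meant to run once, when the slab object is first created."""
    obj.lock_initialized = True


def plain_free(obj, observe):
    """Free as the scheme expects: invalidate the number, leave the rest intact."""
    obj.ino = 0
    observe(obj)


def sanitizing_free(obj, observe):
    """Free with sanitization: wipe the whole object, then re-run the constructor."""
    obj.ino = 0
    obj.lock_initialized = False  # models the wipe destroying the lock state
    observe(obj)                  # a lockless reader can see the object here
    ctor(obj)
    observe(obj)


def reader_sees_broken_lock(free_fn) -> bool:
    """Return True if any observation during the free found the lock state destroyed."""
    obj = ToyInode()
    ctor(obj)
    obj.ino = 42
    seen = []
    free_fn(obj, lambda o: seen.append(o.lock_initialized))
    return not all(seen)
```

In this model `reader_sees_broken_lock(plain_free)` is false and `reader_sees_broken_lock(sanitizing_free)` is true, which is the window described above. A slab that sanitization skips never enters that state.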

Can you please try the following patch? It'll mark the offending slab as RCU so PaX's sanitize will know to ignore this slab:

http://r00tworld.net/~minipli/grsec/pax-sanitize-xfs_inode_rcu.diff

What's still strange, though, is that the object dump is all 0xff, not 0xfe, as one would expect from a PaX slab sanitized object. :/
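
For anyone who wants to check their own logs for the fill value, a minimal sketch of a helper follows. The function names are hypothetical, and it assumes dump lines in the same layout as the boot log quoted earlier in this thread (optional timestamp, address, colon, space-separated hex bytes, then an ASCII column).

```python
import string


def dump_bytes(line: str) -> bytes:
    """Extract the hex bytes from one dump line shaped like 'addr: ff ff ...  ....'."""
    _, _, rest = line.partition(": ")
    data = bytearray()
    for tok in rest.split(" "):
        # Stop at the first token that is not exactly two hex digits
        # (the empty token produced by the gap before the ASCII column).
        if len(tok) != 2 or any(c not in string.hexdigits for c in tok):
            break
        data.append(int(tok, 16))
    return bytes(data)


def fill_byte(lines):
    """Return the one value shared by every dumped byte, or None if mixed or empty."""
    seen = set()
    for line in lines:
        seen.update(dump_bytes(line))
    return seen.pop() if len(seen) == 1 else None
```

Fed the four dump lines from the log above, `fill_byte` returns 0xff. A result of 0xfe would be what this post expects from a slab-sanitized object.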
minipli
 
Posts: 21
Joined: Mon Jan 03, 2011 6:39 pm

