
XFS corruption on XEN platform

Posted: Sun Apr 05, 2015 12:34 am
by mdcja
Hello, I have been plagued by this error for a while now. I am running several servers in the cloud under XEN virtualization. I have compiled Linux 3.14.37 with the grsec patchset (as well as the previous 3.14.36 kernel and patch), and every single time I try to boot the kernel I get this error (full error log at http://pastebin.com/kQCEuazG ):

Code:
XFS (xvda1): Internal error xfs_agi_read_verify at line 1580 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffffa013e4c6
CPU: 0 PID: 295 Comm: kworker/0:1H Not tainted 3.14.37-grsec #1
...
XFS (xvda1): Corruption detected. Unmount and run xfs_repair
XFS (xvda1): metadata I/O error: block 0x7ffc02 ("xfs_trans_read_buf_map") error 117 numblks 1
...
BUG: unable to handle kernel NULL pointer dereference at


I originally compiled with the automatic configuration, choosing the guest/xen option. I have since tried various custom grsec configurations, but every one of them produces this error. If I don't enable any of the grsec features, the kernel boots successfully. Any ideas?

Re: XFS corruption on XEN platform

Posted: Tue Apr 07, 2015 7:53 am
by spender
Hi,

Could you try disabling GRKERNSEC_RANDSTRUCT and see if that fixes the issue?

Thanks,
-Brad

Re: XFS corruption on XEN platform

Posted: Fri Apr 17, 2015 1:30 am
by mdcja
Hi, thanks for the reply, and sorry about the delay!

Unfortunately, disabling GRKERNSEC_RANDSTRUCT didn't fix the issue.

The error occurs, without fail, right after "Create Volatile Files and Directories" starts:
Code:
[  OK  ] Reached target Local File Systems.
         Starting Trigger Flushing of Journal to Persistent Storage...
         Starting Security Auditing Service...
         Starting Create Volatile Files and Directories...
[    5.526325] ffff88002e445040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.531550] ffff88002e445050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.536209] ffff88002e445060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.540818] ffff88002e445070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[    5.545740] XFS (xvda1): Internal error xfs_agi_read_verify at line 1580 of file fs/xfs/xfs_ialloc.c.  Caller 0xffffffffa013e3c5
[    5.552146] CPU: 0 PID: 295 Comm: kworker/0:1H Not tainted 3.14.37-grsec #1

(Full log: http://pastebin.com/2ED8ih9m)

I was able to get the system to boot successfully once with the randomization off; however, I haven't been able to reproduce that since.

Re: XFS corruption on XEN platform

Posted: Fri Apr 17, 2015 7:15 am
by spender
Hi,

Could you try the latest PaX patch alone, with the same configuration you used for the last grsec kernel you tested?
https://grsecurity.net/~paxguy1/pax-lin ... st41.patch

This will help rule out a number of features as the culprit; after that we'll have to bisect to find out what's at fault, since neither of us directly modifies any XFS code.

Thanks,
-Brad

Re: XFS corruption on XEN platform

Posted: Wed Apr 22, 2015 4:46 am
by mdcja
Hi,

With the latest PaX patch alone (PaX enabled) on the 3.14.18 kernel, I can boot without any problems. With the grsec configuration I still hit the same issue.

Thanks,
- Julian

Re: XFS corruption on XEN platform

Posted: Wed Apr 22, 2015 7:20 am
by spender
Can you provide your kernel .config? I'll give you back a number of configurations I'd like you to test. Could you also provide the .config for the PaX kernel you used?

Thanks,
-Brad

Re: XFS corruption on XEN platform

Posted: Tue Jul 28, 2015 6:39 pm
by kamm
I had similar problems with XEN+XFS in domUs across various xen/kernel/grsec versions.
It took some time to track down. I never would have guessed, but it is definitely caused by turning on PAX_MEMORY_SANITIZE.
(PAX_MEMORY_STACKLEAK doesn't seem to have any effect.)
Tested with 4.1.3 + grsecurity-3.1-4.1.3-201507261932.patch

Re: XFS corruption on XEN platform

Posted: Wed Jul 29, 2015 2:20 pm
by PaX Team
If it's SANITIZE, then there's likely a use-after-free bug underneath somewhere. If anyone has a reproducible/debuggable test case, please let the upstream XFS developers know, as we won't be able to figure this out easily, I'm afraid.

Re: XFS corruption on XEN platform

Posted: Thu Jul 30, 2015 1:46 pm
by minipli
If it's PAX_MEMORY_SANITIZE related, can you please try booting with the following kernel command line option: pax_sanitize_slab=0? That disables the slab based sanitization but still leaves the page based sanitization enabled. If this still triggers the bug, it's related to the page based sanitization. If not, it's probably the slab based one.

Re: XFS corruption on XEN platform

Posted: Thu Jul 30, 2015 3:42 pm
by minipli
Well, I guess I found something. The XFS code handles its inodes in an RCU-like fashion. It uses a constructor that should be run only once, and it handles RCU-delayed free()s by marking the inodes with invalid numbers so they won't match on searches, relying on the object itself to stay intact at all times (spinlocks, flags, etc.). The PaX slab sanitization, however, violates both invariants. It sanitizes the object at kmem_cache_free() time (destroying spinlocks, flags, etc.) and calls the constructor afterwards. That makes the object valid again, but opens up a race window in which the object is invalid. PaX's slab sanitization normally ignores RCU slabs, but this one isn't marked as such and therefore falls through the cracks.

Can you please try the following patch? It'll mark the offending slab as RCU so PaX's sanitize will know to ignore this slab:

http://r00tworld.net/~minipli/grsec/pax-sanitize-xfs_inode_rcu.diff

What's still strange, though, is that the object dump is all 0xff, not 0xfe as one would expect from a PaX slab-sanitized object. :/