
PAX size overflow on raid10 resync in kernel 4.4

Posted: Wed Oct 05, 2016 12:39 pm
by eswierk
On a system running kernel 4.4.20 with grsecurity, I am getting the following PaX size overflow report when a raid10 array is resynced during boot.

Code: Select all
     PAX: size overflow detected in function sync_request .../drivers/md/raid10.c:3181 cicus.674_1200 max, count: 135, decl: sectors; num: 0; context: r10bio;
     CPU: 2 PID: 922 Comm: md127_resync Not tainted 4.4.20-grsec #1
     Hardware name: Intel S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      0000000000000000 ffffc9001237b960 ffffffff813ece08 00000000000000d2
      ffffffffa0b23c68 0000000000000c6d ffffc9001237b990 ffffffff81212d56
      000000010f800000 000000010f800000 000000010f800000 ffff8810264589c0
     Call Trace:
      [<ffffffff813ece08>] dump_stack+0x9a/0xe2
      [<ffffffffa0b23c68>] ? __param_str_max_queued_requests+0x68/0x56f0 [raid10]
      [<ffffffff81212d56>] report_size_overflow+0x66/0x80
      [<ffffffffa0b1c2da>] sync_request+0x1d1a/0x3150 [raid10]
      [<ffffffff810263b3>] ? sched_clock+0x13/0x20
      [<ffffffff810b7f8c>] ? local_clock+0x1c/0x20
      [<ffffffff810d3b5d>] ? trace_hardirqs_off+0xd/0x10
      [<ffffffff810263b3>] ? sched_clock+0x13/0x20
      [<ffffffff810263b3>] ? sched_clock+0x13/0x20
      [<ffffffff810b7f8c>] ? local_clock+0x1c/0x20
      [<ffffffff81784d12>] ? _raw_spin_unlock+0x22/0x30
      [<ffffffff8109b047>] ? __queue_work+0x187/0x530
      [<ffffffff810d77ad>] ? trace_hardirqs_on_caller+0x13d/0x1d0
      [<ffffffff8159c383>] md_do_sync+0xa93/0x1230
      [<ffffffff810cbb20>] ? wake_up_atomic_t+0x30/0x30
      [<ffffffff81596f28>] md_thread+0x128/0x130
      [<ffffffff81596e00>] ? find_pers+0x70/0x70
      [<ffffffff81596e00>] ? find_pers+0x70/0x70
      [<ffffffff810a3e7c>] kthread+0xfc/0x120
      [<ffffffff810a3d80>] ? kthread_create_on_node+0x240/0x240
      [<ffffffff81785dee>] ret_from_fork+0x3e/0x70
      [<ffffffff810a3d80>] ? kthread_create_on_node+0x240/0x240


I am able to work around it by adding an __intentional_overflow(-1) annotation to sync_request() in drivers/md/raid10.c, but I don't know if this is the correct fix or merely papering over a bug.

Code: Select all
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2be20c1..3610ecc 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2805,7 +2805,7 @@ static int init_resync(struct r10conf *conf)
  *
  */

-static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
+static sector_t __intentional_overflow(-1) sync_request(struct mddev *mddev, sector_t sector_nr,
                             int *skipped)
 {
        struct r10conf *conf = mddev->private;

Re: PAX size overflow on raid10 resync in kernel 4.4

Posted: Wed Oct 05, 2016 3:56 pm
by PaX Team
1. the triggering code is this:
Code: Select all
r10_bio->sectors = (sector_nr | chunk_mask) - sector_nr + 1;
can you instrument it to print out the values of sector_nr and chunk_mask? (a minimal sketch of such instrumentation follows below)
2. is the underlying filesystem bigger than 1TB? i'm asking because the ->sectors field is an int, so it can't represent a sector number for larger filesystems.
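
For reference, the instrumentation asked for in point 1 can be as simple as a one-off printk just before the assignment in sync_request() (a sketch against the 4.4 drivers/md/raid10.c code quoted above, not an official debugging patch):
Code: Select all
        /* hypothetical debug instrumentation: dump the two operands each time a
         * resync read is scheduled, so they can be correlated with the size
         * overflow report */
        pr_info("raid10 resync: sector_nr=%llx chunk_mask=%llx\n",
                (unsigned long long)sector_nr,
                (unsigned long long)chunk_mask);
        r10_bio->sectors = (sector_nr | chunk_mask) - sector_nr + 1;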

Re: PAX size overflow on raid10 resync in kernel 4.4

Posted: Wed Oct 05, 2016 5:50 pm
by eswierk
When the overflow is detected, sector_nr=0x10f800000 and chunk_mask=0x3ff.

The raid10 array in question is composed of four 1.2 TB disks.

Code: Select all
/dev/md127:
        Version : 1.2
  Creation Time : Sat Sep 24 01:49:57 2016
     Raid Level : raid10
     Array Size : 2343963648 (2235.38 GiB 2400.22 GB)
  Used Dev Size : 1171981824 (1117.69 GiB 1200.11 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Oct  5 21:45:04 2016
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

           Name : storage
           UUID : 42c3a203:5a23b5cd:c4c5fdcb:68482e2f
         Events : 1230

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync set-A   /dev/sda
       1       8       16        1      active sync set-B   /dev/sdb
       2       8       32        2      active sync set-A   /dev/sdc
       3       8       48        3      active sync set-B   /dev/sdd
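
To put these numbers next to the 1 TB threshold asked about in the previous post (512-byte sectors throughout):
Code: Select all
        INT_MAX sectors           = 2147483647 sectors  ~= 1 TiB
        array size                = 2343963648 KiB      ~= 2235 GiB
        sector_nr at the overflow = 0x10f800000 sectors ~= 2.1 TiB
so any resync offset past roughly the 1 TiB mark is already too large for an int-sized sector count.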

Re: PAX size overflow on raid10 resync in kernel 4.4

Posted: Wed Oct 05, 2016 8:03 pm
by PaX Team
eswierk wrote: When the overflow is detected, sector_nr=0x10f800000 and chunk_mask=0x3ff.
thanks, so it's an otherwise harmless integer truncation (in this specific case) that gcc moves earlier, inside the subtraction, which then triggers the size overflow check since that sector number doesn't fit an int. the usual way we work around this is to change the code slightly so that the frontend transformation doesn't kick in, something like this:
Code: Select all
--- a/drivers/md/raid10.c     2016-04-18 23:40:51.148543335 +0200
+++ b/drivers/md/raid10.c 2016-10-06 01:52:09.405977022 +0200
@@ -3153,6 +3153,7 @@
        } else {
                /* resync. Schedule a read for every block at this virt offset */
                int count = 0;
+               sector_t sectors;

                bitmap_cond_end_sync(mddev->bitmap, sector_nr, 0);

@@ -3178,7 +3179,8 @@
                r10_bio->sector = sector_nr;
                set_bit(R10BIO_IsSync, &r10_bio->state);
                raid10_find_phys(conf, r10_bio);
-               r10_bio->sectors = (sector_nr | chunk_mask) - sector_nr + 1;
+               sectors = (sector_nr | chunk_mask) - sector_nr + 1;
+               r10_bio->sectors = sectors;

                for (i = 0; i < conf->copies; i++) {
                        int d = r10_bio->devs[i].devnum;
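
To spell out why the report is harmless for this particular expression: with the values above, the count that ends up in ->sectors is only 1024, which fits an int fine; it is the intermediate sector_nr (0x10f800000) that exceeds INT_MAX once the truncation is moved into the subexpression. A standalone userspace sketch of the arithmetic (just to illustrate the numbers, not kernel code):
Code: Select all
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* values reported earlier in this thread */
        uint64_t sector_nr  = 0x10f800000ULL; /* ~2.1 TiB in 512-byte sectors */
        uint64_t chunk_mask = 0x3ff;          /* 512K chunks -> 1024 sectors, mask = 1023 */

        /* the value that ends up in the int-sized r10_bio->sectors field */
        uint64_t sectors = (sector_nr | chunk_mask) - sector_nr + 1;

        printf("sectors   = %llu (fits in int: %s)\n",
               (unsigned long long)sectors,
               sectors <= INT32_MAX ? "yes" : "no");
        printf("sector_nr = %llu (fits in int: %s)\n",
               (unsigned long long)sector_nr,
               sector_nr <= INT32_MAX ? "yes" : "no");
        return 0;
}

Since sector_nr is chunk-aligned here (0x10f800000 & 0x3ff == 0), the expression reduces to chunk_mask + 1 = 1024 sectors, which is why the workaround above only reshapes the code through a sector_t temporary instead of changing any values.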