dup_mm() kernel panic on 3.2.36

Postby BeiKed9o » Sun Jan 13, 2013 8:45 am

Thanks to someone for bringing this up. I had a hard time finding any clue to the reboots/resets that started in December; today I finally managed to get a netconsole log.
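For readers who want to capture a similar log: netconsole sends kernel messages over UDP to another machine, so the oops survives the reboot. Below is a minimal sketch that only assembles the module parameter. The function name `netconsole_param` and every address in the usage comments are placeholders, not values from this thread.

```shell
# Build the netconsole= parameter string from its parts.
# Format, per the kernel's netconsole documentation:
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Hypothetical helper; all addresses used with it are placeholders.
netconsole_param() {
    src_port=$1 src_ip=$2 dev=$3 tgt_port=$4 tgt_ip=$5 tgt_mac=$6
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# On the crashing machine (as root), something like:
#   modprobe netconsole netconsole="$(netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:11:22:33:44:55)"
# On the receiving machine, listen for the UDP stream, e.g.:
#   nc -u -l 6666     (flag syntax differs between netcat variants)
```

The function has no side effects by itself; the actual `modprobe` call is left as a comment because the right addresses depend on the local network.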

Question to all: is this related, or is it something different?
Code:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffff8108709d>] dup_mm+0x27d/0x470
PGD 42252f000
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
CPU 0
Pid: 5557, comm: vsftpd Not tainted 3.2.36-grsec #6 System manufacturer System Product Name/P8H67-M PRO
RIP: 0010:[<ffffffff8108709d>]  [<ffffffff8108709d>] dup_mm+0x27d/0x470
RSP: 0018:ffff88036c741dd0  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88030f9a1500 RCX: 0000000000000000
RDX: ffff88042cbcc000 RSI: ffff8803979d2d10 RDI: ffff88030b84c630
RBP: ffff8803979d2d10 R08: ffff8803979d20b0 R09: 000003bc94518000
R10: 0000000000000002 R11: ffffffffffffffc0 R12: ffff88042c4ebb80
R13: ffff88030b84c630 R14: ffff88030f9a1560 R15: ffff88042c4ebbe0
FS:  000003bc94502700(0000) GS:ffff88043fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000030 CR3: 0000000001469000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process vsftpd (pid: 5557, threadinfo ffff88036c7c2538, task ffff88036c7c2120)
Stack:
 ffff88042ad308a0 0000000000000000 ffff88030b84c4e8 ffff88030b84c508
 ffff88030b84c4d0 ffff88030b84c510 ffff88042d840c00 0000000001200011
 ffff88042ca986a0 0000000000000000 000003bc945029d0 0000000000000000
Call Trace:
 [<ffffffff81087b8d>] ? copy_process+0x8bd/0x1120
 [<ffffffff810884df>] ? do_fork+0xbf/0x290
 [<ffffffff814567a3>] ? stub_clone+0x13/0x20
 [<ffffffff814564ab>] ? system_call_fastpath+0x18/0x1d
Code: 49 8b 95 98 00 00 00 49 c7 45 20 00 00 00 00 49 c7 45 18 00 00 00 00 49 c7 85 a8 00 00 00 00 00 00 00 48 85 d2 74 6e 48 8b 42 18 <48> 8b 48 30 48 8b 82 c8 00 00 00 f0 48 ff 42 30 71 07 f0 48 ff
RIP  [<ffffffff8108709d>] dup_mm+0x27d/0x470
 RSP <ffff88036c741dd0>
CR2: 0000000000000030
---[ end trace eebf28355677bc0e ]---
Kernel panic - not syncing: grsec: halting the system due to suspicious kernel crash caused by root
Rebooting in 3 seconds..
ACPI MEMORY or I/O RESET_REG.
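A note on reading the "Code:" line of the oops: the byte wrapped in `<..>` is where the faulting instruction starts. Decoded by hand, `48 8b 42 18` is `mov 0x18(%rdx),%rax` and `<48> 8b 48 30` is `mov 0x30(%rax),%rcx`. With RAX = 0 in the register dump, the second load touches address 0x30, which matches CR2. The helper below is a hypothetical sketch (the name `faulting_bytes` is not from any tool) that just cuts the line down to the faulting bytes.

```shell
# Print the bytes of an oops "Code:" line starting at the faulting
# instruction, i.e. at the byte wrapped in <..>.
# Hypothetical helper; reads the oops text on stdin.
faulting_bytes() {
    sed -n 's|^.*Code: .*<\([0-9a-f][0-9a-f]\)> *|\1 |p' | sed 's/ *$//'
}

# Usage:
#   faulting_bytes < oops.txt
# For a full disassembly, the kernel source tree ships a script for this:
#   ./scripts/decodecode < oops.txt
```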
BeiKed9o
 
Posts: 2
Joined: Sun Jan 13, 2013 8:27 am

Re: dup_mm() kernel panic on 3.2.36

Postby PaX Team » Sun Jan 13, 2013 6:28 pm

it's a different bug so we moved it to a new thread. as for debugging it, we'd need the vmlinux (not bzImage) corresponding to this report at least. for future bug reports please follow http://en.wikibooks.org/wiki/Grsecurity/Reporting_Bugs :).
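For context on why vmlinux matters: unlike bzImage, it is an uncompressed ELF file carrying the symbol table (and source line information if it was built with debug info), so a location such as `dup_mm+0x27d` can be mapped back to the code. A rough sketch of that workflow follows; the helper name `oops_ip_symbol` and the file name `oops.txt` are made up for illustration.

```shell
# Pull "symbol+offset" (e.g. dup_mm+0x27d) out of the "IP:" line of an
# oops read from stdin. Hypothetical helper name.
oops_ip_symbol() {
    sed -n 's|^.*IP: \[<[0-9a-f]*>] \([A-Za-z0-9_.]*+0x[0-9a-f]*\)/0x[0-9a-f]*.*$|\1|p' | head -n 1
}

# With the vmlinux that matches the crashing kernel, gdb can then show
# the source around that location (needs debug info in vmlinux):
#   gdb -batch -ex "list *($(oops_ip_symbol < oops.txt))" vmlinux
# addr2line works from the raw address instead:
#   addr2line -e vmlinux ffffffff8108709d
```

Both commands only give meaningful results against the exact vmlinux the crash came from, since offsets shift with every rebuild.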
PaX Team
 
Posts: 1897
Joined: Mon Mar 18, 2002 4:35 pm

Re: dup_mm() kernel panic on 3.2.36

Postby BeiKed9o » Mon Jan 14, 2013 8:18 am

damn...ok...will do next time...
Last edited by BeiKed9o on Fri Jan 18, 2013 4:53 pm, edited 1 time in total.

Re: dup_mm() kernel panic on 3.2.36

Postby PaX Team » Mon Jan 14, 2013 9:06 am

we looked at this with spender and it seems that the exact same symptoms existed in vanilla linux as well, as early as 3.3 (http://lkml.indiana.edu/hypermail/linux ... 00132.html). whether it's the same underlying problem they ran into (and never managed to debug) or something different is hard to tell right now, but since it seems reproducible for you, you could try a vanilla kernel and also a grsec one with CONFIG_GRKERNSEC disabled to see which of them still reproduces the problem.

Re: dup_mm() kernel panic on 3.2.36 / 3.7.1

Postby renton » Wed Jan 16, 2013 10:58 am

Actually, this bug reproduces 100% of the time.
I launch vsftpd:
Code:
# /etc/init.d/vsftpd start


then run this command:
Code:
# lftp w_test-l25-apache1_682a95e7:*******@my_local_hostname


That works fine, but if I append "/" to the address:
Code:
# lftp w_test-l25-apache1_682a95e7:*******@my_local_hostname/


I get a kernel panic:
Code:
Jan 15 17:42:44 l25 kernel: [672372.165622] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
Jan 15 17:42:44 l25 kernel: [672372.174718] IP: [<ffffffff81036246>] dup_mm+0x2b6/0x550
Jan 15 17:42:44 l25 kernel: [672372.180782] PGD 0
Jan 15 17:42:44 l25 kernel: [672372.183234] Oops: 0000 [#1] SMP
Jan 15 17:42:44 l25 kernel: [672372.187045] Modules linked in:
Jan 15 17:42:44 l25 kernel: [672372.190665] CPU 1
Jan 15 17:42:44 l25 kernel: [672372.192829] Pid: 19361, comm: vsftpd Not tainted 3.7.1-1gb-csm-rcu-exp #26 Intel Corporation S2600WP/S2600WP
Jan 15 17:42:44 l25 kernel: [672372.204387] RIP: 0010:[<ffffffff81036246>]  [<ffffffff81036246>] dup_mm+0x2b6/0x550
Jan 15 17:42:44 l25 kernel: [672372.213228] RSP: 0018:ffff880ef253fd70  EFLAGS: 00010286
Jan 15 17:42:44 l25 kernel: [672372.219361] RAX: 0000000000000000 RBX: ffff880811c46b80 RCX: 0000000000000000
Jan 15 17:42:44 l25 kernel: [672372.227626] RDX: ffff8807d76ddc00 RSI: ffff880811c46b80 RDI: ffff8807e4bea8a0
Jan 15 17:42:44 l25 kernel: [672372.235887] RBP: ffff880ef253fde0 R08: ffff88081fc314a0 R09: ffffffff8103619e
Jan 15 17:42:44 l25 kernel: [672372.244141] R10: 0000000000000000 R11: 0000000000011230 R12: ffff880eb5a8cd00
Jan 15 17:42:44 l25 kernel: [672372.252388] R13: ffff8806ce45c200 R14: ffff8807e4bea8a0 R15: ffff880eb5a8cd60
Jan 15 17:42:44 l25 kernel: [672372.260646] FS:  000072215fde8700(0000) GS:ffff88081fc20000(0000) knlGS:0000000000000000
Jan 15 17:42:44 l25 kernel: [672372.269972] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 15 17:42:44 l25 kernel: [672372.276586] CR2: 0000000000000030 CR3: 0000000721170000 CR4: 00000000000407f0
Jan 15 17:42:44 l25 kernel: [672372.284838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 15 17:42:44 l25 kernel: [672372.293097] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 15 17:42:44 l25 kernel: [672372.301356] Process vsftpd (pid: 19361, threadinfo ffff880eed004e78, task ffff880eed004880)
Jan 15 17:42:44 l25 kernel: [672372.310955] Stack:
Jan 15 17:42:44 l25 kernel: [672372.313392]  0000000000000000 0000000000000000 ffff880eb5a8cd08 0000000000000000
Jan 15 17:42:44 l25 kernel: [672372.321999]  ffff880eb5a8cd00 0000000000000000 ffff8806ce45c260 0000000000000000
Jan 15 17:42:44 l25 kernel: [672372.330610]  0000000000000000 0000000001200011 ffff88080c73ecc0 0000000000000000
Jan 15 17:42:44 l25 kernel: [672372.339211] Call Trace:
Jan 15 17:42:44 l25 kernel: [672372.342146]  [<ffffffff810378dc>] copy_process+0x13bc/0x13f0
Jan 15 17:42:44 l25 kernel: [672372.348686]  [<ffffffff814bbc41>] ? _raw_spin_unlock_bh+0x11/0x20
Jan 15 17:42:44 l25 kernel: [672372.355700]  [<ffffffff8124256f>] ? gr_attach_curr_ip+0x12f/0x140
Jan 15 17:42:44 l25 kernel: [672372.362716]  [<ffffffff810379f0>] do_fork+0xa0/0x2e0
Jan 15 17:42:44 l25 kernel: [672372.368457]  [<ffffffff8103e7f9>] ? sys_wait4+0xa9/0xf0
Jan 15 17:42:44 l25 kernel: [672372.374490]  [<ffffffff8100aa73>] sys_clone+0x23/0x30
Jan 15 17:42:44 l25 kernel: [672372.380328]  [<ffffffff814bc9d3>] stub_clone+0x13/0x20
Jan 15 17:42:44 l25 kernel: [672372.386276]  [<ffffffff814bc789>] ? system_call_fastpath+0x18/0x1d
Jan 15 17:42:44 l25 kernel: [672372.393371] Code: c7 46 20 00 00 00 00 49 89 46 30 49 c7 46 18 00 00 00 00 49 c7 86 b0 00 00 00 00 00 00 00 48 85 d2 0f 84 86 00 00 00 48 8b 42 18 <48> 8b 48 30 48 8b 82 c8 00 00 00 f0 48 ff 42 30 71 07 f0 48 ff
Jan 15 17:42:44 l25 kernel: [672372.415859] RIP  [<ffffffff81036246>] dup_mm+0x2b6/0x550
Jan 15 17:42:44 l25 kernel: [672372.422007]  RSP <ffff880ef253fd70>
Jan 15 17:42:44 l25 kernel: [672372.426107] CR2: 0000000000000030
Jan 15 17:42:44 l25 kernel: [672372.430608] ---[ end trace aa185f53051cb2d5 ]---


On a vanilla kernel this bug does not reproduce.

Kernel - 3.7.1, grsec - 2.9.1-3.7.1-201301041854.patch

If needed, I can send you my kernel and .config.
renton
 
Posts: 9
Joined: Tue Jan 15, 2013 10:17 am

Re: dup_mm() kernel panic on 3.2.36

Postby spender » Wed Jan 16, 2013 11:58 am

Hi,

This should be fixed in the next patch to be released tonight. Let me know if you still have problems with that patch.

Thanks,
-Brad
spender
 
Posts: 1950
Joined: Wed Feb 20, 2002 8:00 pm
Location: VA, USA

Re: dup_mm() kernel panic on 3.2.36

Postby renton » Wed Jan 23, 2013 7:39 am

The grsecurity-2.9.1-3.7.3-201301181518 patch solved the kernel panic problem. Thanks.

But now I've got another problem: after running for about 10 hours, my server hangs without any suspicious log entries (this server is used for shared hosting).

By the way, on kernel linux-3.4.6/grsecurity-2.9.1-3.4.6-201207242237 it has worked perfectly, without hanging, for about a year or so.

Re: dup_mm() kernel panic on 3.2.36

Postby spender » Wed Jan 23, 2013 8:14 am

You'll have to enable the various options needed to debug such a problem, as I can't really do anything without any information.

Lockdep, hangcheck, sysrq, NMI watchdog, netconsole, any other lock debugging features.

-Brad
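As a starting point for the list above, here is a small sketch that checks a kernel .config for debugging options. The mapping from the named features to Kconfig symbols is an assumption, not something stated in the thread: lockdep as PROVE_LOCKING, "hangcheck" as DETECT_HUNG_TASK (it could equally mean HANGCHECK_TIMER), the NMI watchdog as LOCKUP_DETECTOR, plus DEBUG_SPINLOCK and DEBUG_MUTEXES for "other lock debugging features". The function name is hypothetical as well.

```shell
# Report which debugging-related options are not enabled (=y or =m) in a
# kernel .config. The option list is an assumed mapping of the features
# named above to Kconfig symbols, not a list taken from the thread.
missing_debug_opts() {
    config_file=$1
    for opt in PROVE_LOCKING DEBUG_SPINLOCK DEBUG_MUTEXES \
               DETECT_HUNG_TASK LOCKUP_DETECTOR MAGIC_SYSRQ NETCONSOLE; do
        grep -q "^CONFIG_${opt}=[ym]" "$config_file" || echo "CONFIG_${opt}"
    done
}

# Usage:
#   missing_debug_opts /path/to/linux/.config
```

It prints one line per option that is unset or missing and nothing at all if everything in the list is enabled. Lock debugging adds noticeable overhead, so these settings are better suited to a reproduction run than to permanent production use.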

Re: dup_mm() kernel panic on 3.2.36

Postby renton » Thu Jan 24, 2013 2:37 pm

Well, after running for just one day the server hung again.
This time I had the IPMI console open, and after the server stopped responding I got the following on it:
Code:
[97711.308560] BUG: soft lockup - CPU#20 stuck for 22s! [httpd:6032]
[97711.348495] BUG: soft lockup - CPU#21 stuck for 22s! [httpd:6627]
[97711.378446] BUG: soft lockup - CPU#22 stuck for 22s! [vsftpd:11144]
[97711.398414] BUG: soft lockup - CPU#23 stuck for 22s! [httpd:17211]
[97725.705447] INFO: rcu_sched self-detected stall on CPU


As far as I understand, this is not enough to tell why it hangs. For now I have booted the box back into my old stable kernel, 3.4.6; there are about 15k sites on this box, so the clients are really angry about such frequent restarts ;)

In general, I have an interesting history with these kernels: I know for sure which ones work well on our servers and which hang like this. For example, 2.6.24-.28 work without hanging; 2.6.32-.36 hang after an hour of uptime; 2.6.35 hangs once in 2-3 months; 2.6.38 once a day; 3.0-3.1 after only an hour of uptime; 3.3.6 once in three months.
3.4.6 hasn't hung yet, but it has another unpleasant and quite disruptive bug (https://lkml.org/lkml/2012/9/25/183); if it weren't for that, I wouldn't be upgrading.
By the way, this is the first time I have learned anything from IPMI about what caused a hang; before, there wasn't any clue at all.