Kernel oops: rcu_sched detected stalls on CPUs/tasks

Discuss and suggest new grsecurity features

Moderators: spender, PaX Team

Kernel oops: rcu_sched detected stalls on CPUs/tasks

Postby hanno » Sun Jul 27, 2014 11:40 am

Since a while we have a problem on a system running a grsecurity kernel. It occurs irregularly and we are not able to reproduce the incident. Kernel is 3.15.5 with grsec patch from 201407100036, however we've been seeing this for a while.

We regularly get either a complete system hang or the system becomes almost unusable with segfaults etc. We see messages like the one below in dmesg.

The system is a virtual kvm machine running on an Intel Xeon system. I can give further info if needed. I found this older forum entry which seems to have somewhat simliar stack traces:
viewtopic.php?t=3060&p=12154

Any ideas what I could do to debug further? As said, it happens irregularly which makes debugging harder.

Jul 26 05:02:01 milch kernel: INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=6002 jiffies, g=6339563, c=6339562, q=6087)
Jul 26 05:02:01 milch kernel: sending NMI to all CPUs:
Jul 26 05:02:01 milch kernel: NMI backtrace for cpu 0
Jul 26 05:02:01 milch kernel: CPU: 0 PID: 20938 Comm: wc Not tainted 3.15.5-grsec #1
Jul 26 05:02:01 milch kernel: Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Jul 26 05:02:01 milch kernel: task: ffff880039188000 ti: ffff880039188510 task.ti: ffff880039188510
Jul 26 05:02:01 milch kernel: RIP: 0010:[<ffffffff810268ef>] [<ffffffff810268ef>] ffffffff810268ef
Jul 26 05:02:01 milch kernel: RSP: 0000:ffff88007fc03e28 EFLAGS: 00000046
Jul 26 05:02:01 milch kernel: RAX: 0000000000000c00 RBX: 0000000000000002 RCX: 0000000000000007
Jul 26 05:02:01 milch kernel: RDX: ffffffff81501640 RSI: 0000000000000002 RDI: 0000000000000086
Jul 26 05:02:01 milch kernel: RBP: 0000000000000c00 R08: ffffffff816a58c4 R09: 0000000000000000
Jul 26 05:02:01 milch kernel: R10: 0000000000000000 R11: ffffffff81026700 R12: 0000000000000003
Jul 26 05:02:01 milch kernel: R13: ffffffff81546f40 R14: 0000000000000000 R15: ffffffff81547000
Jul 26 05:02:01 milch kernel: FS: 000003fe22b6c700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
Jul 26 05:02:01 milch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 05:02:01 milch kernel: CR2: 000003fe226cf000 CR3: 00000000013d1000 CR4: 00000000000407f0
Jul 26 05:02:01 milch kernel: Stack:
Jul 26 05:02:01 milch kernel: ffff88007fc0c968 0000000000000086 0000000000002710 ffff88007fc0c470
Jul 26 05:02:01 milch kernel: 0000000000000001 ffffffff81024390 ffffffff81546f40 ffffffff810923c8
Jul 26 05:02:01 milch kernel: 00000000ffffffff 00000000000017c7 0000000000000083 ffffffff8157d740
Jul 26 05:02:01 milch kernel: Call Trace:
Jul 26 05:02:01 milch kernel: <IRQ>
Jul 26 05:02:01 milch kernel: [<ffffffff81024390>] ? arch_trigger_all_cpu_backtrace+0x42/0x6a
Jul 26 05:02:01 milch kernel: [<ffffffff810923c8>] ? rcu_check_callbacks+0x40b/0x4e5
Jul 26 05:02:01 milch kernel: [<ffffffff81065870>] ? update_process_times+0x31/0x56
Jul 26 05:02:01 milch kernel: [<ffffffff81099800>] ? tick_sched_timer+0x30/0x4c
Jul 26 05:02:01 milch kernel: [<ffffffff81077404>] ? __run_hrtimer.isra.27+0x50/0xa9
Jul 26 05:02:01 milch kernel: [<ffffffff81026717>] ? default_inquire_remote_apic+0xf/0xf
Jul 26 05:02:01 milch kernel: [<ffffffff81077a9b>] ? hrtimer_interrupt+0xd9/0x1d4
Jul 26 05:02:01 milch kernel: [<ffffffff810231f3>] ? smp_apic_timer_interrupt+0x36/0x46
Jul 26 05:02:01 milch kernel: [<ffffffff813c6c26>] ? apic_timer_interrupt+0x86/0x90
Jul 26 05:02:01 milch kernel: <EOI>
Jul 26 05:02:01 milch kernel: Code: 5f ff f6 c4 10 75 f2 44 89 e0 c1 e0 18 89 04 25 10 a3 5f ff 89 d8 09 e8 81 cd 00 04 00 00 83 fb 02 0f 44 c5 89 04 25 00 a3 5f ff <57> 9d 66 66 90 66 90 48 83 c4 10 5b 5d 41 5c c3 41 54 55 53 89
Jul 26 05:02:01 milch kernel: NMI backtrace for cpu 1
Jul 26 05:02:01 milch kernel: CPU: 1 PID: 20893 Comm: pidof Not tainted 3.15.5-grsec #1
Jul 26 05:02:01 milch kernel: Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Jul 26 05:02:01 milch kernel: task: ffff88007d388fa0 ti: ffff88007d3894b0 task.ti: ffff88007d3894b0
Jul 26 05:02:01 milch kernel: RIP: 0010:[<ffffffff813c4f04>] [<ffffffff813c4f04>] ffffffff813c4f04
Jul 26 05:02:01 milch kernel: RSP: 0018:ffff880003a8bc90 EFLAGS: 00000097
Jul 26 05:02:01 milch kernel: RAX: 0000000000002502 RBX: ffff8800390439c0 RCX: 0000000000000010
Jul 26 05:02:01 milch kernel: RDX: 0000000000000025 RSI: ffff880003a8bd70 RDI: ffff8800390441c8
Jul 26 05:02:01 milch kernel: RBP: ffff880003a8bd70 R08: 0000000000000001 R09: 0000000000000001
Jul 26 05:02:01 milch kernel: R10: 00000000fffffffc R11: ffff88007b06c650 R12: ffff88007b06c650
Jul 26 05:02:01 milch kernel: R13: ffff88007d32b200 R14: ffffffff81545080 R15: 0000000000000000
Jul 26 05:02:01 milch kernel: FS: 000003a3a5bda700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
Jul 26 05:02:01 milch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 05:02:01 milch kernel: CR2: 000003a3a5df8038 CR3: 00000000013d3000 CR4: 00000000000407f0
Jul 26 05:02:01 milch kernel: Stack:
Jul 26 05:02:01 milch kernel: ffffffff81067580 ffff88002bffa550 ffff88007b06c650 0000000000000000
Jul 26 05:02:01 milch kernel: ffffffff8112e320 ffff88007d0db8e0 ffffffff81545080 000000017c824c80
Jul 26 05:02:01 milch kernel: ffffffff810ebbc9 ffff88007d10002c ffff880003a8be20 ffff88007d10002c
Jul 26 05:02:01 milch kernel: Call Trace:
Jul 26 05:02:01 milch kernel: [<ffffffff81067580>] ? __lock_task_sighand+0x45/0x6d
Jul 26 05:02:01 milch kernel: [<ffffffff8112e320>] ? do_task_stat+0x100/0x8cd
Jul 26 05:02:01 milch kernel: [<ffffffff810ebbc9>] ? do_last.isra.43+0xa99/0xbad
Jul 26 05:02:01 milch kernel: [<ffffffff810e96c0>] ? link_path_walk+0x4d/0x79b
Jul 26 05:02:01 milch kernel: [<ffffffff8112a457>] ? proc_single_show+0x43/0x77
Jul 26 05:02:01 milch kernel: [<ffffffff810fcf66>] ? seq_read+0x292/0x571
Jul 26 05:02:01 milch kernel: [<ffffffff810e08c8>] ? vfs_read+0x123/0x1ec
Jul 26 05:02:01 milch kernel: [<ffffffff810e0d70>] ? SyS_read+0x41/0x84
Jul 26 05:02:01 milch kernel: [<ffffffff813c60ca>] ? system_call_fastpath+0x16/0x1b
Jul 26 05:02:01 milch kernel: Code: 8b 7c 24 08 be 00 02 00 00 e8 f2 ab c9 ff 31 c0 5a c3 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 74 0a 8a 07 38 d0 74 04 f3 90 <eb> f6 c3 9c 58 66 66 90 66 90 48 89 c6 fa 66 66 90 66 66 90 ba
Jul 26 05:02:01 milch kernel: INFO: NMI handler (ffffffff810102af) took too long to run: 24.023 msecs
Jul 26 05:06:04 milch kernel: INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=6002 jiffies, g=6340253, c=6340252, q=7283)
hanno
 
Posts: 26
Joined: Thu Dec 16, 2004 4:37 am

Re: Kernel oops: rcu_sched detected stalls on CPUs/tasks

Postby sfs6dzs » Sat Aug 16, 2014 6:56 am

And if you try the same GNU/Linux distribution in another KVM VM on the same host would it still crash like this? It doesn't seem to me its from grsecurity but not 100% sure, this at least is interesting:
Code: Select all
Jul 26 05:02:01 milch kernel: Call Trace:
Jul 26 05:02:01 milch kernel: [<ffffffff81067580>] ? __lock_task_sighand+0x45/0x6d       <-------
Jul 26 05:02:01 milch kernel: [<ffffffff8112e320>] ? do_task_stat+0x100/0x8cd       <-------
Jul 26 05:02:01 milch kernel: [<ffffffff810ebbc9>] ? do_last.isra.43+0xa99/0xbad       <-------
Jul 26 05:02:01 milch kernel: [<ffffffff810e96c0>] ? link_path_walk+0x4d/0x79b
Jul 26 05:02:01 milch kernel: [<ffffffff8112a457>] ? proc_single_show+0x43/0x77       <-------
Jul 26 05:02:01 milch kernel: [<ffffffff810fcf66>] ? seq_read+0x292/0x571
Jul 26 05:02:01 milch kernel: [<ffffffff810e08c8>] ? vfs_read+0x123/0x1ec       <-------
Jul 26 05:02:01 milch kernel: [<ffffffff810e0d70>] ? SyS_read+0x41/0x84       <-------
Jul 26 05:02:01 milch kernel: [<ffffffff813c60ca>] ? system_call_fastpath+0x16/0x1b
sfs6dzs
 
Posts: 11
Joined: Thu Jul 12, 2012 11:47 am

Re: Kernel oops: rcu_sched detected stalls on CPUs/tasks

Postby spender » Sun Aug 17, 2014 12:50 pm

The log from the other post was resolved a while ago, so it's definitely not that. We don't add any additional calls to lock_task_sighand or make any relevant modifications to do_task_stat. Add frame pointer support and more lock debugging options to try to track it down, but it doesn't seem to be caused by grsec to me.

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm
Location: VA, USA

Re: Kernel oops: rcu_sched detected stalls on CPUs/tasks

Postby hssupport » Fri Oct 10, 2014 8:05 am

Hi, we've got a problem with similar symptoms: System hangs and kernel says rcu_sched detected stalls
- We observe the systems only on KVM-Guests with grsecurity-kernels.
Relevant Kernel messages are given below (One block from kernel version 3.14.15 and one from kernel version 3.2.60) If needed, we can provide our kernel configurations too.

Any ideas ?

Sincerly
Harald

Kernelmessages on system with vanilla-kernel 3.2.60 patched with grsecurity-3.0-3.2.60-201406101410.patch
================================================================================================
INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=6002 jiffies)
Pid: 0, comm: swapper/0 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a06d7>] ? rcu_check_callbacks+0x57/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81009605>] ? sched_clock+0x5/0x10
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[<ffffffff81a07b0c>] ? 0xffffffff81a07b0b
[<ffffffff81a07369>] ? 0xffffffff81a07368
o2net: No connection established with node 3 after 30.0 seconds, giving up.
INFO: rcu_bh detected stalls on CPUs/tasks: { 1} (detected by 2, t=6003 jiffies)
Pid: 0, comm: swapper/2 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8104cba8>] ? rebalance_domains+0x88/0x160
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a0773>] ? rcu_check_callbacks+0xf3/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81078051>] ? ktime_get+0x61/0xf0
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
connection1:0: detected conn error (1021)
INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=24034 jiffies)
Pid: 0, comm: swapper/0 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a06d7>] ? rcu_check_callbacks+0x57/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81009605>] ? sched_clock+0x5/0x10
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[<ffffffff81a07b0c>] ? 0xffffffff81a07b0b
[<ffffffff81a07369>] ? 0xffffffff81a07368
INFO: rcu_bh detected stalls on CPUs/tasks: { 1} (detected by 2, t=24035 jiffies)
Pid: 0, comm: swapper/2 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8104cba8>] ? rebalance_domains+0x88/0x160
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a0773>] ? rcu_check_callbacks+0xf3/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81078051>] ? ktime_get+0x61/0xf0
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=42066 jiffies)
Pid: 0, comm: swapper/0 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a06d7>] ? rcu_check_callbacks+0x57/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81009605>] ? sched_clock+0x5/0x10
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[<ffffffff81a07b0c>] ? 0xffffffff81a07b0b
[<ffffffff81a07369>] ? 0xffffffff81a07368
INFO: rcu_bh detected stalls on CPUs/tasks: { 1} (detected by 3, t=42067 jiffies)
Pid: 0, comm: swapper/3 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8104cba8>] ? rebalance_domains+0x88/0x160
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a0773>] ? rcu_check_callbacks+0xf3/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81078051>] ? ktime_get+0x61/0xf0
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=60098 jiffies)
Pid: 0, comm: swapper/0 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a06d7>] ? rcu_check_callbacks+0x57/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81009605>] ? sched_clock+0x5/0x10
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[<ffffffff81a07b0c>] ? 0xffffffff81a07b0b
[<ffffffff81a07369>] ? 0xffffffff81a07368
INFO: rcu_bh detected stalls on CPUs/tasks: { 1} (detected by 3, t=60099 jiffies)
Pid: 0, comm: swapper/3 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8104cba8>] ? rebalance_domains+0x88/0x160
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a0773>] ? rcu_check_callbacks+0xf3/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81078051>] ? ktime_get+0x61/0xf0
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[-- break #0(1) sent -- `\z' -- Thu Oct 9 03:14:57 2014]
INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=78130 jiffies)
Pid: 0, comm: swapper/0 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a06d7>] ? rcu_check_callbacks+0x57/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81009605>] ? sched_clock+0x5/0x10
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[<ffffffff81a07b0c>] ? 0xffffffff81a07b0b
[<ffffffff81a07369>] ? 0xffffffff81a07368
INFO: rcu_bh detected stalls on CPUs/tasks: { 1} (detected by 3, t=78131 jiffies)
Pid: 0, comm: swapper/3 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8104cba8>] ? rebalance_domains+0x88/0x160
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a0773>] ? rcu_check_callbacks+0xf3/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81078051>] ? ktime_get+0x61/0xf0
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=96162 jiffies)
Pid: 0, comm: swapper/0 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a06d7>] ? rcu_check_callbacks+0x57/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81009605>] ? sched_clock+0x5/0x10
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0
[<ffffffff81a07b0c>] ? 0xffffffff81a07b0b
[<ffffffff81a07369>] ? 0xffffffff81a07368
INFO: rcu_bh detected stalls on CPUs/tasks: { 1} (detected by 2, t=96163 jiffies)
Pid: 0, comm: swapper/2 Not tainted 3.2.60-2squeeze2-hs-kvm-grsec #1
Call Trace:
<IRQ> [<ffffffff810a066d>] ? __rcu_pending+0x3dd/0x3f0
[<ffffffff8104cba8>] ? rebalance_domains+0x88/0x160
[<ffffffff8107e770>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810a0773>] ? rcu_check_callbacks+0xf3/0x100
[<ffffffff8105d6ff>] ? update_process_times+0x3f/0x80
[<ffffffff8107e7cb>] ? tick_sched_timer+0x5b/0xb0
[<ffffffff81072b6d>] ? __run_hrtimer+0x5d/0x120
[<ffffffff8107304f>] ? hrtimer_interrupt+0xef/0x250
[<ffffffff8107eaea>] ? tick_check_idle+0xca/0xd0
[<ffffffff8101b363>] ? smp_apic_timer_interrupt+0x63/0xa0
[<ffffffff816f966f>] ? apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff81078051>] ? ktime_get+0x61/0xf0
[<ffffffff81021432>] ? native_safe_halt+0x2/0x10
[<ffffffff8100a9cd>] ? default_idle+0x1d/0x40
[<ffffffff8100188c>] ? cpu_idle+0x7c/0xc0

================================================================================================




Kernelmessages on system with vanilla-kernel 3.14.15 patched with grsecurity-3.0-3.14.15-201408032014.patch
================================================================================================
rcu_sched detected stalls on CPUs/tasks:
3: (0 ticks this GP) idle=b43/140000000000000/0 softirq=23934020/23934020
(detected by 5, t=6002 jiffies, g=33434851, c=33434850, q=12541)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 21679 Comm: ps Not tainted 3.14.15-0wheezy0-hs-kvm-grsec #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff8800dae84b60 ti: ffff8800dae85058 task.ti: ffff8800dae85058
RIP: 0010:[<ffffffff81831292>] [<ffffffff81831292>] ffffffff81831292
RSP: 0018:ffff880392027c80 EFLAGS: 00000002
RAX: 0000000000007eb6 RBX: ffff8803fbd1c200 RCX: 0000000000000000
RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8803fbd1ca08
RBP: ffff8800bdeea180 R08: 0000000000001000 R09: 000000000000fffc
R10: 0000000000000000 R11: 0000000000000002 R12: ffff880392027d60
R13: ffff8803fbd1ca08 R14: ffff8803ff40ee10 R15: 0000000000000000
FS: 00006bbf1f4b7700(0000) GS:ffff88040fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000003da12b8 CR3: 00000003a3d85000 CR4: 00000000000007b0
Stack:
ffffffff810c9dae 000000000000006d ffff8803f991a880 ffff8800bdeea180
ffff880396af5480 0000000000000002 ffffffff811ee1c4 ffff880000000001
0000000000000000 ffff88030000006c ffff88030000006c ffff88030000006c
Call Trace:
[<ffffffff810c9dae>] ? __lock_task_sighand+0x4e/0xa0
[<ffffffff811ee1c4>] ? proc_pid_status+0x3a4/0x7c0
[<ffffffff811e8394>] ? proc_single_show+0x54/0xa0
[<ffffffff811a65e7>] ? seq_read+0x187/0x590
[<ffffffff81181fe3>] ? vfs_read+0x143/0x230
[<ffffffff811a0a8d>] ? __fd_install+0x2d/0x70
[<ffffffff8118226f>] ? SyS_read+0x4f/0xc0
[<ffffffff818320f5>] ? system_call_fastpath+0x16/0x1b
Code: 00 02 00 00 e9 b0 e7 88 ff b8 00 02 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 01 c3 83 e2 fe 0f b6 f2 b8 00 80 00 00 eb 08 90 f3 90 <83> e8 01 74 0a 0f b6 0f 38 ca 75 f2 66 90 c3 66 66 66 90 66 66
NMI backtrace for cpu 5
INFO: NMI handler (ffffffff8102b3e0) took too long to run: 27.305 msecs
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.14.15-0wheezy0-hs-kvm-grsec #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff8803fd8fece0 ti: ffff8803fd8ff1d8 task.ti: ffff8803fd8ff1d8
RIP: 0010:[<ffffffff8102e686>] [<ffffffff8102e686>] ffffffff8102e686
RSP: 0000:ffff88040fd43e28 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000c00 RCX: 0000000000000006
RDX: 0000000000000c00 RSI: 0000000000000002 RDI: 0000000000000096
RBP: 00000000000000ff R08: 0000000000000086 R09: 0000000000000216
R10: 0000000000000002 R11: 0000000000000215 R12: 0000000000000005
R13: ffffffff81b7a298 R14: 00000000000030fd R15: ffffffff81b2d480
FS: 0000000000000000(0000) GS:ffff88040fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 00000003fc95e000 CR4: 00000000000007b0
Stack:
0000000000000096 0000000000000002 0000000000000001 0000000000002710
ffffffff81b2d480 ffffffff8102b532 ffff88040fd4c180 ffffffff81105be5
0009299b00000003 0000000000000083 00000001f9d7a180 ffffffff81b2d708
Call Trace:
<IRQ>
[<ffffffff8102b532>] ? arch_trigger_all_cpu_backtrace+0x42/0x70
[<ffffffff81105be5>] ? rcu_check_callbacks+0x5d5/0x600
[<ffffffff8110ed20>] ? tick_nohz_handler+0xa0/0xa0
[<ffffffff810c784f>] ? update_process_times+0x3f/0x80
[<ffffffff8110ed62>] ? tick_sched_timer+0x42/0x70
[<ffffffff810df3b3>] ? __run_hrtimer.isra.30+0x53/0xe0
[<ffffffff810dfb57>] ? hrtimer_interrupt+0xf7/0x250
[<ffffffff810311bf>] ? kvm_clock_read+0x1f/0x30
[<ffffffff8102a176>] ? smp_apic_timer_interrupt+0x36/0x50
[<ffffffff81832cb1>] ? apic_timer_interrupt+0x71/0x80
<EOI>
[<ffffffff8100bb80>] ? hard_enable_TSC+0x20/0x20
[<ffffffff81031542>] ? native_safe_halt+0x2/0x10
[<ffffffff8100c299>] ? arch_cpu_idle+0x9/0x30
[<ffffffff8100bb85>] ? default_idle+0x5/0x10
[<ffffffff810fc97b>] ? cpu_startup_entry+0x7b/0x170
Code: 14 25 00 a3 5f ff 80 e6 10 75 f2 89 ea c1 e2 18 89 14 25 10 a3 5f ff 89 f2 09 da 80 cf 04 83 fe 02 0f 44 d3 89 14 25 00 a3 5f ff <57> 9d 66 66 90 66 90 48 83 c4 18 5b 5d c3 89 74 24 08 48 89 3c
INFO: NMI handler (ffffffff8102b3e0) took too long to run: 53.924 msecs
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.15-0wheezy0-hs-kvm-grsec #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffffffff81abd980 ti: ffffffff81abde78 task.ti: ffffffff81abde78
RIP: 0010:[<ffffffff81031542>] [<ffffffff81031542>] ffffffff81031542
RSP: 0000:ffffffff81ab3f10 EFLAGS: 00000286
RAX: ffffffff8100bb80 RBX: ffffffff81abde78 RCX: 0100000000000000
RDX: ffffffff81b2cfc0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffffff81b7a290 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000400 R12: ffffffff81abde78
R13: ffffffff81abde78 R14: ffffffff81abde78 R15: ffffffff81abde78
FS: 0000000000000000(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000062c20437a000 CR3: 00000003fc01f000 CR4: 00000000000007b0
Stack:
ffffffff8100bb85 ffffffff810fc97b 000000000008c800 ffffffff81c290c0
ffffffff81ab3fa0 ffffffff81a55000 000000000008c800 0000000000000000
0000000000000000 ffffffff81b9ed4b ffffffff81b9e7e7 0000000000000000
Call Trace:
[<ffffffff8100bb85>] ? default_idle+0x5/0x10
[<ffffffff810fc97b>] ? cpu_startup_entry+0x7b/0x170
[<ffffffff81b9ed4b>] ? 0xffffffff81b9ed4a
[<ffffffff81b9e7e7>] ? 0xffffffff81b9e7e6
[<ffffffff81b9e5e2>] ? 0xffffffff81b9e5e1
Code: 2e 0f 1f 84 00 00 00 00 00 fa c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 fb c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 fb f4 <c3> 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 66 66 66 66 66
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.15-0wheezy0-hs-kvm-grsec #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff8803fd8fcb60 ti: ffff8803fd8fd058 task.ti: ffff8803fd8fd058
RIP: 0010:[<ffffffff81031542>] [<ffffffff81031542>] ffffffff81031542
RSP: 0000:ffff8803fd93fef8 EFLAGS: 00000296
RAX: ffffffff8100bb80 RBX: ffff8803fd8fd058 RCX: 0100000000000000
RDX: ffffffff81b2cfc0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffffff81b7a290 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000400 R12: ffff8803fd8fd058
R13: ffff8803fd8fd058 R14: ffff8803fd8fd058 R15: ffff8803fd8fd058
FS: 0000000000000000(0000) GS:ffff88040fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 00000000db608000 CR4: 00000000000007b0
Stack:
ffffffff8100bb85 ffffffff810fc97b 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8100bb85>] ? default_idle+0x5/0x10
[<ffffffff810fc97b>] ? cpu_startup_entry+0x7b/0x170
Code: 2e 0f 1f 84 00 00 00 00 00 fa c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 fb c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 fb f4 <c3> 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 66 66 66 66 66

.... Many more similar lines
================================================================================================
hssupport
 
Posts: 1
Joined: Fri Oct 10, 2014 7:42 am

Re: Kernel oops: rcu_sched detected stalls on CPUs/tasks

Postby spender » Sat Nov 01, 2014 8:41 am

Just to follow up here -- we believe this issue is related to that in another thread, which was resolved here:
viewtopic.php?f=1&t=4071&start=15#p14600

Please try the latest patches which we hope will fix this issue.

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm
Location: VA, USA

Re: Kernel oops: rcu_sched detected stalls on CPUs/tasks

Postby hanno » Sat Nov 01, 2014 1:30 pm

I've updated kernel + grsec patch on the affected machine.

As the issue appeared on a very irregular basis and I never really was able to pinpoint what triggered it I can't judge whether it fixed the issue. But you can assume that if you won't hear from me in a while that it is fixed :-)
hanno
 
Posts: 26
Joined: Thu Dec 16, 2004 4:37 am


Return to grsecurity development