Page 1 of 1

Soft lockup with 3.6.6

PostPosted: Thu Nov 08, 2012 10:15 pm
by moseleymark
Got this soft lockup with latest 3.6.6 patch (as well as the patch from 11/05) on Debian Squeeze 64-bit. It only happens during shutdown/reboot, and never does recover. It also seems to only happen if the ACL system is enabled. This box is doing nothing besides serving iSCSI via the LIO stack.

It's long and apparently hitting the post limit. There's more for gradm -D/-R, so I'll post again. I didn't want to leave out any output that might be useful.

[ 1426.493127] BUG: soft lockup - CPU#0 stuck for 22s! [nscd:2541]
[ 1426.499860] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc target_core_iblock target_core_file target_core_pscsi iscsi_target_mod target_core_mod coretemp joydev kvm_intel hid_generic kvm ghash_clmulni_intel sb_edac edac_core dcdbas aesni_intel ablk_helper cryptd aes_x86_64 microcode tpm_tis mei pcspkr wmi acpi_power_meter acpi_pad lpc_ich usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear tg3 ses enclosure [last unloaded: ipmi_si]
[ 1426.567300] CPU 0
[ 1426.569435] Pid: 2541, comm: nscd Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 1426.579762] RIP: 0010:[<ffffffff81329de7>] [<ffffffff81329de7>] lookup_acl_obj_label.isra.3+0x47/0x90
[ 1426.590367] RSP: 0018:ffff880225559d18 EFLAGS: 00000246
[ 1426.596394] RAX: 00000034c523bd48 RBX: ffffffff8177ce3b RCX: 000000000000000d
[ 1426.604458] RDX: 0000000000000009 RSI: 0000000000800003 RDI: 0000000000055aae
[ 1426.612528] RBP: ffff880225559d38 R08: 0000000000000000 R09: 0000000000000000
[ 1426.620592] R10: fdbdb5bb42ab7402 R11: 0000000000000017 R12: ffffffff812809e1
[ 1426.628656] R13: ffff880225559ce8 R14: ffff880224199c24 R15: 0000000000000000
[ 1426.636726] FS: 000002d309dde700(0000) GS:ffff880227c00000(0000) knlGS:0000000000000000
[ 1426.645885] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1426.652398] CR2: ffffffffff600400 CR3: 0000000001792000 CR4: 00000000000407f0
[ 1426.660469] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1426.668533] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1426.672852] BUG: soft lockup - CPU#1 stuck for 22s! [bash:3036]
[ 1426.672886] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc target_core_iblock target_core_file target_core_pscsi iscsi_target_mod target_core_mod coretemp joydev kvm_intel hid_generic kvm ghash_clmulni_intel sb_edac edac_core dcdbas aesni_intel ablk_helper cryptd aes_x86_64 microcode tpm_tis mei pcspkr wmi acpi_power_meter acpi_pad lpc_ich usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear tg3 ses enclosure [last unloaded: ipmi_si]
[ 1426.672890] CPU 1
[ 1426.672890] Pid: 3036, comm: bash Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 1426.672900] RIP: 0010:[<ffffffff8177cada>] [<ffffffff8177cada>] _raw_spin_lock+0x2a/0x40
[ 1426.672901] RSP: 0018:ffff880224311b38 EFLAGS: 00000202
[ 1426.672902] RAX: 000000000000c8bd RBX: ffffffff8177ce3b RCX: 0000000000000000
[ 1426.672903] RDX: 000000000000c8be RSI: ffff88022683c920 RDI: ffffffff81e043c4
[ 1426.672904] RBP: ffff880224311b48 R08: 0000000000000001 R09: 2f2f2f2f2f2f2f2f
[ 1426.672905] R10: 0301000003010000 R11: 8cdadfc58cdaffd1 R12: ffffffffa00f6c6d
[ 1426.672907] R13: ffff880224311ab8 R14: ffff880225328000 R15: 00000000ffffff9c
[ 1426.672908] FS: 000002dd5c7d0700(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000
[ 1426.672910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1426.672911] CR2: 0000000000b99008 CR3: 0000000001793000 CR4: 00000000000407f0
[ 1426.672912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1426.672913] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1426.672915] Process bash (pid: 3036, threadinfo ffff880225328530, task ffff880225328000)
[ 1426.672915] Stack:
[ 1426.672919] ffff880224311b58 ffff88022683c920 ffff880224311bc8 ffffffff8132b24e
[ 1426.672921] ffff880224311b78 ffffffff811968ca ffff880224599a00 0000000000000001
[ 1426.672923] ffff880224311b98 ffff88042631dd40 0000000124311bc8 0000000000000000
[ 1426.672924] Call Trace:
[ 1426.672931] [<ffffffff8132b24e>] __chk_obj_label+0x3e/0x330
[ 1426.672937] [<ffffffff811968ca>] ? __inode_permission+0x4a/0xd0
[ 1426.672939] [<ffffffff8132e53a>] gr_search_file+0x5a/0x190
[ 1426.672944] [<ffffffff813327e7>] gr_acl_handle_hidden_file+0x37/0xc0
[ 1426.672946] [<ffffffff81197984>] path_lookupat+0x1c4/0x880
[ 1426.672949] [<ffffffff81198149>] do_path_lookup+0x109/0x130
[ 1426.672952] [<ffffffff8119b398>] user_path_at_empty+0x68/0xc0
[ 1426.672954] [<ffffffff8119b42b>] user_path_at+0x3b/0x60
[ 1426.672959] [<ffffffff8118d231>] vfs_fstatat+0x61/0xa0
[ 1426.672961] [<ffffffff8118d2dd>] vfs_stat+0x2d/0x40
[ 1426.672964] [<ffffffff8118d47c>] sys_newstat+0x2c/0x50
[ 1426.672969] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 1426.672989] Code: 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb b8 00 00 01 00 f0 0f c1 03 89 c2 c1 ea 10 66 39 c2 74 0e 0f 1f 40 00 f3 90 <0f> b7 03 66 39 c2 75 f6 48 83 c4 08 5b 5d 48 0f ba 2c 24 3f c3
[ 1426.692822] BUG: soft lockup - CPU#2 stuck for 22s! [K20nscd:3147]
[ 1426.692847] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc target_core_iblock target_core_file target_core_pscsi iscsi_target_mod target_core_mod coretemp joydev kvm_intel hid_generic kvm ghash_clmulni_intel sb_edac edac_core dcdbas aesni_intel ablk_helper cryptd aes_x86_64 microcode tpm_tis mei pcspkr wmi acpi_power_meter acpi_pad lpc_ich usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear tg3 ses enclosure [last unloaded: ipmi_si]
[ 1426.692851] CPU 2
[ 1426.692851] Pid: 3147, comm: K20nscd Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 1426.692858] RIP: 0010:[<ffffffff81346283>] [<ffffffff81346283>] __read_lock_failed+0x13/0x30
[ 1426.692859] RSP: 0018:ffff8802253a9ad8 EFLAGS: 00000297
[ 1426.692860] RAX: ffff880224433bc0 RBX: ffffffff8177ce3b RCX: 0000000000800003
[ 1426.692861] RDX: 0000000000000002 RSI: ffff88022481e320 RDI: ffffffff820897a0
[ 1426.692862] RBP: ffff8802253a9ad8 R08: ffff880225c43000 R09: ffff8802253a9b90
[ 1426.692863] R10: 0301000003010000 R11: 8cdadfc58cdaffd1 R12: ffffffffffff4111
[ 1426.692864] R13: ffffffffffff4111 R14: ffffffffffff4111 R15: ffffffffffff4111
[ 1426.692866] FS: 000003133a84f700(0000) GS:ffff880227c20000(0000) knlGS:0000000000000000
[ 1426.692867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1426.692869] CR2: 00000000006e7278 CR3: 0000000001794000 CR4: 00000000000407f0
[ 1426.692870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1426.692871] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1426.692872] Process K20nscd (pid: 3147, threadinfo ffff88022532e2f0, task ffff88022532ddc0)
[ 1426.692873] Stack:
[ 1426.692876] ffff8802253a9af8 ffffffff8177cc55 ffffffffffff4111 0000000000000002
[ 1426.692878] ffff8802253a9b48 ffffffff8132b145 ffffffffffff4111 ffff880224433bc0
[ 1426.692880] ffffffffffff4111 ffff88022481e320 ffff880224433bc0 0000000000000000
[ 1426.692881] Call Trace:
[ 1426.692887] [<ffffffff8177cc55>] _raw_read_lock+0x25/0x40
[ 1426.692890] [<ffffffff8132b145>] __full_lookup+0x35/0x100
[ 1426.692893] [<ffffffff8132b3b4>] __chk_obj_label+0x1a4/0x330
[ 1426.692897] [<ffffffff811968ca>] ? __inode_permission+0x4a/0xd0
[ 1426.692900] [<ffffffff8132e53a>] gr_search_file+0x5a/0x190
[ 1426.692903] [<ffffffff813327e7>] gr_acl_handle_hidden_file+0x37/0xc0
[ 1426.692905] [<ffffffff81197984>] path_lookupat+0x1c4/0x880
[ 1426.692910] [<ffffffff81179b12>] ? kmem_cache_alloc+0x132/0x140
[ 1426.692912] [<ffffffff81198149>] do_path_lookup+0x109/0x130
[ 1426.692915] [<ffffffff8119b398>] user_path_at_empty+0x68/0xc0
[ 1426.692918] [<ffffffff8119b42b>] user_path_at+0x3b/0x60
[ 1426.692921] [<ffffffff8118d231>] vfs_fstatat+0x61/0xa0
[ 1426.692924] [<ffffffff8118d2dd>] vfs_stat+0x2d/0x40
[ 1426.692927] [<ffffffff8118d47c>] sys_newstat+0x2c/0x50
[ 1426.692930] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 1426.692950] Code: 09 f0 81 07 00 00 10 00 cd 04 75 d0 5d 48 0f ba 2c 24 3f c3 0f 1f 40 00 55 48 89 e5 f0 ff 07 71 05 f0 ff 0f cd 04 f3 90 83 3f 01 <78> f9 f0 ff 0f 71 05 f0 ff 07 cd 04 78 e3 5d 48 0f ba 2c 24 3f
[ 1427.320109] Process nscd (pid: 2541, threadinfo ffff880225550530, task ffff880225550000)
[ 1427.329260] Stack:
[ 1427.331600] ffff880225e79c00 000000000000007f 000000000000007f ffff880224038080
[ 1427.340326] ffff880225559d88 ffffffff8132d2f8 ffff88022416e1e0 00800003811a3ac4
[ 1427.349051] 0000000000000000 0000000000055aae 0000000000800003 0000000000055aae
[ 1427.357768] Call Trace:
[ 1427.360596] [<ffffffff8132d2f8>] do_handle_delete.isra.20+0xa8/0x130
[ 1427.367886] [<ffffffff8132dd3e>] gr_handle_delete+0x4e/0x80
[ 1427.374302] [<ffffffff81198a17>] do_unlinkat+0x227/0x250
[ 1427.380427] [<ffffffff8119ba4e>] sys_unlink+0x1e/0x30
[ 1427.386262] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 1427.393068] [<ffffffff81785c1f>] ? sysret_check+0x22/0x5d
[ 1427.399288] Code: 48 89 da 48 89 d8 48 c1 e0 0d 48 c1 e2 17 44 89 f1 48 01 c2 44 89 e0 c1 e0 09 48 01 d0 44 89 e2 48 01 da 48 31 d0 31 d2 48 f7 f1 <49> 8b 44 d5 00 48 85 c0 75 10 eb 2d 0f 1f 44 00 00 48 8b 40 30
[ 1430.706518] BUG: soft lockup - CPU#3 stuck for 22s! [init:1]
[ 1430.712937] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc target_core_iblock target_core_file target_core_pscsi iscsi_target_mod[ 1430.736471] BUG: soft lockup - CPU#5 stuck for 22s! [snmpd:1815]
[ 1430.736472] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc target_core_iblock target_core_file target_core_pscsi iscsi_target_mod target_core_mod coretemp joydev kvm_intel hid_generic kvm ghash_clmulni_intel sb_edac edac_core dcdbas aesni_intel ablk_helper cryptd aes_x86_64 microcode tpm_tis mei pcspkr wmi acpi_power_meter acpi_pad lpc_ich usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear tg3 ses enclosure [last unloaded: ipmi_si]
[ 1430.736504] CPU 5 [ 1430.736507] Pid: 1815, comm: snmpd Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 1430.736508] RIP: 0010:[<ffffffff811a568f>] [<ffffffff811a568f>] d_lookup+0x1f/0x60
[ 1430.736514] RSP: 0018:ffff880224a49b88 EFLAGS: 00000202
[ 1430.736515] RAX: ffff880223880000 RBX: ffffffff8177ce3b RCX: ffff880224a49ca7
[ 1430.736516] RDX: 0000000000000101 RSI: ffff880224a49da8 RDI: ffff880227408240
[ 1430.736517] RBP: ffff880224a49ba8 R08: ffffc90000002000 R09: ffff880224a49da8
[ 1430.736518] R10: ffff880424543006 R11: ffff880227408240 R12: ffffffffffff4111
[ 1430.736519] R13: ffffffffffff4111 R14: ffffffffffff4111 R15: ffffffffffff4111
[ 1430.736521] FS: 000002f203756700(0000) GS:ffff88042fc40000(0000) knlGS:0000000000000000
[ 1430.736522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1430.736523] CR2: 000003133a8619b8 CR3: 0000000001797000 CR4: 00000000000407f0
[ 1430.736524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1430.736525] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1430.736527] Process snmpd (pid: 1815, threadinfo ffff880223880530, task ffff880223880000)
[ 1430.736527] Stack:
[ 1430.736528] ffff880224a49bf8 ffff880227408240 ffff880224a49da8 0000000000000101
[ 1430.736531] ffff880224a49bf8 ffffffff8119539c ffff880224a49c28 ffff880426000040
[ 1430.736533] ffff880224a49be8 ffff880224a49d98 ffff880224a49e78 ffff880426000040
[ 1430.736536] Call Trace:
[ 1430.736537] [<ffffffff8119539c>] lookup_dcache+0x3c/0xe0
[ 1430.736540] [<ffffffff81199de8>] do_last+0x448/0x1150
[ 1430.736543] [<ffffffff81196980>] ? inode_permission+0x30/0x70
[ 1430.736546] [<ffffffff8119aba6>] path_openat+0xb6/0x500
[ 1430.736549] [<ffffffff8133606e>] ? gr_log_resource+0x2e/0x110
[ 1430.736552] [<ffffffff81329807>] ? gr_learn_resource+0x47/0x1e0
[ 1430.736555] [<ffffffff8119b49d>] do_filp_open+0x4d/0xb0
[ 1430.736558] [<ffffffff811aa5bf>] ? alloc_fd+0x4f/0x140
[ 1430.736561] [<ffffffff81186a26>] do_sys_open+0x106/0x260
[ 1430.736565] [<ffffffff81186bb9>] sys_open+0x39/0x50
[ 1430.736567] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 1430.736570] [<ffffffff81785c1f>] ? sysret_check+0x22/0x5d
[ 1430.736573] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 49 89 fd 49 89 f4 <8b> 1d 2b ed c5 00 f6 c3 01 75 2c 4c 89 e6 4c 89 ef e8 4b fe ff

[ 1431.061813] target_core_mod coretemp joydev kvm_intel hid_generic kvm ghash_clmulni_intel sb_edac edac_core dcdbas aesni_intel ablk_helper cryptd aes_x86_64 microcode tpm_tis mei pcspkr wmi acpi_power_meter acpi_pad lpc_ich usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear tg3 ses enclosure [last unloaded: ipmi_si]
[ 1431.104696] CPU 3
[ 1431.106825] Pid: 1, comm: init Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 1431.116866] RIP: 0010:[<ffffffff8177cadd>] [<ffffffff8177cadd>] _raw_spin_lock+0x2d/0x40
[ 1431.126206] RSP: 0018:ffff88022683fb18 EFLAGS: 00000202
[ 1431.132234] RAX: 000000000000c8bd RBX: ffffffff8177ce3b RCX: 0000000000000000
[ 1431.140297] RDX: 000000000000c8bf RSI: ffff88022481e120 RDI: ffffffff81e043c4
[ 1431.148369] RBP: ffff88022683fb28 R08: 0000000000000001 R09: 0000000000000000
[ 1431.156432] R10: ffff880426246900 R11: ff938b9c8b969196 R12: ffffffff81194b7a
[ 1431.164493] R13: ffff88022683fa98 R14: ffff880426008000 R15: 0000000000000000
[ 1431.172558] FS: 000002f9ea9427a0(0000) GS:ffff88042fc20000(0000) knlGS:0000000000000000
[ 1431.181717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1431.188229] CR2: ffffffffff600400 CR3: 0000000001795000 CR4: 00000000000407f0
[ 1431.196293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1431.204356] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1431.212418] Process init (pid: 1, threadinfo ffff880226850530, task ffff880226850000)
[ 1431.221276] Stack:
[ 1431.223616] ffff88022683fb38 ffff88022481e120 ffff88022683fba8 ffffffff8132b24e
[ 1431.232351] ffff88022683fb58 ffff880426246908 ffff88022683fbb8 ffffffff811a556a
[ 1431.241085] ffff88022683fb78 ffffffff81196980 000000012683fd18 0000000000000000
[ 1431.249821] Call Trace:
[ 1431.252657] [<ffffffff8132b24e>] __chk_obj_label+0x3e/0x330
[ 1431.259064] [<ffffffff811a556a>] ? __d_lookup+0x7a/0x180
[ 1431.265190] [<ffffffff81196980>] ? inode_permission+0x30/0x70
[ 1431.271800] [<ffffffff8132e53a>] gr_search_file+0x5a/0x190
[ 1431.278118] [<ffffffff81196705>] ? lookup_fast+0x1e5/0x360
[ 1431.284437] [<ffffffff813327e7>] gr_acl_handle_hidden_file+0x37/0xc0
[ 1431.291726] [<ffffffff81197984>] path_lookupat+0x1c4/0x880
[ 1431.298044] [<ffffffff81198149>] do_path_lookup+0x109/0x130
[ 1431.304459] [<ffffffff8119b398>] user_path_at_empty+0x68/0xc0
[ 1431.311069] [<ffffffff8119b42b>] user_path_at+0x3b/0x60
[ 1431.317099] [<ffffffff8118d231>] vfs_fstatat+0x61/0xa0
[ 1431.323029] [<ffffffff8118d2dd>] vfs_stat+0x2d/0x40
[ 1431.328668] [<ffffffff8118d47c>] sys_newstat+0x2c/0x50
[ 1431.334598] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d

Re: Soft lockup with 3.6.6

PostPosted: Thu Nov 08, 2012 10:16 pm
by moseleymark
If I try to disable the ACL system or reload, it locks up with this (not completely frozen but terminal is out to lunch). It only locked up for -R after prompting for the admin pw. It locked up with -D without even prompting me, which seemed odd.

[ 176.623585] BUG: soft lockup - CPU#1 stuck for 22s! [gradm:2872]
[ 176.630399] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd]
[ 176.696182] CPU 1
[ 176.698320] Pid: 2872, comm: gradm Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 176.708734] RIP: 0010:[<ffffffff8177162a>] [<ffffffff8177162a>] __slab_free+0x106/0x2dc
[ 176.718001] RSP: 0018:ffff880423613d28 EFLAGS: 00000202
[ 176.724018] RAX: 0000000000000000 RBX: ffffffff8177ce3b RCX: 0000000180202892
[ 176.732090] RDX: 0000000180202893 RSI: ffffea000894c700 RDI: ffff880227802800
[ 176.740154] RBP: ffff880423613dd8 R08: 0000000000000001 R09: 0000000000000000
[ 176.748224] R10: ffffffff8132a695 R11: 0000000000000000 R12: ffffffff8177ce3b
[ 176.756295] R13: 0000000000000001 R14: ffffffff8177ce3b R15: 0000000000000001
[ 176.764366] FS: 0000030976734700(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000
[ 176.773521] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 176.780037] CR2: 00000000026ada68 CR3: 0000000001793000 CR4: 00000000000407f0
[ 176.780038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 176.780039] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 176.780041] Process gradm (pid: 2872, threadinfo ffff880423840530, task ffff880423840000)
[ 176.780042] Stack:
[ 176.780045] 0000000000000001 ffffffff8177ce3b 0000000000000001 ffffffff8177ce3b
[ 176.780048] 0000000000000000 ffffffff8132a695 0000000000000001 ffffffff8177ce3b
[ 176.780050] 0000000000000000 ffffffff8132a695 ffffea0000000000 0000000180202892
[ 176.780050] Call Trace:
[ 176.780058] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 176.780060] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 176.780066] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 176.780068] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 176.780070] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 176.780072] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 176.780079] [<ffffffff8117911b>] kfree+0x13b/0x170
[ 176.780081] [<ffffffff8132a695>] free_variables+0x165/0x340
[ 176.780085] [<ffffffff81330798>] write_grsec_handler+0xba8/0xd60
[ 176.780089] [<ffffffff81187980>] vfs_write+0xf0/0x210
[ 176.780092] [<ffffffff81187dbe>] sys_write+0x4e/0xa0
[ 176.780096] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 176.780122] Code: e8 1c b1 00 00 4c 8b 4d 80 44 8b 85 70 ff ff ff 48 89 45 88 40 8a b5 68 ff ff ff 48 8b 95 78 ff ff ff 41 f6 45 0b 40 48 8b 4d a8 <7
[ 204.579746] BUG: soft lockup - CPU#1 stuck for 22s! [gradm:2872]
[ 204.586551] Modules linked in: tcm_loop mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler dell_rbu nfsv3 nfsd nfs_acl auth_rpcgss nfs fscache lockd]
[ 204.651877] CPU 1
[ 204.654014] Pid: 2872, comm: gradm Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 204.664436] RIP: 0010:[<ffffffff8177159c>]
[<ffffffff8177159c>] __slab_free+0x78/0x2dc
[ 204.673716] RSP: 0018:ffff880423613d28 EFLAGS: 00000286
[ 204.679744] RAX: 0000000000000000 RBX: ffffffff8177ce3b RCX: ffffffff8132a695
[ 204.687807] RDX: 0000000180207771 RSI: ffffea000894c700 RDI: ffff880227802800
[ 204.695871] RBP: ffff880423613dd8 R08: 0000000000000001 R09: 0000000000000000
[ 204.703942] R10: ffffffff8132a695 R11: 0000000000000000 R12: ffffffff8177ce3b
[ 204.712006] R13: 0000000000000001 R14: ffffffff8177ce3b R15: 0000000000000001
[ 204.720071] FS: 0000030976734700(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000
[ 204.729221] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 204.735738] CR2: 00000000026ada68 CR3: 0000000001793000 CR4: 00000000000407f0
[ 204.743802] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 204.751873] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 204.759944] Process gradm (pid: 2872, threadinfo ffff880423840530, task ffff880423840000)
[ 204.769189] Stack:
[ 204.771530] 0000000000000001 ffffffff8177ce3b 0000000000000001 ffffffff8177ce3b
[ 204.780270] 0000000000000000 ffffffff8132a695 0000000000000000 0000000000000001
[ 204.789008] 0000780000000000 00000001802031c3 ffffffff7fffffff 0000000180207771
[ 204.797737] Call Trace:
[ 204.800565] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 204.807273] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 204.813979] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 204.820598] [<ffffffff8103a605>] ? __phys_addr+0x25/0x50
[ 204.826722] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 204.833333] [<ffffffff8117911b>] kfree+0x13b/0x170
[ 204.838875] [<ffffffff8132a695>] free_variables+0x165/0x340
[ 204.845291] [<ffffffff81330798>] write_grsec_handler+0xba8/0xd60
[ 204.852191] [<ffffffff81187980>] vfs_write+0xf0/0x210
[ 204.858026] [<ffffffff81187dbe>] sys_write+0x4e/0xa0
[ 204.863769] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 204.870571] Code: 45 31 c9 49 8b 54 24 18 4d 8b 74 24 10 49 63 45 20 48 89 55 a8 44 8b 7d a8 44 8a 45 ab 4c 89 34 03 41 ff cf 41 c0 e8 07 4d 85 f6 <6
[ 212.647084] INFO: rcu_sched self-detected stall on CPU { 1} (t=6000 jiffies)
[ 212.655521] sending NMI to all CPUs:
[ 212.659662] NMI backtrace for cpu 0
[ 212.663662] CPU 0
[ 212.665799] Pid: 0, comm: swapper/0 Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 212.676330] RIP: 0010:[<ffffffff813adc04>] [<ffffffff813adc04>] intel_idle+0xb4/0x150
[ 212.685385] RSP: 0018:ffffffff81e01de8 EFLAGS: 00000046
[ 212.687032] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 3, t=6004 jiffies)
[ 212.701253] RAX: 0000000000000030 RBX: 0000000000000010 RCX: 0000000000000001
[ 212.709322] RDX: 0000000000000000 RSI: ffffffff81e0af50 RDI: ffffffff81e0de00
[ 212.717390] RBP: ffffffff81e01e48 R08: 00000000fffffffb R09: 0000000000000000
[ 212.725460] R10: 00000000ffffddde R11: 0000000000000000 R12: 0000000000000004
[ 212.733529] R13: ffff880227c17470 R14: 0000000000000030 R15: 12c4c8681b4cb6c0
[ 212.741599] FS: 0000000000000000(0000) GS:ffff880227c00000(0000) knlGS:0000000000000000
[ 212.750758] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.757266] CR2: ffffffffff600400 CR3: 0000000001792000 CR4: 00000000000407f0
[ 212.765336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.773406] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.781477] Process swapper/0 (pid: 0, threadinfo ffffffff81e0af50, task ffffffff81e0aa20)
[ 212.790824] Stack:
[ 212.793169] ffffffff81e01e08 0000000000000092 0000000000000000 0000000002f8d8ec
[ 212.801914] ffffffff81e01e68 000000008208d540 ffffffff81e01e78 ffff880227c17470
[ 212.810659] ffffffff8208d540 0000000000000004 0000000000000000 8e17165e32eada0c
[ 212.819394] Call Trace:
[ 212.822239] [<ffffffff815ea4f2>] cpuidle_enter+0x42/0x60
[ 212.828371] [<ffffffff815eac74>] cpuidle_idle_call+0xc4/0x2f0
[ 212.834989] [<ffffffff8100cefa>] cpu_idle+0xca/0x120
[ 212.840731] [<ffffffff8173c996>] rest_init+0x72/0x7c
[ 212.846482] [<ffffffff82212282>] start_kernel+0x45d/0x472
[ 212.852709] [<ffffffff82211a15>] ? repair_env_string+0xa5/0xa5
[ 212.859423] [<ffffffff82211120>] ? early_idt_handlers+0x120/0x120
[ 212.866427] [<ffffffff82211120>] ? early_idt_handlers+0x120/0x120
[ 212.873430] [<ffffffff82211337>] x86_64_start_reservations+0x186/0x191
[ 212.880919] [<ffffffff822114af>] x86_64_start_kernel+0x16d/0x182
[ 212.887822] Code: c8 a8 00 00 48 8b 46 08 a8 08 75 1f 48 8d 46 08 31 d2 48 89 d1 0f 01 c8 0f ae f0 48 8b 46 08 a8 08 75 08 b1 01 4c 89 f0 0f 01 c9 <e
[ 212.914944] NMI backtrace for cpu 1
[ 212.918948] CPU 1
[ 212.918948] Pid: 2872, comm: gradm Tainted: G W 3.6.6 #1 Dell Inc. PowerEdge R420/072XWF
[ 212.918955] RIP: 0010:[<ffffffff81344d46>] [<ffffffff81344d46>] delay_tsc+0x46/0x70
[ 212.918956] RSP: 0018:ffff88042fc03ca8 EFLAGS: 00000092
[ 212.918957] RAX: 000000002cdac8f0 RBX: 000000000002bf24 RCX: 000000002cdac8d0
[ 212.918958] RDX: 0000000000000091 RSI: 0000000000000001 RDI: 000000000002bf24
[ 212.918959] RBP: ffff88042fc03cb8 R08: ffffffff820e0c20 R09: 0000000000000000
[ 212.918960] R10: 00000000000004a9 R11: 00000000000004a8 R12: 8000000000000000
[ 212.918961] R13: 0000000000001000 R14: 0000000000000400 R15: 0000000000000096
[ 212.918963] FS: 0000030976734700(0000) GS:ffff88042fc00000(0000) knlGS:0000000000000000
[ 212.918964] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 212.918965] CR2: 00000000026ada68 CR3: 0000000001793000 CR4: 00000000000407f0
[ 212.918966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 212.918967] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 212.918968] Process gradm (pid: 2872, threadinfo ffff880423840530, task ffff880423840000)
[ 212.918968] Stack:
[ 212.918971] ffff88042fc03d28 0000000000068dbc ffff88042fc03cd8 ffffffff81344c4e
[ 212.918973] 0000000080340001 00000000000003e9 ffff88042fc03d08 ffffffff81028b4a
[ 212.918975] ffffffff8205a340 0000000000000002 000000000000a06a ffffffff820e0c20
[ 212.918976] Call Trace:
[ 212.918978] <IRQ>
[ 212.918980] [<ffffffff81344c4e>] __const_udelay+0x3e/0x50
[ 212.918987] [<ffffffff81028b4a>] native_safe_apic_wait_icr_idle+0x3a/0x80
[ 212.918991] [<ffffffff8102a0c3>] default_send_IPI_mask_sequence_phys+0xd3/0xe0
[ 212.918994] [<ffffffff8102ea9e>] physflat_send_IPI_all+0x1e/0x30
[ 212.918997] [<ffffffff8102a284>] arch_trigger_all_cpu_backtrace+0x74/0xc0
[ 212.919004] [<ffffffff810e8d30>] rcu_pending+0x100/0x5d0
[ 212.919007] [<ffffffff810e9ac1>] rcu_check_callbacks+0xa1/0x190
[ 212.919014] [<ffffffff8105d849>] update_process_times+0x49/0x90
[ 212.919027] [<ffffffff810a473c>] tick_sched_timer+0x6c/0xe0
[ 212.919032] [<ffffffff8107951e>] __run_hrtimer+0x7e/0x210
[ 212.919034] [<ffffffff810a46d0>] ? tick_nohz_handler+0x110/0x110
[ 212.919037] [<ffffffff81079f67>] hrtimer_interrupt+0xf7/0x230
[ 212.919041] [<ffffffff81786efc>] ? call_softirq+0x1c/0x30
[ 212.919043] [<ffffffff81079e70>] ? hrtimer_get_next_event+0xf0/0xf0
[ 212.919046] [<ffffffff8178774d>] smp_apic_timer_interrupt+0x8d/0xcc
[ 212.919049] [<ffffffff817866b9>] apic_timer_interrupt+0x79/0x80
[ 212.919050] <EOI>
[ 212.919052] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 212.919055] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 212.919057] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 212.919059] [<ffffffff81771582>] ? __slab_free+0x5e/0x2dc
[ 212.919061] [<ffffffff8177154e>] ? __slab_free+0x2a/0x2dc
[ 212.919063] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 212.919066] [<ffffffff8177ce3b>] ? retint_restore_args+0x6/0xd
[ 212.919068] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 212.919070] [<ffffffff8103a5e9>] ? __phys_addr+0x9/0x50
[ 212.919072] [<ffffffff8132a695>] ? free_variables+0x165/0x340
[ 212.919075] [<ffffffff8117911b>] kfree+0x13b/0x170
[ 212.919077] [<ffffffff8132a695>] free_variables+0x165/0x340
[ 212.919080] [<ffffffff81330798>] write_grsec_handler+0xba8/0xd60
[ 212.919082] [<ffffffff81187980>] vfs_write+0xf0/0x210
[ 212.919084] [<ffffffff81187dbe>] sys_write+0x4e/0xa0
[ 212.919087] [<ffffffff81785bf3>] system_call_fastpath+0x18/0x1d
[ 212.919106] Code: 0f 1f 00 0f ae e8 0f 31 89 c1 eb 17 66 0f 1f 84 00 00 00 00 00 f3 90 65 8b 14 25 64 a0 00 00 39 d6 75 1d 0f 1f 00 0f ae e8 0f 31 <8

<and then pages and pages more for the other CPUs>

and after a while said: [ 297.366595] Fixing recursive fault but reboot is needed!

Re: Soft lockup with 3.6.6

PostPosted: Fri Nov 09, 2012 8:26 am
by spender
Hi Mark,

Thanks for the report -- I'll look into this. To clarify, these are two separate issues? One with RBAC enabled and the system itself shutting down/rebooting, and the other with disabling/reloading RBAC? Also, what was the last patch you used without these issues?

Also, if it's easily reproduced, could you boot with LOCKDEP and all other lock debugging features enabled? Otherwise, I'll need the vmlinux to pinpoint exactly what locks were involved. Can you also mail me the full logs for the issues?

-Brad

Re: Soft lockup with 3.6.6

PostPosted: Fri Nov 09, 2012 1:26 pm
by moseleymark
Both issues are on the same box and same kernel. This is the first 3.6.x kernel I've really used in any significant fashion. The last I've used was (I think) 3.5.6, so I've missed a bit.

I'm recompiling now with lock debugging on (though btw, does that include mutex debugging), and will grab the vmlinux this time around. Once I get that compiled/installed, I'll send in the logs. Not much of the previous soft lockups' tracebacks actually seem to have made it to the logs, compared to the tons of output on console, so I might need to copy/paste console logs, which will be a lot messier, so apologies in advance :)

Re: Soft lockup with 3.6.6

PostPosted: Tue Nov 13, 2012 8:46 am
by spender
Just to update, this issue should be resolved in the latest patches.

-Brad