PAX: refcount overflow occured at: __netif_receive_skb

Posted: Fri Jan 27, 2012 10:21 am
by fabled
Hi,

Just got a "PAX: refcount overflow occured at: __netif_receive_skb+0x425/0x43c" on one of my routers. It would appear that net/core/dev.c does atomic_(long_)inc:s on some statistics, e.g. rx_dropped, and these are not properly converted to atomic_*_unchecked, causing the abort.

The same problem would appear to be present in the latest patches too. Could you please audit the PAX patch set for atomic_* usage, to see if further conversions to atomic_*_unchecked are needed?

Thanks.

Re: PAX: refcount overflow occured at: __netif_receive_skb

Posted: Fri Jan 27, 2012 10:35 am
by ncopa
This might explain why some of those long-running boxes reboot once in a while...

I would be interested in an incremental patch, if possible, since I will need to backport the fix to the 2.6.38.y kernel (shipped with Alpine Linux v2.2) and the 3.0.y kernel (shipped with Alpine Linux v2.3).

Thanks!

Re: PAX: refcount overflow occured at: __netif_receive_skb

Posted: Fri Jan 27, 2012 7:15 pm
by PaX Team
fabled wrote: Just got a "PAX: refcount overflow occured at: __netif_receive_skb+0x425/0x43c" on one of my routers. It would appear that net/core/dev.c does atomic_(long_)inc:s on some statistics, e.g. rx_dropped, and these are not properly converted to atomic_*_unchecked, causing the abort.
thanks for the report, that'd be a 63 bit count overflow on 64 bit archs! ;)

fabled wrote: The same problem would appear to be present in the latest patches too. Could you please audit the PAX patch set for atomic_* usage, to see if further conversions to atomic_*_unchecked are needed?
we actually did audit the kernel, but this one slipped through and there may be more, unfortunately. it's not exactly easy to find these instances without static analysis of the entire tree. one of these days i'll put my LTO work to this use and write a plugin that'll automatically find every candidate, but until then i'm afraid we're at the mercy of manual audits and user reports...
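
For readers unfamiliar with the distinction discussed above, here is a minimal userspace sketch in C11 of a checked versus an unchecked counter increment. It is only an illustration under stated assumptions: inc_checked and inc_unchecked are hypothetical names, the code uses <stdatomic.h> rather than the kernel's atomic_long_t API, and it is not how PaX implements its overflow detection. The point is that a reference count should refuse to wrap, while a statistics counter such as rx_dropped can wrap harmlessly and so belongs in the unchecked category.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical "checked" increment, suitable for a reference count.
 * Returns false and leaves the value untouched if the increment
 * would go past LONG_MAX; returns true otherwise.
 */
static bool inc_checked(atomic_long *v)
{
    long old = atomic_load(v);

    do {
        if (old == LONG_MAX)
            return false;   /* would overflow: report instead of wrapping */
        /* On failure, 'old' is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(v, &old, old + 1));

    return true;
}

/*
 * Hypothetical "unchecked" increment, suitable for a statistics counter.
 * C11 atomic arithmetic on signed integer types wraps silently, so
 * LONG_MAX + 1 becomes LONG_MIN with no error reported.
 */
static void inc_unchecked(atomic_long *v)
{
    atomic_fetch_add(v, 1);
}
```

Calling inc_checked on a counter that already holds LONG_MAX returns false and keeps the value, whereas inc_unchecked on the same value wraps it to LONG_MIN. A statistics counter routed through the checked path is what produces a false alarm of the kind reported in this thread.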

Re: PAX: refcount overflow occured at: __netif_receive_skb

Posted: Fri Jan 27, 2012 10:01 pm
by spender
This is fixed in the latest test patch.

-Brad