Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds




* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 16 Aug 2008 23:36:03 -0500 "Greg Donald" <gdonald@xxxxxxxxx> wrote:
> 
> > I got this while rsync'ng an NFS share onto a local disk:
> > 
> > [42374.151062] INFO: task reiserfs/0:1322 blocked for more than 120 seconds.
> > [42374.186295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [42374.229433] reiserfs/0    D c1f36180     0  1322      2
> > [42374.265246]        f5dbdedc 00000046 c1f36180 c1f36180 f5e932c0 1c823428 00002669 f5e932c0
> > [42374.273706]        f5e93514 c1f36180 00000000 f5dbc000 f62cc780 f5e932c0 00000002 00000001
> > [42374.313709]        00000000 00000000 f5e932c0 c013cc01 00000246 f5dbded4 c013cbce e31e12ec
> > [42374.356837] Call Trace:
> > [42374.417842]  [<c013cc01>] ? trace_hardirqs_on+0xb/0xd
> > [42374.451201]  [<c013cbce>] ? trace_hardirqs_on_caller+0xe9/0x111
> > [42374.489735]  [<c02e876b>] mutex_lock_nested+0x14b/0x22b
> > [42374.525760]  [<c01c9727>] ? flush_commit_list+0x119/0x505
> > [42374.560839]  [<c01c9727>] flush_commit_list+0x119/0x505
> > [42374.594183]  [<c01cca8e>] flush_async_commits+0x41/0x4b
> > [42374.629770]  [<c012ec1a>] run_workqueue+0xc3/0x18e
> > [42374.662893]  [<c012ebfe>] ? run_workqueue+0xa7/0x18e
> > [42374.697814]  [<c01cca4d>] ? flush_async_commits+0x0/0x4b
> > [42374.732504]  [<c012f609>] ? worker_thread+0x0/0x8a
> > [42374.765765]  [<c012f688>] worker_thread+0x7f/0x8a
> > [42374.797749]  [<c0131d61>] ? autoremove_wake_function+0x0/0x38
> > [42374.833713]  [<c0131c93>] kthread+0x40/0x69
> > [42374.865772]  [<c0131c53>] ? kthread+0x0/0x69
> > [42374.897774]  [<c010392f>] kernel_thread_helper+0x7/0x10
> > [42374.929777]  =======================
> > [42374.957001] 3 locks held by reiserfs/0/1322:
> > [42374.990140]  #0:  (reiserfs){--..}, at: [<c012ebe1>] run_workqueue+0x8a/0x18e
> > [42375.025754]  #1:  (&(&journal->j_work)->work){--..}, at: [<c012ebfe>] run_workqueue+0xa7/0x18e
> > [42375.062963]  #2:  (&jl->j_commit_mutex){--..}, at: [<c01c9727>] flush_commit_list+0x119/0x505
> > 
> > 
> > I deleted a few GBs of data and ran it again but was unable to
> > reproduce it.  This was on 2.6.27-rc3.
> > 
> > I don't see any corruption.  Fluke?
> > 
> 
> Seems that about 100% of the reports we get of this warning triggering 
> are sys_sync, transaction commit, etc.
> 
> Does kerneloops.org disagree with me?
> 
> If not, I vote we kill it.

OK. How about quadrupling the timeout, as per the patch below?

Is an uninterruptible wait of more than 8 minutes a reasonable limit?

I had this warning trigger a couple of times during development, 
alerting me to hung tasks.

	Ingo
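
For anyone who wants to try a longer timeout without rebuilding the kernel, the sysctl file named in the warning text can be read and written at runtime. The following is only a minimal sketch: the helper name describe_timeout and the HUNG_TASK_FILE override variable are made up for illustration, and the path is assumed to exist only on kernels built with the hung task check.

```shell
#!/bin/sh
# Sysctl path taken from the warning text quoted earlier in this thread.
# HUNG_TASK_FILE is a hypothetical override, used here only for illustration.
HUNG_TASK_FILE=${HUNG_TASK_FILE:-/proc/sys/kernel/hung_task_timeout_secs}

# describe_timeout is a hypothetical helper, not part of any kernel tooling.
# It prints the setting stored in the file given as its first argument.
describe_timeout() {
    secs=$(cat "$1") || return 1
    if [ "$secs" -eq 0 ]; then
        # Zero means no checking is done, so the warning never fires.
        echo "hung task check disabled"
    else
        echo "hung task timeout: ${secs}s"
    fi
}

# Only report when the file is present and readable on this system.
if [ -r "$HUNG_TASK_FILE" ]; then
    describe_timeout "$HUNG_TASK_FILE"
fi

# As root, the value can be changed at runtime, for example to the
# 480 seconds (4 x 120) used as the new default in the patch:
#   echo 480 > /proc/sys/kernel/hung_task_timeout_secs
```

A value written this way lasts only until the next reboot, so it is a cheap way to check whether 480 seconds is enough to silence the sync and journal-commit reports.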

------------------>
From 3fb4198766c38aa03492cc3996475076073c22ea Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@xxxxxxx>
Date: Wed, 20 Aug 2008 11:17:40 +0200
Subject: [PATCH] softlockup: increase hung tasks check from 2 minutes to 8 minutes

Andrew says:

> Seems that about 100% of the reports we get of this warning triggering
> are sys_sync, transaction commit, etc.

increase the timeout. If it still triggers for people, we can kill it.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
 kernel/softlockup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index b75b492..17a0580 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -164,7 +164,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
 /*
  * Zero means infinite timeout - no checking done:
  */
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs = 480;
 
 unsigned long __read_mostly sysctl_hung_task_warnings = 10;
 