Web lists-archives.org

Re: [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue




Paul Menage wrote:
On Thu, Jun 26, 2008 at 2:34 AM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
On Thu, Jun 26, 2008 at 9:56 AM, Paul Menage <menage@xxxxxxxxxx> wrote:
CPUsets: Move most calls to rebuild_sched_domains() to the workqueue

In the current cpusets code the lock nesting between cgroup_mutex and
cpuhotplug.lock when calling rebuild_sched_domains is inconsistent -
in the CPU hotplug path cpuhotplug.lock nests outside cgroup_mutex,
and in all other paths that call rebuild_sched_domains() it nests
inside.

This patch makes most calls to rebuild_sched_domains() asynchronous
via the workqueue, which removes the nesting of the two locks in that
case. In the case of an actual hotplug event, cpuhotplug.lock nests
outside cgroup_mutex as now.

Signed-off-by: Paul Menage <menage@xxxxxxxxxx>

---

Note that all I've done with this patch is verify that it compiles
without warnings; I'm not sure how to trigger a hotplug event to test
the lock dependencies or verify that scheduler domain support is still
behaving correctly.
You can just do:
	echo 0 > /sys/devices/cpu/cpuN/online
	echo 1 > /sys/devices/cpu/cpuN/online

Vegard, does this fix the problems that you were
seeing? Paul/Max, does this still seem sane with regard to scheduler
domains?
Nope, sorry :-(

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.26-rc8-dirty #39
-------------------------------------------------------
bash/3510 is trying to acquire lock:
 (events){--..}, at: [<c0145690>] cleanup_workqueue_thread+0x10/0x70

but task is already holding lock:
 (&cpu_hotplug.lock){--..}, at: [<c015d9da>] cpu_hotplug_begin+0x1a/0x50

which lock already depends on the new lock.


Does that mean that you can't ever call get_online_cpus() from a
workqueue thread?

In general it should be ok (no different from user-space task calling it). But there is still circular dependency because we're calling into
domain partitioning code.
Below is more detailed lockdep report with your patch applied on top of -rc8.
Looks like this might be a good time to rethink overall locking in there.

[ INFO: possible circular locking dependency detected ]
2.6.26-rc8 #3
-------------------------------------------------------
bash/2836 is trying to acquire lock:
 (cgroup_mutex){--..}, at: [<ffffffff802653e2>] cgroup_lock+0x12/0x20

but task is already holding lock:
 (&cpu_hotplug.lock){--..}, at: [<ffffffff8025e062>] cpu_hotplug_begin+0x22/0x60

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&cpu_hotplug.lock){--..}:
       [<ffffffff8025988f>] __lock_acquire+0x9cf/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d11b4>] mutex_lock_nested+0x94/0x250
       [<ffffffff8025e024>] get_online_cpus+0x24/0x40
       [<ffffffff8022fee1>] sched_getaffinity+0x11/0x80
       [<ffffffff8026e5c9>] __synchronize_sched+0x19/0x90
       [<ffffffff8022ed46>] detach_destroy_domains+0x46/0x50
       [<ffffffff8022f6b9>] partition_sched_domains+0xf9/0x2b0
       [<ffffffff80268eea>] rebuild_sched_domains+0x9a/0x3e0
       [<ffffffff80269243>] delayed_rebuild_sched_domains+0x13/0x30
       [<ffffffff80247d5e>] run_workqueue+0xde/0x220
       [<ffffffff80247f00>] worker_thread+0x60/0xb0
       [<ffffffff8024c069>] kthread+0x49/0x90
       [<ffffffff8020c898>] child_rip+0xa/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (sched_domains_mutex){--..}:
       [<ffffffff8025988f>] __lock_acquire+0x9cf/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d11b4>] mutex_lock_nested+0x94/0x250
       [<ffffffff8022f5e9>] partition_sched_domains+0x29/0x2b0
       [<ffffffff80268eea>] rebuild_sched_domains+0x9a/0x3e0
       [<ffffffff80269243>] delayed_rebuild_sched_domains+0x13/0x30
       [<ffffffff80247d5e>] run_workqueue+0xde/0x220
       [<ffffffff80247f00>] worker_thread+0x60/0xb0
       [<ffffffff8024c069>] kthread+0x49/0x90
       [<ffffffff8020c898>] child_rip+0xa/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (cgroup_mutex){--..}:
       [<ffffffff80259913>] __lock_acquire+0xa53/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d11b4>] mutex_lock_nested+0x94/0x250
       [<ffffffff802653e2>] cgroup_lock+0x12/0x20
       [<ffffffff80269bf1>] cpuset_handle_cpuhp+0x31/0x230
       [<ffffffff804d677f>] notifier_call_chain+0x3f/0x80
       [<ffffffff80250679>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff804c1638>] _cpu_down+0xa8/0x290
       [<ffffffff804c185b>] cpu_down+0x3b/0x60
       [<ffffffff804c2b58>] store_online+0x48/0xa0
       [<ffffffff803a45b4>] sysdev_store+0x24/0x30
       [<ffffffff802eeaaa>] sysfs_write_file+0xca/0x140
       [<ffffffff8029ca2b>] vfs_write+0xcb/0x170
       [<ffffffff8029cbc0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

3 locks held by bash/2836:
 #0:  (&buffer->mutex){--..}, at: [<ffffffff802eea23>] sysfs_write_file+0x43/0x140
 #1:  (cpu_add_remove_lock){--..}, at: [<ffffffff804c1847>] cpu_down+0x27/0x60
 #2:  (&cpu_hotplug.lock){--..}, at: [<ffffffff8025e062>] cpu_hotplug_begin+0x22/0x60

stack backtrace:
Pid: 2836, comm: bash Not tainted 2.6.26-rc8 #3

Call Trace:
 [<ffffffff80258c0c>] print_circular_bug_tail+0x8c/0x90
 [<ffffffff802589c4>] ? print_circular_bug_entry+0x54/0x60
 [<ffffffff80259913>] __lock_acquire+0xa53/0xe50
 [<ffffffff8025867d>] ? mark_held_locks+0x4d/0x90
 [<ffffffff804d1345>] ? mutex_lock_nested+0x225/0x250
 [<ffffffff802653e2>] ? cgroup_lock+0x12/0x20
 [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
 [<ffffffff802653e2>] ? cgroup_lock+0x12/0x20
 [<ffffffff804d11b4>] mutex_lock_nested+0x94/0x250
 [<ffffffff802653e2>] cgroup_lock+0x12/0x20
 [<ffffffff80269bf1>] cpuset_handle_cpuhp+0x31/0x230
 [<ffffffff8022edaa>] ? update_sched_domains+0x5a/0x70
 [<ffffffff804d677f>] notifier_call_chain+0x3f/0x80
 [<ffffffff80250679>] __raw_notifier_call_chain+0x9/0x10
 [<ffffffff804c1638>] _cpu_down+0xa8/0x290
 [<ffffffff804c185b>] cpu_down+0x3b/0x60
 [<ffffffff804c2b58>] store_online+0x48/0xa0
 [<ffffffff803a45b4>] sysdev_store+0x24/0x30
 [<ffffffff802eeaaa>] sysfs_write_file+0xca/0x140
 [<ffffffff8029ca2b>] vfs_write+0xcb/0x170
 [<ffffffff8029cbc0>] sys_write+0x50/0x90
 [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/