Gitweb @ Texas Instruments - Open Source Git Repositories - git.TI.com/gitweb

author	Angela Stegmaier <angelabaker@ti.com>
	Wed, 23 Feb 2022 16:55:50 +0000 (10:55 -0600)
committer	Suman Anna <s-anna@ti.com>
	Wed, 23 Feb 2022 16:55:50 +0000 (10:55 -0600)
commit	cbcbf9db04af2a780e596e3e2e55dc9bde294b4c
tree	945e197fc02c8b1d856d73972b996c45c2279ed5	tree \| snapshot (tar.xz tar.gz zip)
parent	06075b7792c169c48cbe3ef517b9de01672cf67b	commit \| diff

rpmsg: fix lockdep warnings in virtio rpmsg bus driver

The virtio rpmsg bus framework uses endpoints as the basis for
sending and receiving messages to/from a remote processor. Each
rpmsg bus device will have a primary endpoint if the corresponding
rpmsg bus driver supports a callback, and secondary child endpoints
associated with the same rpmsg bus device. The life-cycle of these
endpoints are tied to the corresponding rpmsg device. A virtio rpmsg
bus device can also have its own endpoint for supporting name service
announcements from a corresponding remote processor to create and
delete rpmsg devices dynamically.

Each endpoint has a callback lock associated with it to provide
protection/mutual exclusion between threads that process incoming
rpmsg messages and threads that want to delete the endpoint. The
virtio rpmsg name service endpoint callback will run while holding
it's ept->cb_lock to create/delete rpmsg devices for RPMSG_NS_CREATE
and RPMSG_NS_DELETE messages respectively. The latter message
processing will destroy the requested channel, and will ultimately
result in all the secondary rpmsg device endpoints also to be
destroyed. The ept->cb_lock for the channel's endpoint is also
locked during its destruction while setting the callback to NULL.
This results in a seemingly nested locking of the ept->cb_lock even
though the locking is on different mutexes. This will result in a
false warning from the lockdep validator when it is enabled because
the lockdep deals with classes and both are the same class, although
they are different instances.

Similar circular dependency scenarios also exist with remoteproc
error recovery and existing rpmsg driver - rpmsg_proto.

These issues are fixed by replacing the existing mutex_lock() calls
with the mutex_lock_nested() API variation and using different
subclasses for the NameService end-point and for the rpmsg channel
device end-points.

Following are example warning signatures that get fixed by this patch:

1. Recursive locking dependency during RPMSG_NS_DESTROY message processing
============================================
WARNING: possible recursive locking detected
--------------------------------------------
kworker/0:1/20 is trying to acquire lock:
c68c2c44 (&ept->cb_lock){+.+.}-{3:3}, at: __rpmsg_destroy_ept+0x44/0x98 [virtio_rpmsg_bus]

but task is already holding lock:
c69a8344 (&ept->cb_lock){+.+.}-{3:3}, at: rpmsg_recv_done+0x9c/0x2a0 [virtio_rpmsg_bus]

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&ept->cb_lock);
   lock(&ept->cb_lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by kworker/0:1/20:
  #0: c200a6a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x204/0x7c0
  #1: c30a5f20 ((work_completion)(&mq->work)){+.+.}-{0:0}, at: process_one_work+0x204/0x7c0
  #2: c69a8344 (&ept->cb_lock){+.+.}-{3:3}, at: rpmsg_recv_done+0x9c/0x2a0 [virtio_rpmsg_bus]
  #3: c59d74c8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x18/0x1d4

2. Circular locking dependency during error recovery of rpmsg-proto driver
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
kworker/0:3/968 is trying to acquire lock:
c59ba544 (&ept->cb_lock){+.+.}-{3:3}, at: __rpmsg_destroy_ept+0x44/0x98 [virtio_rpmsg_bus]

but task is already holding lock:
bf014038 (rpmsg_channels_lock){+.+.}-{3:3}, at: rpmsg_proto_remove+0x28/0x178 [rpmsg_proto]

which lock already depends on the new lock.

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(rpmsg_channels_lock);
                                lock(&ept->cb_lock);
                                lock(rpmsg_channels_lock);
   lock(&ept->cb_lock);

  *** DEADLOCK ***

6 locks held by kworker/0:3/968:
  #0: c200a6a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x204/0x7c0
  #1: c5f73f20 ((work_completion)(&rproc->crash_handler)){+.+.}-{0:0}, at: process_one_work+0x204/0x7c0
  #2: c5648364 (&rproc->lock){+.+.}-{3:3}, at: rproc_trigger_recovery+0x30/0x304
  #3: c58eb8f8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x18/0x1d4
  #4: c59a70c8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x18/0x1d4
  #5: bf014038 (rpmsg_channels_lock){+.+.}-{3:3}, at: rpmsg_proto_remove+0x28/0x178 [rpmsg_proto]

Signed-off-by: Angela Stegmaier <angelabaker@ti.com>
[s-anna@ti.com: flip the subclass values, update crash log examples for 5.10]
Signed-off-by: Suman Anna <s-anna@ti.com>
[dfustini: fix trivial conflict in drivers/rpmsg/virtio_rpmsg_bus.c]
Signed-off-by: Drew Fustini <dfustini@baylibre.com>

drivers/rpmsg/virtio_rpmsg_bus.c		diff \| blob \| history
include/linux/rpmsg.h		diff \| blob \| history

RPMsg/IPC kernel repositories for OMAPx & AMx SoCs - TI Linux rpmsg bus and client driver feature and integration tree for OMAP4/OMAP5 & DRA7xx/AM57xx; Keystone K2 & Davinci OMAP-L138 SoCs.

RSS Atom