author: Alex Elder, 2014-03-25 08:36:02 -0500
committer: Sage Weil, 2014-03-29 12:38:14 -0500
commit: 638c323c4d1f8eaf25224946e21ce8818f1bcee1
tree: e45a1de3d0c8936f14a5933b72dd423abe870e35
parent: 0414855fdc4a40da05221fc6062cccbc0c30f169
rbd: drop an unsafe assertion

Olivier Bonvalet reported having repeated crashes due to a failed
assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
        rbd_assert(which >= img_request->next_completion);

With a lot of help from Olivier with reproducing the problem
we were able to determine the object and image requests had
already been completed (and often freed) at the point the
assertion failed.

There was a great deal of discussion on the ceph-devel mailing list
about this.  The problem only arose when there were two (or more)
object requests in an image request, and the problem was always
seen when the second request was being completed.

The problem is due to a race in the window between setting the
"done" flag on an object request and checking the image request's
next completion value.  When the first object request completes, it
checks to see if its successor request is marked "done", and if
so, that request is also completed.  In the process, the image
request's next_completion value is updated to reflect that both
the first and second requests are completed.  By the time the
second request is able to check the next_completion value, it
has been set to a value *greater* than its own "which" value,
which caused an assertion to fail.

Fix this problem by skipping over any completion processing
unless the completing object request is the next one expected.
Test only for inequality (not >=), and eliminate the bad
assertion.

Tested-by: Olivier Bonvalet <ob@daevel.fr>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>

-rw-r--r-- drivers/block/rbd.c | 1
1 file changed, 0 insertions, 1 deletion
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b365e0dfccb..34898d53395 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2109,7 +2109,6 @@ static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
 	rbd_assert(img_request->obj_request_count > 0);
 	rbd_assert(which != BAD_WHICH);
 	rbd_assert(which < img_request->obj_request_count);
-	rbd_assert(which >= img_request->next_completion);
 
 	spin_lock_irq(&img_request->completion_lock);
 	if (which != img_request->next_completion)