race removing iscsi device (on cinder node)

Bug #1437441 reported by Ollie Leahy
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
High
Walt Boring

Bug Description

We are seeing a situation where the call to remove_scsi_device() has not removed the entries in /dev/disk/by-path by the time the code gets a list of devices there, so the iscsi session is never logged out.

This is the problem described in https://bugs.launchpad.net/cinder/+bug/1375382, but for our situation the fix there is not sufficient.

To reproduce:
1. Create a bootable volume:

     $ cinder create --display-name bfv1 --image 3d38fb9d-ec96-4c9b-bc78-1a4108e7882d 10

2. The volume is created

     $ cinder list
     | 480311a8-3de2-407c-97b3-392756c19a58 | available | bfv1 | 10 | None | false | |

3. The iscsi session on the the c-vol node remains open:

     root@controller1:~# iscsiadm -msession | grep 480311a8-3de2-407c-97b3-392756c19a58
tcp: [13] 10.23.32.220:3260,1 iqn.2003-10.com.lefthandnetworks:4405bmgmtgroup:32104:volume-480311a8-3de2-407c-97b3-392756c19a58 (non-flash)

And it will remain forever, these open sessions accumulate forever.

The following fragment from cinder-volume.log shows that after the iscsi device is remove, the iscsi logout does not happen:

2015-03-26 17:57:55.120 6887 DEBUG cinder.brick.initiator.linuxscsi [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Remove SCSI device(/dev/sdb) with /sys/block/sdb/device/delete remove_scsi_device /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/brick/initiator/linuxscsi.py:61
2015-03-26 17:57:55.121 6887 DEBUG cinder.openstack.common.processutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf tee -a /sys/block/sdb/device/delete execute /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/processutils.py:158
2015-03-26 17:57:55.228 6887 DEBUG cinder.openstack.common.processutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Result was 0 execute /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/processutils.py:192
2015-03-26 17:57:55.230 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Got semaphore "lefthand" for method "terminate_connection"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:191
2015-03-26 17:57:55.231 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Attempting to grab file lock "lefthand" for method "terminate_connection"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:202
2015-03-26 17:57:55.232 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Got file lock "lefthand" at /tmp/tmp68RQqE/cinder-lefthand for method "terminate_connection"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:232
2015-03-26 17:57:55.324 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Released file lock "lefthand" at /tmp/tmp68RQqE/cinder-lefthand for method "terminate_connection"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:239
2015-03-26 17:57:55.325 6887 DEBUG cinder.volume.driver [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] volume 480311a8-3de2-407c-97b3-392756c19a58: removing export _detach_volume /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/volume/driver.py:262
2015-03-26 17:57:55.325 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Got semaphore "lefthand" for method "remove_export"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:191
2015-03-26 17:57:55.326 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Attempting to grab file lock "lefthand" for method "remove_export"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:202
2015-03-26 17:57:55.327 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Got file lock "lefthand" at /tmp/tmpM9WhTA/cinder-lefthand for method "remove_export"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:232
2015-03-26 17:57:55.328 6887 DEBUG cinder.openstack.common.lockutils [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Released file lock "lefthand" at /tmp/tmpM9WhTA/cinder-lefthand for method "remove_export"... inner /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/openstack/common/lockutils.py:239
2015-03-26 17:57:55.329 6887 DEBUG cinder.volume.flows.manager.create_volume [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Downloaded image 3d38fb9d-ec96-4c9b-bc78-1a4108e7882d ((None, None)) to volume 480311a8-3de2-407c-97b3-392756c19a58 successfully. _copy_image_to_volume /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py:511
2015-03-26 17:57:55.329 6887 DEBUG cinder.volume.flows.manager.create_volume [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Marking volume 480311a8-3de2-407c-97b3-392756c19a58 as bootable. _enable_bootable_flag /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py:430
2015-03-26 17:57:55.378 6887 DEBUG cinder.volume.flows.manager.create_volume [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Copying metadata from image 3d38fb9d-ec96-4c9b-bc78-1a4108e7882d to 480311a8-3de2-407c-97b3-392756c19a58. _handle_bootable_volume_glance_meta /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py:393
2015-03-26 17:57:55.379 6887 DEBUG cinder.volume.flows.manager.create_volume [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Creating volume glance metadata for volume 480311a8-3de2-407c-97b3-392756c19a58 backed by image 3d38fb9d-ec96-4c9b-bc78-1a4108e7882d with: {'container_format': u'bare', 'min_ram': 0, 'disk_format': u'qcow2', 'image_name': u'debian-wheezy-amd64-20140929-disk', 'image_id': u'3d38fb9d-ec96-4c9b-bc78-1a4108e7882d', 'checksum': u'5690939c3f261fac819347a6d8b7eb32', 'min_disk': 0, 'size': 221489152}. _capture_volume_image_metadata /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py:548
2015-03-26 17:57:55.831 6887 DEBUG taskflow.engines.action_engine.runner [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Exiting old state 'SCHEDULING' in response to event 'wait' on_exit /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/taskflow/engines/action_engine/runner.py:156
2015-03-26 17:57:55.832 6887 DEBUG taskflow.engines.action_engine.runner [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Entering new state 'WAITING' in response to event 'wait' on_enter /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/taskflow/engines/action_engine/runner.py:160
2015-03-26 17:57:55.832 6887 DEBUG taskflow.engines.action_engine.runner [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Exiting old state 'WAITING' in response to event 'analyze' on_exit /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/taskflow/engines/action_engine/runner.py:156
2015-03-26 17:57:55.833 6887 DEBUG taskflow.engines.action_engine.runner [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Entering new state 'ANALYZING' in response to event 'analyze' on_enter /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/taskflow/engines/action_engine/runner.py:160
2015-03-26 17:57:55.835 6887 DEBUG cinder.volume.manager [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask;volume:create' (b9a1de76-58e2-4795-b60a-6d0061588393) transitioned into state 'SUCCESS' with result '<cinder.db.sqlalchemy.models.Volume object at 0x7f6101fc5fd0>' _task_receiver /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/cinder/flow_utils.py:113
2015-03-26 17:57:55.835 6887 DEBUG taskflow.engines.action_engine.runner [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Exiting old state 'ANALYZING' in response to event 'schedule' on_exit /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/taskflow/engines/action_engine/runner.py:156
2015-03-26 17:57:55.836 6887 DEBUG taskflow.engines.action_engine.runner [req-4a1412e1-68b8-48b9-9772-6db243362aa4 31e12fd11e604be782cd83faec12cd81 61302285afc64da881a3eb621150d728 - - -] Entering new state 'SCHEDULING' in response to event 'schedule' on_enter /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/taskflow/engines/action_engine/runner.py:160

Changed in cinder:
importance: Undecided → High
assignee: nobody → Walt Boring (walter-boring)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/168505

Changed in cinder:
status: New → In Progress
Revision history for this message
John Griffith (john-griffith) wrote : Re: race removing iscsi device

@Ollie
Can you clarify if you're seeing this exclusively on the Cinder node, or is it "nova volume-detach"?

Revision history for this message
John Griffith (john-griffith) wrote :

Walt stated this is on the Cinder node an only for cinder operations and it's version of initiator ops. Thanks

summary: - race removing iscsi device
+ race removing iscsi device (on cinder node)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/168505
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=43029941219505bc3861dd285e3393e08c76b433
Submitter: Jenkins
Branch: master

commit 43029941219505bc3861dd285e3393e08c76b433
Author: Walter A. Boring IV <email address hidden>
Date: Fri Mar 27 15:34:40 2015 -0700

    Brick: Fix race in removing iSCSI device

    Some folks are seeing race conditions on overloaded
    or slow machines, where the time between issuing the
    remove volume command, and the volume being from
    /dev/disk/by-path by the kernel isn't long enough.

    This patch adds a simple verify loop for iSCSI only.
    It will try 3 times and wait 2 seconds between tries.

    Also, This patch will only fix attaches that happen
    in cinder for copy volume <--> image operations.
    Nova needs a separate patch to fix any potential
    issues there.

    Change-Id: I5db9744da6b865b7d903503db8a895b409f32f5c
    Closes-Bug: 1437441

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → kilo-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: kilo-rc1 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.