Bug 802671 - NFS/DHT: rename fails under load
Summary: NFS/DHT: rename fails under load
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: pre-release
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
Assignee: Rajesh
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 852575 858484
TreeView+ depends on / blocked
 
Reported: 2012-03-13 09:03 UTC by Sachidananda Urs
Modified: 2013-07-04 22:44 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 852575 (view as bug list)
Environment:
Last Closed: 2013-06-19 06:18:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Logs for rename failure (604.57 KB, application/x-bzip)
2012-03-13 09:09 UTC, Sachidananda Urs
no flags Details
Program to reproduce (3.84 KB, application/x-shellscript)
2012-03-13 09:13 UTC, Sachidananda Urs
no flags Details

Description Sachidananda Urs 2012-03-13 09:03:48 UTC
rename fails with errors:

rename: rename(//home-0/rename.dir.4015, //home-0/rename.dir.2.4015): No such file or directory
rename: stat(//home-0/rename.dir.4015): No such file or directory
rename: rmdir(//home-0/rename.dir.4014): No such file or directory
rename: rmdir(//home-0/rename.dir.4013): No such file or directory
rename: rmdir(//home-0/rename.dir.4016): No such file or directory
rename: rmdir(//home-0/rename.dir.4015): No such file or directory

though rmdir, stat, and rename are done synchronously.

Attaching script to reproduce the issue. Please find attached logs as well.

Comment 1 Sachidananda Urs 2012-03-13 09:09:20 UTC
Created attachment 569608 [details]
Logs for rename failure

Comment 2 Sachidananda Urs 2012-03-13 09:13:48 UTC
Created attachment 569611 [details]
Program to reproduce

Comment 3 Sachidananda Urs 2012-03-13 16:40:03 UTC
dmesg:

Call Trace:
 [<ffffffff814dc4be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8117f701>] ? path_put+0x31/0x40
 [<ffffffff814dc35b>] mutex_lock+0x2b/0x50
 [<ffffffff8117ee63>] lock_rename+0xd3/0xe0
 [<ffffffff81183363>] sys_renameat+0x113/0x260
 [<ffffffff810500b0>] ? __dequeue_entity+0x30/0x50
 [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811834cb>] sys_rename+0x1b/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task rename:32401 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rename        D 0000000000000000     0 32401  32246 0x00000000
 ffff880078b47cf8 0000000000000082 0000000000000000 0000000000000000
 ffff880078b47c88 00000000a3393d40 ffff880078b47c78 000000012a4d6516
 ffff88007a067078 ffff880078b47fd8 000000000000f598 ffff88007a067078
Call Trace:
 [<ffffffff814dc4be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8117f701>] ? path_put+0x31/0x40
 [<ffffffff814dc35b>] mutex_lock+0x2b/0x50
 [<ffffffff8117ee63>] lock_rename+0xd3/0xe0
 [<ffffffff81183363>] sys_renameat+0x113/0x260
 [<ffffffff8120e314>] ? inode_has_perm+0x54/0xa0
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811834cb>] sys_rename+0x1b/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

Comment 4 Sachidananda Urs 2012-03-13 16:42:51 UTC
Fails with I/O error. Log snippet:

DHT:

[2012-03-13 02:32:45.280183] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280208] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280219] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280264] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280345] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280373] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280384] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077326] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077397] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077430] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077441] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077461] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077473] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389

=====================


[2012-03-13 12:33:48.935239] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 4fa512be: /rename.dir.2371 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.935269] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 4fa512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-13 12:33:48.935454] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.935545] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.935634] W [client3_1-fops.c:2598:client3_1_lookup] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.935717] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.936464] W [client3_1-fops.c:3757:client3_1_opendir] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.937219] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 54a512be: /rename.dir.2378 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.937248] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 54a512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-13 12:33:48.937522] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937638] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937710] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937834] W [client3_1-fops.c:2598:client3_1_lookup] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.938613] W [client3_1-fops.c:3757:client3_1_opendir] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.939300] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 59a512be: /rename.dir.2374 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.939323] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 59a512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)

Comment 5 Sachidananda Urs 2012-03-15 07:13:03 UTC
Application errors:

a.out: rename: Device or resource busy
a.out: rename: File exists
a.out: fopen: Remote I/O error
a.out: rename: File exists
a.out: rename: Device or resource busy

==============================
Logs:


[2012-03-14 05:43:15.423293] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.30 on nfs-test-1-client-1
[2012-03-14 05:43:31.235064] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.17
[2012-03-14 05:43:49.817049] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.13
[2012-03-14 05:44:05.672628] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.6
[2012-03-14 05:44:10.342640] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.42 on nfs-test-1-client-0
[2012-03-14 05:44:17.704338] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.40
[2012-03-14 05:44:31.873421] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.37 on nfs-test-1-client-3
[2012-03-14 05:44:42.682963] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.33
[2012-03-14 05:45:00.765599] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.4 on nfs-test-1-client-2
[2012-03-14 05:45:21.322249] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.11
[2012-03-14 05:45:32.088259] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.8
[2012-03-14 05:45:44.789369] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.30 on nfs-test-1-client-1
[2012-03-14 05:45:50.327663] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.31
[2012-03-14 05:46:25.356643] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.1
[2012-03-14 05:46:32.370776] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-nfs-test-1-client-1: remote operation failed: Success. Path: /nfsrename.17
[2012-03-14 05:46:40.739286] W [client3_1-fops.c:511:client3_1_unlink_cbk] 0-nfs-test-1-client-0: remote operation failed: No such file or directory
[2012-03-14 05:46:40.739328] W [dht-rename.c:340:dht_rename_unlink_cbk] 0-nfs-test-1-dht: /nfsrename.nGG7a0: unlink on nfs-test-1-client-0 failed (No such file or directory)
[2012-03-14 05:47:37.471179] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.38
[2012-03-14 05:47:41.392221] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.48
[2012-03-14 05:48:12.094273] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.40
[2012-03-14 05:48:23.639715] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.16
[2012-03-14 05:49:11.593940] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.28 on nfs-test-1-client-2
[2012-03-14 05:49:38.661165] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.35
[2012-03-14 05:50:00.017499] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.33 on nfs-test-1-client-2
[2012-03-14 05:50:07.370571] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.21 on nfs-test-1-client-1
[2012-03-14 05:50:37.808788] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.9
[2012-03-14 05:50:57.532082] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.12
[2012-03-14 05:51:19.719767] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.21
[2012-03-14 05:51:30.566440] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.19 on nfs-test-1-client-3
[2012-03-14 05:51:34.585923] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-nfs-test-1-client-3: remote operation failed: Success. Path: /nfsrename.18
[2012-03-14 05:51:34.586472] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.18
[2012-03-14 05:51:46.509069] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.12
[2012-03-14 05:51:57.722177] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.20 on nfs-test-1-client-1
[2012-03-14 05:52:04.270368] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.36
[2012-03-14 05:53:04.852505] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.10

Comment 6 Vijay Bellur 2012-05-18 13:11:36 UTC
Pushing it to post 3.3.0

Comment 7 Rajesh 2013-06-19 06:18:03 UTC
/mnt/m1/test# ~/x/rename/rename.sh /mnt/m1/test
/mnt/m1/test# 

I tried to reproduce the issue on the latest HEAD, and after this script ran for about a day, there were no errors found in the application or logs. Please reopen if this issue is seen again.


Note You need to log in before you can comment on or make changes to this bug.