I've run across another problem - this one I'm pretty sure is a problem with Gluster. I'm using Oracle DNFS still and it's erroring out on some of its logfiles:

ARC3: Error 19508 Closing archive log file '/db/flash_recovery_area/ALTUS/archivelog/2013_04_22/o1_mf_1_1093__1366653401581181_.arc'

Gluster is reporting:

[2013-04-22 13:57:22.073354] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-9: remote operation failed: Permission denied
[2013-04-22 13:57:22.073496] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-8: remote operation failed: Permission denied
[2013-04-22 13:57:22.073805] W [nfs3.c:889:nfs3svc_truncate_cbk] 0-nfs: 8b534455: /fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc => -1 (Permission denied)
[2013-04-22 13:57:22.082594] E [nfs3.c:3408:nfs3_remove_resume] 0-nfs-nfsv3: Unable to resolve FH: (192.168.10.3:46391) gv0 : 82c4c5ec-f3ad-4074-ac66-c5a455146d71

Immediately prior to this, that file has attributes:

Regular File mode:0640 uid:500 gid:1000, size: 476959744

The actual NFS RPC causing this error is [1]. Briefly:

Remote Procedure Call, Type:Call XID:0x8b534455
Network File System, SETATTR Call FH:0x5c191ad8
    new_attributes
        mode: value follows
            set_it: value follows (1)
            Mode: 0440, S_IRUSR, S_IRGRP
        size: value follows
            set_it: value follows (1)
            size: 476959744

In other words, a "truncate" and "chmod 440" in the same call. Gluster is replying with [2]:

Remote Procedure Call, Type:Reply XID:0x8b534455
Network File System, SETATTR Reply Error:NFS3ERR_ACCES
    Status: NFS3ERR_ACCES (13)

What's happening is that gluster is processing the mode change before the truncate, causing the truncate to fail.
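To make the ordering problem concrete, here is a minimal toy model in Python. It is not Gluster code: `FakeFile`, `setattr_chmod_first` and `setattr_truncate_first` are hypothetical names, and the permission check is simplified to "the owner needs the owner write bit to truncate".

```python
import errno

S_IWUSR = 0o200  # owner write bit


class FakeFile:
    """Toy in-memory file: only an owner uid, a mode and a size."""

    def __init__(self, uid, mode, size):
        self.uid, self.mode, self.size = uid, mode, size


def chmod(f, uid, mode):
    # Only the owner may change the mode in this model.
    if uid != f.uid:
        raise PermissionError(errno.EPERM, "Operation not permitted")
    f.mode = mode


def truncate(f, uid, size):
    # Simplified check: the caller must be the owner and have owner write.
    if uid != f.uid or not f.mode & S_IWUSR:
        raise PermissionError(errno.EACCES, "Permission denied")
    f.size = size


def setattr_chmod_first(f, uid, mode, size):
    chmod(f, uid, mode)
    truncate(f, uid, size)


def setattr_truncate_first(f, uid, mode, size):
    truncate(f, uid, size)
    chmod(f, uid, mode)


def try_setattr(fn, f, uid, mode, size):
    """Run one SETATTR variant and report 'ok' or the errno name."""
    try:
        fn(f, uid, mode, size)
        return "ok"
    except PermissionError as e:
        return errno.errorcode[e.errno]


# Same starting point as the archive log: mode 0640, uid 500.
a = FakeFile(uid=500, mode=0o640, size=476959744)
b = FakeFile(uid=500, mode=0o640, size=476959744)
print(try_setattr(setattr_chmod_first, a, 500, 0o440, 476959744))     # EACCES
print(try_setattr(setattr_truncate_first, b, 500, 0o440, 476959744))  # ok
```

In the chmod-first variant the mode change has already been applied when the truncate is refused, so the call fails but the file is left at 0440.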
Incidentally, this also causes gluster to think that these files need healing:

Gathering Heal info on volume gv0 has been successful
…
Brick fearless1:/export/bricks/500117310007a7ec/glusterdata
/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
…
Brick fearless2:/export/bricks/500117310007a74c/glusterdata
/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc

So, arguably gluster should be doing the truncate before the chmod. Perhaps the Most Correct thing is to always chmod last if removing permissions. That's a longer discussion :p

[1] Full RPC Call

Remote Procedure Call, Type:Call XID:0x8b534455
    Fragment header: Last fragment, 172 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0000 1010 1100 = Fragment Length: 172
    XID: 0x8b534455 (2337490005)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 3
    Procedure: SETATTR (2)
    [The reply to this request is in frame 293325]
    Credentials
        Flavor: AUTH_UNIX (1)
        Length: 52
        Stamp: 0xabcdefab
        Machine Name: fleming1.netdirect.ca
            length: 21
            contents: fleming1.netdirect.ca
            fill bytes: opaque data
        UID: 500
        GID: 1000
        Auxiliary GIDs
            GID: 1000
            GID: 1030
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Network File System, SETATTR Call FH:0x5c191ad8
    [Program Version: 3]
    [V3 Procedure: SETATTR (2)]
    object
        length: 36
        [hash (CRC-32): 0x5c191ad8]
        [Name: .o1_mf_1_1093__1366653401581181_.arc]
        [Full Name: 192.168.10.1:/gv0/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc]
        decode type as: unknown
        filehandle: 3a4f474c20117b487f884f169490a0349afacf71e16a95fc...
    new_attributes
        mode: value follows
            set_it: value follows (1)
            Mode: 0440, S_IRUSR, S_IRGRP
                .... .... .... .... .... 0... .... .... = S_ISUID: No
                .... .... .... .... .... .0.. .... .... = S_ISGID: No
                .... .... .... .... .... ..0. .... .... = S_ISVTX: No
                .... .... .... .... .... ...1 .... .... = S_IRUSR: Yes
                .... .... .... .... .... .... 0... .... = S_IWUSR: No
                .... .... .... .... .... .... .0.. .... = S_IXUSR: No
                .... .... .... .... .... .... ..1. .... = S_IRGRP: Yes
                .... .... .... .... .... .... ...0 .... = S_IWGRP: No
                .... .... .... .... .... .... .... 0... = S_IXGRP: No
                .... .... .... .... .... .... .... .0.. = S_IROTH: No
                .... .... .... .... .... .... .... ..0. = S_IWOTH: No
                .... .... .... .... .... .... .... ...0 = S_IXOTH: No
        uid: no value
            set_it: no value (0)
        gid: no value
            set_it: no value (0)
        size: value follows
            set_it: value follows (1)
            size: 476959744
        atime: don't change
            set_it: don't change (0)
        mtime: don't change
            set_it: don't change (0)
    guard: no value
        check: no value (0)

[2] Full Reply

Ethernet II, Src: Ibm_36:f7:d0 (5c:f3:fc:36:f7:d0), Dst: IntelCor_38:e7:58 (00:1e:67:38:e7:58)
Internet Protocol Version 4, Src: 192.168.10.1 (192.168.10.1), Dst: 192.168.10.3 (192.168.10.3)
Transmission Control Protocol, Src Port: 38467 (38467), Dst Port: 46391 (46391), Seq: 1230671698, Ack: 2230824272, Len: 40
Remote Procedure Call, Type:Reply XID:0x8b534455
    Fragment header: Last fragment, 36 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0000 0010 0100 = Fragment Length: 36
    XID: 0x8b534455 (2337490005)
    Message Type: Reply (1)
    [Program: NFS (100003)]
    [Program Version: 3]
    [Procedure: SETATTR (2)]
    Reply State: accepted (0)
    [This is a reply to a request in frame 293324]
    [Time from request: 0.001547000 seconds]
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    Accept State: RPC executed successfully (0)
Network File System, SETATTR Reply Error:NFS3ERR_ACCES
    [Program Version: 3]
    [V3 Procedure: SETATTR (2)]
    Status: NFS3ERR_ACCES (13)
    obj_wcc
        before
            attributes_follow: no value (0)
        after
            attributes_follow: no value (0)
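For anyone who wants to build the same request by hand, the `new_attributes` part of the call above can be sketched in Python as an XDR-encoded `sattr3` as defined in RFC 1813. The helper name `encode_sattr3` is made up for illustration. It covers only the fields seen in this capture, with atime and mtime always left as "don't change".

```python
import struct


def encode_sattr3(mode=None, uid=None, gid=None, size=None):
    """XDR-encode an NFSv3 sattr3; atime/mtime are left as DONT_CHANGE."""
    out = b""
    # mode, uid, gid: a 32-bit set_it flag, then a 32-bit value if set.
    for value in (mode, uid, gid):
        if value is None:
            out += struct.pack(">I", 0)
        else:
            out += struct.pack(">II", 1, value)
    # size: a 32-bit set_it flag, then a 64-bit value if set.
    if size is None:
        out += struct.pack(">I", 0)
    else:
        out += struct.pack(">IQ", 1, size)
    # atime and mtime: time_how = DONT_CHANGE (0) for both.
    out += struct.pack(">II", 0, 0)
    return out


# The attributes from the capture: mode 0440 and size 476959744 in one call.
attrs = encode_sattr3(mode=0o440, size=476959744)
print(len(attrs), attrs.hex())
```

Both the mode and the size travel in one structure, so the server alone decides the order in which it applies them.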
AFR has also gotten confused about the status of these files when this happens. I suspect it's the "0-gv0-client-9: remote operation failed: Permission denied" failures throwing it out of whack:

fearless1# getfattr -m . -d -e hex /export/bricks/*/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
# file: export/bricks/500117310007a7ec/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-8=0x000000010000000000000000
trusted.afr.gv0-client-9=0x000000010000000000000000
trusted.gfid=0xe16a95fc3e3b4e6abb9cc6c449db80ca

fearless2# getfattr -m . -d -e hex /export/bricks/*/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
# file: export/bricks/500117310007a74c/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-8=0x000000010000000000000000
trusted.afr.gv0-client-9=0x000000010000000000000000
trusted.gfid=0xe16a95fc3e3b4e6abb9cc6c449db80ca

I mistakenly tried to read this attribute on the file within the fuse mount and the following command completely froze and I can't kill it:

fearless2# getfattr -d -e text /gv0/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653909363181_.arc -n trusted.afr
OK! I've created a SIMPLE test case by porting nfsshell to NFSv3 so I can test it against GlusterFS. Behaviour replicated, tomorrow I'll dig through the GlusterFS code and fix it.

FreeBSD 9.1:

[michael@freebsd /export/scratch]$ ls -al test
-rw-r--r-- 1 michael wheel 6 Apr 26 02:28 test
>>> sent command: chmodtrunc 664 test 512
[michael@freebsd /export/scratch]$ ls -al test
-rw-rw-r-- 1 michael wheel 512 Apr 26 02:28 test
>>> sent command: chmodtrunc 440 test 1024
[michael@freebsd /export/scratch]$ ls -al test
-r--r----- 1 michael wheel 1024 Apr 26 02:28 test

Linux kNFS (Debian):

[tla /storage/public/scratch]$ echo test > test
[tla /storage/public/scratch]$ ls -al
-rw-r--r-- 1 michael michael 5 Apr 26 02:18 test
>>> sent command: chmodtrunc 664 test 512
[tla /storage/public/scratch]$ ls -al
-rw-rw-r-- 1 michael michael 512 Apr 26 02:18 test
>>> sent command: chmodtrunc 440 test 1024
[tla /storage/public/scratch]$ ls -al
-r--r----- 1 michael michael 1024 Apr 26 02:18 test

Gluster 3.3.1 NFS:

[michael@fearless1 test]$ ls -al test
-rw-rw-r--. 1 michael michael 7 Apr 26 02:23 test
>>> sent command: chmodtrunc 644 test 512
[michael@fearless1 test]$ ls -al test
-rw-r-----. 1 michael michael 512 Apr 26 02:24 test
>>> sent command: chmodtrunc 440 test 1024
<<< Set attributes failed: Permission denied
[michael@fearless1 test]$ ls -al test
-r--r-----. 1 michael michael 512 Apr 26 02:24 test

[2013-04-26 02:24:33.670628] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-11: remote operation failed: Permission denied
[2013-04-26 02:24:33.670679] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-10: remote operation failed: Permission denied
[2013-04-26 02:24:33.670985] W [nfs3.c:889:nfs3svc_truncate_cbk] 0-nfs: 67bdf2f2: /scratch/test/test => -1 (Permission denied)
I took the liberty of graphing out the calls - I was initially hoping to understand it enough to make the change myself, but modifying this seems to require a larger architectural change. http://www.websequencediagrams.com/files/render?link=Iy1dl46ejLv0p2srrCEH
Reproducible with chmodtrunc in nfsshell from this branch: - https://github.com/nixpanic/nfsshell/tree/chmodtrunc
Your suggestion from comment #0 will likely work:

> So, arguably gluster should be doing the truncate before the chmod. Perhaps
> the Most Correct thing is to always chmod last if removing permissions.
> That's a longer discussion :p

But it will require some more logic to handle the case where a read-only file gets something like SETATTR(chmod=0644, size=0). Trying to do a truncate() before and/or after a setattr() feels a little hacky.

Instead, I'll propose a change in the posix-acl xlator, where the permissions are checked and where, in the case of SETATTR(size=...) -> truncate(), the request currently gets denied.
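As a rough illustration of why the reordering idea needs "more logic", here is a small Python sketch of an order-picking rule. `plan_setattr` is a hypothetical helper, not anything in GlusterFS, and it only looks at the owner write bit.

```python
S_IWUSR = 0o200  # owner write bit


def plan_setattr(current_mode, new_mode):
    """Pick an order for a combined chmod+truncate SETATTR.

    The truncate has to run while the owner write bit is set, so the
    order depends on both the old and the new mode. Returns None when
    neither mode grants owner write, i.e. no ordering helps.
    """
    if current_mode & S_IWUSR:
        return ["truncate", "chmod"]   # removing write: chmod last
    if new_mode & S_IWUSR:
        return ["chmod", "truncate"]   # adding write: chmod first
    return None


print(plan_setattr(0o640, 0o440))  # ['truncate', 'chmod']
print(plan_setattr(0o444, 0o644))  # ['chmod', 'truncate']
print(plan_setattr(0o440, 0o400))  # None
```

The last case is the awkward one: no ordering of the two operations gets past a plain write-permission check, which is one reason to look at the permission check itself rather than the ordering.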
REVIEW: http://review.gluster.org/8889 (gNFS: allow truncate() from SETATTR over NFS for owner) posted (#1) for review on master by Niels de Vos (ndevos)
Michael, this is currently a bug against glusterfs-3.3. Could you let us know for what versions of Gluster you would like to see a fix? 3.3 is not actively maintained anymore, but we can include this in 3.4 and more recent.
On Twitter, Michael mentioned that his project does not need this fix anymore:
- https://twitter.com/Supermathie/status/516941222863437826

I'm moving this to 'mainline', and we can backport this change on request:
- http://www.gluster.org/community/documentation/index.php/Backport_Wishlist
COMMIT: http://review.gluster.org/8889 committed in master by Niels de Vos (ndevos)
------
commit f2131b8c79641c1bf9e20657757bcc9a62a0625a
Author: Niels de Vos <ndevos>
Date: Mon Sep 29 20:03:58 2014 +0200

    gNFS: allow truncate() from SETATTR over NFS for owner

    NFSv3 does not have a TRUNCATE procedure, instead it is part of the SETATTR (change the 'size' attribute). SETATTR with a new 'size' succeeds on other NFS-servers, even when the owner of the file does not have write permissions.

    Make Gluster/NFS behave the same way, by checking if the RPC/pid comes from the NFS-server, and allow truncate() when the file is owned by the user calling SETATTR.

    BUG: 955753
    Change-Id: I4b7cb8efe5a2032c6cd2eef6af610032f76d8b39
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/8889
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: soumya k <skoduri>
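The rule described in the commit message can be paraphrased as a small Python sketch. This is not the actual patch: `Request`, `from_nfs_server` and `may_truncate` are hypothetical names, the flag stands in for the "RPC/pid comes from the NFS-server" check, and group/other bits and ACLs are ignored.

```python
from dataclasses import dataclass

S_IWUSR = 0o200  # owner write bit


@dataclass
class Request:
    uid: int               # user on whose behalf the operation runs
    from_nfs_server: bool  # hypothetical stand-in for the RPC/pid check


def may_truncate(req, file_uid, file_mode):
    """Decide whether a truncate() is allowed (owner-only toy model)."""
    is_owner = req.uid == file_uid
    # Ordinary case: the owner has the owner write bit.
    if is_owner and (file_mode & S_IWUSR) != 0:
        return True
    # Override from the commit message: a truncate that originates from
    # an NFS SETATTR is allowed when the caller owns the file.
    return req.from_nfs_server and is_owner


print(may_truncate(Request(500, True), 500, 0o440))   # True
print(may_truncate(Request(500, False), 500, 0o440))  # False
print(may_truncate(Request(501, True), 500, 0o440))   # False
```

With a rule of this shape, the SETATTR from the original report (mode 0440 plus a new size, sent by the owner, uid 500) would no longer be rejected, whichever of the two changes is applied first.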
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user