Bug 955753 - NFS SETATTR call with a truncate and chmod 440 fails
Summary: NFS SETATTR call with a truncate and chmod 440 fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-04-23 17:28 UTC by Michael Brown
Modified: 2015-05-14 17:42 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 950121
Environment:
2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL 6.4 64-bit, glusterfs-server-3.3.1-1.el6.x86_64 (from EPEL)
4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7 64-bit, glusterfs-3.3.1-11.el5 (from kkeithley's repo, only used for testing)
Bricks are 400GB SSDs with ext4 (and dir_index off).
Common network is 10GbE; replication between servers happens over a direct 10GbE link.

gluster> volume info gv0

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 20117b48-7f88-4f16-9490-a0349afacf71
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off
Last Closed: 2015-05-14 17:25:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Michael Brown 2013-04-23 17:28:15 UTC
I've run across another problem - this one I'm pretty sure is a problem with Gluster.

I'm still using Oracle dNFS, and it's erroring out on some of its log files:
ARC3: Error 19508 Closing archive log file '/db/flash_recovery_area/ALTUS/archivelog/2013_04_22/o1_mf_1_1093__1366653401581181_.arc'

Gluster is reporting:
[2013-04-22 13:57:22.073354] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-9: remote operation failed: Permission denied
[2013-04-22 13:57:22.073496] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-8: remote operation failed: Permission denied
[2013-04-22 13:57:22.073805] W [nfs3.c:889:nfs3svc_truncate_cbk] 0-nfs: 8b534455: /fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc => -1 (Permission denied)
[2013-04-22 13:57:22.082594] E [nfs3.c:3408:nfs3_remove_resume] 0-nfs-nfsv3: Unable to resolve FH: (192.168.10.3:46391) gv0 : 82c4c5ec-f3ad-4074-ac66-c5a455146d71

Immediately prior to this, the file had these attributes:
Regular File mode:0640 uid:500 gid:1000, size: 476959744

The actual NFS RPC causing this error is [1]. Briefly:
Remote Procedure Call, Type:Call XID:0x8b534455
Network File System, SETATTR Call FH:0x5c191ad8
    new_attributes
        mode: value follows
            set_it: value follows (1)
            Mode: 0440, S_IRUSR, S_IRGRP
        size: value follows
            set_it: value follows (1)
            size: 476959744

In other words, a "truncate" and "chmod 440" in the same call.

Gluster is replying with [2]:
Remote Procedure Call, Type:Reply XID:0x8b534455
Network File System, SETATTR Reply  Error:NFS3ERR_ACCES
    Status: NFS3ERR_ACCES (13)

What's happening is that gluster is processing the mode change before the truncate, causing the truncate to fail.
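To make the ordering problem concrete, here is a minimal in-memory model. All names are hypothetical and this is not Gluster code; it assumes a non-root caller who owns the file, models only the owner permission bits, and applies the two halves of a combined SETATTR in either order.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical model of a file: only the fields this bug touches. */
struct model_file {
    uid_t  owner;
    mode_t mode;
    off_t  size;
};

/* Subset of an NFSv3 sattr3: an attribute is applied only when set_* is true. */
struct model_sattr {
    bool   set_mode;
    mode_t mode;
    bool   set_size;
    off_t  size;
};

/* Simplified permission check: a non-root owner needs the owner write bit to
 * truncate. Non-owners are simply denied (group/other bits are not modelled). */
static int model_truncate(struct model_file *f, uid_t caller, off_t size)
{
    if (caller != 0 && (caller != f->owner || !(f->mode & S_IWUSR)))
        return -EACCES;
    f->size = size;
    return 0;
}

/* Only root or the owner may change the mode. */
static int model_chmod(struct model_file *f, uid_t caller, mode_t mode)
{
    if (caller != 0 && caller != f->owner)
        return -EPERM;
    f->mode = mode & 07777;
    return 0;
}

/* Mode first, then size: the order this report observes on Gluster 3.3. */
static int model_setattr_mode_first(struct model_file *f, uid_t caller,
                                    const struct model_sattr *sa)
{
    int rc;
    if (sa->set_mode && (rc = model_chmod(f, caller, sa->mode)) != 0)
        return rc;
    if (sa->set_size && (rc = model_truncate(f, caller, sa->size)) != 0)
        return rc;
    return 0;
}

/* Size first, then mode: the order suggested later in this description. */
static int model_setattr_size_first(struct model_file *f, uid_t caller,
                                    const struct model_sattr *sa)
{
    int rc;
    if (sa->set_size && (rc = model_truncate(f, caller, sa->size)) != 0)
        return rc;
    if (sa->set_mode && (rc = model_chmod(f, caller, sa->mode)) != 0)
        return rc;
    return 0;
}
```

With the values from this report (mode 0640 changed to 0440, size 476959744, caller uid 500 owning the file), the mode-first order returns -EACCES and leaves the mode already changed to 0440, a partial update like the one in Comment 2's Gluster run, while the size-first order succeeds.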

Incidentally, this also causes gluster to think that these files need healing:
Gathering Heal info on volume gv0 has been successful
…
Brick fearless1:/export/bricks/500117310007a7ec/glusterdata
/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
…
Brick fearless2:/export/bricks/500117310007a74c/glusterdata
/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc

So, arguably gluster should be doing the truncate before the chmod. Perhaps the Most Correct thing is to always chmod last if removing permissions. That's a longer discussion :p
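The "chmod last if removing permissions" idea amounts to an ordering decision per request. A hypothetical helper sketching it, considering only the owner write bit:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

enum setattr_order { CHMOD_THEN_TRUNCATE, TRUNCATE_THEN_CHMOD };

/* Hypothetical: pick the order for a combined mode+size SETATTR so that
 * neither half removes the permission the other half needs. */
static enum setattr_order pick_order(mode_t old_mode, mode_t new_mode)
{
    bool had_write  = (old_mode & S_IWUSR) != 0;
    bool gets_write = (new_mode & S_IWUSR) != 0;

    /* Write permission is being added (e.g. 0444 -> 0644):
     * chmod first so that the truncate is then allowed. */
    if (!had_write && gets_write)
        return CHMOD_THEN_TRUNCATE;

    /* Write permission is kept or removed (e.g. 0640 -> 0440):
     * truncate while the file is still writable, chmod last. */
    return TRUNCATE_THEN_CHMOD;
}
```

This covers both 0640 to 0440 and the read-only case Niels raises in Comment 5 (SETATTR(chmod=0644, size=0)), but a size change on a file that stays read-only still fails in either order. The fix committed in Comment 9 changes the permission check instead of reordering.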

[1] Full RPC Call
Remote Procedure Call, Type:Call XID:0x8b534455
    Fragment header: Last fragment, 172 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0000 1010 1100 = Fragment Length: 172
    XID: 0x8b534455 (2337490005)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 3
    Procedure: SETATTR (2)
    [The reply to this request is in frame 293325]
    Credentials
        Flavor: AUTH_UNIX (1)
        Length: 52
        Stamp: 0xabcdefab
        Machine Name: fleming1.netdirect.ca
            length: 21
            contents: fleming1.netdirect.ca
            fill bytes: opaque data
        UID: 500
        GID: 1000
        Auxiliary GIDs
            GID: 1000
            GID: 1030
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Network File System, SETATTR Call FH:0x5c191ad8
    [Program Version: 3]
    [V3 Procedure: SETATTR (2)]
    object
        length: 36
        [hash (CRC-32): 0x5c191ad8]
        [Name: .o1_mf_1_1093__1366653401581181_.arc]
        [Full Name: 192.168.10.1:/gv0/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc]
        decode type as: unknown
        filehandle: 3a4f474c20117b487f884f169490a0349afacf71e16a95fc...
    new_attributes
        mode: value follows
            set_it: value follows (1)
            Mode: 0440, S_IRUSR, S_IRGRP
                .... .... .... .... .... 0... .... .... = S_ISUID: No
                .... .... .... .... .... .0.. .... .... = S_ISGID: No
                .... .... .... .... .... ..0. .... .... = S_ISVTX: No
                .... .... .... .... .... ...1 .... .... = S_IRUSR: Yes
                .... .... .... .... .... .... 0... .... = S_IWUSR: No
                .... .... .... .... .... .... .0.. .... = S_IXUSR: No
                .... .... .... .... .... .... ..1. .... = S_IRGRP: Yes
                .... .... .... .... .... .... ...0 .... = S_IWGRP: No
                .... .... .... .... .... .... .... 0... = S_IXGRP: No
                .... .... .... .... .... .... .... .0.. = S_IROTH: No
                .... .... .... .... .... .... .... ..0. = S_IWOTH: No
                .... .... .... .... .... .... .... ...0 = S_IXOTH: No
        uid: no value
            set_it: no value (0)
        gid: no value
            set_it: no value (0)
        size: value follows
            set_it: value follows (1)
            size: 476959744
        atime: don't change
            set_it: don't change (0)
        mtime: don't change
            set_it: don't change (0)
    guard: no value
        check: no value (0)
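The bit-level fields in this dump can be checked by hand. A short sketch, assuming the ONC RPC record-marking layout of RFC 5531 (top bit = last fragment, low 31 bits = fragment length) and the usual octal permission bits; the helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* RPC record marking: the 4-byte fragment header holds the "last fragment"
 * flag in its top bit and the fragment length in the low 31 bits. */
static bool frag_is_last(uint32_t hdr)    { return (hdr & 0x80000000u) != 0; }
static uint32_t frag_length(uint32_t hdr) { return hdr & 0x7fffffffu; }

/* Render the nine permission bits of a mode the way ls(1) does;
 * set-id and sticky bits (all "No" above) are ignored. */
static void mode_to_rwx(uint32_t mode, char out[10])
{
    static const char sym[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (0400u >> i)) ? sym[i] : '-';
    out[9] = '\0';
}
```

For the call above, header 0x800000AC decodes to a last fragment of 172 bytes, and mode 0440 (S_IRUSR | S_IRGRP) renders as r--r-----, the same permissions `ls` shows in Comment 2.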

[2] Full Reply
Ethernet II, Src: Ibm_36:f7:d0 (5c:f3:fc:36:f7:d0), Dst: IntelCor_38:e7:58 (00:1e:67:38:e7:58)
Internet Protocol Version 4, Src: 192.168.10.1 (192.168.10.1), Dst: 192.168.10.3 (192.168.10.3)
Transmission Control Protocol, Src Port: 38467 (38467), Dst Port: 46391 (46391), Seq: 1230671698, Ack: 2230824272, Len: 40
Remote Procedure Call, Type:Reply XID:0x8b534455
    Fragment header: Last fragment, 36 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 0000 0000 0010 0100 = Fragment Length: 36
    XID: 0x8b534455 (2337490005)
    Message Type: Reply (1)
    [Program: NFS (100003)]
    [Program Version: 3]
    [Procedure: SETATTR (2)]
    Reply State: accepted (0)
    [This is a reply to a request in frame 293324]
    [Time from request: 0.001547000 seconds]
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    Accept State: RPC executed successfully (0)
Network File System, SETATTR Reply  Error:NFS3ERR_ACCES
    [Program Version: 3]
    [V3 Procedure: SETATTR (2)]
    Status: NFS3ERR_ACCES (13)
    obj_wcc
        before
            attributes_follow: no value (0)
        after
            attributes_follow: no value (0)
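The gluster log reports the failed truncate as "-1 (Permission denied)", i.e. EACCES, and the reply carries NFS3ERR_ACCES (13). A minimal sketch of such an errno-to-nfsstat3 translation, using status values from RFC 1813; the function is hypothetical and is not Gluster's own mapping table.

```c
#include <errno.h>

/* A few NFSv3 status codes as numbered in RFC 1813 (nfsstat3). */
enum nfsstat3 {
    NFS3_OK             = 0,
    NFS3ERR_PERM        = 1,
    NFS3ERR_NOENT       = 2,
    NFS3ERR_IO          = 5,
    NFS3ERR_ACCES       = 13,
    NFS3ERR_INVAL       = 22,
    NFS3ERR_ROFS        = 30,
    NFS3ERR_STALE       = 70,
    NFS3ERR_SERVERFAULT = 10006,
};

/* Hypothetical: translate the errno from a failed fop callback into the
 * status placed in the NFS reply; unknown errors become SERVERFAULT. */
static enum nfsstat3 errno_to_nfsstat3(int err)
{
    switch (err) {
    case 0:      return NFS3_OK;
    case EPERM:  return NFS3ERR_PERM;
    case ENOENT: return NFS3ERR_NOENT;
    case EIO:    return NFS3ERR_IO;
    case EACCES: return NFS3ERR_ACCES;
    case EINVAL: return NFS3ERR_INVAL;
    case EROFS:  return NFS3ERR_ROFS;
    case ESTALE: return NFS3ERR_STALE;
    default:     return NFS3ERR_SERVERFAULT;
    }
}
```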

Comment 1 Michael Brown 2013-04-24 16:30:01 UTC
AFR has also gotten confused about the status of these files when this happens. I suspect it's the "0-gv0-client-9: remote operation failed: Permission denied" failures throwing it out of whack:

fearless1# getfattr -m . -d -e hex  /export/bricks/*/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
# file: export/bricks/500117310007a7ec/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-8=0x000000010000000000000000
trusted.afr.gv0-client-9=0x000000010000000000000000
trusted.gfid=0xe16a95fc3e3b4e6abb9cc6c449db80ca

fearless2# getfattr -m . -d -e hex  /export/bricks/*/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
# file: export/bricks/500117310007a74c/glusterdata/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653401581181_.arc
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-8=0x000000010000000000000000
trusted.afr.gv0-client-9=0x000000010000000000000000
trusted.gfid=0xe16a95fc3e3b4e6abb9cc6c449db80ca
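The trusted.afr values above can be decoded with a small parser. This sketch assumes the AFR changelog layout of three 32-bit big-endian counters (pending data, metadata and entry operations, 12 bytes in total); the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed AFR changelog layout: pending data, metadata, entry counters. */
struct afr_pending {
    uint32_t data, metadata, entry;
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "0x" followed by 24 hex digits as printed by `getfattr -e hex`.
 * Returns 0 on success, -1 on malformed input. */
static int parse_afr_changelog(const char *s, struct afr_pending *out)
{
    uint32_t v[3] = { 0, 0, 0 };

    if (strncmp(s, "0x", 2) != 0 || strlen(s) != 2 + 24)
        return -1;
    s += 2;
    for (int i = 0; i < 24; i++) {
        int d = hexval(s[i]);
        if (d < 0)
            return -1;
        /* 8 hex digits per counter, most significant digit first. */
        v[i / 8] = (v[i / 8] << 4) | (uint32_t)d;
    }
    out->data = v[0];
    out->metadata = v[1];
    out->entry = v[2];
    return 0;
}
```

Under that assumption, 0x000000010000000000000000 decodes to data=1, metadata=0, entry=0: each brick records one pending data operation against both gv0-client-8 and gv0-client-9, which would fit a truncate that failed on both replicas.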

I mistakenly tried to read this attribute on the file through the FUSE mount; the following command froze completely and I can't kill it:

fearless2# getfattr -d -e text /gv0/fleming1/db0/ALTUS_flash/archivelog/2013_04_22/.o1_mf_1_1093__1366653909363181_.arc -n trusted.afr

Comment 2 Michael Brown 2013-04-26 06:44:51 UTC
OK! I've created a SIMPLE test case by porting nfsshell to NFSv3 so I can test it against GlusterFS.

Behaviour replicated; tomorrow I'll dig through the GlusterFS code and fix it.

FreeBSD 9.1:
[michael@freebsd /export/scratch]$ ls -al test
-rw-r--r--  1 michael  wheel  6 Apr 26 02:28 test
>>> sent command: chmodtrunc 664 test 512
[michael@freebsd /export/scratch]$ ls -al test
-rw-rw-r--  1 michael  wheel  512 Apr 26 02:28 test
>>> sent command: chmodtrunc 440 test 1024
[michael@freebsd /export/scratch]$ ls -al test
-r--r-----  1 michael  wheel  1024 Apr 26 02:28 test

Linux kNFS (Debian):
[tla /storage/public/scratch]$ echo test > test
[tla /storage/public/scratch]$ ls -al
-rw-r--r-- 1 michael michael  5 Apr 26 02:18 test
>>> sent command: chmodtrunc 664 test 512
[tla /storage/public/scratch]$ ls -al
-rw-rw-r-- 1 michael michael 512 Apr 26 02:18 test
>>> sent command: chmodtrunc 440 test 1024
[tla /storage/public/scratch]$ ls -al
-r--r----- 1 michael michael 1024 Apr 26 02:18 test

Gluster 3.3.1 NFS:
[michael@fearless1 test]$ ls -al test
-rw-rw-r--. 1 michael michael 7 Apr 26 02:23 test
>>> sent command: chmodtrunc 644 test 512
[michael@fearless1 test]$ ls -al test
-rw-r-----. 1 michael michael 512 Apr 26 02:24 test
>>> sent command: chmodtrunc 440 test 1024
<<< Set attributes failed: Permission denied
[michael@fearless1 test]$ ls -al test
-r--r-----. 1 michael michael 512 Apr 26 02:24 test

[2013-04-26 02:24:33.670628] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-11: remote operation failed: Permission denied
[2013-04-26 02:24:33.670679] W [client3_1-fops.c:707:client3_1_truncate_cbk] 0-gv0-client-10: remote operation failed: Permission denied
[2013-04-26 02:24:33.670985] W [nfs3.c:889:nfs3svc_truncate_cbk] 0-nfs: 67bdf2f2: /scratch/test/test => -1 (Permission denied)

Comment 3 Michael Brown 2013-04-30 18:40:09 UTC
I took the liberty of graphing out the calls - I was initially hoping to understand it enough to make the change myself, but modifying this seems to require a larger architectural change.

http://www.websequencediagrams.com/files/render?link=Iy1dl46ejLv0p2srrCEH

Comment 4 Niels de Vos 2014-09-29 18:49:36 UTC
Reproducible with chmodtrunc in nfsshell from this branch:
- https://github.com/nixpanic/nfsshell/tree/chmodtrunc

Comment 5 Niels de Vos 2014-09-30 08:02:52 UTC
Your suggestion from comment #0 will likely work:

> So, arguably gluster should be doing the truncate before the chmod. Perhaps
> the Most Correct thing is to always chmod last if removing permissions.
> That's a longer discussion :p

But it will require some more logic for the case where a read-only file gets something like SETATTR(chmod=0644, size=0). Trying to do a truncate() before and/or after a setattr() feels a little hacky.

Instead, I'll propose a change in the posix-acl xlator, which is where the permissions are checked and where the truncate() resulting from SETATTR(size=...) gets denied.

Comment 6 Anand Avati 2014-09-30 08:03:18 UTC
REVIEW: http://review.gluster.org/8889 (gNFS: allow truncate() from SETATTR over NFS for owner) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 7 Niels de Vos 2014-09-30 08:04:57 UTC
Michael, this is currently a bug against glusterfs-3.3. Could you let us know which versions of Gluster you would like to see a fix in? 3.3 is not actively maintained anymore, but we can include this in 3.4 and more recent releases.

Comment 8 Niels de Vos 2014-10-02 07:24:16 UTC
On Twitter, Michael mentioned that his project does not need this fix anymore:
- https://twitter.com/Supermathie/status/516941222863437826

I'm moving this to 'mainline', and we can backport this change on request:
- http://www.gluster.org/community/documentation/index.php/Backport_Wishlist

Comment 9 Anand Avati 2014-10-02 07:24:40 UTC
COMMIT: http://review.gluster.org/8889 committed in master by Niels de Vos (ndevos) 
------
commit f2131b8c79641c1bf9e20657757bcc9a62a0625a
Author: Niels de Vos <ndevos>
Date:   Mon Sep 29 20:03:58 2014 +0200

    gNFS: allow truncate() from SETATTR over NFS for owner
    
    NFSv3 does not have a TRUNCATE procedure, instead it is part of the
    SETATTR (change the 'size' attribute). SETATTR with a new 'size'
    succeeds on other NFS-servers, even when the owner of the file does not
    have write permissions. Make Gluster/NFS behave the same way, by
    checking if the RPC/pid comes from the NFS-server, and allow truncate()
    when the file is owned by the user calling SETATTR.
    
    BUG: 955753
    Change-Id: I4b7cb8efe5a2032c6cd2eef6af610032f76d8b39
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/8889
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: soumya k <skoduri>
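The rule described in this commit message can be sketched as a permission predicate. This is not the posix-acl xlator code: the request struct, the from_nfs_server flag (standing in for the "RPC/pid comes from the NFS-server" check) and the root shortcut are assumptions, and group/other classes and ACLs are not modelled.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical per-request context. */
struct req_ctx {
    uid_t uid;
    bool  from_nfs_server; /* assumption: true when gNFS issued the fop */
};

/* Sketch of the truncate permission rule from the commit message. */
static bool may_truncate(const struct req_ctx *ctx, uid_t owner, mode_t mode)
{
    if (ctx->uid == 0)
        return true;          /* assumption: root is always allowed */
    if (ctx->uid != owner)
        return false;         /* non-owners: not modelled here, deny */
    if (mode & S_IWUSR)
        return true;          /* ordinary owner write permission */
    /* Owner without write permission: allowed only when the truncate comes
     * from an NFS SETATTR, matching other NFS servers. */
    return ctx->from_nfs_server;
}
```

With the file from this report (owner uid 500, mode 0440), the owner's SETATTR(size) over NFS is allowed, while the same truncate from a non-NFS client, or from another user over NFS, is still denied.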

Comment 10 Niels de Vos 2015-05-14 17:25:28 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

