Description of problem:
In order to retire one or more bricks from a volume, you must do a 'remove-brick start' operation, followed by 'remove-brick commit' when the migration is complete. While this is in progress, each file that gets migrated becomes unavailable to the clients. Issuing the commit operation makes all migrated files available again.

Steps to Reproduce:
1. Begin migrating data off a brick with the remove-brick start command.
2. Check the $volume-rebalance log to find a file that has been migrated.
3. Try to access the file found in step 2. Access will fail.
4. Wait for the migration to complete.
5. If the status reports failures, run remove-brick start again and wait for it to complete; repeat until a pass finishes without failures.
6. Issue the remove-brick commit operation.
7. Try to access the file again. It will succeed.

Actual results:
Each migrated file is unavailable from the time it gets migrated until the commit operation is performed.

Expected results:
Each file should remain available after it gets migrated. The commit operation should not be required in order to keep accessing the data. The commit operation should simply finalize the removal, or (when it might be required) force removal with data loss if no migration has been done.

Additional info:
Bug 770346 is similar, though apparently with that bug the data was completely lost even after the commit.

The migration seems to be prone to failures on individual files. No failure notification is given other than a count on the 'status' screen showing that such failures have occurred. Such failures are guaranteed when the available disk space on one or more of the remaining bricks is less than the amount of used space on the brick being removed, even if the volume as a whole has plenty of space. I will file a separate bug for that problem.

I did my tests with a 4x2 distribute-replicate volume living on two nodes (each with 4 bricks), removing both replicas of the last brick. It is likely that the same problem would happen on a pure distribute volume, but I have not tested it.

In production I expect to start off with 4TB drives, one brick per drive, and each brick will contain several million files. Migrating the data off such a brick will take several hours, and we cannot afford to have that much data be unavailable for that much time. Someday the servers with the 4TB drives will be ancient and ready for retirement.
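For anyone reproducing this, the sequence described above looks roughly like the following. This is only a sketch: 'myvol', the server names, and the brick paths are placeholders, and the exact wording of the rebalance log messages may vary by version (my actual commands and output are in the later comments).

# begin migrating data off the bricks being retired
gluster volume remove-brick myvol server1:/bricks/b4/myvol server2:/bricks/b4/myvol start

# watch progress; the failures column is the only indication of failed migrations
gluster volume remove-brick myvol server1:/bricks/b4/myvol server2:/bricks/b4/myvol status

# find files that have already been migrated
grep -i migrate /var/log/glusterfs/myvol-rebalance.log

# finalize the removal once migration is complete
gluster volume remove-brick myvol server1:/bricks/b4/myvol server2:/bricks/b4/myvol commit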
If the volume starts out more than half full, you are likely to run into Bug 862347 at step 4.
Hi Shawn,

Please attach the client logs (mount process) where the lookup of such files fails. The remove-brick logs related to the files in question would also help.
As noted on Bug 862347, I completed one remove-brick run and did not run into this bug. As of the end of that first run, all migrated files seem to be still accessible. I will see what happens during subsequent remove-brick runs.
During the first round of testing for Bug 862347, I did not run into this bug at all. I have no idea what's different between this run and the one where everything was unavailable. I do plan to do another round of testing after completely deleting the volume and starting over.
It is possible that you hit bug 852361 in the earlier testing. Can you please share info on your volume type (gluster volume info) and the output of 'gluster volume status <VOL> detail'? If you are not able to hit this bug in another few runs, we would like to close this bug as WORKSFORME.
The files were definitely not owned by root, but as far as I know, nothing had them open at the time. I haven't had time to get back to this testing, but I certainly hope to do so soon.

The output below is not from the volume I was testing with at the time, but it was on the same hardware and the setup is the same:

[root@testb1 ~]# gluster volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 182df850-96f3-4d69-95b9-18e9ea409dfb
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: testb1:/bricks/b1/testvol
Brick2: testb2:/bricks/b1/testvol
Brick3: testb1:/bricks/b2/testvol
Brick4: testb2:/bricks/b2/testvol
Brick5: testb1:/bricks/b3/testvol
Brick6: testb2:/bricks/b3/testvol
Brick7: testb1:/bricks/b4/testvol
Brick8: testb2:/bricks/b4/testvol

Volume Name: flubber
Type: Distributed-Replicate
Volume ID: f936fc99-cebc-4ff5-b52f-51ad57ba211a
Status: Created
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: testb1:/bricks/b1/flubber
Brick2: testb2:/bricks/b1/flubber
Brick3: testb1:/bricks/b2/flubber
Brick4: testb2:/bricks/b2/flubber
Brick5: testb1:/bricks/b3/flubber
Brick6: testb2:/bricks/b3/flubber
Brick7: testb1:/bricks/b4/flubber
Brick8: testb2:/bricks/b4/flubber

[root@testb1 ~]# df -k
Filesystem                   1K-blocks     Used Available Use% Mounted on
/dev/mapper/vg_main-lv_root   49537840  3357692  43663772   8% /
tmpfs                          1914332        0   1914332   0% /dev/shm
/dev/md1                       1032076   127836    851812  14% /boot
/dev/sda3                    922833364  6370736 916462628   1% /bricks/b1
/dev/sdb3                    922833364  6464812 916368552   1% /bricks/b2
/dev/sdc3                    922833364  5977444 916855920   1% /bricks/b3
/dev/sdd3                    922833364  6355516 916477848   1% /bricks/b4

[root@testb1 ~]# mount
/dev/mapper/vg_main-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md1 on /boot type ext4 (rw)
/dev/sda3 on /bricks/b1 type xfs (rw,noatime,nodiratime,nobarrier,inode64)
/dev/sdb3 on /bricks/b2 type xfs (rw,noatime,nodiratime,nobarrier,inode64)
/dev/sdc3 on /bricks/b3 type xfs (rw,noatime,nodiratime,nobarrier,inode64)
/dev/sdd3 on /bricks/b4 type xfs (rw,noatime,nodiratime,nobarrier,inode64)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
I realized that I did not include one of your requests. Looks like df and mount give you most of this information, though.

[root@testb1 ~]# gluster volume status testvol detail
Status of volume: testvol
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b1/testvol
Port                 : 24009
Online               : Y
Pid                  : 1758
File System          : xfs
Device               : /dev/sda3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 874.0GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813115
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b1/testvol
Port                 : 24009
Online               : Y
Pid                  : 1730
File System          : xfs
Device               : /dev/sda3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 874.0GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813115
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b2/testvol
Port                 : 24010
Online               : Y
Pid                  : 1763
File System          : xfs
Device               : /dev/sdb3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 873.9GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813132
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b2/testvol
Port                 : 24010
Online               : Y
Pid                  : 1735
File System          : xfs
Device               : /dev/sdb3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 873.9GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813132
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b3/testvol
Port                 : 24011
Online               : Y
Pid                  : 1769
File System          : xfs
Device               : /dev/sdc3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 874.4GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813028
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b3/testvol
Port                 : 24011
Online               : Y
Pid                  : 1742
File System          : xfs
Device               : /dev/sdc3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 874.4GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813028
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b4/testvol
Port                 : 24012
Online               : Y
Pid                  : 1776
File System          : xfs
Device               : /dev/sdd3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 874.0GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813177
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b4/testvol
Port                 : 24012
Online               : Y
Pid                  : 1747
File System          : xfs
Device               : /dev/sdd3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 874.0GB
Total Disk Space     : 880.1GB
Inode Count          : 230820992
Free Inodes          : 230813177
I have set up a new test with 4GiB bricks. This will be a simultaneous test of bug 862347. I just limited the size of the xfs filesystems with -d size=4g.

Server info gathered after I filled up the volume to slightly over half full:

[root@testb1 ~]# gluster volume status testvol detail
Status of volume: testvol
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b1/testvol
Port                 : 24013
Online               : Y
Pid                  : 9751
File System          : xfs
Device               : /dev/sda3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.8GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047782
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b1/testvol
Port                 : 24013
Online               : Y
Pid                  : 9550
File System          : xfs
Device               : /dev/sda3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.8GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047782
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b2/testvol
Port                 : 24014
Online               : Y
Pid                  : 9756
File System          : xfs
Device               : /dev/sdb3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.6GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047748
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b2/testvol
Port                 : 24014
Online               : Y
Pid                  : 9556
File System          : xfs
Device               : /dev/sdb3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.6GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047748
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b3/testvol
Port                 : 24015
Online               : Y
Pid                  : 9762
File System          : xfs
Device               : /dev/sdc3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.7GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047747
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b3/testvol
Port                 : 24015
Online               : Y
Pid                  : 9561
File System          : xfs
Device               : /dev/sdc3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.7GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047747
------------------------------------------------------------------------------
Brick                : Brick testb1:/bricks/b4/testvol
Port                 : 24016
Online               : Y
Pid                  : 9768
File System          : xfs
Device               : /dev/sdd3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.7GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047741
------------------------------------------------------------------------------
Brick                : Brick testb2:/bricks/b4/testvol
Port                 : 24016
Online               : Y
Pid                  : 9567
File System          : xfs
Device               : /dev/sdd3
Mount Options        : rw,noatime,nodiratime,nobarrier,inode64
Inode Size           : 1024
Disk Space Free      : 1.7GB
Total Disk Space     : 4.0GB
Inode Count          : 1048576
Free Inodes          : 1047741

[root@testb1 ~]# df -k
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_main-lv_root   49537840 3357664  43663800   8% /
tmpfs                          1914332       0   1914332   0% /dev/shm
/dev/md1                       1032076  127836    851812  14% /boot
/dev/sda3                      4184064 2282448   1901616  55% /bricks/b1
/dev/sdb3                      4184064 2489576   1694488  60% /bricks/b2
/dev/sdc3                      4184064 2394880   1789184  58% /bricks/b3
/dev/sdd3                      4184064 2442244   1741820  59% /bricks/b4

[root@testb2 ~]# df -k
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_main-lv_root   49537840 3436492  43584972   8% /
tmpfs                          1914332       0   1914332   0% /dev/shm
/dev/md1                       1032076  122412    857236  13% /boot
/dev/sda3                      4184064 2282448   1901616  55% /bricks/b1
/dev/sdb3                      4184064 2489576   1694488  60% /bricks/b2
/dev/sdc3                      4184064 2394880   1789184  58% /bricks/b3
/dev/sdd3                      4184064 2442244   1741820  59% /bricks/b4
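For reference, the brick filesystems were recreated with something along these lines (a sketch; the device names are specific to my test boxes, and -i size=1024 simply matches the inode size shown in the status output above):

mkfs.xfs -f -i size=1024 -d size=4g /dev/sda3
mkfs.xfs -f -i size=1024 -d size=4g /dev/sdb3
mkfs.xfs -f -i size=1024 -d size=4g /dev/sdc3
mkfs.xfs -f -i size=1024 -d size=4g /dev/sdd3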
Info from the client, gathered concurrently with the server info above. All files were created using a non-root user on the client. The client just happens to be a gluster-swift UFO server, thus the mount point. Files are created with random sizes between 16KiB and 16MiB.

[elyograg@testb3 foo]$ pwd
/mnt/gluster-object/AUTH_testvol/foo
[elyograg@testb3 foo]$ df -k .
Filesystem         1K-blocks    Used Available Use% Mounted on
localhost:testvol   16736256 9609216   7127040  58% /mnt/gluster-object/AUTH_testvol
[elyograg@testb3 foo]$ du
6707257 ./g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2
6707257 ./g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw
6707257 ./g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK
6707257 ./g1lpxY/E81phT/GPPjeB/tW8tvq
6707257 ./g1lpxY/E81phT/GPPjeB
6707257 ./g1lpxY/E81phT
6707257 ./g1lpxY
1430245 ./sC.CIW/aie4AJ/QzJ2W2/Jcx0hG/tjF-.K/NOGlG9
1430245 ./sC.CIW/aie4AJ/QzJ2W2/Jcx0hG/tjF-.K
1430245 ./sC.CIW/aie4AJ/QzJ2W2/Jcx0hG
1430245 ./sC.CIW/aie4AJ/QzJ2W2
1430245 ./sC.CIW/aie4AJ
1430245 ./sC.CIW
1334417 ./25hHAg/FktfVS/pWETA-
1334417 ./25hHAg/FktfVS
1334417 ./25hHAg
9471919 .
[elyograg@testb3 foo]$ find . -type f | wc -l
1165
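The exact tool used to generate the test data is not important, but for illustration, files with random sizes in that range could be created with a loop like this (a hypothetical sketch, not the actual tool I used):

# create 100 files with random sizes between 16KiB and 16MiB
for i in $(seq 1 100); do
    kb=$(( RANDOM % 16369 + 16 ))        # size in KiB: 16 .. 16384
    dd if=/dev/urandom of=testfile.$i bs=1024 count=$kb 2>/dev/null
done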
Before beginning the migration, I deleted everything in /var/log/glusterfs on the first server and restarted glusterd so I will have clean logfiles.

[root@testb1 glusterfs]# rpm -qa | grep gluster
glusterfs-server-3.3.1-1.el6.x86_64
glusterfs-fuse-3.3.1-1.el6.x86_64
glusterfs-geo-replication-3.3.1-1.el6.x86_64
glusterfs-3.3.1-1.el6.x86_64
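For completeness, the cleanup amounted to something like this on testb1 (a sketch of what I did, not a recommendation):

# wipe the old logs and restart the management daemon so new logs start clean
rm -rf /var/log/glusterfs/*
service glusterd restart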
Command entered to begin migration:

gluster volume remove-brick testvol testb1:/bricks/b4/testvol testb2:/bricks/b4/testvol start
I can see that there are some permission problems after the first pass of the rebalance completed, which reported 9 failures. This is on the client:

[elyograg@testb3 AUTH_testvol]$ ls -al foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34
---------T 1 root root 0 Oct 23 13:40 foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34
Later, the ownership of that entry changed, but the permissions did not update:

[elyograg@testb3 AUTH_testvol]$ ls -al foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34
---------T 1 elyograg elyograg 0 Oct 23 13:40 foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34

I will attach the full listing of the parent directory of this item so you can see that there are other entries in that directory with the odd permissions.
When I checked that same file during the fourth pass of remove-brick, it had corrected itself completely:

[elyograg@testb3 AUTH_testvol]$ ls -al foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34
-rw-rw-r-- 1 elyograg elyograg 11468800 Oct 23 13:18 foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34
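If it helps with diagnosis: my assumption is that these zero-byte mode-1000 (---------T) entries are DHT link files left behind by the migration. If one shows up again, it could be checked directly on a brick with something like the following (the brick path below is only an example; I have not verified which brick held this particular entry):

getfattr -n trusted.glusterfs.dht.linkto -e text \
    /bricks/b1/testvol/foo/g1lpxY/E81phT/GPPjeB/tW8tvq/iWbIBK/N.R7Fw/JqI9N2/x39Oei34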
Created attachment 632367 [details]
listing of parent directory for file with odd permissions

Here is a directory listing showing the odd permissions and zero sizes on some files. By the time I had completed all the remove-brick passes, these errors had corrected themselves and no odd permissions remained. Except for a few files, I never did run into the major unavailability problem that led me to file this bug.
Created attachment 632368 [details]
rebalance log which covers all four passes of remove-brick
It is important to know that during every single remove-brick pass, one (sometimes more than one) of the brick filesystems reached 100% capacity.

After the first remove-brick pass:

[root@testb1 glusterfs]# gluster volume remove-brick testvol testb1:/bricks/b4/testvol testb2:/bricks/b4/testvol status
     Node  Rebalanced-files         size      scanned     failures        status
---------  ----------------  -----------  -----------  -----------  ------------
localhost               804        6.1GB         1248            9     completed
   testb4                 0       0Bytes            0            0   not started
   testb3                 0       0Bytes            0            0   not started
   testb2                 0       0Bytes         1168            0     completed

After the second pass:

[root@testb1 glusterfs]# gluster volume remove-brick testvol testb1:/bricks/b4/testvol testb2:/bricks/b4/testvol status
     Node  Rebalanced-files         size      scanned     failures        status
---------  ----------------  -----------  -----------  -----------  ------------
localhost               367        2.9GB         1337          467     completed
   testb4                 0       0Bytes            0            0   not started
   testb3                 0       0Bytes            0            0   not started
   testb2                 0       0Bytes         1169            0     completed

After the third pass:

[root@testb1 glusterfs]# gluster volume remove-brick testvol testb1:/bricks/b4/testvol testb2:/bricks/b4/testvol status
     Node  Rebalanced-files         size      scanned     failures        status
---------  ----------------  -----------  -----------  -----------  ------------
localhost               345        2.7GB         1413          122     completed
   testb4                 0       0Bytes            0            0   not started
   testb3                 0       0Bytes            0            0   not started
   testb2                 0       0Bytes         1168            0     completed

After the fourth pass. Finally, no failures! Also, no files left on brick 4:

[root@testb1 glusterfs]# gluster volume remove-brick testvol testb1:/bricks/b4/testvol testb2:/bricks/b4/testvol status
     Node  Rebalanced-files         size      scanned     failures        status
---------  ----------------  -----------  -----------  -----------  ------------
localhost               122      940.3MB         1287            0     completed
   testb4                 0       0Bytes            0            0   not started
   testb3                 0       0Bytes            0            0   not started
   testb2                 0       0Bytes         1166            0     completed

Final server-side df after all four passes and issuing remove-brick commit:

[root@testb2 ~]# df -k
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_main-lv_root   49537840 3438192  43583272   8% /
tmpfs                          1914332       0   1914332   0% /dev/shm
/dev/md1                       1032076  122412    857236  13% /boot
/dev/sda3                      4184064 3146136   1037928  76% /bricks/b1
/dev/sdb3                      4184064 3148236   1035828  76% /bricks/b2
/dev/sdc3                      4184064 3283540    900524  79% /bricks/b3
/dev/sdd3                      4184064   33880   4150184   1% /bricks/b4
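Rough arithmetic on the df output above shows why failures were probably unavoidable here: before the first pass, /bricks/b4 held roughly 2.4GB of data (2442244 KB used), while each remaining brick had only about 1.6-1.9GB free. Presumably each migrated file must go to the specific brick its name hashes to rather than to whichever brick has room, so an individual brick can hit 100% (as observed during every pass) and cause failures even though the remaining bricks together had more than enough free space in aggregate.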
Final note: it looks like the permission problems you mentioned did indeed occur, but they eventually resolved themselves. I cannot reproduce this bug, so closing it as WORKSFORME is probably the best option.
Thanks for the detailed report and follow-ups. Feel free to reopen the bug if the issue is seen again.