Description of problem:
Running add-brick then remove-brick, then restarting gluster leads to broken volume brick counts.

Steps to Reproduce:
1. Set up a simple replicated volume with two nodes:
{code}
root@gluster1:~# gluster volume info

Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}
2. Add a third brick to the replica:
{code}
root@gluster2:~# gluster volume add-brick hosting-test replica 3 gluster1.justindev:/export/brick2/sdc1
Add Brick successful
root@gluster2:~# gluster volume info

Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
Brick3: gluster1.justindev:/export/brick2/sdc1
{code}
3. Remove the brick again:
{code}
root@gluster1:~# echo y | gluster volume remove-brick hosting-test replica 2 gluster1.justindev:/export/brick2/sdc1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) Remove Brick commit force successful
root@gluster1:~# gluster volume info

Volume Name: hosting-test
Type: Replicate
Volume ID: 0dcadde0-b981-472d-851a-08fbfff40ae3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}
4. Stop and start gluster on either node, and we get funky maths:
{code}
root@gluster2:~# service glusterfs-server stop
glusterfs-server stop/waiting
root@gluster2:~# service glusterfs-server start
glusterfs-server start/running, process 11739
root@gluster2:~# gluster volume info

Volume Name: hosting-test
Type: Replicate
Volume ID: f8d7132b-6bb1-40d4-8414-b2168cdf2cd7
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: gluster2.justindev:/export/brick1/sdb1
Brick2: gluster1.justindev:/export/brick1/sdb1
{code}

Actual results:
The volume reports an inconsistent brick count (Number of Bricks: 0 x 3 = 2).

Expected results:
The volume reports Number of Bricks: 1 x 2 = 2.

Additional info:
Ubuntu 13.04, using the 3.3 or 3.4 packages from http://download.gluster.org/pub/gluster/glusterfs/*/Ubuntu.README
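For what it's worth, the broken header is consistent with a stale sub_count in the stored volume geometry: the "N x R = B" line looks like it is computed with brick_count / sub_count (integer division) as the first factor. A minimal shell sketch of that assumption (the formula is a guess, not taken from the glusterd source):

```shell
# Hypothetical reconstruction of the "Number of Bricks" header arithmetic.
brick_count=2
stale_sub_count=3   # remove-brick dropped a brick but left sub_count at 3
echo "$((brick_count / stale_sub_count)) x $stale_sub_count = $brick_count"
# prints: 0 x 3 = 2

sub_count=2         # the value remove-brick should have stored
echo "$((brick_count / sub_count)) x $sub_count = $brick_count"
# prints: 1 x 2 = 2
```

Under that assumption, the restart merely reloads the stale persisted value; the maths itself is doing exactly what it was told.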
*** Bug 1000779 has been marked as a duplicate of this bug. ***
Additional info: Re-adding a brick results in an "operation failed", but the operation does indeed succeed and it seems to fix it.
{code}
[13:12:53] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: f3117deb-f5f5-40ff-94b5-98b2095239b2
Status: Started
Number of Bricks: 0 x 3 = 2
Transport-type: tcp
Bricks:
Brick1: fs-15.mseeger.example.dev:/mnt/brick22
Brick2: fs-14.mseeger.example.dev:/mnt/brick23
[13:12:55] root:~# rm -rf /mnt/bla/
[13:13:00] root:~# mkdir /mnt/bla
[13:13:02] root:~# gluster volume add-brick test-fs-cluster-1 replica 3 fs-15:/mnt/bla/
Operation failed on fs-14.mseeger.example.dev
[13:13:08] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: f3117deb-f5f5-40ff-94b5-98b2095239b2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fs-15.mseeger.example.dev:/mnt/brick22
Brick2: fs-14.mseeger.example.dev:/mnt/brick23
Brick3: fs-15:/mnt/bla
{code}
Adding it a second time will for some reason remove that brick:
{code}
[13:15:03] root:~# gluster volume add-brick test-fs-cluster-1 replica 3 fs-15:/mnt/bla/
Operation failed
[13:15:04] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: f3117deb-f5f5-40ff-94b5-98b2095239b2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-15.mseeger.example.dev:/mnt/brick22
Brick2: fs-14.mseeger.example.dev:/mnt/brick23
{code}
I'm not quite sure what's up with the volume geometry, but it's certainly corrupted.
REVIEW: http://review.gluster.org/5893 (mgmt/glusterd: Update sub_count on remove brick) posted (#1) for review on master by Vijay Bellur (vbellur)
REVIEW: http://review.gluster.org/5893 (mgmt/glusterd: Update sub_count on remove brick) posted (#2) for review on master by Vijay Bellur (vbellur)
This seems to have fixed it. Will this be backported to 3.3 / 3.4?
This is what it looks like after the fix:
{code}
[13:14:20] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
[13:14:47] root:~# mkdir /mnt/bla
[13:15:08] root:~# gluster volume add-brick test-fs-cluster-1 replica 3 fs-21:/mnt/bla/
Add Brick successful
[13:15:42] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
Brick3: fs-21:/mnt/bla
[13:15:49] root:~# echo y | gluster volume remove-brick test-fs-cluster-1 replica 2 fs-21:/mnt/bla/
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) Remove Brick commit force successful
[13:16:17] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
[13:16:23] root:~# service glusterfs-server stop
glusterfs-server stop/waiting
[13:16:34] root:~# service glusterfs-server start
glusterfs-server start/running, process 29760
[13:16:37] root:~# gluster volume info

Volume Name: test-fs-cluster-1
Type: Replicate
Volume ID: a25ac752-57c9-4496-92ca-bfdcb964edd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs-21.dev:/mnt/brick37
Brick2: fs-22.dev:/mnt/brick36
{code}
COMMIT: http://review.gluster.org/5893 committed in master by Anand Avati (avati)
------
commit 643533c77fd49316b7d16015fa1a008391d14bb2
Author: Vijay Bellur <vbellur>
Date: Wed Sep 11 01:26:13 2013 +0530

    mgmt/glusterd: Update sub_count on remove brick

    Change-Id: I7c17de39da03c6b2764790581e097936da406695
    BUG: 1002556
    Signed-off-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/5893
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Anand Avati <avati>
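The commit title suggests the stale value lives in glusterd's persisted volume metadata, which would explain why the bad header only surfaces after a restart. A hedged way to check an affected node (the /var/lib/glusterd store path and the key names are assumptions based on default builds, not taken from this report), simulated here against a mock info file:

```shell
# On a real node you would inspect the actual store, e.g.:
#   grep -E '^(sub_count|brick_count)=' /var/lib/glusterd/vols/hosting-test/info
# Path and key names are assumptions; simulated with a mock key=value file:
info=$(mktemp)
printf 'replica_count=3\nsub_count=3\nbrick_count=2\n' > "$info"
grep -E '^(sub_count|brick_count)=' "$info"
# a stale sub_count=3 alongside brick_count=2 matches the 0 x 3 = 2 header
rm -f "$info"
```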
REVIEW: http://review.gluster.org/5902 (mgmt/glusterd: Update sub_count on remove brick) posted (#1) for review on release-3.4 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/5902 committed in release-3.4 by Vijay Bellur (vbellur)
------
commit d9dde294cfd7bb83bccbe777dfd58b925a6f2f7b
Author: Vijay Bellur <vbellur>
Date: Wed Sep 11 01:26:13 2013 +0530

    mgmt/glusterd: Update sub_count on remove brick

    Change-Id: I7c17de39da03c6b2764790581e097936da406695
    BUG: 1002556
    Signed-off-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/5902
    Tested-by: Gluster Build System <jenkins.com>
This is also failing in 3.3. Will there be a backport? (I tested the fix on 3.3, and it worked fine.)
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailinglist [1]; packages for several distributions should already be, or soon become, available. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases > 3.4.3. Likewise, the recent glusterfs-3.5.0 release [3] is likely to contain the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137