Description of problem:
Created a replicate volume (replica 2) and enabled gsync and quota on it. Mounted it on the client and started untarring and then compiling the glusterfs tarball. While the untar and compilation were running, added one more brick to the volume (increasing the replica count to 3). The gsync session became faulty, so I stopped it and started it again. Then removed one of the bricks (bringing the replica count back down to 2) and disabled quota. The session became faulty again.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Enable gsync and quota on a replicate volume.
2. Mount the volume via the FUSE client and start untarring/compiling the glusterfs source.
3. While the above test is running, add one more brick to the volume, increasing the replica count from 2 to 3.
4. Remove a brick from the volume, decreasing the replica count from 3 to 2.
5. Disable quota.

Actual results:
The geo-replication session became faulty.

Expected results:
The geo-replication session should not become faulty.

Additional info:
Created attachment 585670: log file of the geo-replication process
Well, the geo-replication status becomes faulty because gsyncd gets an ENOTCONN/ECONNABORTED when an add-brick or remove-brick is performed. You need not stop/start the session yourself, as the monitor thread does that on encountering a state change such as OK -> faulty. The issue is what the geo-rep CLI status shows: it remains faulty after the self-restart because there is a window of 60 seconds before the status file is updated, after which the status becomes OK again. There is no problem with the syncing of files. Amar, given the above explanation, do you think anything needs to be fixed? There is actually no issue with the sync, only a temporary state change from gsyncd's point of view, since it holds a reference to the lazily unmounted volume.
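To illustrate the explanation above, here is a minimal toy model, assuming (1) the faulty state is written to the status file as soon as the worker hits ENOTCONN/ECONNABORTED, (2) the monitor restarts the worker immediately, and (3) the status file is otherwise rewritten only once per 60-second window. The class and method names (`GeoRepMonitorModel`, `on_worker_error`, `tick`, `write_status`) are hypothetical and are not gsyncd's actual code; the sketch only shows why the CLI can keep reporting faulty while the restarted worker is already syncing.

```python
import errno
from dataclasses import dataclass

# Seconds between periodic status-file writes (the 60-second window from the thread).
STATUS_INTERVAL = 60


@dataclass
class GeoRepMonitorModel:
    """Hypothetical toy model of the monitor/status-file lag; not gsyncd code."""

    state: str = "OK"          # live state of the worker
    status_file: str = "OK"    # what the geo-rep CLI status would report
    last_write: float = 0.0    # time of the last status-file write
    restarts: int = 0          # how many times the monitor restarted the worker

    def on_worker_error(self, err: int, now: float) -> None:
        # A brick graph change (add-brick/remove-brick) drops the worker's
        # connection to the lazily unmounted volume.
        if err in (errno.ENOTCONN, errno.ECONNABORTED):
            self.state = "faulty"
            # Assumption: the faulty state is persisted right away.
            self.write_status(now, force=True)
            # The monitor restarts the worker itself; no manual stop/start needed.
            self.restarts += 1
            self.state = "OK"

    def tick(self, now: float) -> None:
        # Periodic housekeeping: only writes if the 60-second window has elapsed.
        self.write_status(now)

    def write_status(self, now: float, force: bool = False) -> None:
        if force or now - self.last_write >= STATUS_INTERVAL:
            self.status_file = self.state
            self.last_write = now


if __name__ == "__main__":
    m = GeoRepMonitorModel()
    m.on_worker_error(errno.ECONNABORTED, now=100.0)
    m.tick(130.0)
    print(m.state, m.status_file)  # worker is OK, CLI still shows faulty
    m.tick(160.0)
    print(m.state, m.status_file)  # window elapsed, CLI shows OK
```

In this model the live `state` is OK immediately after the restart, while `status_file` only catches up at the next periodic write, which matches the reported symptom of a stale faulty status with no actual sync problem.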
Venky, do you think we should document this? Right now, I don't think this needs any fix in the code. Go ahead and close it as NOTABUG (with the doc-text field updated).
Yes, it's good to document this, as it could seem very alarming to a user.