Bug 823304 - [1d939fe7adef651b90bb5c4cd5843768417f0138]: geo-replication status goes to faulty state due to corrupted timestamp
Summary: [1d939fe7adef651b90bb5c4cd5843768417f0138]: geo-replication status goes to faulty state due to corrupted timestamp
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Assignee: Venky Shankar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 849302
 
Reported: 2012-05-20 18:33 UTC by Raghavendra Bhat
Modified: 2013-03-06 11:42 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: {add,remove}-brick causes gsyncd to enter a faulty state temporarily. Consequence: Although the gsyncd state is faulty (as per the cli, and only temporarily), there are no issues with data syncing; there is only a gsyncd worker restart. Workaround (if any): Although it's not a workaround as such, after the worker restart the gsyncd status remains 'faulty' for 60 secs and turns 'OK' after that. Result: The gsyncd status turns 'OK' after 60 secs and there are no problems with the syncing of data.
Clone Of:
Cloned to: 849302
Environment:
Last Closed: 2013-03-06 11:42:16 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
log file of the geo-replication process (573 bytes, text/x-log)
2012-05-20 18:34 UTC, Raghavendra Bhat

Description Raghavendra Bhat 2012-05-20 18:33:16 UTC
Description of problem:

Created a replicated volume (replica 2) and enabled gsync and quota on it. Mounted the volume on a client and started untarring and then compiling the glusterfs tarball. While the untar and compilation were going on, added one more brick to the volume (thus increasing the replica count to 3). The gsync session became faulty, so it was stopped and started again. Then removed one of the bricks (bringing the replica count back down to 2) and disabled quota. The session again became faulty.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Enable gsync and quota on a replicated (replica 2) volume.
2. Mount the volume via the fuse client and start an untar/compilation of the glusterfs source.
3. While the above test is running, add one more brick to the volume, increasing the replica count from 2 to 3.
4. Remove a brick from the volume, decreasing the replica count from 3 back to 2.
5. Disable quota. (A rough CLI sketch of these steps follows.)
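
Rough CLI sketch of the reproduction sequence (host names, brick paths, the slave URL and the tarball path are placeholders; the exact geo-replication and remove-brick syntax varies between GlusterFS releases):

# create a 2-way replicated volume and start it
gluster volume create master replica 2 host1:/export/brick1 host2:/export/brick2
gluster volume start master

# enable quota and geo-replication on it (slave URL is a placeholder)
gluster volume quota master enable
gluster volume geo-replication master slavehost::slavevol start

# mount the volume and generate load (untar + build the glusterfs tarball)
mkdir -p /mnt/master
mount -t glusterfs host1:/master /mnt/master
cd /mnt/master && tar xzf /path/to/glusterfs.tar.gz && cd glusterfs-* && ./configure && make

# while the build is running, grow the replica count to 3 ...
gluster volume add-brick master replica 3 host3:/export/brick3

# ... then shrink it back to 2 and disable quota
gluster volume remove-brick master replica 2 host3:/export/brick3 force
gluster volume quota master disable

# check the session state
gluster volume geo-replication master slavehost::slavevol status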
  
Actual results:

The geo-replication session became faulty.

Expected results:
The geo-replication session should not become faulty.

Additional info:

Comment 1 Raghavendra Bhat 2012-05-20 18:34:04 UTC
Created attachment 585670
log file of the geo-replication process

Comment 2 Venky Shankar 2013-03-06 09:10:40 UTC
Well, the geo-replication status becomes faulty because gsyncd gets an ENOTCONN/ECONNABORTED when an add-brick or a remove-brick is performed.

You need not stop/start the session again, as the monitor thread does that on encountering a state change such as OK -> faulty. The issue is what is shown in the geo-rep cli status: it remains faulty after the self-restart because there is a window of 60 secs before the status file is updated, after which the status becomes OK again. There is no problem with the syncing of files.
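
To make that concrete, re-checking from the cli after the 60 sec window is enough (master/slave names below are placeholders, and the exact slave URL syntax depends on the release):

gluster volume geo-replication master slavehost::slavevol status
# shows 'faulty' right after the worker restart triggered by add-brick/remove-brick
sleep 60
gluster volume geo-replication master slavehost::slavevol status
# the status file is updated within ~60 secs, so this reports OK again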

Amar, does the above explanation suggest that something needs to be fixed? There is actually no issue with the sync, only a temporary state change from gsyncd's POV, as it holds a reference to the lazily umounted volume.

Comment 3 Amar Tumballi 2013-03-06 10:38:12 UTC
Venky, do you think we should document this? Right now, I don't think this needs any fixes in code. Go ahead and close it as NOTABUG, with the doc-text field updated.

Comment 4 Venky Shankar 2013-03-06 11:42:16 UTC
Yes, it's good to document this, as it could very much seem alarming to a user.

