Bug 765514 (GLUSTER-3782) - [Red Hat SSA-3.2.4] when one of the replicate pair goes down and comes back up dbench fails
Summary: [Red Hat SSA-3.2.4] when one of the replicate pair goes down and comes back u...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: GLUSTER-3782
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 854629
 
Reported: 2011-11-04 07:11 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned As: 854629
Environment:
Last Closed: 2013-02-22 10:27:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfs client log (626.52 KB, text/x-log)
2011-11-04 04:12 UTC, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2011-11-04 04:12:17 UTC
Created attachment 718
glusterfs client log

Comment 1 M S Vishwanath Bhat 2011-11-04 07:11:42 UTC
Created a pure replicate volume with the rdma transport type, mounted it via FUSE, and started running dbench for 1000 secs with 20 clients. After some time I took down one of the bricks. When the brick came back online, dbench failed in unlink with the following message.

  20     21179     6.03 MB/sec  execute 160 sec  latency 1102.911 ms
  20     21242     6.06 MB/sec  execute 161 sec  latency 952.563 ms
  20     21305     6.09 MB/sec  execute 162 sec  latency 346.382 ms
[21398] open ./clients/client12/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 0
  20     21357     6.10 MB/sec  execute 163 sec  latency 414.598 ms
[21152] open ./clients/client17/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21447] open ./clients/client3/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client13/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21370     6.07 MB/sec  execute 164 sec  latency 1099.871 ms
[21139] open ./clients/client6/~dmtmp/ACCESS/SALES.PRN succeeded for handle 21340
[21152] open ./clients/client6/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21436] open ./clients/client14/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client7/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21405     6.07 MB/sec  execute 165 sec  latency 712.598 ms
[21648] open ./clients/client10/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21436] open ./clients/client13/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21648] open ./clients/client16/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client8/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client4/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
  20     21474     6.12 MB/sec  execute 166 sec  latency 850.422 ms
[21436] open ./clients/client7/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
  20     21561     6.13 MB/sec  execute 167 sec  latency 205.801 ms
  20     21644     6.14 MB/sec  execute 168 sec  latency 312.451 ms
  20     21717     6.12 MB/sec  execute 169 sec  latency 94.683 ms
  20     21770     6.09 MB/sec  execute 170 sec  latency 178.992 ms
  20     21819     6.07 MB/sec  execute 171 sec  latency 171.873 ms
  20     21860     6.04 MB/sec  execute 172 sec  latency 338.510 ms
  20     21861     6.00 MB/sec  execute 173 sec  latency 1338.601 ms
  20     21872     5.97 MB/sec  execute 174 sec  latency 1463.730 ms
  20     21886     5.94 MB/sec  execute 175 sec  latency 1369.267 ms
[21886] unlink ./clients/client12/~dmtmp/COREL/GRAPH1.CDR failed (No such file or directory) - expected NT_STATUS_OK
ERROR: child 12 failed at line 21886
Child failed with status 1


Last time I tried, dbench failed as soon as the replicate brick went down. Now it's failing when it comes back online.

I have attached the client log.
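
For reference, a rough sketch of the reproduction steps described above. The volume name, server names, brick paths, and mount point are placeholders (not taken from this report), and the exact commands may differ on RHSSA 3.2.4:

  # Placeholder names: repvol, server1/server2, /export/brick, /mnt/repvol
  gluster volume create repvol replica 2 transport rdma server1:/export/brick server2:/export/brick
  gluster volume start repvol

  # FUSE mount on the client
  mount -t glusterfs server1:/repvol /mnt/repvol

  # 20 dbench clients for 1000 seconds, run from the mount point
  cd /mnt/repvol && dbench -t 1000 20

  # While dbench runs, kill the glusterfsd process serving one brick to take it
  # down, then bring it back (on newer releases, "gluster volume start repvol force"
  # respawns missing brick processes).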

Comment 2 Pranith Kumar K 2012-02-24 08:44:48 UTC
Is it happening on the 3.2 branch?

Comment 3 M S Vishwanath Bhat 2012-02-24 09:05:07 UTC
It happened on RHSSA 3.2.4. I haven't checked recently whether it still happens.

Comment 4 Vijay Bellur 2012-03-29 11:41:40 UTC
Can you please check if the problem persists on 3.3 now?

Comment 5 Pranith Kumar K 2013-02-22 10:27:31 UTC
Please feel free to re-open once you have the logs with 3.3.x/3.4.x releases.

