Bug 765514 (GLUSTER-3782) - [Red Hat SSA-3.2.4] when one of the replicate pair goes down and comes back up dbench fails
Summary: [Red Hat SSA-3.2.4] when one of the replicate pair goes down and comes back u...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: GLUSTER-3782
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 854629
 
Reported: 2011-11-04 07:11 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned As: 854629
Environment:
Last Closed: 2013-02-22 10:27:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfs client log (626.52 KB, text/x-log)
2011-11-04 04:12 UTC, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2011-11-04 04:12:17 UTC
Created attachment 718
glusterfs client log

Comment 1 M S Vishwanath Bhat 2011-11-04 07:11:42 UTC
Created a pure replicate volume with the rdma transport type, mounted it via FUSE, and started running dbench for 1000 secs with 20 clients. After some time I took down one of the bricks. When the brick came back online, dbench failed in unlink with the following message.

  20     21179     6.03 MB/sec  execute 160 sec  latency 1102.911 ms
  20     21242     6.06 MB/sec  execute 161 sec  latency 952.563 ms
  20     21305     6.09 MB/sec  execute 162 sec  latency 346.382 ms
[21398] open ./clients/client12/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 0
  20     21357     6.10 MB/sec  execute 163 sec  latency 414.598 ms
[21152] open ./clients/client17/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21447] open ./clients/client3/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client13/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21370     6.07 MB/sec  execute 164 sec  latency 1099.871 ms
[21139] open ./clients/client6/~dmtmp/ACCESS/SALES.PRN succeeded for handle 21340
[21152] open ./clients/client6/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21436] open ./clients/client14/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client7/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21405     6.07 MB/sec  execute 165 sec  latency 712.598 ms
[21648] open ./clients/client10/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21436] open ./clients/client13/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21648] open ./clients/client16/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client8/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client4/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
  20     21474     6.12 MB/sec  execute 166 sec  latency 850.422 ms
[21436] open ./clients/client7/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
  20     21561     6.13 MB/sec  execute 167 sec  latency 205.801 ms
  20     21644     6.14 MB/sec  execute 168 sec  latency 312.451 ms
  20     21717     6.12 MB/sec  execute 169 sec  latency 94.683 ms
  20     21770     6.09 MB/sec  execute 170 sec  latency 178.992 ms
  20     21819     6.07 MB/sec  execute 171 sec  latency 171.873 ms
  20     21860     6.04 MB/sec  execute 172 sec  latency 338.510 ms
  20     21861     6.00 MB/sec  execute 173 sec  latency 1338.601 ms
  20     21872     5.97 MB/sec  execute 174 sec  latency 1463.730 ms
  20     21886     5.94 MB/sec  execute 175 sec  latency 1369.267 ms
[21886] unlink ./clients/client12/~dmtmp/COREL/GRAPH1.CDR failed (No such file or directory) - expected NT_STATUS_OK
ERROR: child 12 failed at line 21886
Child failed with status 1


Last time I tried, dbench failed as soon as the replicate brick went down. Now it's failing when it comes back online.

I have attached the client log.
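
For reference, a rough sketch of the reproduction steps described above. The volume name, server names, brick paths, and mount point are placeholders (not taken from this report), and the exact commands may differ on RHSSA 3.2.4:

  # Placeholder names: repvol, server1/server2, /export/brick, /mnt/repvol
  gluster volume create repvol replica 2 transport rdma server1:/export/brick server2:/export/brick
  gluster volume start repvol

  # FUSE mount on the client
  mount -t glusterfs server1:/repvol /mnt/repvol

  # 20 dbench clients for 1000 seconds, run from the mount point
  cd /mnt/repvol && dbench -t 1000 20

  # While dbench runs, kill the glusterfsd process serving one brick to take it
  # down, then bring it back (on newer releases, "gluster volume start repvol force"
  # respawns missing brick processes).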

Comment 2 Pranith Kumar K 2012-02-24 08:44:48 UTC
Is it happening on the 3.2 branch?

Comment 3 M S Vishwanath Bhat 2012-02-24 09:05:07 UTC
It happened on RHSSA 3.2.4. I haven't checked recently whether it still happens.

Comment 4 Vijay Bellur 2012-03-29 11:41:40 UTC
Can you please check if the problem persists on 3.3 now?

Comment 5 Pranith Kumar K 2013-02-22 10:27:31 UTC
Please feel free to re-open once you have the logs with 3.3.x/3.4.x releases.

