Created attachment 718: Bad XF86Config generated by installer.
I created a pure replicate volume with the rdma transport type, mounted it via FUSE, and started running dbench for 1000 seconds with 20 clients. After some time I took down one of the bricks. When the brick came back online, dbench failed in unlink with the following output:

20 21179 6.03 MB/sec execute 160 sec latency 1102.911 ms
20 21242 6.06 MB/sec execute 161 sec latency 952.563 ms
20 21305 6.09 MB/sec execute 162 sec latency 346.382 ms
[21398] open ./clients/client12/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 0
20 21357 6.10 MB/sec execute 163 sec latency 414.598 ms
[21152] open ./clients/client17/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21447] open ./clients/client3/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client13/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
20 21370 6.07 MB/sec execute 164 sec latency 1099.871 ms
[21139] open ./clients/client6/~dmtmp/ACCESS/SALES.PRN succeeded for handle 21340
[21152] open ./clients/client6/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21436] open ./clients/client14/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client7/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
20 21405 6.07 MB/sec execute 165 sec latency 712.598 ms
[21648] open ./clients/client10/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21436] open ./clients/client13/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21648] open ./clients/client16/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client8/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client4/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
20 21474 6.12 MB/sec execute 166 sec latency 850.422 ms
[21436] open ./clients/client7/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
20 21561 6.13 MB/sec execute 167 sec latency 205.801 ms
20 21644 6.14 MB/sec execute 168 sec latency 312.451 ms
20 21717 6.12 MB/sec execute 169 sec latency 94.683 ms
20 21770 6.09 MB/sec execute 170 sec latency 178.992 ms
20 21819 6.07 MB/sec execute 171 sec latency 171.873 ms
20 21860 6.04 MB/sec execute 172 sec latency 338.510 ms
20 21861 6.00 MB/sec execute 173 sec latency 1338.601 ms
20 21872 5.97 MB/sec execute 174 sec latency 1463.730 ms
20 21886 5.94 MB/sec execute 175 sec latency 1369.267 ms
[21886] unlink ./clients/client12/~dmtmp/COREL/GRAPH1.CDR failed (No such file or directory) - expected NT_STATUS_OK
ERROR: child 12 failed at line 21886
Child failed with status 1

The last time I tried, dbench failed as soon as the replicate brick went down; now it fails when the brick comes back online. I have attached the client log.
Is this happening on the 3.2 branch?
It happened on RHSSA 3.2.4. I haven't checked recently whether it still happens.
Can you please check whether the problem persists on 3.3?
Please feel free to re-open once you have the logs with 3.3.x/3.4.x releases.