Description of problem:
I was working on the USC issue of geo-replication. While starting a geo-replication session, I got this crash.

Version-Release number of selected component (if applicable): release 3.2.6 master

How reproducible: Not sure

Additional info:
This is the backtrace of the core.

################################################################################
#0  0x00002aab1ceb99ea in _dict_lookup (this=0x31e2950, key=0x2aaaaab241a1 "gsync-count") at dict.c:209
#1  0x00002aab1ceb9b36 in dict_get_with_ref (this=0x31e2950, key=0x2aaaaab241a1 "gsync-count", data=0x7ffff93072d0) at dict.c:1299
#2  0x00002aab1cebbdc3 in dict_get_int32 (this=0x2aaaaab241a9, key=0xb <Address 0xb out of bounds>, val=0x7ffff930732c) at dict.c:1649
#3  0x00002aaaaaaea91b in glusterd_read_status_file (master=0x2f2e560 "vol10", slave=0x3038155 "gluster://10.1.11.86:vol10", dict=0x31e2950) at glusterd-op-sm.c:4124
#4  0x00002aaaaaaeacac in glusterd_get_gsync_status_mst_slv (volinfo=0x2f2e560, slave=0x3038155 "gluster://10.1.11.86:vol10", rsp_dict=0x31e2950) at glusterd-op-sm.c:4295
#5  0x00002aab1ceb8eb6 in dict_foreach (dict=0x2aaaaab241a9, fn=0x2aaaaaaeae00 <_get_status_mst_slv>, data=0x7ffff930c430) at dict.c:1198
#6  0x00002aaaaaae0859 in glusterd_get_gsync_status_mst (volinfo=0x2f2e560, rsp_dict=0x31e2950) at glusterd-op-sm.c:4310
#7  0x00002aaaaaaef4ca in glusterd_get_gsync_status (dict=<value optimized out>, op_errstr=0x7ffff930dd28, rsp_dict=<value optimized out>) at glusterd-op-sm.c:4329
#8  glusterd_op_gsync_set (dict=<value optimized out>, op_errstr=0x7ffff930dd28, rsp_dict=<value optimized out>) at glusterd-op-sm.c:4768
#9  0x00002aaaaaaf24a7 in glusterd_op_commit_perform (op=<value optimized out>, dict=0x3032d20, op_errstr=0x7ffff930dd28, rsp_dict=0x9a8cae9f) at glusterd-op-sm.c:7646
#10 0x00002aaaaaaf35a3 in glusterd_op_ac_commit_op (event=<value optimized out>, ctx=0x31e2470) at glusterd-op-sm.c:7441
#11 0x00002aaaaaae04cf in glusterd_op_sm () at glusterd-op-sm.c:8458
#12 0x00002aaaaaac78c7 in glusterd_handle_commit_op (req=<value optimized out>) at glusterd-handler.c:601
#13 0x00002aab1d1121e1 in rpcsvc_handle_rpc_call (svc=0x2eefb00, trans=<value optimized out>, msg=0x30411c0) at rpcsvc.c:480
#14 0x00002aab1d1123ec in rpcsvc_notify (trans=0x2ef32d0, mydata=0x2aaaaab241a9, event=<value optimized out>, data=0x30411c0) at rpcsvc.c:576
#15 0x00002aab1d113317 in rpc_transport_notify (this=0x2aaaaab241a9, event=RPC_TRANSPORT_ACCEPT, data=0x9a8cae9f) at rpc-transport.c:919
#16 0x00002aaaaadec5ef in socket_event_poll_in (this=0x2ef32d0) at socket.c:1647
#17 0x00002aaaaadec798 in socket_event_handler (fd=<value optimized out>, idx=1, data=0x2ef32d0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1762
#18 0x00002aab1cee7631 in event_dispatch_epoll_handler (event_pool=0x2ee7370) at event.c:794
#19 event_dispatch_epoll (event_pool=0x2ee7370) at event.c:856
#20 0x000000000040566e in main (argc=1, argv=0x7ffff930e638) at glusterfsd.c:1509

(gdb) f 0
#0  0x00002aab1ceb99ea in _dict_lookup (this=0x31e2950, key=0x2aaaaab241a1 "gsync-count") at dict.c:209
209         for (pair = this->members[hashval]; pair != NULL; pair = pair->hash_next) {
(gdb) f 1
#1  0x00002aab1ceb9b36 in dict_get_with_ref (this=0x31e2950, key=0x2aaaaab241a1 "gsync-count", data=0x7ffff93072d0) at dict.c:1299
1299        pair = _dict_lookup (this, key);
(gdb) f 2
#2  0x00002aab1cebbdc3 in dict_get_int32 (this=0x2aaaaab241a9, key=0xb <Address 0xb out of bounds>, val=0x7ffff930732c) at dict.c:1649
1649        ret = dict_get_with_ref (this, key, &data);
(gdb) f 3
#3  0x00002aaaaaaea91b in glusterd_read_status_file (master=0x2f2e560 "vol10", slave=0x3038155 "gluster://10.1.11.86:vol10", dict=0x31e2950) at glusterd-op-sm.c:4124
4124        ret = dict_get_int32 (dict, "gsync-count", &gsync_count);
(gdb)
##############################################################################

Logs:
...peer (10.1.11.84:792)
[2012-03-04 13:27:44.374352] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed.
Error (Transport endpoint is not connected), peer (10.1.11.84:795)
[2012-03-04 13:27:44.374918] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:800)
[2012-03-04 13:27:44.375451] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:812)
[2012-03-04 13:27:44.375900] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:823)
[2012-03-04 13:27:44.376266] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:891)
[2012-03-04 13:27:44.376813] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:894)
[2012-03-04 13:27:44.377199] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:897)
[2012-03-04 13:27:44.377678] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:900)
[2012-03-04 13:27:46.792311] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:584)
[2012-03-04 13:27:46.804335] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.84:581)
...peer (127.0.0.1:366)
[2012-03-04 13:36:24.911260] I [glusterd-handler.c:1729:glusterd_handle_gsync_set] 0-: master not found, while handling geo-replication options
[2012-03-04 13:36:24.911302] I [glusterd-handler.c:1736:glusterd_handle_gsync_set] 0-: slave not not found, while handling geo-replication options
[2012-03-04 13:36:24.911367] I [glusterd-utils.c:243:glusterd_lock] 0-glusterd: Cluster lock held by 97f387d3-9c0f-4a6f-8cdb-d26921070844
[2012-03-04 13:36:24.911386] I [glusterd-handler.c:420:glusterd_op_txn_begin] 0-glusterd: Acquired local lock
[2012-03-04 13:36:24.912240] I [glusterd-rpc-ops.c:758:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: fb3f15ba-7c4d-4b79-96f9-92bf5bec535c
[2012-03-04 13:36:24.912426] I [glusterd-rpc-ops.c:758:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 13d67a6b-566e-42a0-89b1-463a4ebe89b3
[2012-03-04 13:36:24.912606] I [glusterd-op-sm.c:6737:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 2 peers
[2012-03-04 13:36:24.912976] I [glusterd-rpc-ops.c:1056:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: fb3f15ba-7c4d-4b79-96f9-92bf5bec535c
[2012-03-04 13:36:24.913038] I [glusterd-rpc-ops.c:1056:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 13d67a6b-566e-42a0-89b1-463a4ebe89b3
[2012-03-04 13:36:27.291071] I [glusterd-op-sm.c:6854:glusterd_op_ac_send_commit_op] 0-glusterd: Sent op req to 2 peers
[2012-03-04 13:36:28.677611] I [glusterd-rpc-ops.c:1242:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 13d67a6b-566e-42a0-89b1-463a4ebe89b3
[2012-03-04 13:36:30.761185] I [glusterd-rpc-ops.c:1242:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: fb3f15ba-7c4d-4b79-96f9-92bf5bec535c
[2012-03-04 13:36:30.762003] I [glusterd-rpc-ops.c:817:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 13d67a6b-566e-42a0-89b1-463a4ebe89b3
[2012-03-04 13:36:30.762120] I [glusterd-rpc-ops.c:817:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: fb3f15ba-7c4d-4b79-96f9-92bf5bec535c
[2012-03-04 13:36:30.762159] I [glusterd-op-sm.c:7250:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-03-04 13:36:30.765011] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:942)
Please update these bugs w.r.t. 3.3.0qa27; this needs to be worked on as per the target milestone set.
Similar to bug #801692. For the 3.2qa* release.
*** Bug 801692 has been marked as a duplicate of this bug. ***
Vijaykumar, can you try the steps again to see if this is reproducible? I can't hit this case.
Vijaykumar, please test this out. If it's not reproducible, please close it.