Two-node setup with replication. Both nodes NFS-mount the volume, and everything works as expected. However, when one node is down, the other node's NFS mount hangs and is unusable. ;(

# gluster volume info hosted
Volume Name: hosted
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: hosted01.fedoraproject.org:/glusterfs/hosted
Brick2: hosted02.fedoraproject.org:/glusterfs/hosted
Options Reconfigured:
nfs.rpc-auth-allow: 127.0.0.1
nfs.disable: off
performance.io-thread-count: 32

The mount entry:

localhost:/hosted /srv nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,nolock,proto=tcp,port=38467,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=38465,mountproto=tcp,local_lock=all,addr=127.0.0.1 0 0

The gluster processes:

root 2678 0.1 0.1 43260 11488 ? Ssl 16:56 0:01 /usr/sbin/glusterd -p /var/run/glusterd.pid
root 2730 0.1 0.1 60132 12176 ? Ssl 16:56 0:01 /usr/sbin/glusterfsd --xlator-option hosted-server.listen-port=24009 -s localhost --volfile-id hosted.hosted02.fedoraproject.org.glusterfs-hosted -p /etc/glusterd/vols/hosted/run/hosted02.fedoraproject.org-glusterfs-hosted.pid -S /tmp/b940ce67b4f7f4925e9d979a03f2d34a.socket --brick-name /glusterfs/hosted --brick-port 24009 -l /var/log/glusterfs/bricks/glusterfs-hosted.log
root 2734 0.2 0.4 62088 29884 ? Ssl 16:56 0:02 /usr/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log

I'll attach a statedump (kill -USR1) from the NFS server, taken while the second node is down, in case that shows anything. Happy to gather more info, try things, etc.
Created attachment 576265 [details] glusterdump file from nfs server when other node is down.
Created attachment 576288 [details] logs from first node
Created attachment 576289 [details] logs from second node
Additional info: Stopping all gluster* processes on one node works fine and results in the other node working as expected. Taking down the second node completely (or while it is rebooting) results in the NFS hang we are seeing.

If one node is powered off, the other node does eventually start working again, after about 16 minutes:

[2012-04-09 21:08:59.239152] I [client-handshake.c:913:client_setvolume_cbk] 0-hosted-client-1: Connected to 66.135.62.187:24009, attached to remote volume '/glusterfs/hosted'.
[2012-04-09 21:08:59.239183] I [client-handshake.c:779:client_post_handshake] 0-hosted-client-1: 1 fds open - Delaying child_up until they are re-opened
[2012-04-09 21:08:59.316107] I [client-lk.c:617:decrement_reopen_fd_count] 0-hosted-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
<node was powered off here and no logs until:>
[2012-04-09 21:26:43.834287] W [socket.c:204:__socket_rwv] 0-hosted-client-1: readv failed (No route to host)

When one node is powered off, you can see on the other one:

tcp 0 6204 192.168.122.2:1022 66.135.62.187:24009 ESTABLISHED

When one node is powered off, ping from the other:

PING hosted02.fedoraproject.org (66.135.62.187) 56(84) bytes of data.
From hosted02.fedoraproject.org (66.135.62.187) icmp_seq=1 Destination Host Unreachable

Some info about the crazy network setup:

node1: 192.168.122.2 (libvirt private-network guest, with all 66.135.62.201 traffic DNATed and SNATed into it)
node2: 192.168.122.3 (libvirt private-network guest, with all 66.135.62.187 traffic DNATed and SNATed into it)

I suspect something in the NAT setup is not sending back what's expected, so things have to wait out the normal TCP timeout on the established connection before giving up. I would expect glusterd to be able to bypass that and declare the connection down when it sees the host unreachable, though.
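For what it's worth, the ~16-minute recovery window is consistent with the kernel's TCP retransmission give-up time rather than anything Gluster-specific: with unacknowledged data on an ESTABLISHED connection, Linux retransmits up to net.ipv4.tcp_retries2 times (default 15) with exponential backoff, which works out to very roughly 13-30 minutes before the socket is errored out. A quick read-only check (the lowered value shown commented out is purely illustrative, not a recommendation):

```shell
# Current retransmission retry limit. The default of 15 means an
# ESTABLISHED connection with unacked data is only declared dead
# after roughly 13-30 minutes of exponential-backoff retries.
cat /proc/sys/net/ipv4/tcp_retries2

# To make sockets with unacked data fail faster (requires root;
# 5 is an illustrative value on the order of tens of seconds):
#   sysctl -w net.ipv4.tcp_retries2=5
```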
I see messages like these in the log files:

[2012-04-08 17:31:53.849090] C [client-handshake.c:121:rpc_client_ping_timer_expired] 0-hosted-client-1: server 66.135.62.187:24009 has not responded in the last 42 seconds, disconnecting.

which means it has detected that the peer is no longer active and drops the connection as expected. That should ensure the NFS server no longer hangs and can continue. We need to see why it hangs for you for more than 42 seconds. Can you tell us the time at which it hangs, so that we can check the corresponding log messages?

[2012-04-09 21:08:59.316107] I [client-lk.c:617:decrement_reopen_fd_count] 0-hosted-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP
<node was powered off here and no logs until:>
[2012-04-09 21:26:43.834287] W [socket.c:204:__socket_rwv] 0-hosted-client-1: readv failed (No route to host)

The client ping timer is only active while the connection is actively being used. Since this connection was probably inactive (no pending frames) at that point, the ping timer never came into the picture.
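For reference, that 42-second figure is GlusterFS's ping timeout, which is tunable per volume via the standard volume-option interface (42 seconds is the default in 3.x releases). As noted above, it only helps when the ping timer is actually running on the connection:

```shell
# network.ping-timeout controls how long a client waits for a ping
# reply before declaring the brick connection dead (default 42s).
gluster volume set hosted network.ping-timeout 42
```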
OK, here are some fresh logs.

- Both nodes up, connected, and working properly.
- One node is halted at: Wed Apr 11 15:45:11 UTC 2012
- The other node logs:

[2012-04-11 15:45:26.17611] W [socket.c:1494:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Transport endpoint is not connected), peer (66.135.62.187:24007)
[2012-04-11 15:45:26.18372] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (66.135.62.187:1023)
[2012-04-11 15:45:48.307313] E [socket.c:1685:socket_connect_finish] 0-management: connection to 66.135.62.187:24007 failed (Connection timed out)

- df and other access to the NFS mount now hangs on that node. netstat shows the connection still ESTABLISHED to the down node:

tcp 0 468 192.168.122.2:1022 66.135.62.187:24009 ESTABLISHED

- 42 seconds later, nothing changes: no log additions, and still no access to the NFS mount.
- 15 minutes and 38 seconds later there is a flurry of log messages:

[2012-04-11 16:01:26.305] W [socket.c:204:__socket_rwv] 0-hosted-client-1: readv failed (No route to host)
[2012-04-11 16:01:26.609] W [socket.c:1494:__socket_proto_state_machine] 0-hosted-client-1: reading from socket failed. Error (No route to host), peer (66.135.62.187:24009)
[2012-04-11 16:01:26.44905] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:45:33.885394
[2012-04-11 16:01:26.45037] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.45619] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:46:10.89206
[2012-04-11 16:01:26.45705] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.45910] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:46:33.884480
[2012-04-11 16:01:26.45983] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.46181] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:48:33.884500
[2012-04-11 16:01:26.46490] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.46753] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:49:33.884493
[2012-04-11 16:01:26.46826] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.47023] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:51:33.884496
[2012-04-11 16:01:26.47095] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.47477] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:52:33.884513
[2012-04-11 16:01:26.47553] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.47757] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:54:33.884525
[2012-04-11 16:01:26.47830] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.48025] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:55:33.884508
[2012-04-11 16:01:26.48095] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.48502] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:57:33.884504
[2012-04-11 16:01:26.48578] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.48781] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 15:58:33.884497
[2012-04-11 16:01:26.48852] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.49045] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-04-11 16:00:01.948006
[2012-04-11 16:01:26.49141] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.49811] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(STATFS(14)) called at 2012-04-11 16:00:33.884507
[2012-04-11 16:01:26.49887] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.50085] E [rpc-clnt.c:341:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x86) [0xe6a6c6] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88) [0xe69d98] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0x27) [0xe69cf7]))) 0-hosted-client-1: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-04-11 16:01:01.949335
[2012-04-11 16:01:26.50157] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-hosted-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-11 16:01:26.50589] I [client.c:1883:client_rpc_notify] 0-hosted-client-1: disconnected
[2012-04-11 16:01:40.530324] E [socket.c:1685:socket_connect_finish] 0-hosted-client-1: connection to 66.135.62.187:24009 failed (No route to host)

After that, the NFS mount starts working and operating as normal.

- netstat no longer shows the ESTABLISHED entry.

Happy to gather more debugging.
This seems almost like what would happen if (a) we were doing a non-blocking read on the socket and (b) we weren't getting the ICMP Host Unreachable for ~10 minutes. It's possible that (b) might be related to the funky VM setup, but it shouldn't matter because (a) shouldn't be the case anyway. AFAICT we do try to set NBIO by default, so if we're not or if it's failing then we should probably look into that. Another possibility would be to try the keepalive options, and see if they cause the connection to fail sooner.
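Point (a) can be checked from the outside by reading the open-file flags of the process's descriptors from /proc: the flags: line in /proc/PID/fdinfo/N is octal, and O_NONBLOCK is 04000. A sketch, using the NFS-server glusterfs pid from the ps listing above (2734 there; substitute your own):

```shell
pid=2734   # glusterfs NFS server pid from the earlier ps listing; substitute yours
for f in /proc/"$pid"/fdinfo/*; do
  [ -e "$f" ] || continue
  # The "flags:" field is the octal open mode of the descriptor.
  flags=$(awk '/^flags:/ {print $2}' "$f")
  # O_NONBLOCK is 04000 octal; test that bit.
  if [ $(( 0$flags & 04000 )) -ne 0 ]; then
    echo "fd ${f##*/}: O_NONBLOCK set"
  fi
done
```

Socket fds that do not show up in this output are in blocking mode, which would point at NBIO not being set (or being lost somewhere).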
How can we try the keepalive options? ;)
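One way to exercise keepalive without touching Gluster at all is via the kernel's standard TCP keepalive sysctls (these are generic Linux knobs, not Gluster options). One caveat for this particular bug: keepalive probes are only sent on an otherwise idle connection; once there is unacknowledged data in flight, as in the netstat output above (non-zero Send-Q), the retransmission timeout governs instead, so keepalive may not help here. A read-only look, with an illustrative (not recommended) change commented out:

```shell
# Defaults are typically 7200s before the first probe, 75s between
# probes, and 9 failed probes before the connection is dropped.
cat /proc/sys/net/ipv4/tcp_keepalive_time
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
cat /proc/sys/net/ipv4/tcp_keepalive_probes

# More aggressive settings (requires root; values are illustrative):
#   sysctl -w net.ipv4.tcp_keepalive_time=60
#   sysctl -w net.ipv4.tcp_keepalive_intvl=10
#   sysctl -w net.ipv4.tcp_keepalive_probes=3
```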
Seems to work okay for me with 3.2.7 and 3.3.1. Approximately 42 seconds after knocking node two offline (qemu/kvm, forced off), writes on the NFS-mounted volume resumed.
Please reopen if this is still a problem.