In larger environments, integrating ejabberd into your existing infrastructure by binding it to your existing AD server seems like a great idea at first. Our central IT provides us with a read-only domain controller (RODC), which we use for all our authentication. However, we noticed that our ejabberd server, while working perfectly at first, stopped authenticating our users even though they provided valid credentials.
Restarting ejabberd resolved the issue for several hours, until it again started to throw error messages like:
Failed authentication for email@example.com
Checking the authentication sequence with tcpdump, we eventually found out that ejabberd was no longer able to talk to the AD server. It set up the connection when the service started, but after a few hours the connection timed out without ejabberd noticing. Since we could not convince the AD server to let TCP connections persist indefinitely (we have no privileges on the respective RODC), we had to turn the knobs on our end: the underlying Debian system that hosts our ejabberd service.
The Linux IP stack is able to notice broken connections, but only if there is traffic that uses them. For a more reliable way of detecting broken connections that carry very little traffic, there is TCP keepalive support, but ejabberd does not make use of it. Fortunately, there is a way of tricking software into using TCP keepalive without recompiling it: libkeepalive.
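To make clear what "using TCP keepalive" means at the socket level, here is a minimal Python sketch of an application opting in by itself. It only illustrates the socket options involved and is not libkeepalive's code. The function name enable_keepalive and its parameters are made up for this example, and the per-socket TCP_KEEP* options are assumed to be available, as they are on Linux.

```python
import socket


def enable_keepalive(sock, idle=None, interval=None, probes=None):
    """Turn on TCP keepalive for an existing socket (illustrative helper).

    Parameters left as None fall back to the system-wide sysctl values.
    """
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket overrides; these constants exist on Linux.
    if idle is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if interval is not None and hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if probes is not None and hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Software that never sets SO_KEEPALIVE gets no probes at all, no matter how the system-wide keepalive timers are configured, which is why a preload helper is needed for such programs.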
This rather small library is loaded through the LD_PRELOAD environment variable. You can set the variable in your service startup script to apply keepalive to a single service or application, or you can use the configuration file /etc/ld.so.preload to change userspace behaviour for the whole system.
For further information about where to get libkeepalive and how to configure the TCP keepalive feature on Linux, visit the following two links:
We achieved our best results with the following setup:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
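
As a rough sanity check of these values, the following sketch estimates how long it takes until the kernel gives up on a dead peer. It assumes the usual reading of the three settings: the first probe goes out after tcp_keepalive_time seconds of idle time, and the connection is dropped once tcp_keepalive_probes probes, sent tcp_keepalive_intvl seconds apart, have gone unanswered. The function name is made up for this example.

```python
def worst_case_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds from the last traffic until a dead peer is detected."""
    # Idle period before the first probe, plus one interval per unanswered probe.
    return keepalive_time + keepalive_intvl * keepalive_probes


if __name__ == "__main__":
    # Values from the setup above.
    print(worst_case_detection_seconds(600, 30, 5))  # prints 750
```

With the setup above, an idle connection sees a probe every ten minutes, and a dead one is torn down after about twelve and a half minutes.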