Complete blackout, probably due to an uncontrolled SQL connection timeout
Concerned version
Version: 2.0.4 Platform: RedHat 7.4 (oVirt) / Apache 2.4 / PostgreSQL 9.6
Summary
Due to an unstable network link, the LemonLDAP server enters an infinite loop because it cannot retrieve its entire configuration from the PostgreSQL server.
The problem is aggravated by the constant requests from clients and from the load-balancer keep-alive.
Symptoms
The number of Apache threads climbs to the maximum and consumes all the server resources, as the server constantly tries to load its configuration (more so with each HTTP request). The LemonLDAP server appears to be trying to load and validate its configuration in an infinite loop.
Many httpd threads and TCP connections to the PostgreSQL backend pile up on the LemonLDAP-NG Manager. On the backend, SQL connections are opened but never end, and the query results are never sent.
Logs
Backend - Authentication
[Mon Aug 19 18:04:10.898280 2019] [:error] [pid 5363] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
[Mon Aug 19 18:04:11.225032 2019] [:error] [pid 5431] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
[Mon Aug 19 18:04:11.280359 2019] [:error] [pid 5394] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
[Mon Aug 19 18:04:11.406485 2019] [:error] [pid 5402] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
[Mon Aug 19 18:04:13.192939 2019] [:error] [pid 5347] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
[Mon Aug 19 18:04:39.426176 2019] [:error] [pid 5459] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
[Mon Aug 19 18:05:44.804817 2019] [:error] [pid 5491] DBD::Pg::st execute failed: server closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request. at /usr/share/perl5/vendor_perl/Lemonldap/NG/Common/Conf/RDBI.pm line 58.\n
.../...
[Mon Aug 19 18:06:42.962065 2019] [:error] [pid 7528] Attempt to reload Lemonldap/NG/Portal/SharedConf.pm aborted.\nCompilation failed in require at /var/lib/lemonldap-ng/portal/index.pl line 3.\nBEGIN failed--compilation aborted at /var/lib/lemonldap-ng/portal/index.pl line 3.\n
[Mon Aug 19 18:07:20.558154 2019] [:error] [pid 7528] Attempt to reload Lemonldap/NG/Portal/SharedConf.pm aborted.\nCompilation failed in require at /var/lib/lemonldap-ng/portal/index.pl line 3.\nBEGIN failed--compilation aborted at /var/lib/lemonldap-ng/portal/index.pl line 3.\n
[Mon Aug 19 18:07:37.973469 2019] [:error] [pid 7528] Attempt to reload Lemonldap/NG/Portal/SharedConf.pm aborted.\nCompilation failed in require at /var/lib/lemonldap-ng/portal/index.pl line 3.\nBEGIN failed--compilation aborted at /var/lib/lemonldap-ng/portal/index.pl line 3.\n
[Mon Aug 19 18:07:56.889691 2019] [:error] [pid 7528] Attempt to reload Lemonldap/NG/Portal/SharedConf.pm aborted.\nCompilation failed in require at /var/lib/lemonldap-ng/portal/index.pl line 3.\nBEGIN failed--compilation aborted at /var/lib/lemonldap-ng/portal/index.pl line 3.\n
[Mon Aug 19 18:09:49.753073 2019] [:error] [pid 7528] Attempt to reload Lemonldap/NG/Portal/SharedConf.pm aborted.\nCompilation failed in require at /var/lib/lemonldap-ng/portal/index.pl line 3.\nBEGIN failed--compilation aborted at /var/lib/lemonldap-ng/portal/index.pl line 3.\n
[Mon Aug 19 18:11:03.277762 2019] [:error] [pid 7528] Attempt to reload Lemonldap/NG/Portal/SharedConf.pm aborted.\nCompilation failed in require at /var/lib/lemonldap-ng/portal/index.pl line 3.\nBEGIN failed--compilation aborted at /var/lib/lemonldap-ng/portal/index.pl line 3.\n
Authentication reverse proxy (relays to the authentication backend)
[Mon Aug 19 18:04:55.772109 2019] [proxy_http:error] [pid 1442] (70007)The timeout specified has expired: [client 10.11.12.13:23675] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:04:55.772257 2019] [proxy:error] [pid 1442] [client 10.11.12.13:23675] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:05:14.590318 2019] [proxy_http:error] [pid 1659] (70007)The timeout specified has expired: [client 10.11.12.13:48772] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:05:14.590441 2019] [proxy:error] [pid 1659] [client 10.11.12.13:48772] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:05:32.646083 2019] [proxy_http:error] [pid 1410] (70007)The timeout specified has expired: [client 10.11.12.13:62882] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:05:32.646265 2019] [proxy:error] [pid 1410] [client 10.11.12.13:62882] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:05:51.744966 2019] [proxy_http:error] [pid 5935] (70007)The timeout specified has expired: [client 10.11.12.13:21191] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:05:51.745084 2019] [proxy:error] [pid 5935] [client 10.11.12.13:21191] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:06:10.096646 2019] [proxy_http:error] [pid 5932] (70007)The timeout specified has expired: [client 10.11.12.13:23746] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:06:10.096768 2019] [proxy:error] [pid 5932] [client 10.11.12.13:23746] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:06:28.751767 2019] [proxy_http:error] [pid 1351] (70007)The timeout specified has expired: [client 10.11.12.13:20624] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:06:28.752037 2019] [proxy:error] [pid 1351] [client 10.11.12.13:20624] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:06:46.824418 2019] [proxy_http:error] [pid 1356] (70007)The timeout specified has expired: [client 10.11.12.13:56967] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:06:46.824728 2019] [proxy:error] [pid 1356] [client 10.11.12.13:56967] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:07:05.836016 2019] [proxy_http:error] [pid 5936] (70007)The timeout specified has expired: [client 10.11.12.13:2963] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:07:05.836191 2019] [proxy:error] [pid 5936] [client 10.11.12.13:2963] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:07:23.977137 2019] [proxy_http:error] [pid 1352] (70007)The timeout specified has expired: [client 10.11.12.13:62826] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:07:23.977410 2019] [proxy:error] [pid 1352] [client 10.11.12.13:62826] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:08:01.015300 2019] [proxy_http:error] [pid 1663] (70007)The timeout specified has expired: [client 10.11.12.13:13003] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:08:01.015584 2019] [proxy:error] [pid 1663] [client 10.11.12.13:13003] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:09:15.071084 2019] [proxy_http:error] [pid 1352] (70007)The timeout specified has expired: [client 10.11.12.13:17369] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:09:15.071319 2019] [proxy:error] [pid 1352] [client 10.11.12.13:17369] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:09:34.078636 2019] [proxy_http:error] [pid 5938] (70007)The timeout specified has expired: [client 10.11.12.13:34712] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:09:34.078918 2019] [proxy:error] [pid 5938] [client 10.11.12.13:34712] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:09:52.055251 2019] [proxy_http:error] [pid 5932] (70007)The timeout specified has expired: [client 10.11.12.13:37212] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:09:52.055539 2019] [proxy:error] [pid 5932] [client 10.11.12.13:37212] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:10:10.994187 2019] [proxy_http:error] [pid 1410] (70007)The timeout specified has expired: [client 10.11.12.13:37252] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:10:10.994299 2019] [proxy:error] [pid 1410] [client 10.11.12.13:37252] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:10:29.136379 2019] [proxy_http:error] [pid 5935] (70007)The timeout specified has expired: [client 10.11.12.13:18545] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:10:29.136677 2019] [proxy:error] [pid 5935] [client 10.11.12.13:18545] AH00898: Error reading from remote server returned by /
[Mon Aug 19 18:11:06.135030 2019] [proxy_http:error] [pid 1351] (70007)The timeout specified has expired: [client 10.11.12.13:14335] AH01102: error reading status line from remote server authentification.localdomain:80
[Mon Aug 19 18:11:06.135136 2019] [proxy:error] [pid 1351] [client 10.11.12.13:14335] AH00898: Error reading from remote server returned by /
Backends used
Configuration and sessions are stored in a PostgreSQL backend, on a different server from the manager/handler.
Possible fixes
Manage the timeout of the DBI Perl module, and stop trying to load the configuration if the PostgreSQL server does not return all requested values or breaks the connection unexpectedly.
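The "stop trying" part of this suggestion can be illustrated with a bounded retry. The sketch below is in Python rather than Perl and is not LemonLDAP::NG code: load_with_limits, ConfigUnavailable and the fetch callable are hypothetical names introduced for illustration only. It caps both the number of attempts and the total time spent, so that a failing backend produces one clear error instead of an endless reload. It does not interrupt a fetch that hangs; that would still need a connection or statement timeout at the driver level.

```python
import time


class ConfigUnavailable(Exception):
    """Raised when the configuration could not be loaded within the limits."""


def load_with_limits(fetch, max_attempts=3, deadline_s=5.0, base_delay_s=0.1,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it succeeds, attempts run out, or the deadline passes.

    fetch is a hypothetical callable standing in for the configuration query.
    """
    start = clock()
    last_error = None
    for attempt in range(max_attempts):
        # Stop starting new attempts once the overall time budget is spent.
        if clock() - start >= deadline_s:
            break
        try:
            return fetch()
        except Exception as exc:  # a real loader would catch only the driver's errors
            last_error = exc
            if attempt + 1 < max_attempts:
                # Exponential backoff: 0.1s, 0.2s, 0.4s, ... between attempts.
                sleep(base_delay_s * (2 ** attempt))
    raise ConfigUnavailable(
        f"gave up after at most {max_attempts} attempts or {deadline_s}s"
    ) from last_error
```

With such a limit, each HTTP request would fail fast once the budget is exhausted, rather than holding a worker and a SQL connection open indefinitely.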
Means of testing
Changing the MTU from 1500 to 9000 on the LemonLDAP-NG Manager server breaks the SQL connection unexpectedly when oversized TCP packets are exchanged with the SQL server.
The ifconfig command can help with the diagnosis, as it shows many dropped packets on the network interface.
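The same drop counters can also be read from /proc/net/dev on Linux. The following is a minimal sketch, assuming the standard column layout of that file (eight receive counters followed by eight transmit counters per interface). parse_drops and the SAMPLE text are hypothetical, and the numbers in SAMPLE are made up for illustration, not taken from the affected server.

```python
def parse_drops(proc_net_dev_text):
    """Return {interface: (rx_dropped, tx_dropped)} from /proc/net/dev content."""
    drops = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 12:
            continue
        # Receive columns: bytes packets errs drop ...
        # Transmit columns start at index 8, so transmit "drop" is index 11.
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


# Made-up sample in the /proc/net/dev layout (values are illustrative only).
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  987654    4321    0  250    0     0          0         0   123456    2100    0    7    0     0       0          0
"""

print(parse_drops(SAMPLE))
```

On a live host, the text would come from reading /proc/net/dev itself. A drop counter that keeps growing between two readings points at the interface rather than at PostgreSQL.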
The issue was solved by checking the network connection and fixing the MTU configuration.
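One way to check whether a given MTU actually works end to end is to ping the SQL server with fragmentation forbidden and a payload sized to fill exactly one packet. The sketch below only computes that payload and builds the command line. It assumes IPv4 without IP options and the Linux iputils ping (the -M do, -s and -c options). max_ping_payload and ping_probe_command are hypothetical helper names, and the host name in the usage note is a placeholder.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER + ICMP_HEADER
    if mtu <= overhead:
        raise ValueError(f"MTU must be larger than {overhead} bytes")
    return mtu - overhead


def ping_probe_command(host, mtu, count=3):
    """Build a Linux iputils ping command that forbids fragmentation (-M do)."""
    return ["ping", "-M", "do", "-s", str(max_ping_payload(mtu)),
            "-c", str(count), host]


print(max_ping_payload(1500))  # 1472
print(max_ping_payload(9000))  # 8972
```

For example, ping_probe_command("sql.example", 9000) yields a command with -s 8972. If that probe fails while the 1472-byte probe for MTU 1500 succeeds, some hop on the path does not carry jumbo frames.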