Fixing slow Logstash startup

2017-3-5 13:51:49

While setting up ELK recently, I noticed that Logstash took nearly 10 minutes on the server to finish starting up and begin running its pipeline.

The machine was a UCloud instance with 2 cores and 4 GB of RAM, so raw server performance was unlikely to be the bottleneck.

After some searching I found that the delay is related to how JRuby starts up, and came across this issue:

https://github.com/elastic/logstash/issues/5507

It quotes the following passage from the JRuby wiki:

When JRuby boots up, the JDK libraries responsible for random number generation go to /dev/random for (at least) initial entropy. After this point, more recent versions of JRuby will use a PRNG for subsequent random numbers, but older versions will continue to return to /dev/random. Unfortunately /dev/random can “run out” of “good” random numbers, providing a guarantee that reads from it will not return until the entropy pool is restored. On some systems – especially virtualized – the entropy pool can be small enough that this slows down JRuby’s startup time or execution time significantly.

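In short: when JRuby boots, the JDK reads its initial entropy from /dev/random. Newer JRuby versions then switch to a PRNG for later random numbers, while older versions keep going back to /dev/random. When the entropy pool is depleted, those reads block until enough entropy has been collected again, and on systems with a small pool (virtual machines in particular) this can slow JRuby's startup considerably.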
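The difference between "seed once, then use a PRNG" and "read the entropy source for every number" can be sketched in a few lines. This is a hypothetical illustration, not JRuby's or the JDK's actual code: SeedOncePRNG, AlwaysReadGenerator and CountingSource are names made up for this example, and CountingSource merely stands in for /dev/random by returning fixed bytes and counting how often it is read. Python's random.Random is not cryptographically secure; it is used only to show the access pattern.

```python
import random
from typing import Callable


class SeedOncePRNG:
    """Newer-JRuby style (illustrative): one entropy read for the seed, then a PRNG."""

    def __init__(self, read_entropy: Callable[[int], bytes], seed_bytes: int = 32) -> None:
        # The only call to the (possibly blocking) entropy source.
        seed = read_entropy(seed_bytes)
        self._rng = random.Random(int.from_bytes(seed, "big"))

    def next_int(self, upper: int) -> int:
        # Served entirely by the in-process PRNG; the entropy source is not touched.
        return self._rng.randrange(upper)


class AlwaysReadGenerator:
    """Older-JRuby style (illustrative): go back to the entropy source for every number."""

    def __init__(self, read_entropy: Callable[[int], bytes]) -> None:
        self._read = read_entropy

    def next_int(self, upper: int) -> int:
        # Every call is another (possibly blocking) read; modulo bias is ignored here.
        return int.from_bytes(self._read(4), "big") % upper


class CountingSource:
    """Hypothetical stand-in for /dev/random: returns fixed bytes and counts reads."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.reads = 0

    def __call__(self, n: int) -> bytes:
        self.reads += 1
        return self.data[:n]


if __name__ == "__main__":
    new_src = CountingSource(bytes(range(32)))
    old_src = CountingSource(bytes(range(32)))
    new_gen, old_gen = SeedOncePRNG(new_src), AlwaysReadGenerator(old_src)
    for _ in range(1000):
        new_gen.next_int(100)
        old_gen.next_int(100)
    print("seed-once reads:", new_src.reads)  # 1
    print("read-every-time reads:", old_src.reads)  # 1000
```

If every read of the real /dev/random can block on a starved pool, the second pattern pays that cost a thousand times while the first pays it once, which is why a low entropy pool hurts older versions so much more.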

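I checked the system's entropy pool with cat /proc/sys/kernel/random/entropy_avail and got only 65. A value above 1000 is generally recommended; for comparison, the same value on a dedicated physical host hovered between roughly 700 and 900.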
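The same check can be scripted, for example as a pre-flight step before starting Logstash. This is a minimal sketch assuming a Linux host where /proc/sys/kernel/random/entropy_avail exists; the function names and the 1000 threshold constant are illustrative choices based on the rule of thumb quoted in this post. The threshold reflects kernels of that era; newer kernels changed how this value is reported, so treat it as specific to this kind of setup.

```python
from pathlib import Path

ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
RECOMMENDED_MIN = 1000  # rule-of-thumb minimum mentioned in this post (assumption, not a kernel constant)


def parse_entropy(text: str) -> int:
    """The file holds a single integer followed by a newline."""
    return int(text.strip())


def read_entropy_avail(path: Path = ENTROPY_PATH) -> int:
    """Read the kernel's current entropy estimate (Linux only)."""
    return parse_entropy(path.read_text())


def entropy_is_low(value: int, threshold: int = RECOMMENDED_MIN) -> bool:
    return value < threshold


if __name__ == "__main__":
    try:
        value = read_entropy_avail()
    except FileNotFoundError:
        print(f"{ENTROPY_PATH} not found; this check only applies to Linux")
    else:
        verdict = "low: JRuby/Logstash startup may block" if entropy_is_low(value) else "ok"
        print(f"entropy_avail = {value} ({verdict})")
```

On the VM described above, this would have reported a value of 65 as low.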

Solution

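The simplest fix is to install an entropy daemon such as haveged, which on CentOS is available from the EPEL repository. After installing it, start the service with sudo systemctl start haveged: entropy_avail jumped to over 2000 and Logstash started almost instantly. Finally, run sudo systemctl enable haveged so that the service starts at boot.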
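To confirm that haveged is actually refilling the pool, one option is to poll entropy_avail for a short while right after sudo systemctl start haveged. The sketch below is hypothetical: wait_for_entropy and its parameters are illustrative names, and it assumes the same Linux /proc file as before. It returns every sample it took, so you can see the value climb. On kernels that cap the reported value below the threshold it simply stops at the timeout.

```python
import time
from pathlib import Path
from typing import Callable, List

ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_avail(path: Path = ENTROPY_PATH) -> int:
    """Read the kernel's current entropy estimate (Linux only)."""
    return int(path.read_text().strip())


def wait_for_entropy(
    threshold: int = 1000,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    read: Callable[[], int] = read_entropy_avail,
) -> List[int]:
    """Poll until the estimate reaches `threshold` or `timeout_s` elapses.

    Returns all samples taken, so the trend after starting haveged is visible.
    """
    samples: List[int] = []
    deadline = time.monotonic() + timeout_s
    while True:
        value = read()
        samples.append(value)
        # Stop as soon as the pool looks healthy or we run out of time.
        if value >= threshold or time.monotonic() >= deadline:
            return samples
        time.sleep(interval_s)
```

For example, calling wait_for_entropy() right after starting haveged should return a short list ending at a value of 1000 or more if the daemon is doing its job; the read parameter exists so the polling logic can be exercised without a real /proc file.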