This week one of our services, a Java application running on Tomcat on a Red Hat 6.3 server, had an interesting problem. The server suddenly appeared to be CPU-bound. According to top, the Java processes were responsible, but %sy (system) CPU utilization was higher than %us (user): the Java threads themselves were using only 10-15%, and the rest was consumed by the OS. The context-switch rate reported by nmon was far higher than on the service's second node. Interestingly, the process behind this was ksoftirqd. A little searching online showed that the cause was the "leap second" issue.
As soon as we ran the command below as a workaround, CPU usage dropped drastically:
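For anyone who wants to check the same user-versus-system split without top, here is a minimal sketch that reads the aggregate "cpu" line of /proc/stat twice and compares the deltas. The function name cpu_split and the one-second sampling interval are my own choices for illustration, not something we used during the incident.

```shell
# cpu_split (hypothetical helper): reads one or two /proc/stat "cpu" lines
# from stdin and prints the user and system share of the elapsed CPU time.
# With a single line it reports the cumulative split since boot.
cpu_split() {
  awk '/^cpu / {
         # Fields 2..9: user nice system idle iowait irq softirq steal
         for (i = 2; i <= NF && i <= 9; i++) { d[i] = $i - prev[i]; prev[i] = $i }
       }
       END {
         for (i in d) total += d[i]
         if (total > 0)
           printf "us=%.1f%% sy=%.1f%%\n", 100 * d[2] / total, 100 * d[4] / total
       }'
}

# Take two samples one second apart (Linux only).
if [ -r /proc/stat ]; then
  { grep '^cpu ' /proc/stat; sleep 1; grep '^cpu ' /proc/stat; } | cpu_split
fi
```

A healthy Java service should show most of its time under us; a sy value well above us, as in our case, points at the kernel rather than the application.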
date `date +"%m%d%H%M%C%y.%S"`
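To make it clearer what that one-liner does: the inner date prints the current time in the MMDDhhmmCCYY.ss form that date accepts as a "set the clock" argument, so the outer date simply sets the system time to what it already is. The sketch below splits the two steps and only prints the command instead of running it; the variable name stamp is mine, and actually setting the clock requires root.

```shell
# Step 1: format the current time as MMDDhhmmCCYY.ss
stamp=$(date +"%m%d%H%M%C%y.%S")

# Step 2 (dry run): show the command that would reset the clock
echo "would run: date $stamp"

# To apply the workaround for real, run as root:
# date "$stamp"
```

Splitting it this way lets you look at the timestamp before touching the clock on a production server.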
However, why the problem surfaced only after more than a month is still a mystery. For more information about the matter:
Leap Second Detector for RHEL
Resolving Leap Second Issues in RHEL
Leap Second and Java in RHEL
Leap Second Issues for SUSE