It has been quite a while since I last posted here. The main reasons were a heavy workload at the office and preparing for my next AWS certification, AWS Solutions Architect Professional. I will write a short note later on the exam details and what I did to clear it.
What was happening?
Coming back to the topic: I had set up CloudWatch alerts for memory usage and disk usage that notify the stakeholders when usage breaches the configured threshold, which in my case was 90% for both metrics. On closer observation, though, the memory usage alerts were firing quite frequently, roughly twice every 5 minutes. This meant something was definitely wrong, and I needed to recheck the deployed script, fast.
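For readers who want to reproduce a similar setup, here is a minimal Python sketch of the alerting side, assuming the script pushes a custom percentage metric to CloudWatch and an SNS topic notifies the stakeholders. The namespace `Custom/Linux`, the metric names `MemoryUtilization` and `DiskUtilization`, and the helper functions are hypothetical; the post does not say what names or periods it uses. Only the 90% threshold comes from the post. The AWS calls are boto3's `put_metric_data` and `put_metric_alarm`.

```python
# Hypothetical names: the post does not say which namespace or metric names it uses.
NAMESPACE = "Custom/Linux"
THRESHOLD_PERCENT = 90.0  # the threshold from the post, for both metrics


def build_alarm_params(metric_name, instance_id, sns_topic_arn,
                       threshold=THRESHOLD_PERCENT):
    """Keyword arguments for CloudWatch PutMetricAlarm on a custom percentage metric."""
    return {
        "AlarmName": f"{metric_name}-above-{threshold:g}pct-{instance_id}",
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # average the data points over 5-minute windows
        "EvaluationPeriods": 1,   # alarm as soon as one window is over the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic the stakeholders subscribe to
    }


def publish_percent(metric_name, instance_id, value):
    """Push one percentage data point (e.g. from a cron job) to CloudWatch."""
    import boto3  # lazy import: build_alarm_params works without boto3 installed

    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[{
            "MetricName": metric_name,
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Unit": "Percent",
            "Value": value,
        }],
    )


def create_alarms(instance_id, sns_topic_arn):
    """Create one alarm each for memory and disk usage at the 90% threshold."""
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    for metric_name in ("MemoryUtilization", "DiskUtilization"):
        cloudwatch.put_metric_alarm(
            **build_alarm_params(metric_name, instance_id, sns_topic_arn))
```

Keeping the parameter builder separate from the boto3 calls makes the threshold and naming easy to check without AWS credentials; only `publish_percent` and `create_alarms` actually talk to AWS.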
On inspection, I found that my previous script was calculating memory usage from the output of the free -m command, using the first column:
    # vmstat -s
       8176604 total memory
       5555796 used memory
       4379188 active memory
        745036 inactive memory
       2620808 free memory
        394516 buffer memory
       3206024 swap cache
             0 total swap
             0 used swap
             0 free swap
        664123 non-nice user cpu ticks
            45 nice user cpu ticks
        138166 system cpu ticks
      75768535 idle cpu ticks
         39822 IO-wait cpu ticks
             0 IRQ cpu ticks
          1913 softirq cpu ticks
         12845 stolen cpu ticks
       5794205 pages paged in
      10155760 pages paged out
             0 pages swapped in
             0 pages swapped out
      57218586 interrupts
     101894616 CPU context switches
    1515624326 boot time
        589690 forks
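The post does not spell out the exact formula, so the following is only a minimal Python sketch under one assumption: that buffer and cache memory should not count as "used", because the kernel can reclaim them. In the sample above, used memory (5555796) is exactly total minus free (8176604 - 2620808), so it still includes the buffer (394516) and swap cache (3206024) figures; procps's vmstat reports the page cache on its "swap cache" line. The helpers `parse_vmstat_s`, `memory_usage_percent` and `read_vmstat_s` are hypothetical names. The sketch computes used memory as total - free - buffer - cache directly instead of trusting the "used memory" line, since newer procps versions may already exclude buffers and cache from that line.

```python
import re
import subprocess

# Memory lines copied from the vmstat -s sample in this post.
SAMPLE = """\
      8176604 total memory
      5555796 used memory
      2620808 free memory
       394516 buffer memory
      3206024 swap cache
"""


def parse_vmstat_s(text):
    """Map each vmstat -s label (e.g. 'free memory') to its integer value."""
    stats = {}
    for line in text.splitlines():
        # Real vmstat -s usually prints a 'K' unit after the number; accept both forms.
        match = re.match(r"\s*(\d+)\s+(?:K\s+)?(.+?)\s*$", line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


def memory_usage_percent(stats):
    """Percentage of memory in use, treating buffers and page cache as free (assumption)."""
    total = stats["total memory"]
    used = (total - stats["free memory"]
            - stats["buffer memory"] - stats["swap cache"])
    return 100.0 * used / total


def read_vmstat_s():
    """Run vmstat -s on the local host and return its raw text output."""
    return subprocess.run(["vmstat", "-s"], capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__":
    print(f"{memory_usage_percent(parse_vmstat_s(SAMPLE)):.1f}")  # prints 23.9
```

With the sample numbers this gives about 23.9%, whereas dividing "used memory" by "total memory" gives about 67.9%. On a live box, pass the output of `read_vmstat_s()` to `parse_vmstat_s` instead of `SAMPLE`.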
So far, this is the best method I have come up with to calculate memory usage on Linux boxes. If you have another way to calculate it, do let me know in the comments.
The memory and disk usage script is available on my GitHub account.