Linux & App Servers Monitoring Tricks

Objective:

  • Prevent servers from going down.
  • Detect what caused a server to go down.
  • Get servers back up after a failure.
  • Explain what caused the server to fail.

I – Regularly watch your monitoring tool(s) (Nagios, Wily, top, …)

Some of these tools graph each app server's CPU load average, CPU used percentage, disk usage percentage, memory used percentage, network bandwidth, and swap used percentage.
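If a tool is not graphing memory and swap yet, a quick snapshot can be read straight from /proc/meminfo. This is a minimal sketch, not any monitoring tool's method: it assumes a Linux kernel that reports MemAvailable (3.14 or later) and counts "used" memory as MemTotal minus MemAvailable. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Memory and swap usage from /proc/meminfo-style input (values in kB).
# Assumes the kernel reports MemAvailable (Linux 3.14 and later).

# Prints the integer percentage of RAM in use: (MemTotal - MemAvailable) / MemTotal.
mem_used_pct() {
  awk '
    /^MemTotal:/     { t = $2 }
    /^MemAvailable:/ { a = $2 }
    END { if (t > 0) printf "%d\n", (t - a) * 100 / t }'
}

# Prints the integer percentage of swap in use, or 0 when no swap is configured.
swap_used_pct() {
  awk '
    /^SwapTotal:/ { t = $2 }
    /^SwapFree:/  { f = $2 }
    END { if (t > 0) printf "%d\n", (t - f) * 100 / t; else print 0 }'
}

# On a live server:
#   mem_used_pct  < /proc/meminfo
#   swap_used_pct < /proc/meminfo
```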

II – Check the open-files limit (ulimit) against the number of files opened by applications like Tomcat, Oracle, …

[Server]# ulimit -n
1024
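If the number of open files is getting close to the limit (1024 here), increase the limit and ask the developers to identify and fix the process that is opening so many files.
To raise the limit for a specific application account, run ulimit -n [value] in that account's shell before starting the application. It applies only to that shell and the processes it starts, and a non-root user cannot go above the hard limit (ulimit -Hn). To make the change permanent, set nofile for the account in /etc/security/limits.conf.
The limit can be raised well above the default. It is one of those throttles put in place to keep things from going too nuts. Some systems set it very high, but on Linux the open-files limit cannot exceed the kernel's fs.nr_open value, so it cannot be set to unlimited.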
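Note that ulimit -n reports the limit of your current shell, not that of the already running application. A minimal sketch, assuming Linux /proc and permission to read the target process's entries, compares a process's open file descriptors with its soft "Max open files" limit. The function names, the PID, and the 80% threshold are hypothetical.

```bash
#!/usr/bin/env bash
# Compare a process's open file descriptors with its soft "Max open files"
# limit. Reading /proc/<pid>/fd requires root or the process owner's account.

# Prints the integer percentage of the limit in use: fd_usage_pct OPEN LIMIT
fd_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Prints "<open_fds> <soft_limit>" for PID, read from /proc.
fd_counts() {
  local pid=$1 open limit
  open=$(ls "/proc/$pid/fd" | wc -l)
  # Line format: "Max open files  <soft>  <hard>  files", so field 4 is the soft limit.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "$open $limit"
}

# Example (hypothetical Tomcat PID 12345, illustrative 80% threshold):
#   read -r open limit < <(fd_counts 12345)
#   if (( $(fd_usage_pct "$open" "$limit") >= 80 )); then
#     echo "PID 12345 has $open of $limit files open"
#   fi
```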

III – Port Monitoring
Check the number of connections to the ports your apps use.
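One way to count connections is to filter ss output. This sketch assumes the iproute2 ss tool and the column layout of `ss -tan` (State, Recv-Q, Send-Q, Local, Peer); port 8080 in the example is only a common Tomcat HTTP port, not necessarily yours.

```bash
#!/usr/bin/env bash
# Counts established TCP connections whose LOCAL port is $1, reading
# `ss -tan` output on stdin (columns: State Recv-Q Send-Q Local Peer).
count_port_conns() {
  awk -v port="$1" '
    $1 == "ESTAB" {
      # The port is whatever follows the last colon (works for IPv6 too).
      n = split($4, parts, ":")
      if (parts[n] == port) c++
    }
    END { print c + 0 }'
}

# Example: ss -tan | count_port_conns 8080
```

Matching the whole last field, rather than a substring, keeps port 8080 from also counting 18080.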

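IV – Thread dump (stack trace of all threads) if CPU usage is high

[Server]# kill -3 [Tomcat_PID]
kill -3 sends SIGQUIT, which makes the JVM print a thread dump without stopping it; for Tomcat the output goes to catalina.out. Use it to see what is causing the load, and send it to the developers.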
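To tie high CPU to a specific stack in the dump, the busiest thread IDs can be matched against the nid= field, where HotSpot writes each thread's native Linux thread ID in hexadecimal. A sketch assuming procps ps; the helper names and PIDs are hypothetical.

```bash
#!/usr/bin/env bash
# Map busy JVM threads to thread-dump entries. HotSpot labels each thread in
# a dump with nid=0x<hex>, the thread's native Linux thread ID (TID).

# Converts a decimal TID into the nid=0x... form used in thread dumps.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Lists the N busiest threads of PID as "TID %CPU" (procps ps assumed).
# ps reports each thread's CPU averaged over its lifetime; for a live view
# use: top -H -p PID
top_threads() {
  local pid=$1 n=${2:-5}
  ps -L -o tid= -o pcpu= -p "$pid" | sort -k2,2 -nr | head -n "$n"
}

# Example with a hypothetical Tomcat PID:
#   top_threads 12345 3
#   tid_to_nid 23456   # prints 0x5ba0; search the dump for nid=0x5ba0
```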

V – Disk space on /[drive_name] filling up quickly
Identify the file(s) that are filling up the disk. Most of the time, they will be log files.
[Server]# du -ks /[drive_name]/* | sort -nr | head
5719076 /[drive_name]/catalina
3675672 /[drive_name]/data
3287436 /[drive_name]/source
2044316 /[drive_name]/servers
319404 /[drive_name]/images
16 /[drive_name]/lost+found
Run the same command on the largest folder, and keep drilling down, to find the files that are eating the disk space.
Back up, remove, or empty the file in question, provided doing so won't break the system.
If log files are responsible for the disk filling up, let the developers know so that they can fix the cause. In the meantime, empty the log file rather than deleting it, since a process that still holds a deleted file open keeps its disk space allocated. To empty it:

[Log_File_Location]# echo -n > Large_Log_File_Name.log
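To catch a filling disk before it becomes urgent, a check like the following could run from cron. It is a sketch assuming the POSIX `df -P` output format; the 90% threshold is an arbitrary example.

```bash
#!/usr/bin/env bash
# Prints "<mount> <use%>" for every filesystem at or above THRESHOLD percent,
# reading POSIX-format `df -P` output on stdin
# (columns: Filesystem blocks Used Available Capacity Mounted-on).
full_filesystems() {
  awk -v limit="$1" 'NR > 1 {
      use = $5; sub(/%/, "", use)
      if (use + 0 >= limit) print $6, use "%"
    }'
}

# Example: df -P | full_filesystems 90
# To list files over 500 MB on one filesystem:
#   find /[drive_name] -xdev -type f -size +500M
```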

VI – Watch catalina.out and the log4j log after staging and live deploys, especially when you are restarting the servers.

[Server]# tail -f log4j.log
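Rather than reading every line during a restart, the tail output can be filtered for likely problems. The patterns below are assumptions based on common java.util.logging and log4j levels; adjust them to your log format.

```bash
#!/usr/bin/env bash
# Filter a log stream down to lines that look like problems.
# Patterns are case-sensitive and illustrative, not exhaustive.
log_problems() {
  grep -E 'SEVERE|ERROR|FATAL|Exception|OutOfMemoryError'
}

# After a deploy or restart (log paths depend on your install):
#   tail -f catalina.out | log_problems
```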
VII – Start app servers properly
Before restarting an app server, make sure no process (PID) is still running for that specific server.

[Server_Name]$ ps -ef | grep '[o]racle'
Kill the PID for that server. (Writing the pattern as [o]racle keeps the grep command itself out of the results.)
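When a plain kill is not enough, a stop routine can escalate from SIGTERM to SIGKILL after a grace period. This is a sketch, not Tomcat's or Oracle's own shutdown procedure: it takes the PIDs you found with ps, assumes you run it as the process owner or root (kill -0 fails otherwise), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Stop leftover server processes before a restart: SIGTERM first, then
# SIGKILL for anything still alive after a grace period.
# Run as the process owner or root; kill -0 fails on other users' PIDs.

# Succeeds while at least one of the given PIDs still exists.
any_alive() {
  local pid
  for pid in "$@"; do
    kill -0 "$pid" 2>/dev/null && return 0
  done
  return 1
}

# Usage: stop_pids TIMEOUT_SECONDS PID...
stop_pids() {
  local timeout=$1 i
  shift
  (( $# > 0 )) || return 0          # nothing to stop
  kill "$@" 2>/dev/null             # polite SIGTERM
  for (( i = 0; i < timeout; i++ )); do
    any_alive "$@" || return 0
    sleep 1
  done
  echo "still alive after ${timeout}s, sending SIGKILL: $*" >&2
  kill -9 "$@" 2>/dev/null
  return 0
}

# Example, with PIDs found by the ps/grep check above:
#   stop_pids 30 12345 12346
```

Trying SIGTERM first gives the server a chance to close connections and flush logs; kill -9 is kept as the last resort.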

IX – CPU load level

As a rule of thumb, if CPU usage peaks under 70% during high traffic, we are doing well and have headroom. A good level to be ticking over at is around 30% used.
[Server]# top
top - 12:37:29 up 47 days, 23:09, 4 users, load average: 0.20, 0.20, 0.22
Tasks: 189 total, 1 running, 178 sleeping, 10 stopped, 0 zombie
Cpu(s): 1.2%us, 0.1%sy, 0.0%ni, 97.5%id, 1.0%wa, 0.0%hi, 0.1%si, 0.0%st
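top's idle (id) figure is a snapshot. To measure CPU used over an interval, /proc/stat can be sampled twice. A sketch assuming Linux's /proc/stat field order; counting iowait as idle is a choice made here, not a rule.

```bash
#!/usr/bin/env bash
# Prints integer CPU-used percentage between two "cpu ..." lines from
# /proc/stat. Fields: user nice system idle iowait irq softirq steal;
# iowait counts as idle, and guest columns are skipped because the kernel
# already includes them in user/nice.
cpu_used_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
      split(a, x); split(b, y)
      for (i = 2; i <= 9; i++) total += y[i] - x[i]
      idle = (y[5] + y[6]) - (x[5] + x[6])
      if (total > 0) printf "%d\n", (total - idle) * 100 / total
      else print 0
    }'
}

# Samples /proc/stat twice, INTERVAL seconds apart (default 5).
cpu_sample() {
  local first second
  first=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  second=$(grep '^cpu ' /proc/stat)
  cpu_used_pct "$first" "$second"
}

# Example: cpu_sample 10   # compare against the 70% peak / 30% baseline above
```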

X – Server-specific status pings (to make sure the servers are up and serving content)
Write scripts for this.
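A minimal status ping could use curl and treat anything other than HTTP 200 within a timeout as down. The URLs below are hypothetical placeholders; ideally they point at a page that exercises the application rather than a static file.

```bash
#!/usr/bin/env bash
# Status ping: succeed only if URL returns HTTP 200 within TIMEOUT seconds.
# curl prints 000 as the status code when it gets no HTTP response at all.
# A redirect (3xx) counts as down here; add -L to follow redirects.
check_url() {
  local url=$1 timeout=${2:-10} code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "$url")
  [ "$code" = "200" ]
}

# Example over hypothetical server URLs; run it from cron and alert on output:
#   for url in http://app1.example.com:8080/ http://app2.example.com:8080/; do
#     check_url "$url" || echo "DOWN: $url"
#   done
```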

XI – Garbage collection stats

If you are interested in garbage collection stats, there is a gc.log file on each of the app servers. The drawback is that it has no date stamps: you can see how memory fluctuates, but it is difficult to chart it over time. In the past I've thought it would be a good idea to write a cron job that archives it daily, so that you could at least break things down day by day.
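Two hedged notes on gc.log. First, HotSpot can add dates itself: -XX:+PrintGCDateStamps on JDK 8 and earlier, or unified logging such as -Xlog:gc*:file=gc.log:time on JDK 9 and later. Second, for the daily archive, the sketch below copies only the bytes written since the previous run, so the live file is never truncated (the JVM may not write gc.log in append mode, and truncating under it can leave a file padded with null bytes). The function name, offset file, and paths are hypothetical.

```bash
#!/usr/bin/env bash
# Archive gc.log one slice per day without touching the live file: each run
# appends only the bytes written since the previous run to gc.log.YYYY-MM-DD.
# Usage: archive_gc_log LOG_FILE ARCHIVE_DIR
archive_gc_log() {
  local log=$1 dir=$2
  local state="$dir/.offset" prev=0 size
  mkdir -p "$dir"
  [ -f "$state" ] && prev=$(cat "$state")
  size=$(wc -c < "$log")
  # A JVM restart recreates gc.log; a shrunken file means start from byte 0.
  if (( size < prev )); then prev=0; fi
  # Copy bytes prev+1..size; head caps the slice even if the JVM keeps writing.
  tail -c +"$((prev + 1))" "$log" | head -c "$((size - prev))" \
    >> "$dir/gc.log.$(date +%Y-%m-%d)"
  echo "$size" > "$state"
}

# Example crontab entry; the wrapper script and paths are hypothetical:
# 55 23 * * * /usr/local/bin/archive_gc_log.sh /opt/tomcat/logs/gc.log /opt/tomcat/logs/gc-archive
```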

XII – DB Connection

XIII – Load average monitoring script
Set up a cron job that simply emails the sysadmin when the load average is above 3.
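A sketch of such a cron script, assuming a working `mail` command (for example from mailx) and a hypothetical address. A load of 3 means something very different on 2 cores than on 16, so it may be worth scaling the threshold with nproc.

```bash
#!/usr/bin/env bash
# Email the sysadmin when the 1-minute load average is above a threshold.

# Succeeds if the first field of /proc/loadavg-style input exceeds $1
# (fails on empty input).
load_exceeds() {
  awk -v limit="$1" 'NR == 1 { hit = ($1 > limit) } END { exit !hit }'
}

# Usage: alert_on_load [THRESHOLD] [ADDRESS]  (address is a placeholder)
alert_on_load() {
  local limit=${1:-3} to=${2:-sysadmin@example.com} load
  load=$(cut -d' ' -f1-3 /proc/loadavg)
  if load_exceeds "$limit" < /proc/loadavg; then
    echo "Load average on $(hostname): $load" |
      mail -s "High load on $(hostname)" "$to"
  fi
}

# Example crontab entry, every 5 minutes (script path is hypothetical):
# */5 * * * * /usr/local/bin/load_alert.sh
```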

XIV – Find out which processes are monopolizing the CPUs
[Server]# ps -eo pcpu,pid,user,args | sort -k1,1 -nr | head -10

Etienne Noumen

Sports Lover, Linux guru, Engineer, Entrepreneur & Family Man.
