Shell script to kill all processes used by a specific application or user
On Linux servers, when an application is not shut down correctly, some of its processes may keep running. Starting the application again while those processes are still running can prevent it from starting or make it malfunction, so always make sure all lingering processes are killed after shutting down an application.
If the list of lingering processes is long, going through them one by one is a pain.
A short shell script can kill them all in one command.
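A minimal sketch of such a script, assuming bash and the pgrep/pkill tools from procps. The script name kill_all.sh and its -u/-n options are hypothetical. It sends SIGTERM first so the application gets a chance to shut down cleanly, then sends SIGKILL to anything still running after a grace period:

```bash
#!/usr/bin/env bash
# kill_all.sh (hypothetical name): stop every lingering process that belongs to
# one user (-u USER) or has one exact process name (-n NAME).
# Usage: kill_all.sh -u USER [GRACE_SECONDS]
#        kill_all.sh -n NAME [GRACE_SECONDS]
# Run it as root or from another account: in -u mode it would also match your
# own shell if you ran it as USER.

kill_all() {
  local mode=${1:-} target=${2:-} grace=${3:-10} i
  local -a sel
  case $mode in
    -u) sel=(-u "$target") ;;  # every process owned by USER
    -n) sel=(-x "$target") ;;  # processes whose name is exactly NAME
    *)  mode="" ;;
  esac
  if [[ -z $mode || -z $target ]]; then
    echo "usage: kill_all -u USER | -n NAME [grace_seconds]" >&2
    return 2
  fi

  if ! pgrep "${sel[@]}" >/dev/null; then
    echo "no matching processes"
    return 0
  fi

  # 1) SIGTERM first, so the application can shut down cleanly.
  pkill -TERM "${sel[@]}"

  # 2) Give the processes up to GRACE seconds to exit.
  for ((i = 0; i < grace; i++)); do
    if ! pgrep "${sel[@]}" >/dev/null; then
      echo "all matching processes stopped"
      return 0
    fi
    sleep 1
  done

  # 3) Force-kill whatever is still lingering.
  echo "still running after ${grace}s, sending SIGKILL:"
  pgrep -l "${sel[@]}"
  pkill -KILL "${sel[@]}"
}

# Only act when executed directly with arguments (not when sourced).
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  kill_all "$@"
fi
```

For example, `sudo ./kill_all.sh -u tomcat` or `sudo ./kill_all.sh -n java 30` (user and process names are placeholders). pgrep -x compares against the kernel's process name, which is truncated to 15 characters, so longer names will not match.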
I – Regularly watch your monitoring tool(s) (Nagios, Wily, top, …)
Some of these tools show graphs of the application's CPU load average, CPU used percentage, disk usage percentage, memory used percentage, network bandwidth and swap used percentage.
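When you are logged on to the server and want the same numbers without the graphs, a quick text snapshot can help. A minimal sketch, assuming a Linux host with procps (top, free) and coreutils (df); health_snapshot is a hypothetical helper name, not part of any of the tools above:

```bash
# health_snapshot (hypothetical helper): one-shot text view of the metrics that
# monitoring tools usually graph, read from /proc and standard procps/coreutils.
health_snapshot() {
  echo "== Load average (1, 5, 15 min) =="
  cut -d ' ' -f 1-3 /proc/loadavg

  echo "== CPU (summary lines from top) =="
  top -b -n 1 | head -n 5

  echo "== Memory and swap (MiB) =="
  free -m

  echo "== Disk usage per filesystem =="
  df -h

  echo "== Network bytes since boot (interface, rx, tx) =="
  # /proc/net/dev: 2 header lines, then "iface: rx_bytes ... tx_bytes ...";
  # replacing the colon re-splits the line so rx is field 2 and tx is field 10.
  awk 'NR > 2 { sub(":", " "); print $1, $2, $10 }' /proc/net/dev
}
```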
II – Check the open-file ulimit against the number of files opened by applications such as Tomcat, Oracle, …
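[Server]# ulimit -n
1024
If the number of files opened is getting close to the ulimit count (1024), increase the limit and talk to the developers to identify and fix the process that is causing it. To increase the limit for a specific application account, run ulimit -n [value] in the shell that starts the application; the change applies only to that shell and the processes it launches. The value is not entirely up to you: a non-root user can raise the soft limit only up to the hard limit (shown by ulimit -Hn), and a permanent change is usually made in /etc/security/limits.conf. The limit is one of those things put in place as a throttle to keep things from going too nuts; some systems simply set it very high.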
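Because the open-file limit applies per process, it helps to compare each process's open file descriptors with its own limit. A minimal sketch that reads /proc on Linux; open_files_vs_limit and the 80% warning threshold are hypothetical choices, and reading another user's /proc/PID/fd requires root:

```bash
# open_files_vs_limit PID: how many file descriptors PID has open, compared with
# the soft "Max open files" limit the kernel enforces for that process.
open_files_vs_limit() {
  local pid=$1 used limit
  used=$(ls -1 "/proc/$pid/fd" 2>/dev/null | wc -l)
  # Line format: "Max open files  <soft>  <hard>  files" -> field 4 is the soft limit.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null)
  echo "pid=$pid open=$used limit=${limit:-unknown}"
  # Hypothetical threshold: warn at 80% of the limit.
  if [[ $limit =~ ^[0-9]+$ ]] && (( used * 100 >= limit * 80 )); then
    echo "WARNING: pid $pid uses $(( used * 100 / limit ))% of its open-file limit"
  fi
}
```

For example, `for pid in $(pgrep -u tomcat); do open_files_vs_limit "$pid"; done` checks every process of a (placeholder) tomcat account.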
III – Port monitoring
Check the number of connections to the ports used by your apps.
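On current Linux distributions, ss from iproute2 can count connections per port. A minimal sketch; count_port_connections and report_ports are hypothetical names, and the ports you pass in are placeholders for whatever your apps listen on:

```bash
# count_port_connections PORT: number of ESTABLISHED TCP connections whose local
# (server-side) port is PORT. ss prints one header line, which tail skips.
count_port_connections() {
  local port=$1
  ss -tn state established "( sport = :$port )" | tail -n +2 | wc -l
}

# report_ports PORT...: one line per port, e.g. "port 8080: 12 established connections".
report_ports() {
  local port
  for port in "$@"; do
    echo "port $port: $(count_port_connections "$port") established connections"
  done
}
```

For example, `report_ports 8080 8443 1521`; running it from cron and logging the output gives a rough history of connection counts.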
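IV – Thread dump (stack trace of all threads)
If you have a high CPU percentage, take a thread dump of the Java process to see what is causing it, and send it to the developers. Despite its name, kill -3 does not stop the JVM; it sends SIGQUIT, which makes the JVM print the dump (for Tomcat, the output goes to catalina.out):
[Server]# kill -3 [pid]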
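A thread dump is easier to read when you already know which threads are busy. On Linux, HotSpot thread dumps identify each thread by its native thread ID in hex (nid=0x...), so a common approach is to list the busiest threads with ps and convert their IDs. A minimal sketch; tid_to_nid, hot_threads and thread_dump are hypothetical helper names. Note that ps reports %CPU averaged over each thread's lifetime, so top -H -p PID gives a better view of a current spike:

```bash
# tid_to_nid TID: convert a Linux thread ID to the hex form used as nid= in
# HotSpot thread dumps.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# hot_threads PID [N]: the N threads of PID with the highest %CPU, with their nid.
hot_threads() {
  local pid=$1 n=${2:-5} tid cpu
  ps -L -o tid=,pcpu= -p "$pid" --sort=-pcpu | head -n "$n" |
    while read -r tid cpu; do
      echo "tid=$tid nid=$(tid_to_nid "$tid") cpu=${cpu}%"
    done
}

# thread_dump PID: SIGQUIT makes the JVM print all thread stacks to its stdout
# (catalina.out for Tomcat) without stopping it.
thread_dump() {
  kill -3 "$1"
}
```

Run `hot_threads [pid]`, then `thread_dump [pid]`, and search catalina.out for the nid values it printed to find the stack traces of the busy threads.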
V – Disk space on /[drive_name] filling up quickly
Identify the file(s) that are filling up the disk. Most of the time, it will be log files. Sizes below are in KB:
[Server]# du -ks /[drive_name]/* | sort -nr | head
5719076 /[drive_name]/catalina
3675672 /[drive_name]/data
3287436 /[drive_name]/source
2044316 /[drive_name]/servers
319404 /[drive_name]/images
16 /[drive_name]/lost+found
Running the same command on the largest folder, and then on the largest folder inside it, will lead you to the files that eat the disk space. Back up, remove or empty the file in question, provided that doing so won't break the system. If log files are responsible for the disk filling up, let the developers know so that they can fix it. In the meantime, empty the log file in place rather than deleting it, because a deleted file that the application still has open keeps using disk space until the process closes it:
[Server]# > log4j.log
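The drill-down can also be done in one pass by listing the biggest individual files. A minimal sketch, assuming GNU find (for -printf); largest_files and empty_log are hypothetical helper names:

```bash
# largest_files DIR [N]: the N biggest regular files under DIR, size in bytes
# first. -xdev keeps find on DIR's filesystem, like looking at one drive.
largest_files() {
  local dir=$1 n=${2:-10}
  find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head -n "$n"
}

# empty_log FILE: truncate FILE to zero bytes in place. The inode stays the same,
# so an application that has it open keeps writing to the same file.
empty_log() {
  : > "$1"
}
```

For example, `largest_files /[drive_name] 20`. If the application did not open the log in append mode, it keeps writing at its old offset after truncation, so ls -l may still show a large size while du shows the blocks were actually released.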
To follow what is being written to the log file in real time, use:
[Server]# tail -f log4j.log
VII – Start app servers properly
Before restarting app servers, make sure no app PID is still running for that specific server.
[Server_Name]$ ps -ef | grep oracle
Kill the PID(s) for that server. The grep command itself also shows up in the output; ignore that line.
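You can also make the start step itself refuse to run while old processes are still around. A minimal sketch, assuming pgrep from procps; wait_until_stopped and safe_start are hypothetical names. Matching on the exact process name (pgrep -x) rather than the full command line avoids the script matching its own command line:

```bash
# wait_until_stopped NAME [TIMEOUT]: poll until no process named exactly NAME
# is left. Returns 0 when clear, 1 if some are still running after TIMEOUT s.
wait_until_stopped() {
  local name=$1 timeout=${2:-30} i
  for ((i = 0; i < timeout; i++)); do
    pgrep -x "$name" >/dev/null || return 0
    sleep 1
  done
  pgrep -x "$name" >/dev/null || return 0
  echo "still running after ${timeout}s:" >&2
  pgrep -l -x "$name" >&2
  return 1
}

# safe_start NAME COMMAND [ARGS...]: run the start command only once no
# process named NAME is left from the previous run.
safe_start() {
  local name=$1
  shift
  if wait_until_stopped "$name" 30; then
    "$@"
  else
    echo "not starting: kill the PIDs listed above first" >&2
    return 1
  fi
}
```

For example, `safe_start java ./startup.sh`, where the process name and start script are placeholders for your own server.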
I would say that if we peak under 70% CPU during high traffic, we are doing well and have room. A good level to be ticking over at would be 30% used.
[Server]# top
top - 12:37:29 up 47 days, 23:09, 4 users, load average: 0.20, 0.20, 0.22
Tasks: 189 total, 1 running, 178 sleeping, 10 stopped, 0 zombie
Cpu(s): 1.2%us, 0.1%sy, 0.0%ni, 97.5%id, 1.0%wa, 0.0%hi, 0.1%si, 0.0%st
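top shows the CPU split at one instant. To check the 70% rule of thumb from a script, you can sample /proc/stat twice and compute the busy share in between. A minimal sketch for Linux; cpu_used_pct and cpu_status are hypothetical names, and "idle" here counts both idle and iowait time:

```bash
# cpu_used_pct [SECONDS]: overall CPU busy % over an interval, from two samples
# of the aggregate "cpu" line in /proc/stat (fields: user nice system idle
# iowait irq softirq steal ...).
cpu_used_pct() {
  local interval=${1:-1} a b
  a=$(head -n 1 /proc/stat)
  sleep "$interval"
  b=$(head -n 1 /proc/stat)
  printf '%s\n%s\n' "$a" "$b" | awk '
    {
      idle[NR] = $5 + $6                       # idle + iowait
      total[NR] = 0
      for (i = 2; i <= NF && i <= 9; i++)      # user .. steal (guest is already in user)
        total[NR] += $i
    }
    END {
      dt = total[2] - total[1]; di = idle[2] - idle[1]
      if (dt > 0) printf "%d\n", 100 * (dt - di) / dt; else print 0
    }'
}

# cpu_status PERCENT: peaks at or above 70% leave little room.
cpu_status() {
  if (( $1 >= 70 )); then echo "HIGH"; else echo "OK"; fi
}
```

For example, `cpu_status "$(cpu_used_pct 5)"` during peak traffic.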
X – Server-specific status pings (to make sure the servers are up and serving content)
Write scripts for this.
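A minimal sketch of such a status ping, assuming curl is installed; check_url is a hypothetical name. It treats any 2xx or 3xx answer within five seconds as "up" and everything else, including connection errors, as "down":

```bash
# check_url URL: print UP/DOWN with the HTTP status code; return 0 only when up.
check_url() {
  local url=$1 code
  # -s silent, -o discard the body, -w print only the HTTP status code,
  # --max-time caps the total wait. On connection failure the code becomes 000.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
  case $code in
    2??|3??) echo "UP   $code $url"; return 0 ;;
    *)       echo "DOWN $code $url"; return 1 ;;
  esac
}
```

For example, loop over a list of your servers' URLs (placeholders such as http://app1:8080/) from cron and send an email whenever check_url returns non-zero.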
XI – Garbage collection stats
If you are interested in garbage collection stats, there is a gc.log file on each of the app servers. The bad thing about it is that it doesn't do any date stamping, so you can see how memory fluctuates, but it is difficult to create a chart over time. In the past I've thought it might be a good idea to write a cron job that archives it daily so that you could at least break things down day by day.
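A minimal sketch of that daily archive, assuming the JVM keeps writing to the same gc.log path; archive_gc_log is a hypothetical name. Lines written between the copy and the truncation are lost, the same trade-off as logrotate's copytruncate. Newer JVMs can also stamp each entry themselves (-XX:+PrintGCDateStamps on JDK 8, or the time decorator of unified logging on JDK 9 and later, e.g. -Xlog:gc*:file=gc.log:time), which addresses the missing dates directly:

```bash
# archive_gc_log LOGFILE: copy LOGFILE to LOGFILE.YYYY-MM-DD (keeping its
# timestamps), then empty the original in place so the next day starts fresh.
archive_gc_log() {
  local log=$1 stamp
  [ -f "$log" ] || { echo "no such file: $log" >&2; return 1; }
  stamp=$(date +%F)
  cp -p -- "$log" "$log.$stamp" && : > "$log"
}
```

A crontab entry such as `55 23 * * * /usr/local/bin/archive_gc_log.sh /path/to/gc.log` would run it just before midnight; the wrapper script and both paths are hypothetical.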
XII – DB connections
XIII – Load average monitoring script
Set up a cron job that emails the sysadmin when the load average is above 3.
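A minimal sketch of that check, assuming a working mail command (from mailx or mailutils) on the server; load_above, alert_if_high_load and the ALERT_EMAIL default address are hypothetical. It compares the 1-minute load average from /proc/loadavg with the threshold of 3:

```bash
# load_above THRESHOLD: exit status 0 if the 1-minute load average exceeds it.
load_above() {
  awk -v t="${1:-3}" '{ exit !($1 > t) }' /proc/loadavg
}

# alert_if_high_load: mail the sysadmin the load and the top CPU users when the
# load average is above 3. ALERT_EMAIL is a hypothetical placeholder address.
alert_if_high_load() {
  local to=${ALERT_EMAIL:-sysadmin@example.com}
  if load_above 3; then
    {
      echo "Load average on $(hostname): $(cut -d ' ' -f 1-3 /proc/loadavg)"
      ps -eo pcpu,pid,user,args --sort=-pcpu | head -n 11
    } | mail -s "High load on $(hostname)" "$to"
  fi
}
```

A crontab entry such as `*/5 * * * * /usr/local/bin/check_load.sh` (a hypothetical wrapper that calls alert_if_high_load) runs the check every five minutes.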
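XIV – Find out who is monopolizing or eating up the CPUs
[Server]# ps -eo pcpu,pid,user,args | sort -k1 -nr | head -10
The -n flag makes sort compare the %CPU values numerically; a plain sort -r would rank 9.0 above 10.0.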
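To see which user, rather than which single process, is eating the CPUs, the same ps output can be totalled per user. A minimal sketch; cpu_by_user is a hypothetical name. Keep in mind that ps computes %CPU as CPU time divided by the process's elapsed run time, so this highlights long-running hogs more than short spikes:

```bash
# cpu_by_user: total %CPU per user across all processes, busiest user first.
cpu_by_user() {
  ps -eo user=,pcpu= |
    awk '{ total[$1] += $2 } END { for (u in total) printf "%s %.1f\n", u, total[u] }' |
    sort -k2,2 -nr
}
```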