We are finishing our web application and planning for deployment. A very important aspect of production deployment is monitoring the health of the system. With a small team of developers and support staff, it is critical for us to get early notification of potential problems and resolve them before they impact users.
Nagios seems like a good choice, but we would like more feedback on the best monitoring tools and practices for web apps in general and for Django apps in particular. Recommendations on what to monitor apart from the obvious CPU, memory, disk space, and database connectivity are also welcome.
Our web app is written in Django, and we are running on Linux (Ubuntu) with FastCGI and a PostgreSQL database.
EDIT: We have a fully virtualized environment on Linode.
EDIT: We are using django-logging, so we have a way to separate info, errors, critical issues, etc.
Nagios is good. It is also best to run system tests (Selenium) on a regular basis.
Edit: Looks even more interesting.
Perhaps there is a test suite system that can keep pressure testing everything for you. I can't remember a name off the top of my head; maybe someone could mention one below.
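The regular system test mentioned above does not have to start with a full Selenium suite. As a rough sketch, and swapping in plain HTTP requests from the Python standard library in place of a real browser, a scheduled script could fetch a few key pages and check that each one contains some expected text. The function names `check_page` and `run_checks` and the injectable `opener` parameter are made up for this illustration.

```python
import urllib.error
import urllib.request


def check_page(url, expected_text, timeout=10, opener=urllib.request.urlopen):
    """Fetch one page and report whether it looks healthy.

    Returns (ok, detail). `opener` is injectable so the check can be
    exercised without a network connection.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return False, f"{url}: request failed ({exc})"
    if status != 200:
        return False, f"{url}: unexpected status {status}"
    if expected_text not in body:
        return False, f"{url}: expected text {expected_text!r} not found"
    return True, f"{url}: OK"


def run_checks(pages, **kwargs):
    """Run check_page over (url, expected_text) pairs; return the failures."""
    failures = []
    for url, expected_text in pages:
        ok, detail = check_page(url, expected_text, **kwargs)
        if not ok:
            failures.append(detail)
    return failures
```

Run from cron, something like `run_checks([("https://example.com/login/", "Log in")])` (a placeholder URL and text) returns an empty list when everything is fine, and the script can exit non-zero or send an alert otherwise. This only proves that pages render; it does not click through forms the way Selenium can.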
Other things that I would want to do:
The ideal for infrastructure is always to locate a problem, fix it, get to its root cause, and, if you can, remedy or prevent it.
Because a system exists at many levels, we should test at several levels.
Edit: Have all errors or warnings posted directly to your case manager via email. That way you can track occurrences in one place.
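If the case manager accepts incoming email, one way to wire this up is a logging handler that mails every warning and error. The sketch below uses only `logging.handlers.SMTPHandler` from the Python standard library; it is not the django-logging API, and the helper name `build_case_mail_handler`, the addresses, and the subject line are placeholders.

```python
import logging
import logging.handlers


def build_case_mail_handler(mailhost, fromaddr, toaddr, level=logging.WARNING):
    """Create a handler that emails each record at `level` or above.

    `toaddr` is assumed to be the address your case manager reads from.
    Nothing is sent (and no connection is opened) until a record is logged.
    """
    handler = logging.handlers.SMTPHandler(
        mailhost=mailhost,
        fromaddr=fromaddr,
        toaddrs=[toaddr],
        subject="[webapp] warning or error",  # placeholder subject
    )
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s\n%(message)s")
    )
    return handler
```

Attaching it is one line, for example `logging.getLogger().addHandler(build_case_mail_handler("localhost", "webapp@example.com", "cases@example.com"))`. Keep in mind that `SMTPHandler` sends one message per log record and does so synchronously, so a noisy warning can flood the inbox and slow requests down.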
1) Connection: Monitor your internet connectivity to and from the server. Log it somewhere.
2) Server: Monitor all the processes you need, to make sure they are running and not pinning the server. Use something equivalent to what HP servers offer for hardware failure notification, which can go down to the BIOS level. Notify and log when failures occur.
3) Software: Identify the key software that should always be running. Set performance levels, if any, and then monitor them. Nagios should be able to help with this; on Windows it can take a bit more work. When there is an exception, you should be able to have scripts run to restart things automatically. My dream system would let me interact with the server via SMS: if the server sees an exception, it either waits for my permission or acts automatically unless I cancel by SMS. One day...
4) Remote power: Ensure that you have remote power reset capabilities. If you ever use Windows for anything, you might want to schedule a weekly reboot.
5) Business logic testing: Regularly run scripts that test your system's workflow. Selenium can possibly handle some of this, along with reporting what was running at the time and which files had errors. If possible, have the system monitor itself through your scripts.
6) Backup: Create a backup process that you can set up and forget. If you can keep things in virtual machines, that is ideal, because you can then scale, move, or deploy any part of your infrastructure. I have had cases where I moved a dead server onto my laptop and ran it in VMware while I fixed the problem.
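Some of the server and software checks in the list above can be scripted in a few lines. The sketch below follows the Nagios plugin convention of reporting state through exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL) and covers disk space and whether a TCP port accepts connections. The function names and thresholds are made up for illustration, and a successful TCP connect only shows that the database port is reachable, not that queries succeed.

```python
import shutil
import socket

# Exit codes that Nagios plugins use to report state.
OK, WARNING, CRITICAL = 0, 1, 2


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Compare disk usage for `path` against warning/critical thresholds."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    message = f"disk {path} is {used_pct:.1f}% full"
    if used_pct >= crit_pct:
        return CRITICAL, message
    if used_pct >= warn_pct:
        return WARNING, message
    return OK, message


def check_tcp(host, port, timeout=5.0):
    """Report whether something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"{host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"{host}:{port} unreachable ({exc})"


def summarize(results):
    """Combine (code, message) pairs into the worst code and one status line."""
    worst = max(code for code, _ in results)
    return worst, "; ".join(message for _, message in results)
```

A wrapper script would call `summarize([check_disk("/"), check_tcp("127.0.0.1", 5432)])` (5432 being PostgreSQL's default port), print the message, and pass the code to `sys.exit` so that Nagios or a cron job can react to it.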