The website for the moyn monitoring system

Moyn — a Monitoring System

Moyn is meant to become a monitoring system that is flexible (if you are a Python programmer, at least) and rather simple. It is intended to fill a gap below the big established systems such as Nagios and its descendants or Prometheus, a gap that Monit has not filled to my satisfaction. I have been using Monit for some time, but while it has its strengths, it is too simple in my eyes and does not offer the kind of flexibility I would like to have.

Moyn is meant to be just for myself, to scratch an itch of my own. If anyone else finds it useful (once it has reached some usable state at all), fine, but I am not aiming for widespread adoption. Of course, suggestions by others are welcome. I just don't make any promises to heed them.

Ideas

These are the ideas I have in mind:

  • Moyn shall be a monitoring system in the sense that it runs continuously (or at least regularly) and checks the status of the items to be monitored. The result of these checks can be viewed. Some events (like status changes) can trigger arbitrary, configurable actions like, e.g., alerts.

  • At least the last known status of the monitored items should be kept persistently. This could be done in files on disk, or in some database system.

  • Implement it in Python (3). Working with C has fed me well for a number of years, but has increasingly felt tedious and cumbersome compared to higher-level languages. Perl was much better in this respect, but in the end I found it difficult to implement clean-looking programs in Perl. Also, Perl's OO is a mess. I love working with Lisp, but in contrast to e.g. C and Perl and Python it isn't as readily available on the usual systems, and access to POSIX systems' OS services isn't as direct as I'd wish. Python is easily available on all systems I care for, has clean and generally well-designed concepts and looks, and suffers little from historical warts and baggage. It has excellent access to OS services. Python brought fun back to programming for me. [x]

  • Keep the Nagios plugin interface for external plugins, i.e. programs to be called that perform checks. This is probably the most important idea of all, as it brings a host of immediately available and proven plugins to the system at once, like the excellent collection at https://www.monitoring-plugins.org/. [x]

  • Have a Python plugin interface for internal plugins, i.e. Python code that is loaded into the moyn process. This enables loading plugins written in Python on startup (or on reload) so they can be run without the need to execute a plugin program in a separate process. To have a uniform interface to all plugins, have a wrapper for external plugins that uses the internal plugin interface.

  • Configure external plugins either in a file with a simple line per plugin, or a directory that contains shell script wrappers of the form enabled-plugins.d/100_check-things or the like. Either way, the check parameters must be available for display.

  • Keep Nagios's idea of having OK, WARNING, CRITICAL, and STALE/UNKNOWN states. This has proven useful. Monit's model of just OK and Failed is too simple.

  • Have a web interface. In the simplest case (and quite likely in the first implementation) this may be a text file generated by moyn and served by some web server. This can be extended and refined over time.

  • Eventually, when the web interface has somewhat matured, make the plugin/check configuration (including the command-line parameters of external plugins) available on the web interface. This is something I have always missed with Nagios. Monit is a bit more accessible, but not enough in my opinion.

  • Make moyn distributed. This could mean running local moyn agents on satellite hosts that present results to be collected by a central moyn process, in a way similar to the check_mk agent. Unlike NRPE, which runs single checks on remote hosts, this would run checks locally and present an aggregated result to the central process. This result could be as simple as a number of lines of check output.

  • Send out alerts on status changes. This can be done via email, SMS, some other messaging service, whatever. Sending SMS through a locally connected SMS modem would have the advantage of not depending on the things being monitored, such as Internet access. This should be configurable, like "send SMS alert only when other alert mechanisms fail".
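
The idea of wrapping external plugins so that they look like internal ones might be sketched as follows. This is only an illustration, not moyn's actual code: the names `CheckResult` and `ExternalPlugin`, the `run()` method, and the 10-second timeout are all assumptions made for the example. What is taken from the Nagios plugin interface is the mapping of exit codes 0 to 3 to OK, WARNING, CRITICAL, and UNKNOWN.

```python
import shutil
import subprocess
from dataclasses import dataclass

# Exit codes as defined by the Nagios plugin interface.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


@dataclass
class CheckResult:
    """Hypothetical result object; the field names are assumptions."""
    name: str
    state: str
    output: str


class ExternalPlugin:
    """Hypothetical wrapper that gives an external plugin a run() method."""

    def __init__(self, name, argv):
        self.name = name
        self.argv = list(argv)  # kept so the parameters can be displayed
        # Look the program up via PATH once and remember the full path.
        self.path = shutil.which(self.argv[0])

    def run(self, timeout=10.0):
        if self.path is None:
            return CheckResult(self.name, "UNKNOWN",
                               f"{self.argv[0]}: plugin not found")
        try:
            proc = subprocess.run([self.path, *self.argv[1:]],
                                  capture_output=True, text=True,
                                  timeout=timeout)
        except subprocess.TimeoutExpired:
            return CheckResult(self.name, "UNKNOWN", "plugin timed out")
        except OSError as exc:
            return CheckResult(self.name, "UNKNOWN", str(exc))
        # Only the exit code decides the state; unexpected codes are UNKNOWN.
        state = STATES.get(proc.returncode, "UNKNOWN")
        # The output text is passed on as is, without parsing it.
        lines = proc.stdout.splitlines()
        return CheckResult(self.name, state, lines[0] if lines else "")
```

With something like this, `ExternalPlugin("ping-gw", ["check_ping", ...]).run()` would return a result object of the same shape as an internal plugin written in Python, and the stored `argv` remains available for display.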

Details

Some thoughts on smaller details:

  • Cache the path name of external plugins. Find them via PATH on startup or reload (if a full path is not supplied), but store the full path for faster program execution. [x]

  • Plugin execution must be done concurrently. check_ping, for instance, runs for over 4 seconds by default, and we don't want to suspend moyn for so long for a single check. [x]

  • The monitoring-plugins seem to have a certain variety in their output (which I had always assumed they did not), e.g.

    • check_dns: DNS OK: 0.028 seconds response time. ...;
    • check_mailq: OK: postfix mailq reports ...;
    • check_ping: PING OK - Packet loss = 0%, ....

    In these three alone there is (a) a keyword before the status or (b) not; the status is followed (a) by a colon or (b) by space-dash-space. Gnarlphft. Solution: don't look at it, just present it. The only thing that matters for now is the status; later probably also the performance data, if any.

  • Concurrency via the threading module, result passing via a queue (a reference to some result object). A plugin execution could inherit from the Thread class, or be the thing called via the Thread's target argument. The Thread class indeed looks quite interesting and usable. [x]

  • Checks will be executed in concurrent threads as a rule. The number of these threads must be limited (to what -- 5? 100?) to avoid system overload. Let's say all checks of a time slot will be queued, and a thread window of size nthreads glides over this queue. [x] (easier than expected)

  • Templates for things like external services on a host that cover the usual things (ssh, http, https, ...). Generic enough to be usable for many things. Do I want to use Jinja2 templates? Maybe too clumsy. Am thinking more of something like C preprocessor macros, perhaps. Or YAML, as in the VRF-MLG checks generator? Maybe just Python code after all? All those choices. I will likely need some long and hard thinking about this.

  • The scheduler could be like the event loop module I implemented in C a number of times, only with a precision limited to seconds, and with the next repeat event on a multiple of the repetition interval. This may lead to skipping time slots for a check if the scheduler fails to schedule an event in its time slot at all, but should avoid schedule creep.

  • A check result will be enqueued in the results queue for examination. This is likely best done sequentially. [x]

  • There shall be dependencies. Does that mean I have to run a dependent check only after the ones it depends on succeed? That will be a mess. Otherwise, alert storm after a link outage.
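
Two of the points above, the limited number of check threads and the scheduling on multiples of the repetition interval, can be illustrated with a small sketch. It is not the existing proof of concept; the function names `next_slot` and `run_slot`, the `(name, callable)` form of a check, and the default of 5 threads are assumptions chosen for the example. Instead of a literal sliding window, a fixed set of worker threads drains a queue of pending checks, which has the same effect of never running more than nthreads checks at once.

```python
import queue
import threading


def next_slot(now, interval):
    """Return the next multiple of interval (in whole seconds) after now."""
    return (int(now) // interval + 1) * interval


def run_slot(checks, nthreads=5):
    """Run all checks of one time slot using at most nthreads threads.

    checks is a list of (name, callable) pairs; this shape is an assumption.
    Returns a list of (name, result) pairs in order of completion.
    """
    pending = queue.Queue()
    results = queue.Queue()
    for item in checks:
        pending.put(item)

    def worker():
        # Each worker takes checks from the queue until none are left.
        while True:
            try:
                name, check = pending.get_nowait()
            except queue.Empty:
                return
            try:
                results.put((name, check()))
            except Exception as exc:
                # A failing check must not kill the worker thread.
                results.put((name, ("UNKNOWN", repr(exc))))

    threads = [threading.Thread(target=worker)
               for _ in range(min(nthreads, len(checks)))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Every check has put exactly one result; examine them sequentially.
    return [results.get() for _ in range(len(checks))]
```

Because `next_slot` always rounds up to a multiple of the interval, a late run does not shift later runs: with a 60-second interval, a check handled at second 125 is next due at 180, not at 185. A slot that is missed entirely is simply skipped.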

The Name

The name "moyn" does not stand for a lot. The first part, "mo", obviously stands for monitoring. The "y" is part of the word "Python", but also the first letter of how I have tried to describe the pronunciation of my given name to English speakers, "yurghin". Also, instead of the letter "ü", as in my given name, the Danish language uses the letter "y" for the same sound, so a Danish speaker saying "Jyrgen" would pronounce my name right. (Actually the Danish form of the name is "Jørgen" and pronounced slightly differently, but never mind.) Finally, the "n" is the first letter of my surname.

Most importantly, the name has, as far as I can see, not yet been used for anything which is even remotely in this area.

This software is not in any way related to the law and history professor Samuel Moyn at Yale University, or to the Moyn Moyn Festival in northern Germany, or to the domains moyn.com and moyn.de, neither of which appears to be in serious use right now.

The web page https://moyn.org/ is the home page for moyn. All relevant public content will be referenced there.

Current Status

At this moment there is little but a proof of concept for a few of the ideas. It can already run checks (once) in concurrent threads, and a limited number of those at a time, and collect the results from a queue. See also the ticked boxes [x] above.

To be honest, the project has been dormant for a few years now, and I don't know when or if I will pick it up again. Once I do that and have some results to show, I intend to publish moyn under a BSD license and to open the git repository for read access at some point, but that is likely far in the future.

[Jürgen Nickelsen ni@w21.org 2024-10-23]