Heartmon

Manage application availability by monitoring heartbeat signals in the log stream.

Usage

Usage: ./heartmon -d heartmon_config_directory \
       [-i include_filter] [-e exclude_filter] \
       [-w warn_seconds] [-c crit_seconds] [-r restart_seconds]

<heartmon_config_directory>/
	app/
		0: <application_binary>
		1: <argv[1]>
		2: <argv[2]>
		<n>: <argv[n]>
	fifo/
		0: <path_to_fifo[0]>
		1: <path_to_fifo[1]>
		2: <path_to_fifo[2]>
		<n>: <path_to_fifo[n]>
	log/
		0: <logger_binary>
		1: <argv[1]>
		2: <argv[2]>
		<n>: <argv[n]>

Heartmon will always re-spawn an app process if it terminates. There is no support for "run once" behavior. Each spawning of the app will be logged with a LOG_NOTICE message inserted into the log stream.

Filters use simple substrings, applied to one line of the log stream at a time.

All filters are optional. If none are supplied, all lines will be counted as heartbeats.

All action thresholds are optional. If none are supplied, no heartbeat monitoring will be performed. The only action that might be taken is that the app will be re-spawned if it terminates.

Multiple re-spawns will be limited in the following way: After a re-spawn, there will be a grace period before any of the action timers start counting down (warn, crit, restart). The grace period will be equal to the shortest non-zero timer threshold.

If the "warn" threshold is exceeded, a LOG_WARNING message will be inserted into the log stream.

If the "crit" threshold is exceeded, a LOG_WARNING message will be inserted into the log stream.

If the "restart" threshold is exceeded, an LOG_ERR message will be inserted into the log stream, and the app process will be sent a KILL signal. After the process terminates, it will be re-spawned.

Heartmon Features

Run app and log-collector processes. Read the log stream from the app and/or fifo(s) and pipe it to the log collector
If log collector terminates or stops consuming the log stream, buffer log data and re-spawn the log collector process, inject a log record for the re-spawn event into the log stream
If app terminates, re-spawn the app process and inject a log record for the re-spawn event into the log stream
Watch the log stream coming from the app for a "heartbeat." Any line in the log stream that matches the configured filters is counted as a "heartbeat." The threshold values specify how much time can elapse between successive heartbeats
- What to look for is configurable using include and exclude filters
- The time thresholds for warning, critical, and restart actions are configurable
- If thresholds are exceeded, actions are taken, including:
  - Inject a log record into the log stream
  - Kill and re-spawn the app process

Application Best Practice

Applications that are single-threaded can provide a direct heartbeat into the log stream
Applications that have multiple threads should be careful not to have threads that are not represented by the heartbeat
- For example, if the main thread generates heartbeat log entries, but service requests are handed off to a pool of worker threads, there might be a situation in which the main thread is successfully sending heartbeats, yet the worker threads are all blocking and unable to handle requests
- Either have the heart-beating thread responsible for knowing the health of the other threads, or else have it send a “noop” request through the app. The request itself can produce the log entry used as the heartbeat
Apps should be stateless and "crash-only." Terminating the app should never produce data inconsistencies, leave stale locks, etc.
As a last-ditch effort, an external URL monitor can “ping” the service periodically, producing an identifiable heartbeat log entry. This is the least desirable implementation because failures of the monitoring system, network, etc. will cause restarts although there may be nothing wrong with the app

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
hmconf		hmconf
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
TODO		TODO
buffer.c		buffer.c
buffer.h		buffer.h
buffer_leak_test.c		buffer_leak_test.c
buffer_test.c		buffer_test.c
fifos.c		fifos.c
fifos.h		fifos.h
heartmon-concept-v03.png		heartmon-concept-v03.png
heartmon.c		heartmon.c
hm-test-app-logversion.sh		hm-test-app-logversion.sh
hm-test-app.sh		hm-test-app.sh
hm-test-logger.sh		hm-test-logger.sh
io_select.c		io_select.c
io_select.h		io_select.h
license.txt		license.txt
notes.txt		notes.txt
spawn_process.c		spawn_process.c
spawn_process.h		spawn_process.h

License

kfeldmann/heartmon

Folders and files

Latest commit

History

Repository files navigation

Heartmon

Usage

Heartmon Features

Application Best Practice

About

Resources

License

Stars

Watchers

Forks

Languages