reaction, a replacement for fail2ban

This article is also available in 🇫🇷 French.

Concerns🔗

This article is written to show that the program I made is useful ~~to society~~ to server administrators.

Problem🔗

Many people write programs (called bots) that scan the Internet and try to break into the servers (the computers behind the Internet) they find. Most often, once they gain control of a server, they will:

mine cryptocurrencies,
send mail spam,
use this server to attack others,
or steal the data on the server and blackmail its owners to give it back.

All of these are problematic, the last one most of all, because a server often holds sensitive data belonging to its users.

For example, at the Picasoft non-profit, we host a messaging app as an alternative to those of the web giants. If a bot gets into the server, it can retrieve all of our users' conversations. The same holds for any online service.

It's therefore important to protect your servers well.

Solution🔗

The venerable fail2ban is software that bans bots which blindly try to break through a server's defenses.

Its principle is simple:

Various programs, or services, run on a server.
- The most common is the SSH service, which lets you connect remotely to the server it runs on.
Each service writes logs, which record incoming events.
- For SSH, one of those events is "[such IP address] tried to connect with a bad password".
fail2ban scans those logs.
- After a few failed login attempts, fail2ban updates the server's firewall so that it blocks the IP address for a certain amount of time.
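
The principle above can be sketched in a few lines of Python. This is only an illustration of the idea, not fail2ban's actual code: the log lines, the `FAILED_LOGIN` regex and the `MAX_FAILURES` threshold are all made up for the example, and real sshd log formats vary.

```python
import re
from collections import Counter

# Hypothetical sshd log lines; real formats vary by distribution and version.
LOG_LINES = [
    "Failed password for root from 203.0.113.7 port 52344 ssh2",
    "Failed password for admin from 203.0.113.7 port 52346 ssh2",
    "Accepted publickey for alice from 198.51.100.4 port 40022 ssh2",
    "Failed password for root from 203.0.113.7 port 52350 ssh2",
    "Failed password for bob from 192.0.2.15 port 33100 ssh2",
]

# Illustrative regex: captures the IP of a failed password attempt.
FAILED_LOGIN = re.compile(r"Failed password for \S+ from (\d{1,3}(?:\.\d{1,3}){3})")
MAX_FAILURES = 3  # illustrative threshold


def ips_to_ban(lines, max_failures=MAX_FAILURES):
    """Return the IPs that reached max_failures failed logins, in detection order."""
    failures = Counter()
    banned = []
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match is None:
            continue  # not a failed login, ignore the line
        ip = match.group(1)
        failures[ip] += 1
        if failures[ip] == max_failures:
            banned.append(ip)
    return banned


for ip in ips_to_ban(LOG_LINES):
    # A real tool would update the firewall here; this sketch only prints.
    print(f"ban {ip}")
```

Only 203.0.113.7 reaches three failures in this sample, so it is the only address reported.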

The problem with the solution🔗

Fail2ban does its job correctly. However, I have two concerns about it:

Slowness/consumption🔗

To process the logs of the various services on my server, fail2ban used to take approximately 1 hour of CPU time per week. That may not look like much, but it's actually a lot given how little traffic my server gets. fail2ban is just reading text and sending actions to the firewall!

As a comparison, it's as much as the Funkwhale service on my server, a Spotify alternative, which does much more: Funkwhale lets me listen to my music from my devices, and I use it around 24h a week. That's a heavier task.

Fail2ban also consumes a lot of RAM, and easily reaches 300 MB.

On a server with more traffic, it can quickly become a real problem. So I wanted a solution that consumes less.

Complexity of a very abstract configuration...🔗

Fail2ban has a very large default configuration.

There is a lot of preliminary engineering, which aims to make configuration as easy as possible for the server admin.

It comes with default rules for a lot of different firewalls and services, divided across 160 files, containing 2600 lines (8900 including comments and newlines).

In the end, all of this makes the configuration harder to get familiar with: the iptables action file alone is 45 lines long (170 including comments and newlines). It's a mixture of INI-style syntax and a second layer of poorly documented substitutions, specific to fail2ban, which let you define variables, options, etc.

As long as the default configuration is enough, it's not a big deal. But as soon as you want to build your own configuration for an unsupported service, or your firewall is a bit different, it becomes way too complex.

...but not that flexible🔗

If we have a program that reads something as input, and executes commands as output, why not present things that simply?

I wanted a solution without a default configuration, one that would let admins easily build their own from well-documented examples. That way, only the abstraction actually needed would be used.

Let's say I want to execute an action when some (hidden or not) URL is requested on the server. Instead of creating a web app, with "webhooks" etc., only a few lines of configuration should be necessary to execute an arbitrary action.

I also wanted something that could do more than ban bots.

The solution to the problem with the solution🔗

After this very long introduction, let me introduce reaction!

Speed🔗

I have no expertise in the Go language. It is most likely possible to improve reaction's performance, but its current consumption already fully satisfies me.

On my server, where many more logs are analyzed than just those of the SSH service, reaction (and all the commands it launches) consumes approximately 5 min of CPU a week and 25 MB of RAM.

For the same workload, fail2ban used to consume 1 hour and 300 MB, that is, about 12 times more CPU time and 12 times more memory.

Configuration🔗

Starting here, it becomes technical. Read at your own risk.

Three configuration languages are available: JSON, YAML and JSONnet (❤️). I won't introduce the first two, but I cover the last one at the end.

Let's first specify how reaction must recognize an IP.

patterns:
  ip:
    regex: '(?:(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
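
To get a feel for what this pattern accepts, here is a quick check with Python's `re` module. It is only a sketch: reaction is written in Go, so its regex engine may differ in details, and the sample strings are invented. The helper name `find_ips` is mine, not part of reaction.

```python
import re

# The pattern from the configuration above, copied verbatim.
IP_PATTERN = re.compile(r"(?:(?:[0-9]{1,3}\.){3}[0-9]{1,3})")


def find_ips(text):
    """Return every substring of text that has the shape of an IPv4 address."""
    return IP_PATTERN.findall(text)


print(find_ips("rhost=203.0.113.7 user=root"))  # ['203.0.113.7']
print(find_ips("no address here"))              # []
# The pattern checks the shape only, not that each number is at most 255.
print(find_ips("bogus 999.999.999.999"))        # ['999.999.999.999']
```

For log parsing this looseness is usually fine, since services are expected to log real addresses.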

In place of fail2ban's jails, reaction has streams, each of which defines a data source (e.g. tail -f /var/log/nginx/access.log for nginx).

streams:
  ssh:
    cmd: ['journalctl', '-fu', 'sshd.service']

We attach one or more filters to those streams. Filters are groups of regular expressions. It's on a filter that we define the number of failed attempts (retry) we allow an IP before reacting.

streams:
  ssh:
    cmd: ['journalctl', '-fu', 'sshd.service']
    filters:
      fail:
        regex:
          - 'authentication failure;.*rhost=<ip>'
        retry: 3
        retryperiod: '3h'
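
One way to picture retry and retryperiod is as a sliding window per IP. The Python sketch below rests on assumptions that are mine, not taken from reaction's source: that retryperiod is the window in which matches are counted, that the counter resets once the filter triggers, and that durations use single-letter units. `RetryCounter` and `parse_duration` are hypothetical names.

```python
from collections import defaultdict, deque


def parse_duration(text):
    """Convert a string such as '3h' or '30m' to seconds (only s, m, h, d handled)."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(text[:-1]) * units[text[-1]]


class RetryCounter:
    """Counts matches per IP inside a sliding time window."""

    def __init__(self, retry, retryperiod):
        self.retry = retry
        self.period = parse_duration(retryperiod)
        self.matches = defaultdict(deque)  # ip -> timestamps of recent matches

    def record(self, ip, now):
        """Record a match at time `now` (in seconds); return True if the filter triggers."""
        times = self.matches[ip]
        times.append(now)
        # Forget matches that fell out of the window.
        while times and now - times[0] > self.period:
            times.popleft()
        if len(times) >= self.retry:
            times.clear()  # assumption: start counting afresh after triggering
            return True
        return False


counter = RetryCounter(retry=3, retryperiod="3h")
hour = 3600
# Three matches within 3 hours: the third one triggers.
print([counter.record("203.0.113.7", t * hour) for t in (0, 1, 2)])
# Matches spread over 4 hours: the first has expired by the third, nothing triggers.
print([counter.record("192.0.2.15", t * hour) for t in (0, 2, 4)])
```

The first line prints `[False, False, True]` and the second `[False, False, False]`.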

We add one or more actions to a filter, which will be executed when the filter is triggered.

streams:
  ssh:
    cmd: ['journalctl', '-fu', 'sshd.service']
    filters:
      fail:
        regex:
          - 'authentication failure;.*rhost=<ip>'
        retry: 3
        retryperiod: '3h'
        actions:
          ban:
            cmd: ['iptables', '-w', '-A', 'reaction', '-s', '<ip>', '-j', 'DROP']

Actions can be executed immediately, or delayed with after. This makes it possible to ban an IP now and unban it later.

streams:
  ssh:
    cmd: ['journalctl', '-fu', 'sshd.service']
    filters:
      fail:
        regex:
          - 'authentication failure;.*rhost=<ip>'
        retry: 3
        retryperiod: '3h'
        actions:
          ban:
            cmd: ['iptables', '-w', '-A', 'reaction', '-s', '<ip>', '-j', 'DROP']
          unban:
            cmd: ['iptables', '-w', '-D', 'reaction', '-s', '<ip>', '-j', 'DROP']
            after: '24h'
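
The interplay between an immediate action and a delayed one can be sketched as a small scheduler. This is a Python illustration under my own assumptions, not reaction's implementation: `render`, `Scheduler` and `on_trigger` are hypothetical names, time is a plain number of seconds, and commands are returned rather than executed, so nothing touches a real firewall.

```python
import heapq

BAN = ["iptables", "-w", "-A", "reaction", "-s", "<ip>", "-j", "DROP"]
UNBAN = ["iptables", "-w", "-D", "reaction", "-s", "<ip>", "-j", "DROP"]


def render(cmd, ip):
    """Replace the <ip> placeholder in every argument of the command."""
    return [arg.replace("<ip>", ip) for arg in cmd]


class Scheduler:
    """Keeps commands ordered by the time at which they are due."""

    def __init__(self):
        self._queue = []  # heap of (due_time, sequence_number, command)
        self._seq = 0     # tie-breaker that keeps insertion order stable

    def schedule(self, due, cmd):
        heapq.heappush(self._queue, (due, self._seq, cmd))
        self._seq += 1

    def run_due(self, now):
        """Pop and return every command whose due time has been reached."""
        ran = []
        while self._queue and self._queue[0][0] <= now:
            _, _, cmd = heapq.heappop(self._queue)
            ran.append(cmd)
        return ran


def on_trigger(scheduler, ip, now, unban_after=24 * 3600):
    """Queue the ban for right now and the unban for `unban_after` seconds later."""
    scheduler.schedule(now, render(BAN, ip))
    scheduler.schedule(now + unban_after, render(UNBAN, ip))


scheduler = Scheduler()
on_trigger(scheduler, "203.0.113.7", now=0)
for moment in (0, 3600, 24 * 3600):
    for cmd in scheduler.run_due(moment):
        print(moment, " ".join(cmd))
```

The ban command is printed at time 0, nothing is due after one hour, and the unban command is printed once 24 hours have passed.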

Those iptables commands require the reaction chain to exist in the firewall. On startup, we ask reaction to create it and add it to the INPUT chain, which controls incoming connections.

start:
  - [ 'iptables', '-w', '-N', 'reaction' ]
  - [ 'iptables', '-w', '-I', 'INPUT', '-p', 'all', '-j', 'reaction' ]

We also ask reaction to empty it and delete it when quitting:

stop:
  - [ 'iptables', '-w', '-D', 'INPUT', '-p', 'all', '-j', 'reaction' ]
  - [ 'iptables', '-w', '-F', 'reaction' ]
  - [ 'iptables', '-w', '-X', 'reaction' ]

Et voilà. With 26 lines of configuration and no hidden defaults, reaction will watch SSH connections and ban offending IPs for 24h after 3 failed attempts.

JSONnet 🔗

It is a simple and flexible language, with a syntax close to JavaScript and JSON. By default, it's just a more flexible JSON:

// We can put comments
{
  // No need to put quotes everywhere
  streams: {
    ssh: {
      // We can even add commas after the last element ↓
      cmd: ['journalctl', '-fu', 'sshd.service'],
    }
  }
}

To avoid repetitions, we write variables and functions.

local hour2second(i) = i * 60 * 60;
{
  seconds: [
    hour2second(1),
    hour2second(3),
    hour2second(5),
  ],
}

JSONnet works as a preprocessor. It generates a JSON-compatible data structure. This result will be handed to reaction.

{
  "seconds": [ 3600, 10800, 18000 ]
}

It resembles the Nix language, but with a more pleasant syntax.

Now that JSONnet has been introduced, let's take the previous example, written in YAML, and rewrite it in JSONnet, adding a second stream to protect another service.

We want to avoid repeating the iptables commands (once is enough 😆).

So we write a banFor() function, which takes as an argument the duration the IPs should be banned for, and returns a set of actions. We can then reuse it on each stream.

local banFor(time) = {
  ban: {
    cmd: ['iptables', '-w', '-A', 'reaction', '-s', '<ip>', '-j', 'DROP'],
  },
  unban: {
    after: time,
    cmd: ['iptables', '-w', '-D', 'reaction', '-s', '<ip>', '-j', 'DROP'],
  },
};
{
  patterns: {
    ip: {
      regex: @'(([0-9]{1,3}\.){3}[0-9]{1,3})|([0-9a-fA-F:]{2,90})',
    },
  },
  start: [
    [ 'iptables', '-w', '-N', 'reaction' ],
    [ 'iptables', '-w', '-I', 'INPUT', '-p', 'all', '-j', 'reaction' ],
  ],
  stop: [
    [ 'iptables', '-w', '-D', 'INPUT', '-p', 'all', '-j', 'reaction' ],
    [ 'iptables', '-w', '-F', 'reaction' ],
    [ 'iptables', '-w', '-X', 'reaction' ],
  ],
  streams: {
    ssh: {
      cmd: ['journalctl', '-f', '-u', 'sshd.service'],
      filters: {
        login: {
          regex: [ @'authentication failure;.*rhost=<ip>' ],
          retry: 3,
          retryperiod: '3h',
          actions: banFor('24h'),
        },
      },
    },
    nginx: {
      cmd: ['tail', '-f', '/var/log/nginx/access.log'],
      filters: {
        directus: {
          regex: [ @'^<ip> .* "POST /auth/login HTTP/..." 401', ],
          retry: 6,
          retryperiod: '4h',
          actions: banFor('4h'),
        },
      },
    },
  },
}

Et voilà! We wrote the few abstractions we needed. We can now define how to protect a new service in 8 lines.

Here, it's Directus, which I recommend for building tailored user interfaces and databases very easily. Directus has been a source of inspiration for building expressive and flexible software.

This configuration file works, but is slightly simplified. A more complete example for SSH bans is available here.
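
For readers less familiar with JSONnet, the expansion performed by banFor() can be mimicked in Python. This is only an analogy to show the data structure each filter ends up with. The `ban_for` function is a hypothetical Python stand-in, and the exact formatting of JSONnet's own output may differ.

```python
import json


def ban_for(time):
    """Python stand-in for the JSONnet banFor(): returns the two actions of a filter."""
    return {
        "ban": {
            "cmd": ["iptables", "-w", "-A", "reaction", "-s", "<ip>", "-j", "DROP"],
        },
        "unban": {
            "after": time,
            "cmd": ["iptables", "-w", "-D", "reaction", "-s", "<ip>", "-j", "DROP"],
        },
    }


# What the nginx filter's `actions: banFor('4h')` stands for once expanded.
print(json.dumps(ban_for("4h"), indent=2))
```

Only the after value changes between the two streams, which is why a single function argument is enough.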

Installation🔗

It is possible to compile it from source, or to download a release:

# curl -o /usr/local/bin/reaction https://static.ppom.me/reaction/releases/$VERSION/reaction
# chmod 755 /usr/local/bin/reaction

To launch reaction via a systemd unit:

/etc/systemd/system/reaction.service

[Install]
WantedBy=multi-user.target
[Service]
ExecStart=/usr/local/bin/reaction start -c /etc/reaction.yml
StateDirectory=reaction
RuntimeDirectory=reaction
WorkingDirectory=/var/lib/reaction

Write your configuration file: /etc/reaction.yml or /etc/reaction.jsonnet (adapt the systemd file, depending on the format used).

Then reload systemd so that it discovers the new unit, activate it at boot and start it now:

# systemctl daemon-reload
# systemctl enable reaction.service
# systemctl start reaction.service

Use🔗

Read the logs with journalctl -f -u reaction.service.

Read current ban state with reaction show

$ reaction show
ssh:
  login:
    4.3.2.1:
      actions:
        unban:
        - "2023-11-01 22:00:00"
    112.113.114.115:
      actions:
        unban:
        - "2023-10-29 00:16:16"

Remove a ban with reaction flush
Test your regexes with reaction test-regex
Look at the help for all those commands with reaction --help
Look at other configuration examples on the repo.
Browse the wiki, which is slowly growing.

Conclusion🔗

After 6 (part-part-time) months of work, reaction is mature enough to be v1. It has been replacing fail2ban on my infrastructure for 5 months.

The v2 is already planned, and will let multiple reaction instances running on different servers work as a cluster, exchanging IPs to ban over peer-to-peer!

If you have questions,
ideas,
you discover bugs,
or need guidance for setup,

then don't hesitate

to create an issue on the repo,
or to send a mail to {reaction *at* ppom *dot* me} 🙂