Tag Archives: Datadog

Datadog NetFlow Monitoring with FluentD

I recently worked on an interesting assignment where NetFlow monitoring was needed for network traffic analysis. Now, the Datadog platform has a rich feature set for analyzing and visualizing just about any type of data, and NetFlow is no different.

The only unusual aspect of the assignment was collection: we needed to use the open-source FluentD to collect NetFlows. In this post I’ll share how FluentD can be used to collect flows for analysis on the Datadog platform.

Brief Overview

I hadn’t had a chance to work with FluentD before, though I’d heard good things about it. What’s useful about the FluentD agent is that it is modular and extensible with plugins. In particular, it has both a NetFlow collector source plugin and a Datadog output plugin, which are exactly what we need. Fun fact: the NetFlow plugin is a certified FluentD plugin, while the Datadog plugin is maintained directly by Datadog developers!

With both of these plugins available, it becomes a matter of configuring FluentD as a NetFlow collector, which will then convert flows to JSON-formatted logs to be submitted to Datadog. It’s really as simple as the diagram below.

The big picture: How FluentD collects and sends NetFlows to Datadog

Getting Started

I started with a vanilla Ubuntu 18.04 LTS VM and installed the pre-compiled FluentD agent like so:

$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent4.sh | sh

This installs the latest FluentD agent and automatically configures systemd to start it as a service. If you’re using another Linux distro, pop over to the FluentD Installation page and follow the distro-specific instructions. I’ve tested a similar setup on RHEL 7, and that works flawlessly too.

Once done, run a quick systemctl check to ensure that the agent is up and running:

$ sudo systemctl status td-agent
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/lib/systemd/system/td-agent.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2021-04-11 10:20:00 UTC; 15s ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 17416 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS)
...

That all looks hunky dory, but what on earth is all that td-agent business, you ask? It turns out that the company Treasure Data maintains most (all?) of the stable distribution packages of FluentD, so the FluentD agent is called td-agent in these distributions. You can read more about it in the official FluentD FAQ, but suffice it to say that td-agent is equivalent to FluentD for our purposes. Keep in mind, though, that command names, configuration files, etc. will carry the td-agent name instead.

Installing the NetFlow Collector Source Plugin

Here, we install the NetFlow plugin for FluentD, which turns the FluentD agent into a NetFlow collector. Run the following command to install the plugin:

$ sudo /usr/sbin/td-agent-gem install fluent-plugin-netflow

Insert the following lines into /etc/td-agent/td-agent.conf, the default td-agent configuration file. I use UDP port 5140 to receive flows in my lab, though you can change this to another port if you like. Remember to point the NetFlow exporter at the same UDP port later.

<source>
  @type netflow
  tag datadog.netflow.event
  bind 0.0.0.0
  port 5140
  versions [5, 9]
</source>

Notice that we’re tagging all flows with datadog.netflow.event. FluentD uses this tag to match the flows and route them to the appropriate plugin, which decides what happens to them next.

Installing the Datadog Output Plugin

Next, we install the Datadog plugin that will transport the flows as JSON logs to the Datadog cloud for processing and analytics.

$ sudo /usr/sbin/td-agent-gem install fluent-plugin-datadog

Add the following lines to /etc/td-agent/td-agent.conf, BELOW the NetFlow plugin configuration from the last section, and insert your Datadog API key in the api_key parameter.

<match datadog.netflow.**>
  @type datadog
  api_key <API KEY>

  dd_source 'netflow'
  dd_tags ''

  <buffer>
    @type memory
    flush_thread_count 4
    flush_interval 3s
    chunk_limit_size 5m
    chunk_limit_records 500
  </buffer>
</match>

There are some interesting points about this configuration. First, notice that the match directive looks for datadog.netflow.**. The ** wildcard matches zero or more tag parts, so the plugin processes any flows whose tag starts with datadog.netflow. That includes flows collected by the NetFlow plugin, which we earlier configured to apply the datadog.netflow.event tag.
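To make the wildcard behaviour concrete, here is a minimal Python sketch of matching dot-separated tags against patterns such as datadog.netflow.**. It is a simplified model written for illustration, not FluentD’s own matching code: `*` matches exactly one whole tag part and `**` matches zero or more parts, while FluentD extras such as `{a,b}` alternation are left out. The function names are hypothetical.

```python
from typing import List


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified FluentD-style tag matching (illustrative only).

    '*' matches exactly one dot-separated tag part; '**' matches zero or
    more parts. Alternation ({a,b}) and partial-part globs are not modelled.
    """
    return _match(pattern.split("."), tag.split("."))


def _match(pat: List[str], parts: List[str]) -> bool:
    if not pat:
        # Pattern exhausted: it only matches if the tag is exhausted too.
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let '**' swallow 0, 1, 2, ... tag parts and try the rest of the pattern.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


for tag in ["datadog.netflow.event", "datadog.netflow", "datadog.syslog.event"]:
    print(tag, tag_matches("datadog.netflow.**", tag))
```

The loop prints True for the first two tags and False for the third, which is why any tag under datadog.netflow, including our datadog.netflow.event, ends up at the Datadog output plugin.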

Secondly, notice that dd_source 'netflow' is set, ensuring that flows have the tag source:netflow. It is a Datadog best practice to use the source tag as the condition for routing logs (flow logs in this case) to the appropriate log pipeline after ingestion. The pipeline then processes, parses and transforms log attributes. Where possible, the pipeline also performs enrichment, for example adding the country/city of origin based on a source IP address. This is a big topic in its own right, but suffice it to say that it is important to configure the source tag appropriately.

Finally, there’s also the dd_tags parameter, which isn’t used in the example here. It allows custom tags to be applied, which can be useful in larger environments. For example, a large enterprise may have different NetFlow collectors for different zones (DMZ, Internet gateways, branches, data centers, just to name a few), different sites, different clouds, etc. Being able to drill into a specific collection context or view using tags is handy for gaining extra clarity during analysis.

Now that we’re done with configuring the FluentD agent, make sure to restart the service so the changes take effect.

$ sudo systemctl restart td-agent

Configuring pfSense as a NetFlow Exporter

Configuring a network device as a NetFlow exporter differs from device to device. In my case, I use a pfSense firewall as my NetFlow exporter. Detailed instructions for installing and enabling NetFlow on pfSense with the softflowd package are available elsewhere, so we won’t go into the details here.

As a sample configuration, however, here is how softflowd is set up on my pfSense firewall to export flows. In short, these settings configure the firewall to collect flows traversing the WAN interface and send them as NetFlow v5 flows to UDP port 5140 on the FluentD NetFlow collector.
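If you want to exercise the collector before (or without) configuring a real exporter, a small script can hand-craft NetFlow v5 packets. The Python sketch below assumes the standard NetFlow v5 layout (a 24-byte header followed by 48-byte flow records, all in network byte order) and the UDP port 5140 used above. The helper names, sample addresses (taken from the documentation ranges) and counter values are made up for testing.

```python
import socket
import struct
import time

# NetFlow v5: 24-byte header followed by 48-byte flow records, network byte order.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")
MAX_V5_RECORDS = 30  # a v5 packet carries at most 30 flow records


def build_v5_packet(flows, sys_uptime_ms=60_000, sequence=0):
    """Pack (src_ip, dst_ip, src_port, dst_port, protocol, packets, octets) tuples."""
    if not 1 <= len(flows) <= MAX_V5_RECORDS:
        raise ValueError("a NetFlow v5 packet holds between 1 and 30 records")
    now = time.time()
    header = V5_HEADER.pack(
        5,                      # version
        len(flows),             # number of records that follow
        sys_uptime_ms,          # exporter uptime in milliseconds
        int(now),               # unix_secs
        int((now % 1) * 1e9),   # unix_nsecs
        sequence,               # flow_sequence
        0, 0,                   # engine_type, engine_id
        0,                      # sampling_interval
    )
    records = []
    for src, dst, sport, dport, proto, pkts, octets in flows:
        records.append(V5_RECORD.pack(
            socket.inet_aton(src), socket.inet_aton(dst),
            socket.inet_aton("0.0.0.0"),  # next hop
            0, 0,                         # input/output interface index
            pkts, octets,
            sys_uptime_ms - 1000,         # flow start, in uptime ms
            sys_uptime_ms,                # flow end, in uptime ms
            sport, dport,
            0, 0, proto, 0,               # pad1, tcp_flags, protocol, ToS
            0, 0,                         # source/destination AS
            0, 0, 0,                      # source/destination mask, pad2
        ))
    return header + b"".join(records)


def send_packet(packet, host="127.0.0.1", port=5140):
    """Send one packet to the FluentD NetFlow collector over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

For example, `send_packet(build_v5_packet([("192.0.2.10", "198.51.100.20", 51515, 443, 6, 10, 4200)]), host="<collector IP>")` sends a single TCP flow, which should eventually appear in the Log Explorer if the collector and output plugin are wired up correctly.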

Assuming everything works as advertised, the Datadog Log Explorer now shows all of the NetFlow flow logs that were collected. Opening any of the flow logs shows the various attributes that were parsed from it. Importantly, the key ones, such as Source IP, Source Port, Destination IP, Destination Port, Protocol type, and Bytes and Packets transferred, are all available.
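These attributes come straight out of the NetFlow records. As a rough illustration of that decoding step, the Python sketch below unpacks a NetFlow v5 packet into one dictionary per flow. It is not the fluent-plugin-netflow implementation, and the attribute names are chosen for readability; the plugin’s own field names may differ.

```python
import socket
import struct

V5_HEADER = struct.Struct("!HHIIIIBBH")                # 24 bytes
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")  # 48 bytes


def decode_v5(packet: bytes) -> list:
    """Turn a raw NetFlow v5 packet into one dict per flow record."""
    version, count = struct.unpack_from("!HH", packet, 0)
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    flows = []
    for i in range(count):
        r = V5_RECORD.unpack_from(packet, V5_HEADER.size + i * V5_RECORD.size)
        flows.append({
            "src_ip": socket.inet_ntoa(r[0]),
            "dst_ip": socket.inet_ntoa(r[1]),
            "packets": r[5],     # dPkts
            "bytes": r[6],       # dOctets
            "src_port": r[9],
            "dst_port": r[10],
            "protocol": r[13],   # e.g. 6 = TCP, 17 = UDP
        })
    return flows
```

Passing the result to `json.dumps` gives a JSON document per flow, which is broadly the shape of what the collector forwards as logs.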

Quick Peek at the NetFlow Dashboard

In the interest of keeping this post focused on FluentD NetFlow collection, we’ll cover Datadog log processing some other time. However, as a peek into the possibilities once flows have been ingested, processed and analyzed, here’s a really groovy NetFlow monitoring dashboard that I created.

One global NetFlow Dashboard to rule the world

Neat, isn’t it?

Part 2: Setting up Log Analytics on Datadog for QNAP NAS

Before Parsing Custom Logs

On the Datadog platform, navigate to Logs -> Search to get to the Log Explorer. Select only the qnap-nas Service facet. If you remember from Part 1, we configured this Service in the agent configuration file. It might be a good idea to choose a longer time frame to view logs; in this case, we are looking at logs from “The Past Hour”. To generate some log activity, I logged into my NAS to start an antivirus scan as well as run a rapid test on one of the hard disks.

Clicking into any of the log lines, it soon becomes clear that while the logs exist, the data is not actually parsed for easy slicing and dicing, which would be immensely useful for log analysis and filtering. It would be nice to extract the inline data into easy-to-use attributes.

Setting Up Custom Log Parsing

Navigate to Logs -> Configuration. Observe that there are already a number of log parsing pipelines that come out of the box (wait, do we have boxes for a SaaS?). These are automatically turned on when an associated monitoring integration is enabled. Did I mention that Datadog has 400+ vendor-supported integrations already available, and chances are that whatever you want to integrate for monitoring/tracing/log analytics is already here? Consider it said 🙂 And now, let’s add a custom log pipeline, just cuz we can. Click on “Add a new pipeline”.

To add a new log pipeline, click on “Add a new pipeline”. Whew, how hard was that?

First, we need to filter for the log lines that we want to send through our QNAP log pipeline for parsing. We’ll simply use service:qnap-nas as our filter. If you recall, we configured the agent in Part 1 to apply this attribute to all logs that come in from the QNAP NAS. Give this pipeline an easily distinguishable name; “QNAP NAS”, for example.

I call the pipeline “QNAP NAS”, just because.

Once the pipeline exists to snag the right logs, we need to apply some actions to the logs in order to parse them. In Datadog, these actions are called “Processors”. Click on “Add Processor”.

A pop-up appears to help configure the “New Processor”. For Step 1, let’s leave it as the default “Grok Parser”, because we are going to use Grok to extract attributes of interest. For Step 2, return to the “Log Explorer” screen and copy out a few log samples, which will be used to test our parsing rules. From observation, there appear to be two types of QNAP NAS logs: an event log type and a connection log type. Notice that all of the log samples have a red “No Match” indicator next to them, meaning we can’t extract any useful attributes yet.

Copy and paste in some sample logs from the QNAP NAS so we can test the Grok parsing rules

Going down to Step 3, enter the parsing rules. I’ve provided the rules in text format after the next screenshot, so you can easily copy/paste them for your own use. Essentially, we have a main parsing rule for each of the two log types. For readability, these in turn call modular “Helper Rules” that need to be added in “Advanced Settings”. These “Helper Rules” work their magic on specific sub-strings, depending on where they are placed in the main parsing rules.

Here are the main rules in text form; you can copy and paste these into the Step 3 text box, as in the screenshot above.

QNAP_Conn %{QNAP_initial} %{QNAP_conn_log}
QNAP_Event %{QNAP_initial} %{QNAP_event_log}

And here are the “Helper Rules”, which get called by the main rules. Pop open the “Advanced Settings” drop-down and copy/paste these in. If you’re curious, “QNAP_initial” will match and parse the beginning of every QNAP log, while “QNAP_conn_log” and “QNAP_event_log” will respectively match and parse connection or event logs, depending on what comes after the initial part of the log line.

QNAP_initial \<%{number:priority}\>%{date("MMM dd HH:mm:ss"):date}\s+%{ipOrHost:host}\s+%{word:process_name}\[%{number:process_id}\]\:

QNAP_conn_log conn\s+log\:\s+Users\:\s+%{word:user},\s+Source\s+IP\:\s+%{ip:source_ip},\s+Computer\s+name\:\s+%{data:computer_name},\s+Connection\s+type\:\s+%{word:connection_type},\s+Accessed\s+resources\:\s+%{data:accessed_resources},\s+Action\:\s+%{data:action}

QNAP_event_log event\s+log\:\s+Users\:\s+%{word:user},\s+Source\s+IP\:\s+%{ip:source_ip},\s+Computer\s+name\:\s+%{data:computer_name},\s+Content\:\s+%{data:msg}
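To sanity-check the rules outside of Datadog, you can approximate them with ordinary regular expressions. The Python sketch below inlines the helper rules into the main rules and tries each main rule in order, keeping the first match. The regexes standing in for Datadog’s `number`, `word`, `ipOrHost`, `ip`, `data` and `date` matchers are my own rough approximations, and the sample log lines are hypothetical, built to fit the rules rather than copied from a real QNAP NAS.

```python
import re

# Rough regex stand-ins for the Datadog Grok matchers used by the rules.
# These are approximations, not Datadog's exact definitions.
MATCHERS = {
    "number": r"\d+(?:\.\d+)?",
    "word": r"\w+",
    "ipOrHost": r"[\w.:-]+",
    "ip": r"[0-9A-Fa-f.:]+",
    "data": r".*?",
    "date": r"[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}",  # MMM dd HH:mm:ss
}

# Helper and main rules, copied from the text above.
HELPERS = {
    "QNAP_initial": r'\<%{number:priority}\>%{date("MMM dd HH:mm:ss"):date}\s+%{ipOrHost:host}\s+%{word:process_name}\[%{number:process_id}\]\:',
    "QNAP_conn_log": r"conn\s+log\:\s+Users\:\s+%{word:user},\s+Source\s+IP\:\s+%{ip:source_ip},\s+Computer\s+name\:\s+%{data:computer_name},\s+Connection\s+type\:\s+%{word:connection_type},\s+Accessed\s+resources\:\s+%{data:accessed_resources},\s+Action\:\s+%{data:action}",
    "QNAP_event_log": r"event\s+log\:\s+Users\:\s+%{word:user},\s+Source\s+IP\:\s+%{ip:source_ip},\s+Computer\s+name\:\s+%{data:computer_name},\s+Content\:\s+%{data:msg}",
}
MAIN_RULES = {
    "QNAP_Conn": "%{QNAP_initial} %{QNAP_conn_log}",
    "QNAP_Event": "%{QNAP_initial} %{QNAP_event_log}",
}

# Matches %{name}, %{matcher:attr} and %{matcher("arg"):attr}.
TOKEN = re.compile(r'%\{(\w+)(?:\([^)]*\))?(?::(\w+))?\}')


def expand(rule: str) -> str:
    """Replace Grok tokens with regex; helper rules are inlined recursively."""
    def repl(m):
        name, attr = m.group(1), m.group(2)
        if name in HELPERS:
            return "(?:" + expand(HELPERS[name]) + ")"
        pattern = MATCHERS[name]
        return f"(?P<{attr}>{pattern})" if attr else f"(?:{pattern})"
    return TOKEN.sub(repl, rule)


def parse(line: str):
    """Return (rule_name, attributes) for the first matching main rule, else None."""
    for name, rule in MAIN_RULES.items():
        m = re.fullmatch(expand(rule), line)
        if m:
            return name, m.groupdict()
    return None


# Hypothetical connection-log line, shaped to fit the rules above.
SAMPLE = ("<30>Jun 10 13:58:06 NAS01 qlogd[11223]: conn log: Users: admin, "
          "Source IP: 192.168.1.50, Computer name: DESKTOP-01, Connection type: SAMBA, "
          "Accessed resources: /Multimedia/song.mp3, Action: Read")
print(parse(SAMPLE))
```

Running it prints the QNAP_Conn rule name together with attributes such as user, source_ip and action, which mirrors the “Match” result you should see in the Grok parser’s sample panel.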

Scrolling back up, notice that all the sample log messages now show “Match” in green.

To see how well parsing works, select any of the sample log lines, and scroll down past Step 3 to see what attributes have been successfully parsed.

These values are all extracted from the sample log line, and assigned to attributes. It’s kinda like a key-value pair.

It works! Let’s clean up by giving the processor a name and saving it into the log pipeline.

It’s a log parser, what else could you call it?

If all goes well, we should now see the “QNAP NAS Log Parser” log processor attached to the QNAP NAS log pipeline.

After: Slice and Dice Log Data like a Pro

So the net is now cast, let’s see what we can catch! Return to the Log Explorer and filter for service:qnap-nas. Click on any of the recent logs, and observe that we now have attributes which have been extracted from the raw log line by the QNAP NAS Log Parser. The next screenshot shows the data extracted from a user’s action of writing a file on the NAS using Windows File Share.

More attributes than you can shake a stick at! (please don’t shake sticks at stuff)

We want to set these attributes as facets in order to index, slice and dice the logs. Let’s start with the “action” attribute, since this is a useful log facet that tells us what action a user performed. Mouse over the left area of each attribute, then look out for a small settings icon (a gear symbol). Click on it to pop open a menu.

Mouse over, and click… Side note here, Pomplamoose covers are awesome and you should check them out on YouTube.

Select “Create facet for @action” from the menu…

… which will pop up a confirmation dialog. No changes are needed here; just click on the “Add” button. Repeat these steps to add facets for the “computer_name”, “connection_type”, “host”, “source_ip”, and “user” attributes.

Observe that once these attributes have been added as facets, they now appear on the facet selector/menu on the left. You can see a list of facet values for manipulating the log view now.

Options, options, and more options. Options are good.

For example, let’s use the “user” facet to select ONLY the “admin” user. This will show a list of all logs that are related to the “admin” user, and filter everything else out.

This “admin” guy looks suspicious, let’s see what he’s been up to.

Observe that the “user:admin” term is now automatically added to the search bar, and that the visible logs are only those generated by the “admin” user. In this case, it’s a list of files being accessed by the user.

Apparently “admin” enjoys a cover of a Jim Croce song. Great taste!

Having facets is also fantastic for running log analytics. Say, for example, I wanted to understand which actions users frequently perform on the NAS. It’s as easy as clicking on the “graph” icon on the “action” facet.

With that simple click, a visualization of all the actions performed on the NAS by users within the selected timeframe is displayed. Here we can see that by far the most frequently performed operation on the NAS is “Read”.
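Conceptually, that graph is a count of how often each value of the action facet appears in the selected timeframe, optionally narrowed by a filter such as user:admin. The Python sketch below reproduces the idea offline over a handful of parsed attribute dictionaries. The sample records are invented for illustration, and the sketch makes no claim about how Datadog computes its visualizations internally.

```python
from collections import Counter


def filter_by_facet(logs, facet, value):
    """Keep only logs whose `facet` attribute equals `value` (like user:admin)."""
    return [log for log in logs if log.get(facet) == value]


def top_values(logs, facet, n=10):
    """Count each value of `facet`, most frequent first (like the facet graph)."""
    return Counter(log[facet] for log in logs if facet in log).most_common(n)


# Invented sample of parsed QNAP connection-log attributes.
logs = [
    {"user": "admin", "action": "Read", "connection_type": "SAMBA"},
    {"user": "admin", "action": "Write", "connection_type": "SAMBA"},
    {"user": "guest", "action": "Read", "connection_type": "SAMBA"},
    {"user": "admin", "action": "Read", "connection_type": "SAMBA"},
]
print(top_values(logs, "action"))                                    # [('Read', 3), ('Write', 1)]
print(top_values(filter_by_facet(logs, "user", "admin"), "action"))  # [('Read', 2), ('Write', 1)]
```

Combining a facet filter with a facet count is the offline equivalent of selecting user:admin and then graphing the action facet.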

So the users seem to like reading files off the NAS. Color me surprised…

There’s a lot more that can be done now that we’re able to parse the QNAP NAS logs, like adding this to a dashboard of related applications or systems, or setting up monitoring and alerting against specific thresholds. It’s all up to your imagination!

Part 1: Setting up Log Analytics on Datadog for QNAP NAS


I recently had the opportunity to join Datadog, a modern monitoring-as-a-service provider with a focus on Cloud Native applications. Out of the box, Datadog offers substantial integrations for monitoring, tracing and log analytics across enterprise cloud infrastructure and applications. Not to toot any horns here, but you can pop by Datadog HQ to sign up for a trial if you need an easy-to-use cloud-based monitoring platform that’s ready to go live in 5 minutes.

To get up to speed on log analytics, I wanted to learn how to set up log analytics for custom log sources, which could be a home-grown application or any system for which Datadog has not yet integrated log parsing. Note that this is NOT how most folks would use it in production, since there are already tonnes of out-of-the-box, supported integrations for log parsing/analytics. This is more of a “corner-case” test, and a way for me to learn how to make custom log parsing work. Also, having recently had several faults with my QNAP NAS that went undiscovered for too long, I thought it would be the perfect target to try this on.

Bit of a disclaimer before going any further: All views here are mine and do not reflect in any way the official position of my employer. Yadda Yadda. Mistakes were very likely made, and are mine. Got it? Good, let’s move on. 🙂

Now, Datadog relies on a single agent to collect all manner of information, be it metrics, application traces, or logs. This agent can also be configured as a remote syslog collector, forwarding any syslogs sent to it to the Datadog cloud for analytics. My setup looked something like the following diagram.

And so that we have a source of custom logs ready to go, I configured my QNAP NAS to send all its logs, hopes, fears, anger, failures and frustrations to the Datadog Monitor VM, where my Datadog Agent is installed.

QNAP is set to tell the Datadog Agent about all its problems. Everyone needs a sympathetic ear, and a good doggo to cuddle away their problems. Yes, even a QNAP NAS.

With that done, we’ll deploy the Datadog agent as a Docker container. Using Ubuntu 18.04 as the base OS, install Docker by following the instructions on the Docker Installation page. For test setups, you can also run Docker’s setup script (not recommended for production). Also, remember to install Docker Compose by following its setup instructions; I’ve provided a docker-compose.yaml further down that will get the agent going in seconds.

To start off, create the following directory structure in your home directory. Use touch and mkdir as you see fit.

~/datadog-monitor
-> docker-compose.yaml
-> datadog-agent
   -> conf.d
      -> qnap.d
         -> conf.yaml

Here are the contents of ~/datadog-monitor/datadog-agent/conf.d/qnap.d/conf.yaml. It configures the agent to listen on UDP port 15141 and to mark any ingested logs with the qnap-nas service and the qnap source. I’ve added a number of other tags for easy correlation in my environment later on, but they are optional for the purposes of what we’re trying to do here.

logs:
  - type: udp
    port: 15141
    service: qnap-nas
    source: qnap
    tags:
      - cloud_provider:vsphere
      - availability_zone:sgp1
      - env:prod
      - vendor:qnap
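Before pointing the NAS at the agent, it can help to fire a test message at UDP port 15141 yourself. The Python sketch below builds a BSD-syslog-style line in the same `<priority>MMM dd HH:mm:ss host process[pid]:` shape that the QNAP logs use, where the priority value is computed as facility * 8 + severity. The host name, process name and PID defaults are placeholders, not values a QNAP NAS sends.

```python
import socket
from datetime import datetime


def syslog_line(message, host="testhost", process="testapp", pid=1234,
                facility=1, severity=6, when=None):
    """Build a BSD-syslog-style line: <PRI>MMM dd HH:mm:ss host process[pid]: msg."""
    pri = facility * 8 + severity             # user-level (1) * 8 + informational (6) = 14
    when = when or datetime.now()
    stamp = when.strftime("%b %d %H:%M:%S")   # %b gives e.g. "Jun" in the default C locale
    return f"<{pri}>{stamp} {host} {process}[{pid}]: {message}"


def send_test_log(line, host="127.0.0.1", port=15141):
    """Fire one UDP datagram at the Datadog agent's log listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

After running `send_test_log(syslog_line("hello from a test script"), host="<agent VM IP>")`, the LogsProcessed counter in the agent status output should go up, and the message should show up in the Log Explorer under the qnap-nas service.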

Here are the contents of ~/datadog-monitor/docker-compose.yaml. It’s a nice, easy way to have Docker Compose bring up the agent container and start listening for logs immediately. We’re really just setting the correct environment variables to allow the agent to call home and to enable log collection. You can also see that the ~/datadog-monitor/datadog-agent/conf.d directory we made earlier is mounted read-only into the container, so the agent can access anything in it.

version: '3.8'
services:
  dd-agent:
    image: 'datadog/agent:7'
    environment:
      - DD_API_KEY=<REDACTED - Refer to your own API Key>
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
      - DD_LOGS_CONFIG_USE_HTTP=true
      - DD_LOGS_CONFIG_COMPRESSION_LEVEL=1
      - DD_AC_EXCLUDE=name:dd-agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
      - ./datadog-agent/conf.d:/conf.d:ro
    ports:
      - "15141:15141/udp"
    restart: 'always'

Let’s make sure we are in the ~/datadog-monitor directory, and run docker-compose.

$ sudo docker-compose up -d
Creating network "datadog-monitor_default" with the default driver
Creating datadog-monitor_dd-agent_1 ... done

It’s probably a good idea to verify that the agent container started correctly, and that it is ready to forward logs from the QNAP NAS.

$ sudo docker ps
CONTAINER ID   IMAGE             COMMAND   CREATED         STATUS                            PORTS                                          NAMES
c9d0e3d05441   datadog/agent:7   "/init"   4 seconds ago   Up 2 seconds (health: starting)   8125/udp, 8126/tcp, 0.0.0.0:15141->15141/udp   datadog-monitor_dd-agent_1

$ sudo docker exec c9d0e3d05441 agent status
===============
Agent (v7.18.1)
Status date: 2020-06-10 13:58:06.211055 UTC
Agent start: 2020-06-10 13:57:44.291976 UTC
Pid: 348
Go Version: go1.12.9
Python Version: 3.8.1
Build arch: amd64
Check Runners: 4
Log Level: info
...
==========
Logs Agent
==========
Sending uncompressed logs in HTTPS to agent-http-intake.logs.datadoghq.com on port 0 
BytesSent: 26753
EncodedBytesSent: 26753
LogsProcessed: 73
LogsSent: 56
...
qnap
----
Type: udp
Port: 15141
Status: OK

Perfect, we’re looking good for now. We’ve got both the log source (QNAP NAS) and the log collector (Datadog agent) set up. In the next post, we will set up custom log parsing for the QNAP NAS.

Kacang is Nuts

Life is short, go nuts!