For a while now, I have been running a Graylog setup on my VPS to collect any type of data I could find, just to get some hands-on with Threat Detection, correlation of data from multiple API’s and most of all, to finally win the Battle For RAM from my old foe ElasticSearch.

I am also a Presearch Node-runner, and what better data to process than the glorious performance data and PRE’s Earned of all of my nodes?

Do you support decentralization and an open internet that isn’t dominated by a handful of Big Tech companies? Now you can be part of the solution by operating a Presearch Node and helping to power the Presearch decentralized search engine. Presearch Nodes are used to process user search requests, and node operators earn Presearch PRE tokens for joining and supporting the network.

Source: presearch.io

Dashboards are cool.

You learn what time your nodes are switching gateways, to what extend node latency or stake influences earnings and especially what nodes are just too darn expensive for their overall performance (looking at you, Azure).

It has not yet made me a millionaire but I’ve learned lots of interesting stuff already, stuff I am absolutely not going to share with anyone because Do Your Own Presearch.

Pro-tip: anyone telling you how to make money on the Internet is full of it.

So no dashboards, no magic queries.

Also, no info on how to get your VPS, Docker engine and Graylog/ElasticSearch stack running. Or how to keep the latter from eating all your RAM again and again and again.

Instead, I describe how I am sending the log-messages and API statistics of my Presearch nodes to my Graylog stack, saving you hours and hours of time (p)researching.

Time, better spent on fun stuff like NetFlix and icecream.

Do Your Own Prerequisites

First, you will need:

  1. A Graylog instance that is accessible from your nodes, preferably behind a proxy such as Traefik.
  2. A working Presearch Dashboard, Node Registration Code and Node API key.
  3. Some staked nodes.

Oh, and nothing in your setup will be exactly like mine. You might get an error here and there. Google Presearch it. Switch it off and on again. Get some fresh air. Blame ElasticSearch.

Tell yourself: Zaphod made it work, and he’s a two-headed idiot.

Docker Logs & API Metrics

For each node, I collect the Presearch node logs from Docker and the performance metrics from the Node-API.

Log collection is done through a central GELF input on the Graylog server, so only one input is needed.

For the performance data off the Node API, we need a separate JSON API Input (and extractor) for each node.

First though, some administration.

A number of variables are used in the commands below; depending on your platform you can set them as environment variables or simply note them down in notepad to paste them when needed.

VariableDescription
$NODE-NAMEThe name you give to your node. To correlate logs with metrics, this should be the same as the node’s host name.
$NODE-REGISTRATION-CODEThe registration code you can find in your Presearch Node Dashboard.
$API-KEYThe API key found in the Presearch Node Dashboard.
$URL-ENCODED-PUBLIC-KEYThe URL-encoded Public Key of the node; see below.
$GRAYLOG-SERVER-IPThe Public IP-address of your central Graylog server.

Use your Presearch Node Dashboard (and below tool) to get the data you need. The Node Registration Code and Node API Key from your main dashboard, and the Public Key behind the “Stats” button next to your node.

URL encoded Node Public Key

To fetch data from the Presearch Node API for just one particular node, you need to include a URL-encoded version of your Node Public Key.

To get this from your browser:

  1. Open CyberChef Toolbox with the URL-Encode recipe: https://cyberchef.40two.app/#recipe=URL_Encode(true)
  2. Make sure to check the option to encode all special characters
  3. Copy the Public Key for your node (found under “Stats” for that node in the Presearch Node Dashboard)
  4. Paste your node’s Public Key in the Input box
  5. The URL encoded key can now be copied from the Ouput box

This Output is what you will need as the $URL-ENCODED-PUBLIC-KEY variable.

You can test if you converted the key Public Key properly by visiting:

https://nodes.presearch.org/api/nodes/status/$API-KEY?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY

replacing $API-KEY and $URL-ENCODED-PUBLIC-KEY with the appropriate values, and you should receive a JSON response for just your one node.

Graylog server preparation

The Graylog server needs some preparation to properly parse node data. This is a one-time thing, additional nodes can skip this step.

First we need to define a special Grok pattern to be able to parse some of the values returned by the Presearch node API, and then we set-up a GELF input for all the nodes to send their log messages.

Setup Grok pattern for Scientific notification

The Node API returns some values in Scientific Notification (SCI), which Grok does not extract by default. We need to add a new Grok Pattern:

graylog menu > System / Input > Grok Patterns > Create pattern

FieldValue
NameNUMBER_SCI
Pattern[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?

This pattern is later referred to in the Grok extractor for each node.

Setup GELF input on the Graylog server

Node logs are easy to capture since Docker natively supports sending container logs to a GELF (Graylog Extended Log Format) receiver anywhere on the internet. Thus, we will setup an UDP Gelf input on our Graylog instance.

Disclaimer: Smart people secure this with TLS but since this is a lot of work and worst case someone RickRolls my dashboard, I am using an insecure UDP input on my instance. Your mileage may vary. Not Technical Advice.

Using a single Graylog input for all nodes means the message source (the host-name of the Docker host) will be how we know which node sent the update.

And since we want to later correlate the node’s logs and metrics, it is best to set the $node-name variable for the API input to the node’s host name, so it will be reported as source for that input.

To set-up the Gelf input, on your Graylog console:

graylog menu > System / Input > Inputs > Select Input (drop-down) > GELF UDP

FieldValue
Input typeGELP UDP
Bind address0.0.0.0
Port12201
Receive Buffer Size262144

With the Gelf UDP running, the server is ready to receive node data.

You can test the Gelf input from the Graylog host itself (assuming Graylog runs on Docker) by sending ‘hello world’ to your Graylog stream:

docker run --log-driver=gelf --log-opt gelf-address=tcp://localhost:12201 alpine echo hello world

For each node:

For the first node, but also every node after, you need to first restart your presearch-node container to connect its logs and then create an input to talk to the Node-API.

Thus, the below steps are to be followed for each individual node.

Start node with GELF logging over UDP

If the Presearch node is running, first stop and remove it. The keys will be left untouched so it will return with the same identity.

sudo docker stop presearch-node && sudo docker rm presearch-node

Then, spin-up a new Presearch node with logging directed at the Graylog server:

sudo docker run -dt --name presearch-node --restart=unless-stopped -v presearch-node-storage:/app/node -e REGISTRATION_CODE="$NODE-REGISTRATION-CODE" --log-driver gelf --log-opt gelf-address=udp://$GRAYLOG-SERVER-IP:12201 presearch/node

If all works as intended, the log messages for the new node should arrive in your Graylog event stream.

Take note of the SOURCE listed, as this should be used as $NODE-NAME moving forward to collate the logs with the appropriate node statistics.

Add a Node-API input for the node

On your Graylog console:

Graylog server > System / Inputs > Inputs > Select input (drop-down) > JSON path from HTTP API

Then use the values below, replacing the $NODE-NAME and $URL-ENCODED-PUBLIC-KEY variables with the ones generated for this specific node:

FieldValue
Input typeJSON path from HTTP API
Title$NODE-NAME
URL of JSON resourcehttps://nodes.presearch.org/api/nodes/status/$APIKEY/?stats=true&public_keys=$URL-ENCODED-PUBLIC-KEY
Interval5
Interval time unitMinutes
JSON path of data to extract$.nodes.*.period
Message source$NODE-NAME

Save and start the input. It should immediately start receiving messages. Once the first message is in, an extractor can be built to get the metrics needed.

Add extractor from input

Graylog server > System / Inputs > Inputs > Manage extractors (next to newly created input)

In the Extractor screen for the input, under Add extractor, click Get started and then Load Message. If loading the message fails, the input is not running properly.

If you do see a message, click Select extractor type next to the result field, and pick Grok pattern.

FieldValue
Extractor typeGrok pattern
Source fieldresult
Grok patternuptime_percentage=%{BASE10NUM:uptime_pct;float}, avg_uptime_score=%{BASE10NUM:uptime_score;float}, avg_latency_ms=%{BASE10NUM:latency_ms;float}, avg_latency_score=%{BASE10NUM:latency_score;float}, total_requests=%{BASE10NUM:total_requests;float}, avg_success_rate=%{BASE10NUM:avg_success_rate;float}, avg_success_rate_score=%{BASE10NUM:avg_success_rate_score;float}, avg_reliability_score=%{BASE10NUM:avg_reliability_score;double}, avg_staked_capacity_percent=%{NUMBER_SCI:avg_staked_capacity_pct;float}, avg_utilization_percent=%{NUMBER_SCI:avg_utilization_pct;float}, total_pre_earned=%{BASE10NUM:total_pre_earned;float}
ConditionAlways try to extract
Extraction strategyCopy
Extractor title$NODE-NAME

If this is a node that has been up for more than a few hours, clicking test will parse all the needed fields.

If testing fails, this is because the node is new or not functioning properly, and one of the values is presented as NULL which confuses Grok.

Save the extractor.

Log messages should now be coming in from the API; clicking an entry expands it to see all the variables extracted for your dashboarding pleasure.

Rounding up

Watch the main Graylog stream to see the data come in from the nodes.

If all works, you can start building your dashboards…

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to top