Grafana - The Class Monitor

Back in our school days, we had a class monitor who observed our behaviour and reported any mischief. Grafana is like that class monitor: it watches everything and visualizes it for us. Grafana has a friend, Prometheus, the one who scrapes metrics -> stores them -> answers Grafana's queries.

Node exporter

We spoke about Grafana's friend, Prometheus, who holds all the data. Prometheus relies on data collectors (exporters) such as Node exporter to get EC2 instance metrics like CPU, memory, and disk usage.

Download node exporter -> Run it as a systemd service
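The "run it as a systemd service" step could look roughly like the sketch below. The binary path /usr/local/bin/node_exporter, the dedicated node_exporter user, and the unit file name are assumptions for illustration, not something this setup prescribes; adjust them to wherever you unpacked the download.

```shell
# Prints a minimal systemd unit for Node exporter.
# Assumptions: the binary lives at /usr/local/bin/node_exporter and a
# dedicated "node_exporter" user exists. Change both to match your install.
node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
Wants=network-online.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (run on the application server):
#   node_exporter_unit | sudo tee /etc/systemd/system/node_exporter.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now node_exporter
```

Keeping the unit text in a function lets you review it before anything is written under /etc.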

  • To check whether the metrics are being published -
curl -v 'http://localhost:9100/metrics'

-v -> verbose, prints the request and response headers, including the HTTP status of your curl request.

9100 -> Node exporter default port
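The full /metrics output is long, so it can help to check for one specific metric instead of reading everything. Below is a small sketch of such a check; the helper name has_metric is made up for this post, while node_cpu_seconds_total is a standard Node exporter metric.

```shell
# Reads Prometheus text-format metrics on stdin and succeeds if at least
# one sample line belongs to the given metric name.
# Comment lines (# HELP / # TYPE) are ignored because they start with "#".
has_metric() {
  local name="$1"
  # A sample line starts with the metric name followed by "{" (labels) or a space.
  grep -Eq "^${name}[{ ]"
}

# Usage (run on the application server):
#   curl -s 'http://localhost:9100/metrics' | has_metric node_cpu_seconds_total \
#     && echo "CPU metrics are published"
```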

Prometheus

Now that metrics are being published on our server, we need to ask Prometheus to scrape them. In prometheus.yml, instead of listing every instance, we will select our instances by their Name tag.

  • Why a name-based tag?

    If we have 10 application servers named app-1, app-2, app-3, ..., we would have to write down all 10 servers' IPs for Prometheus to scrape metrics from them. Instead, we'll use the Name tag to pick up our application servers, which have similar names.

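  - job_name: 'app-servers'
    metrics_path: '/metrics'
    ec2_sd_configs:
      - region: <region>
        # ARN of the IAM role Prometheus assumes to list EC2 instances
        role_arn: <prometheus-server-role-arn>
        port: 9100
    relabel_configs:
      # Show each target by its private IP in the instance label
      - source_labels: [__meta_ec2_private_ip]
        target_label: instance
      # Keep only instances whose Name tag matches the regex
      - source_labels: [__meta_ec2_tag_Name]
        regex: app.*
        action: keep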

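role_arn - ARN of the IAM role that Prometheus assumes. The role needs permission to describe EC2 instances (ec2:DescribeInstances) so that our application servers can be found by their Name tag.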

regex - Pattern for the Name tag of the servers whose metrics are to be scraped. app.* keeps every server whose name starts with app.
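One detail worth knowing: Prometheus anchors relabel regexes at both ends, so app.* matches only Name tags that start with app, not names that merely contain it. The sketch below imitates that behaviour with grep so you can try a pattern against a list of names. The helper keep_by_name and the sample server names are made up for illustration, and grep -E is only a stand-in here (Prometheus itself uses RE2).

```shell
# Emulates the "keep" relabel action on EC2 Name tags read from stdin,
# one name per line. The pattern is wrapped in ^(...)$ because Prometheus
# anchors relabel regexes on both ends.
keep_by_name() {
  local regex="$1"
  # "|| true" so that an empty result is not treated as an error.
  grep -E "^(${regex})$" || true
}

# Usage:
#   printf 'app-1\napp-2\nwebapp-1\ndb-1\n' | keep_by_name 'app.*'
```

With the sample names above, app-1 and app-2 are kept, while webapp-1 and db-1 are dropped.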

Restart the Prometheus service after adding the above job under scrape_configs in prometheus.yml -

systemctl restart prometheus.service

As discussed before, Prometheus is the one that scrapes metrics from the application servers, so it needs to be allowed in the application servers' security group -

aws ec2 authorize-security-group-ingress \
        --group-id <app-server-sg> \
        --protocol tcp \
        --port 9100 \
        --source-group <prometheus-server-sg>

app-server-sg -> Security group of the application servers, where the Prometheus server's security group will be allowed on port 9100.

Grafana

Metrics exposed by Node exporter, and scraped and stored by Prometheus, will be visualized with Grafana. Add Prometheus as a data source in Grafana.

Write a PromQL query to get server metrics like CPU, memory, and disk usage.
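As a starting point, here are three PromQL expressions built on standard Node exporter metrics, plus a small helper to try them against the Prometheus HTTP API before pasting them into a Grafana panel. This is a sketch under assumptions: Prometheus is taken to listen on localhost:9090 (its default port), the root filesystem is taken to be the disk of interest, and the variable and function names are made up for this post.

```shell
# CPU used (%), per instance: 100 minus the idle share over the last 5 minutes.
CPU_USED_PCT='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Memory used (%): share of total memory that is not available.
MEM_USED_PCT='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'

# Disk used (%) on the root filesystem (assumption: "/" is the mount you care about).
DISK_USED_PCT='(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100'

# Runs one expression against the Prometheus HTTP API and prints the JSON reply.
# Assumption: Prometheus is reachable on localhost:9090.
prom_query() {
  curl -sG 'http://localhost:9090/api/v1/query' --data-urlencode "query=$1"
}

# Usage (run on the Prometheus server):
#   prom_query "$CPU_USED_PCT"
```

In Grafana, only the expression itself goes into the panel's query field; the curl helper is just a quick way to confirm that the query returns data.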

Thank you.