Grafana

In October 2019 BitFolk launched its Grafana graphing service.

In order to help you manage and monitor your service, BitFolk provides graphs generated by Grafana based on data collected every 15 seconds by Prometheus.

The service can be found at https://tools.bitfolk.com/grafana/

Logging in

Visit the service's URL and you will be redirected to BitFolk's Panel to authenticate. You can authenticate as any valid login; if you have several, dashboards for the other services are available by clicking on the "Dashboards" icon on the left, which looks like four small squares arranged in a square.

The home dashboard

Upon logging in you'll be presented with your "home" dashboard. There is an interactive example which non-customers can look at. The home dashboard should contain the following panels:

Total data transfer

A textual report of the number of bytes transferred in the selected time period. Note that this includes all data, including traffic to and from local BitFolk infrastructure such as package mirrors, spamd and the monitoring systems themselves, so it will read higher than your data transfer for billing purposes.

95th percentile bandwidth

95th percentile bandwidth usage

A textual report of your 95th percentile bandwidth usage. This is mainly useful for high bandwidth users to determine whether the 95th percentile billing option would be a better way to go. Again, this includes all data transferred, including local data flows.
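As a rough illustration of how a 95th percentile figure is commonly derived (not necessarily the exact query BitFolk uses), the sketch below sorts a set of bandwidth samples and ignores the top 5%. The function name, the nearest-rank method and the sample values are all assumptions made for the example.

```python
def percentile_95(samples_bps):
    """Return the 95th percentile of bandwidth samples (nearest-rank method)."""
    if not samples_bps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_bps)
    # Nearest rank: ceil(0.95 * n), done in integer arithmetic to avoid
    # floating point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical samples: mostly 1 Mbit/s with a few short bursts to 50 Mbit/s.
samples = [1_000_000] * 96 + [50_000_000] * 4
print(percentile_95(samples))
```

Because the bursts make up less than 5% of the samples they do not affect the result, which is why this kind of billing can suit users with spiky traffic.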

Data transfer

Graph of data transfer

A graph of data transfer from the point of view of your VPS, i.e. "In" is traffic in to your VPS and "Out" is traffic out of your VPS. Again, this includes all data transferred, including local data flows. Inbound traffic is shown on the negative Y axis.
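Graphs like this are usually drawn from byte counters that are read at a fixed interval. The sketch below shows the general idea under that assumption: successive counter readings are turned into bytes per second, and the inbound series is negated so that it sits below the X axis. The function names and sample values are hypothetical, and counter resets are ignored for brevity.

```python
SCRAPE_INTERVAL_S = 15  # collection interval stated at the top of this page

def rates_per_second(counter_samples, interval_s=SCRAPE_INTERVAL_S):
    """Turn successive byte-counter readings into bytes-per-second rates."""
    return [
        (later - earlier) / interval_s
        for earlier, later in zip(counter_samples, counter_samples[1:])
    ]

def graph_series(rx_counters, tx_counters):
    """Return (inbound, outbound) series with inbound plotted as negative."""
    inbound = [-rate for rate in rates_per_second(rx_counters)]
    outbound = rates_per_second(tx_counters)
    return inbound, outbound

# Hypothetical counter readings taken 15 seconds apart.
rx = [0, 15_000, 45_000]
tx = [0, 30_000, 30_000]
print(graph_series(rx, tx))
```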

% CPU

Graph of Xen CPU usage

Gauge of the last CPU reading as a percent of all available CPUs. All BitFolk customers at the time of writing have 2 vCPUs, so 50% would represent one full CPU core and 100% would represent two full CPU cores.

Xen CPU usage

Graph of Xen CPU usage

A graph of CPU usage. As above, this is a percentage of all available CPUs. All BitFolk customers have 2 vCPUs at the time of writing, so 50% would represent one full CPU core and 100% would represent two full CPU cores.

Block device throughput

Graph of block device throughput

A graph of throughput of each block device in bytes per second. Reads are shown on the negative Y axis. You'll have a read stat and a write stat for each block device in your VM. The default BitFolk guest has two block devices, with xvda being for the operating system and xvdb being for swap.

Block device IOPS

Graph of block device IOPS

A graph of the number of Input/Output Operations per second (IOPS) for each block device. Reads are shown on the negative Y axis.

Basic user interface

Hover over a panel to see exact values/times.

Drag-select to narrow down to a given time range.

Click on an item in the legend to show only that item. Ctrl and click to select multiple items.

Time range and page refresh time can also be set at the top right.

"No negative Y"

During acceptance testing one customer gave feedback that graphs with a negative Y axis are more difficult to read. No other customer who responded shared this view, so the default dashboard remained the one with graphs using a negative Y axis, but an alternate dashboard that does not use a negative Y axis was added for anyone else who feels the same way.

Adding additional dashboards and panels

The default dashboard covers every metric that was graphed with BitFolk's old Cacti service, plus a few more besides. There was always an open offer to graph more things in Cacti if any customer wished, but at the time of the cut-over to Grafana only one customer was making use of that offer. They had a number of additional graphs based on SNMP metrics.

That customer decided to install Node Exporter and their additional Cacti graphs (with a couple of minor omissions) were replicated in Grafana with some useful extra ones. An interactive snapshot of that is available.

If you too would like that additional dashboard, please contact Support.

If Node Exporter doesn't carry the metrics that you want to have graphed and they're not suitable for putting in the Textfile Collector, BitFolk could be persuaded to scrape a different collector like snmpd.
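For metrics that do suit the Textfile Collector, the general approach is to write a file ending in ".prom", in the Prometheus text format, into the directory that Node Exporter was started with via --collector.textfile.directory. The following is a minimal sketch of that; the metric name, file name and directory are hypothetical and chosen only for illustration.

```python
import os
import tempfile

def write_prom_file(directory, filename, metric, value, help_text):
    """Atomically write one gauge in the Prometheus text exposition format."""
    body = (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )
    # Write to a temporary file in the same directory and then rename it, so
    # that Node Exporter never reads a half-written file. The ".tmp" suffix
    # keeps the temporary file from being picked up as a ".prom" file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file readable by owner only
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path

# Hypothetical usage: a scratch directory stands in for the real
# textfile directory, and the metric is made up.
demo_dir = tempfile.mkdtemp()
path = write_prom_file(
    demo_dir, "mail_queue.prom",
    "example_mail_queue_length", 7, "Messages waiting in the mail queue.",
)
with open(path) as handle:
    print(handle.read())
```

A script like this would typically be run from cron or a timer so that the value stays fresh between scrapes.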

Sharing Grafana dashboards and panels

The actual Grafana interface for the most part requires a BitFolk login to use, but there are ways to share snapshots of dashboards and panels publicly.

Dashboard snapshots

Sharing a dashboard snapshot

At the top of the dashboard is a row of icons. The one with the arrow pointing right will bring up some sharing options. Of these, the middle option, "Snapshot", will create an interactive snapshot and provide the link to it. This snapshot will be publicly available and will allow anyone who knows the link to look through and zoom into the graphs inside the time span you selected. It will not update with new data.

The first option, "Link", will not be of use to you because it merely provides a link to the current dashboard, which requires authentication.

The third option, "Export", only exports the JSON definition of the dashboard which is only of use if you have another Grafana install and access to Prometheus metrics of the same names.

Panel snapshots

Snapshots of individual panels within dashboards are also possible. Hover your cursor over the title bar of the panel and a down-pointing triangle will appear. Click on this and a menu shows up. Select the "share…" option to get a window much like the dashboard share window.

Under the "Link" tab, the only useful option is "Direct link rendered image", which will generate a PNG snapshot of the panel in the requested dimensions that you can save and upload somewhere else. The main link that this tab provides is, again, only available after authenticating, so it is of no use for sharing.

Under the "Snapshot" tab it's possible to create a public snapshot as for dashboards above.

Frequently Answered Questions

Can I edit the dashboard to add new panels?

No. The goal was to provide the same graphs as Cacti with at least the same level of usability, but not to run a hosted install of Grafana+Prometheus. There are service providers who charge for that, or you can do so on your own VM.

That said, if you…

  • have ideas for existing metrics you'd like to see graphed or…
  • can find a way to expose the metrics you want graphed in a format that Prometheus can ingest…

then BitFolk will probably be willing to scrape them and serve them to you in an additional Grafana dashboard.

I have multiple VMs. Do I have to keep logging in and out?

No. There is a dashboard for each VM. Hover over the "Dashboards" icon on the left (looks like four little squares in a square arrangement) and then click "Manage" to see a list of all dashboards.

How long will this retain data for?

There were over 5 years of data in Cacti. BitFolk hopes to initially store a year of data in Prometheus and then evaluate the situation. It should be possible to store any amount of data externally, but access to very old data may be slower.

Why does Grafana show much more data transfer than the BitFolk Panel does?

It's probably because Grafana is seeing every data flow, including to local BitFolk infrastructure such as package mirrors, spamd etc. Those data flows are normally excluded from billing, and it's only the billable data that the Panel is showing.

If you think that doesn't explain it, please tell Support.

Why did my CPU graph in Cacti show values over 100% but Grafana doesn't?

By popular request the Grafana gauge and graphs were normalised to show percent of total available CPU. Therefore they will only show 0–100% no matter how many CPUs you have. Cacti was showing percent of a CPU core, so 100% represented one CPU core.
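The relationship between the two scales is a simple division by the number of vCPUs. A small sketch, assuming the 2 vCPU default mentioned earlier (the function name is made up for the example):

```python
def normalised_cpu_percent(per_core_percent, vcpus=2):
    """Convert 'percent of one core' (Cacti style) to 'percent of all CPUs'."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return per_core_percent / vcpus

print(normalised_cpu_percent(100))  # one core fully busy out of two
print(normalised_cpu_percent(200))  # both cores fully busy
```

So a reading that Cacti would have shown as 200% on a 2 vCPU guest appears as 100% in Grafana.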

Why haven't you imported the 5+ years of data from Cacti?

BitFolk hasn't done this because there is no easy way to do it. If you are aware of one, please let Support know!

The Cacti interface is still available, so you can still look at historical graphs. At some point that interface is going to be switched off but a static dump of all the graphs will be retained.

Will you expose the metrics that you have about our VMs so we can scrape them ourselves?

No, unfortunately this is not likely to happen because separating out metrics related to each individual customer and managing the authorization is too much work.

I've installed Node Exporter on my VM, should I firewall it?

Yes, that would be a good idea. By default it listens on port 9100 of every interface. You need to allow access only from:

  • BitFolk's Prometheus at 85.119.80.242 and 2001:ba8:1f1:f0f2::2
  • BitFolk's monitoring IPs
  • Anywhere else that you yourself want to access it from
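The rule above amounts to an allowlist on port 9100. The sketch below only illustrates that logic and is not a firewall; real enforcement belongs in whatever packet filter your VM uses. The function name is hypothetical, and only the Prometheus addresses listed above are filled in, since the monitoring IPs and your own addresses are not given here.

```python
import ipaddress

NODE_EXPORTER_PORT = 9100  # Node Exporter's default port

# BitFolk's Prometheus addresses as listed above. Add BitFolk's monitoring
# IPs and any addresses of your own; they are deliberately not guessed here.
ALLOWED_SOURCES = [
    ipaddress.ip_network("85.119.80.242/32"),
    ipaddress.ip_network("2001:ba8:1f1:f0f2::2/128"),
]

def source_may_scrape(source_ip, allowed=ALLOWED_SOURCES):
    """Return True if this source should be allowed to reach Node Exporter."""
    address = ipaddress.ip_address(source_ip)
    # Membership tests between IPv4 and IPv6 objects are simply False,
    # so one list can hold both address families.
    return any(address in network for network in allowed)

print(source_may_scrape("85.119.80.242"))
print(source_may_scrape("203.0.113.5"))  # documentation-range address
```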

Can I make Node Exporter use https?

You would have to put an HTTPS reverse proxy (Apache, nginx, haproxy, stunnel, …) in front of it. Don't forget to tell BitFolk that it's https on port 443 (or whatever).

Can I make Node Exporter use HTTP Basic Auth?

You would have to put a reverse proxy in front of it and do the authentication there. Don't forget to tell BitFolk what the access credentials are.

What are the dangers of not firewalling Node Exporter?

  • Malicious people could read all sorts of metrics about your host like what versions of packages are installed, what its mount points are and how busy it is
  • As with all software that is available from remote hosts, if there were some sort of bug in Node Exporter then an unauthenticated attacker would be able to exploit it