Metrics

This topic isn't crucial to understanding Concourse; if you're just getting started and have finished the Installing section, you may want to first move on to Using Concourse.

Metrics are essential in understanding how any large system is behaving and performing. Concourse can emit metrics about both the health of the system itself and the builds that it is running. Operators can tap into these metrics in order to observe the health of the system.

In the spirit of openness, the metrics from our deployment are public. We consider it a bug to emit anything sensitive or secret into our metrics pipeline.

Concourse components and virtual machines emit metrics to Riemann. Riemann handles the processing of this event stream and forwards it to various Time Series Databases (TSDBs). To take full advantage of Concourse metrics you should familiarise yourself with the concepts in Riemann. I'll wait here while you read through that (don't worry, it's not very long!).

Riemann events can contain both tags and attributes, and we use them for different things in Concourse metrics. We use custom attributes to tag every metric with context-specific information, such as the deployment, pipeline, or build that it relates to. We don't make much use of the standard built-in tags: they are normally used for conditionals in Riemann's stream processing, which we don't rely on, and they don't provide the key-value data storage that we need. This is our own convention and isn't shared by everything in the Riemann community, but we find that it works well.
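To make the distinction concrete, here's a minimal Go sketch of the shape of an event as Concourse uses it. The field values are hypothetical, and the struct models the event's structure rather than Riemann's actual wire protocol (Protocol Buffers over TCP or UDP):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event models the parts of a Riemann event that Concourse metrics use.
// Tags are a flat list of strings; Attributes are arbitrary key-value
// pairs, which is what Concourse uses to attach context to each metric.
type Event struct {
	Host       string            `json:"host"`
	Service    string            `json:"service"`
	Metric     float64           `json:"metric"`
	Tags       []string          `json:"tags,omitempty"`
	Attributes map[string]string `json:"attributes,omitempty"`
}

func main() {
	// A hypothetical "build finished" event: the metric is the build's
	// duration in milliseconds, and the attributes carry the context.
	event := Event{
		Host:    "atc-0",
		Service: "build finished",
		Metric:  60000,
		Attributes: map[string]string{
			"pipeline":     "main",
			"job":          "unit",
			"build_name":   "42",
			"build_id":     "1234",
			"build_status": "succeeded",
		},
	}

	out, _ := json.MarshalIndent(event, "", "  ")
	fmt.Println(string(out))
}
```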

Deploying the Metrics Infrastructure

We've made a few BOSH releases that you can colocate and deploy to get a similar setup to ours. If you will be emitting to DataDog then you will only need the Riemann release.

The documentation for each of these lives in the releases themselves.

If you want to mirror our setup then you should set up Riemann to emit metrics to InfluxDB and then point Grafana at the same InfluxDB.
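Once events are flowing into InfluxDB, you can sanity-check the setup by querying it directly; Grafana issues the same sort of queries against the same database. Here's a sketch using the InfluxDB 1.x Go client. The address, database name, and measurement name are all assumptions that depend on how your Riemann-to-InfluxDB forwarding is configured:

```go
package main

import (
	"fmt"
	"log"

	client "github.com/influxdata/influxdb1-client/v2"
)

func main() {
	// Connect to the InfluxDB that Riemann forwards events into.
	c, err := client.NewHTTPClient(client.HTTPConfig{
		Addr: "http://localhost:8086", // assumed address
	})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// The database name ("riemann") and the measurement name are
	// assumptions; they depend on your Riemann forwarder's configuration.
	q := client.NewQuery(
		`SELECT mean("value") FROM "scheduling: full duration (ms)" WHERE time > now() - 1h GROUP BY time(5m)`,
		"riemann",
		"",
	)

	resp, err := c.Query(q)
	if err != nil {
		log.Fatal(err)
	}
	if resp.Error() != nil {
		log.Fatal(resp.Error())
	}

	for _, result := range resp.Results {
		for _, series := range result.Series {
			fmt.Println(series.Name, series.Values)
		}
	}
}
```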

If you set your Concourse to log everything at or above the debug level then all metrics will be logged as well as emitted. This is useful if you haven't yet set up a Riemann server.

Concourse Metrics

This reference section lists all of the metrics that Concourse emits. We don't include the warning and critical levels, as they will keep changing as we optimise the system. To find those, please refer to the source of truth: the code.

scheduling: full duration (ms)

This is the time taken (in milliseconds) to schedule an entire pipeline including the time taken to load the version information from the database and calculate the latest valid versions for each job.

Attributes

pipeline

The pipeline which was being scheduled.

scheduling: loading versions duration (ms)

This is the time taken (in milliseconds) to load the version information from the database.

Attributes

pipeline

The pipeline which was being scheduled.

scheduling: job duration (ms)

This is the time taken (in milliseconds) to calculate the set of valid input versions when scheduling a job. It is emitted once for each job per pipeline scheduling tick.

Attributes

pipeline

The pipeline which was being scheduled.

job

The job which was being scheduled.

worker containers

The number of containers currently running on a worker. It is emitted once per worker.

Attributes

worker

The address of the worker.

build started

This event is emitted when a build starts. Its value is the ID of the build, though the event is most useful for annotating your metrics with the start and end of different jobs.

Attributes

pipeline

The pipeline which contains the build being started.

job

The job which configured the build being started.

build_name

The name of the build being started. (Remember that build numbers in Concourse are actually names and are strings).

build_id

The ID of the build being started.

build finished

This event is emitted when a build ends. Its value is the duration of the build in milliseconds. You can use this metric in conjunction with build started to annotate your metrics with when builds started and stopped.

Attributes

pipeline

The pipeline which contains the build that finished.

job

The job which configured the build that finished.

build_name

The name of the build that finished. (Remember that build numbers in Concourse are actually names and are strings).

build_id

The ID of the build that finished.

build_status

The resulting status of the build; one of "succeeded", "failed", "errored", or "aborted".

http response time

This metric is emitted for each HTTP request to an ATC (both API and web requests). It contains the duration (in milliseconds) for each request and is useful for finding slow requests.

Attributes

route

The route which the HTTP request matched, e.g. /builds/:id.

path

The literal path of the HTTP request, e.g. /builds/1234.
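The difference matters when aggregating: the templated route keeps the number of distinct series small, while the literal path would produce a separate series for every build. Here's a small illustrative sketch (not Concourse code) of grouping request timings by route:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical request timings, as (route, path, duration) samples.
	samples := []struct {
		route, path string
		duration    time.Duration
	}{
		{"/builds/:id", "/builds/1234", 40 * time.Millisecond},
		{"/builds/:id", "/builds/1235", 55 * time.Millisecond},
		{"/builds/:id", "/builds/1236", 47 * time.Millisecond},
	}

	// Grouping by the templated route yields one series with three points;
	// grouping by the literal path would yield three one-point series,
	// which is much harder to graph and alert on.
	byRoute := map[string][]time.Duration{}
	for _, s := range samples {
		byRoute[s.route] = append(byRoute[s.route], s.duration)
	}

	for route, durations := range byRoute {
		var total time.Duration
		for _, d := range durations {
			total += d
		}
		fmt.Printf("%s: %d requests, mean %v\n",
			route, len(durations), total/time.Duration(len(durations)))
	}
}
```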

Service and Infrastructure Metrics

You may have seen on our dashboard that we have other metrics in addition to those defined above. Riemann has existing tooling for gathering metrics from machines and other common services, which can be found in the riemann-tools gem. We use riemann-health, riemann-aws-rds-status, and riemann-net.