Temporal Cloud and SDKs emit metrics that can be used to monitor performance and troubleshoot errors.

Temporal Cloud exposes its metrics through a Prometheus HTTP API endpoint. The open-source SDKs, by contrast, require you to set up a scrape endpoint from which Prometheus can collect and aggregate the Worker and Client metrics.

This article describes how to set up your Temporal Cloud and SDK metrics, and use them as data sources in Grafana.

The process for setting up observability includes the following steps:

  1. Get Prometheus endpoints for Temporal Cloud metrics and SDK metrics.
  2. Run Grafana and set up data sources for Temporal Cloud and SDK metrics in Grafana. The examples in this article describe running Grafana on your local host where you run your application code.
  3. Create dashboards in Grafana to view Temporal Cloud metrics and SDK metrics.
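As a rough illustration of the first step, the sketch below builds an instant-query URL for a Prometheus-compatible HTTP API. The `/api/v1/query` path is the standard Prometheus query path; the base URL, the metric name `example_requests_total`, and the helper `build_query_url` are hypothetical placeholders, not values taken from Temporal Cloud or the SDKs.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for a Prometheus-compatible HTTP API."""
    # Strip a trailing slash so the path is not doubled up.
    root = base_url.rstrip("/")
    # urlencode percent-escapes the PromQL expression for use in a query string.
    return f"{root}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical endpoint and metric name, for illustration only.
EXAMPLE_ENDPOINT = "https://metrics.example.com/prometheus"
EXAMPLE_QUERY = "rate(example_requests_total[5m])"

if __name__ == "__main__":
    print(build_query_url(EXAMPLE_ENDPOINT, EXAMPLE_QUERY))
```

The same base URL is what you would typically enter when adding a Prometheus data source in Grafana; the query string is only needed when calling the API directly.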

All requests made to the Temporal Cluster by the Client or Worker are gRPC requests. When one of these frontend requests can't be completed, you may see the error message "Context: deadline exceeded". Common causes include network interruptions, timeouts, server overload, and Query errors.

The following sections discuss the nature of this error and how to troubleshoot it.
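To make the mechanics of a deadline concrete, here is a minimal sketch that does not use the Temporal SDK or gRPC. It assumes a stand-in coroutine, `slow_frontend_call`, that sleeps instead of doing network I/O, and shows how a caller-side deadline that is shorter than the response time surfaces as a timeout. All names are hypothetical.

```python
import asyncio


async def slow_frontend_call(delay_s: float) -> str:
    """Stand-in for a request to the server; sleeps instead of doing I/O."""
    await asyncio.sleep(delay_s)
    return "ok"


async def call_with_deadline(delay_s: float, deadline_s: float) -> str:
    """Return the call result, or a marker string if the deadline passes first."""
    try:
        return await asyncio.wait_for(slow_frontend_call(delay_s), timeout=deadline_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending call once the timeout expires.
        return "deadline exceeded"


def run(delay_s: float, deadline_s: float) -> str:
    """Synchronous convenience wrapper around call_with_deadline."""
    return asyncio.run(call_with_deadline(delay_s, deadline_s))
```

Calling `run(0.5, 0.01)` takes the timeout branch because the simulated response needs longer than the deadline allows, while `run(0.0, 1.0)` completes normally. In a real deployment the delay would come from one of the causes listed above rather than a sleep.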

The Temporal Cluster and SDKs emit metrics that can be used to monitor performance and troubleshoot issues. To collect and aggregate these metrics, you can use one of the following tools:

  • Prometheus
  • StatsD
  • M3

After you enable your monitoring tool, you can relay these metrics to any monitoring and observability platform.

The current Run Id is mutable: a Workflow Retry changes it. Don't store the current Run Id or base any logical choices on it, because a change of Run Id during a retry can lead to non-determinism issues.
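The following toy model illustrates why the Run Id makes a poor key. It is a simulation only: `SimulatedExecution` and `stable_key` are hypothetical names, not Temporal SDK types, and the sketch assumes that a retry keeps the Workflow Id while assigning a new Run Id.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SimulatedExecution:
    """Toy model of a Workflow Execution; not a Temporal SDK type."""

    workflow_id: str
    # Each new execution object gets a freshly generated Run Id.
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def retry(self) -> "SimulatedExecution":
        """Model a retry: same Workflow Id, new Run Id."""
        return SimulatedExecution(workflow_id=self.workflow_id)


def stable_key(execution: SimulatedExecution) -> str:
    """Key external state by Workflow Id, which is unchanged by a retry."""
    return execution.workflow_id
```

State keyed by `stable_key` is still found after `retry()`, whereas anything keyed by `run_id` would be orphaned as soon as the retry happens.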

Running into limits can cause unexpected failures, so keep them in mind when you design your systems. The following list covers many of the hard (error) and soft (warn) limits that you could encounter while using the Temporal Platform.
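The distinction between a soft and a hard limit can be sketched as a simple classifier. This is an illustration under stated assumptions: `check_limit` and `LimitStatus` are hypothetical names, the thresholds are supplied by the caller rather than taken from the platform, and treating a value as over the limit only when it strictly exceeds the threshold is an assumption of this sketch.

```python
from enum import Enum


class LimitStatus(Enum):
    OK = "ok"
    WARN = "warn"    # soft limit exceeded: operation proceeds with a warning
    ERROR = "error"  # hard limit exceeded: operation fails


def check_limit(value: int, soft_limit: int, hard_limit: int) -> LimitStatus:
    """Classify a measured value against caller-supplied soft and hard limits."""
    if soft_limit > hard_limit:
        raise ValueError("soft_limit must not exceed hard_limit")
    # Check the hard limit first so it takes precedence over the soft one.
    if value > hard_limit:
        return LimitStatus.ERROR
    if value > soft_limit:
        return LimitStatus.WARN
    return LimitStatus.OK
```

With placeholder thresholds of 1000 (soft) and 5000 (hard), a value of 1200 is classified as `WARN` and a value of 6000 as `ERROR`. Checking values this way in your own code before they reach the platform lets you react to a warning before it becomes a failure.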