Metrics collected by Application Signals - Amazon CloudWatch

Metrics collected by Application Signals

Application Signals collects both standard application metrics and runtime metrics from the applications that you enable it for.

Standard application metrics relate to the most critical parts of service performance, latency and availability.

Runtime metrics track application metrics over time, including memory usage, CPU usage, and garbage collection. Application Signals displays the runtime metrics in the context of the services that you have enabled for Application Signals. When you have an operational issue, observing the runtime metrics can be useful to help you find the root cause of the issue. For example, you can see if spikes in latency in your service relate to spikes in a runtime metric.

Standard application metrics collected

Application Signals collects standard application metrics from the services that it discovers. These metrics relate to the most critical aspects of a service's performance: latency, faults, and errors. They can help you identify issues, monitor performance trends, and optimize resources to improve the overall user experience.

The following table lists the metrics collected by Application Signals. These metrics are sent to CloudWatch in the ApplicationSignals namespace.

Metric Description

Latency

The delay before data transfer begins after the request is made.

Units: Milliseconds

Fault

A count of both HTTP 5XX server-side faults and OpenTelemetry span status errors.

Units: None

Error

A count of HTTP 4XX client-side errors. These are considered to be request errors that are not caused by service problems. Therefore, the Availability metric displayed on Application Signals dashboards does not regard these errors as service faults.

Units: None

The Availability metric displayed on Application Signals dashboards is computed as (1 - Faults/Total)*100. Total responses includes all responses and is derived from SampleCount(Latency). Successful responses are all responses without a 5XX error. 4XX responses are treated as successful when Application Signals calculates Availability.

Dimensions collected and dimension combinations

The following dimensions are defined for each of the standard application metrics. For more information about dimensions, see Dimensions.

Different dimensions are collected for service metrics and dependency metrics. Within the services discovered by Application Signals, when microservice A calls microservice B, microservice B is serving the request. In this case, microservice A emits dependency metrics and microservice B emits service metrics. When a client calls microservice A, microservice A is serving the request and emits service metrics.

Dimensions for service metrics

The following dimensions are collected for service metrics.

Dimension Description

Service

The name of the service.

The maximum value is 255 characters.

Operation

The name of the API operation or other activity.

The maxiumum value is 1024 characters. Currently, you can set service level objectives on operations only if the operation name is 194 characters or fewer.

Environment

The name of the environment where services are running. If services are not running on Amazon EKS, you can specify an optional custom value for deployment.environment in the OTEL_ATTRIBUTE_RESOURCES parameter.

The maxiumum value is 259 characters.

When you view these metrics in the CloudWatch console, you can view them using the following dimension combinations:

  • [Environment, Service, Operation, [Latency, Error, Fault]]

  • [Environment, Service, [Latency, Error, Fault]]

Dimensions for dependency metrics

The following dimensions are collected for dependency metrics:

Dimension Description

Service

The name of the service.

The maximum value is 255 characters.

Operation

The name of the API operation or other operation.

The maximum value is 1024 characters.

RemoteService

The name of the remote service being invoked.

The maximum value is 255 characters.

RemoteOperation

The name of the API operation being invoked.

The maximum value is 1024 characters.

Environment

The name of the environment where services are running. If services are not running on Amazon EKS, you can specify an optional custom value for deployment.environment in the OTEL_ATTRIBUTE_RESOURCES parameter.

The maxiumum value is 259 characters.

RemoteEnvironment

The name of the environment where dependency services are running. The RemoteEnvironment parameter is automatically generated when a service calls a dependency and they are both running in the same cluster. Otherwise, RemoteEnvironment is neither generated nor reported in the service dependency's metrics. Currently only available on Amazon EKS and K8S platforms.

The maximum value is 259 characters.

RemoteResourceIdentifier

The name of the resource invoked by a remote call. The RemoteResourceIdentifier parameter is automatically generated if service calls a remote AWS service. Otherwise, RemoteResourceIdentifier is neither generated nor reported in the service dependency's metrics.

The maximum value is 1024 characters.

RemoteResourceType

The type of the resource that is invoked by a remote call. Required only if RemoteResourceIdentifier is defined.

The maximum value is 1024 characters.

When you view these metrics in the CloudWatch console, you can view them using the following dimension combinations:

Running on Amazon EKS clusters

  • [Environment, Service, Operation, RemoteService, RemoteOperation, RemoteEnvironment, RemoteResourceIdentifier, RemoteResourceType, [Latency, Error, Fault]]

  • [Environment, Service, Operation, RemoteService, RemoteOperation, RemoteEnvironment, [Latency, Error, Fault]]

  • [Environment, Service, Operation, RemoteService, RemoteOperation, RemoteResourceIdentifier, RemoteResourceType, [Latency, Error, Fault]]

  • [Environment, Service, Operation, RemoteService, RemoteOperation, [Latency, Error, Fault]]

  • [Environment, Service, RemoteService, RemoteEnvironment, [Latency, Error, Fault]]

  • [Environment, Service, RemoteService, [Latency, Error, Fault]]

  • [Environment, Service, RemoteService, RemoteOperation, RemoteEnvironment, RemoteResourceIdentifier, RemoteResourceType, [Latency, Error, Fault]]

  • [Environment, Service, RemoteService, RemoteOperation, RemoteEnvironment, [Latency, Error, Fault]]

  • [Environment, Service, RemoteService, RemoteOperation, RemoteResourceIdentifier, RemoteResourceType, [Latency, Error, Fault]]

  • [Environment, Service, RemoteService, RemoteOperation, [Latency, Error, Fault]]

  • [RemoteService [Latency, Error, Fault]]

  • [RemoteService, RemoteResourceIdentifier, RemoteResourceType [Latency, Error, Fault]]

Runtime metrics

Application Signals uses the AWS Distro for OpenTelemetry SDK to automatically collect OpenTelemetry-compatible metrics from your Java and Python applications. To have runtime metrics collected, you must meet the following pre-requisites:

  • Your CloudWatch agent must be version 1.300049.1 or later.

  • If you use the Amazon CloudWatch Observability EKS add-on, it must be version 2.30-eksbuild.1 or later. If you update the add-on, you must restart your applications.

  • For Java applications, you must be running 1.32.5 or later of the AWS Distro for OpenTelemetry SDK for Java.

  • For Python applications, you must be running 0.7.0 or later of the AWS Distro for OpenTelemetry SDK for Python.

Runtime metrics are not collected for .NET or Node.js applications.

Runtime metric are charged as part of Application Signals costs. For more information about CloudWatch pricing, see Amazon CloudWatch Pricing.

Note

Known issues

The runtime metrics collection in the Java SDK release v1.32.5 is known to not work with applications using JBoss Wildfly. This issue extends to the Amazon CloudWatch Observability EKS add-on, affecting versions 2.3.0-eksbuild.1 through 2.5.0-eksbuild.1.

If you are impacted, either downgrade the version or disable your runtime metrics collection by adding the environment variable OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED=false to your application.

Java runtime metrics

Application Signals collects the following JVM metrics from Java applications that you enable for Application Signals. All runtime metrics are sent to CloudWatch in the ApplicationSignals namespace, and are collected with the Service and Environment dimension set.

Metric name Description Meaningful statistics

JVMGCDuration

Aggregated metric for the duration of JVM garbage collection actions.

Unit: Milliseconds

Sum, Average, Minimum, Maximum

JVMGCOldGenDuration

Aggregated metric for the duration of JVM garbage collection actions of the old generation. Available only in G1.

Unit: Milliseconds

Sum, Average, Minimum, Maximum

JVMGCYoungGenDuration

Aggregated metric for the duration of JVM garbage collection actions of the young generation. Available only in G1.

Unit: Milliseconds

Sum, Average, Minimum, Maximum

JVMGCCount

Aggregated metric for the number of JVM garbage collection actions.

Unit: None

Sum, Average, Minimum, Maximum

JVMGCOldGenCount

Aggregated metric for the number of JVM garbage collection actions of the old generation. Available only in G1.

Unit: None

Sum, Average, Minimum, Maximum

JVMGCYoungGenCount

Aggregated metric for the number of JVM garbage collection actions of the young generation. Available only in G1.

Unit: None

Sum, Average, Minimum, Maximum

JVMMemoryHeapUsed

The amount of memory heap used.

Unit: Bytes

Average, Minimum, Maximum

JVMMemoryUsedAfterLastGC

Amount of memory used, as measured after the most recent garbage collection event on this pool.

Unit: Bytes

Average, Minimum, Maximum

JVMMemoryOldGenUsed

The amount of memory used by the old generation.

Unit: Bytes

Average, Minimum, Maximum

JVMMemorySurvivorSpaceUsed

The amount of memory heap used by the survivor space.

Unit: Bytes

Average, Minimum, Maximum

JVMMemoryEdenSpaceUsed

The amount of memory used by the eden space.

Unit: Bytes

Average, Minimum, Maximum

JVMMemoryNonHeapUsed

The amount of non-heap memory used.

Unit: Bytes

Average, Minimum, Maximum

JVMThreadCount

The number of executing threads, including both daemon and non-daemon threads.

Unit: None

Sum, Average, Minimum, Maximum

JVMClassLoaded

The number of classes loaded.

Unit: None

Sum, Average, Minimum, Maximum

JVMCpuTime

The CPU time used by the process, as reported by the JVM.

Unit: None (Nanoseconds)

Sum, Average, Minimum, Maximum

JVMCpuRecentUtilization

The number of processors available to the Java virtual machine.

Unit: None

Average, Minimum, Maximum

Python runtime metrics

Application Signals collects the following metrics from Python applications that you enable for Application Signals. All runtime metrics are sent to CloudWatch in the ApplicationSignals namespace, and are collected with the Service and Environment dimension set.

Metric name Description Meaningful statistics

PythonProcessGCCount

The total number of objects currently being tracked.

Unit: None

Sum, Average, Minimum, Maximum

PythonProcessGCGen0Count

The number of objects currently being tracked in Generation 0.

Unit: None

Sum, Average, Minimum, Maximum

PythonProcessGCGen1Count

The number of objects currently being tracked in Generation 1.

Unit: None

Sum, Average, Minimum, Maximum

PythonProcessGCGen2Count

The number of objects currently being tracked in Generation 2.

Unit: None

Sum, Average, Minimum, Maximum

PythonProcessVMSMemoryUsed

The total amount of virtual memory used by the process.

Unit: Bytes

Average, Minimum, Maximum

PythonProcessRSSMemoryUsed

The total amount of non-swapped physical memory used by the process.

Unit: Bytes

Average, Minimum, Maximum

PythonProcessThreadCount

The number of threads currently used by the process.

Unit: None

Sum, Average, Minimum, Maximum

PythonProcessCpuTime

The CPU time used by the process.

Unit: Seconds

Sum, Average, Minimum, Maximum

PythonProcessCpuUtilization

The CPU utilization of the process.

Unit: None

Average, Minimum, Maximum

Disabling the collection of runtime metrics

Runtime metrics are collected by default for Java and Python applications that are enabled for Application Signals. If you want to disable the collection of these metrics, follow the instructions in this section for your environment.

Amazon EKS

To disable runtime metrics in Amazon EKS applications at the application level, add the following environment variable to your workload specification.

env: - name: OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED value: "false"

To disable runtime metrics in Amazon EKS applications at the cluster level, apply the configuration to the advanced configuration of your Amazon CloudWatch Observability EKS add-on.

{ "agent": { "config": { "traces": { "traces_collected": { "application_signals": { } } }, "logs": { "metrics_collected": { "application_signals": { } } } }, "manager": { "autoInstrumentationConfiguration": { "java": { "runtime_metrics": { "enabled": "false" } }, "python": { "runtime_metrics": { "enabled": "false" } } } } }

Amazon ECS

To disable runtime metrics in Amazon ECS applications, add the environment variable OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED=false in the new task definition revision and redeploy the application.

EC2

To disable runtime metrics in Amazon EC2 applications, add the environment variable OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED=false before the application starts.

Kubernetes

To disable runtime metrics in Kubernetes applications at the application level, add the following environment variable to your workload specification.

env: - name: OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED value: "false"

To disable runtime metrics in Kubernetes applications at the cluster level, use the following:

helm upgrade ... \ --set-string manager.autoInstrumentationConfiguration.java.runtime_metrics.enabled=false \ --set-string manager.autoInstrumentationConfiguration.python.runtime_metrics.enabled=false