AWS Container Insights Receiver
AWS Container Insights Receiver (awscontainerinsightreceiver) is an AWS-specific receiver that supports CloudWatch Container Insights. CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. Data is collected as performance log events using the embedded metric format (EMF). From the EMF data, Amazon CloudWatch can create aggregated CloudWatch metrics at the cluster, node, pod, task, and service level.
CloudWatch Container Insights has been supported by the ECS Agent and the CloudWatch Agent, which collect infrastructure metrics for resources such as CPU, memory, disk, and network. To help existing customers migrate to OpenTelemetry, the AWS Container Insights Receiver (together with the CloudWatch EMF Exporter) aims to provide the same CloudWatch Container Insights experience on the following platforms:
Amazon ECS
Amazon EKS
Kubernetes platforms on Amazon EC2
Design of AWS Container Insights Receiver
See the design doc
Example configuration:
receivers:
  awscontainerinsightreceiver:
    # all parameters are optional
    collection_interval: 60s
    container_orchestrator: eks
    add_service_as_attribute: true
    prefer_full_pod_name: false
    add_full_pod_name_metric_label: false
There is no need to provide any parameters since they are all optional.
collection_interval (optional)
The interval at which metrics should be collected. The default is 60 seconds.
container_orchestrator (optional)
The type of container orchestration service, e.g. eks or ecs. The default is eks.
add_service_as_attribute (optional)
Whether to add the associated service name as an attribute. The default is true.
prefer_full_pod_name (optional)
The "PodName" attribute is set based on the name of the relevant controller, such as a DaemonSet, Job, ReplicaSet, or ReplicationController. If it cannot be set that way and prefer_full_pod_name is true, the "PodName" attribute is set to the pod's own name. The default is false.
add_full_pod_name_metric_label (optional)
The "FullPodName" attribute is the pod name including its suffix. If this option is false, the FullPodName label is not added. The default is false.
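As an illustration of the optional parameters described above, the following sketch overrides a few of the defaults for an EKS cluster. The values are illustrative assumptions rather than recommendations; only the parameter names come from this document.

```yaml
receivers:
  awscontainerinsightreceiver:
    # Collect twice as often as the 60s default (illustrative value).
    collection_interval: 30s
    container_orchestrator: eks
    # Fall back to the pod's own name when no controller name can be resolved.
    prefer_full_pod_name: true
    # Also attach the FullPodName label (pod name including suffix).
    add_full_pod_name_metric_label: true
```

Parameters that are left out, such as add_service_as_attribute here, keep their documented defaults.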
Sample configuration for Container Insights
This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an EKS cluster:
# create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aws-otel-eks
  labels:
    name: aws-otel-eks
---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-otel-sa
  namespace: aws-otel-eks
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["otel-container-insight-clusterleader"]
    verbs: ["get", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aoc-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: aws-otel-sa
    namespace: aws-otel-eks
roleRef:
  kind: ClusterRole
  name: aoc-agent-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: aws-otel-eks
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    extensions:
      health_check:
    receivers:
      awscontainerinsightreceiver:
    processors:
      batch/metrics:
        timeout: 60s
    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # node metrics
          - dimensions: [[NodeName, InstanceId, ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - node_cpu_utilization
              - node_memory_utilization
              - node_network_total_bytes
              - node_cpu_reserved_capacity
              - node_memory_reserved_capacity
              - node_number_of_running_pods
              - node_number_of_running_containers
              - node_cpu_usage_total
              - node_cpu_limit
              - node_memory_working_set
              - node_memory_limit
          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization
              - pod_network_rx_bytes
              - pod_network_tx_bytes
              - pod_cpu_utilization_over_pod_limit
              - pod_memory_utilization_over_pod_limit
          - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - pod_cpu_reserved_capacity
              - pod_memory_reserved_capacity
          - dimensions: [[PodName, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_number_of_container_restarts
          # cluster metrics
          - dimensions: [[ClusterName]]
            metric_name_selectors:
              - cluster_node_count
              - cluster_failed_node_count
          # service metrics
          - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods
          # node fs metrics
          - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
            metric_name_selectors:
              - node_filesystem_utilization
          # namespace metrics
          - dimensions: [[Namespace, ClusterName], [ClusterName]]
            metric_name_selectors:
              - namespace_number_of_running_pods
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]
      extensions: [health_check]
---
# create Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-otel-eks-ci
  namespace: aws-otel-eks
spec:
  selector:
    matchLabels:
      name: aws-otel-eks-ci
  template:
    metadata:
      labels:
        name: aws-otel-eks-ci
    spec:
      containers:
        - name: aws-otel-collector
          image: {collector-image-url}
          env:
            #- name: AWS_REGION
            #  value: "us-east-1"
            - name: K8S_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: K8S_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: Always
          command:
            - "/awscollector"
            - "--config=/conf/otel-agent-config.yaml"
          volumeMounts:
            - name: rootfs
              mountPath: /rootfs
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: devdisk
              mountPath: /dev/disk
              readOnly: true
            - name: otel-agent-config-vol
              mountPath: /conf
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
        - name: rootfs
          hostPath:
            path: /
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock
          hostPath:
            path: /run/containerd/containerd.sock
        - name: sys
          hostPath:
            path: /sys
        - name: devdisk
          hostPath:
            path: /dev/disk/
      serviceAccountName: aws-otel-sa
To deploy to an EKS cluster:
kubectl apply -f config.yaml
Available Metrics and Resource Attributes
| Metric | Unit |
|---|---|
| cluster_failed_node_count | Count |
| cluster_node_count | Count |
Resource Attribute
ClusterName
NodeName
Type
Timestamp
Version
Sources
| Metric | Unit |
|---|---|
| namespace_number_of_running_pods | Count |
Resource Attribute
ClusterName
NodeName
Namespace
Type
Timestamp
Version
Sources
kubernetes
| Metric | Unit |
|---|---|
| service_number_of_running_pods | Count |
Resource Attribute
ClusterName
NodeName
Namespace
Service
Type
Timestamp
Version
Sources
kubernetes
| Metric | Unit |
|---|---|
| node_cpu_limit | Millicore |
| node_cpu_request | Millicore |
| node_cpu_reserved_capacity | Percent |
| node_cpu_usage_system | Millicore |
| node_cpu_usage_total | Millicore |
| node_cpu_usage_user | Millicore |
| node_cpu_utilization | Percent |
| node_memory_cache | Bytes |
| node_memory_failcnt | Count |
| node_memory_hierarchical_pgfault | Count/Second |
| node_memory_hierarchical_pgmajfault | Count/Second |
| node_memory_limit | Bytes |
| node_memory_mapped_file | Bytes |
| node_memory_max_usage | Bytes |
| node_memory_pgfault | Count/Second |
| node_memory_pgmajfault | Count/Second |
| node_memory_request | Bytes |
| node_memory_reserved_capacity | Percent |
| node_memory_rss | Bytes |
| node_memory_swap | Bytes |
| node_memory_usage | Bytes |
| node_memory_utilization | Percent |
| node_memory_working_set | Bytes |
| node_network_rx_bytes | Bytes/Second |
| node_network_rx_dropped | Count/Second |
| node_network_rx_errors | Count/Second |
| node_network_rx_packets | Count/Second |
| node_network_total_bytes | Bytes/Second |
| node_network_tx_bytes | Bytes/Second |
| node_network_tx_dropped | Count/Second |
| node_network_tx_errors | Count/Second |
| node_network_tx_packets | Count/Second |
| node_number_of_running_containers | Count |
| node_number_of_running_pods | Count |
Resource Attribute
ClusterName
InstanceType
NodeName
Timestamp
Type
Version
Sources
kubernetes
| Metric | Unit |
|---|---|
| node_diskio_io_serviced_async | Count/Second |
| node_diskio_io_serviced_read | Count/Second |
| node_diskio_io_serviced_sync | Count/Second |
| node_diskio_io_serviced_total | Count/Second |
| node_diskio_io_serviced_write | Count/Second |
| node_diskio_io_service_bytes_async | Bytes/Second |
| node_diskio_io_service_bytes_read | Bytes/Second |
| node_diskio_io_service_bytes_sync | Bytes/Second |
| node_diskio_io_service_bytes_total | Bytes/Second |
| node_diskio_io_service_bytes_write | Bytes/Second |
Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
NodeName
Timestamp
EBSVolumeId
device
Type
Version
Sources
kubernetes
| Metric | Unit |
|---|---|
| node_filesystem_available | Bytes |
| node_filesystem_capacity | Bytes |
| node_filesystem_inodes | Count |
| node_filesystem_inodes_free | Count |
| node_filesystem_usage | Bytes |
| node_filesystem_utilization | Percent |
Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
NodeName
Timestamp
EBSVolumeId
device
fstype
Type
Version
Sources
kubernetes
| Metric | Unit |
|---|---|
| node_interface_network_rx_bytes | Bytes/Second |
| node_interface_network_rx_dropped | Count/Second |
| node_interface_network_rx_errors | Count/Second |
| node_interface_network_rx_packets | Count/Second |
| node_interface_network_total_bytes | Bytes/Second |
| node_interface_network_tx_bytes | Bytes/Second |
| node_interface_network_tx_dropped | Count/Second |
| node_interface_network_tx_errors | Count/Second |
| node_interface_network_tx_packets | Count/Second |
Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
NodeName
Timestamp
Type
Version
interface
Sources
kubernetes
| Metric | Unit |
|---|---|
| pod_cpu_limit | Millicore |
| pod_cpu_request | Millicore |
| pod_cpu_reserved_capacity | Percent |
| pod_cpu_usage_system | Millicore |
| pod_cpu_usage_total | Millicore |
| pod_cpu_usage_user | Millicore |
| pod_cpu_utilization | Percent |
| pod_cpu_utilization_over_pod_limit | Percent |
| pod_memory_cache | Bytes |
| pod_memory_failcnt | Count |
| pod_memory_hierarchical_pgfault | Count/Second |
| pod_memory_hierarchical_pgmajfault | Count/Second |
| pod_memory_limit | Bytes |
| pod_memory_mapped_file | Bytes |
| pod_memory_max_usage | Bytes |
| pod_memory_pgfault | Count/Second |
| pod_memory_pgmajfault | Count/Second |
| pod_memory_request | Bytes |
| pod_memory_reserved_capacity | Percent |
| pod_memory_rss | Bytes |
| pod_memory_swap | Bytes |
| pod_memory_usage | Bytes |
| pod_memory_utilization | Percent |
| pod_memory_utilization_over_pod_limit | Percent |
| pod_memory_working_set | Bytes |
| pod_network_rx_bytes | Bytes/Second |
| pod_network_rx_dropped | Count/Second |
| pod_network_rx_errors | Count/Second |
| pod_network_rx_packets | Count/Second |
| pod_network_total_bytes | Bytes/Second |
| pod_network_tx_bytes | Bytes/Second |
| pod_network_tx_dropped | Count/Second |
| pod_network_tx_errors | Count/Second |
| pod_network_tx_packets | Count/Second |
| pod_number_of_container_restarts | Count |
| pod_number_of_containers | Count |
| pod_number_of_running_containers | Count |
Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
K8sPodName
Namespace
NodeName
PodId
Timestamp
Type
Version
Sources
kubernetes
pod_status
| Metric | Unit |
|---|---|
| pod_interface_network_rx_bytes | Bytes/Second |
| pod_interface_network_rx_dropped | Count/Second |
| pod_interface_network_rx_errors | Count/Second |
| pod_interface_network_rx_packets | Count/Second |
| pod_interface_network_total_bytes | Bytes/Second |
| pod_interface_network_tx_bytes | Bytes/Second |
| pod_interface_network_tx_dropped | Count/Second |
| pod_interface_network_tx_errors | Count/Second |
| pod_interface_network_tx_packets | Count/Second |
Resource Attribute
AutoScalingGroupName
ClusterName
InstanceId
InstanceType
K8sPodName
Namespace
NodeName
PodId
Timestamp
Type
Version
interface
Sources
kubernetes
pod_status
| Metric | Unit |
|---|---|
| container_cpu_limit | Millicore |
| container_cpu_request | Millicore |
| container_cpu_usage_system | Millicore |
| container_cpu_usage_total | Millicore |
| container_cpu_usage_user | Millicore |
| container_cpu_utilization | Percent |
| container_memory_cache | Bytes |
| container_memory_failcnt | Count |
| container_memory_hierarchical_pgfault | Count/Second |
| container_memory_hierarchical_pgmajfault | Count/Second |
| container_memory_limit | Bytes |
| container_memory_mapped_file | Bytes |
| container_memory_max_usage | Bytes |
| container_memory_pgfault | Count/Second |
| container_memory_pgmajfault | Count/Second |
| container_memory_request | Bytes |
| container_memory_rss | Bytes |
| container_memory_swap | Bytes |
| container_memory_usage | Bytes |
| container_memory_utilization | Percent |
| container_memory_working_set | Bytes |
| number_of_container_restarts | Count |
Resource Attribute
AutoScalingGroupName
ClusterName
ContainerId
ContainerName
InstanceId
InstanceType
K8sPodName
Namespace
NodeName
PodId
Timestamp
Type
Version
Sources
kubernetes
container_status
container_status_reason
container_last_termination_reason
The attribute container_status_reason is present only when container_status is in the "Waiting" or "Terminated" state. The attribute container_last_termination_reason is present only when container_status is in the "Terminated" state.
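The EKS sample configuration above publishes node, pod, cluster, service, and namespace metrics, but no container-level ones. As a hedged sketch, assuming awsemf metric declarations behave for container metrics the same way as for the others in that sample, an additional declaration could look like the fragment below. The choice of dimensions and metrics is an illustrative assumption; the names themselves are taken from the container metric and resource attribute lists above.

```yaml
exporters:
  awsemf:
    metric_declarations:
      # container metrics (illustrative addition, not part of the sample above)
      - dimensions: [[ContainerName, Namespace, ClusterName]]
        metric_name_selectors:
          - container_cpu_utilization
          - container_memory_utilization
```

In practice this entry would be appended to the existing metric_declarations list in the ConfigMap rather than replacing it.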
This is a sample configuration for AWS Container Insights using the awscontainerinsightreceiver and awsemfexporter for an ECS cluster to collect the instance-level metrics:
receivers:
  awscontainerinsightreceiver:
    collection_interval: 10s
    container_orchestrator: ecs
processors:
  batch/metrics:
    timeout: 60s
exporters:
  awsemf:
    namespace: ContainerInsightsEC2Instance
    log_group_name: '/aws/ecs/containerinsights/{ClusterName}/performance'
    log_stream_name: 'instanceTelemetry/{ContainerInstanceId}'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources]
    metric_declarations:
      # instance metrics
      - dimensions: [[ContainerInstanceId, InstanceId, ClusterName]]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_filesystem_utilization
      - dimensions: [[ClusterName]]
        metric_name_selectors:
          - instance_cpu_utilization
          - instance_memory_utilization
          - instance_network_total_bytes
          - instance_cpu_reserved_capacity
          - instance_memory_reserved_capacity
          - instance_number_of_running_tasks
          - instance_cpu_usage_total
          - instance_cpu_limit
          - instance_memory_working_set
          - instance_memory_limit
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [awscontainerinsightreceiver]
      processors: [batch/metrics]
      exporters: [awsemf, debug]
To deploy to an ECS cluster, see this doc for details.
Available Metrics and Resource Attributes
| Metric | Unit |
|---|---|
| instance_cpu_limit | Millicore |
| instance_cpu_reserved_capacity | Percent |
| instance_cpu_usage_system | Millicore |
| instance_cpu_usage_total | Millicore |
| instance_cpu_usage_user | Millicore |
| instance_cpu_utilization | Percent |
| instance_memory_cache | Bytes |
| instance_memory_failcnt | Count |
| instance_memory_hierarchical_pgfault | Count/Second |
| instance_memory_hierarchical_pgmajfault | Count/Second |
| instance_memory_limit | Bytes |
| instance_memory_mapped_file | Bytes |
| instance_memory_max_usage | Bytes |
| instance_memory_pgfault | Count/Second |
| instance_memory_pgmajfault | Count/Second |
| instance_memory_reserved_capacity | Percent |
| instance_memory_rss | Bytes |
| instance_memory_swap | Bytes |
| instance_memory_usage | Bytes |
| instance_memory_utilization | Percent |
| instance_memory_working_set | Bytes |
| instance_network_rx_bytes | Bytes/Second |
| instance_network_rx_dropped | Count/Second |
| instance_network_rx_errors | Count/Second |
| instance_network_rx_packets | Count/Second |
| instance_network_total_bytes | Bytes/Second |
| instance_network_tx_bytes | Bytes/Second |
| instance_network_tx_dropped | Count/Second |
| instance_network_tx_errors | Count/Second |
| instance_network_tx_packets | Count/Second |
| instance_number_of_running_tasks | Count |
Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId
| Metric | Unit |
|---|---|
| instance_diskio_io_serviced_async | Count/Second |
| instance_diskio_io_serviced_read | Count/Second |
| instance_diskio_io_serviced_sync | Count/Second |
| instance_diskio_io_serviced_total | Count/Second |
| instance_diskio_io_serviced_write | Count/Second |
| instance_diskio_io_service_bytes_async | Bytes/Second |
| instance_diskio_io_service_bytes_read | Bytes/Second |
| instance_diskio_io_service_bytes_sync | Bytes/Second |
| instance_diskio_io_service_bytes_total | Bytes/Second |
| instance_diskio_io_service_bytes_write | Bytes/Second |
Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId
EBSVolumeId
| Metric | Unit |
|---|---|
| instance_filesystem_available | Bytes |
| instance_filesystem_capacity | Bytes |
| instance_filesystem_inodes | Count |
| instance_filesystem_inodes_free | Count |
| instance_filesystem_usage | Bytes |
| instance_filesystem_utilization | Percent |
Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId
EBSVolumeId
| Metric | Unit |
|---|---|
| instance_interface_network_rx_bytes | Bytes/Second |
| instance_interface_network_rx_dropped | Count/Second |
| instance_interface_network_rx_errors | Count/Second |
| instance_interface_network_rx_packets | Count/Second |
| instance_interface_network_total_bytes | Bytes/Second |
| instance_interface_network_tx_bytes | Bytes/Second |
| instance_interface_network_tx_dropped | Count/Second |
| instance_interface_network_tx_errors | Count/Second |
| instance_interface_network_tx_packets | Count/Second |
Resource Attribute
ClusterName
InstanceType
AutoScalingGroupName
Timestamp
Type
Version
Sources
ContainerInstanceId
InstanceId
EBSVolumeId
When using this component, the collector process needs root permission to be able to read the content of the files located in the following locations:
/
/var/run/docker.sock
/var/lib/docker
/run/containerd/containerd.sock
/sys
/dev/disk
This requirement comes from the fact that this component is based on cAdvisor.
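On Kubernetes, one way to meet this requirement is to run the collector container as root through its securityContext. The fragment below is a minimal sketch, assuming the container name aws-otel-collector from the EKS sample DaemonSet; whether it is needed at all depends on the user your collector image runs as by default, which this document does not specify.

```yaml
# Hypothetical fragment to merge into the DaemonSet from the EKS sample.
spec:
  template:
    spec:
      containers:
        - name: aws-otel-collector
          securityContext:
            # Run the collector process as root (UID 0) so it can read
            # the host paths mounted into the container.
            runAsUser: 0
```

The host paths themselves still have to be mounted into the container, as the volumes and volumeMounts in the sample DaemonSet do.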