TensorFlow - Amazon EMR

TensorFlow

TensorFlow is an open-source symbolic math library for machine intelligence and deep learning applications. For more information, see the TensorFlow website. TensorFlow is available with Amazon EMR release version 5.17.0 and later.

The following table lists the version of TensorFlow included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with TensorFlow.

For the version of components installed with TensorFlow in this release, see Release 7.5.0 Component Versions.

TensorFlow version information for emr-7.5.0

Amazon EMR Release Label | TensorFlow Version | Components Installed With TensorFlow
emr-7.5.0 | TensorFlow 2.16.1 | emrfs, emr-goodies, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, tensorflow

The following table lists the version of TensorFlow included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with TensorFlow.

For the version of components installed with TensorFlow in this release, see Release 6.15.0 Component Versions.

TensorFlow version information for emr-6.15.0

Amazon EMR Release Label | TensorFlow Version | Components Installed With TensorFlow
emr-6.15.0 | TensorFlow 2.11.0 | emrfs, emr-goodies, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, tensorflow

The following table lists the version of TensorFlow included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with TensorFlow.

For the version of components installed with TensorFlow in this release, see Release 5.36.2 Component Versions.

TensorFlow version information for emr-5.36.2

Amazon EMR Release Label | TensorFlow Version | Components Installed With TensorFlow
emr-5.36.2 | TensorFlow 2.4.1 | emrfs, emr-goodies, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, tensorflow

TensorFlow builds by Amazon EC2 instance type

Amazon EMR uses different builds of the TensorFlow library depending on the instance types that you choose for your cluster. Amazon EMR supports TensorFlow on clusters with aarch64 (Graviton) instance types in release 7.5.0 and later. The following table lists builds by instance type.

EC2 instance types | TensorFlow build
M5 and C5 | TensorFlow 2.16.1 with Intel MKL optimization
P2, P4D, P5, G4DN, G5, G6 and GR6 | TensorFlow 2.16.1 with CUDA 12.3, cuDNN 8.9.7.29
P3, P3DN, G3 and G3S | TensorFlow 2.16.1 with CUDA 12.3, cuDNN 8.9.7.29, NCCL 2.20.3-1
All others except Graviton instances | TensorFlow 2.16.1

Note: Nvidia NCCL is available only on P3 instances. End User License Agreement (EULA): By using Nvidia components on Amazon EMR, you agree to the terms and conditions outlined in the product EULA.
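
The table above can be read as a lookup from instance family to build. The following Python sketch only illustrates that lookup and is not part of Amazon EMR. The function name `tensorflow_build`, the `graviton` flag, and the exact-match rule on the family name (the text before the first dot in the instance type) are assumptions. The table does not name a build for Graviton instances, so the sketch returns `None` for them, and variants such as m5d or c5n are treated as "all others" because the table does not mention them.

```python
# Builds as listed in the table above, keyed by instance family.
BUILDS_BY_FAMILY = {
    ("m5", "c5"): "TensorFlow 2.16.1 with Intel MKL optimization",
    ("p2", "p4d", "p5", "g4dn", "g5", "g6", "gr6"):
        "TensorFlow 2.16.1 with CUDA 12.3, cuDNN 8.9.7.29",
    ("p3", "p3dn", "g3", "g3s"):
        "TensorFlow 2.16.1 with CUDA 12.3, cuDNN 8.9.7.29, NCCL 2.20.3-1",
}
DEFAULT_BUILD = "TensorFlow 2.16.1"


def tensorflow_build(instance_type, graviton=False):
    """Return the build listed for an instance type (hypothetical helper).

    The caller states whether the type is a Graviton (aarch64) type;
    the table gives no build for those, so None is returned.
    """
    if graviton:
        return None
    # Assumption: the family is the text before the first dot,
    # for example "p3" in "p3.2xlarge", matched exactly.
    family = instance_type.split(".", 1)[0].lower()
    for families, build in BUILDS_BY_FAMILY.items():
        if family in families:
            return build
    return DEFAULT_BUILD


print(tensorflow_build("p3.2xlarge"))
print(tensorflow_build("r5.xlarge"))
```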

Security

In addition to following the guidance in Using TensorFlow securely, we recommend that you launch your cluster in a private subnet to help you limit access to trusted sources. For more information, see Amazon VPC options in the Amazon EMR Management Guide.
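
As a rough illustration of the subnet recommendation, the following Python sketch assembles the parameters for a cluster launch request that installs TensorFlow and places the cluster in a specific subnet. The helper name `tensorflow_cluster_request`, the cluster name, the instance types and counts, and the subnet ID are hypothetical placeholders. The sketch assumes the default EMR role names and does not check that the subnet is actually private.

```python
import json


def tensorflow_cluster_request(subnet_id, release_label="emr-7.5.0"):
    """Build parameters for an EMR RunJobFlow call (hypothetical helper)."""
    return {
        "Name": "tensorflow-cluster",  # placeholder cluster name
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "TensorFlow"}],
        "Instances": {
            # The ID of the private subnet you want to launch into.
            "Ec2SubnetId": subnet_id,
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# "subnet-0123456789abcdef0" is a made-up ID used only for illustration.
request = tensorflow_cluster_request("subnet-0123456789abcdef0")
print(json.dumps(request, indent=2))

# With boto3 installed and AWS credentials configured, the request
# could be sent as:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

The request is built as a plain dictionary so that it can be reviewed or logged before anything is launched.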

Using TensorBoard

TensorBoard is a suite of visualization tools for TensorFlow programs. For more information, see TensorBoard: Visualized learning on the TensorFlow website.

To use TensorBoard with Amazon EMR, you must start TensorBoard on the cluster master node.

To use TensorBoard with TensorFlow on Amazon EMR
  1. Connect to the master node of the cluster using SSH. For more information, see Connect to the master node using SSH in the Amazon EMR Management Guide.

  2. Enter the following command to start TensorBoard on the master node. Replace the --logdir value with a directory on the master node where you have generated and stored summary data using a summary writer.

    Amazon EMR 5.19.0 and later
    python3 -m tensorboard.main --logdir=/home/hadoop/tensor --bind_all
    Amazon EMR 5.18.1 and earlier
    python3 -m tensorboard.main --logdir=/my/log/dir

    By default, the master node hosts TensorBoard using port 6006 and the master public DNS name. After you start TensorBoard, the command line output presents the URL that can be used to connect to TensorBoard, as shown in the following example:

    TensorBoard 2.16.1 at http://master-public-dns-name:6006 (Press CTRL+C to quit)
  3. Set up access to web interfaces on the master node from trusted clients. For more information, see View web interfaces hosted on Amazon EMR clusters in the Amazon EMR Management Guide.

  4. Open TensorBoard at http://master-public-dns-name:6006.
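
The two command variants in step 2 differ only in the `--bind_all` flag, which the procedure shows for Amazon EMR 5.19.0 and later. The following Python sketch picks the variant from a release label. The helper names `parse_release` and `tensorboard_command` are hypothetical, and the sketch assumes release labels of the form `emr-X.Y.Z`.

```python
import re
import shlex


def parse_release(label):
    """Turn a release label such as 'emr-5.19.0' into a tuple (5, 19, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def tensorboard_command(release_label, logdir):
    """Build the TensorBoard start command shown in step 2."""
    command = ["python3", "-m", "tensorboard.main", f"--logdir={logdir}"]
    # The procedure adds --bind_all for Amazon EMR 5.19.0 and later.
    if parse_release(release_label) >= (5, 19, 0):
        command.append("--bind_all")
    # shlex.join quotes arguments where needed, such as paths with spaces.
    return shlex.join(command)


print(tensorboard_command("emr-7.5.0", "/home/hadoop/tensor"))
print(tensorboard_command("emr-5.18.1", "/my/log/dir"))
```

Comparing version tuples rather than strings keeps the ordering numeric, so a release such as 5.9.0 sorts before 5.19.0.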