Protect Communications Between ML Compute Instances in a Distributed Training Job
By default, Amazon SageMaker AI runs training jobs in an Amazon Virtual Private Cloud (Amazon VPC) to help keep your data secure. You can add another level of security to protect your training containers and data by configuring a private VPC. Distributed ML frameworks and algorithms usually transmit information that is directly related to the model, such as weights, rather than the training dataset itself. When you run distributed training, you can further protect the data transmitted between instances by enabling inter-container traffic encryption, which can also help you comply with regulatory requirements.
Note
For use cases in the healthcare sector, the best practice for security is to encrypt communication between the nodes.
Enabling inter-container traffic encryption can increase training time, especially for distributed deep learning algorithms. It doesn't affect training jobs that use a single compute instance. For training jobs with multiple compute instances, the effect on training time depends on how much the instances communicate with each other. For affected algorithms, this additional level of security also increases cost, because you pay for the longer training time. Training time for most SageMaker AI built-in algorithms, such as XGBoost, DeepAR, and linear learner, typically isn't affected.
You can enable inter-container traffic encryption for training jobs or hyperparameter tuning jobs by using the SageMaker API or the SageMaker AI console.
For information about running training jobs in a private VPC, see Give SageMaker AI Training Jobs Access to Resources in Your Amazon VPC.
Enable Inter-container Traffic Encryption (API)
Before enabling inter-container traffic encryption on training or hyperparameter tuning jobs with APIs, add inbound and outbound rules to your private VPC's security group.
To enable inter-container traffic encryption (API)
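- Add the following inbound and outbound rules to the security group for your private VPC. In each rule, the source (or, for an outbound rule, the destination) is the ID of the security group itself.

  Protocol    Port Range    Source
  UDP         500           Self Security Group ID
  ESP (50)    N/A           Self Security Group ID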
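- When you send a request to the CreateTrainingJob or CreateHyperParameterTuningJob API, specify True for the EnableInterContainerTrafficEncryption parameter. For CreateHyperParameterTuningJob, this parameter is part of the TrainingJobDefinition structure.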
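Step 1 can also be scripted. The following is a minimal Python sketch that builds the two rules from the table in the shape the Amazon EC2 API expects (IpPermissions entries whose UserIdGroupPairs reference the group itself) and applies them with a boto3 EC2 client's authorize_security_group_ingress and authorize_security_group_egress calls. The helper names build_ipsec_permissions and add_ipsec_rules are hypothetical, and the client is passed in rather than created, so the rule-building logic can be checked without AWS credentials. UDP 500 (IKE) and IP protocol 50 (ESP) are the conventional IPsec ports; because ESP has no ports, the ESP entry omits FromPort and ToPort.

```python
from typing import Any, Dict, List


def build_ipsec_permissions(security_group_id: str) -> List[Dict[str, Any]]:
    """Return the two self-referencing rules as EC2 IpPermissions entries."""
    return [
        # UDP port 500: IKE, the IPsec key-exchange protocol.
        {
            "IpProtocol": "udp",
            "FromPort": 500,
            "ToPort": 500,
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        },
        # IP protocol 50: ESP, which carries the encrypted packets.
        # ESP has no ports, so FromPort/ToPort are omitted.
        {
            "IpProtocol": "50",
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        },
    ]


def add_ipsec_rules(ec2_client: Any, security_group_id: str) -> None:
    """Add the rules to a security group as both inbound and outbound rules.

    ec2_client is expected to be a boto3 EC2 client, for example
    boto3.client("ec2"). Referencing the group itself means only instances
    that share this security group are allowed to exchange this traffic.
    """
    permissions = build_ipsec_permissions(security_group_id)
    ec2_client.authorize_security_group_ingress(
        GroupId=security_group_id, IpPermissions=permissions
    )
    ec2_client.authorize_security_group_egress(
        GroupId=security_group_id, IpPermissions=permissions
    )
```

For example, add_ipsec_rules(boto3.client("ec2"), "sg-0123456789abcdef0") applies both rules to an existing group (the group ID here is a placeholder). Amazon EC2 returns an error if an identical rule already exists in the group, so running the function twice against the same group is expected to fail on the second run.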
Note
For the ESP (50) protocol, the security groups page of the AWS console might display the port range as "All". However, Amazon EC2 ignores the specified port range because port ranges don't apply to ESP (IP protocol 50).
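The API request from step 2 can be sketched in Python as plain dictionaries that are passed to a boto3 SageMaker client. This is a minimal sketch under stated assumptions: the helper functions training_job_request, tuning_job_definition, and vpc_config are hypothetical, and the instance type, volume size, maximum runtime, and default instance count of 2 are placeholder values to replace with your own. VpcConfig points at the subnets and the security group that carries the rules from step 1, and tuning_job_definition places the flag inside TrainingJobDefinition, where CreateHyperParameterTuningJob expects it.

```python
from typing import Any, Dict, List


def vpc_config(subnet_ids: List[str], security_group_ids: List[str]) -> Dict[str, Any]:
    # The security group must carry the UDP 500 and ESP (protocol 50) rules.
    return {"Subnets": subnet_ids, "SecurityGroupIds": security_group_ids}


def training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    output_s3_uri: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    instance_count: int = 2,  # encryption only matters with 2+ instances
) -> Dict[str, Any]:
    """Build keyword arguments for SageMaker CreateTrainingJob (placeholder values)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # placeholder
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,  # placeholder
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # placeholder
        "VpcConfig": vpc_config(subnet_ids, security_group_ids),
        # Encrypt traffic between the training instances.
        "EnableInterContainerTrafficEncryption": True,
    }


def tuning_job_definition(training_request: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a TrainingJobDefinition for CreateHyperParameterTuningJob.

    For tuning jobs the encryption flag lives inside TrainingJobDefinition,
    not at the top level of the request, and there is no TrainingJobName.
    """
    keys = (
        "AlgorithmSpecification",
        "RoleArn",
        "OutputDataConfig",
        "ResourceConfig",
        "StoppingCondition",
        "VpcConfig",
        "EnableInterContainerTrafficEncryption",
    )
    return {key: training_request[key] for key in keys}
```

For example, boto3.client("sagemaker").create_training_job(**training_job_request(...)) would submit the training job. A complete CreateHyperParameterTuningJob request also needs HyperParameterTuningJobName and HyperParameterTuningJobConfig, which this sketch leaves out.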
Enable Inter-container Traffic Encryption (Console)
Enable Inter-container Traffic Encryption in a Training Job
To enable inter-container traffic encryption in a training job
- Open the Amazon SageMaker AI console at https://console.aws.amazon.com/sagemaker/.
- In the navigation pane, choose Training, then choose Training jobs.
- Choose Create training job.
- Under Network, choose a VPC. You can use the default VPC or one that you have created.
- Choose Enable inter-container traffic encryption.
After you enable inter-container traffic encryption, finish creating the training job. For more information, see Train a Model.
Enable Inter-container Traffic Encryption in a Hyperparameter Tuning Job
To enable inter-container traffic encryption in a hyperparameter tuning job
- Open the Amazon SageMaker AI console at https://console.aws.amazon.com/sagemaker/.
- In the navigation pane, choose Training, then choose Hyperparameter tuning jobs.
- Choose Create hyperparameter tuning job.
- Under Network, choose a VPC. You can use the default VPC or one that you have created.
- Choose Enable inter-container traffic encryption.
After enabling inter-container traffic encryption, finish creating the hyperparameter tuning job. For more information, see Configure and Launch a Hyperparameter Tuning Job.