CfnCluster¶
CfnCluster (“cloud formation cluster”) is a framework that deploys and maintains high performance computing clusters on Amazon Web Services (AWS). Developed by AWS, CfnCluster facilitates both quick start proof of concepts (POCs) and production deployments. CfnCluster supports many different types of clustered applications and can easily be extended to support different frameworks. Download CfnCluster today to see how CfnCluster’s command line interface leverages AWS CloudFormation templates and other AWS cloud services.
Getting started with CfnCluster¶
Installing CfnCluster¶
The current working version is CfnCluster-v0.0.22. The CLI is written in Python and uses Boto for AWS actions. You can install the CLI with the following commands, depending on your OS.
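On Linux or OSX, for example, the CLI can typically be installed with pip (a minimal sketch, assuming pip is already available):
$ sudo pip install cfncluster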
Windows¶
Windows support is experimental!!
Install the following packages:
- Python2.7 - https://www.python.org/download/
- setuptools - https://pypi.python.org/pypi/setuptools#windows-7-or-graphical-install
Once installed, you should update your environment variables so that the Python install directory and the Python Scripts directory are in the PATH, for example: C:\Python27;C:\Python27\Scripts
Now it should be possible to run the following within a command prompt window:
C:\> easy_install CfnCluster
Upgrading¶
To upgrade an older version of CfnCluster, you can use either of the following commands, depending on how it was originally installed:
$ sudo pip install --upgrade cfncluster
or
$ sudo easy_install -U cfncluster
Remember when upgrading to check that the existing config is compatible with the latest version installed.
Configuring CfnCluster¶
Once installed, you will need to set up some initial configuration. The easiest way to do this is shown below:
$ cfncluster configure
This configure wizard will prompt you for everything you need to create your cluster. You will first be prompted for your cluster name, which is the logical name of your cluster.
Cluster Name [mycluster]:
Next, you will be prompted for your AWS Access & Secret Keys. Enter the keys for an IAM user with administrative privileges. These can also be read from your environment variables or the AWS CLI config.
AWS Access Key ID []:
AWS Secret Access Key ID []:
Now, you will be presented with a list of valid AWS region identifiers. Choose the region in which you’d like your cluster to run.
Acceptable Values for AWS Region ID:
us-east-1
cn-north-1
ap-northeast-1
eu-west-1
ap-southeast-1
ap-southeast-2
us-west-2
us-gov-west-1
us-west-1
eu-central-1
sa-east-1
AWS Region ID []:
Choose a descriptive name for your VPC. Typically, this will be something like “production” or “test”.
VPC Name [myvpc]:
Next, you will need to choose a keypair that already exists in EC2 in order to log into your master instance. If you do not already have a keypair, refer to the EC2 documentation on EC2 Key Pairs - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html
Acceptable Values for Key Name:
keypair1
keypair-test
production-key
Key Name []:
Choose the VPC ID into which you’d like your cluster launched.
Acceptable Values for VPC ID:
vpc-1kd24879
vpc-blk4982d
VPC ID []:
Finally, choose the subnet in which you’d like your master server to run.
Acceptable Values for Master Subnet ID:
subnet-9k284a6f
subnet-1k01g357
subnet-b921nv04
Master Subnet ID []:
A simple cluster launches into a VPC and uses an existing subnet that supports public IPs, i.e. the route table for the subnet is 0.0.0.0/0 => igw-xxxxxx. The VPC must have “DNS Resolution = yes” and “DNS Hostnames = yes”. It should also have DHCP options with the correct “domain-name” for the region, as defined in the docs: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_DHCP_Options.html
Once all of those settings contain valid values, you can launch the cluster with the create command:
$ cfncluster create mycluster
Once the cluster reaches the “CREATE_COMPLETE” status, you can connect using your normal SSH client/settings. For more details on connecting to EC2 instances, check the EC2 User Guide - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-connect-to-instance-linux.html#using-ssh-client
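For example, assuming the default Amazon Linux AMI and its ec2-user account (the key path and IP address are placeholders):
$ ssh -i /path/to/keypair.pem ec2-user@<master-public-ip>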
Working with CfnCluster¶
Network Configurations¶
CfnCluster leverages Amazon Virtual Private Cloud (VPC) for networking. This provides a very flexible and configurable networking platform in which to deploy clusters. CfnCluster supports the following high-level configurations:
- Single subnet, master and compute in the same subnet
- Two subnets, master in one subnet and compute in a new private subnet
- Two subnets, master in one subnet and compute in an existing private subnet
All of these configurations can operate with or without public IP addressing. Clusters can also be deployed to leverage an HTTP proxy for all AWS requests. Combining these options results in many different deployment scenarios, ranging from a single public subnet with all access over the Internet, to fully private via AWS Direct Connect and an HTTP proxy for all traffic.
Below are architecture diagrams for some of those scenarios:

CfnCluster in a single public subnet
The configuration for this architecture requires the following settings:
note that all values are examples only
[vpc public]
vpc_id = vpc-a1b2c3d4
master_subnet_id = subnet-a1b2c3d4

CfnCluster using two subnets(new private)
The configuration for this architecture requires the following settings:
note that all values are examples only
[vpc public-private]
vpc_id = vpc-a1b2c3d4
master_subnet_id = subnet-a1b2c3d4
compute_subnet_cidr = 10.0.1.0/24

CfnCluster in a private subnet connected using Direct Connect
The configuration for this architecture requires the following settings:
note that all values are examples only
[vpc private-dx]
vpc_id = vpc-a1b2c3d4
master_subnet_id = subnet-a1b2c3d4
proxy_server = http://proxy.corp.net:8080
use_public_ips = false
Custom Bootstrap Actions¶
CfnCluster can execute arbitrary code either before (pre) or after (post) the main bootstrap action during cluster creation. This code is typically stored in S3 and accessed via HTTP(S) during cluster creation. The code is executed as root and can be in any scripting language supported by the cluster OS, typically bash or python.
Pre-install actions are called before any of the cluster deployment bootstrap, such as configuring NAT, EBS and the scheduler. Typical pre-install actions may include modifying storage, or adding extra users or packages.
Post-install actions are called after the cluster bootstrap is complete, as the last action before an instance is considered ready. Typical post-install actions may include changing scheduler settings, or modifying storage or packages.
Arguments can be passed to scripts by specifying them in the config. These will be passed double-quoted to the pre/post-install actions.
If a pre/post-install action fails, the instance bootstrap is considered failed and will not continue. Success is signalled with an exit code of 0; any other exit code is considered a failure.
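As a minimal sketch of a post-install script (assuming the configured arguments arrive as ordinary positional shell parameters; the log path is illustrative only):
#!/bin/bash
# Log whatever arguments were passed via post_install_args (illustrative)
echo "post-install args: $*" >> /var/log/cfncluster-postinstall.log
# An exit code of 0 signals success; any other exit code marks the bootstrap as failed
exit 0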
Configuration¶
The following config settings are used to define pre/post-install actions and arguments. All options are optional and are not required for basic cluster install.
# URL to a preinstall script. This is executed before any of the boot_as_* scripts are run
# (defaults to NONE for the default template)
pre_install = NONE
# Arguments to be passed to preinstall script
# (defaults to NONE for the default template)
pre_install_args = NONE
# URL to a postinstall script. This is executed after any of the boot_as_* scripts are run
# (defaults to NONE for the default template)
post_install = NONE
# Arguments to be passed to postinstall script
# (defaults to NONE for the default template)
post_install_args = NONE
Example¶
The following steps create a simple post-install script that installs R on a cluster.
- Create a script. For the R example, see below:
#!/bin/bash
yum -y install --enablerepo=epel R
- Upload the script with the correct permissions to S3
aws s3 cp --acl public-read /path/to/myscript.sh s3://<bucket-name>/myscript.sh
- Update the CfnCluster config to include the new post-install action
[cluster default]
...
post_install = https://<bucket-name>.s3.amazonaws.com/myscript.sh
- Launch a cluster
cfncluster create mycluster
Working with S3¶
Accessing S3 within CfnCluster can be controlled through two parameters in the CfnCluster config.
# Specify the S3 resource to which cfncluster nodes will be granted read-only access
# (defaults to NONE for the default template)
s3_read_resource = NONE
# Specify the S3 resource to which cfncluster nodes will be granted read-write access
# (defaults to NONE for the default template)
s3_read_write_resource = NONE
Both parameters accept either * or a valid S3 ARN. For details of how to specify S3 ARNs, please see http://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#arn-syntax-s3
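For example, to grant read-only access to a single bucket (the bucket name is a placeholder):
s3_read_resource = arn:aws:s3:::my-bucket-name/*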
Configuration¶
cfncluster uses the file ~/.cfncluster/config by default for all configuration parameters. An example configuration file is installed at site-packages/cfncluster/examples/config.
Layout¶
Configuration is defined in multiple sections. Required sections are “global”, “aws”, one “cluster”, and one “subnet”.
A section starts with the section name in brackets, followed by parameters and configuration.
[global]
cluster_template = default
update_check = true
sanity_check = true
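Putting the sections together, a complete file might look like the following sketch (all IDs and names are placeholders; the key_name and vpc_settings parameters shown here are illustrative assumptions rather than values taken from this guide):
[global]
cluster_template = default
update_check = true
sanity_check = true

[aws]
aws_region_name = us-east-1

[cluster default]
key_name = mykeypair
vpc_settings = public

[vpc public]
vpc_id = vpc-a1b2c3d4
master_subnet_id = subnet-a1b2c3d4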
Configuration Options¶
global¶
Global configuration options related to cfncluster.
[global]
cluster_template¶
The name of the cluster section used for the cluster.
See the Cluster Definition.
cluster_template = default
sanity_check¶
Attempts to validate that resources defined in parameters actually exist.
sanity_check = true
aws¶
This is the AWS credentials section (required). These settings apply to all clusters.
If not defined, boto will attempt to use a) the environment or b) the EC2 IAM role.
[aws]
aws_access_key_id = #your_aws_access_key_id
aws_secret_access_key = #your_secret_access_key
# Defaults to us-east-1 if not defined in the environment or below
aws_region_name = #region
cluster¶
You can define one or more clusters for different types of jobs or workloads.
Each cluster has its own configuration based on your needs.
The format is [cluster <clustername>].
[cluster default]
template_url¶
Overrides the path to the CloudFormation template used to create the cluster.
Defaults to https://s3.amazonaws.com/cfncluster-<aws_region_name>/templates/cfncluster-<version>.cfn.json.
template_url = https://s3.amazonaws.com/cfncluster-us-east-1/templates/cfncluster.cfn.json
compute_instance_type¶
The EC2 instance type used for the cluster compute nodes.
Defaults to t2.micro for default template.
compute_instance_type = t2.micro
master_instance_type¶
The EC2 instance type used for the master node.
This defaults to t2.micro for default template.
master_instance_type = t2.micro
initial_queue_size¶
The initial number of EC2 instances to launch as compute nodes in the cluster.
The default is 2 for default template.
initial_queue_size = 2
max_queue_size¶
The maximum number of EC2 instances that can be launched in the cluster.
This defaults to 10 for the default template.
max_queue_size = 10
maintain_initial_size¶
Boolean flag to set autoscaling group to maintain initial size.
If set to true, the Auto Scaling group will never have fewer members than the value of initial_queue_size. It will still allow the cluster to scale up to the value of max_queue_size.
Setting to false allows the Auto Scaling group to scale down to 0 members, so resources will not sit idle when they aren’t needed.
Defaults to false for the default template.
maintain_initial_size = false
scheduler¶
Scheduler to be used with the cluster. Valid options are sge, openlava, or torque.
Defaults to sge for the default template.
scheduler = sge
cluster_type¶
Type of cluster to launch, i.e. ondemand or spot.
Defaults to ondemand for the default template.
cluster_type = ondemand
spot_price¶
If cluster_type is set to spot, the maximum spot price for the ComputeFleet.
spot_price = 0.00
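For example, a hedged snippet for a Spot-based ComputeFleet (the price shown is a placeholder, not a recommendation):
cluster_type = spot
spot_price = 0.50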
s3_read_resource¶
Specify the S3 resource to which cfncluster nodes will be granted read-only access.
See working with S3 for details on format.
Defaults to NONE for the default template.
s3_read_resource = NONE
s3_read_write_resource¶
Specify the S3 resource to which cfncluster nodes will be granted read-write access.
See working with S3 for details on format.
Defaults to NONE for the default template.
s3_read_write_resource = NONE
pre_install¶
URL to a preinstall script. This is executed before any of the boot_as_* scripts are run
Can be specified in “http://hostname/path/to/script.sh” or “s3://bucketname/path/to/script.sh” format.
Defaults to NONE for the default template.
pre_install = NONE
pre_install_args¶
Quoted list of arguments to be passed to preinstall script
Defaults to NONE for the default template.
pre_install_args = NONE
post_install¶
URL to a postinstall script. This is executed after any of the boot_as_* scripts are run
Can be specified in “http://hostname/path/to/script.sh” or “s3://bucketname/path/to/script.sh” format.
Defaults to NONE for the default template.
post_install = NONE
post_install_args¶
Arguments to be passed to postinstall script
Defaults to NONE for the default template.
post_install_args = NONE
proxy_server¶
HTTP(S) proxy server, typically http://x.x.x.x:8080
Defaults to NONE for the default template.
proxy_server = NONE
placement_group¶
Cluster placement group. This placement group must already exist.
Defaults to NONE for the default template.
placement_group = NONE
placement¶
Cluster placement logic. This enables either the whole cluster or only the compute fleet to use the placement group.
Defaults to cluster in the default template.
placement = cluster
ephemeral_dir¶
If instance store volumes exist, this is the path/mountpoint they will be mounted on.
Defaults to /scratch in the default template.
ephemeral_dir = /scratch
encrypted_ephemeral¶
Encrypted ephemeral drives. In-memory keys, non-recoverable.
Defaults to false in default template.
encrypted_ephemeral = false
master_root_volume_size¶
MasterServer root volume size in GB. (AMI must support growroot)
Defaults to 10 in default template.
master_root_volume_size = 10
compute_root_volume_size¶
ComputeFleet root volume size in GB. (AMI must support growroot)
Defaults to 10 in default template.
compute_root_volume_size = 10
cwl_log_group¶
CloudWatch Logs Log Group name
Defaults to NONE in the default template.
cwl_log_group = NONE
ebs_settings¶
Settings section relating to EBS volume mounted on the master.
See EBS Section.
ebs_settings = custom
scaling_settings¶
Settings section relating to scaling.
See Scaling Section.
scaling_settings = custom
vpc¶
VPC Configuration Settings:
[vpc public]
vpc_id = vpc-xxxxxx
master_subnet_id = subnet-xxxxxx
master_subnet_id¶
ID of an existing subnet you want to provision the Master server into.
master_subnet_id = subnet-xxxxxx
ssh_from¶
CIDR-formatted IP range from which to allow SSH access.
This is only used when cfncluster creates the security group.
Defaults to 0.0.0.0/0 in the default template.
ssh_from = 0.0.0.0/0
additional_sg¶
Additional VPC security group Id for all instances.
Defaults to NONE in the default template.
additional_sg = sg-xxxxxx
compute_subnet_id¶
ID of an existing subnet you want to provision the compute nodes into.
compute_subnet_id = subnet-xxxxxx
compute_subnet_cidr¶
If you wish for cfncluster to create a compute subnet, this is the CIDR it will use.
compute_subnet_cidr = 10.0.100.0/24
use_public_ips¶
Define whether or not to assign public IP addresses to EC2 instances.
Set to false if operating in a private VPC.
Defaults to true.
use_public_ips = true
ebs¶
EBS Volume configuration settings for the volume mounted on the master node and shared via NFS to compute nodes.
[ebs custom]
ebs_snapshot_id = snap-xxxxx
volume_type = io1
volume_iops = 200
ebs_snapshot_id¶
ID of the EBS snapshot if using a snapshot as the source for the volume.
Defaults to NONE for default template.
ebs_snapshot_id = snap-xxxxx
volume_type¶
The API name for the type of volume you wish to launch.
Defaults to gp2 for default template.
volume_type = io1
volume_size¶
Size of volume to be created (if not using a snapshot).
Defaults to 20GB for default template.
volume_size = 20
encrypted¶
Whether or not the volume should be encrypted (should not be used with snapshots).
Defaults to false for default template.
encrypted = false
scaling¶
Settings which define how the compute nodes scale.
[scaling custom]
scaling_period = 60
scaling_cooldown = 120
scaling_threshold¶
Threshold for triggering CloudWatch ScaleUp action.
Defaults to 4 for default template.
scaling_threshold = 4
scaling_adjustment¶
Number of instances to add when the CloudWatch ScaleUp action is called.
Defaults to 2 for default template.
scaling_adjustment = 2
scaling_threshold2¶
Threshold for triggering CloudWatch ScaleUp2 action.
Defaults to 200 for default template.
scaling_threshold2 = 200
scaling_adjustment2¶
Number of instances to add when the CloudWatch ScaleUp2 action is called.
Defaults to 20 for default template.
scaling_adjustment2 = 20
scaling_period¶
Period to measure ScalingThreshold.
Defaults to 60 for default template.
scaling_period = 60
scaling_evaluation_periods¶
Number of periods to measure ScalingThreshold.
Defaults to 2 for default template.
scaling_evaluation_periods = 2
scaling_cooldown¶
Amount of time in seconds to wait before attempting further scaling actions.
Defaults to 120 for the default template.
scaling_cooldown = 120
How CfnCluster Works¶
CfnCluster was built not only as a way to manage clusters, but as a reference on how to use AWS services to build your HPC environment.
CfnCluster Processes¶
There are a number of processes running within CfnCluster which are used to manage its behavior.
General Overview¶
A cluster’s lifecycle begins after it is created by a user. Typically, this is done from the Command Line Interface (CLI). Once created, a cluster will exist until it is deleted.

publish_pending_jobs¶
Once a cluster is running, a cronjob owned by the root user will monitor the configured scheduler (SGE, Torque, Openlava, etc) and publish the number of pending jobs to CloudWatch. This is the metric utilized by Auto Scaling to add more nodes to the cluster.
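Conceptually, publishing that metric is roughly equivalent to a CloudWatch put-metric-data call like the sketch below (the namespace, dimension, and value shown are illustrative assumptions, not the actual implementation):
aws cloudwatch put-metric-data --namespace cfncluster \
    --metric-name pending --value 5 \
    --dimensions Stack=cfncluster-mycluster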

Auto Scaling¶
Auto Scaling, along with CloudWatch alarms, is used to manage the number of running nodes in the cluster.

The number of instances added, along with the thresholds at which to add them, are all configurable via the Scaling configuration section.
sqswatcher¶
The sqswatcher process monitors for SQS messages emitted by Auto Scaling, which notify of state changes within the cluster. When an instance comes online, it submits an “instance ready” message to SQS, which is picked up by sqswatcher running on the master server. These messages are used to notify the queue manager when new instances come online or are terminated, so they can be added to or removed from the queue accordingly.

nodewatcher¶
The nodewatcher process runs on each node in the compute fleet. This process is used to determine when an instance is terminated. Because EC2 is billed by the instance hour, this process will wait until an instance has been running for 95% of an instance hour before it is terminated.
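A conceptual sketch of that timing check (not the actual nodewatcher code; LAUNCH_TIME is assumed to hold the instance launch time as epoch seconds):
#!/bin/bash
now=$(date +%s)
# minutes elapsed within the current billing hour
mins_into_hour=$(( ((now - LAUNCH_TIME) % 3600) / 60 ))
# 95% of an hour is 57 minutes; only then is an idle node eligible for removal
if [ "$mins_into_hour" -ge 57 ]; then
    echo "idle node is within the last 5% of its instance hour - eligible for scale-down"
fi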

AWS Services used in CfnCluster¶
The following Amazon Web Services (AWS) services are used in CfnCluster.
- AWS CloudFormation
- AWS Identity and Access Management (IAM)
- Amazon SNS
- Amazon SQS
- Amazon EC2
- Auto Scaling
- Amazon EBS
- Amazon CloudWatch
- Amazon S3
- Amazon DynamoDB
AWS CloudFormation¶
AWS CloudFormation is the core service used by CfnCluster. Each cluster is represented as a stack. All resources required by the cluster are defined within the CfnCluster CloudFormation template. CfnCluster CLI commands typically map to CloudFormation stack commands, such as create, update and delete. Instances launched within a cluster make HTTPS calls to the CloudFormation endpoint for the region the cluster is launched in.
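For example, the day-to-day CLI operations correspond directly to stack operations (a sketch; mycluster is a placeholder cluster name):
$ cfncluster create mycluster
$ cfncluster update mycluster
$ cfncluster delete mycluster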
For more details about AWS CloudFormation, see http://aws.amazon.com/cloudformation/
AWS Identity and Access Management (IAM)¶
IAM is used within CfnCluster to provide an Amazon EC2 IAM Role for the instances. This role is a least-privileged role created specifically for each cluster. CfnCluster instances are given access only to the specific API calls that are required to deploy and manage the cluster.
For more details about AWS Identity and Access Management, see http://aws.amazon.com/iam/
Amazon SNS¶
Amazon Simple Notification Service is used to receive notifications from Auto Scaling. These events are called life cycle events, and are generated when an instance launches or terminates in an Auto Scaling group. Within CfnCluster, the Amazon SNS topic for the Auto Scaling group is subscribed to an Amazon SQS queue.
For more details about Amazon SNS, see http://aws.amazon.com/sns/
Amazon SQS¶
Amazon Simple Queue Service is used to hold notifications (messages) from Auto Scaling, sent through Amazon SNS, and notifications from the ComputeFleet instances. This decouples the sending of notifications from the receiving and allows the MasterServer to handle them through polling. The MasterServer runs sqswatcher and polls the queue. Auto Scaling and the ComputeFleet instances post messages to the queue.
For more details about Amazon SQS, see http://aws.amazon.com/sqs/
Amazon EC2¶
Amazon EC2 provides the compute for CfnCluster. The MasterServer and ComputeFleet are EC2 instances. Any instance type that supports HVM can be selected. The MasterServer and ComputeFleet can be different instance types, and the ComputeFleet can also be launched as Spot instances. Ephemeral storage found on the instances is mounted as a RAID0 volume.
For more details about Amazon EC2, see http://aws.amazon.com/ec2/
Auto Scaling¶
Auto Scaling is used to manage the ComputeFleet instances. These are managed as an Auto Scaling group and can be either elastic, driven by workload, or static, driven by the config.
For more details about Auto Scaling, see http://aws.amazon.com/autoscaling/
Amazon EBS¶
Amazon EBS provides the storage for the shared volume. Any EBS settings can be passed through the config. EBS volumes can be initialized either empty or from an existing EBS snapshot.
For more details about Amazon EBS, see http://aws.amazon.com/ebs/
Amazon CloudWatch¶
Amazon CloudWatch provides metric collection and alarms for CfnCluster. The MasterServer publishes the number of pending tasks (jobs) for each cluster. Two alarms are defined that, based on parameters in the config, automatically increase the size of the ComputeFleet Auto Scaling group.
For more details, see http://aws.amazon.com/cloudwatch/
Amazon S3¶
Amazon S3 is used to store the CfnCluster templates. Each region has a bucket with all templates. Within CfnCluster, access to S3 can be controlled to allow CLI/SDK tools to use S3.
For more details, see http://aws.amazon.com/s3/
Amazon DynamoDB¶
Amazon DynamoDB is used to store minimal state of the cluster. The MasterServer tracks provisioned instances in a DynamoDB table.
For more details, see http://aws.amazon.com/dynamodb/
CfnCluster auto-scaling¶
Clusters deployed with CfnCluster are elastic in several ways. The first is simply setting the initial_queue_size and max_queue_size parameters of the cluster settings. The initial_queue_size sets the minimum size of the ComputeFleet Auto Scaling group (ASG) and also its desired capacity. The max_queue_size sets the maximum size of the ComputeFleet ASG. As part of each CfnCluster stack, two Amazon CloudWatch alarms are created. These alarms monitor a custom Amazon CloudWatch metric[1] that is published by the MasterServer of each cluster; this is the second elastic mechanism of CfnCluster. The metric is called pending, is created per stack, and is unique to each cluster. The CloudWatch alarms call ScaleUp policies associated with the ComputeFleet ASG, which is what handles the automatic addition of compute nodes when there are pending tasks in the cluster. The cluster can scale up even from zero compute nodes, and continues to add nodes until the alarms no longer trigger or the max_queue_size is reached.
Within Auto Scaling, there is typically an Amazon CloudWatch alarm to remove instances when they are no longer needed. Such an alarm would operate on an aggregate metric such as CPU or network; when the aggregate metric fell below a certain level, it would call a ScaleDown policy. Deciding which instance to remove is complex[2] and is not aware of individual instance utilization. For that reason, each of the instances in the ComputeFleet ASG runs a process called nodewatcher[3]. The purpose of this process is to monitor the instance and, if it is idle AND close to the end of the current hour, remove it from the ComputeFleet ASG. It specifically calls the TerminateInstanceInAutoScalingGroup[4] API, which will remove an instance as long as the size of the ASG is larger than the desired capacity. This handles the scale-down of the cluster without affecting running jobs, and also enables an elastic cluster with a fixed base number of instances.
The value of auto scaling is the same for HPC as for any other workload; the only difference is that CfnCluster includes code specifically designed to make the interaction more intelligent. If a static cluster is required, this can be achieved by setting the initial_queue_size and max_queue_size parameters to the required cluster size and also setting the maintain_initial_size parameter to true. This will cause the ComputeFleet ASG to have the same value for minimum, maximum and desired capacity.
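For example, a fixed four-node cluster could be expressed as follows (the size is a placeholder):
initial_queue_size = 4
max_queue_size = 4
maintain_initial_size = true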
References¶
[1] http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
[2] http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingBehavior.InstanceTermination.html
[3] https://github.com/awslabs/cfncluster/tree/master/node/src/nodewatcher
[4] http://docs.aws.amazon.com/AutoScaling/latest/APIReference/API_TerminateInstanceInAutoScalingGroup.html
Tutorials¶
Here you can find tutorials and best practice guides for getting started with CfnCluster.
Running your first job on cfncluster¶
This tutorial will walk you through running your first “Hello World” job on cfncluster.
If you haven’t yet, you will need to follow the getting started guide to install cfncluster and configure your CLI.
Verifying your installation¶
First, we’ll verify that cfncluster is correctly installed and configured.
$ cfncluster version
This should return the running version of cfncluster. If it gives you a message about configuration, you will need to run the following to configure cfncluster.
$ cfncluster configure
Creating your First Cluster¶
Now it’s time to create our first cluster. Because our workload isn’t performance intensive, we will use the default instance size of t2.micro. For production workloads, you’ll want to choose an instance size which better fits your needs.
We’re going to call our cluster “hello-world”.
$ cfncluster create hello-world
You’ll see some messages on your screen as the cluster is created. When it’s finished, it will provide the following output:
Starting: hello-world
Status: cfncluster-hello-world - CREATE_COMPLETE
Output:"MasterPrivateIP"="192.168.x.x"
Output:"MasterPublicIP"="54.148.x.x"
Output:"GangliaPrivateURL"="http://192.168.x.x/ganglia/"
Output:"GangliaPublicURL"="http://54.148.x.x/ganglia/"
The message “CREATE_COMPLETE” shows that the cluster was created successfully. It also provides us with the public and private IP addresses of our master node. We’ll need this IP to log in.
Logging into your Master instance¶
You’ll use your OpenSSH pem file and the ec2-user to log into your master instance.
ssh -i /path/to/keyfile.pem ec2-user@54.148.x.x
Once logged in, run the command “qhost” to ensure that your compute nodes are set up and configured.
[ec2-user@ip-192-168-1-86 ~]$ qhost
HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO SWAPUS
----------------------------------------------------------------------------------------------
global - - - - - - - - - -
ip-192-168-1-125 lx-amd64 2 1 2 2 0.15 3.7G 130.8M 1024.0M 0.0
ip-192-168-1-126 lx-amd64 2 1 2 2 0.15 3.7G 130.8M 1024.0M 0.0
As you can see, we have two compute nodes in our cluster, both with 2 threads available to them.
Running your first job¶
Now we’ll create a simple job which sleeps for a little while and then outputs its own hostname.
Create a file called “hellojob.sh” with the following contents.
#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
Next, submit the job using “qsub” and ensure it runs.
$ qsub hellojob.sh
Your job 1 ("hellojob.sh") has been submitted
Now, you can view your queue and check the status of the job.
$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1 0.55500 hellojob.s ec2-user r 03/24/2015 22:23:48 all.q@ip-192-168-1-125.us-west 1
The job is currently in a running state. Wait 30 seconds for the job to finish and run qstat again.
$ qstat
$
Now that there are no jobs in the queue, we can check for output in our current directory.
$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 48 Mar 24 22:34 hellojob.sh
-rw-r--r-- 1 ec2-user ec2-user 0 Mar 24 22:34 hellojob.sh.e1
-rw-r--r-- 1 ec2-user ec2-user 34 Mar 24 22:34 hellojob.sh.o1
Here we see our job script, an “e1” file, and an “o1” file. Since the e1 file is empty, there was no output to stderr. If we view the .o1 file, we can see the output from our job.
$ cat hellojob.sh.o1
Hello World from ip-192-168-1-125
We can see that our job ran successfully on instance “ip-192-168-1-125”.
Building a custom CfnCluster AMI¶
Warning
Building a custom AMI is not the recommended approach for customizing CfnCluster.
Once you build your own AMI, you will no longer receive updates or bug fixes with future releases of CfnCluster. You will need to repeat the steps used to create your custom AMI with each new CfnCluster release.
Before reading any further, take a look at the Custom Bootstrap Actions section of the documentation to determine if the modifications you wish to make can be scripted and supported with future CfnCluster releases.
While not ideal, there are a number of scenarios where building a custom AMI for CfnCluster is necessary. This tutorial will guide you through the process.
How to customize the CfnCluster AMI¶
The base CfnCluster AMI is often updated with new releases. This AMI has all of the components required for CfnCluster to function installed and configured. If you wish to customize an AMI for CfnCluster, you must start with this as the base.
Find the AMI which corresponds with the region you will be utilizing in the list here: https://github.com/awslabs/cfncluster/blob/master/amis.txt.
Within the EC2 Console, choose “Launch Instance”.
Navigate to “Community AMIs”, and enter the AMI id for your region into the search box.
Select the AMI, choose your instance type and properties, and launch your instance.
Log into your instance using the ec2-user and your SSH key.
Customize your instance as required.
Run the following command to prepare your instance for AMI creation:
sudo /usr/local/sbin/ami_cleanup.sh
Stop the instance.
Create a new AMI from the instance.
Enter the AMI id in the custom_ami field within your cluster configuration.
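For example (the AMI id is a placeholder):
[cluster default]
custom_ami = ami-xxxxxx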
Getting Started¶
If you’ve never used CfnCluster before, you should read the Getting Started with CfnCluster guide to get familiar with cfncluster and its usage.
Additional Resources¶
- CfnCluster Source Repository
- CfnCluster Issue Tracker
- CfnCluster Webcast - HPC Scalability in the Cloud