Support running Heron on Amazon ECS. #1837
| #heron.statemgr.connection.string: "xx.xx.xx.xx:2181" | ||
| # path of the root address to store the state in a local file system |
There was a problem hiding this comment.
Is this a local file system, or does it expect ZooKeeper?
There was a problem hiding this comment.
It does expect ZooKeeper to be running on one of the EC2 instances. As suggested, I will put up a document explaining the offline setup on Amazon EC2 and the subsequent steps for the scheduler.
| heron.statemgr.zookeeper.retry.interval.ms: 10000 | ||
| #################################################################### |
There was a problem hiding this comment.
Do we require tunneling? If not, please remove the tunnel configuration.
There was a problem hiding this comment.
Tunneling could be needed for any of the schedulers and they're valid configs so I think it's ok to keep them in the config file. Maybe we set it to False by default though.
| # guaranteed to be set in all Docker executor environments (outside of Marathon) | ||
| if is_docker_environment(): | ||
| self.master_host = os.environ.get('HOST') if 'HOST' in os.environ else socket.gethostname() | ||
| # Need to set the HOST environment vairable if docker is for AWS ECS tasks |
There was a problem hiding this comment.
Instead of adding code to this function, can we add a general function called get_host that checks whether it is running in Docker or on ECS, falls back to socket.gethostname() otherwise, and returns the hostname?
There was a problem hiding this comment.
+1 for ways to do this without one-off logic to support ECS.
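One possible shape for the get_host helper suggested above, as a minimal sketch rather than the code in this PR. The function name comes from the review comment; the `environ` and `ecs_lookup` parameters are hypothetical and exist only so that the platform-specific lookup stays outside the executor logic.

```python
import os
import socket


def get_host(environ=None, ecs_lookup=None):
    """Return the host name or address the executor should advertise.

    Hypothetical helper: `ecs_lookup` is an optional callable that returns
    the instance's private IP when running on ECS, or None otherwise.
    """
    environ = os.environ if environ is None else environ

    # 1. An explicitly provided HOST always wins.
    host = environ.get('HOST')
    if host:
        return host

    # 2. Ask the platform-specific lookup, if one was supplied.
    if ecs_lookup is not None:
        host = ecs_lookup()
        if host:
            return host.strip()

    # 3. Fall back to the local host name.
    return socket.gethostname()
```

With this layout the executor would call get_host() once, and supporting another platform means passing a different lookup callable instead of adding another branch.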
| public static final String ECS_CLUSTER_BINARY = "heron-examples.jar"; | ||
| public static final String COMPOSE_CMD = "ecs-cli compose --project-name "; | ||
| public static final String UP = " up"; | ||
There was a problem hiding this comment.
A lot of values are hard-coded here. Why are they needed?
There was a problem hiding this comment.
+1 for removing as much of this hard-coding as possible.
@ananthgs - can you write a good, concise PR description of how this works with ECS?
@ananthgs - you might want to add detailed documentation on how to set it up and use it, as is done for the other schedulers on heron.io. However, this can be done in a different PR.
billonahill left a comment
There was a problem hiding this comment.
Thanks for the submission! Please add unit tests and accept the CLA:
https://twitter.github.io/heron/docs/contributors/community/
| + " herondata:\n" | ||
| + " driver: local"; | ||
| public static final String DESTINATION_JVM = "/usr/lib/jvm/java-8-oracle"; | ||
| public static final String ECS_CLUSTER_BINARY = "heron-examples.jar"; |
There was a problem hiding this comment.
Why does this code require a dep on heron-examples.jar?
| + "volumes:\n" | ||
| + " herondata:\n" | ||
| + " driver: local"; | ||
| public static final String DESTINATION_JVM = "/usr/lib/jvm/java-8-oracle"; |
There was a problem hiding this comment.
Why does jvm location need to be hardcoded?
There was a problem hiding this comment.
If your Docker image contains a pre-installed version of the Java 8 JRE, you should be able to reference $JAVA_HOME here. You should also be able to set that from within the scheduler configuration, via the heron.directory.sandbox.java.home option.
| } | ||
| public String replacePortNumbers(int container, String content) { | ||
| int basePortnumber = 5000; |
There was a problem hiding this comment.
The 5000 here and the logic in this method look like they duplicate other logic in this class. Can you consolidate?
| } | ||
| public String getDockerFileContent(String execCommand, int container) { | ||
| String commandBuiler = EcsContext.PART1 + EcsContext.CMD; |
There was a problem hiding this comment.
Favor StringBuilder over string concat. Or better yet, use String.format(..)
| public String replacePortNumbers(int container, String content) { | ||
| int basePortnumber = 5000; | ||
| String localContent = new String(content); | ||
| for (int i = 0; i < SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR; i++) { |
There was a problem hiding this comment.
Please clarify in the Javadoc what this method does. Also, I suspect StringBuilder might be a better fit here.
| } | ||
| builder.append(string); | ||
| } | ||
| String stringToReturn = builder.toString(); |
There was a problem hiding this comment.
return builder.toString();
| return null; | ||
| } | ||
| public boolean onKill(Scheduler.KillTopologyRequest request) { |
There was a problem hiding this comment.
These methods all need to be implemented and annotated with @Override.
There was a problem hiding this comment.
+1, please implement those methods in the following commits
| return os.path.isfile('/.dockerenv') | ||
| def isEcsAmiInstance(): | ||
| meta = 'http://169.254.169.254/latest/meta-data/ami-id' |
There was a problem hiding this comment.
Please remove the hard-coded URLs.
Pass them in from outside if necessary.
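A minimal sketch of what passing the address in from outside could look like, assuming the check in the diff (asking the instance metadata endpoint for an AMI id) is kept as is. The names fetch_metadata and is_ecs_ami_instance, and the `metadata_url` parameter, are hypothetical; the caller would supply the URL from configuration.

```python
import urllib.error
import urllib.request


def fetch_metadata(url, timeout=1.0, opener=urllib.request.urlopen):
    """Return the body at `url` as text, or None if it cannot be reached."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode('utf-8').strip()
    except (urllib.error.URLError, OSError):
        # Off EC2 the endpoint is unreachable; treat that as "no metadata".
        return None


def is_ecs_ami_instance(metadata_url, fetch=fetch_metadata):
    """True when the metadata endpoint at `metadata_url` returns an AMI id."""
    ami_id = fetch(metadata_url)
    return bool(ami_id)
```

Because the URL and the fetch function are both arguments, the check can be unit tested without network access, and the same fetch helper could replace the curl subprocess used to read the local IPv4 address.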
| + "services:\n" | ||
| + " container_number:\n" | ||
| + " image: ananthgs/onlyheronandubuntu\n"; | ||
| public static final String CMD = " command: [\"sh\", \"-c\", \"mkdir /s3; cd /s3 ;" |
There was a problem hiding this comment.
@ananthgs - Is this a generic Ubuntu image downloaded from Docker Hub, or a specially created image? If it was specially created, what does it contain other than the basic OS?
| public static final String ECSNETWORK = " networks:\n" | ||
| + " - heron\n" | ||
| + " ports:\n" | ||
| + " - \"5000:5000\"\n" |
There was a problem hiding this comment.
Why are these ports hard-coded? Can we ask ECS to provide a dynamic port?
| self.master_host = subprocess.Popen(["curl", | ||
| "http://169.254.169.254/latest/meta-data/local-ipv4"] | ||
| , stdout=subprocess.PIPE).communicate()[0] | ||
| os.environ['HOST'] = self.master_host |
There was a problem hiding this comment.
No need for this line since you won't be grabbing os.environ['HOST'] after this
| + " image: ananthgs/onlyheronandubuntu\n"; | ||
| public static final String CMD = " command: [\"sh\", \"-c\", \"mkdir /s3; cd /s3 ;" | ||
| + "aws s3 cp s3://herondockercal/TOPOLOGY_NAME/topology.tar.gz /s3 ;" | ||
| + "aws s3 cp s3://herondockercal/heron-core-testbuild-ubuntu14.04.tar.gz /s3 ;cd /s3;" |
There was a problem hiding this comment.
The paths to the core binaries will need to be more dynamic than this and will need to come from the heron.package.core.uri configuration option in the scheduler library, right?
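To illustrate the point about dynamic paths, here is a minimal sketch, in Python for brevity although the scheduler itself is Java, of building the download part of the container command from values supplied by the caller. The function name build_fetch_command and its parameters are hypothetical; the core package URI is assumed to be read from heron.package.core.uri by the caller.

```python
import shlex


def build_fetch_command(core_package_uri, topology_package_uri, workdir='/s3'):
    """Build the shell snippet that downloads both packages into `workdir`.

    Nothing is hard-coded: bucket, topology name and package file names all
    arrive through the two URI arguments.
    """
    quoted_dir = shlex.quote(workdir)
    steps = [
        'mkdir -p {}'.format(quoted_dir),
        'cd {}'.format(quoted_dir),
        'aws s3 cp {} {}'.format(shlex.quote(topology_package_uri), quoted_dir),
        'aws s3 cp {} {}'.format(shlex.quote(core_package_uri), quoted_dir),
    ]
    # Stop at the first failing step instead of continuing with a partial download.
    return ' && '.join(steps)
```

Joining with `&&` rather than `;` is a design choice of this sketch, so that a failed download does not go unnoticed.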
| String content = null; | ||
| try { | ||
| tempDockerFile = File.createTempFile("docker", ".yml"); | ||
| content = getDockerFileContent(finalExecCommand, container); |
There was a problem hiding this comment.
Can you please explain what the process is here for creating this docker.yml file just so I understand what's going on? What is it used for and what is it composed of -- just so I can understand a little better.
There was a problem hiding this comment.
The AWS ECS task can be triggered via a Docker-style compose command. The task then runs on the AWS cluster in a container instance. The overall approach is explained here:
https://docs.google.com/document/d/1ecbCuA46cIKPfY0SP0F1dcRlei4DIPz3pZ6ZSZ5zZgc/edit?usp=sharing.
Please feel free to comment on the approach.
| # Need to set the HOST environment vairable if docker is for AWS ECS tasks | ||
| if isEcsAmiInstance(): | ||
| self.master_host = subprocess.Popen(["curl", | ||
| "http://169.254.169.254/latest/meta-data/local-ipv4"] |
| } | ||
| public List<String> getJobLinks() { | ||
| return null; |
There was a problem hiding this comment.
an empty ArrayList is better than a null
There was a problem hiding this comment.
Added the getJobLinks feature to return the ECS tasks.
| private String[] getExecutorCommand(int container) { | ||
| List<Integer> freePorts = new ArrayList<>(SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR); | ||
| for (int i = 0; i < SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR; i++) { | ||
| //freePorts.add(SysUtils.getFreePort()); |
There was a problem hiding this comment.
Is there a specific reason for hard-coding the port number instead of using the getFreePort() method?
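For reference, the usual technique behind a free-port helper is to bind a socket to port 0 and let the operating system choose. The sketch below shows that technique in Python; get_free_ports is a hypothetical name and this is not the implementation of SysUtils.getFreePort(). Note that a port that is free on the machine running this code is not guaranteed to be free inside an ECS container, which may be part of the answer to the question above.

```python
import socket


def get_free_ports(count):
    """Return `count` distinct TCP ports that are currently unused locally."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Port 0 asks the OS to pick any unused port.
            sock.bind(('', 0))
            sockets.append(sock)
        # All sockets stay open until here, so the OS cannot hand out
        # the same port twice within one call.
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()
```

The ports are only reserved while the sockets are open, so another process could claim one between this call and the executor start.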
| LOG.info("Starting to deploy topology: " + EcsContext.topologyName(config)); | ||
| LOG.info("Starting executor for TMaster"); | ||
| startExecutor(0); | ||
| // for each container, run its own executor |
There was a problem hiding this comment.
Log the start of each regular container here.
@ajorgensen you guys also run on EC2, correct? If so, could you review this to see whether your approach and this approach could coexist in this codebase?
modified: ecs-heron-role.json
modified: ecs-heron-policy.json
# Conflicts: # heron/executor/src/python/heron_executor.py
Prerequisite: AWS and ECS-CLI need to be installed on the submitter machine, as per the guidelines from Amazon.