Quick Start TensorFlow Application

Source

Katacoda - Deploying Kubeflow

Example TensorFlow Application

Code source (https://github.com/tensorflow/k8s/tree/master/examples/tf_sample)

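# cluster_spec, width, height, and results are assumed to be defined earlier in the sample.
for job_name in cluster_spec.keys():
  for i in range(len(cluster_spec[job_name])):
    # Pin the ops below to task i of this job.
    d = "/job:{0}/task:{1}".format(job_name, i)
    with tf.device(d):
      a = tf.constant(range(width * height), shape=[height, width])
      b = tf.constant(range(width * height), shape=[height, width])
      # Element-wise product, computed on the selected device.
      c = tf.multiply(a, b)
      results.append(c)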
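
The loop above places one multiplication on every task of every job. The following is a minimal sketch of the device-naming step only, in plain Python without TensorFlow. The helper name device_names and the host names in example_spec are hypothetical and chosen for illustration.

```python
def device_names(cluster_spec):
    """Return one TensorFlow-style device string per task in the cluster."""
    names = []
    for job_name in cluster_spec.keys():
        for i in range(len(cluster_spec[job_name])):
            names.append("/job:{0}/task:{1}".format(job_name, i))
    return names


# Hypothetical cluster layout; the host names are placeholders.
example_spec = {
    "master": ["master-0:2222"],
    "worker": ["worker-0:2222"],
    "ps": ["ps-0:2222", "ps-1:2222"],
}

for name in device_names(example_spec):
    print(name)
```

Each printed string is the kind of value passed to tf.device in the loop, so a job with two ps tasks gets two separate placements for that job name.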

TFJob Definition

Define the job spec in example.yaml.

apiVersion: "kubeflow.org/v1alpha2"
kind: "TFJob"
metadata:
  name: "example-job"
spec:
  tfReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
    Worker:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
    PS:
      replicas: 2
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
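
The spec above asks for 1 Master, 1 Worker, and 2 PS replicas, which is 4 replicas in total. The sketch below shows how a container might recover that layout at startup. It assumes the operator passes the layout as JSON in a TF_CONFIG environment variable; the helper name read_cluster_spec and the host names in the sample value are hypothetical.

```python
import json
import os


def read_cluster_spec(environ=os.environ):
    """Parse TF_CONFIG and return (cluster_spec, task_type, task_index)."""
    tf_config = json.loads(environ.get("TF_CONFIG", "{}"))
    cluster_spec = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    return cluster_spec, task.get("type"), task.get("index")


# Hypothetical TF_CONFIG value as the worker replica might see it.
sample_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "master": ["example-job-master-0:2222"],
            "worker": ["example-job-worker-0:2222"],
            "ps": ["example-job-ps-0:2222", "example-job-ps-1:2222"],
        },
        "task": {"type": "worker", "index": 0},
    })
}

cluster_spec, task_type, task_index = read_cluster_spec(sample_env)
total_replicas = sum(len(hosts) for hosts in cluster_spec.values())
print(task_type, task_index, total_replicas)
```

A cluster_spec dictionary of this shape is what the example application iterates over when it builds its device strings.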

Deploying TFJob

$ kubectl apply -f example.yaml

Viewing Job Results

$ kubectl get tfjob
$ kubectl get pods | grep Completed

Checking the logs with the following command shows how the TFJob ran.

$ kubectl logs $(kubectl get pods | grep Completed | tr -s ' ' | cut -d ' ' -f 1)
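
The pipeline inside $(...) keeps the rows that contain Completed, squeezes repeated spaces, and takes the first column, which is the pod name. Here is a minimal Python sketch of the same filtering, run on sample text rather than a live cluster. The pod names and ages in sample_output are made up for illustration, and completed_pods is a hypothetical helper.

```python
def completed_pods(kubectl_output):
    """Return the names of pods whose row contains 'Completed'."""
    names = []
    for line in kubectl_output.splitlines():
        if "Completed" in line:       # same filter as: grep Completed
            fields = line.split()     # same effect as: tr -s ' ' | cut -d ' '
            if fields:
                names.append(fields[0])  # first column: pod name
    return names


# Illustrative text in the column layout of `kubectl get pods`; values are made up.
sample_output = """\
NAME                   READY   STATUS      RESTARTS   AGE
example-job-master-0   0/1     Completed   0          5m
example-job-ps-0       1/1     Running     0          5m
"""

print(completed_pods(sample_output))
```

If several pods are in the Completed state, the function returns several names. In that case it is probably safer to fetch logs one pod at a time, since the one-line shell form passes every name to a single kubectl logs call.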