[Kubeflow] TFJob Quick Start
by Nathan Kwon
Quick Start TensorFlow Application
Source
Example TensorFlow Application
Code source: https://github.com/tensorflow/k8s/tree/master/examples/tf_sample
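```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# Excerpt from the sample: cluster_spec (job name -> list of task addresses),
# width, height, and results are defined earlier in the original script.
for job_name in cluster_spec.keys():
    for i in range(len(cluster_spec[job_name])):
        # Pin the ops below to task i of this job, e.g. "/job:ps/task:0".
        d = "/job:{0}/task:{1}".format(job_name, i)
        with tf.device(d):
            a = tf.constant(range(width * height), shape=[height, width])
            b = tf.constant(range(width * height), shape=[height, width])
            c = tf.multiply(a, b)  # element-wise product
            results.append(c)
```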
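Each device in the loop runs the same small computation: `a` and `b` hold the same values, 0 through width*height - 1, filled row by row into a [height, width] matrix, and `tf.multiply` is element-wise, so `c` contains the square of every entry. The following pure-Python sketch reproduces that arithmetic without TensorFlow; the helper name `sample_product` and the small width/height values are illustrative choices, not part of the sample.

```python
def sample_product(width, height):
    """Pure-Python stand-in for tf.multiply(a, b) in the sample (hypothetical helper)."""
    # a and b are identical: the values 0 .. width*height-1 filled row by row
    # into a [height, width] matrix, like tf.constant(range(...), shape=[height, width]).
    a = [[r * width + k for k in range(width)] for r in range(height)]
    b = [[r * width + k for k in range(width)] for r in range(height)]
    # tf.multiply is element-wise, so each entry ends up squared.
    return [[a[r][k] * b[r][k] for k in range(width)] for r in range(height)]


if __name__ == "__main__":
    print(sample_product(3, 2))  # [[0, 1, 4], [9, 16, 25]]
```

Keeping the inputs identical makes the expected result easy to check by eye, which suits a smoke test whose real purpose is exercising device placement across the cluster.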
TFJob Definition
Define the TFJob spec in example.yaml.
```yaml
apiVersion: "kubeflow.org/v1alpha2"
kind: "TFJob"
metadata:
  name: "example-job"
spec:
  tfReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
    Worker:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
    PS:
      replicas: 2
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff
```
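With this spec the operator starts one Master, one Worker, and two PS pods, and the sample's loop creates one device string per task. A minimal sketch of that mapping, assuming the script reads its `cluster_spec` from the `TF_CONFIG` environment variable (the JSON format TensorFlow uses for distributed configuration, with `"cluster"` and `"task"` keys). The lowercase job names and the `example-job-*-0:2222` addresses below are hypothetical placeholders, not values read from a real cluster.

```python
import json
import os

# Hypothetical TF_CONFIG value for illustration only; in a real pod the
# operator injects this variable and the hostnames will differ.
EXAMPLE_TF_CONFIG = json.dumps({
    "cluster": {
        "master": ["example-job-master-0:2222"],
        "worker": ["example-job-worker-0:2222"],
        "ps": ["example-job-ps-0:2222", "example-job-ps-1:2222"],
    },
    "task": {"type": "worker", "index": 0},
})


def load_cluster_spec(env=os.environ):
    """Return the "cluster" dict from TF_CONFIG (job name -> list of addresses)."""
    tf_config = json.loads(env.get("TF_CONFIG", "{}"))
    return tf_config.get("cluster", {})


def device_strings(cluster_spec):
    """List the /job:<name>/task:<i> strings the sample's loop pins ops to."""
    devices = []
    for job_name in cluster_spec.keys():
        for i in range(len(cluster_spec[job_name])):
            devices.append("/job:{0}/task:{1}".format(job_name, i))
    return devices


if __name__ == "__main__":
    spec = load_cluster_spec({"TF_CONFIG": EXAMPLE_TF_CONFIG})
    for d in device_strings(spec):
        print(d)
```

Because `replicas: 2` under `PS` yields two addresses in the `ps` list, the loop produces `/job:ps/task:0` and `/job:ps/task:1`, one constant-and-multiply group per parameter server.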
Deploying TFJob
```shell
$ kubectl apply -f example.yaml
```
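The same manifest can also be built and submitted from code. A sketch, assuming the official `kubernetes` Python client is installed and a kubeconfig is available; `build_tfjob` and `submit` are hypothetical helpers, the `"default"` namespace is an assumption (kubectl uses the namespace of your current context), and the group/version/plural values (`kubeflow.org`, `v1alpha2`, `tfjobs`) follow the apiVersion in example.yaml and the usual lowercase-plural CRD naming.

```python
import json


def build_tfjob(name, image, replicas):
    """Return a TFJob manifest dict shaped like example.yaml.

    replicas maps a replica type (e.g. "Master", "Worker", "PS") to a count.
    """
    def replica_spec(count):
        return {
            "replicas": count,
            "restartPolicy": "Never",
            "template": {
                "spec": {
                    "containers": [{"name": "tensorflow", "image": image}],
                },
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                rtype: replica_spec(count) for rtype, count in replicas.items()
            },
        },
    }


def submit(manifest, namespace="default"):
    """Create the TFJob through the Kubernetes API (needs a reachable cluster)."""
    # Imported lazily so build_tfjob works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    group, version = manifest["apiVersion"].split("/")
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group, version, namespace, "tfjobs", manifest
    )


if __name__ == "__main__":
    job = build_tfjob(
        "example-job",
        "gcr.io/tf-on-k8s-dogfood/tf_sample:dc944ff",
        {"Master": 1, "Worker": 1, "PS": 2},
    )
    print(json.dumps(job, indent=2))  # kubectl apply -f also accepts JSON
```

Generating the replica specs from one helper keeps the three nearly identical blocks of example.yaml in sync when the image tag changes.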
Viewing Job Results
```shell
$ kubectl get tfjob
$ kubectl get pods | grep Completed
```
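`grep Completed` matches the STATUS column that kubectl typically prints for pods whose containers exited successfully (pod phase `Succeeded`). A sketch of the same filter over `kubectl get pods -o json` output, which avoids depending on the table's column layout; `completed_pods` and `list_completed_pods` are hypothetical helpers, kubectl is assumed to be on PATH, and the sample JSON is made up for illustration.

```python
import json
import subprocess


def completed_pods(pods_json):
    """Return names of pods whose status.phase is "Succeeded".

    kubectl usually shows such pods with STATUS "Completed".
    """
    items = json.loads(pods_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in items
        if pod.get("status", {}).get("phase") == "Succeeded"
    ]


def list_completed_pods():
    """Query the cluster with kubectl (must be installed and configured)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return completed_pods(out)


if __name__ == "__main__":
    # Made-up sample in the shape of `kubectl get pods -o json`.
    sample = json.dumps({"items": [
        {"metadata": {"name": "example-job-master-0"}, "status": {"phase": "Succeeded"}},
        {"metadata": {"name": "example-job-ps-0"}, "status": {"phase": "Running"}},
    ]})
    print(completed_pods(sample))  # ['example-job-master-0']
```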
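The logs show how the TFJob ran. Because more than one pod can reach Completed (for example, the Master and the Worker), and `kubectl logs` accepts only one pod name, fetch the logs one pod at a time:

```shell
$ for pod in $(kubectl get pods | grep Completed | tr -s ' ' | cut -d ' ' -f 1); do kubectl logs "$pod"; done
```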
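`kubectl logs` takes a single pod name (an extra positional argument is read as a container name), so each completed pod needs its own call. A sketch of that loop in Python, assuming kubectl is on PATH; `kubectl_logs` and `print_logs` are hypothetical helpers, and the `run` parameter is injectable so the loop can be exercised without a cluster.

```python
import subprocess


def kubectl_logs(pod):
    """Return the logs of one pod via `kubectl logs <pod>`."""
    return subprocess.run(
        ["kubectl", "logs", pod], check=True, capture_output=True, text=True
    ).stdout


def print_logs(pods, run=kubectl_logs):
    """Fetch and print logs one pod at a time; returns {pod: log text}."""
    logs = {}
    for pod in pods:
        logs[pod] = run(pod)
        print("==> {0} <==".format(pod))
        print(logs[pod])
    return logs


if __name__ == "__main__":
    import sys

    # Pod names come from the command line, e.g. the names printed by
    # `kubectl get pods | grep Completed | tr -s ' ' | cut -d ' ' -f 1`.
    print_logs(sys.argv[1:])
```

Printing a `==> name <==` header before each pod's output keeps the Master and Worker logs distinguishable when they are read together.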