[TensorFlow 2.0] Saving and Loading Models

A Ydobon

17 min read · Sep 15, 2019

With version 2.0, TensorFlow has become far more intuitive and easier to use than before. In this story we look at how to save a model and load it back with TensorFlow 2.0.

Save and restore models | TensorFlow Core (www.tensorflow.org)

The explanation below is based on the TensorFlow 2.0 Tutorial.


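I. Introduction

Why save a model

Saving and restoring a model takes real effort and time. When we are already writing dozens of lines of code or more, why go to the trouble of saving the model?

The reason is that even if a problem interrupts training, you can load the saved model and resume from that point. Saving a model also lets you share it with others, who can build on what you shared to improve the model's accuracy and efficiency.

Let's now see how to use a 'checkpoint' and a 'callback' to save a model at any time.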

II. Basic Setup

1. Installing TensorFlow

Before building a model, let's set up TensorFlow.

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

!pip install -q pyyaml h5py  # Required to save models in HDF5 format

# In a notebook, start a new cell here: __future__ imports must come first.
from __future__ import absolute_import, division, print_function, unicode_literals

import os

import tensorflow as tf
from tensorflow import keras

print(tf.version.VERSION)

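**As the comment (#) in the code notes, please run the code above in a Colab environment** https://colab.research.google.com

The code above selects TensorFlow 2.x in Colab, installs the packages needed for saving in HDF5 format, and imports TensorFlow.

2. Loading the dataset

This story uses the MNIST dataset as its example for saving and loading. To keep the code fast, we use only the first 1,000 examples of the dataset.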

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

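The code above prepares 1,000 training examples and 1,000 test examples: each 28 x 28 image is flattened into a 784-element vector and its pixel values are scaled to the range [0, 1]. No training or testing has happened yet.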
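To make the preprocessing concrete, here is a minimal pure-Python sketch of what `reshape(-1, 28 * 28) / 255.0` does to a single image. The `fake_image` below is made-up data standing in for one MNIST digit, and `flatten_and_scale` is a hypothetical helper, not part of TensorFlow or NumPy.

```python
# Illustrative only: a made-up 28x28 "image" of integer pixel values in 0..255.
fake_image = [[(row * 28 + col) % 256 for col in range(28)] for row in range(28)]


def flatten_and_scale(image):
    """Flatten a 2-D grid row by row and scale each pixel into [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


vector = flatten_and_scale(fake_image)
print(len(vector))               # 784 values, one per pixel
print(min(vector), max(vector))  # 0.0 1.0
```

The real code does the same thing for all 1,000 images at once, which is why each array ends up with 784 columns.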

3. Defining the model

Let's build a simple Sequential model.

# Define a simple sequential model
def create_model():
  model = tf.keras.models.Sequential([
    keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
  ])

  model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

  return model

With the 'create_model()' function, building the model takes a single call.

# Create a basic model instance
model = create_model()

# Display the model's architecture
model.summary()

'model = create_model()' calls the 'create_model()' function defined above and stores the compiled model it returns.

Below is the output of 'model.summary()'.

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
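The parameter counts in the summary can be checked by hand. A Dense layer holds one weight per input-output pair plus one bias per output unit, and Dropout has no trainable parameters. `dense_params` is a hypothetical helper written only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(784, 512)  # first Dense layer
output = dense_params(512, 10)   # final Dense layer
total = hidden + output          # Dropout contributes nothing

print(hidden, output, total)     # 401920 5130 407050
```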

**For more on the results above, the MNIST dataset, or building models, please see this tutorial** https://www.tensorflow.org/beta/tutorials/keras/basic_classification

III. Saving and Loading Checkpoints

A checkpoint lets you load a trained model and use it directly. You can also create a checkpoint in the middle of training and resume from that point.

Section 'III.' first covers how to create and save checkpoints during training; section 'IV.' then covers how to save and reuse a fully trained model.

1. Saving weights during training

First, let's save the weights while the model is training. Here the weights can be thought of as the model's state.

checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])  # Pass callback to training

# This may generate warnings related to saving the state of the optimizer.
# These warnings (and similar warnings throughout this notebook)
# are in place to discourage outdated usage, and can be ignored.

Running the code above saves TensorFlow checkpoint files in the directory you specified. The checkpoint is updated at the end of every epoch.

Running the code below

!ls {checkpoint_dir}

produces the following output.

checkpoint           cp.ckpt.data-00001-of-00002
cp.ckpt.data-00000-of-00002  cp.ckpt.index

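This confirms that the checkpoint was saved.

2. Applying saved weights to a new, untrained model

Unfortunately, unlike a saved Hangul (HWP) document or a Medium post, a weights-only checkpoint does not hold everything about the model. By analogy, a checkpoint is like the summary we write after reading a long text. So how can you use the saved weights to build a new model of your own?

As long as the new model has exactly the same architecture as the original, the weights can be shared, even between separate model instances! You cannot charge a Galaxy Note 10 with an iPhone charger, but you can charge it with a Galaxy S9 charger, right? In the same way, models with the same architecture (like compatible chargers) can share weights.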
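The charger analogy boils down to a shape check: saved weights only fit a model whose layers have the same shapes. The sketch below illustrates the idea with plain tuples. `layer_shapes` and `can_share_weights` are hypothetical names, and this is not how TensorFlow itself performs the check.

```python
def layer_shapes(n_inputs, hidden_units, n_classes):
    """Kernel shapes of a two-Dense-layer network like the one in this story."""
    return [(n_inputs, hidden_units), (hidden_units, n_classes)]


def can_share_weights(shapes_a, shapes_b):
    """Weights are interchangeable only if every layer shape matches."""
    return shapes_a == shapes_b


original = layer_shapes(784, 512, 10)
same_architecture = layer_shapes(784, 512, 10)
smaller_hidden = layer_shapes(784, 256, 10)

print(can_share_weights(original, same_architecture))  # True
print(can_share_weights(original, smaller_hidden))     # False
```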

Now let's build a new, untrained model and evaluate it on the test set.

# Create a basic model instance
model = create_model()

# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))

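Because the model has not been trained yet, its accuracy will be very low, typically near the 10% you would get by guessing among ten digits.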
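As a rough sanity check on what "very low" means: with ten digit classes, guessing at random is right about one time in ten. The simulation below uses made-up labels and random guesses; it does not involve the model at all.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n_samples = 10000
labels = [random.randrange(10) for _ in range(n_samples)]
guesses = [random.randrange(10) for _ in range(n_samples)]

# Count how often a random guess happens to match the label.
correct = sum(1 for y, g in zip(labels, guesses) if y == g)
accuracy = correct / n_samples
print("Random guessing, accuracy: {:5.2f}%".format(100 * accuracy))
```

An untrained network is not literally a random guesser, but its accuracy usually lands in the same neighborhood.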

Now let's load the checkpoint we saved earlier and evaluate again.

# Loads the weights
model.load_weights(checkpoint_path)

# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Compare the accuracy of the untrained model with that of the restored one.

In other words, compare the restored model's accuracy with the validation accuracy reported by the earlier training run:

model.fit(train_images, 
          train_labels,  
          epochs=10,
          validation_data=(test_images,test_labels),
          callbacks=[cp_callback])

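3. Checkpoint callback options

As you may have guessed, the checkpoint callback comes with several options. When studying for an exam or reading a paper, you have probably run into pages that keep covering the same concept and, for efficiency's sake, thought, "I'll read ahead and come back to this later." Checkpoints can be used just as efficiently: you can give each one a distinct name and choose how often they are saved.

Now let's train a new model and save a uniquely named checkpoint once every 5 epochs.

3.1. Defining the checkpoint path and checkpoint directory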

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
# previously, checkpoint_path = "training_1/cp.ckpt"
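The `{epoch:04d}` placeholder is ordinary `str.format` syntax: the epoch number is zero-padded to four digits. A quick check with the standard library only:

```python
import os

checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"

# Fill in the placeholder for a few example epochs.
for epoch in (0, 5, 50):
    print(checkpoint_path.format(epoch=epoch))
# training_2/cp-0000.ckpt
# training_2/cp-0005.ckpt
# training_2/cp-0050.ckpt

# The directory part is unaffected by the placeholder.
checkpoint_dir = os.path.dirname(checkpoint_path)
print(checkpoint_dir)  # training_2
```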

3.2. Creating a callback that saves the model's weights

# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)
# previously, there was no period defined
# Note: later TensorFlow releases deprecate `period` in favor of `save_freq`.

In steps 3.3 and 3.4 we repeat the earlier workflow with the new options: train and save, then create a new model and load the saved weights.

3.3. Training a new model with the new callback

# Create a new model instance
model = create_model()

# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images, 
          train_labels,
          epochs=50, 
          callbacks=[cp_callback],
          validation_data=(test_images,test_labels),
          verbose=0)

Now let's look at the resulting checkpoints, find the most recent one, and store its path in a variable named 'latest'.

! ls {checkpoint_dir}

latest = tf.train.latest_checkpoint(checkpoint_dir)
latest

Running the code above gives:

'training_2/cp-0050.ckpt'
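That result is what one would expect. Assuming the callback saved at epochs 5, 10, ..., 50 and the manual save covered epoch 0, the sketch below lists the expected checkpoint prefixes and picks the highest epoch. Note that `tf.train.latest_checkpoint` itself relies on the `checkpoint` bookkeeping file in the directory rather than on sorting names; the sorting here only mimics the outcome.

```python
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"

# Epoch 0 (manual save) plus every 5th epoch up to 50 (assumed schedule).
saved_epochs = [0] + list(range(5, 51, 5))
prefixes = [checkpoint_path.format(epoch=e) for e in saved_epochs]

# Zero-padded names sort in epoch order, so max() gives the newest.
expected_latest = max(prefixes)
print(len(prefixes))    # 11
print(expected_latest)  # training_2/cp-0050.ckpt
```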

3.4. Rebuilding the model and loading the latest checkpoint

# Create a new model instance
model = create_model()

# Load the previously saved weights
model.load_weights(latest)

# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

Is the accuracy different from before, or similar?

4. Saving weights manually

# Save the weights
model.save_weights('./checkpoints/my_checkpoint')
# previously, model.save_weights(checkpoint_path.format(epoch=0))

# Create a new model instance
model = create_model()

# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')
# previously, model.load_weights(checkpoint_path)

# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))

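IV. Saving the Entire Trained Model

What if you want to save the whole model? The checkpoint approach saves only the weights, but saving the entire model to a file lets you restore it exactly as it was without redefining the model. With the whole model saved, you can always resume from exactly where you stopped, which is very convenient.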
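The difference between the two approaches can be pictured with a toy example. The dictionary below is a made-up stand-in for a model and is serialized with JSON purely for illustration; the real file written by `model.save` has its own layout, which this sketch does not reproduce.

```python
import json

# A toy "model": everything here is illustrative, not TensorFlow's format.
toy_model = {
    "architecture": [{"type": "Dense", "units": 512}, {"type": "Dense", "units": 10}],
    "weights": [[0.1, -0.2], [0.3]],
    "optimizer": {"name": "adam", "step": 120},
}

# Weights-only save: the loader must already know the architecture.
weights_only = json.dumps({"weights": toy_model["weights"]})

# Whole-model save: architecture, weights and optimizer state travel together.
whole_model = json.dumps(toy_model)

restored = json.loads(whole_model)
print(sorted(json.loads(weights_only)))  # ['weights']
print(sorted(restored))                  # ['architecture', 'optimizer', 'weights']
```

With the weights-only file, something like `create_model()` still has to supply the architecture; with the whole-model file, nothing else is needed.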

1. Saving the model as an HDF5 file

Saving a model as an HDF5 file is quite simple.

# Create a new model instance
model = create_model()

# Train the model
model.fit(train_images, train_labels, epochs=5)

# Save the entire model to a HDF5 file
model.save('my_model.h5')

2. Recreating the model from the saved file

Use this approach when you want to load and reuse a model that you or someone else has already built.

# Recreate the exact same model, including its weights and the optimizer
new_model = keras.models.load_model('my_model.h5')

# Show the model architecture
new_model.summary()

The result matches the summary shown in section II, step 3; only the auto-generated model and layer names differ:

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (None, 512)               401920    
_________________________________________________________________
dropout_5 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________

We can now evaluate the restored model's accuracy again.

loss, acc = new_model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
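The `{:5.2f}` in these print statements is a format spec: at least five characters wide, with two digits after the decimal point. The accuracy values below are made up just to show the effect.

```python
# Made-up accuracies, only to show how the format spec behaves.
for acc in (0.0912, 0.8765, 1.0):
    print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
# Restored model, accuracy:  9.12%
# Restored model, accuracy: 87.65%
# Restored model, accuracy: 100.00%
```

Values narrower than five characters are left-padded with spaces, which keeps short and long percentages aligned.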

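V. Wrapping Up

In this story we looked at how to save models with TensorFlow 2.0. I hope it helps you save and load your own TensorFlow models.

I am not an expert who never makes a mistake, so there may be shortcomings here, but thank you to everyone who read this far. Questions and kind corrections are always a great help.

Have a nice day.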