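The models we have studied so far were built for supervised learning, so they naturally required label information.

The goal this time is to learn a useful representation without any labels. A network trained to reproduce its own input can do this, and if it uses only linear activation functions with MSE as the cost function, what it learns is essentially the same as PCA.

An autoencoder can be seen as a slight extension of that idea. It consists of a layer that turns the input into a compressed, lower-dimensional representation (the encoding layer) and a layer that restores the compressed representation to the original input at its original dimensionality (the decoding layer).

Because the decoding layer has to reconstruct the input from it, the compressed representation produced by the encoding layer ends up being a low-dimensional representation that is useful for reconstruction.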
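To make the PCA connection a little more concrete, here is a minimal NumPy sketch that reads PCA itself as an encoder/decoder pair. It is not part of the original notebook: the toy data, the variable names, and the choice of 2 components are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 samples in 5 dimensions that lie on a 2-D subspace.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing

# PCA via SVD of the centered data.
mean = X.mean(axis=0)
Xc = X - mean
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
components = vt[:2]                      # top-2 principal directions, shape (2, 5)

# "Encoder": project onto the principal directions (5 -> 2).
codes = Xc @ components.T
# "Decoder": map the codes back to the original space (2 -> 5).
X_hat = codes @ components + mean

# Mean squared reconstruction error, the same quantity an MSE cost measures.
mse = float(np.mean((X - X_hat) ** 2))
print(codes.shape, X_hat.shape, mse)
```

Since the toy data really is 2-dimensional, two components reconstruct it almost perfectly. An autoencoder with nonlinear activations follows the same encode/decode pattern, but is not restricted to linear projections.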

import numpy as np
import pandas as pd

import torch
from torch import Tensor
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.optim import Optimizer

import torch.nn as nn
import torch.nn.functional as F
from torch.nn.modules.loss import _Loss

import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

Implementing the layers

class DenseLayer(nn.Module):
    def __init__(self, input_size: int, neurons: int, 
                 dropout: float = 1.0, activation: nn.Module = None) -> None:

        super().__init__()
        self.linear = nn.Linear(input_size, neurons)
        self.activation = activation
        if dropout < 1.0:
            self.dropout = nn.Dropout(1 - dropout)

    def forward(self, x: Tensor) -> Tensor:
        
        # PyTorch's autograd tracks the operations below, so backpropagation is handled automatically.
        x = self.linear(x) # multiply by the weights and add the bias

        if self.activation is not None:
            x = self.activation(x)

        if hasattr(self, 'dropout'):
            x = self.dropout(x)

        return x

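I implemented the hidden (dense) layer with PyTorch. As expected, the code is much shorter than the from-scratch version.

In particular, backpropagation is computed automatically, so all I have to get right is the forward function.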
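As a small illustration of that point, the sketch below runs a forward pass through a single linear layer and lets autograd fill in the gradients. The layer sizes and the toy loss are assumptions chosen only to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 2)                  # hypothetical tiny layer: 4 inputs, 2 outputs
x = torch.randn(3, 4)                    # a batch of 3 samples

out = torch.tanh(layer(x))               # only the forward pass is written by hand
loss = out.pow(2).mean()                 # toy scalar loss

grad_before = layer.weight.grad          # no gradient exists before backward()
loss.backward()                          # autograd computes every gradient
grad_after = layer.weight.grad

print(grad_before)
print(grad_after.shape)
```

Before `backward()` the gradient is `None`; afterwards it has the same shape as the weight matrix.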

I wrote the code this way to stay close to the from-scratch approach I studied earlier. In practice, I am told, PyTorch code can be even simpler than this.
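For comparison, here is a rough sketch of how a layer like DenseLayer might be written with PyTorch's built-in containers. The sizes (784 inputs, 28 outputs) and the dropout rate are assumptions for illustration. Note that DenseLayer's `dropout` argument is a keep probability, so `dropout = 0.8` there corresponds to `nn.Dropout(p=0.2)` here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Linear -> Tanh -> Dropout, chained in a single container.
dense = nn.Sequential(
    nn.Linear(784, 28),
    nn.Tanh(),
    nn.Dropout(p=0.2),                   # p is the probability of dropping a unit
)

dense.eval()                             # dropout is disabled in eval mode
x = torch.randn(5, 784)
y = dense(x)
print(y.shape)
```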

I am thinking of covering how to use PyTorch in the next post.

class ConvLayer(nn.Module): 
    def __init__(self, in_channels : int, out_channels : int,
               filter_size: int, activation =  None, 
               dropout: float = 1.0, flatten : bool = False) -> None:

        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, filter_size,
                              padding = filter_size // 2)
        self.activation = activation
        self.flatten = flatten

        if dropout < 1.0:
            self.dropout = nn.Dropout(1 - dropout)

    def forward(self, x: Tensor) -> Tensor:

        x = self.conv(x) # apply the convolution

        if self.activation is not None: # apply the activation function, if any
            x = self.activation(x)

        if self.flatten: # flatten each sample to 1-D when requested
            x = x.view(x.shape[0], x.shape[1] * x.shape[2] * x.shape[3])

        if hasattr(self, 'dropout'): # apply dropout when it is configured
            x = self.dropout(x)

        return x

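This is the convolutional layer, built the same way as the dense layer above. The convolution itself is performed by nn.Conv2d, and padding = filter_size // 2 keeps the height and width of the image unchanged for odd filter sizes such as 5.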
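The effect of `padding = filter_size // 2` can be checked with the standard output-size formula for a convolution (stride 1, no dilation). The helper name `conv_output_size` is hypothetical and exists only for this sketch.

```python
def conv_output_size(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


# 5x5 filter with padding 5 // 2 = 2 on a 28x28 image: the size is preserved.
same_size = conv_output_size(28, kernel=5, padding=5 // 2)

# Without padding, the same filter shrinks the image.
no_padding = conv_output_size(28, kernel=5, padding=0)

# With an even filter size the output grows by one pixel instead.
even_kernel = conv_output_size(28, kernel=4, padding=4 // 2)

print(same_size, no_padding, even_kernel)
```

This is why the dense layer after the convolutions can assume 28 * 28 values per channel, as long as the filter size stays odd.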

Implementing the encoder and decoder

class Encoder(nn.Module):
    def __init__(self, hidden_dim: int = 28):
        super(Encoder, self).__init__()
        self.conv1 = ConvLayer(1, 14, 5, activation = nn.Tanh())
        self.conv2 = ConvLayer(14, 7, 5, activation = nn.Tanh(), flatten = True)

        self.dense1 = DenseLayer(7 * 28 * 28, hidden_dim, activation = nn.Tanh())

    def forward(self, x: Tensor) -> Tensor:
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.dense1(x)

        return x

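This is the Encoder class, which plays the role of the encoder.

A convolutional layer first turns the 1-channel input into 14 channels, a second one turns the 14 channels into 7 channels (each channel holding 28 * 28 neurons), and the result is flattened to one dimension.

The flattened values then go through a dense layer that outputs the final 28 features. Every layer uses the hyperbolic tangent (tanh) as its activation function.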
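To follow the shapes step by step, here is a stand-in for the encoder written with raw PyTorch modules instead of the ConvLayer and DenseLayer classes. The batch of 4 random images is an assumption; only the layer sizes are taken from the Encoder above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv1 = nn.Conv2d(1, 14, 5, padding=2)
conv2 = nn.Conv2d(14, 7, 5, padding=2)
dense = nn.Linear(7 * 28 * 28, 28)

x = torch.randn(4, 1, 28, 28)            # fake batch of 4 MNIST-sized images
h1 = torch.tanh(conv1(x))                # 1 channel -> 14 channels
h2 = torch.tanh(conv2(h1))               # 14 channels -> 7 channels
flat = h2.view(h2.shape[0], -1)          # flatten to 7 * 28 * 28 = 5488 values
code = torch.tanh(dense(flat))           # compress to 28 features

for name, t in [("x", x), ("h1", h1), ("h2", h2), ("flat", flat), ("code", code)]:
    print(name, tuple(t.shape))
```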

class Decoder(nn.Module):
    def __init__(self, hidden_dim: int = 28):
        super(Decoder, self).__init__()
        self.dense1 = DenseLayer(hidden_dim, 7 * 28 * 28, activation = nn.Tanh())

        self.conv1 = ConvLayer(7, 14, 5, activation = nn.Tanh())
        self.conv2 = ConvLayer(14, 1, 5, activation = nn.Tanh())

    def forward(self, x: Tensor) -> Tensor:
        x = self.dense1(x)

        x = x.view(-1, 7, 28, 28) # -1 tells PyTorch to infer that dimension (the batch size)
        x = self.conv1(x)
        x = self.conv2(x)

        return x

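This is the Decoder class, which plays the role of the decoder. As the layout shows, it mirrors the encoder in reverse.

The dense layer takes the 28 features as input and outputs 7 * 28 * 28 features, which are then reshaped into 7 channels of 2-D data.

After that come two convolutional layers: the first increases the number of channels to 14, and the second reduces it back to a single channel, which is the output.

As a result, the encoder's input and the decoder's output have the same shape.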
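The same kind of shape check works for the decoder. Again this is a stand-in built from raw PyTorch modules, with a random batch of 4 codes as an assumed input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dense = nn.Linear(28, 7 * 28 * 28)
conv1 = nn.Conv2d(7, 14, 5, padding=2)
conv2 = nn.Conv2d(14, 1, 5, padding=2)

code = torch.randn(4, 28)                # fake batch of 4 encodings
h = torch.tanh(dense(code))              # 28 features -> 5488 features
h = h.view(-1, 7, 28, 28)                # back to 7 channels of 28x28
h = torch.tanh(conv1(h))                 # 7 channels -> 14 channels
out = torch.tanh(conv2(h))               # 14 channels -> 1 channel

print(tuple(out.shape))
```

The output has the shape of the original images, and because the last activation is tanh, every value lies between -1 and 1.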

class Autoencoder(nn.Module):
    def __init__(self, hidden_dim: int = 28):
        super(Autoencoder, self).__init__()
        
        self.encoder = Encoder(hidden_dim)

        self.decoder = Decoder(hidden_dim)

    
    def forward(self, x: Tensor) -> tuple:

        encoding = self.encoder(x)
        x = self.decoder(encoding)

        return x, encoding

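This class runs the encoder and decoder implemented above back to back. Its forward function returns both the reconstruction and the encoding.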

Implementing the trainer

from typing import Optional, Tuple

def permute_data(X: Tensor, y: Tensor):
    perm = torch.randperm(X.shape[0]) # random permutation used to shuffle the data
    return X[perm], y[perm]

class PyTorchTrainer(object):
    def __init__(self, model, optim, criterion):
        self.model = model
        self.optim = optim
        self.loss = criterion
    
    def _generate_batches(self, x: Tensor, y: Tensor, size: int = 32):
        N = x.shape[0]
        
        for ii in range(0, N, size):
            x_batch, y_batch = x[ii:ii+size], y[ii:ii+size]

            yield x_batch, y_batch # yield makes this method a generator

    def fit(self, x_train, y_train, x_test, y_test,
            epochs: int = 100, eval_every: int = 10, batch_size: int = 32):
        
        for e in range(epochs):
            x_train, y_train = permute_data(x_train, y_train)

            # Split the data into batches of batch_size.
            batch_generator = self._generate_batches(x_train, y_train, batch_size)

            for ii, (x_batch, y_batch) in enumerate(batch_generator):

                self.optim.zero_grad() # reset the accumulated gradients
                output = self.model(x_batch)[0] # forward pass; [0] is the reconstruction
                loss = self.loss(output, y_batch) # compute the loss
                loss.backward() # backpropagation
                self.optim.step() # update the parameters

            # Print the test loss at the end of each epoch.
            output = self.model(x_test)[0]
            loss = self.loss(output, y_test)
            print(e, loss)

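I implemented the trainer myself as well, as a plain Python class built around a PyTorch model, optimizer, and loss function. Having written a trainer from scratch before, I did not find it difficult.

The permute_data function shuffles the order of the data, and the _generate_batches function splits the data into batches.

The yield inside the for loop turns _generate_batches into a generator, which hands out one batch at a time instead of building every batch up front. I have not dug into the details, but my understanding is that this helps with memory use.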
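Here is a minimal pure-Python sketch of the same batching pattern. The function name `generate_batches` and the toy list are assumptions for illustration.

```python
def generate_batches(data, size):
    """Yield consecutive slices of `data`, each with at most `size` items."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


batches = generate_batches(list(range(10)), 4)

first = next(batches)                    # only the first batch is produced here
rest = list(batches)                     # the remaining batches, produced on demand

print(first)
print(rest)
```

Nothing is sliced until a batch is actually requested, and the last batch is simply shorter when the data size is not a multiple of the batch size.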

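For every batch, zero_grad has to be called first to reset the gradients (not the parameters themselves), because PyTorch adds new gradients to the existing ones each time backward is called.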
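The accumulation is easy to see on a single tensor. This sketch resets the gradient with `w.grad.zero_()` directly rather than through an optimizer, and the tensor values are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()                   # gradient of the first backward pass

(w * 3).sum().backward()                 # no reset: the new gradient is added on top
accumulated = w.grad.clone()

w.grad.zero_()                           # reset, as zero_grad() does for a whole model
(w * 3).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Without the reset, the second gradient is twice as large as it should be, which would distort the parameter update.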

Each batch is fed to the model, the loss is computed, and backpropagation followed by a parameter update gradually makes the model better.
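The trainer above evaluates the test loss with gradient tracking still on. A common alternative, sketched here as a suggestion rather than as part of the original trainer, is to evaluate inside `torch.no_grad()` with the model in eval mode. The helper name `evaluate` and the tiny linear model are assumptions.

```python
import torch
import torch.nn as nn


def evaluate(model: nn.Module, criterion: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Return the loss as a plain float without building an autograd graph."""
    was_training = model.training
    model.eval()                         # turn off dropout and similar layers
    with torch.no_grad():                # skip gradient bookkeeping
        output = model(x)
        if isinstance(output, tuple):    # e.g. (reconstruction, encoding)
            output = output[0]
        loss = criterion(output, y).item()
    model.train(was_training)            # restore the previous mode
    return loss


torch.manual_seed(0)
toy_model = nn.Linear(3, 3)
toy_x = torch.randn(8, 3)
test_loss = evaluate(toy_model, nn.MSELoss(), toy_x, toy_x)
print(test_loss)
```

This keeps memory use lower on large test sets and returns a number that prints without the `grad_fn` suffix.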

A simple hands-on exercise

import torchvision
from torchvision.datasets import MNIST
import torchvision.transforms as transforms

img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1305,), (0.3081,))
])


train_dataset = MNIST(root='../mnist_data/',
                      train=True, 
                      download=True,
                      transform=img_transforms)

test_dataset = MNIST(root='../mnist_data/',
                     train=False, 
                     download=True,
                     transform=img_transforms)

mnist_train = ((train_dataset.data.type(torch.float32).unsqueeze(3).permute(0, 3, 1, 2) / 255.0) - 0.1305) / 0.3081
mnist_test = ((test_dataset.data.type(torch.float32).unsqueeze(3).permute(0, 3, 1, 2) / 255.0) - 0.1305) / 0.3081

X_train = mnist_train
X_test = mnist_test

# Rescale all data to the range -1 to 1
X_train_auto = (X_train - X_train.min()) / (X_train.max() - X_train.min()) * 2 - 1
X_test_auto = (X_test - X_train.min()) / (X_train.max() - X_train.min()) * 2 - 1

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../mnist_data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ../mnist_data/MNIST/raw/train-images-idx3-ubyte.gz to ../mnist_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../mnist_data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ../mnist_data/MNIST/raw/train-labels-idx1-ubyte.gz to ../mnist_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../mnist_data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ../mnist_data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../mnist_data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../mnist_data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ../mnist_data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../mnist_data/MNIST/raw

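I loaded MNIST, a well-known practice dataset, and preprocessed it. The final rescaling to the range -1 to 1 matches the output range of the Tanh activation in the decoder's last layer.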
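The rescaling step is plain min-max scaling stretched to the range -1 to 1. Here is a small sketch on three made-up pixel values; the helper name `rescale_to_pm1` is hypothetical.

```python
import torch


def rescale_to_pm1(x: torch.Tensor, lo: torch.Tensor, hi: torch.Tensor) -> torch.Tensor:
    """Map values from [lo, hi] to [-1, 1]."""
    return (x - lo) / (hi - lo) * 2 - 1


pixels = torch.tensor([0.0, 127.5, 255.0])
scaled = rescale_to_pm1(pixels, pixels.min(), pixels.max())
print(scaled)
```

In the code above, the test set is rescaled with the minimum and maximum of the training set, so both sets go through exactly the same transformation.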

model = Autoencoder(hidden_dim = 28)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum = 0.9)

trainer = PyTorchTrainer(model, optimizer, criterion)

trainer.fit(X_train_auto, X_train_auto, 
            X_test_auto, X_test_auto,
            epochs = 1, batch_size = 60)

0 tensor(0.0679, grad_fn=<MseLossBackward0>)

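Autoencoders have many applications, but here I simply train the model to reconstruct its own input.

To do that, the input data and the target data just have to be the same. After a single epoch the test loss is already quite low, at about 0.068.

An autoencoder can be thought of as performing unsupervised learning, and it can also be used as a way of compressing the original data.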

reconstructed_images, image_representations = model(X_test_auto)

def display_image(ax, t: Tensor):
    n = t.detach().numpy()
    ax.imshow(n.reshape(28, 28))

a = np.random.randint(0, 10000)

f, axarr = plt.subplots(1,2)
display_image(axarr[0], X_test[a])
display_image(axarr[1], reconstructed_images[a])

axarr[0].set_title("Original")
axarr[1].set_title("AutoEncoder")

axarr[0].axis('off')
axarr[1].axis('off')

(-0.5, 27.5, 27.5, -0.5)

The reconstruction stays quite close to the original image! The 28 features produced by the encoder seem to have retained the important information.

Visualization with t-SNE

from sklearn.manifold import TSNE
tsne_result = TSNE(n_components=2, random_state=20190405).fit_transform(image_representations.detach().numpy())

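Next, I use t-SNE to reduce the representation to two dimensions.

More precisely, the autoencoder first compresses each original image into 28 features, and t-SNE is then applied to that result to reduce it further to 2 features.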
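The second step of that pipeline can be tried on its own with random stand-in data. In this sketch the 60 random 28-dimensional vectors only mimic the shape of the encoder output, and the perplexity of 10 is an assumption chosen because the sample is so small.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for the encoder output: 60 samples with 28 features each.
codes = rng.normal(size=(60, 28)).astype(np.float32)

# Reduce 28 features to 2 for plotting. Perplexity must be smaller than the sample count.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(codes)
print(embedding.shape)
```

Each row of `embedding` is a 2-D point that can be passed straight to a scatter plot.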

tsne_df = pd.DataFrame({'tsne_dim_1': tsne_result[:,0],
              'tsne_dim_2': tsne_result[:,1],
              'category': test_dataset.targets})
groups = tsne_df.groupby('category')

# Plot
fig, ax = plt.subplots(figsize=(25,25))

ax.margins(0.05) # add 5% padding for autoscaling
for name, group in groups:
    ax.scatter(group['tsne_dim_1'], group['tsne_dim_2'], marker='o', label=name)
ax.legend()

<matplotlib.legend.Legend at 0x7f111f422e10>

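Once the data is reduced to two dimensions, each image can be drawn as a point on a 2-D plot like the one above.

The colors in the plot follow the actual digit labels. These labels were not used when training the autoencoder.

Even with just the 2 compressed features the colors are separated fairly well, so the 28 features presumably separate the labels even better.

The other takeaway is that the labels come out fairly well separated even though the model was trained without them. It feels like a deep learning version of PCA.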

Takeaways

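I had only heard of autoencoders in passing, but after working through one myself I feel I can explain with confidence what an autoencoder is.

It is fascinating to see deep learning work well in unsupervised learning too, even though this was only a first taste.

My theoretical understanding of deep learning seems to have come a long way. The next step should be learning how to use the tools themselves, PyTorch and TensorFlow.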