Tuesday, July 30, 2019

DCGANs_Paper (Translation)


paper hyperlink

Notes

The blocks are grouped as follows: original-text blocks have a blue background and translation blocks a green background.

Original

Translation

Please leave a comment wherever the translation reads awkwardly.

ABSTRACT

In recent years, supervised learning with convolutional networks (CNNs) has
seen huge adoption in computer vision applications. Comparatively, unsupervised
learning with CNNs has received less attention. In this work we hope to help
bridge the gap between the success of CNNs for supervised learning and unsupervised learning.

We introduce a class of CNNs called deep convolutional generative
adversarial networks (DCGANs), that have certain architectural constraints, and
demonstrate that they are a strong candidate for unsupervised learning.

Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator.

Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

In recent years, supervised learning with convolutional networks (CNNs) has been widely adopted in computer vision applications. By comparison, unsupervised learning with CNNs has received far less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and for unsupervised learning.

We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), which have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning.

Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations, from object parts up to whole scenes, in both the Generator and the Discriminator.

Additionally, we apply the learned features to novel tasks, demonstrating their applicability as general image representations.

1 INTRODUCTION

Learning reusable feature representations from large unlabeled datasets has been an area of active research. In the context of computer vision, one can leverage the practically unlimited amount of unlabeled images and videos to learn good intermediate representations, which can then be used on a variety of supervised learning tasks such as image classification.

Learning reusable feature representations from large amounts of unlabeled data has long been an area of active research. In computer vision, one can exploit the practically unlimited supply of unlabeled images and videos to learn good intermediate representations, which can then be used for a variety of supervised tasks such as image classification.

We propose that one way to build good image representations is by training Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks. GANs provide an attractive alternative to maximum likelihood techniques.

We propose that one way to build good image representations is to train Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) and later reuse parts of the Generator and Discriminator networks as feature extractors for supervised tasks. GANs provide an attractive alternative to maximum likelihood techniques.

One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attractive to representation learning. GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs.

One can further argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-squared error) make them attractive for representation learning. GANs are also notoriously unstable to train, often resulting in Generators that produce nonsensical outputs.

There has been very limited published research in trying to understand and visualize what GANs learn, and the intermediate representations of multi-layer GANs.

Published research that tries to understand and visualize what GANs learn, and the intermediate representations of multi-layer GANs, is still very limited.

In this paper, we make the following contributions

  • We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN)
  • We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
  • We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
  • We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.

In this paper, we make the following contributions:

  • We propose and evaluate a set of constraints on the architectural topology of convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
  • We use the trained Discriminators for image classification tasks, showing performance competitive with other unsupervised algorithms.
  • We visualize the filters learned by GANs and show empirically that specific filters have learned to draw specific objects.
  • We show that the Generators have interesting vector arithmetic properties that allow easy manipulation of many semantic qualities of the generated samples.

2.1 REPRESENTATION LEARNING FROM UNLABELED DATA

Unsupervised representation learning is a fairly well studied problem in general computer vision research, as well as in the context of images. A classic approach to unsupervised representation learning is to do clustering on the data (for example using K-means), and leverage the clusters for improved classification scores.

Unsupervised representation learning is a fairly well studied problem in general computer vision research, as well as in the context of images. A classic approach is to cluster the data (for example with K-means) and leverage the clusters to improve classification scores.

In the context of images, one can do hierarchical clustering of image patches (Coates & Ng, 2012) to learn powerful image representations. Another popular method is to train auto-encoders (convolutionally, stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code, and decode the code to reconstruct the image as accurately as possible.These methods have also been shown to learn good feature representations from image pixels.Deep belief networks (Lee et al., 2009) have also been shown to work well in learning hierarchical representations.

In the context of images, one can perform hierarchical clustering of image patches (Coates & Ng, 2012) to learn powerful image representations. Another popular approach is to train auto-encoders (convolutional or stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code and then decode that code to reconstruct the image as accurately as possible. These methods have also been shown to learn good feature representations from image pixels. Deep belief networks (Lee et al., 2009) have likewise been shown to work well for learning hierarchical representations.

2.2 GENERATING NATURAL IMAGES

Generative image models are well studied and fall into two categories: parametric and nonparametric. The non-parametric models often do matching from a database of existing images, often matching patches of images, and have been used in texture synthesis (Efros et al., 1999), super-resolution (Freeman et al., 2002) and in-painting (Hays & Efros, 2007). Parametric models for generating images has been explored extensively (for example on MNIST digits or for texture synthesis (Portilla & Simoncelli, 2000)).

Generative image models are well studied and fall into two categories: parametric and non-parametric. Non-parametric models usually match against a database of existing images, often matching patches of images, and have been used for texture synthesis (Efros et al., 1999), super-resolution (Freeman et al., 2002) and in-painting (Hays & Efros, 2007). Parametric models for generating images have been explored extensively (for example on MNIST digits or for texture synthesis (Portilla & Simoncelli, 2000)).

However, generating natural images of the real world have had not much success until recently. A variational sampling approach to generating images (Kingma & Welling, 2013) has had some success, but the samples often suffer from being blurry. Another approach generates images using an iterative forward diffusion process (Sohl-Dickstein et al., 2015). Generative Adversarial Networks (Goodfellow et al., 2014) generated images suffering from being noisy and incomprehensible. A laplacian pyramid extension to this approach (Denton et al., 2015) showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models. A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.

However, generating natural images of the real world has not seen much success until recently. A variational sampling approach to generating images (Kingma & Welling, 2013) has had some success, but the samples are often blurry. Another approach generates images using an iterative forward diffusion process (Sohl-Dickstein et al., 2015). The images generated by Generative Adversarial Networks (Goodfellow et al., 2014) suffered from being noisy and incomprehensible. A Laplacian pyramid extension of this approach (Denton et al., 2015) produced higher quality images, but the objects still looked wobbly because of the noise introduced by chaining multiple models. A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success in generating natural images, but they have not leveraged the generators for supervised tasks.

2.3 VISUALIZING THE INTERNALS OF CNNS

One constant criticism of using neural networks has been that they are black-box methods, with little understanding of what the networks do in the form of a simple human-consumable algorithm. In the context of CNNs, Zeiler et. al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).

A constant criticism of neural networks is that they are black boxes: we understand very little about what the networks do in the form of a simple, human-consumable algorithm. In the context of CNNs, Zeiler et al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering for the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).

3 APPROACH AND MODEL ARCHITECTURE

Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. This motivated the authors of LAPGAN (Denton et al., 2015) to develop an alternative approach to iteratively upscale low resolution generated images which can be modeled more reliably. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature.

Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. This motivated the authors of LAPGAN (Denton et al., 2015) to develop an alternative approach that iteratively upscales low-resolution generated images, which can be modeled more reliably. We also ran into difficulties when trying to scale GANs using the CNN architectures commonly used in the supervised literature.

However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models. Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.

However, after extensive model exploration we identified a family of architectures that train stably across a range of datasets and allow training higher-resolution and deeper generative models. The core of our approach is to adopt and modify three recently demonstrated changes to CNN architectures.

The first is the all convolutional net (Springenberg et al., 2014) which replaces deterministic spatial pooling functions (such as maxpooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in our generator, allowing it to learn its own spatial upsampling, and discriminator.

The first is the all convolutional net (Springenberg et al., 2014), which replaces deterministic spatial pooling functions (such as max-pooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in our Generator, allowing it to learn its own spatial upsampling, and in the Discriminator.

Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well.

Second is the trend toward eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling, which has been used in state-of-the-art image classification models. We found that global average pooling increased model stability but hurt convergence speed. A middle ground, directly connecting the highest convolutional features to the input of the Generator and to the output of the Discriminator respectively, worked well.

The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.

The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected since it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor (NHWC) and used as the start of the convolution stack. For the Discriminator, the last convolution layer is flattened and fed into a single sigmoid output. (See Fig. 1 for a visualization of an example model architecture.)

Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models.

Third is Batch Normalization (Ioffe & Szegedy, 2015), which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise from poor initialization and helps the gradient flow in deeper models.

This proved critical to get deep generators to begin learning, preventing the generator from collapsing all samples to a single point which is a common failure mode observed in GANs. Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability.

This proved critical for getting deep Generators to start learning, preventing the Generator from collapsing all samples to a single point, which is a common failure mode observed in GANs. Directly applying batchnorm to all layers, however, resulted in sample oscillation and model instability.

My own reading of the above: the Generator's learning concentrates on a single Gaussian distribution, which causes most of the generated images to look alike.

This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer. The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).

Batchnorm is therefore not applied to the Generator's output layer or to the Discriminator's input layer. The ReLU activation (Nair & Hinton, 2010) is used in the Generator, except for the output layer, which uses Tanh. We observed that a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the Discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher-resolution modeling. This contrasts with the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).

Architecture guidelines for stable Deep Convolutional GANs
• Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
• Use batchnorm in both the generator and the discriminator.
• Remove fully connected hidden layers for deeper architectures.
• Use ReLU activation in generator for all layers except for the output, which uses Tanh.
• Use LeakyReLU activation in the discriminator for all layers.

Architecture guidelines for stable DCGANs (a Keras sketch follows this list):

  • Replace all pooling layers with strided convolutions (Discriminator) and fractional-strided convolutions (Generator)
    • See the related discussion on Zhihu
    • strided convolutions: convolutions with a stride larger than one
    • fractional-strided convolution: transposed convolution
      • keras.layers.Conv2DTranspose
  • Use BatchNorm in both the Generator and the Discriminator
  • Remove fully connected hidden layers in deeper architectures
  • Use ReLU in the Generator, except for the output layer, which uses Tanh
  • Use LeakyReLU in all layers of the Discriminator
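
As a concrete (and unofficial) illustration of the guidelines above, here is a minimal Keras sketch of a guideline-conforming Discriminator; the 64x64x3 input, filter counts, and kernel sizes are my own assumptions rather than values taken from the paper:

from keras import layers, models

def build_discriminator(img_shape=(64, 64, 3)):
    # strided convolutions instead of pooling; no BatchNorm on the input layer
    inp = layers.Input(shape=img_shape)
    x = layers.Conv2D(64, 4, strides=2, padding='same')(inp)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(128, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(256, 4, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    # no fully connected hidden layers: flatten straight into the sigmoid output
    x = layers.Flatten()(x)
    out = layers.Dense(1, activation='sigmoid')(x)
    return models.Model(inp, out)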

4 DETAILS OF ADVERSARIAL TRAINING

We trained DCGANs on three datasets, Large-scale Scene Understanding (LSUN) (Yu et al., 2015), Imagenet-1k and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below

We trained DCGANs on three datasets: Large-scale Scene Understanding (LSUN) (Yu et al., 2015), Imagenet-1k, and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below.

No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001, to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training.

No pre-processing was applied to the training images other than scaling them to the range of the tanh activation function, [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001 to be too high and used 0.0002 instead. Additionally, leaving the momentum term β1 at the suggested value of 0.9 caused training oscillation and instability, while reducing it to 0.5 helped stabilize training.
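
Those training settings translate almost one-to-one into Keras. The snippet below is only a sketch of that mapping (the layer it is attached to is an arbitrary example), using the N(0, 0.02) initializer, the LeakyReLU slope of 0.2, and Adam with lr=0.0002 and β1=0.5:

from keras import layers, initializers, optimizers

w_init = initializers.RandomNormal(mean=0.0, stddev=0.02)   # weights ~ N(0, 0.02)

# example layer using the initializer and the 0.2 leak slope
conv = layers.Conv2D(128, 4, strides=2, padding='same', kernel_initializer=w_init)
act = layers.LeakyReLU(alpha=0.2)

# Adam with the reduced learning rate and momentum term described above
opt = optimizers.Adam(lr=0.0002, beta_1=0.5)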

Figure 1: DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution Z is projected to a small spatial extent convolutional representation with many feature maps.

A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.

Figure 1: The DCGAN generator used for LSUN scene modeling. A 100-dimensional uniform distribution Z is projected to a small-spatial-extent convolutional representation with many feature maps.

A series of four fractionally-strided convolutions (in some recent papers these are wrongly called deconvolutions) then converts this high-level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.
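
A minimal Keras sketch of the Figure 1 generator: z (100-dim) is projected and reshaped to 4x4x1024, then four fractionally-strided convolutions (Conv2DTranspose) upsample it to 64x64x3; the filter counts follow the figure, while kernel sizes and the omitted initializer are assumptions:

from keras import layers, models

def build_generator(z_dim=100):
    z = layers.Input(shape=(z_dim,))
    x = layers.Dense(4 * 4 * 1024)(z)        # the "fully connected" projection
    x = layers.Reshape((4, 4, 1024))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    for filters in (512, 256, 128):          # three fractionally-strided convolutions
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
    # final fractionally-strided convolution to 64x64x3; Tanh output, no BatchNorm
    out = layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh')(x)
    return models.Model(z, out)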

4.1 LSUN

As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples.

Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig.2), mimicking online learning, in addition to samples after convergence (Fig.3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.

As the visual quality of samples from generative image models has improved, concerns about over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher-resolution generation, we train a model on the LSUN bedrooms dataset, which contains a little over 3 million training examples.

Recent analysis has shown a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig. 2), mimicking online learning, in addition to samples after convergence (Fig. 3), as an opportunity to demonstrate that our model is not producing high-quality samples simply by overfitting/memorizing the training examples. No data augmentation was applied to the images.

4.1.1 DEDUPLICATION

To further decrease the likelihood of the generator memorizing input examples (Fig.2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout regularized RELU autoencoder on 32x32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation which has been shown to be an effective information preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic-hashing, allowing for linear time de-duplication. Visual inspection of hash collisions showed high precision with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near duplicates, suggesting a high recall.

To further decrease the likelihood of the Generator memorizing input examples (Fig. 2), we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising, dropout-regularized ReLU autoencoder on 32x32 downsampled center crops of the training examples. The resulting code-layer activations are then binarized by thresholding the ReLU activation, which has been shown to be an effective information-preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic hashing, allowing linear-time de-duplication. Visual inspection of hash collisions showed high precision, with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near-duplicates, suggesting high recall.
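
A rough sketch of that de-duplication idea (the dropout rate, the threshold, and the dict-based hashing step are my assumptions; only the 3072-128-3072 shape and the ReLU code layer come from the paper):

import numpy as np
from keras import layers, models

def build_dedup_autoencoder():
    inp = layers.Input(shape=(3072,))          # 32x32x3 center crops, flattened
    noisy = layers.Dropout(0.2)(inp)           # dropout as the de-noising corruption (rate assumed)
    code = layers.Dense(128, activation='relu')(noisy)
    recon = layers.Dense(3072, activation='sigmoid')(code)
    auto = models.Model(inp, recon)
    auto.compile(optimizer='adam', loss='mse')
    return auto, models.Model(inp, code)

def semantic_hashes(encoder, crops, threshold=0.0):
    # binarize the ReLU code layer by thresholding -> one hash key per image
    bits = (encoder.predict(crops) > threshold).astype(np.uint8)
    return [row.tobytes() for row in bits]

def near_duplicate_groups(hashes):
    # linear-time grouping of images that share a semantic hash
    buckets = {}
    for idx, h in enumerate(hashes):
        buckets.setdefault(h, []).append(idx)
    return [ids for ids in buckets.values() if len(ids) > 1]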

4.2 FACES

We scraped images containing human faces from random web image queries of peoples names. The people names were acquired from dbpedia, with a criterion that they were born in the modern era. This dataset has 3M images from 10K people. We run an OpenCV face detector on these images, keeping the detections that are sufficiently high resolution, which gives us approximately 350,000 face boxes. We use these face boxes for training. No data augmentation was applied to the images

We scraped images containing human faces from random web image queries of people's names. The names were acquired from dbpedia, with the criterion that the people were born in the modern era. This dataset contains 3M images from 10K people. We ran an OpenCV face detector on these images, keeping only detections of sufficiently high resolution, which gives us approximately 350,000 face boxes. We use these face boxes for training. No data augmentation was applied to the images.

Figure 2: Generated bedrooms after one training pass through the dataset. Theoretically, the model could learn to memorize training examples, but this is experimentally unlikely as we train with a small learning rate and minibatch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate.

Figure 2: Generated bedrooms after one training pass through the dataset. Theoretically the model could learn to memorize training examples, but this is experimentally unlikely since we train with a small learning rate and mini-batch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate.

Figure 3: Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples such as the base boards of some of the beds.

Figure 3: Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples, such as the baseboards of some of the beds.

4.3 IMAGENET-1K

We use Imagenet-1k (Deng et al., 2009) as a source of natural images for unsupervised training. We train on 32 × 32 min-resized center crops. No data augmentation was applied to the images.

We use Imagenet-1k (Deng et al., 2009) as a source of natural images for unsupervised training. We train on 32 × 32 min-resized center crops. No data augmentation was applied to the images.

5 EMPIRICAL VALIDATION OF DCGANS CAPABILITIES

5.1 CLASSIFYING CIFAR-10 USING GANS AS A FEATURE EXTRACTOR

One common technique for evaluating the quality of unsupervised representation learning algorithms is to apply them as a feature extractor on supervised datasets and evaluate the performance of linear models fitted on top of these features.

A common technique for evaluating the quality of unsupervised representation learning algorithms is to apply them as a feature extractor on supervised datasets and evaluate the performance of linear models fitted on top of these features.

On the CIFAR-10 dataset, a very strong baseline performance has been demonstrated from a well tuned single layer feature extraction pipeline utilizing K-means as a feature learning algorithm. When using a very large amount of feature maps (4800) this technique achieves 80.6% accuracy. An unsupervised multi-layered extension of the base algorithm reaches 82.0% accuracy (Coates & Ng, 2011).

On the CIFAR-10 dataset, a very strong baseline performance has been demonstrated by a well-tuned single-layer feature extraction pipeline that uses K-means as the feature learning algorithm. With a very large number of feature maps (4800), this technique achieves 80.6% accuracy. An unsupervised multi-layer extension of the base algorithm reaches 82.0% accuracy (Coates & Ng, 2011).

To evaluate the quality of the representations learned by DCGANs for supervised tasks, we train on Imagenet-1k and then use the discriminator’s convolutional features from all layers, maxpooling each layers representation to produce a 4 × 4 spatial grid. These features are then flattened and concatenated to form a 28672 dimensional vector and a regularized linear L2-SVM classifier is trained on top of them.

To evaluate the quality of the representations learned by DCGANs for supervised tasks, we train on Imagenet-1k and then use the Discriminator's convolutional features from all layers, max-pooling each layer's representation to produce a 4 × 4 spatial grid. These features are then flattened and concatenated into a 28672-dimensional vector, and a regularized linear L2-SVM classifier is trained on top of them.
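
A sketch of how such a feature-extraction pipeline could look on top of a trained Keras Discriminator (how the 4x4 pooling windows are derived, and the use of sklearn's LinearSVC as the L2-SVM, are my assumptions):

import numpy as np
from keras import layers, models
from sklearn.svm import LinearSVC

def build_feature_extractor(discriminator):
    # max-pool every conv layer's output down to a 4x4 spatial grid, then concatenate
    pooled = []
    for layer in discriminator.layers:
        if isinstance(layer, layers.Conv2D):
            h = int(layer.output_shape[1])           # spatial size of this feature map
            pooled.append(layers.MaxPooling2D(pool_size=h // 4)(layer.output))
    flat = [layers.Flatten()(p) for p in pooled]
    features = layers.Concatenate()(flat)            # ~28672-dim in the paper's setup
    return models.Model(discriminator.input, features)

# usage sketch: LinearSVC is an L2-regularized linear SVM (squared hinge loss by default)
# extractor = build_feature_extractor(trained_discriminator)
# X = extractor.predict(cifar_images, batch_size=64)
# clf = LinearSVC(C=1.0).fit(X, cifar_labels.ravel())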

This achieves 82.8% accuracy, out performing all K-means based approaches. Notably, the discriminator has many less feature maps (512 in the highest layer) compared to K-means based techniques, but does result in a larger total feature vector size due to the many layers of 4 × 4 spatial locations. The performance of DCGANs is still less than that of Exemplar CNNs (Dosovitskiy et al., 2015), a technique which trains normal discriminative CNNs in an unsupervised fashion to differentiate between specifically chosen, aggressively augmented, exemplar samples from the source dataset.

This achieves 82.8% accuracy, outperforming all K-means based approaches. Notably, the Discriminator has far fewer feature maps (512 in the highest layer) than the K-means based techniques, but it does yield a larger total feature vector because of the many layers of 4 × 4 spatial locations. The performance of DCGANs is still below that of Exemplar CNNs (Dosovitskiy et al., 2015), a technique that trains normal discriminative CNNs in an unsupervised fashion to distinguish between specifically chosen, aggressively augmented exemplar samples from the source dataset.

Further improvements could be made by finetuning the discriminator’s representations, but we leave this for future work. Additionally, since our DCGAN was never trained on CIFAR-10 this experiment also demonstrates the domain robustness of the learned features.

Further improvements could be made by fine-tuning the Discriminator's representations, but we leave this for future work. Additionally, since our DCGAN was never trained on CIFAR-10, this experiment also demonstrates the domain robustness of the learned features.

Table 1: CIFAR-10 classification results using our pre-trained model. Our DCGAN is not pretrained on CIFAR-10, but on Imagenet-1k, and the features are used to classify CIFAR-10 images.

(The figure above is Table 1.)
Table 1: CIFAR-10 classification results using our pre-trained model. Our DCGAN was not pre-trained on CIFAR-10 but on Imagenet-1k, and the features learned there are used to classify CIFAR-10 images.

5.2 CLASSIFYING SVHN DIGITS USING GANS AS A FEATURE EXTRACTOR

On the StreetView House Numbers dataset (SVHN)(Netzer et al., 2011), we use the features of the discriminator of a DCGAN for supervised purposes when labeled data is scarce. Following similar dataset preparation rules as in the CIFAR-10 experiments, we split off a validation set of 10,000 examples from the non-extra set and use it for all hyperparameter and model selection.

On the StreetView House Numbers dataset (SVHN) (Netzer et al., 2011), we use the features of a DCGAN's Discriminator for supervised purposes when labeled data is scarce. Following dataset preparation rules similar to the CIFAR-10 experiments, we split off a validation set of 10,000 examples from the non-extra set and use it for all hyperparameter and model selection.

1000 uniformly class distributed training examples are randomly selected and used to train a regularized linear L2-SVM classifier on top of the same feature extraction pipeline used for CIFAR-10. This achieves state of the art (for classification using 1000 labels) at 22.48% test error, improving upon another modifcation of CNNs designed to leverage unlabled data (Zhao et al., 2015).

Additionally, we validate that the CNN architecture used in DCGAN is not the key contributing factor of the model’s performance by training a purely supervised CNN with the same architecture on the same data and optimizing this model via random search over 64 hyperparameter trials (Bergstra & Bengio, 2012). It achieves a signficantly higher 28.87% validation error.

1000 uniformly class-distributed training examples are randomly selected and used to train a regularized linear L2-SVM classifier on top of the same feature extraction pipeline used for CIFAR-10. This achieves state of the art (for classification using 1000 labels) at 22.48% test error, improving upon another modification of CNNs designed to leverage unlabeled data (Zhao et al., 2015).

Additionally, we validate that the CNN architecture used in DCGAN is not the key contributing factor of the model's performance by training a purely supervised CNN with the same architecture on the same data and optimizing this model via a random search over 64 hyperparameter trials (Bergstra & Bengio, 2012). It reaches a significantly higher validation error of 28.87%.

6 INVESTIGATING AND VISUALIZING THE INTERNALS OF THE NETWORKS

We investigate the trained generators and discriminators in a variety of Ways. We do not do any kind of nearest neighbor search on the training set. Nearest neighbors in pixel or feature space are trivially fooled (Theis et al., 2015) by small image transforms. We also do not use log-likelihood metrics to quantitatively assess the model, as it is a poor (Theis et al., 2015) metric.

We investigate the trained Generators and Discriminators in a variety of ways. We do not do any kind of nearest-neighbor search on the training set: nearest neighbors in pixel or feature space are trivially fooled (Theis et al., 2015) by small image transforms. We also do not use log-likelihood metrics to quantitatively assess the model, as it is a poor metric (Theis et al., 2015).

6.1 WALKING IN THE LATENT SPACE

The first experiment we did was to understand the landscape of the latent space. Walking on the manifold that is learnt can usually tell us about signs of memorization (if there are sharp transitions) and about the way in which the space is hierarchically collapsed. If walking in this latent space results in semantic changes to the image generations (such as objects being added and removed), we can reason that the model has learned relevant and interesting representations. The results are shown in Fig.4.

In our first experiment we set out to understand the landscape of the latent space. Walking on the learned manifold can usually tell us about signs of memorization (if there are sharp transitions) and about the way in which the space is hierarchically collapsed. If walking in this latent space results in semantic changes to the generated images (such as objects being added or removed), we can conclude that the model has learned relevant and interesting representations. The results are shown in Fig. 4.

landscape: see the National Academy for Educational Research glossary of information and communication terms
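
A minimal sketch of such a latent-space walk: linearly interpolate between two sampled points in Z and decode an image at every step (a trained generator is assumed to be available):

import numpy as np

def walk_latent_space(generator, latent_dim=100, steps=10):
    z_start = np.random.uniform(-1, 1, size=(latent_dim,))
    z_end = np.random.uniform(-1, 1, size=(latent_dim,))
    alphas = np.linspace(0.0, 1.0, steps)
    zs = np.stack([(1 - a) * z_start + a * z_end for a in alphas])
    # smooth, plausible transitions along the path suggest the model is not memorizing
    return generator.predict(zs)        # shape (steps, H, W, C)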

6.2 VISUALIZING THE DISCRIMINATOR FEATURES

Previous work has demonstrated that supervised training of CNNs on large image datasets results in very powerful learned features (Zeiler & Fergus, 2014). Additionally, supervised CNNs trained on scene classification learn object detectors (Oquab et al., 2014). We demonstrate that an unsupervised DCGAN trained on a large image dataset can also learn a hierarchy of features that are interesting. Using guided backpropagation as proposed by (Springenberg et al., 2014), we show in Fig.5 that the features learnt by the discriminator activate on typical parts of a bedroom, like beds and windows. For comparison, in the same figure, we give a baseline for randomly initialized features that are not activated on anything that is semantically relevant or interesting.

Previous work has demonstrated that supervised training of CNNs on large image datasets yields very powerful learned features (Zeiler & Fergus, 2014). Additionally, supervised CNNs trained on scene classification learn object detectors (Oquab et al., 2014). We demonstrate that an unsupervised DCGAN trained on a large image dataset can also learn a hierarchy of interesting features. Using guided backpropagation as proposed by (Springenberg et al., 2014), we show in Fig. 5 that the features learned by the Discriminator activate on typical parts of a bedroom, such as beds and windows. For comparison, in the same figure we give a baseline of randomly initialized features that are not activated by anything semantically relevant or interesting.

6.3 MANIPULATING THE GENERATOR REPRESENTATION

6.3.1 FORGETTING TO DRAW CERTAIN OBJECTS

In addition to the representations learnt by a discriminator, there is the question of what representations the generator learns. The quality of samples suggest that the generator learns specific object representations for major scene components such as beds, windows, lamps, doors, and miscellaneous furniture. In order to explore the form that these representations take, we conducted an experiment to attempt to remove windows from the generator completely.

In addition to the representations learned by the Discriminator, there is the question of what representations the Generator learns. The quality of the samples suggests that the Generator learns specific object representations for major scene components such as beds, windows, lamps, doors, and miscellaneous furniture. To explore the form these representations take, we conducted an experiment that attempts to remove windows from the Generator completely.

On 150 samples, 52 window bounding boxes were drawn manually. On the second highest convolution layer features, logistic regression was fit to predict whether a feature activation was on a window (or not), by using the criterion that activations inside the drawn bounding boxes are positives and random samples from the same images are negatives. Using this simple model, all feature maps with weights greater than zero ( 200 in total) were dropped from all spatial locations. Then, random new samples were generated with and without the feature map removal.

On 150 samples, 52 window bounding boxes were drawn manually. On the second-highest convolution layer's features, a logistic regression was fit to predict whether a feature activation was on a window or not, using the criterion that activations inside the drawn bounding boxes are positives and random samples from the same images are negatives. Using this simple model, all feature maps with weights greater than zero (around 200 in total) were dropped from all spatial locations. Then random new samples were generated with and without the feature map removal.
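
A sketch of that window-removal experiment (the helper below assumes the feature maps and bounding-box masks have already been collected; the names, the negative-sample count, and the sklearn classifier are my own choices):

import numpy as np
from sklearn.linear_model import LogisticRegression

def window_feature_maps(feats, window_mask, n_negative=5000):
    # feats: (N, H, W, C) activations from the second-highest conv layer
    # window_mask: boolean (N, H, W), True inside the hand-drawn window boxes
    N, H, W, C = feats.shape
    positives = feats[window_mask]                            # activations on windows
    flat = feats.reshape(-1, C)
    negatives = flat[np.random.choice(len(flat), n_negative)] # random locations as negatives
    X = np.concatenate([positives, negatives])
    y = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return np.where(clf.coef_[0] > 0)[0]                      # channels to drop (weight > 0)

# dropping: zero those channels at every spatial location, e.g.
# feats[..., window_feature_maps(feats, window_mask)] = 0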

The generated images with and without the window dropout are shown in Fig.6, and interestingly, the network mostly forgets to draw windows in the bedrooms, replacing them with other objects.

The generated images with and without the window dropout are shown in Fig. 6. Interestingly, the network mostly forgets to draw windows in the bedrooms and replaces them with other objects.

Figure 4:
Top rows: Interpolation between a series of 9 random points in Z show that the space learned has smooth transitions, with every image in the space plausibly looking like a bedroom. In the 6th row, you see a room without a window slowly transforming into a room with a giant window. In the 10th row, you see what appears to be a TV slowly being transformed into a window.

Figure 4:
Top rows: interpolation between a series of 9 random points in Z shows that the learned space has smooth transitions, with every image in the space plausibly looking like a bedroom. In the 6th row you see a room without a window slowly turning into a room with a giant window. In the 10th row you see what appears to be a TV slowly being transformed into a window.

6.3.2 VECTOR ARITHMETIC ON FACE SAMPLES

In the context of evaluating learned representations of words (Mikolov et al., 2013) demonstrated that simple arithmetic operations revealed rich linear structure in representation space. One canonical example demonstrated that the vector(”King”) - vector(”Man”) + vector(”Woman”) resulted in a vector whose nearest neighbor was the vector for Queen. We investigated whether similar structure emerges in the Z representation of our generators. We performed similar arithmetic on the Z vectors of sets of exemplar samples for visual concepts. Experiments working on only single samples per concept were unstable, but averaging the Z vector for three examplars showed consistent and stable generations that semantically obeyed the arithmetic. In addition to the object manipulation shown in (Fig. 7), we demonstrate that face pose is also modeled linearly in Z space (Fig. 8).

In the context of evaluating learned representations of words, (Mikolov et al., 2013) demonstrated that simple arithmetic operations reveal rich linear structure in representation space. One canonical example showed that vector("King") - vector("Man") + vector("Woman") yields a vector whose nearest neighbor is the vector for "Queen". We investigated whether similar structure emerges in the Z representation of our Generators. We performed similar arithmetic on the Z vectors of sets of exemplar samples for visual concepts. Experiments that used only a single sample per concept were unstable, but averaging the Z vector of three exemplars produced consistent and stable generations that semantically obeyed the arithmetic. In addition to the object manipulation shown in Fig. 7, we demonstrate that face pose is also modeled linearly in Z space (Fig. 8).
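
A small sketch of the averaged-exemplar arithmetic (the three z_* arrays are hypothetical, each holding the Z vectors of three exemplar samples for one concept):

import numpy as np

def z_arithmetic(generator, z_smiling_woman, z_neutral_woman, z_neutral_man, scale=0.25):
    # average three exemplars per concept, then do the arithmetic on the mean vectors
    y = z_smiling_woman.mean(axis=0) - z_neutral_woman.mean(axis=0) + z_neutral_man.mean(axis=0)
    # add uniform noise of scale +/-0.25 around Y to produce the grid of samples
    ys = y + np.random.uniform(-scale, scale, size=(8,) + y.shape)
    return generator.predict(np.vstack([y[None, :], ys]))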

These demonstrations suggest interesting applications can be developed using Z representations learned by our models. It has been previously demonstrated that conditional generative models can learn to convincingly model object attributes like scale, rotation, and position (Dosovitskiy et al., 2014). This is to our knowledge the first demonstration of this occurring in purely unsupervised models. Further exploring and developing the above mentioned vector arithmetic could dramatically reduce the amount of data needed for conditional generative modeling of complex image distributions.

These demonstrations suggest that interesting applications can be developed using the Z representations learned by our models. It has previously been demonstrated that conditional generative models can learn to convincingly model object attributes like scale, rotation, and position (Dosovitskiy et al., 2014). To our knowledge this is the first demonstration of this occurring in purely unsupervised models. Further exploring and developing the vector arithmetic mentioned above could dramatically reduce the amount of data needed for conditional generative modeling of complex image distributions.

Figure 5: On the right, guided backpropagation visualizations of maximal axis-aligned responses for the first 6 learned convolutional features from the last convolution layer in the discriminator. Notice a significant minority of features respond to beds - the central object in the LSUN bedrooms dataset. On the left is a random filter baseline. Comparing to the previous responses there is little to no discrimination and random structure.

Figure 5: On the right, guided backpropagation visualizations of the maximal axis-aligned responses for the first 6 learned convolutional features from the last convolution layer of the Discriminator. Note that a significant minority of the features respond to beds, the central object in the LSUN bedrooms dataset. On the left is a random filter baseline; compared to the previous responses there is little to no discrimination and only random structure.

Figure 6: Top row: un-modified samples from model. Bottom row: the same samples generated with dropping out ”window” filters. Some windows are removed, others are transformed into objects with similar visual appearance such as doors and mirrors. Although visual quality decreased, overall scene composition stayed similar, suggesting the generator has done a good job disentangling scene representation from object representation. Extended experiments could be done to remove other objects from the image and modify the objects the generator draws.

Figure 6: Top row: unmodified samples from the model. Bottom row: the same samples generated with the "window" filters dropped out. Some windows are removed; others are transformed into objects with a similar visual appearance, such as doors and mirrors. Although visual quality decreased, the overall scene composition stayed similar, suggesting the Generator has done a good job of disentangling scene representation from object representation. Extended experiments could remove other objects from the image and modify the objects the Generator draws.

7 CONCLUSION AND FUTURE WORK

We propose a more stable set of architectures for training generative adversarial networks and we give evidence that adversarial networks learn good representations of images for supervised learning and generative modeling. There are still some forms of model instability remaining - we noticed as models are trained longer they sometimes collapse a subset of filters to a single oscillating mode. Further work is needed to tackle this from of instability. We think that extending this framework to other domains such as video (for frame prediction) and audio (pre-trained features for speech synthesis) should be very interesting. Further investigations into the properties of the learnt latentspace would be interesting as well.

We propose a more stable set of architectures for training generative adversarial networks (GANs) and give evidence that adversarial networks learn good representations of images for supervised learning and generative modeling. Some forms of model instability remain: we noticed that as models are trained longer they sometimes collapse a subset of filters to a single oscillating mode. Further work is needed to tackle this form of instability. We think that extending this framework to other domains, such as video (for frame prediction) and audio (pre-trained features for speech synthesis), should be very interesting. Further investigation of the properties of the learned latent space would be interesting as well.

Figure 7: Vector arithmetic for visual concepts. For each column, the Z vectors of samples are averaged. Arithmetic was then performed on the mean vectors creating a new vector Y . The center sample on the right hand side is produce by feeding Y as input to the generator. To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale ±0.25 was added to Y to produce the 8 other samples. Applying arithmetic in the input space (bottom two examples) results in noisy overlap due to misalignment.

Figure 7: Vector arithmetic for visual concepts. For each column, the Z vectors of the samples are averaged. Arithmetic is then performed on the mean vectors, creating a new vector Y. The center sample on the right-hand side is produced by feeding Y to the Generator as input. To demonstrate the interpolation capability of the Generator, uniform noise sampled with scale ±0.25 is added to Y to produce the 8 other samples. Applying the arithmetic in the input space (bottom two examples) results in noisy overlap due to misalignment.

Figure 8: A ”turn” vector was created from four averaged samples of faces looking left vs looking right. By adding interpolations along this axis to random samples we were able to reliably transform their pose.

Figure 8: A "turn" vector was created from four averaged samples of faces looking left versus looking right. By adding interpolations along this axis to random samples we were able to reliably transform their pose.

8 SUPPLEMENTARY MATERIAL

8.1 EVALUATING DCGANS CAPABILITY TO CAPTURE DATA DISTRIBUTIONS

We propose to apply standard classification metrics to a conditional version of our model, evaluating the conditional distributions learned. We trained a DCGAN on MNIST (splitting off a 10K validation set) as well as a permutation invariant GAN baseline and evaluated the models using a nearest neighbor classifier comparing real data to a set of generated conditional samples. We found that removing the scale and bias parameters from batchnorm produced better results for both models. We speculate that the noise introduced by batchnorm helps the generative models to better explore and generate from the underlying data distribution. The results are shown in Table 3 which compares our models with other techniques. The DCGAN model achieves the same test error as a nearest neighbor classifier fitted on the training dataset - suggesting the DCGAN model has done a superb job at modeling the conditional distributions of this dataset. At one million samples per class, the DCGAN model outperforms InfiMNIST (Loosli et al., 2007), a hand developed data augmentation pipeline which uses translations and elastic deformations of training examples. The DCGAN is competitive with a probabilistic generative data augmentation technique utilizing learned per class transformations (Hauberg et al., 2015) while being more general as it directly models the data instead of transformations of the data.

We propose applying standard classification metrics to a conditional version of our model to evaluate the conditional distributions it has learned. We trained a DCGAN on MNIST (splitting off a 10K validation set) as well as a permutation-invariant GAN baseline, and evaluated the models using a nearest-neighbor classifier that compares real data to a set of generated conditional samples. We found that removing the scale and bias parameters from batchnorm produced better results for both models. We speculate that the noise introduced by batchnorm helps the generative models better explore and generate from the underlying data distribution. The results are shown in Table 3, which compares our models with other techniques. The DCGAN model achieves the same test error as a nearest-neighbor classifier fitted on the training dataset, suggesting the DCGAN model has done a superb job of modeling the conditional distributions of this dataset. At one million samples per class, the DCGAN model outperforms InfiMNIST (Loosli et al., 2007), a hand-developed data augmentation pipeline that uses translations and elastic deformations of the training examples. The DCGAN is competitive with a probabilistic generative data augmentation technique that uses learned per-class transformations (Hauberg et al., 2015), while being more general because it directly models the data rather than transformations of the data.
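
A sketch of that nearest-neighbor evaluation (sklearn's 1-NN classifier stands in for the paper's nearest-neighbor comparison; the array names are placeholders):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def nn_test_error(gen_images, gen_labels, test_images, test_labels):
    # fit 1-NN on generated conditional samples, then classify the real test set
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(gen_images.reshape(len(gen_images), -1), gen_labels.ravel())
    acc = knn.score(test_images.reshape(len(test_images), -1), test_labels.ravel())
    return 1.0 - acc        # test error, as reported in Table 3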

Figure 9: Side-by-side illustration of (from left-to-right) the MNIST dataset, generations from a baseline GAN, and generations from our DCGAN .

Figure 9: Side-by-side illustration of (from left to right) the MNIST dataset, generations from a baseline GAN, and generations from our DCGAN.

Figure 10: More face generations from our Face DCGAN.

Figure 10: More face generations from our Face DCGAN.

Figure 11: Generations of a DCGAN that was trained on the Imagenet-1k dataset.

Figure 11: Generations from a DCGAN trained on the Imagenet-1k dataset.

Tuesday, July 23, 2019

Oracle_ORA-01830: date format picture ends before converting entire input string


tags: oracle ORA-01830

Error message

ORA-01830: date format picture ends before converting entire input string

Description

The error appeared while using MODEL to subtract two columns to get a date interval. The column data is shown below:

Only the first row of the subtraction failed, so the intuitive guess was that converting the text to a date had failed for that row.

After changing to_date(column, 'yyyy-mm-dd') to to_date(substr(column, 1, 10), 'yyyy-mm-dd'), it worked.

Besides that, you can also adjust the date format itself, as follows:

select to_date('2019/7/23 11:40','yyyy-mm-dd hh24:mi') from dual

Wednesday, July 17, 2019

SSRS_Switch_Pitfall


tags: SSRS Switch Reporting Service

While working on an SSRS report today I kept hitting a wall: even a very simple Switch expression would not behave. The field should have been Nothing, yet the Switch kept producing #Error. That error message usually shows up on division by zero, but I was convinced the first Switch condition should already have caught the Nothing field. Apparently that is not how it works.

The sample dataset is as follows:

select 1 a, 1 b from dual
union all
select 2, null from dual
union all
select null, 3 from dual

The report is very simple, as shown below:

=Switch(
 IsNothing(Fields!A.Value), nothing,
 IsNothing(Fields!B.Value), nothing,
 True, (Fields!A.Value - Fields!B.Value) / Fields!A.Value
)

As you can see, the condition says that if either column A or B is Nothing, the cell is Nothing; otherwise it is (A-B)/A.

When the report actually runs, however, column C shows #Error, as below:

That is odd. After asking a veteran I learned that even with Switch you still have to guard against division by zero, so the expression was adjusted as follows:

=Switch(
 IsNothing(Fields!A.Value), nothing,
 IsNothing(Fields!B.Value), nothing,
 True, (Fields!A.Value - Fields!B.Value) / IIF(Fields!A.Value is nothing, 1, Fields!A.Value)
)

Finally I added an IIF that substitutes 1 when the value is Nothing, ran it again, and, magically, it worked, as shown below:

Fine. Maybe Switch in SSRS simply behaves differently; as far as I can tell, SSRS evaluates every result expression in a Switch even when its condition is not the one that is returned, so the division still runs on the Nothing row.

Tuesday, July 16, 2019

DCGAN


DCGAN is short for deep convolutional GAN.

Dataset

CIFAR10 consists of 32x32x3 images with 10 output classes, 5,000 images per class, 50,000 images in total.

About this example

To keep things simple, only a single class is used for the test.
The focus here is on implementation; there will not be any deep theory.

Getting started

A GAN is the combination of two neural networks, a Generator and a Discriminator. The iteration process goes roughly like this:
  1. Initialize Generator-v1 and Discriminator-v1
  2. Sample a point from the latent space and run it through Generator-v1
  3. Discriminator-v1 checks it and finds it is fake
  4. Generator-v1 is upgraded to Generator-v2
  5. It successfully fools the Discriminator
  6. Discriminator-v1 is upgraded to Discriminator-v2
  7. Discriminator-v2 checks again and finds it is fake
Before starting, define which GPU resources to use
import os
# limit GPU resources
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
First we build the Generator.
Load the packages needed to build the Generator:
import keras
from keras import layers
import numpy as np
This package will be used later for saving the images:
from keras.preprocessing import image
Define the dimensions; the CIFAR10 images are 32x32x3.
The Generator samples a point from the latent space; that point is a vector, whose size is set by laten_dim below.
# CIFAR10 data dimensions
height = 32
width = 32
channel = 3

# dimension of the latent vector we will generate from
laten_dim = 32
  • W: input size
  • F: filter size
  • S: stride
  • P: padding size
  • N: output size
layers.Conv2D: convolution; the output size after a convolution is N = (W - F + 2P) / S + 1
layers.Conv2DTranspose: transposed convolution; the output size after a transposed convolution is W = (N - 1) * S - 2P + F
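To sanity-check those two formulas against the layers used below, here is a small helper (a rough sketch; treating the 'same' padding of the 4x4 Conv2DTranspose below as P = 1 is my own assumption about how Keras pads it):
def conv_out(w, f, s, p):
    # Conv2D output size: N = (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

def deconv_out(n, f, s, p):
    # Conv2DTranspose output size: W = (N - 1) * S - 2P + F
    return (n - 1) * s - 2 * p + f

# the Conv2DTranspose below: 16x16 input, filter 4, stride 2, padding 1 -> 32x32
print(deconv_out(16, 4, 2, 1))   # 32
# the 'same'-padded 5x5 Conv2D layers keep 32x32: (32 - 5 + 2*2) / 1 + 1 = 32
print(conv_out(32, 5, 1, 2))     # 32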
generator_input = keras.Input(shape=(laten_dim, ))
x = layers.Dense(128 * 16 * 16)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((16, 16, 128))(x)

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# output becomes 32x32
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# the image has 3 channels, i.e. the output is 32x32x3
x = layers.Conv2D(channel, 7, activation='tanh', padding='same')(x)

generator = keras.models.Model(generator_input, x)
Use Model.summary() to check how the data dimensions change through the model
generator.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 32768)             1081344   
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 32768)             0         
_________________________________________________________________
reshape_2 (Reshape)          (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 16, 16, 256)       819456    
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 16, 16, 256)       0         
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 32, 32, 256)       1048832   
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 32, 32, 256)       1638656   
_________________________________________________________________
leaky_re_lu_9 (LeakyReLU)    (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 32, 32, 256)       1638656   
_________________________________________________________________
leaky_re_lu_10 (LeakyReLU)   (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 32, 32, 3)         37635     
=================================================================
Total params: 6,264,579
Trainable params: 6,264,579
Non-trainable params: 0
_________________________________________________________________
With the Generator done, we build the Discriminator. Its job is to decide whether the given data is real or fake, so its input dimensions are the image dimensions (32x32x3 in this example).
discriminator_input = layers.Input(shape=(height, width, channel))
x = layers.Conv2D(128, 3)(discriminator_input)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
# real/fake decision, so the output is a single unit with a sigmoid
x = layers.Dense(1, activation='sigmoid')(x)

discriminator = keras.models.Model(discriminator_input, x)
Use Model.summary() to check the model dimensions
discriminator.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 30, 30, 128)       3584      
_________________________________________________________________
leaky_re_lu_11 (LeakyReLU)   (None, 30, 30, 128)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 14, 14, 128)       262272    
_________________________________________________________________
leaky_re_lu_12 (LeakyReLU)   (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 6, 6, 128)         262272    
_________________________________________________________________
leaky_re_lu_13 (LeakyReLU)   (None, 6, 6, 128)         0         
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 2, 2, 128)         262272    
_________________________________________________________________
leaky_re_lu_14 (LeakyReLU)   (None, 2, 2, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 513       
=================================================================
Total params: 790,913
Trainable params: 790,913
Non-trainable params: 0
_________________________________________________________________
Define the optimization of the model

The example uses RMSprop as the optimizer and limits the gradient range via clipvalue.
discriminator_optimizer = keras.optimizers.RMSprop(
    lr = 0.0008,
    clipvalue=1.0,
    decay=1e-8
)
discriminator.compile(optimizer=discriminator_optimizer,
                      loss='binary_crossentropy')
The Generator and the Discriminator are both in place; now we make them compete. A couple of quick notes:
  1. Although the original GAN paper mentions training the Generator several times, in practice Ian Goodfellow trained it only once
  2. While the Generator is being trained, the Discriminator is frozen
First, freeze the discriminator
discriminator.trainable = False
The gan model's input is the laten_dim defined at the beginning, and its output is the discriminator model's verdict on the generator model's output, i.e. whether the data produced by the generator is real or fake.
gan_input = keras.Input(shape=(laten_dim, ))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)
Set up the optimizer for the gan model
gan_optimizer = keras.optimizers.RMSprop(
    lr=0.0004,
    clipvalue=1.0,
    decay=1e-8
)
gan.compile(optimizer=gan_optimizer,
            loss='binary_crossentropy')
gan.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         (None, 32)                0         
_________________________________________________________________
model_2 (Model)              (None, 32, 32, 3)         6264579   
_________________________________________________________________
model_3 (Model)              (None, 1)                 790913    
=================================================================
Total params: 7,055,492
Trainable params: 6,264,579
Non-trainable params: 790,913
_________________________________________________________________
Now we can start training the model. As mentioned above, we sample data from the latent space at random and run it through the generator model; this data is of course fake, so it is labeled Negative. We also have real data, which is labeled Positive.
First, download the CIFAR10 dataset that ships with Keras:
  • 0: airplane
  • 1: automobile
  • 2: bird
  • 3: cat
  • 4: deer
  • 5: dog
  • 6: frog
  • 7: horse
  • 8: ship
  • 9: truck
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 109s 1us/step
Always verify the dimensions of any dataset first
x_train.shape, y_train.shape
((50000, 32, 32, 3), (50000, 1))
Of the downloaded images we keep only one class, to keep the example simple.
Use y_train.flatten() == 5 as a mask to select the data:
x_train = x_train[y_train.flatten() == 5]
Check the data dimensions after filtering
x_train.shape
(5000, 32, 32, 3)
There is no need to worry about y_train from here on; for our purposes all real data is Positive.
x_train.dtype
dtype('uint8')
Normalize the data
x_train = x_train / 255.
x_train.dtype
dtype('float64')
Convert the dtype; float32 is more than enough in theory
x_train = x_train.astype(np.float32)
x_train.dtype
dtype('float32')
Set a few simple parameters
# number of iterations to run
iterations = 10000
# number of samples per batch
batch_size = 20
# folder for the saved generated images; set it to whatever suits you
folder_path = '/tf/GAN/DCGAN'
With the preparation done, we can start training the model properly
# variable holding the start index
idx_start = 0
for i in range(iterations):
    # randomly sample points from a Gaussian distribution
    random_vectors = np.random.normal(size=(batch_size, laten_dim))
    # get the output of the generator model
    # with the model defined above this returns data of shape Nx32x32x3
    generator_img = generator.predict(random_vectors)
    # current end index
    idx_stop = idx_start + batch_size
    # use the indices to fetch the corresponding real images
    true_imgs = x_train[idx_start: idx_stop]
    # stack the fake and real images together
    combian_imgs = np.concatenate([generator_img, true_imgs])
    # build the corresponding labels, generated (fake): 1, real images: 0
    labels = np.concatenate([np.ones((batch_size, 1)), 
                             np.zeros((batch_size, 1))])
    
    # add a little random noise to the labels; some things are hard to explain,
    # just follow along and you won't get burned
    labels += 0.01 * np.random.random(labels.shape)
    
    # we now have generated images; use them to train the discriminator
    discriminator_loss = discriminator.train_on_batch(combian_imgs, labels)
    
    # the discriminator now knows those were fake images, so we update the generator
    # for the next generated batch: as I recall, Hung-yi Lee's online course mentions
    # that regenerating the samples or not is up to you
    random_vectors = np.random.normal(size=(batch_size, laten_dim))
    # labels for the newly generated images above
    # the label here is "Positive" (0) because we want to fool the discriminator
    # into believing the generator's images are real
    # after this step the generator has been upgraded
    generator_labels = np.zeros((batch_size, 1))
    # train the gan model; remember this is the model in which the discriminator is frozen
    gan_loss = gan.train_on_batch(random_vectors, generator_labels)
    
    # update the index
    idx_start = idx_start + batch_size 
    # if the index runs past the dataset, reset it
    # note that the dataset size and batch_size should divide evenly
    if idx_start > len(x_train) - batch_size:
        idx_start = 0
        
    # log every 100 iterations
    if i % 100 == 0:
        gan.save_weights('gan.h5')
        
        print('discriminator loss: ', discriminator_loss)
        print('gan loss: ', gan_loss)
        
        img = image.array_to_img(generator_img[0] * 255., scale=False)
        img.save(os.path.join(folder_path, 'epoch-' + str(i) + '-generator.jpg'))
        
        img = image.array_to_img(true_imgs[0] * 255., scale=False)
        img.save(os.path.join(folder_path, 'epoch-' + str(i) + '-true_image.jpg'))        
    
discriminator loss:  5.1067543
gan loss:  13.483853
discriminator loss:  0.55595607
gan loss:  0.9773754
discriminator loss:  0.6778943
gan loss:  0.7218493
discriminator loss:  0.7140575
gan loss:  0.74051535
discriminator loss:  0.6997445
gan loss:  0.74154454
discriminator loss:  0.6921107
gan loss:  0.7020267
discriminator loss:  0.69686
gan loss:  0.7030369
discriminator loss:  0.6800611
gan loss:  0.6471882
discriminator loss:  0.6967187
gan loss:  0.77145576
discriminator loss:  0.7020233
gan loss:  0.71897966
discriminator loss:  0.7126138
gan loss:  0.73591757
discriminator loss:  0.7110381
gan loss:  0.7163498
discriminator loss:  0.6996112
gan loss:  0.71271735
discriminator loss:  0.73551977
gan loss:  0.6986626
discriminator loss:  0.67326313
gan loss:  0.71304166
discriminator loss:  0.7034636
gan loss:  1.0788002
discriminator loss:  0.7141048
gan loss:  0.7726129
discriminator loss:  0.696671
gan loss:  0.720789
discriminator loss:  0.7359967
gan loss:  0.7729639
discriminator loss:  0.69505966
gan loss:  0.65915006
discriminator loss:  0.67265576
gan loss:  0.89056253
discriminator loss:  0.70829356
gan loss:  0.6985985
discriminator loss:  0.6924041
gan loss:  0.6630027
discriminator loss:  0.71791756
gan loss:  0.6948942
discriminator loss:  0.7060956
gan loss:  0.70852387
discriminator loss:  0.69083965
gan loss:  0.7589138
discriminator loss:  0.7217103
gan loss:  0.7043828
discriminator loss:  0.69030714
gan loss:  0.7687561
discriminator loss:  0.7466472
gan loss:  0.83582276
discriminator loss:  0.6909427
gan loss:  0.75084436
discriminator loss:  0.692063
gan loss:  0.66303813
discriminator loss:  0.6578157
gan loss:  0.7915683
discriminator loss:  0.660645
gan loss:  0.717038
discriminator loss:  0.6897063
gan loss:  0.71162593
discriminator loss:  0.72813225
gan loss:  0.7520748
discriminator loss:  0.70937884
gan loss:  0.9043194
discriminator loss:  0.6839577
gan loss:  0.72316694
discriminator loss:  0.68821317
gan loss:  0.6758722
discriminator loss:  0.66450214
gan loss:  0.7818069
discriminator loss:  0.68543863
gan loss:  0.7772101
discriminator loss:  0.72352195
gan loss:  1.1552292
discriminator loss:  0.66245687
gan loss:  0.8431295
discriminator loss:  0.64920413
gan loss:  0.6762968
discriminator loss:  0.6851862
gan loss:  0.6802823
discriminator loss:  0.6546106
gan loss:  0.8981015
discriminator loss:  0.66509783
gan loss:  0.8244335
discriminator loss:  0.71177804
gan loss:  0.64901894
discriminator loss:  0.6947624
gan loss:  0.6984755
discriminator loss:  0.70206606
gan loss:  0.8204377
discriminator loss:  0.7477706
gan loss:  0.9481209
discriminator loss:  0.67449886
gan loss:  0.83607197
discriminator loss:  0.6896846
gan loss:  0.6858331
discriminator loss:  0.6856981
gan loss:  0.71470636
discriminator loss:  0.7168747
gan loss:  0.58466285
discriminator loss:  0.69816226
gan loss:  0.76343626
discriminator loss:  0.7020701
gan loss:  0.8184387
discriminator loss:  0.64522004
gan loss:  0.6753376
discriminator loss:  0.6865121
gan loss:  0.7008316
discriminator loss:  0.7476796
gan loss:  0.87338126
discriminator loss:  0.7263964
gan loss:  0.6889228
discriminator loss:  0.70218945
gan loss:  0.6831338
discriminator loss:  0.6857745
gan loss:  0.6565727
discriminator loss:  0.68626153
gan loss:  0.6991853
discriminator loss:  0.68162763
gan loss:  0.74716157
discriminator loss:  0.7115513
gan loss:  0.7860044
discriminator loss:  0.7063864
gan loss:  0.74256736
discriminator loss:  0.6947535
gan loss:  0.73066074
discriminator loss:  0.70494205
gan loss:  0.64904773
discriminator loss:  0.69508123
gan loss:  0.72680366
discriminator loss:  0.6899625
gan loss:  0.81384104
discriminator loss:  0.7159114
gan loss:  0.7416827
discriminator loss:  0.6912014
gan loss:  0.7360595
discriminator loss:  0.72471696
gan loss:  0.6513818
discriminator loss:  0.6992035
gan loss:  0.6941215
discriminator loss:  0.6863073
gan loss:  0.7213627
discriminator loss:  0.6924833
gan loss:  0.71849513
discriminator loss:  0.6969613
gan loss:  0.72678196
discriminator loss:  0.685106
gan loss:  0.7152079
discriminator loss:  0.68887657
gan loss:  0.73665506
discriminator loss:  0.71015817
gan loss:  0.732918
discriminator loss:  0.67974985
gan loss:  0.8060713
discriminator loss:  0.693206
gan loss:  0.67660487
discriminator loss:  0.68673164
gan loss:  0.7380837
discriminator loss:  0.6858455
gan loss:  0.7205802
discriminator loss:  0.69764405
gan loss:  0.7232281
discriminator loss:  0.7109472
gan loss:  0.66223055
discriminator loss:  0.707724
gan loss:  1.1661718
discriminator loss:  0.6913794
gan loss:  0.6941385
discriminator loss:  0.69227374
gan loss:  0.7601951
discriminator loss:  0.6878415
gan loss:  0.7479371
discriminator loss:  0.71728295
gan loss:  0.83168066
discriminator loss:  0.6864918
gan loss:  0.7533535
discriminator loss:  0.7127117
gan loss:  0.6897801
discriminator loss:  0.69414794
gan loss:  0.7193168
discriminator loss:  0.67149776
gan loss:  0.7141426
discriminator loss:  0.6828767
gan loss:  0.7632409
discriminator loss:  0.67732584
gan loss:  0.7958939
discriminator loss:  0.69132864
gan loss:  0.721274
discriminator loss:  0.71267766
gan loss:  0.71191424
discriminator loss:  0.6940938
gan loss:  0.8811827

Check the generated images

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img_path = 'epoch-9600-generator.jpg'
gen_img = mpimg.imread(img_path)
plt.imshow(gen_img)

Looks like a pug?

A corgi and a shiba inu?

Conclusion

As the example shows, with the help of Keras's high-level API, implementing a GAN is not that difficult. Training a GAN, however, involves many things that are harder than the code (as the results above show, the optimization is not all that great). Whether it is the Generator or the Discriminator, being too strong or too weak is bad for the model. In many cases GANs are hard to train and require constant tuning of the hyperparameters; only by stepping on enough land mines do you accumulate the necessary experience.

Postscript: after roughly another 50,000 iterations of training, it fell apart completely!

Extension

Saving a single image makes it hard to judge how the generation is going, so the function below stitches the images together into one picture.

It uses the image utilities that ship with Keras to save the result.

import numpy as np
from keras.preprocessing import image
def save_generator_img(generator_img, h_int, w_int, epoch, save_file_name='generator', save_dir='.'):
    """Save the images produced by the generator.
    function:
        Uses the Keras utility keras.preprocessing.image to save the tiled image.
    
    parameters:
        generator_img: numpy array produced by the generator
        h_int: number of tiles along the height of the combined image
        w_int: number of tiles along the width of the combined image
        epoch: iteration number, inserted into the file name when saving
        save_file_name: file name, defaults to 'generator'
        save_dir: output directory, defaults to the working directory
        
    remark:
        h_int x w_int must not exceed the length of generator_img; it has to be exactly equal.
    
    example:
        save_generator_img(generator_img, 4, 5, 9900)
    """
    # get the data dimensions
    N, H, W, C = generator_img.shape
    
    # check that the number of tiles matches the number of images
    assert int(h_int) * int(w_int) == N        
    
    # allocate one big all-zero array
    target_img = np.zeros(shape=(h_int * H, w_int * W, C))
    
    # running index
    now_idx = 0    
    for h_idx in range(h_int):
        for w_idx in range(w_int):
            # take one image
            _img = generator_img[now_idx]
            # place it at the corresponding position in the big array
            target_img[h_idx * H: h_idx * H + H, w_idx * W: w_idx * W + W] = _img                        
            now_idx += 1
    
    
    file_name = os.path.join(save_dir, save_file_name + str(epoch) + '.png')
    save_img = image.array_to_img(target_img * 255., scale=False)
    save_img.save(file_name)

Thursday, July 11, 2019

PyCUDA Tutorial


Official documentation: pycuda tutorial

Getting started

Before you can use PyCuda, you have to import and initialize it:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

Note that you do not have to use pycuda.autoinit; initialization, context creation, and cleanup can also be performed manually if desired.

Transferring Data

The next step in most programs is to transfer data onto the device. In PyCuda you will mostly transfer numpy arrays from the host. (In fact, everything that satisfies the Python buffer interface will work, even str.) Let us randomly generate a 4x4 array:

import numpy
# randomly generate a 4x4 array
a = numpy.random.randn(4, 4)

But wait, a consists of double precision numbers, while most NVIDIA devices only support single precision:

# convert to float32
a = a.astype(numpy.float32)

Next, we need somewhere to transfer the data to, so we need to allocate memory on the device:

a_gpu = cuda.mem_alloc(a.nbytes)

numpy.nbytes: Total bytes consumed by the elements of the array.

As a last step, we need to transfer the data to the GPU:

cuda.memcpy_htod(a_gpu, a)

Executing a Kernel

For this tutorial we will stick to something simple: we will write code to double each entry of a_gpu. To do that, we write the corresponding CUDA C code and feed it to the constructor of pycuda.compiler.SourceModule:

mod = SourceModule("""
    __global__ void doublify(float *a)
    {
        int idx = threadIdx.x + threadIdx.y*4;
        a[idx] *= 2;
    }
""")
  • threadIdx is a built-in CUDA C variable

If there are no errors, the code is compiled and loaded onto the device. We get a reference to the C function defined above via pycuda.driver.Function and call it, specifying a_gpu as the argument and block=(4, 4, 1):

# reference
func = mod.get_function('doublify')
# call it
func(a_gpu, block=(4, 4, 1))

Finally, we fetch the data back from the GPU and display it alongside the original a:

a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)

numpy.empty_like: Return a new array with the same shape and type as a given array.

The printed output looks like this (the doubled result on top, the original below):

>>> print(a_doubled)
[[ 0.20442943  0.77628285  2.4775066   2.1609256 ]
 [ 0.89208996 -0.31590727 -1.8672403   1.1618725 ]
 [-2.8112352   0.42963934  3.1333017  -0.6093758 ]
 [ 0.6432332   3.4218931   0.45423126 -0.277839  ]]
>>> print(a)
[[ 0.10221472  0.38814142  1.2387533   1.0804628 ]
 [ 0.44604498 -0.15795363 -0.93362015  0.58093625]
 [-1.4056176   0.21481967  1.5666509  -0.3046879 ]
 [ 0.3216166   1.7109466   0.22711563 -0.1389195 ]]

It worked! That completes our walkthrough. Thankfully, PyCuda takes over from here and does all the cleanup for you, so you're done.

Stick around for some bonus material in the next section, though.

(You can find this example code as examples/demo.py in the PyCuda source distribution.)

Shortcuts for Explicit Memory Copies

The pycuda.driver.In, pycuda.driver.Out, and pycuda.driver.InOut argument handlers can simplify memory transfers. For example, if replacing a is acceptable instead of creating a_gpu, the following code can be used:

func(cuda.InOut(a), block=(4, 4, 1))

Prepared Invocations

Function invocation using the built-in pycuda.driver.Function.__call__() method incurs overhead for type identification (see the Device Interface). To achieve the same effect without that overhead, the function is bound to argument types (as designated by Python's standard library struct module) and then called. This also avoids having to assign explicit argument sizes using numpy.number classes:

grid = (1, 1)
block = (4, 4, 1)
func.prepare('P')
func.prepared_call(grid, block, a_gpu)

Bonus: Abstracting Away the Complications

Using pycuda.gpuarray.GPUArray, the same effect can be achieved with much more concise code:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy

a_gpu = gpuarray.to_gpu(numpy.random.randn(4, 4).astype(numpy.float32))
a_doubled = (2 * a_gpu).get()
print(a_doubled)
print(a_gpu)

Advanced Topics

Structures

(contributed by Nicholas Tung; the example code is in examples/demo_struct.py)

Suppose we have the following structure, for doubling a number of variable-length arrays:

mod = SourceModule("""
    struct DoubleOperation {
        int datalen, __padding; // so 64-bit ptrs can be aligned
        float *ptr;
    };

    __global__ void double_array(DoubleOperation *a) {
        a = &a[blockIdx.x];
        for (int idx = threadIdx.x; idx < a->datalen; idx += blockDim.x) {
            a->ptr[idx] *= 2;
        }
    }
    """)

Each block in the grid (see the CUDA documentation) will double one of the arrays. The for loop allows more data elements than threads to be doubled, although it is not efficient if one can guarantee there will always be enough threads. Next, a wrapper class based on this structure is created, and two arrays are instantiated:

class DoubleOpStruct:
    mem_size = 8 + numpy.intp(0).nbytes
    def __init__(self, array, struct_arr_ptr):
        self.data = cuda.to_device(array)
        self.shape, self.dtype = array.shape, array.dtype
        cuda.memcpy_htod(int(struct_arr_ptr), numpy.getbuffer(numpy.int32(array.size)))
        cuda.memcpy_htod(int(struct_arr_ptr) + 8, numpy.getbuffer(numpy.intp(int(self.data))))
    def __str__(self):
        return str(cuda.from_device(self.data, self.shape, self.dtype))

struct_arr = cuda.mem_alloc(2 * DoubleOpStruct.mem_size)
do2_ptr = int(struct_arr) + DoubleOpStruct.mem_size

array1 = DoubleOpStruct(numpy.array([1, 2, 3], dtype=numpy.float32), struct_arr)
array2 = DoubleOpStruct(numpy.array([0, 4], dtype=numpy.float32), do2_ptr)
print("original arrays", array1, array2)

The code above uses the pycuda.driver.to_device() and pycuda.driver.from_device() functions to allocate and copy values, and demonstrates how offsets into an allocated block of memory can be used. Finally, the code can be executed; the following demonstrates doubling both arrays, and then doubling only the second one:

func = mod.get_function("double_array")
func(struct_arr, block = (32, 1, 1), grid=(2, 1))
print("doubled arrays", array1, array2)

func(numpy.intp(do2_ptr), block = (32, 1, 1), grid=(1, 1))
print("doubled second only", array1, array2, "\n")

Where to go from here

Once you are comfortable with the basics above, go ahead and explore the Device Interface. For more examples, check the examples/ directory of the distribution, which also contains several benchmarks comparing GPU and CPU computation. As a reference for how to get things done, PyCuda's test suite under the test/ directory may also be helpful.