Neural Aesthetic

Week 14: Generating Holy Scripture

I’ve been thinking about meaningful datasets and what makes them so. I’ve also been thinking about this in the context of what might be more-or-less available to source. Sacred religious texts might meet both criteria. Faith is deeply personal to people and religious stories have been told since the beginning, yes?

PART I - COLLECT THE DATA
According to these Pew Research reports from 2012 and 2017, most people in the world belong to a religious group. The largest groups, in descending order of size, are Christians, Muslims, Hindus, and Buddhists. I tasked myself with finding significant scriptures for each of these religions.

In some cases this meant learning what those scriptures are in the first place, and quickly realizing that there isn’t necessarily an easy answer. Language, stories, and texts evolve and develop differently over time and geographies in their expressions and interpretations. Which scriptures are read, and in which particular versions, varies by denomination.

For training a machine learning model, I looked for documents translated into English. Any translation raises questions of accuracy and meanings that are lost or gained. Then again, these stories have been a part of the written and oral traditions for so long; are they not already the result of thousands of years of human telephone?

In addition, I sought to find documents as digital text (not scanned books), “complete” texts (as opposed to selections), and those without commentary and analysis (at least for now).

So yeah, considering all of these points, it got complicated real quick. And once I knew what I was looking for, it wasn’t necessarily easy to find. I have more questions now than when I started this attempt, and the project is much larger in scope than the short time I currently have allows. Let’s just say, in ITP spirit, that this is an earnest prototype.

Problematic as it may be for a number of reasons, not least because I’m sure it’s grossly incomplete, here’s a list of what I managed to find and where I found it. I welcome any and all comments and suggestions.

Christianity
The King James Bible from Project Gutenberg

Islam
The Quran translated by Mohammed Marmaduke Pickthall from Islam101

Hinduism

  1. Four Vedas from the Internet Archive, includes:

    • Rig Veda translated by RT Griffith

    • Yajur Veda translated by AB Keith

    • Hymns of Sama Veda translated by RT Griffith

    • Hymns of Atharva Veda translated by M Bloomfield

  2. The Upanishads translated by Swami Paramananda from Project Gutenberg

Buddhism
The Tipitaka, or Pāli Canon, texts of the Theravada tradition (a reference chart); everything below is from ReadingFaithfully.org.

  1. Vinaya Pitaka (selections from) translated by I.B. Horner

  2. Sutta Pitaka

  3. Abhidhamma Pitaka (do not have)

Here’s how the included texts break down as a share of the dataset: Christian 25%, Islamic 5%, Hindu 19%, and Buddhist 51%.
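For what it’s worth, a breakdown like this can be computed directly from file sizes. A minimal sketch, assuming the cleaned text files are sorted into one hypothetical folder per tradition (the folder names here are mine, not part of the dataset):

import os

# Hypothetical layout: texts/christian/*.txt, texts/islamic/*.txt, etc.
groups = ["christian", "islamic", "hindu", "buddhist"]

sizes = {}
for g in groups:
    folder = os.path.join("texts", g)
    sizes[g] = sum(
        os.path.getsize(os.path.join(folder, f))
        for f in os.listdir(folder)
        if f.endswith(".txt")
    )

total = sum(sizes.values())
for g in groups:
    print(f"{g}: {sizes[g] / total:.0%}")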

PART II - PREPARE THE DATA
I collected eleven documents in total. Those that I sourced as ePubs I converted to PDFs using this online tool. Then I used Adobe Acrobat to convert all PDFs into Rich Text Format (RTF) files. Next, I used TextEdit to convert those to plain text files (Format > Make Plain Text), although I could have used textutil for this (a comparison later on showed no difference in the output). In some cases, such as for the Bible, the Qur’an, and the Upanishads, I used TextWrangler to remove the artificial line breaks that fell mid-sentence (Text > Remove Line Breaks). I’m not sure what compelled me to make these decisions—perhaps muscle memory from my previous charRNN tests? It was useful to deal with each file individually at first to remove document details about where it came from (e.g. all the Project Gutenberg info) and the translators’ introductions and such. But maybe I should leave this info in? Thinking about Caroline Sinders’ Feminist Data Set work here.
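For anyone who’d rather script that cleanup than click through TextWrangler, here’s roughly what the Remove Line Breaks step does: rejoin single line breaks within a paragraph while keeping blank lines as paragraph separators. A minimal sketch; the filename is a placeholder.

import re

# Placeholder filename; run once per cleaned document.
with open("kjv.txt", encoding="utf-8") as f:
    text = f.read()

# Replace any newline that is not part of a blank line with a space,
# so hard-wrapped lines rejoin into full paragraphs.
unwrapped = re.sub(r"(?<!\n)\n(?!\n)", " ", text)

with open("kjv_unwrapped.txt", "w", encoding="utf-8") as f:
    f.write(unwrapped)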

The documents, when compared to one another, show variation in line spacing: some are single-spaced, others doubled, while others contain a mix. In the end, I decided to leave it—this will likely impact the look of the output results.

In addition, during the file format conversion many diacritics did not convert well. And so continues the story of translation and interpretation…

Following my notes from before, I used textutil to concatenate all files into one document titled input.txt: textutil -cat txt -output input.txt *.txt
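If textutil isn’t handy, the same concatenation is only a few lines of Python (a sketch; it appends every .txt in the folder, in filename order, into input.txt):

import glob

# Concatenate every cleaned .txt file (in filename order) into input.txt.
with open("input.txt", "w", encoding="utf-8") as out:
    for path in sorted(glob.glob("*.txt")):
        if path == "input.txt":
            continue  # don't append the output file to itself
        with open(path, encoding="utf-8") as f:
            out.write(f.read())
            out.write("\n")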

In the end, my dataset totaled ~18MB.

PART III - TRAIN THE MODEL
As before when working with text, I decided to use the ml5js version of a Multi-layer Recurrent Neural Network (LSTM, RNN) in order to generate text at the character level. Many of my classmates have argued that this has been ineffective for them, but I was pleased with the results from my previous experiments so I’ll stick with it for now.

I also used Spell.run again because they provide access to clusters of GPUs for faster training than Paperspace. Nabil Hassein’s tutorial is an excellent resource for using Spell and training a LSTM model in the ml5js world. Here is a quick summary of my steps:

  1. On my local computer, mkdir scripture

  2. cd scripture

  3. virtualenv env

  4. source env/bin/activate

  5. git clone https://github.com/ml5js/training-lstm.git

  6. cd training-lstm/

  7. mkdir data

  8. move input.txt into the data directory

  9. adjust the hyperparameters via nano run.sh (which lives inside training-lstm). Using this as a reference, I applied these settings for my 18MB file:
    --rnn_size 1024 \
    --num_layers 3 \
    --seq_length 256 \
    --batch_size 128 \
    --num_epochs 50 \

  10. pip install spell

  11. spell login

  12. enter username & password

  13. spell upload data/input.txt

  14. provide a directory name to store my data input file on spell, I wrote: uploads/scripture

  15. request a machine type and initiate training! spell run -t V100x4 -m uploads/scripture:data "python train.py --data_dir=./data"

  16. fetch the model into the scripture dir (cd .. out of training-lstm): spell cp runs/11/models (11 is my run #)

Notes:

  1. I selected the machine type V100x4 at $12.24/hour

  2. Start time: 04:24:51am

  3. Finish time: 07:16:09am

  4. Total cost: $34.88
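As a sanity check on the bill: the run lasted about 2 hours 51 minutes, which at $12.24/hour works out to roughly $35, in line with the $34.88 charged (the small difference presumably comes down to how Spell timestamps the billed run).

from datetime import datetime

start = datetime.strptime("04:24:51", "%H:%M:%S")
finish = datetime.strptime("07:16:09", "%H:%M:%S")
hours = (finish - start).total_seconds() / 3600

print(f"{hours:.2f} h x $12.24/h = ${hours * 12.24:.2f}")  # ~2.86 h -> ~$34.95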

PART IV - USE THE MODEL
Try it! Holy Scripture Generator

Input Text, Trained Model, and Site Code on GitHub

PART V - CONCLUSION
A short one: this experiment does not feel as successful as my previous ones. Perhaps the novelty wore off? Perhaps I need to retrain the model with different hyperparameters (lower the epochs?) and test the outputs? Something is off. But in the end, repeating the LSTM process was a useful practice and a meditation on the additional factors to consider when compiling a dataset.

Week 8: Generating Images with Neural Style Transfer

A popular machine learning method for generating images is style transfer: appropriating the style of one image onto the content of another. Here is jcjohnson’s neural-style model, which comes with excellent documentation and examples.

I cloned the model into my virtual GPU machine in Paperspace (described in this post), and experimented with it in two ways: traditional style transfer and texture synthesis.

PART 1 - STYLE TRANSFER
This technique reminds me of layering images with masks in Photoshop. To run the model, you select a “style” image whose learned style gets applied to a “content” image. There are many adjustable parameters that tune the process and affect the look of the resulting image, all documented in the repo.

Here’s an example (command run inside of model’s directory):
$ th neural_style.lua -style_image /home/paperspace/Documents/projects/_pattern.jpg -content_image /home/paperspace/Documents/projects/_portrait.jpg -output_image /home/paperspace/Documents/projects/result/_pattern_portrait.png -style_weight 2e0 -content_weight 3e0 -backend cudnn

My style image is of a pattern, my content image (to receive the style) is a portrait, I’ve defined the location and name of the resulting image, I’ve adjusted how much to weigh the style and content inputs, and finally, made more efficient use of my GPU’s memory by using the -backend cudnn flag. (Channel Zero, anyone?)

Let’s try another example, this time with the default weights (content 5e0, style 1e2). It really is like being in an analog darkroom. No two prints are alike. Even if you run the model again with the same images and parameters, you get slightly different results (by default the optimization starts from random noise, so each run settles somewhere slightly different).

PART 2 - TEXTURE SYNTHESIS
When you run the model with the content weight equal to 0, it still picks up and applies the learned style to an empty canvas. Again the result changes with every run even if the parameters do not change.

$ th neural_style.lua -style_image /home/paperspace/Documents/projects/clouds.jpg -content_image /home/paperspace/Documents/projects/content.jpg -output_image /home/paperspace/Documents/projects/result/_clouds_texture.png -content_weight 0 -backend cudnn

You can also combine multiple style input images for a collaged-texture synthesis:

$ th neural_style.lua -style_image /home/paperspace/Documents/projects/one.jpg,/home/paperspace/Documents/projects/two.jpg -content_image /home/paperspace/Documents/projects/content.jpg -output_image /home/paperspace/Documents/projects/result/_trees_texture.png -content_weight 0 -backend cudnn

Image Credits: Clouds & Red, Blue, and White Abstract

I guess this is okay; I’m still looking for a useful application of it (because Photoshop). But I’m glad I did it. And really, I want to know how it handles many, many, many input style images—like all 6,555 images from my Blade Runner experiment earlier. Unfortunately, Terminal said my argument list was too long.

I then tried half that amount (having extracted 30 frames from each minute of the film) but again got the same response.

It does work with 26 pics, though! Which is what I used when I tested the process (though I already trashed that output image). My next step is to figure out how to get around this…

In the meantime, here are the steps I used to prepare to train the model on multiple style files: I joined all the filenames with commas into one giant string, wrote that string to a text file, stored the contents of that file in a shell variable, and then included that variable in the command that starts the training process:

  1. Create an empty text file: $ touch file.txt (optional; the Python write below creates it anyway)

  2. Create a directory with your image files

  3. Start a python shell: $ python

  4. >>> import os

  5. >>> mydir = '/images'

  6. >>> style_string = ','.join([os.path.join(mydir, f) for f in os.listdir(mydir)])

  7. >>> f = open('file.txt', 'w')

  8. >>> f.write(style_string)

  9. >>> f.close()

  10. Exit python shell (Control + d)

  11. Navigate into neural style directory

  12. $ value=$(</dir/file.txt)

  13. $ echo "$value" (to check the contents of the variable)

Which then leads to this:
$ th neural_style.lua -style_image "$value" -content_image /home/paperspace/Documents/projects/content.jpg -output_image /home/paperspace/Documents/projects/result/_result.png -content_weight 0 -backend cudnn
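For reference, here are steps 3 through 9 condensed into one small script (a sketch: mydir is still a placeholder path, and unlike my original one-liner this version filters to image extensions):

import os

# Placeholder: directory containing the style images.
mydir = "/images"

# Join every image's full path into one comma-separated string,
# which neural_style.lua accepts for -style_image.
style_string = ",".join(
    os.path.join(mydir, f)
    for f in sorted(os.listdir(mydir))
    if f.lower().endswith((".jpg", ".png"))
)

with open("file.txt", "w") as f:
    f.write(style_string)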

Week 8: Generating Images with DCGAN-Tensorflow

We made it to generative models! I’ve been learning how to set up and train models on images to generate new pics. Specifically, I’ve been using a TensorFlow implementation of a Deep Convolutional Generative Adversarial Network (DCGAN). I’m still working out how it works, but for now it’s like playing with a new camera that learns to make new images based on the ones I feed it. The process of “developing” photos takes a long time (many hours, or even days), and it reminds me of the timed waiting in a darkroom, not to mention recording the results along with my input parameters.

This machine learning model needs to train on a lot of images. But where to find thousands upon thousands of pics? Movies! As collections of fast-moving images, they are an excellent source. And since machine learning is a branch of artificial intelligence, I chose Blade Runner (1982).

Part I - Virtual Machine Setup
I returned to Paperspace and set up a GPU machine, a Quadro M4000 ($0.51/hr, $3/mo for a public IP, and $5/mo for 50 GB of storage), with Ubuntu 16.04 and their ML-in-a-Box template, which has CUDA and the TensorFlow deep learning libraries already set up.

With the public IP, I can log into the machine via Terminal, set up my project file structure, git clone repos, train models, and generate new images there, too. It helps to have some practice with the command line and Python beforehand.

Part II - Data Collection
The next step was to acquire a digital version of the movie and extract frames using ffmpeg.

To extract one frame per second I used:
$ ffmpeg -i BladeRunner.mp4 -vf fps=1 %04d.png -hide_banner

I chose PNG format over JPG because it supports lossless compression, and I wanted to preserve as much detail as possible.

This gave me 7,057 images, from which I removed the opening studio logos and the closing credits (btw, you can also set the extraction range with ffmpeg), leaving a total of 6,555 images (8.63 GB).
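If you’d rather script that pruning than delete frames by hand, something like this works with the %04d.png naming from the ffmpeg command above. A sketch only; the cutoff frame numbers here are placeholders, not the ones I actually used.

import os

# Hypothetical cutoffs: keep frames between the end of the studio logos
# and the start of the closing credits.
first_keep = 100
last_keep = 6654

for name in os.listdir("frames"):
    if not name.endswith(".png"):
        continue
    frame_num = int(os.path.splitext(name)[0])  # e.g. "0042.png" -> 42
    if frame_num < first_keep or frame_num > last_keep:
        os.remove(os.path.join("frames", name))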

It’s best to complete this step on the VM itself, as it takes way too long to upload the images via Cyberduck, a JupyterLab notebook, or any other route into the machine.

Part III - Data Preparation
This DCGAN-tensorflow (I used Gene Kogan’s fork for a bug fix) expects square images at 128 x 128 pixels*, but my movie frames were 1920 x 800.

First, I wrote a Python script using Pillow to copy and resize the images to 307 x 128. If I recall correctly, this resize.py file lived in the same directory as the original images.

from PIL import Image
import glob, os

size = 307, 128

def resize():
    # Resize every PNG in the current directory (preserving aspect ratio)
    # and save a copy into the images_resized folder.
    for infile in glob.glob("*.png"):
        file, ext = os.path.splitext(infile)
        im = Image.open(infile)
        im.thumbnail(size, Image.ANTIALIAS)  # 1920 x 800 -> ~307 x 128
        im.save("/dir/dir/dir/dir/images_resized/" + file + "_resized.png", "PNG")

resize()

Then, I made center crops using the ml4a guides/utils/dataset_utils.py by cd-ing into the utils directory and running this (don’t forget to install the requirements! see requirements.txt):

$ python3 dataset_utils.py --input_src /home/paperspace/Documents/data/brStills/ --output_dir /home/paperspace/Documents/projects/BladeRunner/stillsCropped --w 128 --h 128 --centered --action none --save_ext png --save_mode output_only

Along the way, I learned this handy line to count the number of files in a folder to make sure all 6,555 images were there:
$ ls -F | grep -v / | wc -l
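The same check from Python, if you happen to already be in a shell there (a sketch, using the cropped-output folder from the command above):

import os

folder = "/home/paperspace/Documents/projects/BladeRunner/stillsCropped"
count = sum(
    1 for name in os.listdir(folder)
    if os.path.isfile(os.path.join(folder, name))
)
print(count)  # expecting 6555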

*128x128 is so small! But Gene thought 256x256 might be too large. I learned that it isn’t, but it takes much longer to train; I did not pursue it after a 5-epoch run took about an hour.

Part IV - Train the Model
Before I began training, I created a directory containing the following folders: checkpoints, samples, and my images.

Next, moving into the DCGAN folder, I ran the following line. Opening the main.py file beforehand, I learned which flag to set to indicate the PNG file format of my images (the default is JPG).

$ python main.py --dataset=images --data_dir /home/paperspace/Documents/projects/BladeRunner128 --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BladeRunner128/checkpoints --sample_dir /home/paperspace/Documents/projects/BladeRunner128/samples --epoch 5 --input_fname_pattern '*.png' --train

I started by running 5 epochs. Each epoch represents one pass through the entire dataset, and to do that, the dataset is broken up into batches. The default batch size for this DCGAN is 64.
My understanding from class was that, other than the number of epochs, the hyperparameters for this particular DCGAN do not need tweaking, although I might revisit batch size later. How many epochs, you ask? That’s a good question, and one for which I haven’t found a definitive answer. It’s at this point that folks say it’s more an art than a science, because it also depends on the diversity of your data. More epochs mean more learning, but you don’t want to overfit…so to be continued… Reference
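To put rough numbers on that: with 6,555 images and the default batch size of 64, one epoch is about 102 batches (any trailing partial batch is typically dropped), so my 5-epoch run was roughly 510 training steps.

images = 6555
batch_size = 64  # DCGAN-tensorflow default

batches_per_epoch = images // batch_size
print(batches_per_epoch)      # 102
print(5 * batches_per_epoch)  # ~510 steps for the 5-epoch run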

(Also worth mentioning that I got some initial errors because my single quotes around png were not formatted in an acceptable way. Better to type them in than copy and paste from a text editor.)

Part V - Generate Images • Train the Model More • Repeat
Generating images is as easy as running the above line in the same place but without the --train, and the results are saved in the samples folder.

I learned that I can continue training from where I left off as long as I keep all the directories intact. The model will look for the checkpoint folder and load the most recent one. Similarly, if I train too much, I can remove the later checkpoints and generate from the most recent one remaining (or so I’m told…need to double-check for this DCGAN).

Here are some results:

After 5 epochs

After 25 epochs (~1 hr 15 min)

After 25 epochs (~1 hr 15 min)

After 125 epochs (~6hr)

After 125 epochs (~6hr)

After 250 epochs (~12 hours)

After 250 epochs (~12 hours)

And here is a random sampling of movie stills from the original dataset:


Many of the DCGAN examples I’ve seen use a dataset that is much more homogeneous than mine. The output variation here doesn’t surprise me, and in fact, I was curious to see what would happen if I used a mix of images. That being said, the colors remind me of the film’s palette, and especially at the 250-epoch mark, I see a few faces emerging.

Part VI - Train the Model on a Linear Image Dataset
During my image preparation process, the picture order was shuffled when I cropped the center squares. A general question I have: does the order of the training data matter, especially for a dataset with a linear progression like this one? In my machine learning meanderings online, I’ve seen nods to shuffling data to improve a model’s learning outcomes. A good next step would be to train on an ordered set and see if there are any differences…so let’s do that!

Here’s my own Python script for taking the center crop of each image:

from PIL import Image
import glob, os

def crop():
    # Take an exact 128 x 128 center crop of every PNG in the current
    # directory, processed in filename order to preserve the frame sequence.
    for infile in sorted(glob.glob("*.png")):
        file, ext = os.path.splitext(infile)
        im = Image.open(infile)
        width, height = im.size

        new_width = 128
        new_height = 128

        left = (width - new_width) // 2
        top = (height - new_height) // 2
        right = left + new_width
        bottom = top + new_height

        centerCrop = im.crop((left, top, right, bottom))

        centerCrop.save("/dir/dir/dir/dir/" + file + "_crop.png", "PNG")

crop()

This is the semester of never-ending blog posts, but I just keep learning new things to improve my workflow and understanding! For example, since this was going to be another long job, I learned how to use nohup mycommand > myLog.txt & disown to put this process into the background, send the output to a file, and break my Terminal’s connection with it so I could close my Terminal or my computer without interrupting the job. At any point, I can log back into the VM and cat myLog.txt to see the current output of the program. Reference.

$ nohup python main.py --dataset=stillsCropped --data_dir /home/paperspace/Documents/projects/BRlineartrain --input_height 128 --output_height 128 --checkpoint_dir /home/paperspace/Documents/projects/BRlineartrain/checkpoints --sample_dir /home/paperspace/Documents/projects/BRlineartrain/samples --epoch 40 --input_fname_pattern '*.png' --train > myLog.txt & disown

After 210 epochs (~11 hours)

After 210 epochs (~11 hours)

So there’s a tendency toward more complete faces in the samples generated from the model trained on the linear dataset. Is that because the model likely trained on successions of frames with an actor in the center of the frame? Is the takeaway to stick with more homogeneous datasets, then?

Overall, this experiment was an exercise in getting my bearings with setup and workflow: collecting and prepping my data, setting up a remote computational space (because my own computer is not powerful enough), and learning the mechanics for this particular model. The more I play and try, the more I’ll know which questions to ask and how to use this for a particular application.