Master’s Thesis

This repository contains a refactored version of the code used in Temporal copying and local hallucination for video inpainting.

About the data

Training the model

The first step is to clone this repository and install its dependencies:

git clone https://github.com/davidalvarezdlt/samplernn_pase.git
cd samplernn_pase
pip install -r requirements.txt

apt-get install libturbojpeg

Download the data from this link and extract it inside ./data. The resulting folder structure will look like this:

samplernn_pase/
    data/
        cmu_arctic/
        vctk/
    lightning_logs/
    samplernn_pase/
    config.default.json
    README.md
    requirements.txt
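
Before training, it can be worth confirming that the data was extracted into the expected place. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_datasets` is hypothetical, and it only assumes the two dataset folders shown in the structure above.

```python
from pathlib import Path

# Sub-folders of ./data listed in the folder structure above.
EXPECTED_DATASETS = ("cmu_arctic", "vctk")


def missing_datasets(repo_root="."):
    """Return the expected dataset folders that are absent from <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in EXPECTED_DATASETS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders in ./data:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```

Run it from the root of the repository. It only checks that the folders exist, not that their contents are complete.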

The code has been built using PyTorch Lightning. Read its documentation to get a complete overview of how this repository is organized. In short, you can train the model by calling:

python -m samplernn_pase --model-version <test_version>

Here, --model-version is an optional parameter that selects the version to resume training from. If it is not set, training starts from scratch. You can override the default parameters of the code with CLI parameters. Get a complete list of the available parameters by calling:

python -m samplernn_pase --help

For instance, to train the model using acoustic features, with a batch size of 32 and one GPU, we would call:

python -m samplernn_pase --conds_utterance_type acoustic --batch_size 32 --gpus 1
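
If you launch several runs from a script, the same command can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_train_command` is a hypothetical helper, not part of the repository, and it uses only the flags shown in the example above. Check `--help` for the parameters that actually exist.

```python
import shlex
import sys


def build_train_command(overrides, module="samplernn_pase"):
    """Return the argv list for `python -m <module>` with the given CLI overrides."""
    command = [sys.executable, "-m", module]
    for flag, value in overrides.items():
        # Each override becomes a "--flag value" pair, in insertion order.
        command += [f"--{flag}", str(value)]
    return command


command = build_train_command(
    {"conds_utterance_type": "acoustic", "batch_size": 32, "gpus": 1}
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
# To actually start training: subprocess.run(command, check=True)
```

The sketch only builds and prints the command, so it is safe to run without a GPU or the data in place.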

Every time you train the model, a new folder is created inside ./lightning_logs. Each folder represents a different version of the model and contains its checkpoints and auxiliary files.

Testing the model

You can test a version by calling:

python -m samplernn_pase --test --model-version <test_version>

The test pipeline will generate some random utterances from the test split and store them in TensorBoard.
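
To see which versions are available under ./lightning_logs before testing one, a short script can list them. This is a sketch that assumes the folders follow PyTorch Lightning's default `version_<N>` naming. The repository may name them differently, and the exact value that `--model-version` expects is not stated here, so treat both as assumptions. `list_versions` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed folder naming: PyTorch Lightning's default "version_<N>".
VERSION_PATTERN = re.compile(r"version_(\d+)")


def list_versions(logs_dir="lightning_logs"):
    """Return the version numbers found in logs_dir, sorted in ascending order."""
    logs_path = Path(logs_dir)
    if not logs_path.is_dir():
        return []
    versions = []
    for entry in logs_path.iterdir():
        match = VERSION_PATTERN.fullmatch(entry.name)
        # Keep only directories whose whole name matches the pattern.
        if match and entry.is_dir():
            versions.append(int(match.group(1)))
    return sorted(versions)


if __name__ == "__main__":
    print("Available versions:", list_versions())
```

Sorting numerically rather than alphabetically keeps `version_10` after `version_2`.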

Citation

Please cite our thesis if it has been useful for your research:

@thesis{Alvarez2020,
    type = {Master's Thesis},
    author = {David Álvarez de la Torre},
    title = {Temporal copying and local hallucination for video inpainting},
    school = {ETH Zürich},
    year = 2020,
}