Data Preparation

Module containing data pre-processing pipelines.

class master_thesis.data.MasterThesisData(*args: Any, **kwargs: Any)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

Class used to handle data pre-processing steps and persist data splits.

train_bgs_meta

Dictionary mapping the ids of the samples in the backgrounds training dataset to their paths on disk.

train_masks_meta

Dictionary mapping the ids of the samples in the masks training dataset to their paths on disk.

validation_bgs_meta

Dictionary mapping the ids of the samples in the backgrounds validation dataset to their paths on disk.

validation_masks_meta

Dictionary mapping the ids of the samples in the masks validation dataset to their paths on disk.

test_meta

Dictionary mapping the ids of the test dataset samples to their paths on disk.

kwargs

Dictionary containing the CLI arguments required to create an instance of MasterThesisData.

prepare_data()

Prepares the data used to train the model.

  1. Fills the *_meta attributes with the mapping between data samples and their paths on disk, for each dataset and split.

  2. Removes mask samples that are either too small or too large, as defined by the --min_mask_size and --max_mask_size CLI arguments.

  3. Samples a small, fixed portion of the data so that the validation and test pipelines are consistent across epochs.

The results of this method are saved to the path given by --data_ckpt_path, so that subsequent runs do not have to repeat this pre-processing step.
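The three steps and the caching behaviour can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the helper names (filter_masks_by_size, sample_fixed_subset, load_or_build), the use of pickle for the checkpoint file, and the idea that each mask sequence has a single precomputed size are all hypothetical.

```python
import os
import pickle
import random
from typing import Callable, Dict, List

# Hypothetical metadata type: sequence id -> list of frame paths.
Meta = Dict[str, List[str]]


def filter_masks_by_size(
    masks_meta: Meta,
    mask_sizes: Dict[str, float],
    min_mask_size: float,
    max_mask_size: float,
) -> Meta:
    """Keep only sequences whose (assumed, precomputed) size is in [min, max]."""
    return {
        seq_id: paths
        for seq_id, paths in masks_meta.items()
        if min_mask_size <= mask_sizes[seq_id] <= max_mask_size
    }


def sample_fixed_subset(meta: Meta, n: int, seed: int = 0) -> Meta:
    """Draw the same n sequences on every call, so evaluation stays consistent."""
    rng = random.Random(seed)  # local RNG: does not touch the global random state
    chosen = rng.sample(sorted(meta), k=min(n, len(meta)))
    return {seq_id: meta[seq_id] for seq_id in chosen}


def load_or_build(ckpt_path: str, build_fn: Callable[[], dict]) -> dict:
    """Return cached results from ckpt_path, or build them once and cache them."""
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()
    with open(ckpt_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Sorting the ids before sampling makes the subset depend only on the seed and the set of ids, not on dictionary insertion order.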

train_dataloader()

Returns the data loader containing the training data.

Returns

DataLoader object containing the training data.

val_dataloader()

Returns the data loader containing the validation data.

Returns

DataLoader object containing the validation data.

test_dataloader()

Returns the data loader containing the test data.

Returns

DataLoader object containing the test data.

static load_loaders_fn(_)

Sets each of the random seeds to a random value between 0 and 1 billion.
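A minimal sketch of such a seeding function is shown below. It is an assumption about the behaviour, not the thesis code: it reseeds Python's random module and, if installed, NumPy; a real implementation could also seed PyTorch (for example with torch.manual_seed). Returning the seed is a hypothetical addition for inspection.

```python
import random

try:  # NumPy is treated as optional in this sketch.
    import numpy as np
except ImportError:
    np = None


def load_loaders_fn(_worker_id: int) -> int:
    """Reseed the RNGs of a data-loading worker with a fresh random seed."""
    seed = random.randint(0, 10**9)  # inclusive range [0, 1 billion]
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)  # NumPy accepts seeds in [0, 2**32 - 1]
    return seed
```

Such a function would typically be passed as DataLoader(..., worker_init_fn=load_loaders_fn); PyTorch calls worker_init_fn with the worker id, which would explain the unused _ argument.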

static get_meta_got10k(data_folder, split)

Returns the metadata of the GOT-10k dataset.

Parameters
  • data_folder – folder where the dataset is stored.

  • split – split from which to extract the data. Can be either train or validation.

Returns

Dictionary mapping each sequence id to the paths on disk of its individual background frames.
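The following sketch shows one way such a mapping could be built. It assumes the standard GOT-10k layout of one folder of .jpg frames per sequence under a split folder, and assumes that the validation split lives in a folder named val; the SPLIT_DIRS mapping is hypothetical, and the thesis may read the dataset differently.

```python
from pathlib import Path
from typing import Dict, List

# Assumed mapping from the documented split names to GOT-10k folder names.
SPLIT_DIRS = {"train": "train", "validation": "val"}


def get_meta_got10k(data_folder: str, split: str) -> Dict[str, List[str]]:
    """Map each GOT-10k sequence id to the sorted paths of its frames."""
    if split not in SPLIT_DIRS:
        raise ValueError(f"split must be 'train' or 'validation', got {split!r}")
    split_dir = Path(data_folder) / SPLIT_DIRS[split]
    meta = {}
    # Only directories are sequences; files such as list.txt are skipped.
    for seq_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        frames = sorted(str(f) for f in seq_dir.glob("*.jpg"))
        if frames:  # ignore folders without any frames
            meta[seq_dir.name] = frames
    return meta
```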

static get_meta_youtube_vos(data_folder, split)

Returns the metadata of the YouTubeVOS dataset.

Parameters
  • data_folder – folder where the dataset is stored.

  • split – split from which to extract the data. Can be either train or validation.

Returns

Dictionary mapping each sequence id to the paths on disk of its individual mask frames.
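A comparable sketch for YouTube-VOS is given below, this time collecting mask annotations rather than frames. It assumes masks are stored as .png files under <data_folder>/<split_dir>/Annotations/<video_id>/ and that the validation split folder is named valid; both the layout and the SPLIT_DIRS mapping are assumptions, not taken from the thesis code.

```python
from pathlib import Path
from typing import Dict, List

# Assumed mapping from the documented split names to YouTube-VOS folder names.
SPLIT_DIRS = {"train": "train", "validation": "valid"}


def get_meta_youtube_vos(data_folder: str, split: str) -> Dict[str, List[str]]:
    """Map each YouTube-VOS video id to the sorted paths of its mask frames."""
    if split not in SPLIT_DIRS:
        raise ValueError(f"split must be 'train' or 'validation', got {split!r}")
    # Assumed layout: <data_folder>/<split_dir>/Annotations/<video_id>/*.png
    annotations_dir = Path(data_folder) / SPLIT_DIRS[split] / "Annotations"
    return {
        video_dir.name: sorted(str(p) for p in video_dir.glob("*.png"))
        for video_dir in sorted(annotations_dir.iterdir())
        if video_dir.is_dir()
    }
```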

static get_meta_davis(data_folder)

Returns the metadata of the DAVIS dataset.

Parameters

data_folder – folder where the dataset is stored.

Returns

Dictionary mapping each sequence id to the paths on disk of its individual frames, both backgrounds and masks.
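Because DAVIS provides both frames and masks, a sketch of this method has to pair them. The version below assumes the usual DAVIS layout (JPEGImages/480p/<sequence>/*.jpg and Annotations/480p/<sequence>/*.png, with matching file stems) and uses hypothetical "bgs" and "masks" keys for the inner dictionary; the thesis may structure its metadata differently.

```python
from pathlib import Path
from typing import Dict, List

RESOLUTION = "480p"  # assumed resolution subfolder


def get_meta_davis(data_folder: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each DAVIS sequence id to its frame paths and matching mask paths."""
    root = Path(data_folder)
    frames_root = root / "JPEGImages" / RESOLUTION
    masks_root = root / "Annotations" / RESOLUTION
    meta = {}
    for seq_dir in sorted(p for p in frames_root.iterdir() if p.is_dir()):
        frames = sorted(seq_dir.glob("*.jpg"))
        masks = [masks_root / seq_dir.name / (f.stem + ".png") for f in frames]
        # Keep only frames that have a mask with the same file stem.
        pairs = [(f, m) for f, m in zip(frames, masks) if m.exists()]
        meta[seq_dir.name] = {
            "bgs": [str(f) for f, _ in pairs],
            "masks": [str(m) for _, m in pairs],
        }
    return meta
```

Pairing by file stem keeps the two lists index-aligned, so frame i and mask i always refer to the same time step.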

static add_data_specific_args(parent_parser)

Adds data-specific arguments so that they can be modified directly from the CLI.

Parameters
  • parent_parser – parser object just before the data-specific arguments are added.

Returns

Parser object after adding data-specific arguments.
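A minimal argparse sketch of this pattern follows. The flag names --min_mask_size, --max_mask_size and --data_ckpt_path come from this documentation; their types and default values are assumptions, and the real method likely registers additional arguments.

```python
from argparse import ArgumentParser


def add_data_specific_args(parent_parser: ArgumentParser) -> ArgumentParser:
    """Return a new parser that extends parent_parser with data-specific flags."""
    # add_help=False avoids a duplicate -h/--help, which the parent already defines.
    parser = ArgumentParser(parents=[parent_parser], add_help=False)
    # Flag names are documented; types and defaults here are assumptions.
    parser.add_argument("--min_mask_size", type=float, default=0.0)
    parser.add_argument("--max_mask_size", type=float, default=1.0)
    parser.add_argument("--data_ckpt_path", type=str, default=None)
    return parser
```

Building a child parser with parents=[parent_parser] leaves the original parser untouched, so several components can each add their own arguments in sequence.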