Data Preparation
Module containing data pre-processing pipelines.
- class master_thesis.data.MasterThesisData(*args: Any, **kwargs: Any)
Bases: pytorch_lightning.core.datamodule.LightningDataModule
Class used to handle data pre-processing steps and persist data splits.
- train_bgs_meta
Dictionary mapping the ids of the data samples in the backgrounds training dataset to their paths on disk.
- train_masks_meta
Dictionary mapping the ids of the data samples in the masks training dataset to their paths on disk.
- validation_bgs_meta
Dictionary mapping the ids of the data samples in the backgrounds validation dataset to their paths on disk.
- validation_masks_meta
Dictionary mapping the ids of the data samples in the masks validation dataset to their paths on disk.
- test_meta
Dictionary mapping the ids of the test dataset samples to their paths on disk.
- kwargs
Dictionary containing the CLI arguments required to create an instance of MasterThesisData.
- prepare_data()
Prepares the data used to train the model.
Fills the meta class attributes with the mapping between data samples and their paths on disk, for each dataset and split.
Removes the mask samples that are either too small or too big, as defined by the --min_mask_size and --max_mask_size CLI arguments.
Samples a small portion of the data so that the validation and test pipelines are consistent between epochs.
The results of this method are saved to the path given by --data_ckpt_path, so that subsequent calls do not have to wait for this pre-processing step to finish running.
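The filter-sample-cache flow of prepare_data() can be sketched as follows. This is a minimal, hypothetical sketch rather than the module's implementation: the function name `prepare_data_sketch`, the precomputed `mask_sizes` dictionary, the pickle checkpoint format, the `n_val_samples` parameter and the fixed `seed` are all assumptions made for illustration.

```python
import os
import pickle
import random


def prepare_data_sketch(data_ckpt_path, masks_meta, mask_sizes,
                        min_mask_size, max_mask_size,
                        n_val_samples=100, seed=0):
    """Hypothetical sketch of a cached pre-processing step.

    masks_meta: {sample_id: mask path(s)}
    mask_sizes: {sample_id: mask size}, assumed to be precomputed
    """
    # Reuse the earlier results if the checkpoint already exists.
    if os.path.exists(data_ckpt_path):
        with open(data_ckpt_path, "rb") as f:
            return pickle.load(f)

    # Keep only the samples whose mask size lies in [min_mask_size, max_mask_size].
    clean_meta = {
        sample_id: paths
        for sample_id, paths in masks_meta.items()
        if min_mask_size <= mask_sizes[sample_id] <= max_mask_size
    }

    # Draw a fixed subset with a seeded RNG so it is identical on every run.
    rng = random.Random(seed)
    k = min(n_val_samples, len(clean_meta))
    val_ids = sorted(rng.sample(sorted(clean_meta), k))
    result = {"masks_meta": clean_meta, "val_ids": val_ids}

    # Persist the results so subsequent calls skip the work above.
    with open(data_ckpt_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

On a second call with the same `data_ckpt_path`, the function returns the cached dictionary without looking at its other arguments.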
- train_dataloader()
Returns the data loader containing the training data.
- Returns
DataLoader object containing the training data.
- val_dataloader()
Returns the data loader containing the validation data.
- Returns
DataLoader object containing the validation data.
- test_dataloader()
Returns the data loader containing the test data.
- Returns
DataLoader object containing the test data.
- static load_loaders_fn(_)
Initializes each of the random seeds to a random value between 0 and 1 billion.
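A minimal sketch of such a reseeding function is shown below. The single ignored argument suggests it is passed as the `worker_init_fn` of a PyTorch `DataLoader`, which calls it with the worker id, but that is an assumption. The sketch seeds only Python's `random` module and returns the seed for illustration; whether the real method also seeds NumPy or PyTorch is not stated here.

```python
import random


def load_loaders_fn(_):
    """Hypothetical sketch: reseed the RNG with a value in [0, 1e9].

    The ignored argument mirrors the documented signature (assumed to be
    a DataLoader worker id).
    """
    # SystemRandom draws from os.urandom, so each process gets a fresh value
    # even if it inherited an identical `random` state from its parent.
    seed = random.SystemRandom().randint(0, 10**9)
    random.seed(seed)
    # Assumption: the real method may also call np.random.seed(seed)
    # and torch.manual_seed(seed); both accept values up to 1e9.
    return seed
```

Drawing the seed from `SystemRandom` rather than the module-level generator avoids forked workers all producing the same "random" seed from a copied RNG state.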
- static get_meta_got10k(data_folder, split)
Returns the metadata of the GOT-10k dataset.
- Parameters
data_folder – folder where the dataset is stored.
split – split from which to extract the data. Can be either train or validation.
- Returns
Dictionary mapping each sequence id to the paths on disk of its individual background frames.
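The metadata dictionary can be built by walking the dataset folder. The sketch below assumes the GOT-10k layout `<data_folder>/<train|val>/<sequence_id>/*.jpg` and assumes that the `validation` split maps to the `val` directory; it is illustrative, not the module's code.

```python
import os


def get_meta_got10k(data_folder, split):
    """Hypothetical sketch; assumes <data_folder>/<train|val>/<seq_id>/*.jpg."""
    split_dir = {"train": "train", "validation": "val"}[split]
    root = os.path.join(data_folder, split_dir)
    meta = {}
    for seq_id in sorted(os.listdir(root)):
        seq_dir = os.path.join(root, seq_id)
        if not os.path.isdir(seq_dir):
            continue  # skip loose files such as list.txt
        # Keep only the frame images, in frame order.
        frames = sorted(f for f in os.listdir(seq_dir) if f.endswith(".jpg"))
        meta[seq_id] = [os.path.join(seq_dir, f) for f in frames]
    return meta
```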
- static get_meta_youtube_vos(data_folder, split)
Returns the metadata of the YouTubeVOS dataset.
- Parameters
data_folder – folder where the dataset is stored.
split – split from which to extract the data. Can be either train or validation.
- Returns
Dictionary mapping each sequence id to the paths on disk of its individual mask frames.
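For the masks, a similar sketch can read the annotation folder directly. It assumes the YouTube-VOS layout `<data_folder>/<train|valid>/Annotations/<sequence_id>/*.png` and that `validation` maps to the `valid` directory; both are assumptions.

```python
from pathlib import Path


def get_meta_youtube_vos(data_folder, split):
    """Hypothetical sketch; assumes <split_dir>/Annotations/<seq_id>/*.png."""
    split_dir = {"train": "train", "validation": "valid"}[split]
    annotations = Path(data_folder) / split_dir / "Annotations"
    # One entry per sequence folder, listing its mask frames in order.
    return {
        seq_dir.name: sorted(str(p) for p in seq_dir.glob("*.png"))
        for seq_dir in sorted(annotations.iterdir())
        if seq_dir.is_dir()
    }
```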
- static get_meta_davis(data_folder)
Returns the metadata of the DAVIS dataset.
- Parameters
data_folder – folder where the dataset is stored.
- Returns
Dictionary mapping each sequence id to the paths on disk of its individual frames, both backgrounds and masks.
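Because DAVIS provides both frames and masks, a sketch can pair them by file name. It assumes the DAVIS 2017 layout (`JPEGImages/480p/<sequence>/*.jpg` and `Annotations/480p/<sequence>/*.png`), a fixed 480p resolution, and a list of (frame, mask) path pairs as the value type; the real return structure may differ.

```python
from pathlib import Path

RESOLUTION = "480p"  # hypothetical: the resolution folder is an assumption


def get_meta_davis(data_folder):
    """Hypothetical sketch; assumes the DAVIS JPEGImages/Annotations layout."""
    root = Path(data_folder)
    frames_root = root / "JPEGImages" / RESOLUTION
    masks_root = root / "Annotations" / RESOLUTION
    meta = {}
    for seq_dir in sorted(p for p in frames_root.iterdir() if p.is_dir()):
        frames = sorted(seq_dir.glob("*.jpg"))
        mask_by_stem = {m.stem: m for m in (masks_root / seq_dir.name).glob("*.png")}
        # Keep only frames that have a mask with the same stem (e.g. 00000).
        meta[seq_dir.name] = [
            (str(f), str(mask_by_stem[f.stem]))
            for f in frames
            if f.stem in mask_by_stem
        ]
    return meta
```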
- static add_data_specific_args(parent_parser)
Adds data-specific arguments so that they can be set directly from the CLI.
- Parameters
parent_parser – parser object before the data-specific arguments are added.
- Returns
Parser object after adding the data-specific arguments.
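A common pattern for this kind of static method is to create a child `argparse.ArgumentParser` that inherits from `parent_parser`. The three flags below are the ones mentioned on this page; their types and default values are placeholders, not the module's actual settings.

```python
from argparse import ArgumentParser


def add_data_specific_args(parent_parser):
    """Hypothetical sketch; types and defaults are illustrative placeholders."""
    # add_help=False avoids a second, conflicting -h/--help option,
    # since the parent's help option is inherited.
    parser = ArgumentParser(parents=[parent_parser], add_help=False)
    parser.add_argument("--data_ckpt_path", type=str, default="data.ckpt")
    parser.add_argument("--min_mask_size", type=float, default=0.05)
    parser.add_argument("--max_mask_size", type=float, default=0.5)
    return parser
```

The returned parser accepts both the parent's arguments and the data-specific ones, so a single `parse_args()` call covers the whole CLI.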