Using Autoencoders in DeepTune

Autoencoders are neural networks that learn how to compress the input images into a compact representation and then reconstruct the images from the compressed form. They consist of two main parts: the encoder, which compresses the input data, and the decoder, which reconstructs the original data from the compressed representation. In DeepTune, we provide the option of using autoencoders for image reconstruction tasks.

Note

The autoencoders functionality is currently experimental: DeepTune supports basic image reconstruction but does not yet report further details such as loss values or evaluation metrics. We recommend using it with caution and welcome feedback to help us improve it. As with any AI model, the reconstructions can contain mistakes, so please verify the results and report any issues you encounter.

The following is an overview of the autoencoder architecture:

Important

The autoencoder follows an encoder-decoder structure designed to extract and reconstruct hierarchical features. The input is first smoothed with a Gaussian blur. The encoder then consists of three successive stages, each applying a 3×3 convolution (5×5 in the deeper autoencoder) → Group Normalization → SiLU activation. Each convolution uses a stride of 2, which halves the spatial dimensions, while the number of feature channels doubles from one stage to the next. The decoder mirrors the encoder, projecting the feature maps back to the dimensions of the original image.
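To make the shape bookkeeping concrete, here is a minimal pure-Python sketch that traces feature-map shapes through the three encoder stages. It is not DeepTune's code and rests on two assumptions not stated above: each convolution uses padding of kernel_size // 2, and the first stage maps the input to a hypothetical base_channels width, with each later stage doubling it. The Gaussian blur (assumed to be shape-preserving), Group Normalization and SiLU do not change the shape, so they are omitted.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_encoder(image_size: int = 224, in_channels: int = 3,
                  base_channels: int = 32, deeper: bool = False):
    """Return (channels, height, width) for the input and after each encoder stage.

    Assumptions (not taken from DeepTune): padding = kernel // 2, and the first
    stage maps the input to `base_channels`, a hypothetical width.
    """
    kernel = 5 if deeper else 3          # 5x5 kernels in the deeper variant
    padding = kernel // 2                # assumed "same"-style padding
    shapes = [(in_channels, image_size, image_size)]
    channels, size = base_channels, image_size
    for _ in range(3):                   # three stride-2 stages
        size = conv_out(size, kernel, stride=2, padding=padding)
        shapes.append((channels, size, size))
        channels *= 2                    # the next stage doubles the width
    return shapes


for shape in trace_encoder():
    print(shape)
```

With the default 224 × 224 input, the three stride-2 stages reduce the spatial size by a factor of 8 (224 → 112 → 56 → 28) for both kernel sizes, so the decoder has to undo exactly this reduction. Image sizes that are multiples of 8 halve evenly at every stage.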

For more information on Group Normalization and the SiLU function, please refer to the following resources: Group Normalization and SiLU Function
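Both operations are simple enough to write out by hand. The sketch below is a plain-Python illustration for a single image stored as a list of channels, each a flat list of pixel values. The choice of num_groups, the eps value and the omission of the learnable per-channel scale and shift are illustrative simplifications, not DeepTune's settings.

```python
import math


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)                      # x < 0, so e lies in (0, 1)
    return x * e / (1.0 + e)


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization without the learnable scale and shift.

    x is a list of C channels, each a flat list of H*W values. The channels are
    split into `num_groups` contiguous groups, and every group is normalized
    with the mean and (biased) variance of all the values it contains.
    """
    channels = len(x)
    if channels % num_groups:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out


# One 4-channel "image" with 3 pixels per channel, split into 2 groups of 2 channels.
feature_map = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0],
               [0.5, 0.5, 0.5], [-1.0, 0.0, 1.0]]
activated = [[silu(v) for v in ch] for ch in group_norm(feature_map, num_groups=2)]
```

Unlike batch normalization, the statistics are computed per image and per group of channels, so the result does not depend on the batch size.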

To use autoencoders in DeepTune, run the following command:

$ python -m autoencoders.autoencoder \
  --train_df <str> \
  --test_df <str> \
  --num_epochs <int> \
  --learning_rate <float> \
  [--image_size <int>] \
  [--if-grayscale] \
  [--use-deeper] \
  --out <str>

Note

The --if-grayscale flag is optional: set it to True if the input images are grayscale, or leave it at its default of False for RGB images. If --image_size is not provided, the input images are resized to 224 × 224 by default. The --use-deeper flag is also optional: set it to True to use the deeper autoencoder architecture, or leave it at its default of False to use the basic architecture.
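If you launch training from a Python script (for example, to sweep several learning rates), the command above can be assembled as an argument list. This is a minimal sketch, not part of DeepTune: build_train_command is a hypothetical helper, and it assumes, as the synopsis suggests, that --if-grayscale and --use-deeper are bare switches that take no value. Omitted options fall back to the defaults described in the note.

```python
import sys


def build_train_command(train_df, test_df, out, num_epochs, learning_rate,
                        image_size=None, grayscale=False, deeper=False):
    """Assemble the DeepTune autoencoder training command as an argv list."""
    cmd = [
        sys.executable, "-m", "autoencoders.autoencoder",
        "--train_df", str(train_df),
        "--test_df", str(test_df),
        "--num_epochs", str(num_epochs),
        "--learning_rate", str(learning_rate),
        "--out", str(out),
    ]
    if image_size is not None:   # when omitted, images are resized to 224 x 224
        cmd += ["--image_size", str(image_size)]
    if grayscale:                # assumed to be a bare switch, per the synopsis
        cmd.append("--if-grayscale")
    if deeper:                   # assumed to be a bare switch, per the synopsis
        cmd.append("--use-deeper")
    return cmd
```

Pass the returned list to subprocess.run(cmd, check=True) to start a run. Building an argv list rather than a single shell string avoids quoting problems with paths that contain spaces.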

After training is done, the output directory specified with the --out flag will contain the following files:

output_directory
├── reconstructed_images_<yyyymmdd>_<hhmm>
│   └── Class <int>
│       ├── reconstructed_image_<int>.png
│       └── ...
└── autoencoder_model.pth
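Because every run writes a new reconstructed_images_<yyyymmdd>_<hhmm> folder, it can be handy to locate the most recent one programmatically. The sketch below relies only on the layout shown above: the zero-padded date-time suffix sorts alphabetically in chronological order, and each Class <int> subfolder holds PNG files. Both functions are hypothetical helpers, not part of DeepTune.

```python
from pathlib import Path


def latest_reconstruction_dir(output_dir):
    """Return the newest reconstructed_images_<yyyymmdd>_<hhmm> folder, or None."""
    runs = [p for p in Path(output_dir).glob("reconstructed_images_*") if p.is_dir()]
    # yyyymmdd_hhmm is zero-padded, so string order equals chronological order.
    return max(runs, key=lambda p: p.name, default=None)


def count_images_per_class(recon_dir):
    """Map each class subfolder name to the number of PNG files it contains."""
    return {
        class_dir.name: sum(1 for _ in class_dir.glob("*.png"))
        for class_dir in sorted(Path(recon_dir).iterdir())
        if class_dir.is_dir()
    }
```

For example, count_images_per_class(latest_reconstruction_dir("output_directory")) gives a quick check that every class received reconstructions.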

After training, you can rerun the same model on another holdout set with the following command, where --model_weights points to the saved model weights:

$ python -m autoencoders.autoencoder \
  --test_df <str> \
  --model_weights <str> \
  --out <str> \
  [--input_image_size <int>] \
  [--if-grayscale] \
  [--use-deeper]
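Since training saves the weights as autoencoder_model.pth inside the output directory, the rerun command can be assembled from that directory. The sketch below is hypothetical: build_eval_command is not a DeepTune function, and it assumes that --model_weights expects the path to that .pth file and that the architecture-related options (--if-grayscale, --use-deeper and the image size) should repeat the values used during training, since weights trained for one architecture are unlikely to load into another.

```python
import sys
from pathlib import Path


def build_eval_command(train_out_dir, test_df, out, input_image_size=None,
                       grayscale=False, deeper=False):
    """Assemble the rerun command for a model trained into train_out_dir.

    grayscale, deeper and input_image_size should repeat the training settings
    (an assumption: the architecture must match the saved weights).
    """
    weights = Path(train_out_dir) / "autoencoder_model.pth"
    if not weights.is_file():
        raise FileNotFoundError(f"no trained weights found at {weights}")
    cmd = [
        sys.executable, "-m", "autoencoders.autoencoder",
        "--test_df", str(test_df),
        "--model_weights", str(weights),
        "--out", str(out),
    ]
    if input_image_size is not None:
        cmd += ["--input_image_size", str(input_image_size)]
    if grayscale:
        cmd.append("--if-grayscale")
    if deeper:
        cmd.append("--use-deeper")
    return cmd
```

Failing early when the weights file is missing gives a clearer error than letting the evaluation run start and fail while loading the model.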