FluxSegmentationModels

FluxSegmentationModels is a pure Julia package implementing various semantic segmentation models in Flux.

Available Segmentation Models

Model      Source
U-Net      U-Net: Convolutional Networks for Biomedical Image Segmentation
FPN        Feature Pyramid Networks for Object Detection
SegFormer  SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
SETR       Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
FluxSegmentationModels.UNet (Type)
UNet(encoder_config; decoder_dims=(64,128,256,512,1024), batch_norm=true, upsample_method=:nearest, nclasses=1, inchannels=3)
UNet(encoder, encoder_dims; decoder_dims=(64,128,256,512,1024), batch_norm=true, upsample_method=:nearest, nclasses=1)

Construct a U-Net style segmentation model.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the U-Net.
  • encoder: A Flux.Chain layer containing the blocks of the encoder to be used in the U-Net.
  • encoder_dims: A tuple containing the feature dimension of each encoder block output ordered from first to last.
  • decoder_dims: The feature dimension of each decoder block ordered from top to bottom.
  • batch_norm: If true, a batch norm operation will be applied after each convolution in the decoder.
  • inchannels: The number of channels in the input image.
  • nclasses: The number of output classes for the segmentation task.
  • upsample_method: The method to use for upsampling. Can be :nearest or :bilinear.
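As an illustration, a U-Net can be assembled from one of the encoder configurations listed below. This is a minimal usage sketch assuming FluxSegmentationModels and Flux are installed; the keyword values are illustrative, not prescriptive, and the output shape comment follows the usual Flux WHCN convention.

```julia
using FluxSegmentationModels, Flux

# Build a U-Net with a ResNet-50 encoder for a 2-class task on RGB input.
# Keyword names follow the signature above.
model = UNet(ResNet(depth=50); nclasses=2, inchannels=3, upsample_method=:bilinear)

# Forward pass on a dummy batch (width, height, channels, batch).
x = rand(Float32, 256, 256, 3, 1)
y = model(x)  # expected: 256×256×2×1, one output channel per class
```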
FluxSegmentationModels.FPN (Type)
FPN(encoder_config; inchannels=3, nclasses=1, pyramid_dim=256, segmentation_dim=128, upsample_method=:nearest, merge_policy=:add, dropout=0.0)
FPN(encoder, encoder_dims; nclasses=1, pyramid_dim=256, segmentation_dim=128, upsample_method=:nearest, merge_policy=:add, dropout=0.0)

Construct a Feature Pyramid Network (FPN) style segmentation model. The FPN decoder expects a tuple of block-wise encoder activations as input.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the FPN.
  • inchannels: The number of channels in the input image. Default is 3 for RGB images.
  • nclasses: The number of output classes for the segmentation task. Default is 1.
  • pyramid_dim: The size of the feature pyramid dimension. Default is 256.
  • segmentation_dim: The size of the segmentation dimension. Default is 128.
  • upsample_method: The method to use for upsampling. Can be :nearest or :bilinear.
  • merge_policy: The policy to use for merging features. Can be :add or :concat.
  • dropout: The dropout probability to use after the last layer.
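The effect of merge_policy can be seen with plain arrays: :add sums features element-wise and keeps the channel count, while :concat stacks them along the channel dimension. A conceptual sketch in plain Julia, independent of the package:

```julia
# Two feature maps in Flux's WHCN layout: (width, height, channels, batch).
a = rand(Float32, 8, 8, 256, 1)   # lateral pyramid feature
b = rand(Float32, 8, 8, 256, 1)   # upsampled coarser feature

merged_add = a .+ b                # :add keeps 256 channels
merged_cat = cat(a, b; dims=3)     # :concat yields 512 channels

size(merged_add)  # (8, 8, 256, 1)
size(merged_cat)  # (8, 8, 512, 1)
```

Because :concat doubles the channel dimension, the decoder's convolutions must account for the wider input when that policy is selected.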
FluxSegmentationModels.SegFormer (Type)
SegFormer(encoder_config::EncoderConfig; embed_dim=768, dropout=0.0, nclasses=1, inchannels=3)

Construct a SegFormer style segmentation model.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the SegFormer.
  • embed_dim: The embedding dimension to use for the decoder. Default is 768.
  • dropout: The dropout probability to use after the last layer of the decoder. Default is 0.0.
  • nclasses: The number of output classes for the segmentation task.
  • inchannels: The number of channels in the input image.
FluxSegmentationModels.SETR (Type)
SETR(encoder_config::EncoderConfig; inchannels=3, kw...)
SETR(encoder, encoder_dims; patchsize=(16,16), batch_norm=true, nclasses=1)

Construct a SETR style segmentation model.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the SETR.
  • encoder: A Flux.Chain layer containing the blocks of the encoder to be used in the SETR.
  • encoder_dims: A tuple containing the feature dimension of each encoder block output ordered from first to last.
  • patchsize: The patch size to use for the input to the encoder. Default is (16,16).
  • batch_norm: If true, a batch norm operation will be applied after each convolution in the decoder. Default is true.
  • nclasses: The number of output classes for the segmentation task. Default is 1.
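SETR pairs naturally with a transformer encoder such as ViT (as in the original paper). A minimal usage sketch, assuming the package is installed and that a ViT configuration is a valid EncoderConfig here; the keyword values are illustrative:

```julia
using FluxSegmentationModels

# SETR with a ViT-Base encoder. The model's patch size and the encoder
# stem's patch size should agree.
model = SETR(ViT(config=:base, patchsize=(16,16), imsize=(256,256)); nclasses=21)
```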

Available Encoders

Model     Source
ResNet    Deep Residual Learning for Image Recognition
ConvNeXt  A ConvNet for the 2020s
ViT       An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
FluxSegmentationModels.ResNet (Type)
ResNet(;depth=50, pretrain=false)

Configuration for constructing a ResNet encoder.

Parameters

  • depth: The depth of the ResNet architecture. One of 18, 34, 50, 101, or 152.
  • pretrain: If true, the ResNet encoder will be initialized with pretrained weights from ImageNet.
FluxSegmentationModels.ConvNeXt (Type)
ConvNeXt(;config=:tiny)

Configuration for constructing a ConvNeXt style encoder.

Parameters

  • config: The size of the ConvNeXt model to use. Can be :pico, :tiny, :small, :base, :large, or :xlarge.
FluxSegmentationModels.ViT (Type)
ViT(;config=:base, patchsize=(16,16), imsize=(256,256), dropout_prob=0.1, mlp_ratio=4.0, qkv_bias=true, pretrain=true)

Construct a ViT style encoder configuration.

Parameters

  • config: The ViT configuration to use. Can be :tiny, :small, :base, :large, or :huge.
  • patchsize: The patch size to use for the encoder stem. Default is (16,16).
  • imsize: The image size to use for the input to the encoder. Default is (256,256).
  • dropout_prob: The dropout probability applied within the encoder. Default is 0.1.
  • mlp_ratio: The ratio of the MLP hidden dimension to the embedding dimension. Default is 4.0.
  • qkv_bias: If true, the query, key, and value projections include a bias term. Default is true.
  • pretrain: Whether to use ImageNet pre-trained weights. Default is true.
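For transformer encoders such as ViT, the image size must divide evenly by the patch size; the encoder stem then produces one token per patch. The arithmetic in plain Julia:

```julia
imsize = (256, 256)
patchsize = (16, 16)

@assert all(imsize .% patchsize .== 0)  # image must tile evenly into patches

grid = imsize .÷ patchsize   # token grid: (16, 16)
ntokens = prod(grid)         # 256 tokens fed to the transformer
```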