# FluxSegmentationModels

FluxSegmentationModels is a pure-Julia package implementing a variety of semantic segmentation models in Flux.
## Available Segmentation Models
### FluxSegmentationModels.UNet — Type
```julia
UNet(encoder_config; decoder_dims=(64,128,256,512,1024), batch_norm=true, upsample_method=:nearest, nclasses=1, inchannels=3)
UNet(encoder, encoder_dims; decoder_dims=(64,128,256,512,1024), batch_norm=true, upsample_method=:nearest, nclasses=1)
```

Construct a U-Net style segmentation model.
Parameters
- `encoder_config`: An `EncoderConfig` object specifying the architecture and configuration of the encoder to be built and used in the U-Net.
- `encoder`: A `Flux.Chain` layer containing the blocks of the encoder to be used in the U-Net.
- `encoder_dims`: A tuple containing the feature dimension of each encoder block output, ordered from first to last.
- `decoder_dims`: The feature dimension of each decoder block, ordered from top to bottom.
- `batch_norm`: If `true`, a batch norm operation is applied after each convolution in the decoder.
- `inchannels`: The number of channels in the input image.
- `nclasses`: The number of output classes for the segmentation task.
- `upsample_method`: The method to use for upsampling. Can be `:nearest` or `:bilinear`.
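As an illustrative sketch (the encoder choice and keyword values here are assumptions, not prescriptions), a U-Net can be built from an encoder configuration and applied to a batch in Flux's WHCN layout:

```julia
using FluxSegmentationModels

# Build a U-Net from an EncoderConfig (a ResNet-50 backbone here;
# any encoder listed under "Available Encoders" should work the same way).
model = UNet(ResNet(depth=50); nclasses=2, inchannels=3, upsample_method=:bilinear)

# Flux convolutional models take WHCN arrays: one 256×256 RGB image.
x = rand(Float32, 256, 256, 3, 1)
y = model(x)  # per-pixel class scores with nclasses channels
```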
### FluxSegmentationModels.FPN — Type
```julia
FPN(encoder_config; inchannels=3, nclasses=1, pyramid_dim=256, segmentation_dim=128, upsample_method=:nearest, merge_policy=:add, dropout=0.0)
FPN(encoder, encoder_dims; nclasses=1, pyramid_dim=256, segmentation_dim=128, upsample_method=:nearest, merge_policy=:add, dropout=0.0)
```

A Feature Pyramid Network style decoder. Expects a tuple of block-wise activations as input.
Parameters
- `encoder_config`: An `EncoderConfig` object specifying the architecture and configuration of the encoder to be built and used in the FPN.
- `inchannels`: The number of channels in the input image. Default is `3` for RGB images.
- `nclasses`: The number of output classes for the segmentation task. Default is `1`.
- `pyramid_dim`: The size of the feature pyramid dimension. Default is `256`.
- `segmentation_dim`: The size of the segmentation dimension. Default is `128`.
- `upsample_method`: The method to use for upsampling. Can be `:nearest` or `:bilinear`.
- `merge_policy`: The policy to use for merging features. Can be `:add` or `:concat`.
- `dropout`: The dropout probability to use after the last layer.
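A hedged usage sketch, using only the keywords from the signature above (the backbone and values are illustrative assumptions):

```julia
using FluxSegmentationModels

# FPN with a smaller pyramid and concatenation-based feature merging.
model = FPN(ResNet(depth=34); nclasses=5, pyramid_dim=128,
            segmentation_dim=64, merge_policy=:concat, dropout=0.1)

x = rand(Float32, 256, 256, 3, 1)  # WHCN input batch
y = model(x)
```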
### FluxSegmentationModels.SegFormer — Type
```julia
SegFormer(encoder_config::EncoderConfig; embed_dim=768, dropout=0.0, nclasses=1, inchannels=3)
```

Construct a SegFormer style segmentation model.
Parameters
- `encoder_config`: An `EncoderConfig` object specifying the architecture and configuration of the encoder to be built and used in the SegFormer.
- `embed_dim`: The embedding dimension to use for the decoder. Default is `768`.
- `dropout`: The dropout probability to use after the last layer of the decoder. Default is `0.0`.
- `nclasses`: The number of output classes for the segmentation task.
- `inchannels`: The number of channels in the input image.
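A minimal sketch, assuming a hierarchical encoder such as ConvNeXt (the pairing is illustrative; any `EncoderConfig` should be accepted):

```julia
using FluxSegmentationModels

# SegFormer head on a ConvNeXt backbone with a reduced embedding dimension.
model = SegFormer(ConvNeXt(config=:tiny); embed_dim=256, dropout=0.1, nclasses=21)

x = rand(Float32, 256, 256, 3, 1)  # WHCN input batch
y = model(x)
```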
### FluxSegmentationModels.SETR — Type
```julia
SETR(encoder_config::EncoderConfig; inchannels=3, kw...)
SETR(encoder, encoder_dims; patchsize=(16,16), batch_norm=true, nclasses=1)
```

Construct a SETR style segmentation model.
Parameters
- `encoder_config`: An `EncoderConfig` object specifying the architecture and configuration of the encoder to be built and used in the SETR.
- `encoder`: A `Flux.Chain` layer containing the blocks of the encoder to be used in the SETR.
- `encoder_dims`: A tuple containing the feature dimension of each encoder block output, ordered from first to last.
- `patchsize`: The patch size to use for the input to the encoder. Default is `(16,16)`.
- `batch_norm`: If `true`, a batch norm operation will be applied after each convolution in the decoder. Default is `true`.
- `nclasses`: The number of output classes for the segmentation task. Default is `1`.
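SETR is a transformer-based model, so a ViT encoder is the natural pairing. A hedged sketch (keyword values are illustrative):

```julia
using FluxSegmentationModels

# SETR with a ViT backbone; imsize should match the input resolution.
model = SETR(ViT(config=:base, imsize=(256, 256)); nclasses=2)

x = rand(Float32, 256, 256, 3, 1)  # WHCN input batch
y = model(x)
```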
## Available Encoders
| Model | Source | Implemented |
|---|---|---|
| ResNet | Deep Residual Learning for Image Recognition | ✅ |
| ConvNeXt | A ConvNet for the 2020s | ✅ |
| ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✅ |
### FluxSegmentationModels.ResNet — Type
```julia
ResNet(; depth=50, pretrain=false)
```

Configuration for constructing a ResNet encoder.
Parameters
- `depth`: The depth of the ResNet architecture. One of `18`, `34`, `50`, `101`, or `152`.
- `pretrain`: If `true`, the ResNet encoder will be initialized with pretrained weights from ImageNet.
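Encoder configurations are plain values that the model constructors accept; a short sketch (the model pairing is illustrative):

```julia
using FluxSegmentationModels

# A deeper ResNet encoder configuration, passed to any segmentation model.
cfg = ResNet(depth=101, pretrain=false)
model = UNet(cfg; nclasses=3)
```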
### FluxSegmentationModels.ConvNeXt — Type
```julia
ConvNeXt(; config=:tiny)
```

A ConvNeXt style encoder.
Parameters
- `config`: The size of the ConvNeXt model to use. Can be `:pico`, `:tiny`, `:small`, `:base`, `:large`, or `:xlarge`.
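ConvNeXt encoders scale through the single `config` symbol; a brief sketch (the FPN pairing is an assumption):

```julia
using FluxSegmentationModels

# Swap encoder capacity by changing one symbol.
small_model = FPN(ConvNeXt(config=:tiny); nclasses=10)
big_model   = FPN(ConvNeXt(config=:large); nclasses=10)
```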
### FluxSegmentationModels.ViT — Type
```julia
ViT(; config=:base, patchsize=(16,16), imsize=(256,256), dropout_prob=0.1, mlp_ratio=4.0, qkv_bias=true, pretrain=true)
```

Construct a ViT style encoder configuration.
Parameters
- `config`: The ViT configuration to use. Can be `:tiny`, `:small`, `:base`, `:large`, or `:huge`.
- `patchsize`: The patch size to use for the encoder stem. Default is `(16,16)`.
- `imsize`: The image size to use for the input to the encoder. Default is `(256,256)`.
- `pretrain`: Whether to use ImageNet pre-trained weights. Default is `true`.
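A final hedged sketch: `patchsize` and `imsize` must be consistent with the data fed to the model (the SETR pairing and values are illustrative):

```julia
using FluxSegmentationModels

# A ViT encoder configuration sized for 256×256 inputs.
cfg = ViT(config=:small, patchsize=(16, 16), imsize=(256, 256), pretrain=false)
model = SETR(cfg; nclasses=4)

x = rand(Float32, 256, 256, 3, 1)  # WHCN input matching imsize
y = model(x)
```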