FluxSegmentationModels

FluxSegmentationModels is a pure Julia package implementing various semantic segmentation models in Flux.

Available Segmentation Models

Model      Source
U-Net      U-Net: Convolutional Networks for Biomedical Image Segmentation
FPN        Feature Pyramid Networks for Object Detection
SegFormer  SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
SETR       Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
FluxSegmentationModels.UNet (Type)
UNet(encoder_config; decoder_dims=(64,128,256,512,1024), batch_norm=true, upsample_method=:nearest, nclasses=1, inchannels=3)
UNet(encoder, encoder_dims; decoder_dims=(64,128,256,512,1024), batch_norm=true, upsample_method=:nearest, nclasses=1)

Construct a U-Net style segmentation model.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the U-Net.
  • encoder: A Flux.Chain layer containing the blocks of the encoder to be used in the U-Net.
  • encoder_dims: A tuple containing the feature dimension of each encoder block output ordered from first to last.
  • decoder_dims: The feature dimension of each decoder block ordered from top to bottom.
  • batch_norm: If true, a batch norm operation will be applied after each convolution in the decoder.
  • inchannels: The number of channels in the input image.
  • nclasses: The number of output classes for the segmentation task.
  • upsample_method: The method to use for upsampling. Can be :nearest or :bilinear.
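As an illustration, a U-Net can be assembled from one of the encoder configurations listed below. This is a minimal usage sketch assuming FluxSegmentationModels and Flux are installed; the keyword values are illustrative, not prescriptive, and the output shape comment follows the usual Flux WHCN convention.

```julia
using FluxSegmentationModels, Flux

# Build a U-Net with a ResNet-50 encoder for a 2-class task on RGB input.
# Keyword names follow the signature above.
model = UNet(ResNet(depth=50); nclasses=2, inchannels=3, upsample_method=:bilinear)

# Forward pass on a dummy batch (width, height, channels, batch).
x = rand(Float32, 256, 256, 3, 1)
y = model(x)  # expected: 256×256×2×1, one output channel per class
```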
FluxSegmentationModels.FPN (Type)
FPN(encoder_config; inchannels=3, nclasses=1, pyramid_dim=256, segmentation_dim=128, upsample_method=:nearest, merge_policy=:add, dropout=0.0)
FPN(encoder, encoder_dims; nclasses=1, pyramid_dim=256, segmentation_dim=128, upsample_method=:nearest, merge_policy=:add, dropout=0.0)

Construct a Feature Pyramid Network (FPN) style segmentation model. The FPN decoder expects a tuple of block-wise encoder activations as input.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the FPN.
  • inchannels: The number of channels in the input image. Default is 3 for RGB images.
  • nclasses: The number of output classes for the segmentation task. Default is 1.
  • pyramid_dim: The size of the feature pyramid dimension. Default is 256.
  • segmentation_dim: The size of the segmentation dimension. Default is 128.
  • upsample_method: The method to use for upsampling. Can be :nearest or :bilinear.
  • merge_policy: The policy to use for merging features. Can be :add or :concat.
  • dropout: The dropout probability to use after the last layer.
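The effect of merge_policy can be seen with plain arrays: :add sums features element-wise and keeps the channel count, while :concat stacks them along the channel dimension. A conceptual sketch in plain Julia, independent of the package:

```julia
# Two feature maps in Flux's WHCN layout: (width, height, channels, batch).
a = rand(Float32, 8, 8, 256, 1)   # lateral pyramid feature
b = rand(Float32, 8, 8, 256, 1)   # upsampled coarser feature

merged_add = a .+ b                # :add keeps 256 channels
merged_cat = cat(a, b; dims=3)     # :concat yields 512 channels

size(merged_add)  # (8, 8, 256, 1)
size(merged_cat)  # (8, 8, 512, 1)
```

Because :concat doubles the channel dimension, the decoder's convolutions must account for the wider input when that policy is selected.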
FluxSegmentationModels.SegFormer (Type)
SegFormer(encoder_config::EncoderConfig; embed_dim=768, dropout=0.0, nclasses=1, inchannels=3)

Construct a SegFormer style segmentation model.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the SegFormer.
  • embed_dim: The embedding dimension to use for the decoder. Default is 768.
  • dropout: The dropout probability to use after the last layer of the decoder. Default is 0.0.
  • nclasses: The number of output classes for the segmentation task.
  • inchannels: The number of channels in the input image.
FluxSegmentationModels.SETR (Type)
SETR(encoder_config::EncoderConfig; inchannels=3, kw...)
SETR(encoder, encoder_dims; patchsize=(16,16), batch_norm=true, nclasses=1)

Construct a SETR style segmentation model.

Parameters

  • encoder_config: An EncoderConfig object specifying the architecture and configuration of the encoder to be built and used in the SETR.
  • encoder: A Flux.Chain layer containing the blocks of the encoder to be used in the SETR.
  • encoder_dims: A tuple containing the feature dimension of each encoder block output ordered from first to last.
  • patchsize: The patch size to use for the input to the encoder. Default is (16,16).
  • batch_norm: If true, a batch norm operation will be applied after each convolution in the decoder. Default is true.
  • nclasses: The number of output classes for the segmentation task. Default is 1.
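SETR pairs naturally with a transformer encoder such as ViT (as in the original paper). A minimal usage sketch, assuming the package is installed and that a ViT configuration is a valid EncoderConfig here; the keyword values are illustrative:

```julia
using FluxSegmentationModels

# SETR with a ViT-Base encoder. The model's patch size and the encoder
# stem's patch size should agree.
model = SETR(ViT(config=:base, patchsize=(16,16), imsize=(256,256)); nclasses=21)
```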

Available Encoders

Model     Source
ResNet    Deep Residual Learning for Image Recognition
ConvNeXt  A ConvNet for the 2020s
ViT       An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
FluxSegmentationModels.ResNet (Type)
ResNet(;depth=50, pretrain=false)

Configuration for constructing a ResNet encoder.

Parameters

  • depth: The depth of the ResNet architecture. One of 18, 34, 50, 101, or 152.
  • pretrain: If true, the ResNet encoder will be initialized with pretrained weights from ImageNet.
FluxSegmentationModels.ConvNeXt (Type)
ConvNeXt(;config=:tiny)

Configuration for constructing a ConvNeXt style encoder.

Parameters

  • config: The size of the ConvNeXt model to use. Can be :pico, :tiny, :small, :base, :large, or :xlarge.
FluxSegmentationModels.ViT (Type)
ViT(;config=:base, patchsize=(16,16), imsize=(256,256), dropout_prob=0.1, mlp_ratio=4.0, qkv_bias=true, pretrain=true)

Construct a ViT style encoder configuration.

Parameters

  • config: The ViT configuration to use. Can be :tiny, :small, :base, :large, or :huge.
  • patchsize: The patch size to use for the encoder stem. Default is (16,16).
  • imsize: The image size to use for the input to the encoder. Default is (256,256).
  • dropout_prob: The dropout probability applied within the encoder. Default is 0.1.
  • mlp_ratio: The ratio of the MLP hidden dimension to the embedding dimension. Default is 4.0.
  • qkv_bias: If true, the query, key, and value projections include a bias term. Default is true.
  • pretrain: Whether to use ImageNet pre-trained weights. Default is true.
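For transformer encoders such as ViT, the image size must divide evenly by the patch size; the encoder stem then produces one token per patch. The arithmetic in plain Julia:

```julia
imsize = (256, 256)
patchsize = (16, 16)

@assert all(imsize .% patchsize .== 0)  # image must tile evenly into patches

grid = imsize .÷ patchsize   # token grid: (16, 16)
ntokens = prod(grid)         # 256 tokens fed to the transformer
```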