Gradient Dude
2.62K subscribers
180 photos
50 videos
2 files
169 links
TL;DR for DL/CV/ML/AI papers from an author of publications at top-tier AI conferences (CVPR, NIPS, ICCV,ECCV).

Most ML feeds go for fluff, we go for the real meat.

YouTube: youtube.com/c/gradientdude
IG instagram.com/gradientdude
Download Telegram
April 2, 2021
April 2, 2021
​​EfficientNetV2: Smaller Models and Faster Training

A new paper from Google Brain with a new SOTA architecture called EfficientNetV2. The authors develop a new family of CNN models that are optimized both for accuracy and training speed. The main improvements are:

- an improved training-aware neural architecture search with new building blocks and ideas to jointly optimize training speed and parameter efficiency;
- a new approach to progressive learning that adjusts regularization along with the image size;

As a result, the new approach can reach SOTA results while training faster (up to 11x) and smaller (up to 6.8x).

Paper: https://arxiv.org/abs/2104.00298

Code will be available here:
https://github.com/google/automl/efficientnetv2

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-effnetv2

#cv #sota #nas #deeplearning
April 2, 2021
This media is not supported in your browser
VIEW IN TELEGRAM
April 3, 2021
​​Self-supervised Learning for Medical images

Due to fixed imaging procedures, medical images like X-ray or CT scans are usually well aligned geometrically.
This gives an opportunity to utilize such an alignment to automatically mine similar pairs of image patches for self-supervised training.

The basic idea is to fix K random locations in the unlabeled medical images (K locations are the same for every image) and crop image patches across different images (which correspond to scans of different patients).
Now we create a surrogate classification task by assigning a unique pseudo-label to every location 1...K.
Authors combine the surrogate classification task with image restoration using a denoising autoencoder: they randomly perturb the cropped patches (color jittering, random noise, random cut-outs) and train a decoder to restore the original view.

However, sometimes the alignment between medical images is not perfect by default and images may depict different body parts. To make sure that the images are aligned, we train an autoencoder on full images (before cropping) and select only similar images by comparing the distances between them in the learned autoencoder latent space.

Authors show that their method is significantly better than other self-supervised learning approaches on medical data and can even be combined with existing self-supervised methods like RotNet (predicting image rotations). But unfortunately, the comparison is rather limited, and they didn't compare to Jigsaw Puzzle, SwaV, or recent contrastive self-supervised methods like MoCO, BYOL, and SimCLR.

πŸ“ Paper
πŸ›  Code & Models

#paper_tldr #cv #self_supervised
April 4, 2021
April 4, 2021
April 5, 2021
Media is too big
VIEW IN TELEGRAM
April 6, 2021
Media is too big
VIEW IN TELEGRAM
April 7, 2021
April 8, 2021
Media is too big
VIEW IN TELEGRAM
April 9, 2021
Forwarded from Neural Shit
April 10, 2021
Forwarded from Self Supervised Boy
Self-supervision paper from arxiv for histopathology CV.

Authors draw inspiration from the process of how histopathologists tend to review the images, and how those images are stored. Histopathology images are multiscale slices of enormous size (tens of thousands pixels by one side), and area experts constantly move through different levels of magnification to keep in mind both fine and coarse structures of the tissue.

Therefore, in this paper the loss is proposed to capture relation between different magnification levels. Authors propose to train network to order concentric patches by their magnification level. They organise it as the classification task β€” network to predict id of the order permutation instead of predicting order itself.

Also, authors proposed specific architecture for this task and appended self-training procedure, as it was shown to boost results even after pre-training.

All this allows them to reach quality increase even in high-data regime.

My description of the architecture and loss expanded here.
Source of the work here.
April 11, 2021
​​DetCon: The Self-supervised Contrastive Detection MethodπŸ₯½
DeepMind

A new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations.

Object-based regions are identified with an approximate, automatic segmentation algorithm based on pixel affinity (bottom). These masks are carried through two stochastic data augmentations and a convolutional feature extractor, creating groups of feature vectors in each view (middle). The contrastive detection objective then pulls together pooled feature vectors from the same mask (across views) and pushes apart features from different masks and different images (top).

🌟Highlights
+ SOTA detection and Instance Segmentation (on COCO) and Semantic Segmentation results (on PASCAL) when pretrained in self-supervised regime on ImageNet, while requiring up to 5Γ— fewer epochs than SimCLR.
+ It also outperforms supervised pretraining on Imagenet.
+ DetCon(SimCLR) converges much faster to reach SOTA: 200 epochs are sufficient to surpass supervised transfer to COCO, and 500 to PASCAL.
+ Linear increase in the number of model parameters (using ResNet-101, ResNet-152, and ResNet-200) brings a linear increase in the accuracy on downstream tasks.
+ Despite only being trained on ImageNet, DetCon(BYOL) matches the performance of Facebook's SEER model that used a higher capacity RegNet architecture and was pretrained on 1 Billion Instagram images.
+ First time a ResNet-50 with self-supervised pretraining on COCO outperforms the supervised pretraining for Transfer to PASCAL
+ The power of DetCon strongly correlates with the quality of the masks. The better the masks used during the self-supervised pretraining stage, the better the accuracy on downstream tasks.

βš™οΈ Method details
DetConS and DetConB, based on two recent self-supervised baselines: SimCLR and BYOL respectively with ResNet-50 backbone.
Authors adopt the data augmentation procedure and network architecture from these methods while applying the proposed Contrastive Detection loss to each.

Each image is randomly augmented twice, resulting in two images: x, x'.
In addition, they compute for each image a set of masks that segment the image into different components.
These masks can be computed using efficient, off-the-shelf, unsupervised segmentation algorithms. In particular, authors use Felzenszwalb-Huttenlocher algorithm a classic segmentation procedure that iteratively merges regions using pixel-based affinity. This algorithm does not require any training and is available in scikit-image. If available, human-annotated segmentations can also be used instead of automatically generated. Each mask (represented as a binary image) is transformed using the same cropping and resizing as used for the underlying RGB image, resulting in two sets of masks {m}, {m'} which are aligned with the augmented images x, x'.

For every mask m associated with the image, authors compute a mask-pooled hidden vector (i.e., similar to regular average pooling but applied only to spatial locations belonging to the same mask).
Then 2-layer MLP is used as a projection on top of the mask-pooled hidden vectors. Note that if you replace masked-pooling with a single global average pooling then you will get exactly SimCLR or BYOL architecture.

Standard contrastive loss based on cross-entropy is used for learning. Positive pair is the latent representations of the same mask from augmented views x and x'. Latent representations of different masks from the same image and from different images in the batch are used as negative samples. Moreover, negative masks are allowed to overlap with a positive one.
April 11, 2021