Gradient Dude

Bored? Here is yet another Fast&Furious backbone for you!

New day - new SOTA on ImageNet🤯

1.3K views14:38

April 2, 2021

Forwarded from Data Science by ODS.ai 🦜

EfficientNetV2: Smaller Models and Faster Training

A new paper from Google Brain with a new SOTA architecture called EfficientNetV2. The authors develop a new family of CNN models that are optimized both for accuracy and training speed. The main improvements are:

- an improved training-aware neural architecture search with new building blocks and ideas to jointly optimize training speed and parameter efficiency;
- a new approach to progressive learning that adjusts regularization along with the image size;

As a result, the new approach can reach SOTA results while training faster (up to 11x) and smaller (up to 6.8x).

Paper: https://arxiv.org/abs/2104.00298

Code will be available here:
https://github.com/google/automl/efficientnetv2

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-effnetv2

#cv #sota #nas #deeplearning

1.2K views14:38

April 2, 2021

1:33

VR Mind Control from NextMind: decode the act of focusing

Sorry Elon, no need to drill skulls anymore!

The NextMind sensor is non-invasive and can read electrical signals from the brains' visual cortex using small electrodes attached to the skin. Then the machine learning is used to decode brain activity and pinpoint the object of focus, allowing you to control game actions with your mind in real-time.

The sensor itself is surprisingly small and light — it fits in the palm of your hand, with two arms that extend slightly beyond that. It easily fits under a baseball cap. You just need to ensure that the nine sets of two-pronged electrode sensors make contact with your skin

Currently, it's just a dev-kit that can be paired with 3rd party VR headsets including Oculus. The kit retails for $399 and can be already preordered. The functional is limited, but it is only the first step, I'm very excited to see the further development of this technology!

Full review is here.
Thanks @ai_newz for the pointer.

1.5K views15:36

April 3, 2021

Self-supervised Learning for Medical images

Due to fixed imaging procedures, medical images like X-ray or CT scans are usually well aligned geometrically.
This gives an opportunity to utilize such an alignment to automatically mine similar pairs of image patches for self-supervised training.

The basic idea is to fix K random locations in the unlabeled medical images (K locations are the same for every image) and crop image patches across different images (which correspond to scans of different patients).
Now we create a surrogate classification task by assigning a unique pseudo-label to every location 1...K.
Authors combine the surrogate classification task with image restoration using a denoising autoencoder: they randomly perturb the cropped patches (color jittering, random noise, random cut-outs) and train a decoder to restore the original view.

However, sometimes the alignment between medical images is not perfect by default and images may depict different body parts. To make sure that the images are aligned, we train an autoencoder on full images (before cropping) and select only similar images by comparing the distances between them in the learned autoencoder latent space.

Authors show that their method is significantly better than other self-supervised learning approaches on medical data and can even be combined with existing self-supervised methods like RotNet (predicting image rotations). But unfortunately, the comparison is rather limited, and they didn't compare to Jigsaw Puzzle, SwaV, or recent contrastive self-supervised methods like MoCO, BYOL, and SimCLR.

📝 Paper
🛠 Code & Models

#paper_tldr #cv #self_supervised

1.5K views14:37

April 4, 2021

Results. TransVW (transferable visual words) is the proposed method.

1.4K views14:37

April 4, 2021

LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions

A framework that learns meaningful directions in GANs' latent space using unsupervised contrastive learning. Instead of discovering fixed directions such as in previous work, this method can discover non-linear directions in pretrained StyleGAN2 and BigGAN models. The discovered directions may be used for image manipulation.

Authors use the differences caused by an edit operation on the feature activations to optimize the identifiability of each direction. The edit operations are modeled by several separate neural nets ∆_i(z) and learning. Given a latent code z and its generated image x = G(z), we seek to find edit operations ∆_i(z) such that the image x' = G(∆_i(z)) has semantically meaningful changes over x while still preserving the identity of x.

📝 Paper
🛠 Code (next week)

#paper_tldr #cv #gan

2.2K views14:59

April 5, 2021

Spectacular Image Stylization using CLIP and DALL-E

As a Style Transfer Dude, I can say that this is super cool. A statue of David by Michelangelo was used as an input image. Then it was morphed towards different styles of famous artists by steering the latent code towards the embeddings of a textual description in CLIP space.

I especially like Picasso's Cubism where it created a half-bull half-human portrait which is one of the typical sujets of Picasso. Rene Magritte's stylization is my second favorite.

I discussed similar techniques for image editing here and here.

📙Colab which contains the most significant parts to reproduce the results: link.

Original youtube video.
Thanks @NeuralShit for the pointer.

#image_gen #gan #style_transfer

1.7K viewsedited 17:40

April 6, 2021

Joker Donald Trump Inauguration Speech🃏

Look Ma, DeepFakes are getting amazingly good! No need to spend thousands of dollars anymore to create such realistic effects.

Borrowed from @NeuroLands

1.6K viewsedited 06:01

April 7, 2021

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement🔥

This paper proposed an improved way to project real images in the StyleGAN latent space (which is required for further image manipulations).

Instead of directly predicting the latent code of a given real image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate. The initial estimate is set to just average latent code across the dataset. Inverting is done using multiple of forward passes by iteratively feeding the encoder with the output of the previous step along with the original input.

Notably, during inference, ReStyle converges its inversion after a small number of steps (e.g., < 5), taking less than 0.5 seconds per image. This is compared to several minutes per image when inverting using optimization techniques.

The results are impressive! The L2 and LPIPS loss valeus are comparable to optimization-based techniques, while two orders of magnitude faster!

📝 Paper
🛠 Code
👫 Colab

1.6K viewsedited 19:19

April 8, 2021

0:33

0:33

0:30

🌀 Project page with more results

1.7K views19:19

April 8, 2021

Monkey is playing Pong just using the power of its mind (no joystick)🔥

New demo from Neuralink. A monkey called Pager is playing video games using brain signals for in-game manipulations.
I'm just curious how much more precise is invasive neuralink versus some non-invasive electroencephalography-based sensors?

Now imagine someone with paralysis using a smartphone/computer with their mind. This will be invaluable. I'm not even saying about controlling bionic arms and legs.

2.0K views15:48

April 9, 2021

Forwarded from Neural Shit

1.5K views01:12

April 10, 2021

Ярослав's Notion on Notion

Forwarded from Self Supervised Boy

Self-supervision paper from arxiv for histopathology CV.

Authors draw inspiration from the process of how histopathologists tend to review the images, and how those images are stored. Histopathology images are multiscale slices of enormous size (tens of thousands pixels by one side), and area experts constantly move through different levels of magnification to keep in mind both fine and coarse structures of the tissue.

Therefore, in this paper the loss is proposed to capture relation between different magnification levels. Authors propose to train network to order concentric patches by their magnification level. They organise it as the classification task — network to predict id of the order permutation instead of predicting order itself.

Also, authors proposed specific architecture for this task and appended self-training procedure, as it was shown to boost results even after pre-training.

All this allows them to reach quality increase even in high-data regime.

My description of the architecture and loss expanded here.
Source of the work here.

Self-supervised driven consistency training for annotation efficient histopathology image analysis | Notion

In this paper authors gain insight for the new loss from the way histopathologists work with images. Since the enormous scale of the images for histopathological research it is stored in pyramid-like structure with different zoom level, so researches tend…

1.6K views14:03

April 11, 2021