read-paper-list

semantic segmentation/object detection/light-weight network/instance segmentation

Deep-base-network

ImageNet Classification with Deep Convolutional Neural Networks(AlexNet)
Very Deep Convolutional Networks For Large-Scale Image Recognition(VGG)
Network In Network(NIN)
Going Deeper with Convolutions(GoogleNet)
Deep Residual Learning for Image Recognition(ResNet)
Densely Connected Convolutional Networks(DenseNet)
Squeeze-and-Excitation Networks(SENet)
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks(GENet)
Non-local Neural Networks
Convolutional Neural Networks with layer reuse(LruNet)
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond(GCNet)
Rethinking ImageNet Pre-training
Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

light-weight network

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size(SqueezeNet)
Mobilenets: Efficient convolutional neural networks for mobile vision applications(Mobilenet V1)
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices(ShuffleNet V1)
Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation(Mobilenet V2)
SqueezeNext: Hardware-Aware Neural Network Design(SqueezeNext)
CondenseNet: An Efficient DenseNet using Learned Group Convolutions(CondenseNet)
Pelee: A Real-Time Object Detection System on Mobile Devices(PeleeNet)
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design(ShuffleNet V2)
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation(ESPNet)
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions(ChannelNets)
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network(ESPNetV2)
Interleaved Group Convolutions for Deep Neural Networks(IGCV1)
IGCV2: Interleaved Structured Sparse Convolutional Neural Networks(IGCV2)
IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks(IGCV3)
MnasNet: Platform-Aware Neural Architecture Search for Mobile(MnasNet)
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search(FBNet)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks(EfficientNet)
DiCENet: Dimension-wise Convolutions for Efficient Networks(DiCENet)
Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection

semantic segmentation

Fully Convolutional Networks for Semantic Segmentation(FCN)
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation(SegNet)
U-Net: Convolutional Networks for Biomedical Image Segmentation(UNet)
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs(Deeplab v1)
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution,and Fully Connected CRFs(Deeplab v2)
Understanding Convolution for Semantic Segmentation(DUC)
Pyramid Scene Parsing Network(PSPNet)
Large Kernel Matters – Improve Semantic Segmentation by Global Convolutional Network(GCN)
Rethinking Atrous Convolution for Semantic Image Segmentation(Deeplab v3)
DenseASPP for Semantic Segmentation in Street Scenes（DenseASPP）
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation(Deeplab v3plus）
Context Encoding for Semantic Segmentation(EncNet)
Learning a Discriminative Feature Network for Semantic Segmentation(DFN)
Smoothed Dilated Convolutions for Improved Dense Prediction(SDC)
Pyramid Attention Network for Semantic Segmentation(PAN)
Exploring Context with Deep Structured models for Semantic Segmentation(FeatMap-Net)
ExFuse: Enhancing Feature Fusion for Semantic Segmentation(ExFuse)
Dilated Residual Networks(DRN)
Dual Attention Network for Scene Segmentation(DANet)
OCNet:Object Context Network for Scene Parsing(OCNet)
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation(RefineNet)
Dense Relation Network: Learning Consistent And Context-Aware Prepresentation For Semantic Image Segmentation(DRN)
CCNet: Criss-Cross Attention for Semantic Segmentation(CCNet)
Unified Perceptual Parsing for Scene Understanding(UPerNet)
Tree-structured Kronecker Convolutional Networks for Semantic Segmentation(TKNet)
NeuroIoU: Learning a Surrogate Loss for Semantic Segmentation(NeuroIoU)
Decoders Matter for Semantic Segmentation:Data-Dependent Decoding Enables Flexible Feature Aggregation
GFF: Gated Fully Fusion for Semantic Segmentation（GFF）
Learning Fully Dense Neural Networks for Image Semantic Segmentation（FDNet）
ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation(ZigZagNet)
Adaptive Pyramid Context Network for Semantic Segmentation(APCNet)
Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation
ACFNet: Attentional Class Feature Network for Semantic Segmentation(ACFNet)
Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images
Dual Graph Convolutional Network for Semantic Segmentation
Global Aggregation then Local Distribution in Fully Convolutional Networks
Dynamic Multi-scale Filters for Semantic Segmentation
Unifying Training and Inference for Panoptic Segmentation
Semantic Flow for Fast and Accurate Scene Parsing
AlignSeg: Feature-Aligned Segmentation Networks
Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
Context Prior for Scene Segmentation

fast/real-time segmentation

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation(ENet)
ICNet for Real-Time Semantic Segmentation(ICNet)
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation(BiSeNet)
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation(LinkNet)
Rtseg: Real-Time Semantic Segmentation Comparative Study
Shuffleseg: Real-Time Semantic Segmentation Network(Shuffleseg)
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation(ESPNet)
Light-Weight RefineNet for Real-Time Semantic Segmentation(Light-Weight RefineNet)
LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation(LinkNet)
D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction(D-LinkNet)
CGNet: A Light-weight Context Guided Network for Semantic Segmentation(CGNet)
Efficient ConvNet for Real-time Semantic Segmentation
A Comparative Study of Real-time Semantic Segmentation for Autonomous Driving
ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time(ContextNet)
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network(ESPNetV2)
ShelfNet for Real-time semantic segmentation(ShelfNet)
ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation(ERFNet)
Concentrated-Comprehensive Convolutions for lightweight semantic segmentation(CCCNet)
DSNet for Real-Time Driving Scene Semantic Segmentation(DSNet)
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation(EDANet)
Fast-SCNN: Fast Semantic Segmentation Network(Fast-SCNN)
Guided Upsampling Network for Real-Time Semantic Segmentation(GUN)
In Defense of Pre-trained ImageNet Architecturesfor Real-time Semantic Segmentation of Road-driving Images(SwiftNetRN)
Residual Pyramid Learning for Single-Shot Semantic Segmentation(RPNet)
DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation(DFANet)
DSNet: An Efficient CNN for Road Scene Segmentation(DSNet)
Spatial Sampling Network for Fast Scene Understanding
RGPNET: A REAL-TIME GENERAL PURPOSE SEMANTIC SEGMENTATION
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation
FASTERSEG: SEARCHING FOR FASTER REAL-TIME SEMANTIC SEGMENTATION
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Customizable Architecture Search for Semantic Segmentation
Semantic Flow for Fast and Accurate Scene Parsing
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
ASNet: Aggregated Scale Transformations for Real-Time Semantic Segmentation

Deep object detection

Rich feature hierarchies for accurate object detection and semantic segmentation(R-CNN)
SSD: Single Shot MultiBox Detector(SSD)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks(Faster R-CNN)
Feature Pyramid Networks for Object Detection(FPN)
Is Faster R-CNN Doing Well for Pedestrian Detection?(RPN_BF)
Training Region-based Object Detectors with Online Hard Example Mining(OHEM)
Receptive Field Block Net for Accurate and Fast Object Detection(RFBNet)
Focal Loss for Dense Object Detection(RetinaNet)
Single-Shot Refinement Neural Network for Object Detection(RefinDet)
PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection(PVANET)
Multi-label learning of part detectors for heavily occluded pedestrian detection(JL-TopS)
Graininess-aware Deep Feature Learning for Pedestrian Detection(GDFL)
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network(M2Det)
CFENet: An Accurate and Efficient Single-Shot Object Detector for Autonomous Driving(CFENet)
ScratchDet: Training Single-Shot Object Detectors from Scratch(ScratchDet)
Pooling Pyramid Network for Object Detection（PPN）
ThunderNet: Towards Real-time Generic Object Detection(ThunderNet)
Light-Weight RetinaNet for Object Detection
CornerNet: Detecting Objects as Paired Keypoints(CornerNet)
Bottom-up Object Detection by Grouping Extreme and Center Points(ExtremeNet)
RepPoints: Point Set Representation for Object Detection(RepPoints)
FCOS: Fully Convolutional One-Stage Object Detection(FCOS)
Mask-Guided Attention Network for Occluded Pedestrian Detection
Learning Rich Features at High-Speed for Single-Shot Object Detection.
Dynamic Anchor Feature Selection for Single-Shot Object Detection.
Contextual Attention for Hand Detection in the Wild
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Multiple Anchor Learning for Visual Object Detection
NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection
Is Sampling Heuristics Necessary in Training Deep Object Detectors?
Rethinking Classification and Localization for Object Detection
Multiple Anchor Learning for Visual Object Detection
Learning from Noisy Anchors for One-stage Object Detection
Learning a Unified Sample Weighting Network for Object Detection∗
D2Det: Towards High Quality Object Detection and Instance Segmentation
AugFPN: Improving Multi-scale Feature Learning for Object Detection
Scale-Equalizing Pyramid Convolution for Object Detection

Face Detection

S3FD: Single Shot Scale-invariant Face Detector(SFD)
FaceBoxes: A CPU Real-time Face Detector with High Accuracy(FaceBoxes)
Detecting Face with Densely Connected Face Proposal Network(DCFPN)
SSH: Single Stage Headless Face Detector(SSH)
DSFD: Dual Shot Face Detector(DSFD)
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks(MTCNN)
PyramidBox: A Context-assisted Single Shot Face Detector(PyramidBox)
SRN:Selective Refinement Network for High Performance Face Detection(SRN)
Single Shot Attention-Based Face Detector(AFN)
Improved Selective Refinement Network for Face Detection(ISRN)
PyramidBox++: High Performance Detector for Finding Tiny Face(PyramidBox++)
RetinaFace: Single-stage Dense Face Localisation in the Wild(RetinaFace)

Instance segmentation

Fully Convolutional Instance-aware Semantic Segmentation(FCIS)
Instance-aware Semantic Segmentation via Multi-task Network Cascades(MNC)
Mask R-CNN
Mask Scoring R-CNN
Path Aggregation Network for Instance Segmentation(PANet)
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free(RetinaMask)
YOLACT Real-time Instance Segmentation(YOLACT)
Parsing R-CNN for Instance-Level Human Analysis(Parsing R-CNN)
BlitzNet: A Real-Time Deep Network for Scene Understanding(BlitzNet)
Hybrid Task Cascade for Instance Segmentation(HTC)
Triply Supervised Decoder Networks for Joint Detection and Segmentation(TripleNet)
ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation(ZigZagNet)
Bounding Box Embedding for Single Shot Person Instance Segmentation
Shape-aware Feature Extraction for Instance Segmentation
Real-Time Panoptic Segmentation from Dense Detections
EmbedMask: Embedding Coupling for One-stage Instance Segmentation
PolyTransform: Deep Polygon Transformer for Instance Segmentation
SOLO: Segmenting Objects by Locations
RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation
SSAP: Single-Shot Instance Segmentation With Affinity Pyramid
YOLACT++：Better Real-time Instance Segmentation
SAIS: Single-stage Anchor-free Instance Segmentation
PolarMask: Single Shot Instance Segmentation with Polar Representation
BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation

Mutil-task learning

End-to-End Multi-Task Learning with Attention.
Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics.
BlitzNet: A Real-Time Deep Network for Scene Understanding.
Triply Supervised Decoder Networks for Joint Detection and Segmentation
Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving.
Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation.
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Network
MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning
Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems
MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning
AP-MTL: Attention Pruned Multi-task Learning Model for Real-time Instrument Detection and Segmentation in Robot-assisted Surgery

non-deep object detection

Robust Real-Time Face Detection(Haar+Adaboost)
Integral Channel Features(ICF)
The Fastest Pedestrian Detector in the West(FPDW)
Fast Feature Pyramids for Object Detection(ACF)
Local Decorrelation for Improved Pedestrian Detection(LDCF)
Convolutional Channel Features(CCF)
Informed Haar-like Features Improve Pedestrian Detection(InformedHaar)
Fast Pedestrian Detection for Mobile Devices(FastCF)
Pedestrian detection at 100 Frames Per Second(VeryFast)
To Boost or Not to Boost? On the Limits of Boosted Trees for Object Detection(ACF+/LDCF+)
Filtered channel features for pedestrian detection(Checkerboard)
Pedestrian Detection Inspired by Appearance Constancy and Shape Symmetry(NNNF)
Aggregate Channel Features for Multi-view Face Detection(ACFFace)
Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning(SpatialPooling+)
BAdaCost: Multi-class Boosting with Costs(BAdaCost)
Exploring Prior Knowledge for Pedestrian Detection(SCCPriors)
A Fast, Modular Scene Understanding System using Context-Aware Object Detection(SC-ACF）
Ten Years of Pedestrian Detection,What Have We Learned?(Katamari)
How Far are We from Solving Pedestrian Detection?
What Can Help Pedestrian Detection?
Taking a Deeper Look at Pedestrians
Semantic Channels for Fast Pedestrian Detection(MRFC+Semantic)
Fast Boosting based Detection using Scale Invariant Multimodal Multiresolution Filtered Features
Learning Multilayer Channel Features for Pedestrian Detection
Fast and Robust Object Detection Using Visual Subcategories
Learning to Detect Vehicles by Clustering Appearance Patterns(Subcat)
Looking at Pedestrians at Different Scales: A Multiresolution Approach and Evaluations(MR-ACF)
Multiresolution models for object detection
Face Detection without Bells and Whistles
Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework
An Exploration of Why and When Pedestrian Detection Fails
Discriminative Sub-categorization