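Recommended reading:

ICCV 2021 latest news and paper download thread (Papers/Codes/Project/PaperReading/Demos/livestream talks/paper-sharing sessions, etc.)

Official website: http://iccv2021.thecvf.com
Dates: October 11-17, 2021
Paper acceptance announcement: July 22, 2021

1. ICCV 2021 accepted papers/code by topic (updating)

Table of contents: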

1. Detection

2D Object Detection
Video Object Detection
3D Object Detection
Human-Object Interaction (HOI) Detection
Camouflaged Object Detection
Rotated Object Detection
Salient Object Detection
Image Anomaly Detection/Surface Defect Detection
Keypoint Detection
Edge Detection

2. Segmentation

Image Segmentation
Panoptic Segmentation
Semantic Segmentation
Instance Segmentation
Superpixel
Video Object Segmentation
Referring Image Segmentation
Matting
Dense Prediction

3. Image Processing

Super Resolution
Image Restoration/Image Enhancement
Image Shadow Removal/Image Reflection Removal
Image Denoising/Deblurring/Deraining/Dehazing
Image Editing/Image Inpainting
Image Translation
Image Quality Assessment
Style Transfer
Neural Video Delivery

4. Estimation

Pose Estimation
Gesture Estimation
Optical Flow/Pose/Motion Estimation
Depth Estimation

5. Image & Video Retrieval/Video Understanding

Action/Activity Recognition/Detection/Segmentation
Person Re-Identification/Detection
Image/Video Captioning

6. Face

Face Recognition/Detection
Face Generation/Face Synthesis/Face Reconstruction/Face Editing
Face Forgery/Face Anti-Spoofing

10. Text Detection/Recognition

11. Remote Sensing Images

12. GAN/Generative/Adversarial

13. Image Generation/Image Synthesis

View Synthesis

14. Scene Graph

Scene Graph Generation
Scene Graph Prediction
Scene Graph Understanding

16. Visual Reasoning/Visual Question Answering (VQA)

17. Image Classification

18. Neural Network Design & Optimization

CNN
Attention
Transformer
Graph Neural Networks (GNN)
Neural Architecture Search (NAS)
Loss Function
Visualization/Interpretability

19. Model Compression

Knowledge Distillation
Pruning
Quantization

20. Model Training/Generalization/Prediction

Noisy Labels
Long-Tailed Distribution
Out-of-Distribution Examples

22. Data Processing

Data Augmentation
Representation Learning
Normalization/Regularization
Image Clustering
Image Compression
Anomaly Detection

24. Few-shot/Zero-shot Learning

25. Continual Learning/Lifelong Learning

26. Transfer Learning/Domain Adaptation

28. Contrastive Learning

29. Incremental Learning

30. Reinforcement Learning

32. Multi-Modal Learning

Audio-Visual Learning
Vision & Language

33. Vision-based Prediction

Detection

2D Object Detection

[14] FOVEA: Foveated Image Magnification for Autonomous Navigation
paper | project

[13] DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
paper

[12] G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
paper

[11] Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
paper

[10] Oriented R-CNN for Object Detection
paper | code

[9] Conditional DETR for Fast Training Convergence
paper | code

[8] Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters
paper | code

[7] GraphFPN: Graph Feature Pyramid Network for Object Detection
paper
Explainer: Fudan and HKU propose GraphFPN, using a graph feature pyramid to improve object detection performance

[6] SimROD: A Simple Adaptation Method for Robust Object Detection
paper

[5] Active Learning for Deep Object Detection via Probabilistic Modeling
paper

[4] Detecting Invisible People
paper | project | video

[3] Conditional Variational Capsule Network for Open Set Recognition
paper | code

[2] MDETR : Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
Explainer: No detector needed for feature extraction: LeCun's team proposes MDETR for truly end-to-end multi-modal reasoning

[1] DetCo: Unsupervised Contrastive Learning for Object Detection
paper | code
Explainer: DetCo outperforms MoCo v2 from Kaiming He's team with contrastive learning tailored to object detection

3D Object Detection

[6] LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
paper

[5] RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
paper

[4] Is Pseudo-Lidar needed for Monocular 3D Object detection?
paper

[3] Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
paper | code

[2] Geometry Uncertainty Projection Network for Monocular 3D Object Detection
paper

[1] Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
paper

Video Object Detection

[1] Social Fabric: Tubelet Compositions for Video Relation Detection
paper | code

Human-Object Interaction (HOI) Detection

[1] Exploiting Scene Graphs for Human-Object Interaction Detection
paper | code

Salient Object Detection

[2] Specificity-preserving RGB-D Saliency Detection
paper | code

[1] Disentangled High Quality Salient Object Detection
paper

Camouflaged Object Detection

[1] TransForensics: Image Forgery Localization with Dense Self-Attention
paper

Image Anomaly Detection/Surface Defect Detection

[2] DRÆM -- A discriminatively trained reconstruction embedding for surface anomaly detection
paper

[1] Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
paper

Edge Detection

[2] Pixel Difference Networks for Efficient Edge Detection
paper | code

[1] RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
paper

Segmentation

Image Segmentation

[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project

[1] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code

Instance Segmentation

[6] A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
paper

[5] Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
paper | code

[4] SOTR: Segmenting Objects with Transformers
paper | code

[3] Hierarchical Aggregation for 3D Instance Segmentation
paper | code

[2] Crossover Learning for Fast Online Video Instance Segmentation
code

[1] Instances as Queries
paper | code

Semantic Segmentation

[21] Mining Contextual Information Beyond Image for Semantic Segmentation
paper

[20] Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
paper | project

[19] Self-Regulation for Semantic Segmentation
paper

[18] Multi-Anchor Active Domain Adaptation for Semantic Segmentation(Oral)
paper

[17] Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
paper

[16] Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
paper

[15] LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation(Oral)
paper

[14] Dual Path Learning for Domain Adaptation of Semantic Segmentation
paper | code

[13] Deep Metric Learning for Open World Semantic Segmentation
paper

[12] Complementary Patch for Weakly Supervised Semantic Segmentation
paper

[11] RECALL: Replay-based Continual Learning in Semantic Segmentation
paper

[10] Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
paper | code

[9] Learning Meta-class Memory for Few-Shot Semantic Segmentation
paper

[8] Personalized Image Semantic Segmentation
paper

[7] VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
paper | code

[6] Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
paper

[5] ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation(point cloud semantic segmentation)
paper

[4] Domain Adaptive Video Segmentation via Temporal Consistency Regularization(video semantic segmentation)
paper | code

[3] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
paper

[2] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code

[1] Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
paper | code

Video Object Segmentation

[2] Joint Inductive and Transductive Learning for Video Object Segmentation
paper | code

[1] Full-Duplex Strategy for Video Object Segmentation
paper | project

Referring Image Segmentation

[1] Vision-Language Transformer and Query Generation for Referring Segmentation
paper | code

Dense Prediction

[1] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
paper | code

Face

[1] Learning Facial Representations from the Cycle-consistency of Face
paper

Face Recognition/Detection

[4] TransFER: Learning Relation-aware Facial Expression Representations with Transformers
paper

[3] Understanding and Mitigating Annotation Bias in Facial Expression Recognition
paper

[2] SynFace: Face Recognition with Synthetic Data
paper

[1] PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
paper

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[5] FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
paper

[4] Disentangled Lifespan Face Synthesis
paper | code

[3] MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement(audio-driven facial animation)
paper | video

[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code

[1] HeadGAN: One-shot Neural Head Synthesis and Editing
paper

Face Forgery/Face Anti-Spoofing

[1] Exploring Temporal Coherence for More General Video Face Forgery Detection
paper

3D Vision

[3] Differentiable Surface Rendering via Non-Differentiable Sampling
paper

[2] M3D-VTON: A Monocular-to-3D Virtual Try-On Network(3D virtual try-on)
paper

[1] Score-Based Point Cloud Denoising
paper

Point Cloud

[14] A Robust Loss for Point Cloud Registration
paper

[13] CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
paper | project

[12] Voxel-based Network for Shape Completion by Leveraging Edge Generation
paper | code

[11] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers(point cloud completion)(Oral)
paper | code

[10] ME-PCN: Point Completion Conditioned on Mask Emptiness(point cloud completion)
paper

[9] Adaptive Graph Convolution for Point Cloud Analysis
paper | code

[8] PICCOLO: Point Cloud-Centric Omnidirectional Localization
paper

[7] AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
paper | code

[6] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
paper | code

[5] DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
paper

[4] Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
paper | code

[3] (Just) A Spoonful of Refinements Helps the Registration Error Go Down(Oral)
paper

[2] Learning with Noisy Labels for Robust Point Cloud Segmentation(point cloud segmentation)
paper | code

[1] HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
paper | project

3D Reconstruction

[12] Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
paper

[11] Gravity-Aware Monocular 3D Human-Object Reconstruction
paper | code

[10] 3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces
paper | code

[9] VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
paper

[8] Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
paper

[7] Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility
paper | code

[6] Deep Hybrid Self-Prior for Full 3D Mesh Generation
paper | project

[5] PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
paper

[4] Learning Canonical 3D Object Representation for Fine-Grained Recognition
paper

[3] ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
paper

[2] Discovering 3D Parts from Image Collections
paper | project

[1] PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
paper | code

Neural Network Structure Design & Optimization

[2] Unifying Nonlocal Blocks for Neural Networks
paper

[1] Energy-Based Open-World Uncertainty Modeling for Confidence Calibration(confidence calibration)
paper

CNN

[3] MicroNet: Improving Image Recognition with Extremely Low FLOPs
paper | code1 | code2

[2] Learning to Resize Images for Computer Vision Tasks
paper

[1] Bias Loss for Mobile Neural Networks
paper
Explainer: Surpassing MobileNet V3 | A detailed look at SkipNet + Bias Loss, a new milestone for lightweight models

Attention

[6] Causal Attention for Unbiased Visual Recognition
paper | code

[5] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification(causal inference)(fine-grained recognition)
paper | code

[4] Residual Attention: A Simple but Effective Method for Multi-Label Recognition
paper

[3] Fast Convergence of DETR with Spatially Modulated Co-Attention
paper | code

[2] SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
paper | code

[1] FcaNet: Frequency Channel Attention Networks
paper | code

Transformer

[10] An Empirical Study of Training Self-Supervised Vision Transformers(Oral)
paper
Explainer: Tackling training instability, new work from Kaiming He's team: self-supervised learning + Transformer = MoCo v3

[9] LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference
paper | code
Explainer: Facebook proposes LeViT, with 0.077 ms per-image processing time and ResNet50-level accuracy

[8] Emerging Properties in Self-Supervised Vision Transformers
paper | code
Explainer: When Transformers meet self-supervised learning: Facebook open-sources DINO

[7] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
paper | code
Explainer: ResNet outperformed across the board by a Transformer: Yitu open-sources the scalable T2T-ViT, whose lightweight version beats MobileNet

[6] Vision Transformer with Progressive Sampling
paper | code

[5] Rethinking and Improving Relative Position Encoding for Vision Transformer
paper | code
Explainer: Relative position encoding in Vision Transformers

[4] AutoFormer: Searching Transformers for Visual Recognition
paper | code

[3] Rethinking Spatial Dimensions of Vision Transformers
paper | code

[2] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code

[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction

Neural Architecture Search (NAS)

[4] Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
paper | code

[3] BN-NAS: Neural Architecture Search with Batch Normalization
paper

[2] NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
paper

[1] AutoFormer: Searching Transformers for Visual Recognition
paper | code

Loss Function

[3] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code
Explainer: No hyperparameter tuning, notable gains: RS Loss, a new loss function for detection and segmentation, is open-sourced

[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code

[1] Orthogonal Projection Loss
paper | code

Visualization/Interpretability

[1] Finding Representative Interpretations on Convolutional Neural Networks
paper

Model Training/Generalization

[3] MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach(multi-task learning)
paper

[2] Impact of Aliasing on Generalization in Deep Convolutional Networks
paper

[1] Learning Compatible Embeddings
paper | code

Noisy Labels

[2] NGC: A Unified Framework for Learning with Open-World Noisy Data
paper

[1] Learning with Noisy Labels via Sparse Regularization
paper | code

Long-Tailed Distribution

[2] Learning of Visual Relations: The Devil is in the Tails
paper

[1] ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot(Oral)
paper | code

Out-of-Distribution Detection

[4] Semantically Coherent Out-of-Distribution Detection
paper | project

[3] NGC: A Unified Framework for Learning with Open-World Noisy Data
paper

[2] Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
paper

[1] CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
paper

Model Compression

Knowledge Distillation

[5] Multi-Task Self-Training for Learning General Representations(multi-task learning)
paper

[4] G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
paper

[3] Online Multi-Granularity Distillation for GAN Compression
paper | code

[2] Distilling Holistic Knowledge with Graph Neural Networks
paper | code

[1] AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
paper | code

Pruning

Quantization

[2] Distance-aware Quantization
paper

[1] Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
paper | code

Image Generation/Image Synthesis

[9] Image Inpainting via Conditional Texture and Structure Dual Generation
paper | code

[8] Dual Projection Generative Adversarial Networks for Conditional Image Generation
paper

[7] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates(gesture generation)
paper | code

[6] Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
paper | code

[5] ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models(Oral)
paper

[4] Toward Spatially Unbiased Generative Models
paper

[3] A Light Stage on Every Desk
paper | project

[2] Handwriting Transformers
paper

[1] On Generating Transferable Targeted Perturbations
paper | code

View Synthesis

[1] PixelSynth: Generating a 3D-Consistent Experience from a Single Image
paper | project

GAN/Generative/Adversarial

[15] Towards Vivid and Diverse Image Colorization with Generative Color Prior(image colorization)
paper

[14] Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
paper

[13] Unsupervised Geodesic-preserved Generative Adversarial Networks for Unconstrained 3D Pose Transfer
paper | code

[12] Online Multi-Granularity Distillation for GAN Compression
paper | code

[11] AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
paper | code

[10] Meta Gradient Adversarial Attack
paper

[9] Sketch Your Own GAN
paper | code | project
Explainer: Create a GAN model from a single sketch, accessible even to beginners: new work from Jun-Yan Zhu's team accepted to ICCV 2021

[8] Feature Importance-aware Transferable Adversarial Attacks
paper | code

[7] From Continuity to Editability: Inverting GANs with Consecutive Images
paper | code

[6] Learnable Boundary Guided Adversarial Training
paper | code

[5] Transporting Causal Mechanisms for Unsupervised Domain Adaptation(Oral)
paper

[4] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage | video

[3] HeadGAN: One-shot Neural Head Synthesis and Editing
paper

[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project

[1] EigenGAN: Layer-Wise Eigen-Learning for GANs
paper | code

Image Processing

[4] Towards Vivid and Diverse Image Colorization with Generative Color Prior(image colorization)
paper

[3] Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling
paper | code

[2] Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
paper

[1] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper

Super Resolution

[2] Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution
paper | code

[1] Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
paper | code

Image Restoration/Image Enhancement

[2] Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
paper

[1] Spatially-Adaptive Image Restoration using Distortion-Guided Networks
paper | code

Image Shadow Removal/Image Reflection Removal

[1] CANet: A Context-Aware Network for Shadow Removal
paper

Image Denoising/Deblurring/Deraining/Dehazing

[1] Rethinking Coarse-to-Fine Approach in Single Image Deblurring
paper | code

Image Editing/Image Inpainting

[2] GAN Inversion for Out-of-Range Images with Geometric Transformations
paper | code

[1] Occlusion-Aware Video Object Inpainting(video inpainting)
paper

Style Transfer

[5] SSH: A Self-Supervised Framework for Image Harmonization(image harmonization)
paper | code

[4] Domain-Aware Universal Style Transfer
paper

[3] AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
paper | code1 | code2

[2] ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity(style transfer)
paper

[1] Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts(font generation)
paper | code

Image Quality Assessment

[1] MUSIQ: Multi-scale Image Quality Transformer
paper

Neural Video Delivery

[1] Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
paper | code

Estimation

Human Pose Estimation

[10] Probabilistic Modeling for Human Mesh Recovery
paper | code

[9] DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders(Oral)
paper | code

[8] Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
paper | code

[7] EventHPE: Event-based 3D Human Pose and Shape Estimation
paper

[6] HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton
paper | code

[5] Online Knowledge Distillation for Efficient Pose Estimation
paper

[4] Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
paper

[3] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code

[2] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project

[1] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project

Optical Flow/Pose/Motion Estimation

[1] SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
paper

Depth Estimation

[7] PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility(Oral)
paper | code

[6] Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation(Oral)
paper | code

[5] StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
paper | code

[4] Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
paper

[3] Towards Interpretable Deep Networks for Monocular Depth Estimation
paper | code

[2] Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
paper

[1] MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
paper

Image & Video Retrieval/Video Understanding

[7] Cross-category Video Highlight Detection via Set-based Learning(video highlight detection)
paper | code

[6] Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains
paper

[5] ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
paper

[4] Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
paper | code

[3] DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
paper

[2] Hand Image Understanding via Deep Multi-Task Learning(hand image understanding)
paper

[1] Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
paper

Action/Activity Recognition/Detection/Segmentation

[8] Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
paper | code

[7] Group-aware Contrastive Regression for Action Quality Assessment(action quality assessment)
paper

[6] Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization(action localization)
paper | code

[5] Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization(action localization)
paper | code

[4] Elaborative Rehearsal for Zero-shot Action Recognition
paper | code

[3] Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
paper

[2] Enriching Local and Global Contexts for Temporal Action Localization
paper

[1] Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
paper | code

Person Re-Identification/Detection

[7] Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
paper

[6] Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
paper

[5] Towards Discriminative Representation Learning for Unsupervised Person Re-identification
paper

[4] Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
paper | Cleaned database

[3] An Intermediate Domain Module for Domain Adaptive Person Re-ID(Oral)
paper | code

[2] Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
paper

[1] TransReID: Transformer-based Object Re-Identification
paper | code
Explainer: Transformers take a commanding lead across ReID tasks: Alibaba and Zhejiang University propose TransReID

Image/Video Captioning

[1] End-to-End Dense Video Captioning with Parallel Decoding
paper | code

Visual Localization

[5] Few-shot Visual Relationship Co-localization
paper

[4] PICCOLO: Point Cloud-Centric Omnidirectional Localization
paper

[3] Normalization Matters in Weakly Supervised Object Localization
paper

[2] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
paper | code

[1] Boundary-sensitive Pre-training for Temporal Localization in Videos
paper

Image Matching

[6] Learning to Match Features with Seeded Graph Matching Network
paper | code

[5] Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
paper | code

[4] Progressive Correspondence Pruning by Consensus Learning
paper | code | project
Explainer: CLNet: progressive correspondence pruning based on consensus learning

[3] Multi-scale Matching Networks for Semantic Correspondence
paper

[2] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code

[1] COTR: Correspondence Transformer for Matching Across Images
paper

3D Vision

[2] Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
paper

[1] MVTN: Multi-View Transformation Network for 3D Shape Recognition
paper

Object Tracking

[9] MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
paper

[8] Learning Spatio-Temporal Transformer for Visual Tracking
paper | code
Explainer: Topping the tracking leaderboards: Dalian University of Technology and MSRA propose STARK, a Transformer-based object tracker

[7] Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
paper

[6] Video Annotation for Visual Tracking via Selection and Refinement
paper

[5] Saliency-Associated Object Tracking
paper

[4] Learn to Match: Automatic Matching Network Design for Visual Tracking
paper | code

[3] HiFT: Hierarchical Feature Transformer for Aerial Tracking
paper | code

[2] Learning to Adversarially Blur Visual Object Tracking
paper | code

[1] Detecting Invisible People
paper | project | video

Medical Imaging

[2] Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
paper

[1] Generative Adversarial Registration for Improved Conditional Deformable Templates
paper | code | homepage

Text Detection/Recognition

[5] From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
paper | code&dataset

[4] Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
paper

[3] Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
paper

[2] Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
paper

[1] Towards the Unseen: Iterative Text Recognition by Distilling from Errors
paper

Remote Sensing Images

[4] Structured Outdoor Architecture Reconstruction by Exploration and Classification
paper

[3] Change is Everywhere: Single-Temporal Supervised Object Change Detection for High Spatial Resolution Remote Sensing Imagery(change detection)
paper | code

[2] Geography-Aware Self-Supervised Learning
paper

[1] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data(transfer learning)
paper | code

Scene Graph

Scene Graph Generation

[5] Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
paper

[4] Target Adaptive Context Aggregation for Video Scene Graph Generation
paper | code

[3] Unconditional Scene Graph Generation
paper

[2] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
paper
Explainer: A spatial-temporal context Transformer for video scene graph generation

[1] Unconstrained Scene Generation with Locally Conditioned Radiance Fields
paper

Scene Graph Prediction

[1] Generative Compositional Augmentations for Scene Graph Prediction
paper | code

Data Processing

Data Augmentation

[3] BiaSwap: Removing dataset bias with bias-tailored swapping augmentation
paper

[2] Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
paper | code

[1] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper
Explainer: MixMo, performance gains for free: a new data augmentation or model ensembling method

Anomaly Detection

[3] A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction
paper | code

[2] Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
paper

[1] Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
paper | code

Representation Learning

[4] Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
paper

[3] Collaborative Unsupervised Visual Representation Learning from Decentralized Data
paper

[2] Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
paper

[1] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project

Image Compression

[1] Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
paper | code

Normalization/Regularization

Image Clustering

[5] A Unified Objective for Novel Class Discovery(Oral)
paper | code

[4] Instance Similarity Learning for Unsupervised Feature Representation
paper | code

[3] Graph Constrained Data Representation Learning for Human Motion Segmentation(human motion segmentation)
paper

[2] Improve Unsupervised Pretraining for Few-label Transfer
paper

[1] Clustering by Maximizing Mutual Information Across Views
paper

Few-shot/Zero-shot Learning

[6] Binocular Mutual Learning for Improving Few-shot Classification
paper

[5] Field-Guide-Inspired Zero-Shot Learning
paper

[4] Relational Embedding for Few-Shot Classification
paper

[3] Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
paper

[2] Transductive Few-Shot Classification on the Oblique Manifold
paper

[1] FREE: Feature Refinement for Generalized Zero-Shot Learning
paper | code

Continual Learning/Lifelong Learning

[4] Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
paper | code

[3] Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
paper

[2] RECALL: Replay-based Continual Learning in Semantic Segmentation
paper

[1] Few-Shot and Continual Learning with Attentive Independent Mechanisms
paper | code

Transfer Learning/Domain Adaptation

[16] Learning Cross-modal Contrastive Features for Video Domain Adaptation
paper

[15] Learning to Diversify for Single Domain Generalization
paper

[14] PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation
paper | code

[13] Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
paper

[12] Semantic Concentration for Domain Adaptation
paper

[11] Dual Path Learning for Domain Adaptation of Semantic Segmentation
paper | code

[10] Zero-Shot Domain Adaptation with a Physics Prior(Oral)
paper | code

[9] BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
paper

[8] Domain Generalization via Gradient Surgery
paper

[7] Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
paper | code

[6] Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
paper

[5] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper

[4] Improve Unsupervised Pretraining for Few-label Transfer
paper

[3] Generalized Source-free Domain Adaptation
homepage | code

[2] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data(Transfer Learning)
paper | code

[1] Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling(Transfer Learning)
paper

Metric Learning

[6] Deep Relational Metric Learning
paper | code

[5] LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
paper

[4] Towards Interpretable Deep Metric Learning with Structural Matching
paper | code

[3] AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
paper | code

[2] Deep Metric Learning for Open World Semantic Segmentation
paper

[1] Learning with Memory-based Virtual Classes for Deep Metric Learning
paper

Incremental Learning

[2] Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
paper | code

[1] Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
paper | code | project

Contrastive Learning

[6] TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
paper

[5] Self-Supervised Video Representation Learning with Meta-Contrastive Network(Contrastive Learning)(Meta Learning)(Representation Learning)(Action Recognition)
paper

[4] Improving Contrastive Learning by Visualizing Feature Transformation
paper | visualization tools and code

[3] Parametric Contrastive Learning
paper | code

[2] Geography-Aware Self-Supervised Learning
paper

[1] CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
paper | code

Active Learning

[2] Semi-Supervised Active Learning with Temporal Output Discrepancy
paper | code

[1] Active Learning for Deep Object Detection via Probabilistic Modeling
paper

Visual Reasoning/VQA

[3] Greedy Gradient Ensemble for Robust Visual Question Answering
paper | code

[2] On the hidden treasure of dialog in video question answering
paper

[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code | project

Meta Learning

[1] Self-Supervised Video Representation Learning with Meta-Contrastive Network(Contrastive Learning)(Meta Learning)(Representation Learning)(Action Recognition)
paper

Multi-Modal Learning

Audio-visual Learning

[1] The Right to Talk: An Audio-Visual Transformer Approach
paper

Visual & Language

[1] LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
paper | project

Vision-based Prediction

[9] DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
paper

[8] Generating Smooth Pose Sequences for Diverse Human Motion Prediction
paper | code

[7] MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction(Human Motion Prediction)
paper | code

[6] RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting(Motion Prediction)
paper | project

[5] SLAMP: Stochastic Latent Appearance and Motion Prediction(Motion Prediction)
paper

[4] Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction(Trajectory Prediction)
paper

[3] Personalized Trajectory Prediction via Distribution Discrimination(Trajectory Prediction)
paper | code

[2] Human Trajectory Prediction via Counterfactual Analysis(Trajectory Prediction)
paper | code

[1] On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
paper

Dataset

[8] From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
paper | code&dataset

[7] LOKI: Long Term and Key Intentions for Trajectory Prediction(Trajectory Prediction)
paper | dataset

[6] Who's Waldo? Linking People Across Text and Images(Oral)
paper | project

[5] Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark(Prohibited Item Detection)
paper

[4] Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision(Landmark Photo Collections)
paper | project

[3] Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
paper | dataset

[2] OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
paper | project

[1] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface(4D Reconstruction)
paper | dataset | video

Uncategorized


SketchLattice: Latticed Representation for Sketch Manipulation
paper

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation(Visual Navigation)
paper

Learning Signed Distance Field for Multi-view Surface Reconstruction(Oral)
paper

BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
paper

Stochastic Scene-Aware Motion Prediction(Motion Synthesis)(Motion Prediction)
paper | project

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach(Autonomous Driving)(Reinforcement Learning)
paper

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
paper | code

Asymmetric Bilateral Motion Estimation for Video Frame Interpolation(Video Frame Interpolation)
paper | code

Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
paper

DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
paper

MT-ORL: Multi-Task Occlusion Relationship Learning
paper | code

ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
paper

Invisible Backdoor Attack with Sample-Specific Triggers(Backdoor Learning)
paper
Explainer: Invisible backdoor attack with sample-specific triggers

SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
paper

Learning to Cut by Watching Movies
paper | project

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction(Oral)
paper | code

Internal Video Inpainting by Implicit Long-range Propagation
paper

CanvasVAE: Learning to Generate Vector Graphic Documents
paper

TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning(Multi-Label Learning)
paper

Out-of-Core Surface Reconstruction via Global TGV Minimization
paper

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting(Crowd Counting)
paper

Spatial Uncertainty-Aware Semi-Supervised Crowd Counting(Crowd Counting)
paper

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework(Oral)(Crowd Counting)
paper | code

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting(Crowd Counting)
paper | code

Self-Conditioned Probabilistic Learning of Video Rescaling(Video Compression)
paper

Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives(Gesture Generation)
paper

Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
paper

Click to Move: Controlling Video Generation with Sparse Motion
paper | code

Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data(Video Translation/Medical/Video Synthesis)
paper

Pathdreamer: A World Model for Indoor Navigation(Visual Navigation)
paper

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
paper | code | project

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
paper | project

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
paper | code

2. ICCV2021 Oral (continuously updated)

[34] Learning Signed Distance Field for Multi-view Surface Reconstruction(Oral)
paper

[33] PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility(Oral)
paper | code

[32] Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation(Oral)
paper | code

[31] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers(Point Cloud Completion)(Oral)
paper | code

[30] DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders(Oral)
paper | code

[29] A Unified Objective for Novel Class Discovery(Oral)
paper | code

[28] Multi-Anchor Active Domain Adaptation for Semantic Segmentation(Oral)
paper

[27] Who's Waldo? Linking People Across Text and Images(Oral)
paper | project

[26] LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation(Oral)
paper

[25] Zero-Shot Domain Adaptation with a Physics Prior(Oral)
paper | code

[24] An Empirical Study of Training Self-Supervised Vision Transformers(Oral)
paper
Explainer: Solving training instability: new work from Kaiming He's team! Self-supervised learning + Transformer = MoCoV3

[23] Paint Transformer: Feed Forward Neural Painting with Stroke Prediction(Oral)
paper | code

[22] (Just) A Spoonful of Refinements Helps the Registration Error Go Down(Oral)
paper

[21] ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models(Oral)
paper

[20] ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot(Oral)
paper | code

[19] An Intermediate Domain Module for Domain Adaptive Person Re-ID(Oral)
paper | code

[18] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper

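[17] Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework(Oral)(Crowd Counting)
paper | code

[16] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code
Explainer: No hyperparameter tuning, significant accuracy gains! RS Loss, a new loss function for detection and segmentation tasks, is open-sourced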

[15] Transporting Causal Mechanisms for Unsupervised Domain Adaptation
paper

[14] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
[paper](https://arxiv.org/abs/2107.11264)

[13] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code

[12] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code

[11] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage

[10] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code

[9] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project

[8] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project

[7] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code

[6] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper

[5] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
Explainer: No detector needed for feature extraction! LeCun's team proposes MDETR: truly end-to-end multi-modal reasoning

[4] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction

[3] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code

[2] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project

[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code

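3. ICCV2021 Paper Explainer Roundup (continuously updated)

[16] An Empirical Study of Training Self-Supervised Vision Transformers(Oral)
paper
Explainer: Solving training instability: new work from Kaiming He's team! Self-supervised learning + Transformer = MoCoV3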

[15] Bias Loss for Mobile Neural Networks
paper
Explainer: Surpassing MobileNet V3 | A detailed look at SkipNet + Bias Loss = a new milestone for lightweight models

[14] Rethinking and Improving Relative Position Encoding for Vision Transformer
paper | code
Explainer: Relative position encoding in Vision Transformer

[13] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
paper
Explainer: A spatial-temporal context Transformer for video scene graph generation

[12] LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference
paper | code
Explainer: Facebook proposes LeViT: 0.077 ms per-image processing speed with ResNet50-level accuracy

[11] Progressive Correspondence Pruning by Consensus Learning
paper | code | project
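Explainer: CLNet: progressive correspondence pruning based on consensus learning

[10] Invisible Backdoor Attack with Sample-Specific Triggers(Backdoor Learning)
paper
Explainer: Invisible backdoor attack with sample-specific triggers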

[9] GraphFPN: Graph Feature Pyramid Network for Object Detection
paper
Explainer: Fudan & HKU propose GraphFPN: boosting object detection performance with a graph feature pyramid!

[8] Learning Spatio-Temporal Transformer for Visual Tracking
paper | code
Explainer: Topping the object tracking leaderboards! Dalian University of Technology and MSRA propose STARK: a Transformer-based object tracker

[7] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
paper | code
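Explainer: ResNet comprehensively surpassed, and by a Transformer: Yitu Technology open-sources the scalable T2T-ViT, whose lightweight version outperforms MobileNet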

[6] Sketch Your Own GAN
paper | code | project
Explainer: Create a GAN model from a single sketch, easy even for beginners: new research from Jun-Yan Zhu's team accepted to ICCV 2021

[5] DetCo: Unsupervised Contrastive Learning for Object Detection
paper | code
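Explainer: Outperforming MoCo v2 from Kaiming He's team, DetCo: contrastive learning tailored to object detection

[4] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
Explainer: No detector needed for feature extraction! LeCun's team proposes MDETR: truly end-to-end multi-modal reasoning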

[3] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper
Explainer: MixMo, performance "for free": a new data augmentation or model ensembling method

[2] TransReID: Transformer-based Object Re-Identification
paper | code
Explainer: A crushing advantage from Transformers: leading across ReID tasks, Alibaba & Zhejiang University propose TransReID

[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction

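ICCV2021 Latest Information and Paper Download Post (Papers/Codes/Project/PaperReading/Demos/Live Streams/Paper Sharing Sessions, etc.)

Contents

1. ICCV2021 Accepted Papers/Code by Topic (continuously updated)

Categories: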

Detection

2D Object Detection

3D Object Detection

Video Object Detection

Human-Object Interaction (HOI) Detection

Saliency Object Detection

Camouflaged Object Detection

Image Anomaly Detection/Surface Defect Detection

Edge Detection

Segmentation

Image Segmentation

Instance Segmentation

Semantic Segmentation

Video Object Segmentation

Referring Image Segmentation

Dense Prediction

Face

Facial Recognition/Detection

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

Face Forgery/Face Anti-Spoofing

3D Vision

Point Cloud

3D Reconstruction

Neural Network Structure Design & Optimization

CNN

Attention

Transformer

Neural Architecture Search (NAS)

Loss Function

Visualization/Interpretability

Model Training/Generalization

Noisy Label

Long-Tailed Distribution

Out of Distribution Detection

Model Compression

Knowledge Distillation

Pruning

Quantization

Image Generation/Image Synthesis

View Synthesis

GAN/Generative/Adversarial

Image Processing

Super Resolution

Image Restoration/Image Enhancement

Image Shadow Removal/Image Reflection Removal

Image Denoising/Deblurring/Deraining/Dehazing

Image Edit/Image Inpainting

Style Transfer

Image Quality Assessment

Neural Video Delivery

Estimation

Human Pose Estimation

Flow/Pose/Motion Estimation

Depth Estimation

Image&Video Retrieval/Video Understanding

Action/Activity Recognition/Detection/Segmentation

Person Re-Identification/Detection

Image/Video Caption

Visual Localization

Image Matching

3D Vision

Object Tracking

Medical Imaging

Text Detection/Recognition

Remote Sensing Image

Scene Graph

Scene Graph Generation

Scene Graph Prediction

Data Processing