ICCV2021 Latest News and Paper Downloads (Papers/Codes/Project/PaperReading/Demos/Livestreams/Paper-Sharing Sessions, etc.)

Official website: http://iccv2021.thecvf.com
Dates: October 11-17, 2021
Paper acceptance announcement: July 22, 2021

Related questions:



Contents

1. ICCV2021 accepted papers/code, organized by topic (updating)
2. ICCV2021 Oral (updating)
3. ICCV2021 paper explainer roundup (updating)



Topic index:



[14] FOVEA: Foveated Image Magnification for Autonomous Navigation
paper | project

[13] DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
paper

[12] G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
paper

[11] Vector-Decomposed Disentanglement for Domain-Invariant Object Detection
paper

[10] Oriented R-CNN for Object Detection
paper | code

[9] Conditional DETR for Fast Training Convergence
paper | code

[8] Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters
paper | code

[7] GraphFPN: Graph Feature Pyramid Network for Object Detection
paper
Explainer: Fudan & HKU propose GraphFPN: boosting object detection performance with a graph feature pyramid!

[6] SimROD: A Simple Adaptation Method for Robust Object Detection
paper

[5] Active Learning for Deep Object Detection via Probabilistic Modeling
paper

[4] Detecting Invisible People
paper | project | video

[3] Conditional Variational Capsule Network for Open Set Recognition
paper | code

[2] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
Explainer: No detector needed for feature extraction! LeCun's team proposes MDETR: truly end-to-end multi-modal reasoning

[1] DetCo: Unsupervised Contrastive Learning for Object Detection
paper | code
Explainer: Outperforming MoCo v2 from Kaiming He's team, DetCo: contrastive learning tailored to object detection

[6] LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
paper

[5] RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
paper

[4] Is Pseudo-Lidar needed for Monocular 3D Object detection?
paper

[3] Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather
paper | code

[2] Geometry Uncertainty Projection Network for Monocular 3D Object Detection
paper

[1] Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency
paper

[1] Social Fabric: Tubelet Compositions for Video Relation Detection
paper | code

[1] Exploiting Scene Graphs for Human-Object Interaction Detection
paper | code

[2] Specificity-preserving RGB-D Saliency Detection
paper | code

[1] Disentangled High Quality Salient Object Detection
paper

[1] TransForensics: Image Forgery Localization with Dense Self-Attention
paper

[2] DRÆM -- A discriminatively trained reconstruction embedding for surface anomaly detection
paper

[1] Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection
paper

[2] Pixel Difference Networks for Efficient Edge Detection
paper | code

[1] RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth
paper


[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project

[1] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code

[6] A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation
paper

[5] Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
paper | code

[4] SOTR: Segmenting Objects with Transformers
paper | code

[3] Hierarchical Aggregation for 3D Instance Segmentation
paper | code

[2] Crossover Learning for Fast Online Video Instance Segmentation
code

[1] Instances as Queries
paper | code

[21] Mining Contextual Information Beyond Image for Semantic Segmentation
paper

[20] Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation
paper | project

[19] Self-Regulation for Semantic Segmentation
paper

[18] Multi-Anchor Active Domain Adaptation for Semantic Segmentation(Oral)
paper

[17] Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
paper

[16] Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
paper

[15] LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation(Oral)
paper

[14] Dual Path Learning for Domain Adaptation of Semantic Segmentation
paper | code

[13] Deep Metric Learning for Open World Semantic Segmentation
paper

[12] Complementary Patch for Weakly Supervised Semantic Segmentation
paper

[11] RECALL: Replay-based Continual Learning in Semantic Segmentation
paper

[10] Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
paper | code

[9] Learning Meta-class Memory for Few-Shot Semantic Segmentation
paper

[8] Personalized Image Semantic Segmentation
paper

[7] VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
paper | code

[6] Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
paper

[5] ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation(point cloud semantic segmentation)
paper

[4] Domain Adaptive Video Segmentation via Temporal Consistency Regularization(video semantic segmentation)
paper | code

[3] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
paper

[2] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code

[1] Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
paper | code

[2] Joint Inductive and Transductive Learning for Video Object Segmentation
paper | code

[1] Full-Duplex Strategy for Video Object Segmentation
paper | project

[1] Vision-Language Transformer and Query Generation for Referring Segmentation
paper | code

[1] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction
paper | code


[1] Learning Facial Representations from the Cycle-consistency of Face
paper

[4] TransFER: Learning Relation-aware Facial Expression Representations with Transformers
paper

[3] Understanding and Mitigating Annotation Bias in Facial Expression Recognition
paper

[2] SynFace: Face Recognition with Synthetic Data
paper

[1] PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition
paper

[5] FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
paper

[4] Disentangled Lifespan Face Synthesis
paper | code

[3] MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement(audio-driven facial animation)
paper | video

[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code

[1] HeadGAN: One-shot Neural Head Synthesis and Editing
paper

[1] Exploring Temporal Coherence for More General Video Face Forgery Detection
paper


[3] Differentiable Surface Rendering via Non-Differentiable Sampling
paper

[2] M3D-VTON: A Monocular-to-3D Virtual Try-On Network(3D try-on)
paper

[1] Score-Based Point Cloud Denoising
paper

[14] A Robust Loss for Point Cloud Registration
paper

[13] CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing
paper | project

[12] Voxel-based Network for Shape Completion by Leveraging Edge Generation
paper | code

[11] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers(point cloud completion)(Oral)
paper | code

[10] ME-PCN: Point Completion Conditioned on Mask Emptiness(point cloud completion)
paper

[9] Adaptive Graph Convolution for Point Cloud Analysis
paper | code

[8] PICCOLO: Point Cloud-Centric Omnidirectional Localization
paper

[7] AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds
paper | code

[6] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer
paper | code

[5] DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
paper

[4] Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
paper | code

[3] (Just) A Spoonful of Refinements Helps the Registration Error Go Down(Oral)
paper

[2] Learning with Noisy Labels for Robust Point Cloud Segmentation(point cloud segmentation)
paper | code

[1] HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
paper | project

[12] Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
paper

[11] Gravity-Aware Monocular 3D Human-Object Reconstruction
paper | code

[10] 3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces
paper | code

[9] VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
paper

[8] Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
paper

[7] Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds of Large Scenes with Learned Virtual View Visibility
paper | code

[6] Deep Hybrid Self-Prior for Full 3D Mesh Generation
paper | project

[5] PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion
paper

[4] Learning Canonical 3D Object Representation for Fine-Grained Recognition
paper

[3] ELLIPSDF: Joint Object Pose and Shape Optimization with a Bi-level Ellipsoid and Signed Distance Function Description
paper

[2] Discovering 3D Parts from Image Collections
paper | project

[1] PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
paper | code


[2] Unifying Nonlocal Blocks for Neural Networks
paper

[1] Energy-Based Open-World Uncertainty Modeling for Confidence Calibration(confidence calibration)
paper

[3] MicroNet: Improving Image Recognition with Extremely Low FLOPs
paper | code1 | code2

[2] Learning to Resize Images for Computer Vision Tasks
paper

[1] Bias Loss for Mobile Neural Networks
paper
Explainer: Surpassing MobileNet V3 | A detailed look at SkipNet + Bias Loss = a new milestone for lightweight models

[6] Causal Attention for Unbiased Visual Recognition
paper | code

[5] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification(causal inference)(fine-grained recognition)
paper | code

[4] Residual Attention: A Simple but Effective Method for Multi-Label Recognition
paper

[3] Fast Convergence of DETR with Spatially Modulated Co-Attention
paper | code

[2] SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition
paper | code

[1] FcaNet: Frequency Channel Attention Networks
paper | code

[10] An Empirical Study of Training Self-Supervised Vision Transformers(Oral)
paper
Explainer: Tackling training instability, new work from Kaiming He's team! Self-supervised learning + Transformer = MoCoV3

[9] LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference
paper | code
Explainer: Facebook proposes LeViT: a per-image processing time of 0.077 ms with ResNet50-level accuracy

[8] Emerging Properties in Self-Supervised Vision Transformers
paper | code
Explainer: When Transformers meet self-supervised learning! Facebook open-sources DINO

[7] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
paper | code
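Explainer: ResNet has been surpassed across the board, and a Transformer did it: Yitu Technology open-sources T2T-ViT, which scales up or down and whose lightweight version beats MobileNet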

[6] Vision Transformer with Progressive Sampling
paper | code

[5] Rethinking and Improving Relative Position Encoding for Vision Transformer
paper | code
Explainer: Relative position encoding in Vision Transformers

[4] AutoFormer: Searching Transformers for Visual Recognition
paper | code

[3] Rethinking Spatial Dimensions of Vision Transformers
paper | code

[2] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code

[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction

[4] Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
paper | code

[3] BN-NAS: Neural Architecture Search with Batch Normalization
paper

[2] NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
paper

[1] AutoFormer: Searching Transformers for Visual Recognition
paper | code

[3] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code
Explainer: No hyperparameter tuning, significant gains! RS Loss, a new loss function for detection and segmentation, is open-sourced

[2] Focal Frequency Loss for Image Reconstruction and Synthesis
paper | code

[1] Orthogonal Projection Loss
paper | code

[1] Finding Representative Interpretations on Convolutional Neural Networks
paper


[3] MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach(multi-task learning)
paper

[2] Impact of Aliasing on Generalization in Deep Convolutional Networks
paper

[1] Learning Compatible Embeddings
paper | code

[2] NGC: A Unified Framework for Learning with Open-World Noisy Data
paper

[1] Learning with Noisy Labels via Sparse Regularization
paper | code

[2] Learning of Visual Relations: The Devil is in the Tails
paper

[1] ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot(Oral)
paper | code

[4] Semantically Coherent Out-of-Distribution Detection
paper | project

[3] NGC: A Unified Framework for Learning with Open-World Noisy Data
paper

[2] Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning
paper

[1] CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
paper


[5] Multi-Task Self-Training for Learning General Representations(multi-task learning)
paper

[4] G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation
paper

[3] Online Multi-Granularity Distillation for GAN Compression
paper | code

[2] Distilling Holistic Knowledge with Graph Neural Networks
paper | code

[1] AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
paper | code

[2] Distance-aware Quantization
paper

[1] Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
paper | code


[9] Image Inpainting via Conditional Texture and Structure Dual Generation
paper | code

[8] Dual Projection Generative Adversarial Networks for Conditional Image Generation
paper

[7] Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates(gesture generation)
paper | code

[6] Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation
paper | code

[5] ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models(Oral)
paper

[4] Toward Spatially Unbiased Generative Models
paper

[3] A Light Stage on Every Desk
paper | project

[2] Handwriting Transformers
paper

[1] On Generating Transferable Targeted Perturbations
paper | code

[1] PixelSynth: Generating a 3D-Consistent Experience from a Single Image
paper | project


[15] Towards Vivid and Diverse Image Colorization with Generative Color Prior(image colorization)
paper

[14] Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
paper

[13] Unsupervised Geodesic-preserved Generative Adversarial Networks for Unconstrained 3D Pose Transfer
paper | code

[12] Online Multi-Granularity Distillation for GAN Compression
paper | code

[11] AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
paper | code

[10] Meta Gradient Adversarial Attack
paper

[9] Sketch Your Own GAN
paper | code | project
Explainer: Create a GAN model from a single sketch, easy even for beginners; new research from Jun-Yan Zhu's team accepted to ICCV 2021

[8] Feature Importance-aware Transferable Adversarial Attacks
paper | code

[7] From Continuity to Editability: Inverting GANs with Consecutive Images
paper | code

[6] Learnable Boundary Guided Adversarial Training
paper | code

[5] Transporting Causal Mechanisms for Unsupervised Domain Adaptation(Oral)
paper

[4] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage | video

[3] HeadGAN: One-shot Neural Head Synthesis and Editing
paper

[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper | code | project

[1] EigenGAN: Layer-Wise Eigen-Learning for GANs
paper | code


[4] Towards Vivid and Diverse Image Colorization with Generative Color Prior(image colorization)
paper

[3] Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling
paper | code

[2] Accelerating Atmospheric Turbulence Simulation via Learned Phase-to-Space Transform
paper

[1] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper

[2] Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution
paper | code

[1] Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
paper | code

[2] Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables
paper

[1] Spatially-Adaptive Image Restoration using Distortion-Guided Networks
paper | code

[1] CANet: A Context-Aware Network for Shadow Removal
paper

[1] Rethinking Coarse-to-Fine Approach in Single Image Deblurring
paper | code

[2] GAN Inversion for Out-of-Range Images with Geometric Transformations
paper | code

[1] Occlusion-Aware Video Object Inpainting(video inpainting)
paper

[5] SSH: A Self-Supervised Framework for Image Harmonization(image harmonization)
paper | code

[4] Domain-Aware Universal Style Transfer
paper

[3] AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
paper | code1 | code2

[2] ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity(style transfer)
paper

[1] Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts(font generation)
paper | code

[1] MUSIQ: Multi-scale Image Quality Transformer
paper

[1] Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation
paper | code


[10] Probabilistic Modeling for Human Mesh Recovery
paper | code

[9] DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders(Oral)
paper | code

[8] Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation
paper | code

[7] EventHPE: Event-based 3D Human Pose and Shape Estimation
paper

[6] HandFoldingNet: A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton
paper | code

[5] Online Knowledge Distillation for Efficient Pose Estimation
paper

[4] Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows
paper

[3] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code

[2] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project

[1] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project

[1] SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
paper

[7] PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility(Oral)
paper | code

[6] Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation(Oral)
paper | code

[5] StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation
paper | code

[4] Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation
paper

[3] Towards Interpretable Deep Networks for Monocular Depth Estimation
paper | code

[2] Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark
paper

[1] MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
paper


[7] Cross-category Video Highlight Detection via Set-based Learning(video highlight detection)
paper | code

[6] Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains
paper

[5] ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
paper

[4] Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
paper | code

[3] DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
paper

[2] Hand Image Understanding via Deep Multi-Task Learning(hand image understanding)
paper

[1] Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
paper

[8] Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
paper | code

[7] Group-aware Contrastive Regression for Action Quality Assessment(action quality assessment)
paper

[6] Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization(action localization)
paper | code

[5] Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization(action localization)
paper | code

[4] Elaborative Rehearsal for Zero-shot Action Recognition
paper | code

[3] Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
paper

[2] Enriching Local and Global Contexts for Temporal Action Localization
paper

[1] Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
paper | code

[7] Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency
paper

[6] Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences
paper

[5] Towards Discriminative Representation Learning for Unsupervised Person Re-identification
paper

[4] Learning Instance-level Spatial-Temporal Patterns for Person Re-identification
paper | Cleaned database

[3] An Intermediate Domain Module for Domain Adaptive Person Re-ID(Oral)
paper | code

[2] Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
paper

[1] TransReID: Transformer-based Object Re-Identification
paper | code
Explainer: A decisive blow from Transformers: leading across all ReID tasks, Alibaba & Zhejiang University propose TransReID

[1] End-to-End Dense Video Captioning with Parallel Decoding
paper | code


[5] Few-shot Visual Relationship Co-localization
paper

[4] PICCOLO: Point Cloud-Centric Omnidirectional Localization
paper

[3] Normalization Matters in Weakly Supervised Object Localization
paper

[2] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
paper | code

[1] Boundary-sensitive Pre-training for Temporal Localization in Videos
paper

[6] Learning to Match Features with Seeded Graph Matching Network
paper | code

[5] Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
paper | code

[4] Progressive Correspondence Pruning by Consensus Learning
paper | code | project
Explainer: CLNet: progressive correspondence pruning based on consensus learning

[3] Multi-scale Matching Networks for Semantic Correspondence
paper

[2] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code

[1] COTR: Correspondence Transformer for Matching Across Images
paper


[2] Unsupervised Dense Deformation Embedding Network for Template-Free Shape Correspondence
paper

[1] MVTN: Multi-View Transformation Network for 3D Shape Recognition
paper


[9] MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
paper

[8] Learning Spatio-Temporal Transformer for Visual Tracking
paper | code
Explainer: Topping the object tracking leaderboards! Dalian University of Technology and MSRA propose STARK: a Transformer-based object tracker

[7] Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds
paper

[6] Video Annotation for Visual Tracking via Selection and Refinement
paper

[5] Saliency-Associated Object Tracking
paper

[4] Learn to Match: Automatic Matching Network Design for Visual Tracking
paper | code

[3] HiFT: Hierarchical Feature Transformer for Aerial Tracking
paper | code

[2] Learning to Adversarially Blur Visual Object Tracking
paper | code

[1] Detecting Invisible People
paper | project | video


[2] Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
paper

[1] Generative Adversarial Registration for Improved Conditional Deformable Templates
paper | code | homepage


[5] From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
paper | code&dataset

[4] Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection
paper

[3] Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
paper

[2] Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation
paper

[1] Towards the Unseen: Iterative Text Recognition by Distilling from Errors
paper


[4] Structured Outdoor Architecture Reconstruction by Exploration and Classification
paper

[3] Change is Everywhere: Single-Temporal Supervised Object Change Detection for High Spatial Resolution Remote Sensing Imagery(change detection)
paper | code

[2] Geography-Aware Self-Supervised Learning
paper

[1] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data(transfer learning)
paper | code


[5] Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs
paper

[4] Target Adaptive Context Aggregation for Video Scene Graph Generation
paper | code

[3] Unconditional Scene Graph Generation
paper

[2] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
paper
Explainer: A spatio-temporal context Transformer for video scene graph generation

[1] Unconstrained Scene Generation with Locally Conditioned Radiance Fields
paper

[1] Generative Compositional Augmentations for Scene Graph Prediction
paper | code


[3] BiaSwap: Removing dataset bias with bias-tailored swapping augmentation
paper

[2] Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain
paper | code

[1] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper
Explainer: MixMo, performance gains for free: a new data augmentation or model ensembling method

[3] A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction
paper | code

[2] Weakly Supervised Temporal Anomaly Segmentation with Dynamic Time Warping
paper

[1] Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
paper | code

[4] Self-Supervised Visual Representations Learning by Contrastive Mask Prediction
paper

[3] Collaborative Unsupervised Visual Representation Learning from Decentralized Data
paper

[2] Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
paper

[1] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project

[1] Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform
paper | code

[5] A Unified Objective for Novel Class Discovery(Oral)
paper | code

[4] Instance Similarity Learning for Unsupervised Feature Representation
paper | code

[3] Graph Constrained Data Representation Learning for Human Motion Segmentation(human motion segmentation)
paper

[2] Improve Unsupervised Pretraining for Few-label Transfer
paper

[1] Clustering by Maximizing Mutual Information Across Views
paper


[6] Binocular Mutual Learning for Improving Few-shot Classification
paper

[5] Field-Guide-Inspired Zero-Shot Learning
paper

[4] Relational Embedding for Few-Shot Classification
paper

[3] Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder
paper

[2] Transductive Few-Shot Classification on the Oblique Manifold
paper

[1] FREE: Feature Refinement for Generalized Zero-Shot Learning
paper | code


[4] Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process
paper | code

[3] Continual Neural Mapping: Learning An Implicit Scene Representation from Sequential Observations
paper

[2] RECALL: Replay-based Continual Learning in Semantic Segmentation
paper

[1] Few-Shot and Continual Learning with Attentive Independent Mechanisms
paper | code


[16] Learning Cross-modal Contrastive Features for Video Domain Adaptation
paper

[15] Learning to Diversify for Single Domain Generalization
paper

[14] PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation
paper | code

[13] Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation
paper

[12] Semantic Concentration for Domain Adaptation
paper

[11] Dual Path Learning for Domain Adaptation of Semantic Segmentation
paper | code

[10] Zero-Shot Domain Adaptation with a Physics Prior(Oral)
paper | code

[9] BiMaL: Bijective Maximum Likelihood Approach to Domain Adaptation in Semantic Scene Segmentation
paper

[8] Domain Generalization via Gradient Surgery
paper

[7] Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation
paper | code

[6] Adversarial Unsupervised Domain Adaptation with Conditional and Label Shift: Infer, Align and Iterate
paper

[5] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper

[4] Improve Unsupervised Pretraining for Few-label Transfer
paper

[3] Generalized Source-free Domain Adaptation
homepage | code

[2] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data(transfer learning)
paper | code

[1] Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling(transfer learning)
paper


[6] Deep Relational Metric Learning
paper | code

[5] LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning
paper

[4] Towards Interpretable Deep Metric Learning with Structural Matching
paper | code

[3] AGKD-BML: Defense Against Adversarial Attack by Attention Guided Knowledge Distillation and Bi-directional Metric Learning
paper | code

[2] Deep Metric Learning for Open World Semantic Segmentation
paper

[1] Learning with Memory-based Virtual Classes for Deep Metric Learning
paper


[2] Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting
paper | code

[1] Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
paper | code | project


[6] TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
paper

[5] Self-Supervised Video Representation Learning with Meta-Contrastive Network(contrastive learning)(meta-learning)(representation learning)(action recognition)
paper

[4] Improving Contrastive Learning by Visualizing Feature Transformation
paper | visualization tools and codes

[3] Parametric Contrastive Learning
paper | code

[2] Geography-Aware Self-Supervised Learning
paper

[1] CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
paper | code


[2] Semi-Supervised Active Learning with Temporal Output Discrepancy
paper | code

[1] Active Learning for Deep Object Detection via Probabilistic Modeling
paper


[3] Greedy Gradient Ensemble for Robust Visual Question Answering
paper | code

[2] On the hidden treasure of dialog in video question answering
paper

[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code | project


[1] Self-Supervised Video Representation Learning with Meta-Contrastive Network(contrastive learning)(meta-learning)(representation learning)(action recognition)
paper


[1] The Right to Talk: An Audio-Visual Transformer Approach
paper

[1] LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
paper | project


[9] DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets
paper

[8] Generating Smooth Pose Sequences for Diverse Human Motion Prediction
paper | code

[7] MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction(human motion prediction)
paper | code

[6] RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting(motion prediction)
paper | project

[5] SLAMP: Stochastic Latent Appearance and Motion Prediction(motion prediction)
paper

[4] Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction(trajectory prediction)
paper

[3] Personalized Trajectory Prediction via Distribution Discrimination(trajectory prediction)
paper | code

[2] Human Trajectory Prediction via Counterfactual Analysis(trajectory prediction)
paper | code

[1] On Exposing the Challenging Long Tail in Future Prediction of Traffic Actors
paper


[8] From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
paper | code&dataset

[7] LOKI: Long Term and Key Intentions for Trajectory Prediction(trajectory prediction)
paper | dataset

[6] Who's Waldo? Linking People Across Text and Images(Oral)
paper | project

[5] Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark(prohibited item detection)
paper

[4] Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision(landmark photo collections)
paper | project

[3] Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach
paper | dataset

[2] OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild
paper | project

[1] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface(4D reconstruction)
paper | dataset | video


SketchLattice: Latticed Representation for Sketch Manipulation
paper

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation(visual navigation)
paper

Learning Signed Distance Field for Multi-view Surface Reconstruction(Oral)
paper

BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies
paper

Stochastic Scene-Aware Motion Prediction(motion synthesis)(motion prediction)
paper | project

End-to-End Urban Driving by Imitating a Reinforcement Learning Coach(autonomous driving)(reinforcement learning)
paper

Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision
paper | code

Asymmetric Bilateral Motion Estimation for Video Frame Interpolation(video frame interpolation)
paper | code

Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring
paper

DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities
paper

MT-ORL: Multi-Task Occlusion Relationship Learning
paper | code

ProAI: An Efficient Embedded AI Hardware for Automotive Applications - a Benchmark Study
paper

Invisible Backdoor Attack with Sample-Specific Triggers(backdoor learning)
paper
Explainer: Invisible backdoor attacks with sample-specific triggers

SUNet: Symmetric Undistortion Network for Rolling Shutter Correction
paper

Learning to Cut by Watching Movies
paper | project

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction(Oral)
paper | code

Internal Video Inpainting by Implicit Long-range Propagation
paper

CanvasVAE: Learning to Generate Vector Graphic Documents
paper

TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning(multi-label learning)
paper

Out-of-Core Surface Reconstruction via Global TGV Minimization
paper

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting(crowd counting)
paper

Spatial Uncertainty-Aware Semi-Supervised Crowd Counting(crowd counting)
paper

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework(Oral)(crowd counting)
paper | code

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting(crowd counting)
paper | code

Self-Conditioned Probabilistic Learning of Video Rescaling(video compression)
paper

Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives(gesture generation)
paper

Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
paper

Click to Move: Controlling Video Generation with Sparse Motion
paper | code

Long-Term Temporally Consistent Unpaired Video Translation from Simulated Surgical 3D Data(video translation / medical / video synthesis)
paper

Pathdreamer: A World Model for Indoor Navigation(视觉导航)
paper

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
paper | code | project

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
paper | project

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
paper | code



[34] Learning Signed Distance Field for Multi-view Surface Reconstruction(Oral)
paper

[33] PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility(Oral)
paper | code

[32] Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation(Oral)
paper | code

[31] PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers(point cloud completion)(Oral)
paper | code

[30] DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders(Oral)
paper | code

[29] A Unified Objective for Novel Class Discovery(Oral)
paper | code

[28] Multi-Anchor Active Domain Adaptation for Semantic Segmentation(Oral)
paper

[27] Who's Waldo? Linking People Across Text and Images(Oral)
paper | project

[26] LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation(Oral)
paper

[25] Zero-Shot Domain Adaptation with a Physics Prior(Oral)
paper | code

[24] An Empirical Study of Training Self-Supervised Vision Transformers(Oral)
paper
Explainer: Tackling training instability: new work from Kaiming He's team. Self-supervised learning + Transformer = MoCoV3

[23] Paint Transformer: Feed Forward Neural Painting with Stroke Prediction(Oral)
paper | code

[22] (Just) A Spoonful of Refinements Helps the Registration Error Go Down(Oral)
paper

[21] ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models(Oral)
paper

[20] ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot(Oral)
paper | code

[19] An Intermediate Domain Module for Domain Adaptive Person Re-ID(Oral)
paper | code

[18] Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation(Oral)
paper

[17] Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework(Oral)(crowd counting)
paper | code

[16] Rank & Sort Loss for Object Detection and Instance Segmentation(Oral)
paper | code
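Explainer: No hyperparameter tuning, notable accuracy gains: RS Loss, a new loss function for detection and segmentation, is open-sourced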

[15] Transporting Causal Mechanisms for Unsupervised Domain Adaptation
paper

[14] Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation(Oral)
[paper](https://arxiv.org/abs/2107.11264)

[13] Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation(Oral)
paper | code

[12] Human Pose Regression with Residual Log-likelihood Estimation(Oral)
paper | code

[11] Robustness via Cross-Domain Ensembles(Oral)
paper | code | model | homepage

[10] Warp Consistency for Unsupervised Learning of Dense Correspondences(Oral)
paper | code

[9] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop(Oral)
paper | code | project

[8] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper | video | project

[7] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper | code

[6] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper

[5] MDETR : Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
Explainer: No detector needed for feature extraction: LeCun's team proposes MDETR for truly end-to-end multimodal reasoning

[4] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction

[3] Mining Latent Classes for Few-shot Segmentation(Oral)
paper | code

[2] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper | project

[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper | code



[16] An Empirical Study of Training Self-Supervised Vision Transformers(Oral)
paper
Explainer: Tackling training instability: new work from Kaiming He's team. Self-supervised learning + Transformer = MoCoV3

[15] Bias Loss for Mobile Neural Networks
paper
Explainer: Surpassing MobileNet V3 | SkipNet + Bias Loss in detail: a new milestone for lightweight models

[14] Rethinking and Improving Relative Position Encoding for Vision Transformer
paper | code
Explainer: Relative position encoding in Vision Transformer

[13] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
paper
Explainer: A spatial-temporal context Transformer for video scene graph generation

[12] LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference
paper | code
Explainer: Facebook proposes LeViT: 0.077 ms per image with ResNet50-level accuracy

[11] Progressive Correspondence Pruning by Consensus Learning
paper | code | project
Explainer: CLNet: progressive correspondence pruning based on consensus learning

[10] Invisible Backdoor Attack with Sample-Specific Triggers(backdoor learning)
paper
Explainer: Invisible backdoor attacks with sample-specific triggers

[9] GraphFPN: Graph Feature Pyramid Network for Object Detection
paper
Explainer: Fudan and HKU propose GraphFPN: boosting object detection performance with a graph feature pyramid

[8] Learning Spatio-Temporal Transformer for Visual Tracking
paper | code
Explainer: Topping the tracking leaderboards: Dalian University of Technology and MSRA propose STARK, a Transformer-based object tracker

[7] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
paper | code
Explainer: ResNet outperformed across the board by a Transformer: Yitu Technology open-sources the scalable T2T-ViT, whose lightweight version beats MobileNet

[6] Sketch Your Own GAN
paper | code | project
Explainer: Create a GAN model from a single sketch, easy even for beginners: new research from Jun-Yan Zhu's team accepted to ICCV 2021

[5] DetCo: Unsupervised Contrastive Learning for Object Detection
paper | code
Explainer: Outperforming MoCo v2 from Kaiming He's team: DetCo, contrastive learning tailored to object detection

[4] MDETR : Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper | code | project | colab
Explainer: No detector needed for feature extraction: LeCun's team proposes MDETR for truly end-to-end multimodal reasoning

[3] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper
Explainer: MixMo, performance "for free": a new data augmentation or model ensembling method

[2] TransReID: Transformer-based Object Re-Identification
paper | code
Explainer: Transformers dominate ReID: Alibaba and Zhejiang University propose TransReID, leading across ReID tasks

[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper | code
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction