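Important websites
A reinforcement learning website maintained by OpenAI that introduces the classic algorithms and provides accompanying code.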
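Textbooks
"Reinforcement Learning: An Introduction"
The classic must-read introduction to reinforcement learning, written by Sutton; most later learning materials can be traced back to this book.
Introductory blogs
Video courses
Prof. Hung-yi Lee, "Reinforcement Learning"
Prof. Bolei Zhou, "Reinforcement Learning"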
Classic basic algorithms
Q-Learning | DQN | Policy Gradient | REINFORCE | Actor-Critic | Soft Q-Learning
TRPO | PPO | GAE | TD3
Single-agent reinforcement learning
Intrinsic reward series
RND (ICLR 2019) paper link | Never Give Up (ICLR 2020) paper link
Graph prior series
PKG Net (Intel AI Lab) paper link | NERVENET (ICLR 2018) paper link | SMP (ICML 2020) paper link | AMORPHEUS (ICLR 2021) paper link
Meta RL series
Nav (ICLR 2017) paper link | LSTMA2C (DeepMind 2016) paper link | GradientDescent (NIPS 2016) paper link
Multi-agent reinforcement learning
Mean-field series
Mean-field MARL (ICML 2018) paper link | Multi Type MFMARL (AAMAS 2020) paper link | Partially Observable MFMARL (AAMAS 2021) paper link
StarCraft series
VDN (AAMAS 2018) paper link | QMIX (ICML 2018) paper link | RMC (NIPS 2018) paper link | COMA (AAAI 2018) paper link | QTRAN (ICML 2019) paper link | ROMA (ICML 2020) paper link | WQMIX (NIPS 2020) paper link | LICA (NIPS 2020) paper link | VMIX (AAAI 2021) paper link | DOP (ICLR 2021) paper link | QPLEX (ICLR 2021) paper link | RODE (ICLR 2021) paper link | Qatten paper link
Communication series
CommNet (NIPS 2016) paper link | DIAL (NIPS 2016) paper link | IC3Net (ICLR 2019) paper link | TarMAC (ICML 2019) paper link | MAAC (ICML 2019) paper link | ATOC (NIPS 2019) paper link | SchedNet (ICLR 2019) paper link | GA-Comm (AAAI 2020) paper link | NeurComm (ICLR 2020) paper link | meets Natural Language (ACL 2020) paper link | Pragmatic (NIPS 2020) paper link | Dynamic population-based meta-learning (preprint) paper link
Graph series
DGN (ICLR 2020) paper link | HAMA (AAAI 2020) paper link | G2ANet (AAAI 2020) paper link | Flowcomm (AAMAS 2021) paper link | MAGIC (AAMAS 2021) paper link | MAGnet paper link | Transfer (AAMAS 2020) paper link
Grouping series
LSC (preprint) paper link | SePS (ICML 2021) paper link
Baselines series
IQL (ICML 1993) paper link | IA2C (ICML 2016) paper link | MADDPG (NIPS 2017) paper link | MAA2C (ICML 2019) paper link | MAPPO (preprint) paper link | IPPO (preprint) paper link
Survey series
Benchmarking in Cooperative Tasks link
Behavioral Diversity series
FCP (NeurIPS 2021) paper link | Investigating Partner Diversification Methods in Cooperative MARL (ICONIP 2020) paper link | Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination (under review at ICLR 2022) paper link | SOV and SP in Mixed-Motive RL (AAMAS 2020) paper link | TrajeDi (AAMAS 2021) paper link | Learning to Cooperate with Unseen Agent via Meta-Reinforcement Learning (ArXiv 2021) paper link
Zero-sum series
Nash-VI (ICML 2021) paper link | VI-ULCB (ICML 2021) paper link | Near-Optimal Reinforcement Learning with Self-Play (NIPS 2020) paper link
General-sum series
CE-V-Learning (preprint) paper link | V-learning OMD (preprint) paper link | When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently (preprint) paper link | V-learning SGD (preprint) paper link
Competitive RL series
Independent Policy Gradient Methods for Competitive Reinforcement Learning (NIPS 2020) paper link
Coordination Graphs series
Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs (RoboCup 2005) paper link | DICG (AAMAS 2021) paper link | DCG (ICML 2020) paper link
Causal reinforcement learning
Generalised Policy Learning series
Transfer learning in multi-armed bandits: A causal approach (IJCAI 2017) paper link
Interventions - When and Where series
Structural causal bandits: Where to intervene? (NeurIPS 2018) paper link
Counterfactual Decision Making series
Counterfactual Data-Fusion for Online Reinforcement Learners (ICML 2017) paper link
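Offline reinforcement learning
Model-free series
CQL (NIPS 2020) paper link | BCQ (ICML 2019) paper link | PLAS (NIPS 2020) paper link | CRR (NIPS 2020) paper link | PLOFF paper link | OPAL (ICLR 2021) paper link
Model-based series
MOPO (NIPS 2020) paper link | COMBO (preprint) paper link | RepBM (ICLR 2021) paper link | DeepAveragers paper link | GrBAL (ICLR 2019) paper link | MBPO (NIPS 2019) paper link
Benchmark
RLUnplugged (NIPS 2020) paper link | NeoRL (preprint) paper link | D4RL (preprint) paper link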
Zero-shot learning
Without Any Labels series
CURL (preprint) paper link | DrQ (CoRL 2021) paper link | DBC (preprint) paper link | SECANT (ICML 2021) paper link
With Labels Only in Training Set series
AugWM (ICML 2021) paper link | PAD (preprint) paper link
With Labels in Both Training and Testing Sets series
Morphological HRL (IWSLT 2019) paper link
Knowledge distillation
Distilling the Knowledge in a Neural Network (2015, preprint, the earliest work) paper link | Reinforced Multi-Teacher Selection for Knowledge Distillation (AAAI 2021) paper link
Contrastive learning
Survey (2020) paper link | CURL (ICML 2020) paper link | Fair Contrastive Learning for Facial Attribute Classification (CVPR 2022) paper link | Robust Contrastive Learning Using Negative Samples with Diminished Semantics (NeurIPS 2021) paper link | CLINE (ACL 2021) paper link | Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning paper link | Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation (AAAI 2022) paper link | Divide and Contrast: Self-Supervised Learning From Uncurated Data (ICCV 2021) paper link
Graph neural networks
Streaming Graph Neural Networks (2020) paper link | Inductive Matrix Completion Using Graph Autoencoder (2021) paper link