Reinforcement Learning (RL)

Introduction to Reinforcement Learning

Key Websites

Spinning Up

A reinforcement learning website maintained by OpenAI. It introduces the classic algorithms and provides accompanying code.

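Textbooks

"Reinforcement Learning: An Introduction"

Official website | PDF link

The classic must-read introductory book on reinforcement learning, by Sutton and Barto. Most later learning materials can be traced back to it.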

Introductory Blogs

lil-log
1. A (Long) Peek into Reinforcement Learning
2. Policy Gradient Algorithms

Video Courses

Prof. Hung-yi Lee, "Reinforcement Learning"

YouTube

Prof. Bolei Zhou, "Reinforcement Learning"

GitHub | Bilibili

Classic Foundational Algorithms

Q-Learning | DQN | Policy Gradient | REINFORCE | Actor-Critic | Soft Q-Learning

DPG | DDPG

TRPO | PPO | GAE | TD3

Advanced Reinforcement Learning

Single-Agent Reinforcement Learning

Intrinsic reward series

RND (ICLR 2019) paper link | Never Give Up (ICLR 2020) paper link

Graph prior series

PKG Net (Intel AI Lab) paper link | NERVENET (ICLR 2018) paper link | SMP (ICML 2020) paper link | AMORPHEUS (ICLR 2021) paper link

Meta-RL series

Nav (ICLR 2017) paper link | LSTMA2C (DeepMind 2016) paper link | GradientDescent (NIPS 2016) paper link

Multi-Agent Reinforcement Learning

Mean-field series

Mean-field MARL (ICML 2018) paper link | Multi Type MFMARL (AAMAS 2020) paper link | Partially Observable MFMARL (AAMAS 2021) paper link

StarCraft series

Communication series

Graph series

DGN (ICLR 2020) paper link | HAMA (AAAI 2020) paper link | G2ANet (AAAI 2020) paper link | Flowcomm (AAMAS 2021) paper link | MAGIC (AAMAS 2021) paper link | MAGnet paper link | Transfer (AAMAS 2020) paper link

Grouping series

LSC (not accepted) paper link | SePS (ICML 2021) paper link

Baselines series

IQL (ICML 1993) paper link | IA2C (ICML 2016) paper link | MADDPG (NIPS 2017) paper link | MAA2C (ICML 2019) paper link | MAPPO (not accepted) paper link | IPPO (not accepted) paper link

Survey series

Benchmarking in Cooperative Tasks link

Behavioral Diversity series

FCP (NeurIPS 2021) paper link | Investigating Partner Diversification Methods in Cooperative MARL (ICONIP 2020) paper link | Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination (under review at ICLR 2022) paper link | SOV and SP in Mixed-Motive RL (AAMAS 2020) paper link | TrajeDi (AAMAS 2021) paper link | Learning to Cooperate with Unseen Agent via Meta-Reinforcement Learning (arXiv 2021) paper link

Zero-sum series

Nash-VI (ICML 2021) paper link | VI-ULCB (ICML 2021) paper link | Near-Optimal Reinforcement Learning with Self-Play (NIPS 2020) paper link

General-sum series

CE-V-Learning (not accepted) paper link | V-learning OMD (not accepted) paper link | When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently (not accepted) paper link

V-learning SGD (not accepted) paper link

Competitive RL series

Independent Policy Gradient Methods for Competitive Reinforcement Learning (NIPS 2020) paper link

Coordination Graphs series

Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs (RoboCup 2005) paper link | DICG (AAMAS 2021) paper link | DCG (ICML 2020) paper link

Causal Reinforcement Learning

Generalised Policy Learning series

Transfer learning in multi-armed bandits: A causal approach (IJCAI 2017) paper link

Interventions - When and Where series

Structural causal bandits: Where to intervene? (NeurIPS 2018) paper link

Counterfactual Decision Making series

Counterfactual Data-Fusion for Online Reinforcement Learners (ICML 2017) paper link

Offline Reinforcement Learning

Model-free series

CQL (NIPS 2020) paper link | BCQ (ICML 2019) paper link | PLAS (NIPS 2020) paper link | CRR (NIPS 2020) paper link | PLOFF paper link | OPAL (ICLR 2021) paper link

Model-based series

MOPO (NIPS 2020) paper link | COMBO (not accepted) paper link | RepBM (ICLR 2021) paper link | DeepAveragers paper link | GrBAL (ICLR 2019) paper link | MBPO (NIPS 2019) paper link

Benchmark

RLUnplugged (NIPS 2020) paper link | NeoRL (not accepted) paper link | D4RL (not accepted) paper link

Zero-Shot Learning

Without Any Labels series

CURL (ICML 2020) paper link | DrQ (CoRL 2021) paper link | DBC (not accepted) paper link | SECANT (ICML 2021) paper link

With Labels Only in Training Set series

AugWM (ICML 2021) paper link | PAD (not accepted) paper link

With Labels in Both Training and Testing Sets series

Morphological HRL (IWSLT 2019) paper link

Knowledge Distillation

Distilling the Knowledge in a Neural Network (2015, not accepted, the earliest work) paper link | Reinforced Multi-Teacher Selection for Knowledge Distillation (AAAI 2021) paper link

Contrastive Learning

Survey (2020) paper link | CURL (ICML 2020) paper link | Fair Contrastive Learning for Facial Attribute Classification (CVPR 2022) paper link | Robust Contrastive Learning Using Negative Samples with Diminished Semantics (NeurIPS 2021) paper link | CLINE (ACL 2021) paper link | Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning paper link | Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation (AAAI 2022) paper link | Divide and Contrast: Self-Supervised Learning From Uncurated Data (ICCV 2021) paper link

Graph Neural Networks

Streaming Graph Neural Networks (2020) paper link | Inductive Matrix Completion Using Graph Autoencoder (2021) paper link
