diff --git a/jose.00306/10.21105.jose.00306.crossref.xml b/jose.00306/10.21105.jose.00306.crossref.xml
new file mode 100644
index 0000000..2156a01
--- /dev/null
+++ b/jose.00306/10.21105.jose.00306.crossref.xml
@@ -0,0 +1,357 @@
+
+
+
+ 20251029140344-e3dfa4e41078c0a8e498df4eba0998a1024e19e9
+ 20251029140344
+
+ JOSS Admin
+ admin@theoj.org
+
+ The Open Journal
+
+
+
+
+ Journal of Open Source Education
+ JOSE
+ 2577-3569
+
+ 10.21105/jose
+ https://jose.theoj.org
+
+
+
+
+ 10
+ 2025
+
+
+ 8
+
+ 92
+
+
+
+ Reinforcement Learning: A Comprehensive Open-Source Course
+
+
+
+ Ali Hassan Ali
+ Abdelwanis
+
+ Department of Interconnected Automation Systems, University of Siegen, Germany
+
+ https://orcid.org/0009-0001-5853-5900
+
+
+ Barnabas
+ Haucke-Korber
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0000-0003-0862-2069
+
+
+ Darius
+ Jakobeit
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0009-0002-1576-2465
+
+
+ Wilhelm
+ Kirchgässner
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0000-0001-9490-1843
+
+
+ Marvin
+ Meyer
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0009-0008-2879-7118
+
+
+ Maximilian
+ Schenke
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0000-0001-5427-9527
+
+
+ Hendrik
+ Vater
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0009-0005-0654-8741
+
+
+ Oliver
+ Wallscheid
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0000-0001-9362-8777
+
+
+ Daniel
+ Weber
+
+ Department of Power Electronics and Electrical Drives, Paderborn University, Germany
+
+ https://orcid.org/0000-0003-3367-5998
+
+
+
+ 10
+ 29
+ 2025
+
+
+ 306
+
+
+ 10.21105/jose.00306
+
+
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+ http://creativecommons.org/licenses/by/4.0/
+
+
+
+ Software archive
+ 10.5281/zenodo.17347442
+
+
+ GitHub review issue
+ https://github.com/openjournals/jose-reviews/issues/306
+
+
+
+ 10.21105/jose.00306
+ https://jose.theoj.org/papers/10.21105/jose.00306
+
+
+ https://jose.theoj.org/papers/10.21105/jose.00306.pdf
+
+
+
+
+
+ Reinforcement learning: An introduction
+ Sutton
+ IEEE Transactions on Neural Networks
+ 16
+ 2005
+ Sutton, R. S., & Barto, A. G. (2005). Reinforcement learning: An introduction. IEEE Transactions on Neural Networks, 16, 285–286. https://api.semanticscholar.org/CorpusID:9166388
+
+
+ Lectures on reinforcement learning
+ Silver
+ 2015
+ Silver, D. (2015). Lectures on reinforcement learning. url: https://www.davidsilver.uk/teaching/.
+
+
+ The Hugging Face deep reinforcement learning class
+ Simonini
+ GitHub repository
+ 2023
+ Simonini, T., & Sanseviero, O. (2023). The Hugging Face deep reinforcement learning class. In GitHub repository. https://github.com/huggingface/deep-rl-class; GitHub.
+
+
+ CS234: Reinforcement learning winter 2025
+ Brunskill
+ 2025
+ Brunskill, E. (2025). CS234: Reinforcement learning winter 2025. url: https://web.stanford.edu/class/cs234/.
+
+
+ Spinning up in deep reinforcement learning
+ Achiam
+ 2018
+ Achiam, J. (2018). Spinning up in deep reinforcement learning. url: https://spinningup.openai.com/.
+
+
+ Jupyter notebooks – a publishing format for reproducible computational workflows
+ Kluyver
+ 2016
+ Kluyver, T., Ragan-Kelley, B., & Pérez, F. et al. (2016). Jupyter notebooks – a publishing format for reproducible computational workflows (F. Loizides & B. Schmidt, Eds.; pp. 87–90). IOS Press.
+
+
+ Pandas-dev/pandas: pandas
+ pandas
+ 10.5281/zenodo.3509134
+ 2020
+ pandas. (2020). Pandas-dev/pandas: pandas (latest). Zenodo. https://doi.org/10.5281/zenodo.3509134
+
+
+ Data Structures for Statistical Computing in Python
+ McKinney
+ Proceedings of the 9th Python in Science Conference
+ 10.25080/Majora-92bf1922-00a
+ 2010
+ McKinney, W. (2010). Data Structures for Statistical Computing in Python. In Stéfan van der Walt & Jarrod Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
+
+
+ Gymnasium
+ Towers
+ 10.5281/zenodo.8127026
+ 2023
+ Towers, M., Terry, J. K., & Kwiatkowski, A. et al. (2023). Gymnasium. Zenodo. https://doi.org/10.5281/zenodo.8127026
+
+
+ PyTorch: An imperative style, high-performance deep learning library
+ Paszke
+ Advances in neural information processing systems 32
+ 2019
+ Paszke, A., Gross, S., & Massa, F. et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems 32 (pp. 8024–8035). Curran Associates, Inc. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
+
+
+ OpenAI Gym
+ Brockman
+ 2016
+ Brockman, G., Cheung, V., & Pettersson, L. et al. (2016). OpenAI Gym.
+
+
+ Stable-Baselines3: Reliable reinforcement learning implementations
+ Raffin
+ Journal of Machine Learning Research
+ 268
+ 22
+ 2021
+ Raffin, A., Hill, A., & Gleave, A. et al. (2021). Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268), 1–8. http://jmlr.org/papers/v22/20-1364.html
+
+
+ Mastering the game of Go with deep neural networks and tree search
+ Silver
+ Nature
+ 7587
+ 529
+ 10.1038/nature16961
+ 0028-0836
+ 2016
+ Silver, D., Huang, A., & Maddison, C. J. et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961
+
+
+ Mastering chess and shogi by self-play with a general reinforcement learning algorithm
+ Silver
+ CoRR
+ abs/1712.01815
+ 2017
+ Silver, D., Hubert, T., & Schrittwieser, J. et al. (2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. CoRR, abs/1712.01815. http://arxiv.org/abs/1712.01815
+
+
+ Playing Atari with deep reinforcement learning
+ Mnih
+ CoRR
+ abs/1312.5602
+ 2013
+ Mnih, V., Kavukcuoglu, K., & Silver, D. et al. (2013). Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602. http://arxiv.org/abs/1312.5602
+
+
+ Grandmaster level in StarCraft II using multi-agent reinforcement learning
+ Vinyals
+ Nature
+ 7782
+ 575
+ 10.1038/s41586-019-1724-z
+ 2019
+ Vinyals, O., Babuschkin, I., & Czarnecki, W. M. et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350–354. https://doi.org/10.1038/s41586-019-1724-z
+
+
+ Transferring online reinforcement learning for electric motor control from simulation to real-world experiments
+ Book
+ IEEE Open Journal of Power Electronics
+ 2
+ 10.1109/OJPEL.2021.3065877
+ 2021
+ Book, G., Traue, A., & Balakrishna, P. et al. (2021). Transferring online reinforcement learning for electric motor control from simulation to real-world experiments. IEEE Open Journal of Power Electronics, 2, 187–201. https://doi.org/10.1109/OJPEL.2021.3065877
+
+
+ Reinforcement learning in robotics: A survey
+ Kober
+ The International Journal of Robotics Research
+ 11
+ 32
+ 10.1177/0278364913495721
+ 2013
+ Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238–1274. https://doi.org/10.1177/0278364913495721
+
+
+ Training a helpful and harmless assistant with reinforcement learning from human feedback
+ Bai
+ 2022
+ Bai, Y., Jones, A., & Ndousse, K. et al. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. https://arxiv.org/abs/2204.05862
+
+
+ Applications of reinforcement learning in finance – trading with a double deep Q-network
+ Zejnullahu
+ 2022
+ Zejnullahu, F., Moser, M., & Osterrieder, J. (2022). Applications of reinforcement learning in finance – trading with a double deep Q-network. https://arxiv.org/abs/2206.14267
+
+
+ Reinforcement learning for intelligent healthcare applications: A survey
+ Coronato
+ Artificial Intelligence in Medicine
+ 109
+ 10.1016/j.artmed.2020.101964
+ 0933-3657
+ 2020
+ Coronato, A., Naeem, M., De Pietro, G., & Paragliola, G. (2020). Reinforcement learning for intelligent healthcare applications: A survey. Artificial Intelligence in Medicine, 109, 101964. https://doi.org/10.1016/j.artmed.2020.101964
+
+
+ An efficient deep reinforcement learning model for urban traffic control
+ Lin
+ CoRR
+ abs/1808.01876
+ 2018
+ Lin, Y., Dai, X., Li, L., & Wang, F.-Y. (2018). An efficient deep reinforcement learning model for urban traffic control. CoRR, abs/1808.01876. http://arxiv.org/abs/1808.01876
+
+
+ API design for machine learning software: Experiences from the scikit-learn project
+ Buitinck
+ ECML PKDD workshop: Languages for data mining and machine learning
+ 2013
+ Buitinck, L., Louppe, G., & Blondel, M. et al. (2013). API design for machine learning software: Experiences from the scikit-learn project. ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122.
+
+
+ Safe reinforcement learning-based control in power electronic systems
+ Weber
+ 2023 international conference on future energy solutions (FES)
+ 10.1109/FES57669.2023.10182718
+ 2023
+ Weber, D., Schenke, M., & Wallscheid, O. (2023). Safe reinforcement learning-based control in power electronic systems. 2023 International Conference on Future Energy Solutions (FES), 1–6. https://doi.org/10.1109/FES57669.2023.10182718
+
+
+ A deep Q-learning direct torque controller for permanent magnet synchronous motors
+ Schenke
+ IEEE Open Journal of the Industrial Electronics Society
+ 2
+ 10.1109/OJIES.2021.3075521
+ 2021
+ Schenke, M., & Wallscheid, O. (2021). A deep Q-learning direct torque controller for permanent magnet synchronous motors. IEEE Open Journal of the Industrial Electronics Society, 2, 388–400. https://doi.org/10.1109/OJIES.2021.3075521
+
+
+
+
+
+
diff --git a/jose.00306/10.21105.jose.00306.pdf b/jose.00306/10.21105.jose.00306.pdf
new file mode 100644
index 0000000..907e200
Binary files /dev/null and b/jose.00306/10.21105.jose.00306.pdf differ
diff --git a/jose.00306/paper.jats/10.21105.jose.00306.jats b/jose.00306/paper.jats/10.21105.jose.00306.jats
new file mode 100644
index 0000000..9dea027
--- /dev/null
+++ b/jose.00306/paper.jats/10.21105.jose.00306.jats
@@ -0,0 +1,865 @@
+
+
+
+
+
+
+
+Journal of Open Source Education
+JOSE
+
+2577-3569
+
+Open Journals
+
+
+
+306
+10.21105/jose.00306
+
+Reinforcement Learning: A Comprehensive Open-Source
+Course
+
+
+
+https://orcid.org/0009-0001-5853-5900
+
+Abdelwanis
+Ali Hassan Ali
+
+
+
+
+https://orcid.org/0000-0003-0862-2069
+
+Haucke-Korber
+Barnabas
+
+
+
+
+https://orcid.org/0009-0002-1576-2465
+
+Jakobeit
+Darius
+
+
+
+
+https://orcid.org/0000-0001-9490-1843
+
+Kirchgässner
+Wilhelm
+
+
+
+
+https://orcid.org/0009-0008-2879-7118
+
+Meyer
+Marvin
+
+
+
+
+https://orcid.org/0000-0001-5427-9527
+
+Schenke
+Maximilian
+
+
+
+
+https://orcid.org/0009-0005-0654-8741
+
+Vater
+Hendrik
+
+
+
+
+https://orcid.org/0000-0001-9362-8777
+
+Wallscheid
+Oliver
+
+
+
+
+https://orcid.org/0000-0003-3367-5998
+
+Weber
+Daniel
+
+
+
+
+
+Department of Power Electronics and Electrical Drives,
+Paderborn University, Germany
+
+
+
+
+Department of Interconnected Automation Systems, University
+of Siegen, Germany
+
+
+
+
+19
+7
+2023
+
+8
+92
+306
+
+Authors of papers retain copyright and release the
+work under a Creative Commons Attribution 4.0 International License (CC
+BY 4.0)
+2025
+The article authors
+
+Authors of papers retain copyright and release the work under
+a Creative Commons Attribution 4.0 International License (CC BY
+4.0)
+
+
+
+data science
+Python
+TensorFlow
+PyTorch
+Jupyter notebook
+reproducible workflow
+open science
+reinforcement learning
+exploratory data analysis
+machine learning
+supervised learning
+
+
+
+
+
+ Summary
+
We present an open-source repository of an extensive course on
+ reinforcement learning, specifically designed for master’s students
+ in engineering and computer science. The course introduces beginners
+ to the fundamentals of reinforcement learning and progresses towards
+ advanced algorithms, using examples that span many different classic
+ control engineering tasks. It is structured to be accessible to
+ students with limited prior programming experience by introducing
+ the basics of Python.
+
The course spans 14 weeks, comprising 14 lectures and 12 exercises.
+ Accompanying video recordings of real lectures and exercises are
+ provided to aid in understanding the course content. They are
+ available on a
+ YouTube
+ channel under a Creative Commons license. The open-source nature of
+ the course allows other teachers to freely adapt the materials for
+ their own teaching purposes. The primary goal is to equip learners
+ with a solid theoretical understanding of reinforcement learning
+ principles, as well as the practical tools to solve real-world
+ engineering problems from different domains, such as electrical
+ engineering.
+
The lectures follow Richard S. Sutton and Andrew G. Barto’s
+ fundamentals book on reinforcement learning
+ (Sutton
+ & Barto, 2005) and take inspiration from the reinforcement
+ learning lecture notes delivered by David Silver
+ (Silver,
+ 2015). The exercises are programmed in Python using Jupyter
+ notebooks
+ (Kluyver
+ et al., 2016) for presentation. Important libraries for machine
+ and reinforcement learning are introduced, such as pandas
+ (McKinney,
+ 2010;
+ pandas,
+ 2020), gymnasium
+ (Towers
+ et al., 2023), PyTorch
+ (Paszke
+ et al., 2019), scikit-learn
+ (Buitinck
+ et al., 2013), and stable-baselines3
+ (Raffin
+ et al., 2021).
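+
To illustrate how these libraries interact in the exercises, the
+ following minimal sketch trains a stable-baselines3 agent on a
+ gymnasium environment. It is a hedged illustration: the environment
+ choice ("CartPole-v1") and the training budget are assumptions for
+ this sketch, not taken from the course notebooks.
+
+ import gymnasium as gym
+ from stable_baselines3 import PPO
+
+ # Create a classic control task (illustrative choice).
+ env = gym.make("CartPole-v1")
+
+ # Train a PPO agent with default settings for a short demo run.
+ model = PPO("MlpPolicy", env, verbose=0)
+ model.learn(total_timesteps=10_000)
+
+ # Roll out the learned policy deterministically for one episode.
+ obs, info = env.reset(seed=0)
+ done = False
+ while not done:
+     action, _ = model.predict(obs, deterministic=True)
+     obs, reward, terminated, truncated, info = env.step(int(action))
+     done = terminated or truncated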
+
The authors of this course have experience working with
+ reinforcement learning in the domain of electrical engineering, in
+ particular in electric drive
+ (Schenke
+ & Wallscheid, 2021) and grid control
+ (Weber
+ et al., 2023). The course was first held in 2020 under the
+ constraints of the COVID-19 pandemic, resorting to an online,
+ asynchronous learning experience. It was extended with a session on
+ more contemporary algorithms in 2022. In subsequent years, the
+ course has been revised to incorporate teaching experience and to
+ align the structure of the exercises. All versions (one for each
+ year’s revision) are available in the public
+ GitHub
+ repository.
+
+
+ Statement of Need
+
Recent developments in (deep) reinforcement learning have caused
+ considerable excitement in both academia and
+ popular
+ science media. Starting with beating champions in complex
+ board games such as chess
+ (Silver
+ et al., 2017) and Go
+ (Silver
+ et al., 2016), breaking human records in a wide variety of
+ video games
+ (Mnih
+ et al., 2013;
+ Vinyals
+ et al., 2019), up to recent solutions in real-world (control)
+ applications
+ (Bai et
+ al., 2022;
+ Book et
+ al., 2021;
+ Coronato
+ et al., 2020;
+ Kober
+ et al., 2013;
+ Lin et
+ al., 2018;
+ Zejnullahu
+ et al., 2022), reinforcement learning agents have proven to be
+ a viable control or decision-making solution for a wide variety of
+ application domains. Reinforcement learning offers an elegant,
+ data-driven path to a control solution with minimal expert knowledge
+ involved, which makes it highly attractive for many different
+ research domains. A similar development has already been observed in
+ recent years with regard to deep supervised learning.
+
An increasing number of educational resources has become available
+ due to the traction reinforcement learning has gained in recent
+ years. However, most courses lack at least one of the following: a
+ continuity of topics ranging from the foundations up to advanced
+ deep reinforcement learning, practical programming exercises
+ accompanying each theoretical lecture, validation through teaching
+ at university level, or free availability. Alternative courses often
+ focus on games
+ (Simonini
+ & Sanseviero, 2023) or a mix of theoretical and practical
+ questions for their exercises
+ (Achiam,
+ 2018;
+ Brunskill,
+ 2025). In contrast, our course utilizes practical application
+ scenarios from a wide variety of domains with a strong focus on
+ classical control engineering tasks. It can therefore help
+ accelerate the adoption of reinforcement learning solutions in
+ real-world applications.
+
+
+ Target Audience and Learning Goals
+
The target audience of this course comprises master’s students in
+ engineering and computer science, as well as anyone interested in
+ the concepts of reinforcement learning. The exercises are designed
+ to be solvable by students without a (strong) programming background
+ when completed in the presented order. Students learn to select and
+ apply reinforcement learning methods that fit the problem at hand.
+ They also learn how to incorporate expert knowledge into their
+ reinforcement learning solution, e.g., by designing the features or
+ reward functions (a sketch of such a reward-design step follows the
+ list below). The exercises start with a very low-level introduction
+ to the programming language Python. Later exercises introduce
+ advanced techniques that can be utilized in more comprehensive
+ environments, such as electric drive state prediction or vehicle
+ control. Students should be familiar with algorithmic notation to be
+ able to practically implement the algorithms presented in the
+ lectures. A basic understanding of probability theory is advised to
+ follow the mathematical background. At the end of the course,
+ students should have gained the following skills:
+
+
+
Understand basic concepts and functionalities of reinforcement
+ learning methods.
+
+
+
Be able to understand and evaluate state-of-the-art
+ algorithms.
+
+
+
Have the ability to implement basic and advanced algorithms
+ using open-source libraries in Python.
+
+
+
Be able to select a fitting solution when presented with a new
+ task.
+
+
+
Be able to critically interpret and evaluate results and
+ performance.
+
+
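+
As referenced above, the following is a hedged sketch of the
+ reward-design step: a gymnasium wrapper that adds an action-magnitude
+ penalty to the native reward as a simple expert prior. The
+ environment ("Pendulum-v1") and the shaping term are illustrative
+ assumptions, not taken from the course exercises.
+
+ import gymnasium as gym
+
+ class EnergyPenaltyWrapper(gym.Wrapper):
+     """Expert prior via reward shaping: penalize large control actions."""
+
+     def __init__(self, env, weight=0.1):
+         super().__init__(env)
+         self.weight = weight  # shaping weight (illustrative value)
+
+     def step(self, action):
+         obs, reward, terminated, truncated, info = self.env.step(action)
+         # Subtract a small energy-style penalty from the native reward.
+         reward -= self.weight * float(abs(action[0]))
+         return obs, reward, terminated, truncated, info
+
+ # Usage with a continuous-action classic control task:
+ env = EnergyPenaltyWrapper(gym.make("Pendulum-v1"))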
+
+
+ Content
+
The course is structured as a one-semester university-level course
+ with two sessions each week: one lecture and one exercise. The
+ contents of the latest iteration of the course (summer term 2025)
+ are presented in the following.
+
A summary of the lectures and exercises can be found in Table 1 and
+ Table 2, respectively.
+
+
+
Summary of course lectures.
+
+
+
+
+
Lecture
+
Content
+
+
+
+
+
01
+
Introduction to Reinforcement Learning
+
+
+
02
+
Markov Decision Processes
+
+
+
03
+
Dynamic Programming
+
+
+
04
+
Monte Carlo Methods
+
+
+
05
+
Temporal-Difference Learning
+
+
+
06
+
Multi-Step Bootstrapping
+
+
+
07
+
Planning and Learning with Tabular Methods
+
+
+
08
+
Function Approximation with Supervised Learning
+
+
+
09
+
On-Policy Prediction with Function Approximation
+
+
+
10
+
Value-Based Control with Function Approximation
+
+
+
11
+
Stochastic Policy Gradient Methods
+
+
+
12
+
Deterministic Policy Gradient Methods
+
+
+
13
+
Further Contemporary RL Algorithms (TRPO, PPO)
+
+
+
14
+
Outlook and Research Insights
+
+
+
+
+
+
+
Summary of course exercises.
+
+
+
+
+
Exercise
+
Content
+
+
+
+
+
01
+
Basics of Python for Scientific Computing
+
+
+
02
+
Basic Markov Chain, Reward and Decision Problems
+
+
+
03
+
Dynamic Programming
+
+
+
04
+
Race Track with Monte Carlo Learning
+
+
+
05
+
Race Track with Temporal-Difference Learning
+
+
+
06
+
Inverted Pendulum with Tabular Multi-Step Methods
+
+
+
07
+
Inverted Pendulum within Dyna Framework
+
+
+
08
+
Predicting Electric Drive with Supervised Learning
+
+
+
09
+
Evaluate Given Agents in Mountain Car Problem
+
+
+
10
+
Mountain Car Valley Using Semi-Gradient Sarsa
+
+
+
11
+
Moon Landing with Actor-Critic Methods
+
+
+
12
+
Shoot for the Moon with DDPG & PPO
+
+
+
+
+
Lectures and exercises that share the same number deal with the
+ same topics: theoretical basics are provided in the lecture and are
+ then implemented and evaluated in the exercise on the basis of
+ specific application examples taken from third-party open-source
+ libraries
+ (Brockman
+ et al., 2016;
+ Towers
+ et al., 2023). This allows learners to internalize the content
+ practically; a minimal sketch of this lecture-to-exercise transfer
+ is given below. For self-learners, the lectures can also be studied
+ independently of the exercises and vice versa.
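+
As an illustration of this lecture-to-exercise transfer, the
+ following hedged sketch implements tabular Q-learning, a
+ temporal-difference method in the spirit of lecture 05, on a
+ gymnasium environment. "FrozenLake-v1" is chosen purely for its
+ small discrete state and action spaces; it is an assumption for this
+ sketch and not one of the course tasks.
+
+ import numpy as np
+ import gymnasium as gym
+
+ # Small discrete environment (illustrative choice, not a course task).
+ env = gym.make("FrozenLake-v1", is_slippery=False)
+ Q = np.zeros((env.observation_space.n, env.action_space.n))
+ alpha, gamma, eps = 0.1, 0.99, 0.1  # step size, discount, exploration
+ rng = np.random.default_rng(0)
+
+ for episode in range(2000):
+     state, _ = env.reset()
+     done = False
+     while not done:
+         # Epsilon-greedy behavior policy over the tabular Q estimates.
+         if rng.random() < eps:
+             action = env.action_space.sample()
+         else:
+             action = int(np.argmax(Q[state]))
+         next_state, reward, terminated, truncated, _ = env.step(action)
+         # TD(0) update toward the greedy bootstrap target.
+         target = reward + gamma * np.max(Q[next_state]) * (not terminated)
+         Q[state, action] += alpha * (target - Q[state, action])
+         state = next_state
+         done = terminated or truncated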
+
The lecture slides were created in LaTeX and published accordingly
+ to allow for consistent display and easy adaptation of the material
+ by other instructors. The practical exercises were implemented in
+ Jupyter notebooks
+ (Kluyver
+ et al., 2016). These also allow quick implementation of new
+ content or modification of existing content.
+
+
+ Conclusion
+
The presented course provides a complete introduction to the
+ fundamentals and contemporary applications of reinforcement
+ learning. By combining theory and practice, learners are enabled to
+ analyze and solve problems, including intricate control engineering
+ tasks, in the context of reinforcement learning. Both the lecture
+ content and the exercises are open-source and designed to be easily
+ adapted by other instructors. Thanks to the recorded explanatory
+ videos, the course is also well suited for self-learners.
+
+
+ Authors’ Contributions
+
Authors are listed in alphabetical order. Wilhelm Kirchgässner,
+ Maximilian Schenke, Oliver Wallscheid, and Daniel Weber created this
+ course and have held it since the summer term of 2020. Barnabas
+ Haucke-Korber, Darius Jakobeit, and Marvin Meyer joined Paderborn
+ University at a later date and supported the revision and teaching
+ of the exercises in 2023. In 2024, Hendrik Vater contributed by
+ aligning the exercises to a common format. In 2025, Ali Hassan Ali
+ Abdelwanis supported updating the exercises to the newest library
+ versions and contributed to their revision.
+
+
+ Acknowledgements
+
We would like to thank all of the students who helped improve the
+ course by attending lectures, solving the exercises, and giving
+ valuable feedback, as well as the open-source community for asking
+ questions and suggesting changes on GitHub.
+
+
+
+
+
+
+
+
+ SuttonRichard S.
+ BartoAndrew G.
+
+ Reinforcement learning: An introduction
+ IEEE Transactions on Neural Networks
+ 2005
+ 16
+ https://api.semanticscholar.org/CorpusID:9166388
+ 285
+ 286
+
+
+
+
+
+ SilverDavid
+
+ Lectures on reinforcement learning
+ url: https://www.davidsilver.uk/teaching/
+ 2015
+
+
+
+
+
+ SimoniniThomas
+ SansevieroOmar
+
+ The Hugging Face deep reinforcement learning class
+ GitHub repository
+ https://github.com/huggingface/deep-rl-class; GitHub
+ 2023
+
+
+
+
+
+ BrunskillEmma
+
+ CS234: Reinforcement learning winter 2025
+ url: https://web.stanford.edu/class/cs234/
+ 2025
+
+
+
+
+
+ AchiamJosh
+
+ Spinning up in deep reinforcement learning
+ url: https://spinningup.openai.com/
+ 2018
+
+
+
+
+
+ KluyverThomas
+ Ragan-KelleyBenjamin
+ PérezFernando et al
+
+ Jupyter notebooks – a publishing format for reproducible computational workflows
+
+ LoizidesF.
+ SchmidtB.
+
+ IOS Press
+ 2016
+ 87
+ 90
+
+
+
+
+
+ pandas
+
+ Pandas-dev/pandas: pandas
+ Zenodo
+ 202002
+ https://doi.org/10.5281/zenodo.3509134
+ 10.5281/zenodo.3509134
+
+
+
+
+
+ McKinneyWes
+
+ Data Structures for Statistical Computing in Python
+ Proceedings of the 9th Python in Science Conference
+
+ Walt
+ Millman
+
+ 2010
+ 10.25080/Majora-92bf1922-00a
+ 56
+ 61
+
+
+
+
+
+ TowersMark
+ TerryJordan K.
+ KwiatkowskiAriel et al
+
+ Gymnasium
+ Zenodo
+ 202303
+ 20230708
+ https://zenodo.org/record/8127025
+ 10.5281/zenodo.8127026
+
+
+
+
+
+ PaszkeAdam
+ GrossSam
+ MassaFrancisco et al
+
+ PyTorch: An imperative style, high-performance deep learning library
+ Advances in neural information processing systems 32
+ Curran Associates, Inc.
+ 2019
+ http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
+ 8024
+ 8035
+
+
+
+
+
+ BrockmanGreg
+ CheungVicki
+ PetterssonLudwig et al
+
+ OpenAI Gym
+ 2016
+
+
+
+
+
+ RaffinAntonin
+ HillAshley
+ GleaveAdam et al
+
+ Stable-Baselines3: Reliable reinforcement learning implementations
+ Journal of Machine Learning Research
+ 2021
+ 22
+ 268
+ http://jmlr.org/papers/v22/20-1364.html
+ 1
+ 8
+
+
+
+
+
+ SilverDavid
+ HuangAja
+ MaddisonChris J. et al
+
+ Mastering the game of Go with deep neural networks and tree search
+ Nature
+ 2016
+ 529
+ 7587
+ 0028-0836
+ 10.1038/nature16961
+ 484
+ 489
+
+
+
+
+
+ SilverDavid
+ HubertThomas
+ SchrittwieserJulian et al
+
+ Mastering chess and shogi by self-play with a general reinforcement learning algorithm
+ CoRR
+ 2017
+ abs/1712.01815
+ http://arxiv.org/abs/1712.01815
+
+
+
+
+
+ MnihVolodymyr
+ KavukcuogluKoray
+ SilverDavid et al
+
+ Playing Atari with deep reinforcement learning
+ CoRR
+ 2013
+ abs/1312.5602
+ http://arxiv.org/abs/1312.5602
+
+
+
+
+
+ VinyalsOriol
+ BabuschkinIgor
+ CzarneckiWojciech M. et al
+
+ Grandmaster level in StarCraft II using multi-agent reinforcement learning
+ Nature
+ 2019
+ 575
+ 7782
+ https://doi.org/10.1038/s41586-019-1724-z
+ 10.1038/s41586-019-1724-z
+ 350
+ 354
+
+
+
+
+
+ BookGerrit
+ TraueArne
+ BalakrishnaPraneeth et al
+
+ Transferring online reinforcement learning for electric motor control from simulation to real-world experiments
+ IEEE Open Journal of Power Electronics
+ 202103
+ 2
+ 10.1109/OJPEL.2021.3065877
+ 187
+ 201
+
+
+
+
+
+ KoberJens
+ BagnellJ. Andrew
+ PetersJan
+
+ Reinforcement learning in robotics: A survey
+ The International Journal of Robotics Research
+ 2013
+ 32
+ 11
+ https://doi.org/10.1177/0278364913495721
+ 10.1177/0278364913495721
+ 1238
+ 1274
+
+
+
+
+
+ BaiYuntao
+ JonesAndy
+ NdousseKamal et al
+
+ Training a helpful and harmless assistant with reinforcement learning from human feedback
+ 2022
+ https://arxiv.org/abs/2204.05862
+
+
+
+
+
+ ZejnullahuFrensi
+ MoserMaurice
+ OsterriederJoerg
+
+ Applications of reinforcement learning in finance – trading with a double deep Q-network
+ 2022
+ https://arxiv.org/abs/2206.14267
+
+
+
+
+
+ CoronatoAntonio
+ NaeemMuddasar
+ De PietroGiuseppe
+ ParagliolaGiovanni
+
+ Reinforcement learning for intelligent healthcare applications: A survey
+ Artificial Intelligence in Medicine
+ 2020
+ 109
+ 0933-3657
+ https://www.sciencedirect.com/science/article/pii/S093336572031229X
+ 10.1016/j.artmed.2020.101964
+ 101964
+
+
+
+
+
+
+ LinYilun
+ DaiXingyuan
+ LiLi
+ WangFei-Yue
+
+ An efficient deep reinforcement learning model for urban traffic control
+ CoRR
+ 2018
+ abs/1808.01876
+ http://arxiv.org/abs/1808.01876
+
+
+
+
+
+ BuitinckLars
+ LouppeGilles
+ BlondelMathieu et al
+
+ API design for machine learning software: Experiences from the scikit-learn project
+ ECML PKDD workshop: Languages for data mining and machine learning
+ 2013
+ 108
+ 122
+
+
+
+
+
+ WeberDaniel
+ SchenkeMaximilian
+ WallscheidOliver
+
+ Safe reinforcement learning-based control in power electronic systems
+ 2023 international conference on future energy solutions (FES)
+ 2023
+
+ 10.1109/FES57669.2023.10182718
+ 1
+ 6
+
+
+
+
+
+ SchenkeMaximilian
+ WallscheidOliver
+
+ A deep Q-learning direct torque controller for permanent magnet synchronous motors
+ IEEE Open Journal of the Industrial Electronics Society
+ 202104
+ 2
+ 10.1109/OJIES.2021.3075521
+ 388
+ 400
+
+
+
+
+