What Are the Fundamentals of Reinforcement Learning?

Machines seem to be finally catching up.

Photo by author

Studying Machine Learning while minding my two-year-old son makes things a little more interesting. I imagine him as a super-intelligent robot (I mean: super, super intelligent) right there with you, simulating the things you’re reading about.

How does he learn so fast?! He tries, tries and tries until he gets lucky. Then there is a reward: something opens, or a LEGO block falls into place. The path to the reward is registered, and next time he’ll know better what to do. Well, that’s Reinforcement Learning in a nutshell (a tiny nutshell).

In this article, I cover the very basics of Reinforcement Learning from a Machine Learning perspective. I’ve recently become fascinated with the subject and have been studying it using the resources listed in the Resources section.

A lot of progress has been made in this area. The development of AlphaGo, which played against world champion Lee Sedol, and the continued enhancements with AlphaZero, covered in more detail further below, are quite exciting.

Reinforcement learning consists of an agent interacting with an environment and learning by trial and error. The agent is the learner; the environment is anything the agent interacts with.

The interaction between the two is dynamic. The agent observes the environment’s state and takes an action based on it. The environment then produces a reward (good or bad) and transitions to a new state as a result of that action.
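To make the observe, act, reward cycle concrete, here is a minimal Python sketch of a single interaction step. Everything in it is a hypothetical toy invented for illustration, not an API from any RL library.

- `CorridorEnv` is a five-cell corridor. The agent starts at position 0 and the goal is the last cell.
- Actions are -1 (left) and +1 (right).
- The reward scheme (+1.0 on reaching the goal, -0.1 for every other step) is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""
    length: int = 5
    state: int = 0

    def reset(self) -> int:
        """Put the agent back at the start and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        """Apply an action (-1 = left, +1 = right); return (next_state, reward, done)."""
        # Clamp so the agent cannot walk off either end of the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # assumed scheme: good at the goal, small cost otherwise
        return self.state, reward, done


def agent_policy(state: int) -> int:
    """Hypothetical agent: looks at the observed state and picks an action."""
    return +1  # always move right, toward the goal


env = CorridorEnv()
state = env.reset()                          # the agent observes the state
action = agent_policy(state)                 # the agent acts based on it
next_state, reward, done = env.step(action)  # the environment responds with a reward and a new state
print(state, action, next_state, reward, done)  # 0 1 1 -0.1 False
```

The agent only decides; the environment alone owns the state and computes the reward. That split mirrors the learner/environment division described above.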

This repeats in a loop, with the agent aiming to maximize the total reward it accumulates, that is, to reach its end goal in the best way possible. Figure 1 illustrates this process.
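The loop can be sketched in Python by repeating the interaction until the goal is reached and summing the rewards along the way. The environment is a hypothetical toy corridor, assumed purely for illustration: the agent starts at 0, the goal is the last cell, reaching it pays +1.0, and every other step costs -0.1. Because each extra step costs something, a policy that reaches the goal sooner scores higher, which is one simple way to encode "in the best way possible". The running total is a plain undiscounted sum. Many formulations instead weight later rewards by a discount factor γ, which this sketch leaves out.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from position 0 to the goal at length-1."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        # Clamp to the corridor, then reward +1.0 at the goal and -0.1 otherwise (assumed scheme).
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else -0.1), done


def run_episode(env, policy, max_steps: int = 100) -> float:
    """Run the observe -> act -> reward loop and return the accumulated (undiscounted) reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


def always_right(state: int) -> int:
    return +1


def random_policy(state: int) -> int:
    return random.choice([-1, +1])


env = CorridorEnv()
print(run_episode(env, always_right))   # about 0.7: three -0.1 steps, then +1.0 at the goal
random.seed(0)
print(run_episode(env, random_policy))  # never higher than always_right's total
```

The random policy wanders back and forth and pays -0.1 for every extra step. If it misses the goal within `max_steps`, it ends well below zero. Comparing the two totals is exactly the comparison a learning agent is trying to win.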

Each pass through the loop produces a State (S), an Action (A) and a Reward (R); the resulting sequence of S, A, R can be called an episode or a trajectory. [2]
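A trajectory can be represented in Python as a plain list of (S, A, R) records. The sketch below rolls out one episode in a hypothetical five-cell corridor and keeps every record. The setup is assumed for illustration: start at 0, goal at position 4, +1.0 on reaching the goal, -0.1 otherwise. Each reward is stored next to the state and action that produced it. Sutton and Barto write the same sequence as S_0, A_0, R_1, S_1, A_1, R_2, ..., indexing each reward one step after its action.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One (S, A, R) record of a trajectory."""
    state: int
    action: int
    reward: float


def record_trajectory(policy, length: int = 5, max_steps: int = 50) -> list[Step]:
    """Roll out one episode in a hypothetical corridor and keep every (S, A, R) triple."""
    state, goal = 0, length - 1
    trajectory: list[Step] = []
    for _ in range(max_steps):
        action = policy(state)
        next_state = min(max(state + action, 0), goal)  # stay inside the corridor
        reward = 1.0 if next_state == goal else -0.1    # assumed reward scheme
        trajectory.append(Step(state, action, reward))
        state = next_state
        if state == goal:  # the episode ends at the goal
            break
    return trajectory


episode = record_trajectory(lambda s: +1)  # policy: always move right
for step in episode:
    print(step)
```

With the always-move-right policy, this prints four records, the last being `Step(state=3, action=1, reward=1.0)`.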
