Reinforcement Learning

Pick the action with the highest predicted reward when given a set of features.

A reinforcement learning model takes features as input and produces an integer output known as an action. This type of model attempts to pick the best action by estimating the reward of each action available to take, given the input features.

A high-level example of this is the way humans often make decisions. For instance, every day we decide what clothing to wear. We choose our outfit based on our planned activities: light clothing for outdoor activities on a hot day, or more formal attire for a wedding. We can think of the appropriate attire as the action that gives us the highest reward. The features that help decide that action might be the activities planned for the day, the weather, and the time of day. The reward tells the model how good the action was, and might be measured as the number of compliments received or how comfortable the outfit was for the temperature.

This example also illustrates why reinforcement learning is well suited to personalization. A reinforcement learning model learns directly from the reward and automatically adjusts its behavior to favor the actions that reward it positively (just like humans do). Some real-world examples of reinforcement learning models appear in the table below.

| Model Description | Example Feature Input | Example Reward |
| --- | --- | --- |
| Predicting products that a customer is likely to purchase | Customer purchase history, items they've viewed this session, time spent on product pages | Predicted product was purchased (1) or not (0) |
| Predicting the top 3 search results to put at the top of a search page | User profile, previous user behaviors, previous user click-through rates for search results | Predicted top result was clicked (1) or not (0) |
| Predicting the most interesting content to show a user | User profile, previous content consumed by user, most popular content consumed by all users | Predicted content was viewed for an extended period of time (duration viewed in seconds) |

Model Objective

The objective of a reinforcement learning model is to produce accurate estimates of reward for each action across all possible combinations of features. This model type is a little harder to visualize than the classifier or regression model types because it is more abstract. A simple way to graph it is to plot the estimated reward of every action for one particular set of features.
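As a rough illustration, the following sketch plots hypothetical reward estimates for a handful of actions at one fixed set of features. The numbers (and the use of matplotlib) are purely illustrative and are not produced by the mlrequest API.

import matplotlib.pyplot as plt

# Hypothetical reward estimates for one fixed set of features.
# These values are invented for illustration only.
actions = ['0', '1', '2', '3', '4']
estimated_rewards = [0.12, 0.45, 0.08, 0.71, 0.33]

plt.bar(actions, estimated_rewards)
plt.xlabel('Action')
plt.ylabel('Estimated reward')
plt.title('Estimated reward per action for one feature set')
plt.show()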

Model Settings

The reinforcement learning model has mandatory and optional parameters that do not exist for the other model types.

Negative Reward

If the reinforcement learning model is never rewarded for the action it predicted (e.g., a recommended product was never bought), then a negative reward is applied to inform the model that it made a bad decision. The negative reward value is supplied at prediction time and is mandatory.
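For example, using the Python client call shown later in this section, the negative reward is passed alongside the features at prediction time (the value -1 here is an arbitrary choice for illustration):

from mlrequest import RL

rl = RL('your-api-key')
# negative_reward is mandatory; it is applied automatically if the
# predicted action never receives a positive reward.
r = rl.predict(features={'favorite-genre': 'action'}, model_name='movie-recommender',
               session_id='the-user-session-id', negative_reward=-1, action_count=1)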

Epsilon

The reinforcement learning model needs to try new options every once in a while so it can adapt to new trends. For example, a clothing store customer who originally bought many T-shirts may become more interested in sweaters as the seasons change. The epsilon parameter gives the model this ability.

This parameter is set between 0 and 1 and controls how often the model explores by choosing a random action instead of the one it currently estimates as best. With epsilon set to 0, the model always chooses the best-estimated action and never tries new ones; with epsilon set to 1, it always chooses a random action. Epsilon is an optional parameter and defaults to 0.2.
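Conceptually, this is the classic epsilon-greedy strategy. The sketch below illustrates the selection rule itself; it is not the service's internal implementation.

import random

def choose_action(estimated_rewards, epsilon=0.2):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return random.randrange(len(estimated_rewards))
    # Exploit: pick the action with the highest estimated reward.
    return max(range(len(estimated_rewards)), key=lambda a: estimated_rewards[a])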

Action List

A subset of actions can be defined so that the model does not have to choose from the entire set of actions when that is unnecessary. The action list parameter is optional; by default the list is the entire set of actions.
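As a sketch, assuming the client exposes this setting as an action_list parameter (the parameter name here is an assumption, not confirmed by this document), restricting a prediction to a subset of actions might look like:

# Hypothetical: 'action_list' is an assumed name for the optional
# action list setting described above.
r = rl.predict(features=features, model_name='movie-recommender',
               session_id='the-user-session-id', negative_reward=0,
               action_count=3, epsilon=0.1, action_list=[0, 2, 5])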

Reinforcement Learning Example - Movie Recommender

Reinforcement learning training. The reward endpoint is called after making a prediction, but only if the model's prediction resulted in a positive outcome. For example, if a user started watching the movie that was recommended, the reward endpoint should be called to notify the model that it predicted a good action. mlrequest automatically joins the reward information to the previously predicted action, which is why only a few fields are needed when calling the reward endpoint.

Python Client
from mlrequest import RL

rl = RL('your-api-key')
# Reward the action previously predicted for this session.
r = rl.reward(model_name='movie-recommender', session_id='the-user-session-id', reward=1)
Python
import requests

# Reward the action previously predicted for this session.
payload = {
    'model_name': 'movie-recommender',
    'session_id': 'the-user-session-id',
    'reward': 1
}
r = requests.post('https://api.mlrequest.com/v1/rl/reward', json=payload,
                  headers={'MLREQ-API-KEY': 'your-api-key'})
Javascript
Coming soon...
Java
Coming soon...
Go
Coming soon...
Ruby
Coming soon...
C#
Coming soon...

Reinforcement learning prediction. This endpoint is called first to get a prediction for a particular session. See the Model Settings section above for a description of the extra fields that are specific to reinforcement learning.

Python Client
from mlrequest import RL

rl = RL('your-api-key')
features = {
    'favorite-genre': 'action',
    'last-movie-watched': 'die-hard',
    'movies-watched-last-week': 2,
    'favorite-genre-all-users': 'comedy',
    'top-rated-movie': 'avengers-endgame'
}
r = rl.predict(features=features, model_name='movie-recommender', session_id='the-user-session-id', negative_reward=0, action_count=3, epsilon=0.1)
# predict_result is a list of actions ordered by rank; the first is the best action
best_action = r.predict_result[0]
Python
import requests

features = {
    'favorite-genre': 'action',
    'last-movie-watched': 'die-hard',
    'movies-watched-last-week': 2,
    'favorite-genre-all-users': 'comedy',
    'top-rated-movie': 'avengers-endgame'
}
payload = {
    'model_name': 'movie-recommender',
    'session_id': 'the-user-session-id',
    'features': features,
    'epsilon': 0.1,
    'negative_reward': 0,
    'action_count': 3
}
r = requests.post('https://api.mlrequest.com/v1/rl/predict', json=payload,
                  headers={'MLREQ-API-KEY': 'your-api-key'})
Javascript
Coming soon...
Java
Coming soon...
Go
Coming soon...
Ruby
Coming soon...
C#
Coming soon...
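
Putting the two endpoints together, a minimal end-to-end sketch with the Python client looks like the following. The logic deciding whether the outcome was positive is a stand-in for your own application code.

from mlrequest import RL

rl = RL('your-api-key')

features = {
    'favorite-genre': 'action',
    'last-movie-watched': 'die-hard',
    'movies-watched-last-week': 2
}

# 1. Get a ranked list of actions for this session.
r = rl.predict(features=features, model_name='movie-recommender',
               session_id='the-user-session-id', negative_reward=0,
               action_count=3, epsilon=0.1)
best_action = r.predict_result[0]

# 2. Serve the recommendation, then reward the model only if the
#    outcome was positive (e.g., the user started watching the movie).
user_started_watching = True  # stand-in for real application logic
if user_started_watching:
    rl.reward(model_name='movie-recommender',
              session_id='the-user-session-id', reward=1)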