Building Slot Machine Python Kivy


So I actually ended up making a slot machine module with a built-in mini-game that runs in the terminal. The slot machine is itself a class object that contains information about the items and reels attached to it, allowing one to make customized machines; the mini-game is one such example.

SlotMachine
import random

print('''Welcome to the Slot Machine Simulator
You'll start with $50. You'll be asked if you want to play.
Answer with yes/no. You can also use y/n.
No case sensitivity in your answer.
For example you can answer with YEs, yEs, Y, nO, N.
To win you must get one of the following combinations:
BAR\tBAR\tBAR\t\tpays\t$250
BELL\tBELL\tBELL/BAR\tpays\t$20
PLUM\tPLUM\tPLUM/BAR\tpays\t$14
ORANGE\tORANGE\tORANGE/BAR\tpays\t$10
CHERRY\tCHERRY\tCHERRY\t\tpays\t$7
CHERRY\tCHERRY\t -\t\tpays\t$5
CHERRY\t -\t -\t\tpays\t$2
''')

# Constants:
INIT_STAKE = 50
ITEMS = ['CHERRY', 'LEMON', 'ORANGE', 'PLUM', 'BELL', 'BAR']

firstWheel = None
secondWheel = None
thirdWheel = None
stake = INIT_STAKE

def play():
    global stake, firstWheel, secondWheel, thirdWheel
    playQuestion = askPlayer()
    while(stake != 0 and playQuestion == True):
        firstWheel = spinWheel()
        secondWheel = spinWheel()
        thirdWheel = spinWheel()
        printScore()
        playQuestion = askPlayer()

def askPlayer():
    '''
    Asks the player if he wants to play again.
    Expects the user to answer with yes, y, no or n.
    No case sensitivity in the answer: yes, YeS, y, nO . . . all work.
    '''
    global stake
    while(True):
        answer = input('You have $' + str(stake) + '. Would you like to play? ')
        answer = answer.lower()
        if(answer == 'yes' or answer == 'y'):
            return True
        elif(answer == 'no' or answer == 'n'):
            print('You ended the game with $' + str(stake) + ' in your hand.')
            return False
        else:
            print('wrong input!')

def spinWheel():
    '''
    Returns a random item from the wheel.
    '''
    randomNumber = random.randint(0, 5)
    return ITEMS[randomNumber]

def printScore():
    '''
    Prints the current score.
    '''
    global stake, firstWheel, secondWheel, thirdWheel
    if((firstWheel == 'CHERRY') and (secondWheel != 'CHERRY')):
        win = 2
    elif((firstWheel == 'CHERRY') and (secondWheel == 'CHERRY') and (thirdWheel != 'CHERRY')):
        win = 5
    elif((firstWheel == 'CHERRY') and (secondWheel == 'CHERRY') and (thirdWheel == 'CHERRY')):
        win = 7
    elif((firstWheel == 'ORANGE') and (secondWheel == 'ORANGE') and ((thirdWheel == 'ORANGE') or (thirdWheel == 'BAR'))):
        win = 10
    elif((firstWheel == 'PLUM') and (secondWheel == 'PLUM') and ((thirdWheel == 'PLUM') or (thirdWheel == 'BAR'))):
        win = 14
    elif((firstWheel == 'BELL') and (secondWheel == 'BELL') and ((thirdWheel == 'BELL') or (thirdWheel == 'BAR'))):
        win = 20
    elif((firstWheel == 'BAR') and (secondWheel == 'BAR') and (thirdWheel == 'BAR')):
        win = 250
    else:
        win = -1

    stake += win

    if(win > 0):
        print(firstWheel + '\t' + secondWheel + '\t' + thirdWheel + ' -- You win $' + str(win))
    else:
        print(firstWheel + '\t' + secondWheel + '\t' + thirdWheel + ' -- You lose')

play()

commented Dec 14, 2015

Instead of:

if(answer == 'yes' or answer == 'y'):

Do:

if answer.lower() in ['yes', 'y']:

commented Jun 2, 2017

I ran it on Python 2; line 43 needs to be modified (input -> raw_input).


Reinforcement learning has yet to reach the hype levels of its Supervised and Unsupervised learning cousins. Nevertheless, it is an exceptionally powerful approach that solves a variety of problems in a completely different way. To learn reinforcement learning, it is best to start from its building blocks and progress from there. In this article, we will work through a Multi-Armed Bandit in Python to solve a business problem.

What are Multi-Armed Bandits?

Imagine you have 3 slot machines and you are given a set of rounds. In each round, you have to choose one slot machine, pull its arm, and receive the reward (or none at all) from that slot machine. Then you do it again, and again… Eventually, you figure out which slot machine gives you the most reward and keep pulling it every round.

This is the Multi-Armed Bandit problem, also known as the k-Armed Bandit problem. Here k is the number of slot machines available, and your reinforcement learning algorithm needs to figure out which slot machine to pull in order to maximize its rewards.

Multi-Armed bandits are a classical reinforcement learning example, and they clearly illustrate a well-known dilemma in reinforcement learning called the exploration-exploitation trade-off.

Keep in mind these important concepts. I will cover them in more detail in posts coming up. For now, let’s apply Multi-Armed bandits to a business problem.

Our Business Problem

We will apply the Multi-Armed Bandit method to a theoretical marketing business problem.

You are the Chief Marketing Officer promoting a new product and need to get the word out. To do this, you will be texting your target customers a message. Upon receiving the message, the customer will react one way or the other and generate a reward that you have access to.

You have 4 text messages to send but are not sure which one will work best.

Upper-Confidence-Bound Action Selection Formula

How do you choose the best action? Going back to my example in which you had to choose a slot machine at each round: particularly at the beginning, you had to test them out, correct? You could either stick to one from the start (not the best option) or try a few of them to find out which is best.

A common strategy is called Upper-Confidence-Bound action selection, or UCB for short.

If you are an optimist, you will like this one! Its strategy is:

Optimism in the face of uncertainty.

This method selects the action according to its potential, captured by the upper confidence interval, balancing that potential against how uncertain you are of its estimate.

The UCB formula is the following:

\( A_{t} \doteq \underset{a}{\operatorname{argmax}} \left[ Q_{t}(a) + c\sqrt{\frac{\ln t}{N_{t}(a)}} \right] \)

  • t = the time (or round) we are currently at
  • a = action selected (in our case the message chosen)
  • Nt(a) = number of times action a was selected prior to the time t
  • Qt(a) = average reward of action a prior to the time t
  • c = a number greater than 0 that controls the degree of exploration
  • ln t = natural logarithm of t

The idea behind this algorithm is that the square-root term represents the uncertainty in the estimated value of action a.

If an action is not chosen, t increases but Nt(a) does not, which makes the uncertainty of that action grow and increases its chances of being chosen (exploration).

If an action is chosen, both the numerator and the denominator increase, but the numerator's increases get smaller over time (due to the natural log) while the denominator's do not, causing the uncertainty to decrease (exploitation).
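
To make this concrete, consider a hypothetical round with c = 2 and t = 100: a message already tried 90 times with an average reward of 100 gets an exploration bonus of 2·√(ln 100 / 90) ≈ 0.45, so its UCB value is about 100.45, while a message tried only 10 times with a lower average of 99.5 gets a bonus of 2·√(ln 100 / 10) ≈ 1.36, for a UCB value of about 100.86. The less-explored message wins the round despite its lower average.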

The value with the maximum UCB gets chosen at each round! Let’s go through an example of how to implement UCB in Python.

Multi-Armed Bandit – Generate Data

Let us begin implementing this classical reinforcement learning problem using Python. As always, import the required libraries first.
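
A minimal set of imports for this walkthrough might look like the following sketch (numpy, pandas and matplotlib are assumptions based on what comes next):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline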

The last line above is only needed if you are using a Jupyter notebook for the implementation.

Next up, we will generate our dataset, consisting of 5 columns. The first is the user we sent SMS messages to, followed by m1 through m4, one column per message. Each value is the reward for that combination of message and user. Each message's rewards follow a normal distribution with a mean between 95 and 105 and a standard deviation of 5 or 10. We generate 10,000 samples.

You can see our dataset below.
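
A sketch of how such a dataset could be generated is shown here; the exact means, standard deviations and seed are assumptions (the text only gives the ranges), picked so that m2 ends up with the highest average:

np.random.seed(42)   # assumed seed, for reproducibility
n = 10000            # 10,000 samples, one row per user

# One column per message; each value is the reward for that (user, message) pair.
df = pd.DataFrame({
    'user': range(n),
    'm1': np.random.normal(loc=95,  scale=5,  size=n),
    'm2': np.random.normal(loc=105, scale=10, size=n),
    'm3': np.random.normal(loc=100, scale=5,  size=n),
    'm4': np.random.normal(loc=98,  scale=10, size=n),
})

df.head()   # preview the first few rows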

To get a better sense of our average rewards for each message, let’s visualize the above dataset in a box plot.
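
With pandas and matplotlib already imported, one way to draw it:

df[['m1', 'm2', 'm3', 'm4']].plot(kind='box', figsize=(8, 5))
plt.ylabel('reward')
plt.show()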

Looking at the box plot above, if you had to pick the best message, which one would you choose? It would have to be m2, of course!

Do you think our multi-armed bandit algorithm will pick this up? We will find out soon.

Multi-Armed Bandit – UCB Method

In order to solve our Multi-Armed bandit problem using the Upper-Confidence Bound selection method, we need to iterate through each round, take an action (select and send a message), see its returns and pick again. Eventually, we will be selecting the best message.

To implement UCB in Python, first initialize our variables. Each is commented below to aid your understanding. These will help us evaluate the UCB formula shown previously.
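
A sketch of that initialization, reusing the names mentioned in the article (Nt_a, sum_rewards); the remaining names and the value of c are assumptions:

N = len(df)                # total number of rounds, one per user
d = 4                      # number of actions (messages m1 through m4)
c = 2                      # exploration parameter from the UCB formula (assumed value)

Nt_a = np.zeros(d)         # N_t(a): how many times each action has been selected
sum_rewards = np.zeros(d)  # running sum of rewards per action, used to compute Q_t(a)

actions_selected = []      # which action was chosen at each round
rewards = []               # reward collected at each round by UCB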

Next, we will loop through each round represented by the variable t. At each round, the UCB_Values array will hold the UCB values as of that round and these get set to 0. Then we loop through each possible action. If an action has never been selected, it gets chosen. Otherwise, we calculate the UCB value for each action and store it in the UCB_Values array.

At the end of each round, the action containing the maximum UCB Value gets selected. The numpy argmax function returns the index of the maximum value in the array. This is stored in the action_selected variable.

The last block of code below, under “update Values as of round t”, updates the values of our important variables with the information our system has seen thus far.
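
Putting those three steps together, the loop could look like the sketch below. Giving a never-selected action an effectively infinite UCB value is one common way to guarantee it gets chosen first, and np.log(t + 1) avoids taking the log of zero; both are implementation choices, not something spelled out above:

for t in range(N):
    UCB_Values = np.zeros(d)          # UCB value of each action as of round t

    for a in range(d):
        if Nt_a[a] == 0:
            UCB_Values[a] = 1e500     # force actions that were never tried to be picked
        else:
            Q_t_a = sum_rewards[a] / Nt_a[a]                    # average reward of a so far
            uncertainty = c * np.sqrt(np.log(t + 1) / Nt_a[a])  # exploration bonus
            UCB_Values[a] = Q_t_a + uncertainty

    # index of the maximum UCB value this round
    action_selected = np.argmax(UCB_Values)

    # update Values as of round t
    reward = df.iloc[t, action_selected + 1]   # +1 skips the 'user' column
    Nt_a[action_selected] += 1
    sum_rewards[action_selected] += reward

    actions_selected.append(action_selected)
    rewards.append(reward)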

To allow us to perform additional analysis, I have added this additional piece to our code after “sum_rewards[action_selected] += reward”.
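
A hypothetical version of that extra piece (the name random_rewards is mine):

random_rewards = []   # initialised once, before the UCB loop

# ...and inside the loop, right after "sum_rewards[action_selected] += reward":
random_action = np.random.randint(d)                   # pick one of the 4 messages at random
random_rewards.append(df.iloc[t, random_action + 1])   # reward a random choice would have earned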


We store values at each round corresponding to the reward we would have received by choosing a message at random in each round. Obviously, we aim to do better than that through the UCB algorithm.

Let us first see if our method performs better than choosing an action randomly.
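
One way to check, using the bookkeeping above:

print('Total reward (UCB):   ', sum(rewards))
print('Total reward (random):', sum(random_rewards))

plt.plot(np.cumsum(rewards), label='UCB')
plt.plot(np.cumsum(random_rewards), label='random')
plt.xlabel('round')
plt.ylabel('cumulative reward')
plt.legend()
plt.show()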


Indeed it performs much better than selecting an action randomly!

Our variable, Nt_a, holds the number of times each action was selected. We said previously that for this exercise, m2 would be the best one, as its distribution has the highest average. Let's plot each action and the number of times it was selected.
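
For example, as a simple bar chart:

plt.bar(['m1', 'm2', 'm3', 'm4'], Nt_a)   # how often UCB picked each message
plt.xlabel('message')
plt.ylabel('times selected')
plt.show()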

What do you think? Just as expected!

Conclusion




We have now implemented the Upper-Confidence-Bound selection method to solve a classical reinforcement learning problem, the Multi-Armed Bandit. UCB is widely used to solve digital marketing problems. However, there exist other methods for these k-Armed bandits, such as epsilon-greedy and Thompson Sampling. Hope you enjoyed this article, stay tuned for more!