austere wrote:
Or perhaps it is: what if there is something about the perfect player which causes him to choose the correct decision every time. Some bias in his decision making process that, in the example of Super Mario Bros 2, would cause him to avoid the poison mushrooms. A sort of intuition that this darker mushroom is bad for you? There's only one resolution to this: realising that every person has their own vision of the perfect player -- i.e. for lack of a better term, their own hyperplayer. From the perspective of this hyperplayer, which we all try to approach when we play and practice our favourite games, we judge whether a game is fair or not.
With the rise of machine learning, the prospect of creating this hyperplayer will soon no longer be hypothetical. It would require a massive amount of computational power, an online reinforcement learning algorithm, and the careful selection of a training game set, which is what I wish to ask all of you about. Note that this will be specific to autoscrolling STGs and not action games in general, which I haven't thought much about. I would also like to see what you guys think of the idea in general, and whether you can think of any holes in it.

austere wrote:
So, to summarise, the decision of labelling a game fair is not universal. The best we can do is to argue whether our vision of a hyperplayer would be able to complete the game on their first try.
First some definitions that are restricted to this context to make what I'm requesting perfectly clear:
Completion: Playing an STG through to its final stage using only a single credit (2-ALL or 1-CC).
Perfect play: To complete an STG (2-ALL or 1-CC) without dying once (no miss).
Training game set: A carefully selected set of autoscrolling STGs.
Reinforcement learning: A machine learning algorithm that actively learns from previous attempts at a game until it maximizes some criterion. The network must have no pre-existing knowledge of the game; its only direct inputs are the game's video and audio signals, and its reward is game progress. The in-game score will be ignored.
Hyperplayer: A machine that has been trained using a reinforcement learning algorithm on every entry in the training game set until it can achieve perfect play on a sufficiently large set of random number generator states (where applicable).
Online reinforcement learning: A machine learning algorithm that takes an already trained network (or any other machine learning topology) and actively learns from the new stimuli (i.e. the new game it's playing) while still attempting to maximize the same criterion as before.
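To make these definitions concrete, here's a rough Python sketch of the interface I have in mind. Everything in it is hypothetical (the class names, the fake ten-step game, the action list); the point is only that the agent sees raw video/audio, receives game progress as its reward signal, and never sees the in-game score.

```python
import random

class GameEnvironment:
    """Wraps one autoscrolling STG. Exposes only raw A/V plus progress."""

    GAME_LENGTH = 10  # stand-in: the fake game lasts ten steps

    def __init__(self, rng_state):
        self.rng_state = rng_state  # fixed RNG state, so runs are repeatable
        self.steps = 0

    @property
    def progress(self):
        # fraction of the game completed, 0.0 to 1.0
        return min(self.steps, self.GAME_LENGTH) / self.GAME_LENGTH

    def reset(self):
        self.steps = 0
        return self.observe()

    def observe(self):
        # Only video and audio reach the agent; the in-game score does not.
        video_frame = [[0] * 4 for _ in range(4)]  # stand-in for pixels
        audio_frame = [0] * 8                      # stand-in for samples
        return video_frame, audio_frame

    def step(self, action):
        # A real emulator hook would go here; we just advance the fake game.
        self.steps += 1
        reward = self.progress  # the reward IS game progress, not the score
        done = self.progress >= 1.0
        return self.observe(), reward, done


class Agent:
    """Placeholder policy with no pre-existing knowledge of any game."""

    ACTIONS = ["left", "right", "up", "down", "shoot", "bomb", "none"]

    def act(self, observation):
        return random.choice(self.ACTIONS)  # a trained net would decide here

    def update(self, observation, action, reward):
        pass  # a gradient step on the progress reward would go here
```

The training phase from the definitions is then just running this loop over every game in the training set until the agent reaches perfect play.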
So, the idea is as follows:
Setup phase: Select a training game set, then train a reinforcement learning network on every entry in the set until it achieves perfect play, thus becoming a hyperplayer.
Trial phase for a candidate game: Attempt completion using the hyperplayer's network, but run it under online reinforcement learning. The new network produced by the online learning is discarded after each play, so no knowledge of the candidate game under trial is retained. Multiple plays can be attempted to obtain a large enough distribution given random number generation, but because each play's learning is discarded, no bias towards the game builds up.
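A minimal sketch of the discard-after-each-play rule, with the network reduced to a toy dictionary of weights (all names and numbers here are made up for illustration):

```python
import copy
import random

TOTAL_STAGES = 20  # stand-in length for the candidate game

def play_with_online_learning(net, game_seed):
    """One credit of the candidate game; 'net' is updated as it plays."""
    rng = random.Random(game_seed)
    stages_cleared, lives = 0, 3
    while lives > 0 and stages_cleared < TOTAL_STAGES:
        # stand-in for a real play step: outcome depends on the net's "skill"
        if rng.random() < net["skill"]:
            stages_cleared += 1
        else:
            lives -= 1
        net["skill"] = min(0.99, net["skill"] + 0.001)  # online update
    return {"completed": stages_cleared == TOTAL_STAGES,
            "lives_left": lives,
            "progress": stages_cleared / TOTAL_STAGES}

def trial_phase(hyperplayer, game_seeds):
    """Run one first-try attempt per RNG seed, discarding learning each time."""
    results = []
    for seed in game_seeds:
        # Deep copy: online updates during this play never touch the
        # original hyperplayer, so no knowledge of the game carries over.
        online_net = copy.deepcopy(hyperplayer)
        results.append(play_with_online_learning(online_net, seed))
        # online_net is thrown away here, as per the discard rule
    return results
```

The deep copy is the whole trick: every play starts from the same frozen hyperplayer, so each attempt is a genuine "first try" no matter how many RNG seeds we sample.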
The degree of fairness of a game will be defined by how well the hyperplayer survives:
* If completion was achieved, the number of lives remaining (I'm only considering plays for survival).
* If the hyperplayer fails to attain completion, the percentage of the whole game it got through (this might have trouble with multiple endings and 2nd-loop triggers, so it could use some work).
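Folding the two cases above into a single number could look something like this; the exact scaling is arbitrary and just one possible choice:

```python
def fairness_score(completed, lives_left, progress):
    """Map one first-try play to a single fairness number.

    Completed plays score above 1.0, scaled by lives remaining;
    failed plays score the fraction of the game reached (0.0 to 1.0).
    The scaling is arbitrary, only meant to make any completion
    rank above any failure.
    """
    if completed:
        return 1.0 + lives_left
    return progress
```

Averaging this over the many discarded trial plays would give the game's overall degree of fairness.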
The reason this definition is selected is that when you first play a game, how you handle its surprises depends on your ability to learn the dynamics of the game as you play, together with some "assumed knowledge". E.g. it's assumed you know your ship will be shot down if a bullet hits it (the machine learns this during the training phase, for example). But as you play a game multiple times, what might once have been considered unfair (instant laser kills, for example) becomes naturalized as "assumed knowledge". The latter is what I wish to remove from this hyperplayer.
Given this setup, which games would you select to best represent the "fundamental" STGs that can be used to determine a game's degree of fairness? These should be games that are sufficiently difficult to train the hyperplayer while being fair enough not to bias it towards certain surprises. Of course, this method doesn't determine difficulty (though that could be an interesting discussion), since we are only considering the very first play, where you would encounter the most surprises.