A metric for fairness in STGs

austere
Posts: 680
Joined: Mon Mar 22, 2010 10:50 am
Location: USA

A metric for fairness in STGs

Post by austere »

Eight years ago (:shock:) I asked you gentlemen the question: "How would you define fairness in action games?" In that post, I proposed the following loose definitions:
austere wrote:Or perhaps it is: what if there is something about the perfect player which causes him to choose the correct decision every time. Some bias in his decision making process that, in the example of Super Mario Bros 2, would cause him to avoid the poison mushrooms. A sort of intuition that this darker mushroom is bad for you? There's only one resolution to this: realising that every person has their own vision of the perfect player -- i.e. for lack of a better term, their own hyperplayer. From the perspective of this hyperplayer, which we all try to approach when we play and practice our favourite games, we judge whether a game is fair or not.
austere wrote:So, to summarise, the decision of labelling a game fair is not universal. The best we can do is to argue whether our vision of a hyperplayer would be able to complete the game on their first try.
With the rise of machine learning, the prospect of creating this hyperplayer will soon no longer be hypothetical. It will require a massive amount of computational power, an online reinforcement learning algorithm and careful selection of certain criteria, which is what I wish to ask all of you about. Note that this will be specific to autoscrolling STGs and not action games in general, which I haven't thought much about. I would also like to see what you guys think of the idea in general, and whether you can think of any holes in it.

First some definitions that are restricted to this context to make what I'm requesting perfectly clear:

Completion: Playing an STG until you reach the final-most stage using only a single credit (2-ALL or 1-CC).
Perfect play: To complete an STG (2-ALL or 1-CC) without dying once (no miss).
Training game set: A carefully selected set of autoscrolling STGs.
Reinforcement learning: A machine learning algorithm that actively learns from previous attempts at a game until it maximizes some criterion. The network must have no pre-existing knowledge of the game and will only have the video and audio signals of the game as direct input and the game progress as a score. The game score will be ignored.
Hyperplayer: A machine that has been trained using a reinforcement learning algorithm on every entry in the training game set until it can achieve perfect play on a sufficiently large set of random number generator states (where applicable).
Online reinforcement learning: A machine learning algorithm that takes an already trained network (or any other machine learning topology) and actively learns from the new stimuli (i.e. the new game it's playing) while still attempting to maximize the same criterion as before.

So, the idea is as follows:

Setup phase: Select a training game set, then train a reinforcement learning network on every entry in the set until it achieves perfect play, thus becoming a hyperplayer.
Trial phase for a candidate game: Try to achieve completion using the hyperplayer's network, but running as an online reinforcement learning network. The new network formed by online learning is discarded after each play, thus no knowledge of a candidate game under trial is retained. Multiple plays can be attempted to obtain a large enough distribution given random number generation, but with discarding there will be no build-up of bias towards the game.
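
A rough sketch of the trial phase, assuming Python. `ToyAgent`, its single `skill` weight and the update rule are hypothetical stand-ins for a real network and learning algorithm; the only point illustrated is that each play starts from a fresh copy of the hyperplayer and the adapted copy is thrown away afterwards.

```python
import copy
import random


class ToyAgent:
    """Hypothetical stand-in for a trained network; `skill` is its only 'weight'."""

    def __init__(self, skill: float):
        self.skill = skill

    def play_and_learn(self, rng: random.Random) -> float:
        """Play one credit while learning online; return progress in [0, 1]."""
        progress = 0.0
        while progress < 1.0:
            if rng.random() > self.skill:  # a fatal surprise ends the credit
                break
            progress = min(1.0, progress + 0.1)
            self.skill = min(1.0, self.skill + 0.01)  # online update during the play
        return progress


def run_trials(hyperplayer: ToyAgent, seeds: list[int]) -> list[float]:
    """Run one play per seed; nothing learned in a play survives to the next."""
    results = []
    for seed in seeds:
        trial_agent = copy.deepcopy(hyperplayer)  # online learner for this play only
        results.append(trial_agent.play_and_learn(random.Random(seed)))
        # trial_agent is rebound or dropped here, so the adapted network is discarded
    return results
```

Calling `run_trials(ToyAgent(0.9), [1, 2, 3])` returns one progress value per seed and leaves the original agent's `skill` untouched, which is the "no retained knowledge" property the trial phase asks for.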

The degree of fairness of a game will be defined by the rate of survival:

* If completion was achieved, the number of lives remaining (I'm only considering plays for survival).
* If the hyperplayer fails to attain completion, the percentage of the whole game they went through (this might have trouble with multiple endings and 2nd loop triggers, so it could use work).
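
One way the two cases above could be recorded and summarised, as a minimal Python sketch. The names `TrialResult`, `fairness_key` and `survival_summary` are hypothetical, and ranking any completion above any non-completion is an assumption of this sketch, not something the proposal settles.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrialResult:
    completed: bool       # did the hyperplayer reach completion on one credit?
    lives_remaining: int  # only meaningful when completed is True
    progress: float       # fraction of the game traversed, 0.0 to 1.0


def fairness_key(result: TrialResult) -> tuple:
    """Sortable key; completions rank above failures (an assumption)."""
    if result.completed:
        return (1, float(result.lives_remaining))
    return (0, result.progress)


def survival_summary(results: list[TrialResult]) -> dict:
    """Summarise many first-play trials of one candidate game."""
    if not results:
        raise ValueError("need at least one trial")
    completed = [r for r in results if r.completed]
    failed = [r for r in results if not r.completed]
    return {
        "completion_rate": len(completed) / len(results),
        "mean_lives_remaining": (
            mean(r.lives_remaining for r in completed) if completed else None
        ),
        "mean_progress_on_failure": (
            mean(r.progress for r in failed) if failed else None
        ),
    }
```

Keeping the two cases as separate fields avoids having to decide how many "percent of the game" one remaining life is worth.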

I chose this definition because, when you first play a game, how surprising it feels depends on your ability to learn the dynamics of the game as you play, given some "assumed knowledge". E.g. it's assumed you know your ship will be shot down if a bullet hits it (the machine learns this during the training phase, for example). But as you play a game multiple times, what might have been considered unfair (instant laser kills, for example) would be naturalized as "assumed knowledge". The latter is what I wish to remove from this hyperplayer.

Given this setup, which games would you select to perfectly represent the "fundamental" STGs that can be used to determine the degree of fairness of a candidate game? This should be a set of games that are sufficiently difficult to train the hyperplayer while being fair enough that they don't bias it towards certain surprises. This method doesn't determine difficulty of course (that could be an interesting discussion though), since we are only considering the very first play where you would encounter the most surprises.
<RegalSin> It does not matter, which programming language you use, you will be up your neck in math.
BareKnuckleRoo
Posts: 6165
Joined: Mon Oct 03, 2011 4:01 am
Location: Southern Ontario

Re: A metric for fairness in STGs

Post by BareKnuckleRoo »

Giga Wing is a fundamentally "fair" shmup. Tiny hitbox size, the reflect mechanic is always available for survival purposes if you don't care to score, the good endings are only dependent on survival, and there's a couple pretty potent shot types. Shutock may be trickier for AI to learn if it starts trying to reposition his gunpods which is rarely practical. Giga Wing is also difficult though as it heavily punishes mistakes. There are no extra lives (unless you are doing really badly in stage 4), and you only get 2 bombs per life, though there is one bomb per stage.

Raiga Strato Fighter has a punishing death system - it is not a checkpoint game, but it may as well have one since later on it is not unusual for one death to turn into 5. You get a ton of lives, but you are so badly depowered on a death that you have to essentially learn to play it deathless. It also has a couple item carriers with randomized powerups (including random 1ups which are essential to scoring as far as I know), so it is not a game I would consider particularly fair. First loop is manageable, the second loop is quite a bit trickier (2nd loop always triggers on a 1cc).
Randorama
Posts: 3503
Joined: Tue Jan 25, 2005 10:25 pm

Re: A metric for fairness in STGs

Post by Randorama »

Sorry to post two questions before proposing a set of games, but I am curious to know:

1. How many papers do you plan to prepare from this work? (aka the "show me the money!" question :lol: )
2. Would "fairness" include an historical dimension?

I believe that you may find out that older games were less fair because they featured design choices that the programmers could not avoid due to inexperience, as they lacked a full grasp of what could count as "unfair" (e.g. the instant laser type you mention).

...maybe 2 could be the starting point for the follow-up project(s) (and that's a +1 for papers, ahem).

Two questions that could be addressed are:

1. Does the number and (formal) type of "unfair" factors decrease over the evolution of the genre?

2. Are there game-external factors that play a role in the eventual change in (un)fairness value?

The second question may not look like a question that an AI paper should address, but it might help as an "idea in the background to test".

That said:

I would suggest going with something like "10% of the published titles in a year, per genre", so that you may simply pick games from a known database (say, MAME), and try to avoid a skewing effect.
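
A sketch of the "10% per year" pick, assuming Python and a plain list of `(name, year)` pairs exported from some database. The function name `sample_per_year` is hypothetical, it stratifies by year only (not by genre), and taking at least one title from every year is an extra assumption to avoid empty years.

```python
import math
import random
from collections import defaultdict


def sample_per_year(titles, fraction=0.10, seed=0):
    """Pick roughly `fraction` of the titles from each release year.

    `titles` is an iterable of (name, year) pairs. At least one title is
    taken from every year that has any entries (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the pick is reproducible
    by_year = defaultdict(list)
    for name, year in titles:
        by_year[year].append(name)

    sample = {}
    for year in sorted(by_year):
        names = sorted(by_year[year])
        k = max(1, math.ceil(fraction * len(names)))
        sample[year] = rng.sample(names, k)
    return sample
```

Sampling at random within each year, rather than hand-picking, is what avoids the skew towards well-remembered titles.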

STG releases rarefied over the years and later titles may have been created by programmers with decades of experience for gamers with decades of experience.

Instant lasers would probably have spelt commercial doom for a 2012 title, so the chances (and reasons) of having a "fairer" title in 2012 might be higher than those for a 1982 title (e.g. Defender vs. Saidajou).

My initial sample set would be:
STG:={Defender,Time Pilot,Terra Cresta,Darius,Thunder Cade,Twin Eagle,Area 88,Tatsujin,Batsugun,Don Pachi,Battle Garegga,...}

The list does not follow the suggestions I make above, but I am wondering what results these titles might give you, to be honest.
Chomsky, Buckminster Fuller, Yunus and Glass would have played Battle Garegga, for sure.
austere
Posts: 680
Joined: Mon Mar 22, 2010 10:50 am
Location: USA

Re: A metric for fairness in STGs

Post by austere »

BareKnuckleRoo wrote:Giga Wing is a fundamentally "fair" shmup
Yep, that one is a classic, though I haven't gone past the 5th stage to judge. Definitely one I'd want the AI to learn given its varied mechanics and effectively unlimited bombs.
BareKnuckleRoo wrote:Raiga Strato Fighter
This is a game I hadn't played before but it might be a good idea to slip in a few Gradius style games. It's interesting how you can change your direction here... might make it difficult to get the ML algorithm to converge though.
Randorama wrote:1. How many papers do you plan to prepare from this work? (aka the "show me the money!" question :lol: )
Now that I'm out of academia, that kind of decision is no longer up to me. :) But that does mean that I have the resources to make it happen if I can pitch the idea to my superiors and a paper is bound to happen if it is a success.
Randorama wrote:2. Would "fairness" include an historical dimension?
This is the kind of discussion I wanted to open... like, what would have been considered fair in the early 90s with sniper tanks could potentially be unfair today? My STG knowledge is limited to a set of games I enjoy and play a lot, but people on here have a vast experience in the genre.
Randorama wrote:I believe that you may find out that older games were less fair because they featured design choices that the programmers could not avoid due to inexperience, as they lacked a full grasp of what could count as "unfair" (e.g. the instant laser type you mention).
Precisely the kind of games we want to avoid in the set. I think the learning algorithm would overfit to those scenarios anyway and wouldn't fare better in more modern unfair circumstances.
Randorama wrote:1. Does the number and (formal) type of "unfair" factors decrease over the evolution of the genre?
Would be a very interesting question, I think it would definitely decrease, this is part of the motivation behind thinking about such a metric and a way to measure it.
Randorama wrote:2. Are there game-external factors that play a role in the eventual change in (un)fairness value?

The second question may not look like a question that an AI paper should address, but it might help as an "idea in the background to test".
This is actually a very interesting question as you put it, games will often be the most fair if they are a reflection of reality. For example, we all recognize a tank turret as the origin of any projectile it may launch. A laser charging or making some sound prior to firing (or some translucent hue where it's about to attack) is recognizable given the context of the world around us. But, as awesome of a game as it is, random biological parts on insects in Mushihimesama do not give you any hint about where fire will come from. Nor would ESP attacks in ESPrade (or the attacks from hibachi in DDP).

But will some knowledge in the real world feed back into the prior knowledge a player would assume over time? This is difficult to measure but is actually crucial to the question. The attempt here was to bypass it by asking the most experienced people in STGs to provide the "eigenvector" games would be measured against, assuming they contain the human knowledge necessary to clear them.
Randorama wrote:STG:={Defender,Time Pilot,Terra Cresta,Darius,Thunder Cade,Twin Eagle,Area 88,Tatsujin,Batsugun,Don Pachi,Battle Garegga,...}
I would avoid the first two to restrict it to autoscrollers but this is actually a great list and the last four entries were on my mind but it's probably biased by my own taste in STGs. With an exception... I feel that Don Pachi is actually unfair. In the 4th stage, enemies approach you quickly from behind and can destroy you without time to react or get out of the way. I guess this could teach the AI to avoid hugging the walls of the screen. I'll concede you probably have more experience in this though so I don't put much weight into my own unfairness label.
<RegalSin> It does not matter, which programming language you use, you will be up your neck in math.
BareKnuckleRoo
Posts: 6165
Joined: Mon Oct 03, 2011 4:01 am
Location: Southern Ontario

Re: A metric for fairness in STGs

Post by BareKnuckleRoo »

austere wrote:Yep, [Giga Wing] is a classic, though I haven't gone past the 5th stage to judge. Definitely one I'd want the AI to learn given its varied mechanics and effectively unlimited bombs.
Stages 6 and 7 are just abnormally long boss fights against multi-phase bosses. Shouldn't pose any challenge for machine learning AI as there is no randomness or anything particularly "weird" in terms of how the game works (stage 5 is arguably the nastiest of the stages in that there are some unmanageable enemies that require well executed reflects, or else you will get trapped and have to bomb).
austere wrote:Raiga Strato Fighter -
This is a game I hadn't played before but it might be a good idea to slip in a few Gradius style games. It's interesting how you can change your direction here... might make it difficult to get the ML algorithm to converge though.
Yeah, this and Deathsmiles both would prove troublesome. I was thinking that due to the two way shots and brutal depowering on death, this would be a taxing shmup for an AI to learn. That or throw a Parodius game at it.
austere wrote:I feel that Don Pachi is actually unfair. In the 4th stage, enemies approach you quickly from behind and can destroy you without time to react or get out of the way. I guess this could teach the AI to avoid hugging the walls of the screen.
DonPachi has even nastier instances of this in its last level. They're all learnable and the enemies come in slow enough that it's not what I'd call unfair, but you definitely will die if you're hugging the screen there. It's not something CAVE or many other competent shmup makers did though, and DonPachi was early in their career.

They also did this in ProGear, but literally at only one spot in the game for a single enemy wave (Stage 5, when you descend vertically and transition again to horizontal movement) so it's quite easy to remember and expect it, and is not particularly challenging from an execution standpoint to avoid when you know it's coming.
austere
Posts: 680
Joined: Mon Mar 22, 2010 10:50 am
Location: USA

Re: A metric for fairness in STGs

Post by austere »

BareKnuckleRoo wrote:Yeah, this and Deathsmiles both would prove troublesome. I was thinking that due to the two way shots and brutal depowering on death, this would be a taxing shmup for an AI to learn. That or throw a Parodius game at it.
I guess in theory you may want to provide certain bits of information ahead of time, such as whether a game allows you to change directions or whether it's a medal/rank based game. But this is what would degrade the definition. I guess this kind of ties into the second point that Randorama made. There's nothing to stop the reinforcement learning algorithm from learning these kinds of games (since from the perspective of the algorithm it's all one game with different starting conditions). What would break things is how quickly the online reinforcement algorithm can detect the game mode. Anyway, the concern about medal/rank is more a matter of difficulty than of fairness.
<RegalSin> It does not matter, which programming language you use, you will be up your neck in math.
Despatche
Posts: 4196
Joined: Thu Dec 02, 2010 11:05 pm

Re: A metric for fairness in STGs

Post by Despatche »

I'm concerned that the results you want will not be there, that what you get will not be particularly interesting, and that the effort is just going to be "proof" for whiners that they can keep whining about their bullshit definition of "fairness".

The only thing that defines "fairness" in any game is:

1. whether or not the player can complete a game without having to waste a resource in games that are specifically designed around that

2. whether or not the player can complete a game using all the resources available to them in games that are specifically designed around that

This is for survival. Scoring would put additional scrutiny on point 1, and change point 2 to "being able to use all given resources effectively". This is why Battle Garegga is so fucking good, it was also really hard to design. This is also related to why Ikaruga is such a better game than Radiant Silvergun.

There are indeed levels to this; ridiculous difficulty curves are obviously a problem... but they need to be carefully understood, as "difficulty curve" has a different definition to different people because everyone seems to find some things harder and some things easier than others would. Being "technically possible" is obviously not that interesting, as it takes a very specific kind of person that you can't just weed out of 6+ billion people so easily; this is what the tool-assisted speedrun is for. Likewise, something like Raiga, Xexex, Daioh, etc might be questionable at a high level due to their random extends, but very few people will get to the point where this actually changes how they play the game.

Another thing TASes do for us: as a general rule, whining about "not seeing that coming" is... whining. There's no point in debating over whether a human could beat a given game on their first try, because no one really cares whether a human "could possibly" do something until they actually can do it, making so-called "human theory TASes" worthless. The full TAS is much more important because the additional work required either reveals breakthroughs that a human theory TAS would not (because the human theory TAS is concerned about the "current state" of the metagame), or can be improved on by a better full TAS that may reveal its own breakthroughs.

Just like how the human theory TAS is unnecessarily limiting, obsessing over whether a player could theoretically complete a game on their First Try Ever (which way too many people obsess over for some reason) is unnecessarily limiting.

There's probably also a point to be made about humans learning better from failures.

Not entirely sure why this is in OT. I guess it applies to other arcade genres, but the thread doesn't cover those yet.
Rage Pro, Rage Fury, Rage MAXX!
Randorama
Posts: 3503
Joined: Tue Jan 25, 2005 10:25 pm

Re: A metric for fairness in STGs

Post by Randorama »

Late reply, and I need to cherry-pick...
austere wrote: Now that I'm out of academia, that kind of decision is no longer up to me. :) But that does mean that I have the resources to make it happen if I can pitch the idea to my superiors and a paper is bound to happen if it is a success.
If your superiors are a certain type of corporate venture (or even government, but not academia), a paper can have a fairly big role for your eventual evaluation :wink:
austere wrote:This is the kind of discussion I wanted to open... like, what would have been considered fair in the early 90s with sniper tanks could potentially be unfair today? My STG knowledge is limited to a set of games I enjoy and play a lot, but people on here have a vast experience in the genre.
I think that this would potentially be a follow-up, or a second/third experiment in a series. In a sense, you would reverse-engineer "history": some features may suddenly disappear from the genre because at some point players triggered a backlash against programmers (think of enemies from behind/bottom of the screen).



austere wrote:This is actually a very interesting question as you put it, games will often be the most fair if they are a reflection of reality. For example, we all recognize a tank turret as the origin of any projectile it may launch. A laser charging or making some sound prior to firing (or some translucent hue where it's about to attack) is recognizable given the context of the world around us. But, as awesome of a game as it is, random biological parts on insects in Mushihimesama do not give you any hint about where fire will come from. Nor would ESP attacks in ESPrade (or the attacks from hibachi in DDP).

But will some knowledge in the real world feed back into the prior knowledge a player would assume over time? This is difficult to measure but is actually crucial to the question. The attempt here was to bypass it by asking the most experienced people in STGs to provide the "eigenvector" games would be measured against, assuming they contain the human knowledge necessary to clear them.
This is tricky, so it could be the 10th experiment down the line. Real world knowledge is a rather vague knowledge, but you may focus on something simpler such as "does the game give cues (visual, aural) on insta-lasers and other potential unfair threats?"
Despatche wrote:The only thing that defines "fairness" in any game is:

1. whether or not the player can complete a game without having to waste a resource in games that are specifically designed around that

2. whether or not the player can complete a game using all the resources available to them in games that are specifically designed around that
Just to be sure that I get your definitions right, Despatche:

1. For 1, we can say that Garegga is fair. You can complete the game without losing a life, even if the game is designed around throwing lives away (well, ok, good luck with trying!).

2. For 2, we can say that Giga Wing is fair. You can complete the game by using the shield, bombs and lives (rank goes down), which are the three key resources at your disposal.

I think that more specific cases would emerge in context, e.g. : game has insta-lasers of death? Use shield resource which saves your butt as long as you notice laser 1 millisecond before (and so on).

Re: learning from failure. Yes, I agree, but it is a fairly complex topic. I think that the more we ruminate about it, the better our ideas will become.

My two cents to get the ball rolling: I guess that programmers could use this aspect to design sequels, and have some carry-over effect.
Say, learning to dodge dense clusters of bullets in Don Pachi made players ready for Do DonPachi, so players felt that the sudden increase in bullets (...I guess) was not so unfair. How to quantify this effect?
Chomsky, Buckminster Fuller, Yunus and Glass would have played Battle Garegga, for sure.
PC Engine Fan X!
Posts: 8433
Joined: Wed Jan 26, 2005 10:32 pm

Re: A metric for fairness in STGs

Post by PC Engine Fan X! »

Despatche wrote:Just like how the human theory TAS is unnecessarily limiting, obsessing over whether a player could theoretically complete a game on their First Try Ever (which way too many people obsess over for some reason) is unnecessarily limiting.

There's probably also a point to be made about humans learning better from failures.

Not entirely sure why this is in OT. I guess it applies to other arcade genres, but the thread doesn't cover those yet.
I played a 1CC session on a Soul Calibur 2 arcade cab despite not having played it before with the character Ivy and was able to enter my high score initials at the very end for posterity. So yes, the theory of doing such a 1CC within an arcade based genre is doable/possible these days (whether it's STG or Fighting genre or any other arcade game genre for that matter). Or it could be down to "good ol' fashioned luck" in beating every single AI opponent thrown at me within that single session and not knowing the proper fighting mechanics/movements/combos beforehand (considering that SC2 is a complicated 3D based fighting game to begin with), right? Also, the initial arcade operator's setting for that particular SC2 cab, I do not know what settings were used -- if they were on default or using a "custom" difficulty setting to begin with (so there's that part of the equation that is unknown beforehand to take into consideration as well). What was surprising about this arcade SC2 1CC is, that I'm just a casual fighting genre player myself and am not a hard-core one at that particular arcade game genre. You could say that it was just a random fluke or a stroke of luck that single arcade SC2 1CC stint occurred. I haven't played another arcade SC2 session since then.

PC Engine Fan X! ^_~
austere
Posts: 680
Joined: Mon Mar 22, 2010 10:50 am
Location: USA

Re: A metric for fairness in STGs

Post by austere »

Despatche wrote:The only thing that defines "fairness" in any game is:

1. whether or not the player can complete a game without having to waste a resource in games that are specifically designed around that

2. whether or not the player can complete a game using all the resources available to them in games that are specifically designed around that
Here's the thing: while this is the obvious absolute binary choice, you could come up with a game that completely adheres to either one of your conditions but requires a priori knowledge for first completion. For example, you're given a single choice and I ask you to pick a number from 1-50,000, which has to match the one I know is the correct number. At the end I'll tell you that number and compare it to yours. I always choose the same number so, if you lose the first time, you could always win the next time. This is exactly the same thing with shooters that require a priori knowledge to complete. But then, you needed a priori knowledge to complete the number picking game too, i.e. you need to understand English. And that's precisely what I want to remove from the equation by picking some "eigenshooters".
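
The number-picking example can be written out as a toy Python sketch. The value of `SECRET` is made up for illustration; everything else follows the description: the secret never changes, it is revealed after every play, and a player with no prior knowledge has a 1 in 50,000 chance on the first play and a certain win on the second.

```python
import random

SECRET = 31_337     # hypothetical fixed answer; never changes between plays
N_CHOICES = 50_000


def play(guess: int) -> tuple[bool, int]:
    """One play: returns (won, revealed_secret). The secret is revealed either way."""
    return guess == SECRET, SECRET


def first_and_second_play(rng: random.Random) -> tuple[bool, bool]:
    """Guess blindly on the first play, then reuse what was revealed."""
    won_first, revealed = play(rng.randint(1, N_CHOICES))
    won_second, _ = play(revealed)
    return won_first, won_second
```

The game satisfies "can be completed with the resources given" on every play after the first, yet the first play is essentially a coin flip with terrible odds, which is the gap the first-play metric is meant to expose.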
Randorama wrote:If your superiors are a certain type of corporate venture (or even government, but not academia), a paper can have a fairly big role for your eventual evaluation :wink:
Oh I know :) That's not the big motivation here though, I'm thinking of something much bigger. Think about how you could put such an AI in an even bigger outer loop to start creating things, not just evaluating them. Measure twice, cut once as they say.
Randorama wrote:I think that this would potentially be a follow-up, or a second/third experiment in a series. In a sense, you would reverse-engineer "history": some features may suddenly disappear from the genre because at some point players triggered a backlash against programmers (think of enemies from behind/bottom of the screen).
It's precisely this kind of data that I would be fascinated to see. Getting tools like this into data scientists' hands would be fascinating. Imagine the kind of clustering you could see in games' mechanics as measured by a virtual player.
Randorama wrote:This is tricky, so it could be the 10th experiment down the line. Real world knowledge is a rather vague knowledge, but you may focus on something simpler such as "does the game give cues (visual, aural) on insta-lasers and other potential unfair threats?"
Yep, this would be much further down the line, I'm pretty much starting out with "can this be finished on the first shot, if not how far can you go?" and building up from there.
Randorama wrote:I think that more specific cases would emerge in context, e.g. : game has insta-lasers of death? Use shield resource which saves your butt as long as you notice laser 1 millisecond before (and so on).
Much more straight to the point than my example but yes!
Randorama wrote:Say, learning to dodge dense clusters of bullets in Don Pachi made players ready for Do DonPachi, so players felt that the sudden increase in bullets (...I guess) was not so unfair. How to quantify this effect?
More potential uses like this would be very interesting to hear. Maybe we can coauthor something about this if I get this off the ground.
PC Engine Fan X! wrote:Also, the initial arcade operator's setting for that particular SC2 cab, I do not know what settings were used -- if they were on default or using a "custom" difficulty setting to begin with (so there's that part of the equation that is unknown beforehand to take into consideration as well).
I'll only be looking at default settings to be honest, since realistically this is how most people would play games. I usually play fighting games on the hardest difficulty when playing against the computer but for STGs I only use default.
<RegalSin> It does not matter, which programming language you use, you will be up your neck in math.
Ed Oscuro
Posts: 18654
Joined: Thu Dec 08, 2005 4:13 pm
Location: uoıʇɐɹnƃıɟuoɔ ɯǝʇsʎs

Re: A metric for fairness in STGs

Post by Ed Oscuro »

Very interesting material all across the board, thanks everyone! :mrgreen:

@Despatche: We need to manage our expectations about the project. Ideally the process is iterative; it is not reasonable to ask a machine learning project to solve what are essentially political issues, especially not on the first try. It should only succeed in its designed goals, so we start with clear definitions and a path towards later expansion. Pages and pages of text have been wasted by opinion writers and philosophers, working without much useful data, on the question "What should the self-driving car hit, if it has to choose?" If you do not have knowledge about how a system will perform, the discussions will veer beyond setting guidelines. Writers end up uselessly reiterating their conventional biases or assumptions, or they provoke political fights when their biases turn out not to be universally held. Of course these discussions can be useful - the system has to be designed according to human wishes - but suggestions have to remain plausible in order to advance the discussion. When the system is producing, we not only see how well it works, but also gain clarity into our own biases and ignorance. On the practical level, I think it is quite interesting to see what kinds of situations are engaging enough to be called a "game" that requires executing complicated patterns to solve, while also remaining "fair." And in general, though apparently out of the scope of austere's proposal, AI was traditionally marketed as finding ways for computer programs to attain general knowledge, which seems to indicate self-directed selection of winning criteria.

@austere: I hope this is not too long and repetitive... Despite what I write above, I think that marketing this as a test of "fairness" is misleading because a definition of "fairness" encompasses qualities that I do not see a simple system having a direct enough analogue for, not without many factors modeled (mostly human factors). Some are already mentioned:
- Human vision and muscle control / fatigue limits, and limits of the physical interface. (Which is a 'fairer' competition: A quiz about ancient Chinese history, or a feat of strength? Depends on the contestant. Likewise, a computer can easily control values and successfully stick-waggle for an arbitrary amount of time, but without training it will easily miss cues obvious to the human child.)
- A priori knowledge / genre history (i.e., why did Space Panic fail? Was it 'just a bad game,' or was Universal creating too demanding a game for the player skillbase in 1980?)
- The computer's tendency to test cases normally avoided or not discernible to the human player (i.e., pseudorandom events a human player cannot visualize or manipulate, and also situations a human player will fear or know to avoid, but a virtual player will not)
- An intelligent player's ability to exercise free choice in engaging with, or avoiding, certain conditions

I do not think a priori knowledge is, in and of itself, a good enough definition of something left out of the definition of a 'fair' game, given that at all stages - including the win condition, and tactics within the game - there is a priori knowledge at work, promoting certain player choices. It seems to me it may be an unnecessary complication to try and denounce certain common patterns, because our perception of a "fair" death laser is based on subtle points of implementation. A virtual player may also be trained to spot and deal with such patterns, even if implementation is different from title to title, and this may foil the original aim of the program. This seems like a real problem to me, because even if one uses "general" genre knowledge from training on other games in the set, before attempting to gauge the 'fairness' of a particular game, there are opportunities for learning about certain 'unfair' events after they have been dealt with in other games. And likewise if the other games simply don't train for a type of event, that's not a sufficient condition for declaring a situation 'unfair' in my mind.

Finally, how will you implement the actual interface? The virtual game player projects that are competent enough to play games usually monitor the actual program state, so they can directly check the program state (and therefore progress towards the win condition). This often leads to glitches if the game data is structured in a way such that game progress is not completely represented by the most obvious variable (for example, if a variable that is used as a proxy for progress is reset mid-stage, as I recall being the case during Super Mario Bros, a problem for one virtual game player project on YouTube). It also allows the virtual game player to directly observe and manipulate internal game values. Even absent this ability, a virtual game player can easily track many more variables than a human player could, and possibly could achieve the same effect.
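To make the proxy-variable problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ProgressTracker` name, the `reset_threshold` value, and the idea that the proxy is a single integer read from game memory once per frame. It is not taken from any actual virtual-player project. It shows one way to keep a progress estimate from collapsing when the underlying counter is reset mid-stage: treat a large backward jump as a reset and carry the earlier progress forward.

```python
class ProgressTracker:
    """Accumulates progress from a proxy value that may be reset mid-stage.

    Hypothetical helper: the proxy is assumed to be a non-negative integer
    (for example, a scroll position) sampled once per frame.
    """

    def __init__(self, reset_threshold=64):
        # Assumed tuning value: a backward jump larger than this is treated
        # as the game resetting its counter, not as the player moving back.
        self.reset_threshold = reset_threshold
        self.base = 0          # progress banked from earlier segments
        self.segment_max = 0   # furthest point reached since the last reset
        self.last = None       # previous raw proxy reading

    def update(self, proxy):
        """Feed one raw reading; return the accumulated progress estimate."""
        if self.last is not None and self.last - proxy > self.reset_threshold:
            # Large backward jump: bank the finished segment and start over.
            self.base += self.segment_max
            self.segment_max = 0
        # Small backward moves never reduce the estimate.
        self.segment_max = max(self.segment_max, proxy)
        self.last = proxy
        return self.base + self.segment_max


if __name__ == "__main__":
    tracker = ProgressTracker()
    readings = [0, 50, 100, 95, 120, 3, 40]  # counter resets after 120
    print([tracker.update(r) for r in readings])
```

A naive reader that trusts the raw value would report 40 at the end of that run even though the player is well past the reset point. The threshold is itself a piece of game-specific prior knowledge, which is rather the point of the objection above.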

Keeping your proposed definitions, how will you detect the win condition absent the machine information, or special case / ad hoc definitions of the win state (which also seem to constitute a priori knowledge)? Looping requirements / multiple endings are like genre variants or games outside the intended selection; you can ignore them by definition, or you can admit them, but then they unavoidably invalidate the definition of progress as percentage completion, since such a definition does not apply. They also seem to bake in some prior knowledge.
colour_thief
Posts: 375
Joined: Mon Apr 30, 2007 12:41 am
Location: Waterloo, Ontario

Re: A metric for fairness in STGs

Post by colour_thief »

Is this thread just about ideas, or is work being done? I didn't think e.g. DQN was up to this sort of task, but would love to be proven wrong.
Despatche
Posts: 4196
Joined: Thu Dec 02, 2010 11:05 pm

Re: A metric for fairness in STGs

Post by Despatche »

Ed Oscuro wrote:@Despatche: We need to manage our expectations about the project. Ideally the process is iterative; it is not reasonable to ask a machine learning project to solve what are essentially political issues, especially not on the first try. It should only succeed in its designed goals, so we start with clear definitions and a path towards later expansion. Pages and pages of text have been wasted by opinion writers and philosophers, working without much useful data, on the question "What should the self-driving car hit, if it has to choose?" If you do not have knowledge about how a system will perform, the discussions will veer beyond setting guidelines. Writers end up uselessly reiterating their conventional biases or assumptions, or they provoke political fights when their biases turn out not to be universally held. Of course these discussions can be useful - the system has to be designed according to human wishes - but suggestions have to remain plausible in order to advance the discussion. When the system is producing, we not only see how well it works, but also gain clarity into our own biases and ignorance. On the practical level, I think it is quite interesting to see what kinds of situations are engaging enough to be called a "game" that requires executing complicated patterns to solve, while also remaining "fair." And in general, though apparently out of the scope of austere's proposal, AI was traditionally marketed as finding ways for computer programs to attain general knowledge, which seems to indicate self-directed selection of winning criteria.
My concern is that this project will only ever do more harm than good, no matter what form it could possibly take, and that this isn't exactly the same thing as general AI research and development...
Ed Oscuro wrote:I do not think a priori knowledge is, in and of itself, a good enough definition of something left out of the definition of a 'fair' game, given that at all stages - including the win condition, and tactics within the game - there is a priori knowledge at work, promoting certain player choices. It seems to me it may be an unnecessary complication to try and denounce certain common patterns, because our perception of a "fair" death laser is based on subtle points of implementation. A virtual player may also be trained to spot and deal with such patterns, even if implementation is different from title to title, and this may foil the original aim of the program. This seems like a real problem to me, because even if one uses "general" genre knowledge from training on other games in the set, before attempting to gauge the 'fairness' of a particular game, there are opportunities for learning about certain 'unfair' events after they have been dealt with in other games. And likewise if the other games simply don't train for a type of event, that's not a sufficient condition for declaring a situation 'unfair' in my mind.
...and this is why. This is correct thinking, but most people simply refuse to think like this, and some openly admit that they actively oppose such thinking. That's a dangerous environment for a project like this to be in. Gaming is a lot more political than general knowledge gathering, because of things like nostalgia and things like humans hating being told that they are wrong or that they failed something.

It's an environment that makes you want to stop thinking and stop caring about anything, because as time goes on there seem to be fewer and fewer people who really care about caring. It will very much expand beyond the bounds of the gaming field, and sooner rather than later.
Rage Pro, Rage Fury, Rage MAXX!
Ed Oscuro
Posts: 18654
Joined: Thu Dec 08, 2005 4:13 pm
Location: uoıʇɐɹnƃıɟuoɔ ɯǝʇsʎs

Re: A metric for fairness in STGs

Post by Ed Oscuro »

I knew I was walking right into that. Well, it should be said that I would never want a context-free AI being used as a yardstick of what is supposed to be acceptable in design for human players.

But that is why I mentioned managing expectations and being careful about branding. Perhaps it's too audacious to claim it is a "fairness" metric to please all comers, but I would still give austere a chance to try to make this work. I think there will still be lessons to learn, and they can't necessarily be predicted from the outset.
Randorama
Posts: 3503
Joined: Tue Jan 25, 2005 10:25 pm

Re: A metric for fairness in STGs

Post by Randorama »

Time is a tyrant, and so are notifications.

Very sorry for being late...
I hope you won't mind me cherry-picking the points to bring up again, to get a quick answer down.
Austere wrote: Oh I know :) That's not the big motivation here though, I'm thinking of something much bigger. Think about how you could put such an AI in an even bigger outer loop to start creating things, not just evaluating them. Measure twice, cut once as they say.
That's the next 10 years of the project, I guess.
Austere wrote: It's precisely also this kind of data that I would be fascinated in seeing. Getting tools like this into data scientists' hands would be fascinating. Imagine the kind of clustering you could see in games' mechanics as measured by a virtual player.
I am also wondering what kind of data you would get. I admit that I am struggling to predict anything, at least on gut intuition.
Austere wrote: More potential uses like this would be very interesting to hear. Maybe we can coauthor something about this if I get this off the ground.
Yes, we could. No hurry for me, but let us say that whenever you wish to talk more about this topic and there is a way to become productive as a team, I am all ears.
Chomsky, Buckminster Fuller, Yunus and Glass would have played Battle Garegga, for sure.