Old School Programming (Arcade/Genesis/SNES/PCE)

A place for people with an interest in developing new shmups.
flushdoor
Posts: 1
Joined: Sun Oct 27, 2024 12:38 pm

Old School Programming (Arcade/Genesis/SNES/PCE)

Post by flushdoor »

Old school programming has always fascinated me. And I don't mean writing the same C-style code but in assembly; there is a huge difference in how everything is arranged. It's a breath of fresh air compared to high-level languages like C. So, for people who are experienced in the old school way: how do games like Gradius/R-Type, Truxton, etc. handle stuff like:
  • Enemy waves/levels/playlist
  • Collision
  • Anything you find interesting
From my research, it seems these games make heavy use of distinct arrays. For example, they have separate arrays for player bullets, enemy bullets, and the enemies themselves (which I think are also split into air vs. ground), which makes it simple to check for collisions between each group.
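In rough C terms, the layout I'm picturing is something like this (a sketch of my own guess - the names and sizes are made up, not from any particular game):

/* Distinct fixed-size arrays per object group -- sizes and names
 * are illustrative guesses, not taken from any actual game. */
#define MAX_PLAYER_SHOTS 16
#define MAX_ENEMY_SHOTS  64
#define MAX_ENEMIES      32

struct obj {
    int active;          /* is this slot in use? */
    int x, y;            /* position */
    int w, h;            /* hitbox size */
};

struct obj player_shots[MAX_PLAYER_SHOTS];
struct obj enemy_shots[MAX_ENEMY_SHOTS];
struct obj air_enemies[MAX_ENEMIES];    /* hit by normal shots */
struct obj ground_enemies[MAX_ENEMIES]; /* hit only by bombs */

/* Axis-aligned box overlap test. */
static int overlap(const struct obj *a, const struct obj *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

/* Collision then reduces to checking one group against another. */
void check_player_shots_vs_air(void)
{
    int i, j;
    for (i = 0; i < MAX_PLAYER_SHOTS; i++) {
        if (!player_shots[i].active) continue;
        for (j = 0; j < MAX_ENEMIES; j++) {
            if (air_enemies[j].active &&
                overlap(&player_shots[i], &air_enemies[j])) {
                /* kill enemy, remove shot, award score... */
            }
        }
    }
}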

Long answers, walls of text, and examples are much appreciated. This style of programming is basically a lost art: the people who know it don't write about it, and the people who don't know it call it outdated.
Lander
Posts: 1213
Joined: Tue Oct 18, 2022 11:15 pm
Location: Area 1 Mostly

MOV 707 RAX; HCF;

Post by Lander »

Megaton choice for a first thread/post - good stuff, and welcome aboard :)

We have a few domain specialists around and about like Sumez and orange808, who'll show up before long with a bit of luck.

I'm somewhat on the opposite end, myself - hobbyist compiler engineer, and a disciple of Church (Alonzo) over Turing - but might still be able to offer a bit of theoretical insight. Firm believer that optimal algorithms are still optimal whether written by man or machine, tricky as the latter may be to get categorically right!

Bugger C though, respectable elder god as it may be - terrible example of our collective tendency to build tall on basic tools instead of walking alongside academia for a smarter solve. Seems self-defeating to build high-level languages that still require the human to care about machine ideas over strictly logical ones, especially as programs scale beyond our total-reasoning capacity.

That, and its bugs tend to be way more boring than the equivalent hand-crafted ASM hotrod :mrgreen:

And well, I suppose that's what it comes down to. If speed is the name of the game because all you have is ~7.6MHz, then cheating as hard as you can for the same output is the only option.

Lookup tables - or LUTs - are an interesting conjunction of arrays and 'work smarter, not harder'.
Spoilered for significant length:
On Lookup Tables
Take a (relatively) expensive function like your basic trigonometric co/sine - a critical tool for the geometric ideas inherent to many games.
Simple as it may seem to an end-user, trig functions are surprisingly complex to compute on a machine; the most intuitive versions require already having ready access to the other trig functions (cos, tan, etc.), and otherwise necessitate iterative approximation using factorials and Taylor Series Expansion. But that's no good! We're trying to implement sine from scratch, after all, and iterative approximations over already-expensive compositions of arithmetic are going to be even worse for performance!
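For illustration, here's the textbook Taylor-series route sketched in C - my own throwaway example, not anything from a real game:

#include <stdio.h>

/* Naive Taylor-series sine: sin(x) ~= x - x^3/3! + x^5/5! - x^7/7!
 * Every extra term costs several multiplies and a divide -- exactly
 * the kind of arithmetic pile-up we can't afford per frame. */
double taylor_sine(double x)
{
    double term = x;  /* current term, starting at x^1/1! */
    double sum  = x;
    int n;
    for (n = 1; n <= 3; n++) {
        /* next term = -previous * x^2 / ((2n)(2n+1)) */
        term *= -(x * x) / ((2.0 * n) * (2.0 * n + 1.0));
        sum  += term;
    }
    return sum;
}

int main(void)
{
    printf("%f\n", taylor_sine(3.14159265 / 6.0)); /* ~0.500000 */
    return 0;
}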

So how to get around the issue? Well, we can start by observing that sine is a pure function over the domain of real (decimal) numbers. It has the signature X ⊳ X, where X is some type of number, terms to the left of the ⊳ are inputs, and ones to the right are outputs. Which is to say, it takes a given type of number, and hands back that same type of number. And being 'pure', it always returns the same output for a given input.

In a logical sense, we don't really care what happens inside our sine function, so long as it takes some X and produces corresponding correct output, and is fast - that's the beauty of abstraction, after all.

So, we're free to change its implementation arbitrarily, so long as we promise not to return incorrect output - but how do we make it faster? We start by defining how long our naive calculation will take, which is (very roughly) O(number-of-algebraic-ops). That's 'Big O' notation, which describes how a function's running time scales with its inputs. In this case, we spend some time on each add, subtract, multiply, and divide, so the total is a fixed cost that depends on how many of those ops we perform and how fast they are on our hardware.

And that's where arrays come in: computationally, array lookup happens in O(1) time. The input factors here are the size of the array and the offset we want to look up, but neither of them is mentioned in the notation, because array lookup always takes a fixed amount of time regardless of either. Hence, it's notated as a constant 1 'time', which is what makes arrays so popular for optimizing storage of many values.
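In C terms, the whole 'lookup' boils down to base address plus offset - a toy illustration:

/* An array access is just base + offset: one indexed load, whether
 * the table holds 16 entries or 16,000. */
unsigned char table[256];

unsigned char lookup(unsigned char i)
{
    return table[i];  /* something like move.b (a0,d0.w),d0 on a 68000 */
}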

But, how do we leverage an array to drive a sine function? It doesn't have any intrinsic properties that'll help us do trigonometry or approximation, and needs a known fixed size in order to be useful. Contrast that to our naive sine, which - as a function over real numbers - operates over arbitrarily-granular quantities.

Well, since we have to simplify down to integers to please the machine, we naturally have a known, finite set of values that such a function might take and return - 0 to 255, if we're on 8-bit hardware and not (yet) getting into the realm of emulating higher-precision quantities. Since sine is a pure function - same input, same output - we might as well run 256 calculations on our development machine (spanning -1.0 to 1.0 to cover one full period of sine), put the results into an array, and rewrite our program's sine to look the answer up directly instead of figuring it out manually.
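A rough sketch of the two halves in C - the signed-byte scaling is just my own choice of format:

/* Offline generator: runs on the dev machine, not the target. */
#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979

int main(void)
{
    int i;
    printf("signed char sine_table[256] = {\n");
    for (i = 0; i < 256; i++) {
        /* One full period over 256 steps; -1.0..1.0 scaled to -127..127 */
        double s = sin(i * 2.0 * PI / 256.0);
        printf("%4d,%s", (int)(s * 127.0), (i % 8 == 7) ? "\n" : " ");
    }
    printf("};\n");
    return 0;
}

/* In the game itself, sine collapses to a single lookup: */
extern signed char sine_table[256];

signed char fast_sine(unsigned char angle)
{
    return sine_table[angle];
}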

And presto, a speedup factor from the hefty O(number-of-algebraic-ops) to an ideal-case O(1), with no loss in precision, for the nominal cost of ~256 bytes of program memory. Not bad!

But what if we're severely memory constrained, and can only afford half that? We could span -1.0 to 1.0 over 0-127 instead, and rewrite the sine function to divide the input by 2 on each lookup; that would cost one divide (really just a right-shift) per call, but still be markedly cheaper than the full formal calculation.
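Something like this, continuing the sketch from above:

/* Half-size table: 128 entries still covering one full period. */
extern signed char sine_table_128[128];

signed char fast_sine_half(unsigned char angle)
{
    return sine_table_128[angle >> 1];  /* divide-by-2 is a single shift */
}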
A compiler could conceivably do this sort of optimization, but would need context from a human programmer (or perhaps machine specs and static analysis) to reason about how much time and memory are available, and how they should be prioritized.

But, to take it a step further into full intuitive reasoning, you'd need a human to step back, take a broader view, ask "if we can't afford 256 bytes, do we really need that precision in the input?", and work their way back through the program. Depending on the use-case, they could potentially clamp the input to 0-127 from the start, and stay within memory budget while also avoiding lookup overhead.
In theory a compiler could also figure out that sort of wide-scope micro-optimization, but in practice, it's a minefield if the user still has access to types which profess to directly represent machine quantities. Which isn't to say they don't try - production C-* compilers have a terrifying breadth of competing heuristics to eke out every last unsafe-but-safe op - but there's only so far you can go before the problem becomes unscalable and results in the 'best attempt' output that's so maligned by hardline assembly hand-coders.

But, I contend that there are still ways and means, given specialized-enough languages and compilers. Not from the mainstream, given the march of hardware and economies of scale, but a sufficiently-motivated enthusiast armed with bleeding-edge PL theory and a mountain of static analysis might just stand a chance at rivaling hand-tuned machine code 8)