Take a (relatively) expensive function like your basic trigonometric co/sine - a critical tool for the geometric ideas inherent to many games.
Simple as it may seem to an end-user, trig functions are surprisingly complex to compute on a machine: the most intuitive implementations assume you already have ready access to the other trig functions (cos, tan, etc.), and the rest fall back on iterative approximation via a Taylor series expansion - a sum of terms built from powers and factorials. But that's no good! We're trying to implement sine from scratch, after all, and an iterative approximation built from already-expensive compositions of arithmetic is going to be even worse for performance!
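To make that cost concrete, here's a minimal sketch of the naive approach - a truncated Taylor series evaluated term by term. The term count and the crude range reduction are illustrative assumptions, not anything prescribed here:

```c
/* Naive sine via a truncated Taylor series:
 *   sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
 * Every extra term of accuracy costs more multiplies and divides. */
#define TWO_PI 6.28318530717958647692

static double taylor_sine(double x)
{
    /* Crude range reduction into roughly [-pi, pi] so the series converges. */
    while (x >  TWO_PI / 2.0) x -= TWO_PI;
    while (x < -TWO_PI / 2.0) x += TWO_PI;

    double term = x;   /* current term, starting at x^1 / 1! */
    double sum  = x;   /* running total of the series        */
    for (int n = 1; n < 8; n++) {
        /* Each pass turns the current +/-x^(2n-1)/(2n-1)! term into the
         * next one, paying several multiplies and a divide along the way. */
        term *= -(x * x) / ((2.0 * n) * (2.0 * n + 1.0));
        sum  += term;
    }
    return sum;
}
```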
So how to get around the issue? Well, we can start by observing that sine is a pure function over the domain of real (decimal) numbers. It has the signature X ⊳ X, where X is some type of number, terms to the left of the ⊳ are inputs, and ones to the right are outputs. Which is to say, it takes a given type of number, and hands back that same type of number. And being 'pure', it always returns the same output for a given input.
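In C terms, that signature is nothing more than a prototype like the one below - the type and name are placeholders for illustration:

```c
typedef float real;        /* stand-in for "some type of number" X          */
real my_sine(real angle);  /* X -> X: the caller sees only this signature,
                              and the same input always yields the same output */
```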
In a logical sense, we don't really care what happens inside our sine function, so long as it takes some X, produces the corresponding correct output, and is fast - that's the beauty of abstraction, after all.
So, we're free to change its implementation arbitrarily, so long as we promise not to return incorrect output - but how do we make it faster? We start by defining how long our naive calculation will take, which is (very roughly) O(number-of-algebraic-ops). That's 'Big O' notation, which measures a function's time-to-run relative to the size of its inputs. In this case, we spend some time for each add, subtract, multiply, and divide, so the total cost grows with the number of those operations - exactly how long each one takes is then down to our hardware.
And that's where arrays come in: computationally, array lookup happens in O(1) time. The input factors here are the size of the array and the offset we want to look up, but neither appears in the notation, because a lookup takes the same fixed time regardless of either. Hence the constant '1' - and hence why arrays are so popular for storing lots of values you need quick access to.
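That constant-time claim boils down to address arithmetic: a lookup is one computed address plus one load, however big the table is. A sketch (the names here are made up for illustration):

```c
#include <stdint.h>

static const int8_t table[256] = { 0 /* ... */ };

/* table[i] compiles down to "load from (base address + i * element size)":
 * one address computation and one memory access, regardless of how long
 * the table is or which offset we ask for.                               */
static int8_t lookup(uint8_t i)
{
    return table[i];   /* equivalent to *(table + i) */
}
```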
But, how do we leverage an array to drive a sine function? It doesn't have any intrinsic properties that'll help us do trigonometry or approximation, and needs a known fixed size in order to be useful. Contrast that to our naive sine, which - as a function over real numbers - operates over arbitrarily-granular quantities.
Well, since we have to simplify down to integers to please the machine, we naturally have a finite set of values that such a function might take and return - 0 to 255, if we're on 8-bit hardware and not (yet) getting into the realm of emulating higher-precision quantities. Since sine is a pure function - same input, same output - we might as well run 256 calculations on our development machine, one per input step across a full cycle of sine (with outputs spanning -1.0 to 1.0), put the results into an array, and rewrite our program's sine to look the answer up directly instead of figuring it out manually.
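Here's a sketch of how that could look, assuming a signed 8-bit fixed-point output where 127 stands in for 1.0 and the 8-bit input counts 256ths of a full circle - both encodings are assumptions for illustration. In the real workflow the table would be generated once on the development machine and baked into the program as a constant initializer; it's filled at startup here only to keep the example self-contained:

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* 256-entry table: index = angle in 1/256ths of a full circle,
 * value = sin(angle) in signed 8-bit fixed point, where 127 ~= 1.0. */
static int8_t sine_table[256];

static void build_sine_table(void)
{
    for (int i = 0; i < 256; i++) {
        double angle = (double)i / 256.0 * 2.0 * 3.14159265358979323846;
        sine_table[i] = (int8_t)lround(sin(angle) * 127.0);
    }
}

/* The rewritten sine: one array lookup, O(1), no arithmetic at all. */
static int8_t fast_sine(uint8_t angle)
{
    return sine_table[angle];
}

int main(void)
{
    build_sine_table();
    /* 64/256 of a circle = 90 degrees, so this should print ~127. */
    printf("sin(90 deg) ~ %d\n", fast_sine(64));
    return 0;
}
```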
And presto, a speedup from the hefty O(number-of-algebraic-ops) to an ideal-case O(1), with no loss in precision, for the nominal cost of ~256 bytes of program memory. Not bad!
But, what if we're severely memory constrained, and can only afford half that? We could cover the same full cycle with 128 entries (indices 0-127) instead, and rewrite the sine function to divide the input by 2 on each lookup; that would cost one divide per call, but still be markedly cheaper than the full formal calculation.
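Continuing the sketch above (same includes and the same assumed fixed-point scale), the half-size variant might look like this:

```c
/* Half-resolution table: 128 entries still span one full cycle. */
static int8_t half_sine_table[128];

static void build_half_sine_table(void)
{
    for (int i = 0; i < 128; i++) {
        double angle = (double)i / 128.0 * 2.0 * 3.14159265358979323846;
        half_sine_table[i] = (int8_t)lround(sin(angle) * 127.0);
    }
}

/* The input is still a full 0..255 angle; we pay one divide per call
 * (which a compiler will emit as a cheap right shift) to map it down. */
static int8_t fast_sine_small(uint8_t angle)
{
    return half_sine_table[angle / 2];
}
```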
A compiler could conceivably do this sort of optimization, but would need context from a human programmer (or perhaps machine specs and static analysis) to reason about how much time and memory are available, and how they should be prioritized.
But, to take it a step further into full intuitive reasoning, you'd need a human to step back, take a broader view, ask "if we can't afford 256 bytes, do we really need that precision in the input?", and work their way back through the program. Depending on the use-case, they could potentially clamp the input to 0-127 from the start, staying within the memory budget while also avoiding the extra divide on every lookup.
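In sketch form, that program-wide decision might reduce to something like this - again assuming the hypothetical half-size table and types above:

```c
/* If the whole program agrees that 128 angle steps are enough, the
 * narrower input indexes the half-size table directly: no divide,
 * no 256-byte table, just one lookup.                              */
static int8_t fast_sine_narrow(uint8_t angle_0_to_127)
{
    return half_sine_table[angle_0_to_127 & 0x7F]; /* mask keeps us in bounds */
}
```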
In theory a compiler could also figure out that sort of wide-scope micro-optimization, but in practice, it's a minefield if the user still has access to types which profess to directly represent machine quantities. Which isn't to say they don't try - production C-* compilers have a terrifying breadth of competing heuristics to eke out every last unsafe-but-safe op - but there's only so far you can go before the problem becomes unscalable and results in the 'best attempt' output that's so maligned by hardline assembly hand-coders.
But, I contend that there are still ways and means, given specialized-enough languages and compilers. Not from the mainstream, given the march of hardware and economies of scale, but a sufficiently-motivated enthusiast armed with bleeding-edge PL theory and a mountain of static analysis might just stand a chance at rivaling hand-tuned machine code.