That metallic sound is characteristic of FM synthesis. FM is what the Mega Drive/Genesis used to generate sound. More importantly, it's what Japanese PCs of the 1980s like the Sharp X series or NEC PC-88/98 used. Because they were the most powerful tools available to Japanese programmers at the time, the X68000 and the like became the devkits for arcade development almost by default. As a consequence, you'll hear FM synthesis in pretty much every Japanese arcade game from the second half of the 1980s, from Out Run to Final Fight to, yes, Image Fight (and anything else on Irem's M72 board).
FM, or frequency modulation synthesis, only uses sine waves. The sine is the purest waveshape there is: a single frequency with no overtones, irregularities or angles, which is why it's treated as the fundamental building block of sound. With sufficient changes to its frequency, it can theoretically create any other sound.
In FM, you change - or 'modulate' - the frequency of a sine using another sine. Say a sound wave is at a certain medium pitch: you can move that pitch using the shape of another sine (not its sound) as a guide, so the pitch goes up and down, following the curve of the sine being used as a modulator. In a simple sense, this mimics what happens when a guitar string vibrates or a violinist adds vibrato.
In practice, it looks like this:

1. You take the top sine.
2. Apply it as a guide to the middle wave, our "carrier." This is the one people can actually hear.
3. The resulting sound is the new waveform at the bottom. See how in the bottom wave the wiggles get further apart at the same spots the first wave at the top dips low? That means it's now oscillating more slowly in those parts, producing a lower pitch, thanks to the modulation.
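If it helps to see those three steps as code, here's a rough Python sketch of two-sine FM. It's only an illustration, not how the Genesis sound chip is actually programmed: the function names (`fm_tone`, `instantaneous_hz`), the sample rate and the example pitches are all made up for this post. The modulator's value is added to the carrier's phase, which is what makes the carrier's pitch swing up and down.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value, chosen only for this sketch)

def fm_tone(carrier_hz, modulator_hz, index, seconds=1.0, sample_rate=SAMPLE_RATE):
    """Two-sine FM: the modulator sine bends the phase of the carrier sine.

    `index` is how strongly the modulator pushes the carrier around
    (0 means no modulation at all, i.e. a plain sine).
    """
    samples = []
    for n in range(int(seconds * sample_rate)):
        t = n / sample_rate
        modulator = math.sin(2 * math.pi * modulator_hz * t)   # step 1: the guide
        carrier_phase = 2 * math.pi * carrier_hz * t           # step 2: the carrier
        samples.append(math.sin(carrier_phase + index * modulator))  # step 3: result
    return samples

def instantaneous_hz(carrier_hz, modulator_hz, index, t):
    """Pitch of the carrier at time t; it wobbles around carrier_hz."""
    return carrier_hz + index * modulator_hz * math.cos(2 * math.pi * modulator_hz * t)

tone = fm_tone(440, 110, 2.0)
print(len(tone), "samples")
print("highest pitch:", instantaneous_hz(440, 110, 2.0, 0.0), "Hz")
```

With these made-up numbers the carrier sits at 440 Hz and gets pushed as far as `index * modulator_hz` = 220 Hz either side of that, which is the "wiggles getting closer together and further apart" effect from step 3.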
The most basic sound you'll get using two sines is a bell tone. That's why the Genesis did those so well. Add in a little more modulation to that bell, and you can get a metallic clang. Add in some more modulation to make it oscillate faster, and you can turn that clang into a buzz. With a bit of tweaking, you can make that buzz sound like an electric guitar. Those basic building blocks form the basis of a huge amount of Genesis music. The Genesis only had 4 sines, or 'operators', per voice, so there were limits to how far it could go. But professional synths have used 6 or 8, and in software versions the only theoretical limit is processing power.
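To put a rough number on "more modulation = harsher sound", here's a small Python sketch that measures how much of one particular overtone shows up as the modulation amount goes up. Everything in it is an assumption for illustration: the names (`fm_tone`, `level_at`), the low sample rate, the pitches and the modulation amounts are not real Genesis settings, and the brute-force single-frequency measurement is just the simplest thing that works without extra libraries.

```python
import cmath
import math

SAMPLE_RATE = 8000  # low rate keeps the brute-force analysis quick (assumed value)

def fm_tone(carrier_hz, modulator_hz, index, seconds=1.0):
    """Two sines: the modulator bends the phase of the carrier."""
    count = int(seconds * SAMPLE_RATE)
    return [
        math.sin(2 * math.pi * carrier_hz * n / SAMPLE_RATE
                 + index * math.sin(2 * math.pi * modulator_hz * n / SAMPLE_RATE))
        for n in range(count)
    ]

def level_at(samples, hz):
    """Amplitude of one frequency in the signal (a single-bin DFT)."""
    total = sum(s * cmath.exp(-2j * math.pi * hz * n / SAMPLE_RATE)
                for n, s in enumerate(samples))
    return 2 * abs(total) / len(samples)

CARRIER_HZ, MODULATOR_HZ = 1000, 200  # illustrative pitches, not Genesis settings

for index in (0.0, 1.0, 5.0):  # more modulation = more energy in far-off overtones
    tone = fm_tone(CARRIER_HZ, MODULATOR_HZ, index)
    overtone = level_at(tone, CARRIER_HZ + 4 * MODULATOR_HZ)
    print(f"index {index}: level at 1800 Hz = {overtone:.3f}")
```

With no modulation the 1800 Hz overtone is essentially absent, and it grows as the modulation amount rises. That extra high-frequency content is the difference between a soft bell and a clang or buzz.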
This wonderful madman built a whole physical synthesizer around a Mega Drive to control each parameter using individual knobs, if you want to see how programming it plays out in practice:
https://youtu.be/V0kq0yCTpNE