Hey guys, it's been a while.
I have been following the developments regarding a more accurate blitter delay calculation for the Cave sh1000 emulation, to allow for accurate slowdown. It seems the current consensus is that while buffi's calculations are/were a nice step in the right direction, they usually do not account for enough delay, which results in not enough slowdown, sometimes by quite a margin in specific games/stages (Futari for example).
So, to make it possible to improve on the current results, I modified the current FBNeo to add a set of special DIP settings that tweak some parameters of buffi's blitter delay calculation. Ideally, a set of values can be found that works for all cv1000 games.
Quick usage guide, with a cv1000 game loaded:
For the new DIP switches
Input -> Set dipswitches
For cpu:
Game -> Adjust CPU speed... (should not be needed if blitter delay is correctly emulated)
As for the new DIPs, the multiplier DIPs start at 1.0x, which gives buffi's default calculations; values over 1.0 increase delay/slowdown:
Upload Multiplier:
Upload Constant: added cycles for when the blitter reads sprite data (unlikely to be useful)
Source Clipping Multiplier: number of pixels read from source, likely candidate if clipping is imperfect
Dest Clipping Multiplier: pixels written to destination
Source VRAM Rows Multiplier: overhead after each read from a source VRAM row, likely candidate
Dest VRAM Rows Multiplier: overhead after each write to a destination VRAM row, likely candidate
Draw constant (default: 12): additional cycles per sprite at the end of writing
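As a rough illustration of how settings like these could combine, here is a minimal sketch. It is not the actual FBNeo or MAME code: the class names, the way the terms are summed, the example cost values, and the Upload Constant default of 0 are all assumptions made for illustration (only the Draw constant default of 12 comes from the list above), and a real implementation likely treats upload and draw operations separately.

```python
from dataclasses import dataclass


@dataclass
class BlitTuning:
    """Hypothetical mirror of the DIP settings (names are illustrative)."""
    upload_mult: float = 1.0
    upload_const: float = 0.0   # assumed default; the post does not give one
    src_clip_mult: float = 1.0
    dst_clip_mult: float = 1.0
    src_rows_mult: float = 1.0
    dst_rows_mult: float = 1.0
    draw_const: float = 12.0    # default stated in the DIP list


@dataclass
class BaseCost:
    """Made-up untuned cost terms for one sprite, in blitter cycles."""
    upload: float
    src_pixels: float
    dst_pixels: float
    src_rows: float
    dst_rows: float


def tuned_delay(cost: BaseCost, t: BlitTuning) -> float:
    """Scale each untuned term by its multiplier, then add the constants."""
    return (
        cost.upload * t.upload_mult + t.upload_const
        + cost.src_pixels * t.src_clip_mult
        + cost.dst_pixels * t.dst_clip_mult
        + cost.src_rows * t.src_rows_mult
        + cost.dst_rows * t.dst_rows_mult
        + t.draw_const
    )


# Example with invented numbers: defaults first, then only the dest-row term doubled.
cost = BaseCost(upload=10, src_pixels=100, dst_pixels=80, src_rows=20, dst_rows=16)
print(tuned_delay(cost, BlitTuning()))
print(tuned_delay(cost, BlitTuning(dst_rows_mult=2.0)))
```

With every multiplier at 1.0 the sketch returns the untuned estimate plus the constants, so raising a single multiplier shows how much that one term contributes.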
And here's the download (for win64, works well via wine in linux too), includes the modified source code in the zip:
https://www.dropbox.com/scl/fi/p8g2jeu4 ... g7e57&dl=0
Let me know if you have timestamps of game/stage combinations which are currently inaccurate with buffi's blitter calculations.
Custom FBNeo version for cave sh1000/cv1k emulation
-
nimitz
- Posts: 900
- Joined: Thu Jan 10, 2008 5:05 am
- Location: Québec
-
Buffi
- Posts: 202
- Joined: Sat Jul 22, 2006 5:17 pm
- Location: Sweden
Re: Custom FBNeo version for cave sh1000/cv1k emulation
Lots of stuff going on in that message, so I will try to answer the stuff as I see it.
> I have been following the developments regarding a more accurate blitter delay calculation for the Cave sh1000 emulation, to allow for accurate slowdown. And it seems the current consensus is that while buffi's calculation are/were a nice step in the right direction they had some issues by usually not accounting for enough delay and in turn resulting in not enough slowdown, sometimes by quite a margin on some specific games/stages (Futari for example).
I dunno which discussions exactly this is, but let me be clear here. Using only the Blitter fixes I made for MAME is expected to not produce enough delay in games. The delay in CV1000 games can happen in two different ways:
1. The Blitter does not have time to fully draw a frame before the next one
2. The CPU does not have time to execute all the game logic until the next frame
The patches I made were only for the first one of these. The second one still applies, and for many games that is by far the biggest cause of slowdown. The proper solution is going to be related to tweaking the CPU delays... which I'll talk more about later in this message.
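The two causes can be put into a toy model. This is only a sketch under stated assumptions, not how the hardware or any emulator schedules frames: it assumes a frame period of about 16.7 ms, that CPU and blitter work fully in parallel, and that a game frame can only complete on a vblank boundary. The function names and all timings are invented.

```python
import math

FRAME_US = 16_700  # assumed frame period in microseconds (roughly 60 Hz)


def frames_consumed(cpu_us: int, blit_us: int) -> int:
    """Toy model: a game frame is done when both the CPU logic and the
    blitter are done, and it can only be shown on a vblank boundary."""
    busy = max(cpu_us, blit_us)
    return max(1, math.ceil(busy / FRAME_US))


def game_speed(cpu_us: int, blit_us: int) -> float:
    """Fraction of full speed: 1.0 means no slowdown, 0.5 means half speed."""
    return 1.0 / frames_consumed(cpu_us, blit_us)


# Blitter-bound scene: the blitter overruns the frame.
print(game_speed(cpu_us=12_000, blit_us=20_000))
# CPU-bound scene: the blitter is fine, but game logic overruns the frame.
print(game_speed(cpu_us=18_000, blit_us=15_000))
# Same scene with a correct blitter delay but a CPU that runs too fast:
# the slowdown disappears even though the blitter timing is right.
print(game_speed(cpu_us=10_000, blit_us=15_000))
```

In this model, getting the blitter number right does nothing for the second scene unless the CPU time is also right, which is the point made above.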
> So in an attempt to make improving upon the current results a possibility I modified the current FBNeo to add a set of special DIP settings to tweak some parameters of buffi's blitter delay calculation. Ideally a set of values can be found which would work for all cv1000 games.
I don't think you will have much luck just yolo-adjusting delays in the blitter code and hoping for something that works consistently because, as mentioned, the CPU slowdown is inaccurately emulated in MAME, so you can adjust all you want and things will still not be accurate.
This is why people playing around with tuning to specific percentages with the old Blitter code could get things working kinda alright in some small sections or with some ships, but less accurate in others. It will be the same trying to tweak the new ones.
> Source Clipping Multiplier: number of pixels read from source, likely candidate if clipping is imperfect
> Dest Clipping Multiplier: pixels written to destination
> Source VRAM Rows Multiplier: overhead after each read from a source VRAM row, likely candidate
> Dest VRAM Rows Multiplier: overhead after each write to a destination VRAM row, likely candidate
> Draw constant (default: 12): additional cycles per sprite at the end of writing
These are picked from looking at the rates with a logic analyzer, see PDF in https://buffis.com/research/cv1000-blitter-research/
If someone wants to double check, I think I have the old LA (logic analyzer) outputs somewhere, but there are some screenshots with timings in there.
It's of course possible I made some mistake there, but just randomly trying to change stuff there is unlikely to be very good.
> Let me know if you have timestamps of game/stages combinations which are currently inaccurate with buffi's blitter calculations.
To re-iterate, the slowdown is expected to be inaccurate in MAME right now, because the CPU emulation is inaccurate. This is not strange.
So what's the proper fix?
If someone wants to get this working, it means fixing the CPU slowdown in mame. Currently there's only a fixed slider for CPU speed, and while adjusting this is probably the best you will get until someone does the proper implementation which works differently, it does not reflect how things work in an actual SH3.
SH3 wait states are not emulated at all in MAME. On hardware, every non-cached access to RAM involves a wait state. Access to other peripherals does too, but is likely less important. Additionally the CPU can be held in wait externally from a bus request (mostly done when blitter requests data). There's probably a bunch of other timing related things for the CPU that I don't know as well, but the current state is known to not be accurate.
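A small sketch of why a fixed CPU speed slider cannot stand in for wait states. Everything here is an assumption for illustration: one cycle per instruction, 4 wait cycles per non-cached RAM access, and the two invented workloads. It is not SH3 timing data and not MAME code.

```python
def cycles_with_waits(instructions: int, uncached_accesses: int,
                      wait_cycles: int = 4) -> int:
    """Assumed model: 1 cycle per instruction plus a fixed stall
    for every non-cached RAM access."""
    return instructions + uncached_accesses * wait_cycles


def cycles_with_slider(instructions: int, clock_percent: float) -> float:
    """Fixed underclock: every instruction is stretched by the same factor."""
    return instructions * 100.0 / clock_percent


# Workload A: few uncached accesses. Tune the slider so it matches exactly.
target_a = cycles_with_waits(1000, 100)
slider = 1000 * 100.0 / target_a          # percentage that reproduces A
print(f"slider tuned on A: {slider:.1f}%")

# Workload B: same instruction count, but memory heavy.
target_b = cycles_with_waits(1000, 400)
print("B with wait states:", target_b)
print("B with tuned slider:", cycles_with_slider(1000, slider))
```

The slider value that is exact for workload A falls well short on workload B, because the stall depends on the mix of memory accesses rather than on the instruction count.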
This is exactly what the "coffeepope fork" of mame tries (tried?) to address: https://github.com/CoffeePope/mame-sh34 but looks like development on that stopped. Fixing the CPU slowdown is a LOT of work, especially if trying to do it in a sane enough way to merge back into mame.
Just tweaking shit in the blitter code for lulz is not gonna help you.
-
Firehawke
- Posts: 234
- Joined: Thu Apr 21, 2005 6:37 pm
- Location: Western USA
Re: Custom FBNeo version for cave sh1000/cv1k emulation
Waitstates *in general* are a problem in MAME. CPS-2 also has the same issue going on.
Fixing it would require a fairly substantial overhaul of MAME.
-
Buffi
- Posts: 202
- Joined: Sat Jul 22, 2006 5:17 pm
- Location: Sweden
Re: Custom FBNeo version for cave sh1000/cv1k emulation
A lot of the stuff written here is what I remember from last looking into it, so some stuff/terminology might be wrong.
I think the problematic timing from SH3 is mostly understood, and the datasheet describes the timing well. Fixing it is very hard.
When the device initializes, values describing the various wait states are written to registers. Different memory mapped regions will have different amounts of waitstates inserted when accessing them. For RAM, only non-cached accesses will generate these states. The caching behavior is also described in the datasheet.
In addition, different instructions take different amounts of time to execute. The coffeepope mame branch did some work on that (https://github.com/CoffeePope/mame-sh34 ... 50ba2659d4).
There are also bus requests which halt the CPU, but these are infrequent enough to probably not matter(?)
None of this is implemented in MAME... but that's not really the full extent of things.
MAME also does some kind of smart things to make the emulation work better on slow machines. The SH3 code can run either in interpreted mode or in DRC mode, where the DRC code makes it quite a bit harder to hook up wait states (I think? It seemed that way when I looked at it).
Additionally, it completely bypasses a lot of ram-accesses, by having an emulation layer cache (see the fastram code related to https://github.com/mamedev/mame/blob/4c ... k.cpp#L968), which needs to be disabled to implement the RAM wait states.
This means that to get accurate timing, shittier computers will have a bad time.
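The fast-path problem can be sketched like this. The `Bus` class and its method names are hypothetical and do not correspond to MAME's memory system or its fastram code; the sketch only shows why an access that skips the handler has no place to charge a wait state.

```python
class Bus:
    """Hypothetical memory bus with a handler path and a fast path."""

    def __init__(self, size: int, wait_cycles: int) -> None:
        self.ram = bytearray(size)
        self.wait_cycles = wait_cycles
        self.stall = 0  # total wait cycles charged so far

    def read_slow(self, addr: int) -> int:
        """Handler path: the one place where a wait state can be charged."""
        self.stall += self.wait_cycles
        return self.ram[addr]

    def read_fast(self, addr: int) -> int:
        """Fast path: reads the buffer directly, so nothing is charged."""
        return self.ram[addr]


bus = Bus(size=16, wait_cycles=3)
for addr in range(5):
    bus.read_slow(addr)
print("stall after handler reads:", bus.stall)
for addr in range(5):
    bus.read_fast(addr)
print("stall after fast reads:", bus.stall)
```

Routing every access through the handler makes the accounting possible, at the cost of the speed the fast path was added for.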
Fixing all of this will still be a LOT of work someone has to do. I noped out quite early once I realized how much it is.
For now, I'd just recommend playing around with CPU% in mame and find something that feels alright to play, and not worry too much about it not being 100% like PCB, cause you aint gonna get something that's completely accurate no matter how you tune the current settings.
Adding weird DIPs for injecting waits in the blitter for unclear reasons does not change that either.
edit: Also, I know some dude on another forum has some weird retro nostalgia for tuning the old blitter code, which is kindof hilarious considering that code having literally nothing to do with how things work on hardware. It was just a hack added to inject some amount of delay into the blitter. To be clear, not shitting on that work in MAME, the effort in getting the driver in place at all was very impressive, but the "old" blitter-delay code was just never meant to be accurate
-
nimitz
- Posts: 900
- Joined: Thu Jan 10, 2008 5:05 am
- Location: Québec
Re: Custom FBNeo version for cave sh1000/cv1k emulation
Thanks for the detailed answer Buffi, I wasn't hoping for such a detailed answer from the man himself!
What I'm wondering though is that I've seen some A/B testing done in other forums where a combination of the "old" cpu clock + old style blitter delay for a specific game and ship would result in a more accurate overall experience than with the new blitter code. People are currently recommending those older combinations for better slowdown accuracy.
Which can mean a few things:
1. The new blitter is accurate, and having accurate blitter calculations actually makes it harder to simply underclock the cpu to get accurate slowdown.
2. The new blitter is not fully accurate, which could explain some of the discrepancies.
3. The A/B testing was incorrectly done, which means we need more testing and actual examples of problem areas.
4. People haven't actually tried to combine the new blitter delay + underclocking to get accurate results on a per-game basis.
Allowing for tweaking the blitter calculation independently of cpu clock could potentially allow to improve on cases 1 and 2. For case 3/4 I think getting more data is definitely helpful in any case.
In the end this goes back to simulation accuracy vs emulation accuracy. Until we can have completely accurate emulation, we should at least try to get accurate simulation.
That being said, I wasn't aware that instruction timings were not even implemented in MAME! Has anyone done some testing with the new blitter code + correct instruction timings? That should not be too hard to implement in FBNeo.
-
Creamy Goodness
- Posts: 376
- Joined: Wed May 05, 2021 1:23 am
Re: Custom FBNeo version for cave sh1000/cv1k emulation
I believe El_Rika's CPU settings are pretty known since they seem embedded in RetroArch. If anyone is familiar with these, could these just be used with everything else being default?
-
Buffi
- Posts: 202
- Joined: Sat Jul 22, 2006 5:17 pm
- Location: Sweden
Re: Custom FBNeo version for cave sh1000/cv1k emulation
> What I'm wondering though is that I've seem some A/B testing done in other forums where a combination of the "old" cpu clock + old style blitter delay for a specific game and ship would result in a more accurate overall experience than with the new blitter code, people are currently recommending to use those older combinations for better slowdown accuracy.
My honest opinion is that these measurements are done by people who have no idea what they're doing, and no one has bothered fact checking the measurements. They don't really make any sense. More on that later in this message.
> 1. The new blitter is accurate and having a accurate blitter calculations actually makes it harder to simply underclock the cpu to get accurate slowdown.
I doubt the new one is fully accurate, but it's definitely _more_ accurate.
> 2. The new blitter is not fully accurate, which could explain some of the discrepancies.
I expect there to be some stuff I've missed compared to real hardware, so it's not going to be fully accurate, but that's not really the cause. The discrepancies are expected due to the missing CPU emulation.
> 3. The A/B testing was incorrectly done, which means we need more testing and actual examples of problem areas.
Yes, I think the testing is pretty shit. I don't think this problem is solvable by more testing though, since the CPU slowdown is not emulated.
> 4. People haven't actually tried to combine the new blitter delay + underclocking to get accurate results on a per-game basis.
It will not be accurate anyways, because CPU underclocking is a poor approximation of the actual CPU slowdown.
> Allowing for tweaking the blitter calculation independently of cpu clock could potentially allow to improve on cases 1 and 2. For case 3/4 I think getting more data is definitely helpful in any case.
I don't think randomly changing values here will lead to anything other than more confusion, similar to the "old blitter measurements", see more below.
> That being said, I wasn't aware that instruction timings were not even implemented in MAME! has anyone done some testing with the new blitter code + correct instruction timings?
I mean, that's what the coffeepope mame branch is. New blitter code and better instruction timing, but it doesn't do the other stuff mentioned in the previous message I wrote. It's still a step in the right direction though and looks promising!
> I believe El_Rika's CPU settings are pretty known since they seem embedded in RetroArch. If anyone is familiar with these, could these just be used with everything else being default?
Ok, I mean I wasn't gonna go there, cause I don't really have any interest in ending up in a flame war with some random person on the internet or whatever, but fuck it, strap yourself in cause now I'm gonna go into full on rant mode.
I think those measurements are completely useless. This is based partly on knowing how the old blitter code works, and partly just by common sense. And it’s not even hard to see this just from a glance.
The measurements I've seen have silly things like different percentages between different characters, like:
> Mushihimesama Futari:
> 47.7 (46.6%) / 57 (Reco N maniac)
> 48.5 (47.3%) / 56 (Palm N maniac)
The difference in practice here between the characters is gonna be a very small amount of CPU (which is not relevant for blitter calculation) and the sprites being drawn for the shots.
Estimates in the examples are completely made up, and just to show why this makes no sense. Let's just for the sake of simplicity say that Reco draws 1000 pixels when shooting and Palm draws 800 pixels, and that initial testing is done on a pattern with about 5000 pixels.
Someone doing benchmarks notices that with whatever timing they pick, things look right for Reco but wrong for Palm.
Guess what happens when another pattern in the game is encountered which has 5200 pixels instead? Suddenly Palm's values will be better for Reco, because the "tuning" (lol) was made to look correct when 6000 pixels are drawn.
In practice, it's not quite this simple since the blitter delay is not "per pixel drawn", but the exact same reasoning holds. Trying to do measurements on one or a few individual patterns and tuning based on that will just mean that things look correct on that pattern, not that it somehow is more correct across the game.
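The reasoning can be replayed with numbers. This is a sketch with invented values only: the pretend "hardware" cost (a fixed overhead of 2000 plus one unit per pixel) and the emulator model (a single tunable multiplier per pixel) are assumptions chosen to keep the arithmetic simple, not measurements of anything. The pixel counts are the made-up ones from the example above.

```python
FIXED = 2000  # invented fixed overhead in the pretend "hardware"


def hardware_time(pixels: int) -> float:
    """Pretend ground truth: fixed overhead plus one unit per pixel."""
    return FIXED + pixels


def emulated_time(pixels: int, multiplier: float) -> float:
    """Pretend emulator: only a per-pixel multiplier can be tuned."""
    return multiplier * pixels


def tune(pixels: int) -> float:
    """Pick the multiplier that is exact for one specific pixel load."""
    return hardware_time(pixels) / pixels


RECO_SHOT, PALM_SHOT = 1000, 800
m_reco = tune(5000 + RECO_SHOT)   # tuned while 6000 pixels are drawn
m_palm = tune(5000 + PALM_SHOT)   # tuned while 5800 pixels are drawn

# A new pattern with 5200 pixels: both per-character values are now off.
for name, shot, m in (("Reco", RECO_SHOT, m_reco), ("Palm", PALM_SHOT, m_palm)):
    total = 5200 + shot
    err = emulated_time(total, m) - hardware_time(total)
    print(f"{name}: multiplier {m:.4f}, error on new pattern {err:+.1f}")

# The value tuned at 6000 pixels is exact for whoever draws 6000 pixels now.
print(emulated_time(6000, m_reco) - hardware_time(6000))
```

Each tuned value is only exact for the total load it was tuned on, so which character it "belongs" to changes as soon as the pattern changes.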
I strongly believe that these measurements are useless, and no one with experience with the games on hardware has bothered fact checking them.
It sometimes gets even more comically confusing when people compare it to 360 footage.
So yeah, sorry about that rant, but the measurements make no sense to me.
-
Starfighter
- Posts: 391
- Joined: Sun May 11, 2008 7:15 pm
- Location: Sweden
Re: Custom FBNeo version for cave sh1000/cv1k emulation
I understand close to nothing of this and I'm sorry for creating this post if it makes you all think there's some new information on the topic (I don't want to get your hopes up for nothing etc), but I want to say it makes me really happy to see this level of commitment. I'm very impressed and wholeheartedly admire these technical endeavors. (I might as well add that I'm not trying to be patronizing or sarcastic, it's hard to tell sometimes with text on the internet.)
-
nimitz
- Posts: 900
- Joined: Thu Jan 10, 2008 5:05 am
- Location: Québec
Re: Custom FBNeo version for cave sh1000/cv1k emulation
Thanks Starfighter, the way I see it, if we're going to preserve these games, might as well do it in a way that comes as close as possible to the original experience. Especially if there's a marked difference, as is the case here.
I did some testing with CoffeePope's build and it seems to be very accurate (compared to YouTube videos of actual PCBs, including some of yours, Buffi). That motivated me to look into making cv1k games more playable in MAME, and it turns out that removing a frame of lag by rendering on a single buffer comes with no visual issues at all. This brings the total system lag on a modern PC to something very close to PCB + CRT.
I'm going to release that build soon.
-
nesrulz
- Posts: 186
- Joined: Wed Nov 13, 2013 6:01 pm
Re: Custom FBNeo version for cave sh1000/cv1k emulation
Greetings nimitz.
Thank you for continuing to work hard. Cheers.