The other day I decided to spend an hour writing a quick Mode 7 plugin, just to test the possibility.
Usually I'd do this kind of thing in hardware via WebGL, but to keep compatibility high this time I opted to use the slower, software rendered canvas features of HTML5.
Immediately, there were massive performance issues. Running at 100% resolution, the rendering managed 1 frame every 9 seconds, which is 0.111 FPS.
I had to reduce the resolution by a factor of 7 before I could get anything remotely playable.
[Screenshot]
On the top left is the RPG Maker MV standard FPS display for the entire game and below that is my mode 7's FPS calculation (how fast the mode 7 itself is rendering), with the average of the mode 7's FPS below that (used to better see performance gains).
There are two limiting factors here; MV's update rate (top FPS) and the mode 7's render rate (middle FPS).
The CPU time profiler describes what's going on;
[Profiler screenshot]
83.52% of the CPU time is spent rendering the mode 7 effect, and overall the engine is only left with 4.21% of CPU time for internal work, hence the 4 FPS MV time versus the relatively fast mode 7 rendering.
The drawScanLine call here actually draws the entire frame and uses Bitmap.blt to draw each pixel, so the first thing to do is optimise the blitting.
JavaScript:
for ( yy = 0; yy < height; yy++ )
    for ( xx = 0; xx < width; xx++ )
        this.bitmap.blt( this._source, Math.floor( sx * this._sourceWidth ), Math.floor( sy * this._sourceHeight ), 1, 1, xx, yy );
I changed this naive blt method to the canvas native drawImage method and invalidated the bitmap once all drawing was complete;
JavaScript:
for ( yy = 0; yy < height; yy++ )
    for ( xx = 0; xx < width; xx++ )
        this.bitmap._context.drawImage( this._source._canvas, Math.floor( sx * this._sourceWidth ), Math.floor( sy * this._sourceHeight ), 1, 1, xx, yy, 1, 1 );
// ... when drawing is complete
this.bitmap._setDirty();
[Screenshot]
Notice that the MV FPS is still 4, but the mode 7 FPS is up slightly. The bottleneck is still the internal engine. I can optimise how I use it as much as possible, but I can't change how it works, and that seems to be the problem here.
The stats show what's going on;
[Profiler screenshot]
So because we're spending less time rendering the mode 7 due to using drawImage rather than blt, the system can render more mode 7 frames and make matters worse for the engine. Not good.
I opted to ignore the HTML5 canvas drawing calls for now and render things manually via colour arrays. JavaScript is incredibly slow, so this option will be slower, but it doesn't touch the engine internals as much, which should give MV time to actually render and update faster than 4 FPS.
So on initialisation, we use a slow canvas method to grab all the pixel data of the source image and create a colour buffer to draw into.
JavaScript:
// Create colour buffer
this._scanLine = this.bitmap.context.createImageData( Math.ceil( width ), Math.ceil( height ) );
// Get texture pixels
this._textures[0] = this._source.context.getImageData( 0, 0, this._source.width, this._source.height );
The colour buffer is the width and height of our rendering screen.
So to draw a pixel we write the RGBA values into the array;
JavaScript:
var source = ( sx + sy * this._sourceWidth ) << 2;
var dest = ( dx + dy * this._scanLineWidth ) << 2;
this._scanLine.data[dest + 0] = this._textures[index].data[source + 0];
this._scanLine.data[dest + 1] = this._textures[index].data[source + 1];
this._scanLine.data[dest + 2] = this._textures[index].data[source + 2];
this._scanLine.data[dest + 3] = this._textures[index].data[source + 3];
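For anyone who wants to poke at the indexing outside of MV, here is a minimal standalone sketch of the same RGBA copy. It uses plain Uint8ClampedArray buffers in place of real ImageData objects, and the names makeImage and copyPixel are made up for this example; they are not part of the plugin or of MV.

```javascript
// Hypothetical stand-in for an ImageData object: width, height and RGBA bytes.
function makeImage(width, height) {
    return { width: width, height: height, data: new Uint8ClampedArray(width * height * 4) };
}

// Copy one pixel from (sx, sy) in src to (dx, dy) in dst.
function copyPixel(src, sx, sy, dst, dx, dy) {
    var source = (sx + sy * src.width) << 2; // pixel index * 4 bytes per pixel
    var dest = (dx + dy * dst.width) << 2;
    dst.data[dest + 0] = src.data[source + 0]; // red
    dst.data[dest + 1] = src.data[source + 1]; // green
    dst.data[dest + 2] = src.data[source + 2]; // blue
    dst.data[dest + 3] = src.data[source + 3]; // alpha
}

var texture = makeImage(4, 4); // pretend source texture
var buffer = makeImage(2, 2);  // pretend colour buffer

// Put an opaque orange pixel at (3, 2) in the texture.
texture.data.set([255, 128, 0, 255], (3 + 2 * texture.width) << 2);

// Copy it to (1, 1) in the buffer, which starts at byte (1 + 1 * 2) << 2 = 12.
copyPixel(texture, 3, 2, buffer, 1, 1);
console.log(Array.from(buffer.data.slice(12, 16))); // [ 255, 128, 0, 255 ]
```

The shift only works because every pixel is exactly four bytes wide; a format with a different stride would need a plain multiply instead.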
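The << 2 in the offset calculations multiplies the source and dest offsets by 4 to step over the four RGBA components of each pixel (which are reached by the + 0, 1, 2, 3 offsets).
The buffer can then be copied to the MV bitmap with another HTML5 method;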
JavaScript:
this.bitmap.context.putImageData( this._scanLine, 0, 0 );
this.bitmap._setDirty(); // Tell MV the bitmap is changed
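This is very slow, but it shifts the heavy lifting from the engine internals to the mode 7 plugin. As a result the entire thing runs much faster (as it is paced by the performance of MV, rather than HTML5 canvas).
[Screenshot]
Holy cow, a steady FPS and the mode 7 is updating quickly.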
Check out the CPU usage percentages;
[Profiler screenshot]
That (idle) at the top shows that the new limiting factor is the 60Hz draw rate of MV - the engine will do its work, then wait around until it's time to update every 16.666 milliseconds (60 FPS).
So that's a lot of unused processing power sitting there. The resolution downscale of 7 can be removed.
[Screenshot]
100% resolution. 2 FPS, ouch.
What does the profiler say?
[Profiler screenshot]
42.32% of CPU time is spent rendering out mode 7, where previously that was 2.17%, so now too much time is spent rendering out that full-resolution screen.
The obvious thing to do is drop the resolution, but I'm not one to compromise quality before methodology.
Some graphics programmers might be wondering why I named the colour buffer variable _scanLine. That's because I planned for this from the beginning.
Rather than reduce the bitmap resolution, I'll reduce the size of the colour buffer that we write to, so it's 1/8th of the height of the bitmap.
JavaScript:
this._scanLine = this.bitmap.context.createImageData( Math.ceil( width ), Math.ceil( height / 8 ) );
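This means only a horizontal slice will be rendered at a time. What I can now do is draw into that slice, upload the slice to the bitmap, then draw the next slice. If things are taking too long, cancel and continue the work on the next frame.
This technique is used in the software renderers for Quake and Quake II.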
JavaScript:
var startTime = window.performance.now();
var scanCount = this._scanCount;
while ( scanCount-- ) {
    this.drawScanLine( this._currentScan ); // Draw current scan
    this._currentScan += 1 + this._interleaved;
    if ( this._currentScan * this._scanLineHeight >= this._bitmapHeight ) {
        this._currentScan -= this._scanCount + this._interleaved; // Move back to the top (v-blank!)
    }
    if ( window.performance.now() - startTime >= this._msec ) {
        break; // Cancel if taking too long
    }
}
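this._msec is set to 12, so if the rendering takes longer than 12 milliseconds it will cancel for now and continue on the next update.
It will also move back to the top and continue drawing from there if the scan line goes past the bitmap height.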
this._interleaved is a small experiment I did for interleaved scan lines, where every even scan line is updated and then every odd scan line is updated. It works nicely, but causes massive blurring so I left it at 0.
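To make the slice-and-budget idea easier to experiment with outside of MV, here is a rough standalone sketch. It is not the plugin's code: SliceRenderer, drawSlice, update and the injected clock are all hypothetical names, a flat grey fill stands in for the real mode 7 sampling, a typed-array copy stands in for putImageData, and the interleaved mode is left out.

```javascript
// Hypothetical slice renderer; none of these names are MV APIs.
function SliceRenderer(width, height, sliceCount, budgetMs, now) {
    this.width = width;
    this.height = height;
    this.sliceCount = sliceCount;
    this.sliceHeight = Math.ceil(height / sliceCount);
    this.budgetMs = budgetMs;
    this.now = now || function () { return performance.now(); };
    this.frame = new Uint8ClampedArray(width * height * 4);           // full-size target
    this.slice = new Uint8ClampedArray(width * this.sliceHeight * 4); // reused slice buffer
    this.currentSlice = 0;
}

// Fill the slice buffer for one horizontal band, then copy it into the frame.
SliceRenderer.prototype.drawSlice = function (index, shade) {
    var top = index * this.sliceHeight;
    var rows = Math.min(this.sliceHeight, this.height - top);
    if (rows <= 0) return; // band lies entirely below the frame
    var bytes = rows * this.width * 4;
    for (var i = 0; i < bytes; i += 4) {
        this.slice[i] = this.slice[i + 1] = this.slice[i + 2] = shade;
        this.slice[i + 3] = 255;
    }
    this.frame.set(this.slice.subarray(0, bytes), top * this.width * 4);
};

// Draw slices until the time budget runs out; returns how many were drawn.
SliceRenderer.prototype.update = function (shade) {
    var start = this.now();
    var drawn = 0;
    while (drawn < this.sliceCount) {
        this.drawSlice(this.currentSlice, shade);
        drawn++;
        this.currentSlice = (this.currentSlice + 1) % this.sliceCount; // wrap to the top
        if (this.now() - start >= this.budgetMs) break; // out of time, resume next update
    }
    return drawn;
};

// Fake clock that advances 5 ms per reading, so the demo is deterministic.
var fakeTime = 0;
function fakeNow() { var t = fakeTime; fakeTime += 5; return t; }

var renderer = new SliceRenderer(4, 16, 8, 12, fakeNow);
var firstPass = renderer.update(100);  // draws slices 0, 1, 2 with the old shade
var secondPass = renderer.update(200); // resumes at slice 3 with the new shade
console.log(firstPass, secondPass, renderer.currentSlice); // 3 3 6
```

After the second update the frame holds slices 0 to 2 in the old shade and slices 3 to 5 in the new one, which is the same kind of tearing you get when the camera moves between updates.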
Cancelling the rendering if things take too long worked. This is a full resolution buffer rendering at 30 FPS, with MV updating at 60 FPS (so roughly half of the mode 7 graphics will render each frame).
[Screenshot]
This does result in vertical tearing when moving around, but that's a compromise I'm happy with.
[Screenshot]
CPU time dropped from 42.32% to 36.53%. Also, an object's getter now apparently shows up as a cause of performance loss. I had missed a caching opportunity, so I fixed that at this point too (0.01% isn't much to worry about, but I like clean rendering).
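The post doesn't say which getter it was, so the following is only a generic sketch of that kind of caching fix: read the getter once into a local before the hot loop instead of on every pixel write. The image object and its lookup counter are invented for the example, purely to make the number of getter calls visible.

```javascript
// Hypothetical object with a getter; the counter exists only for the demo.
var lookups = 0;
var image = {
    _data: new Uint8ClampedArray(64 * 64 * 4),
    get data() { lookups++; return this._data; }
};

// Uncached: the getter runs on every single write.
function fillUncached(img, count, value) {
    for (var i = 0; i < count; i++) {
        img.data[i] = value;
    }
}

// Cached: the getter runs once, the loop works on a local reference.
function fillCached(img, count, value) {
    var data = img.data;
    for (var i = 0; i < count; i++) {
        data[i] = value;
    }
}

var count = 64 * 64 * 4; // 16384 bytes

lookups = 0;
fillUncached(image, count, 255);
var uncachedLookups = lookups;

lookups = 0;
fillCached(image, count, 128);
var cachedLookups = lookups;

console.log(uncachedLookups, cachedLookups); // 16384 1
```

How much time this saves in practice depends on the JavaScript engine and on what the getter does, so it is worth confirming in the profiler rather than assuming.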
It is probably worth dropping the resolution to gain a bit more performance. What I decided to do is keep the rendering height at full resolution, but make the render area square (width == height).
[Screenshot]
You probably can't tell much of a difference unless you flip between the before and after images. An 8 FPS gain for the mode 7 renderer shows that even small resolution changes can make a big enough difference.
The profiler shows that the CPU didn't really care about the resolution drop all that much;
[Profiler screenshot]
From 36.53% to 36.47%. It's interesting that a 0.06% change in CPU time, in light of the whole picture, can boost performance by 8 FPS in one small area.
The biggest downer here is the fact that MV's JavaScript does not support threading, which would have helped massively here (each thread could work on a scan line). Makes you wish Ruby was back.
Tested on an i7 5930k @ 4.2GHz, so that miserable 30 FPS would likely be 15 FPS for most computers.
If I were to use WebGL then this would be lightning fast at full resolution with zero compromises, but this was an experiment on the performance of raw canvas.
It's a shame that JavaScript is so much slower than Ruby. Not only will this mode 7 likely never be able to support real MV maps, it also won't support animations, due to how slow HTML5 canvas is with image processing.
It's a good thing MV supports WebGL to make up for how slow its JavaScript is.
Plugin writers; use the performance tools, remember to cache, and think outside of the box.