So far, I've got a slow raycasting engine that only renders flat coloured walls.
(http://lithium.zext.net/asmcast1.png)
It uses a modified version of TinyPTC, which uses either VFW, DDraw, or the GDI.
Anyway, the slow part right now is the actual casting of the rays...
The current method is as such:
Trace a Bresenham line into the map until something is hit, return the hit position (fixed point, precision = 256)
The better method, which I'm planning to do next is:
Using the equation of a line, jump across grid lines until something is hit, return hit position
Before I start though, I'm wondering if I should do it with fixed point or floating point... I would prefer fixed point, but floating point will give me better quality... and it will eliminate cracks along gridlines. So how much faster is fixed point versus floating point these days? The other issue is that I've never used the FPU in ASM before, so if someone could point me to "the best tutorial" that would be great.
You can check out the demo here:
http://lithium.zext.net/asmcast.rar (Left/Right - Turn (no move yet))
It's been a long time since I've dealt with this but I think you may have more problems with missed points using floating point. The problem you'll encounter is rounding errors. I don't recall the solution but there was a specific trick we used to use for this problem.
I did write one before, in FreeBASIC, although the wall slices and floors and ceilings were rendered with inline asm.
I remember I had the ray casting function using floating point, and it was virtually perfect, but when I switched to fixed point I encountered the problem with cracks in between blocks, on the corners... I was using a precision of 65536 as well. The reason I switched to fixed point was that it did improve the speed a bit, and the cracks weren't noticeable enough...
I was going to paste the old code for that, but there are too many features -- it basically works like this...
do
move ray1's x in the direction of the ray by 1 grid unit
update ray1's y accordingly
move ray2's y in the direction of the ray by 1 grid unit
update ray2's x accordingly
loop until both "rays" have hit something, or are out of bounds
take the ray that has gone the smallest distance; that is our distance.
Now that I write that out it seems a lot simpler, maybe this won't be too bad after all...
EDIT: I forgot to mention, both rays need to be aligned to the grid first.
The original raycasters used LUTs, based on Pythagoras, an arctan LUT, and cosine to correct the fisheye effect, and were fast enough to run on a 286.
I'm using LUTs for sine and cosine. There are maybe a few too many muls and divs, but they're all in O(screen width) time, so I doubt they'll make a huge difference. There is one IntSqrt per x, and other than that everything's fixed point...
Anyway, I've semi-finished implementing the optimized (but huge) ray casting function... I'm having the same problem I had when introducing fixed point into my previous engine, i.e. a crack running down a corner of a block... this is due to precision errors, and I don't think there's much I can do other than increase the precision, which from my experience won't ever totally fix the problem, unless I have as much precision as I would with floating point.
... I just implemented moving forward/backward, and it seems as though I have other issues to work out as well... Ah well, nothing a little headsmacking won't fix.
(good..... bad...)
(http://lithium.zext.net/asmcast2.PNG)
EDIT: you can get this version here: http://lithium.zext.net/asmcast.zip
Hi LithiumDex,
Definitely getting flashbacks from 286 coding days! Man, it has been so many years since I looked into that kind of code. It will be interesting to see how you get rid of the gaps that appear sometimes. I very vaguely remember that working at a bigger fixed-point scale fixed or reduced that rounding error without having to consider going back to floating point (I could be quite wrong there). Good luck!
regards, Jason
Thanks. I'm not sure how I'm going to get rid of the small cracks... the image on the right, however, is a code error that shouldn't be hard to fix.
A couple of things come to mind:
You might have already seen this tutorial: http://www.permadi.com/tutorial/raycast/index.html. It explains a lot of the calculation "corner-cutting" that was used in Wolfenstein 3D and Doom-style raycasting. I noticed your walls are infinitely thin too (another difference). In particular, using a grid size that is a power of 2 can provide immense speedups and accuracy (the example uses a grid size of 64, so that you can use left and right shifts instead of muls and divs, which are really slow), and it makes everything mostly integer arithmetic (greased lightning for the CPU-optimising type guys on this site). The tutorial has a demo with painted floor and ceiling and no cracks that runs really fast in a browser... in JAVA :dazzled: I think the cube grid, integer arithmetic, and some tweaking might mean you'll have to slow yours down a bit :D
Jason
Thanks, and yeah... I noticed in that tutorial that they were using 1/64 precision, so obviously higher precision is not necessarily the answer, and in this case it wasn't (entirely).
I managed to get it working; in my last build and test I couldn't produce a crack, although I'm sure the problem isn't entirely gone.
I also had problems with overflows, but I seem to have fixed them somewhat now, I'll have to see when I test it with a bigger map.
Anywho, here's the latest version:
http://lithium.zext.net/asmcast.rar
(http://lithium.zext.net/asmcast3.png)
On the todo list, in this order:
- Wall textures
- Depth Shading
- Floor/Ceiling
- Doors.. (if I think I would ever use this for a game)
If you want to get 3D models in a game, I recommend using OpenGL or Direct3D, which support hardware acceleration.
No cracks here, and a bit faster too I think. :cheekygreen: Let me explain a little why I can appreciate exploring soft rendering like this. The tendency these days is, quite justifiably, to chuck as much of the graphics processing as possible out to the specialised GPU via DirectX or another abstracted mechanism. What this does is save you from reinventing the wheel, and it hides some of the gory mechanisms you are exploring (on purpose).
But there are some of us who would like to understand, say, how a line is drawn, and even try our own implementation of Bresenham's algorithm. I think DirectX IS designed for games programming, not for understanding and learning how graphics algorithms actually work. I think exploring your own implementation will better equip you to do the same thing using DirectX (or any other available API), because you'll have a better understanding of the data structures and processing going on within the black box.
I would think that using both the GPU and CPU to their full potential, rather than leaving the 2+ GHz CPU idle (perhaps handling a few mouse events), would be worth a try.
Just my 2 cents.
[Later: i.e. perhaps a hybrid asmcast+Direct3D, with Direct3D (GPU) for hardware textures etc. and asmcast for the projection and collision physics (CPU), could be quite a setup with some potential]
Jason
That's certainly a good reason why not to use a 3D API. Understanding how 3D scenes are being drawn is an interesting subject, and this knowledge can come in handy even if you use a 3D API like Direct3D.
Right... This is actually my second raycasting engine, but the learning experience is still quite valuable. My first one was done in FreeBASIC, using inline ASM for the wall textures and floors and such (you can have a look here: http://lithium.zext.net)... I plan to surpass my first one, and the other reason I'm doing this is that it was the most interesting thing I could think of to code in MASM32 -- as I just started my intro to assembly language programming class, I think the only topic I haven't covered thus far is the FPU lol :P
EDIT: There is a frame limiter now. On my 1GHz it would run < 60 before; now with the frame limiter off I get at least 200 fps. That's just an estimation, but I'm hoping this will run at a decent speed on even a 300MHz machine.
Nice thing, that raycaster you made. (the one from your site)
It really made me interested - I'm wondering what one can squeeze out of a CPU alone.
The best non-accelerated 3D graphics I know came from the original Unreal Tournament. I wonder if you can get something better :) Maybe I could provide you with some help now and then.
Great work guys, I wish I had the time to play.
Thought this might come in handy if you haven't already seen it -> The Graphics Programming Black Book (http://www.byte.com/abrash/)
I haven't read it but it looks interesting :)
Thanks,
Although my previous raycaster was stable, I couldn't get more than a steady 40fps out of it on my 1GHz -- but contrary to what my good friend thinks (i.e. that floats are just as fast as fixed point these days), in my last release, where I converted from floating to fixed point for just the raycasting loop alone, I gained about 10fps -- that's not a huge difference, but it's notable.
As far as this engine goes, I'm thinking about creating a checkpoint now, and at some point adapting it to allow for variable wall heights (i.e. Doom) -- because this requires checking beyond the closest ray intersection point, it will be slower, and I haven't figured out exactly how to implement raised floors with speed, but it's something to think about anyway.
Back to the current work though: getting the right perspective with the floors/ceilings and having it fast is an issue -- I noticed Permadi's tutorial doesn't cover this in much detail -- and if I remember correctly they were the biggest slowdown in my first engine... (Although it was entirely fixed point, with LUTs where possible, I think I still had a mul/div for each pixel.)
Quote from: j_groothu on January 23, 2007, 03:27:50 PM
No cracks here, and a bit faster too I think. :cheekygreen: Let me explain a little why I can appreciate exploring soft rendering like this. The tendency these days is, quite justifiably, to chuck as much of the graphics processing as possible out to the specialised GPU via DirectX or another abstracted mechanism. What this does is save you from reinventing the wheel, and it hides some of the gory mechanisms you are exploring (on purpose).
But there are some of us who would like to understand, say, how a line is drawn, and even try our own implementation of Bresenham's algorithm. I think DirectX IS designed for games programming, not for understanding and learning how graphics algorithms actually work. I think exploring your own implementation will better equip you to do the same thing using DirectX (or any other available API), because you'll have a better understanding of the data structures and processing going on within the black box.
No, you can put up a quad and render with programmable pixel shaders -- I can run RTRT at 20 fps. But the GPU is also suited to that special case of squares at 90 degrees, as in raycasting, and its programmable 128-bit instructions make SSE look like crap, because it has so many built-in matrix functions etc. that SSE lacks. Look at some shaders -- the circle is closed; they are inspired by the old assembly water-ripple shaders, fire shaders, etc.
I would think that using both the GPU and CPU to their full potential, rather than leaving the 2+ GHz CPU idle (perhaps handling a few mouse events), would be worth a try.
Use the GPU to its full potential; you want the CPU for all the AI etc., unless you want to go for an asm RTRT demo.
[Later: i.e. perhaps a hybrid asmcast+Direct3D, with Direct3D (GPU) for hardware textures etc. and asmcast for the projection and collision physics (CPU), could be quite a setup with some potential]
Jason
I was rewriting my homebrew raycaster to use 80 rays, preparing it for an experiment: cast only a few rays, let Direct3D draw whole textured/lit/3D-transformed walls and objects, trace through transparent windows and store where they hit, and blend a window on top afterwards.
Quote: No, you can put up a quad and render with programmable pixel shaders -- I can run RTRT at 20 fps. But the GPU is also suited to that special case of squares at 90 degrees, as in raycasting, and its programmable 128-bit instructions make SSE look like crap, because it has so many built-in matrix functions etc. that SSE lacks. Look at some shaders -- the circle is closed; they are inspired by the old assembly water-ripple shaders, fire shaders, etc.
....
Use the GPU to its full potential; you want the CPU for all the AI etc., unless you want to go for an asm RTRT demo.
....
I was rewriting my homebrew raycaster to use 80 rays, preparing it for an experiment: cast only a few rays, let Direct3D draw whole textured/lit/3D-transformed walls and objects, trace through transparent windows and store where they hit, and blend a window on top afterwards.
Nice approach, that; I'll certainly be looking more into pixel shaders myself (a lot to learn there). I assume it would avoid the bandwidth limitations that would be dominant on my older ATI AGP 8x card, and free up the CPU some more, as you mention. Taking a closer look at some of the demos with source on the ATI website was an eye opener for me (I like the terrain one that uses some interesting data structures); I hadn't really considered that my old Radeon 9550 was capable of that.
Jason
Alright, here's today's work:
http://lithium.zext.net/asmcast.zip
(http://lithium.zext.net/asmcast4.png)
I've designed it so you can use any power-of-two texture size, up to 1024. There is also a basic blit function and a BMP load function...
the BMP function only loads 24-bit uncompressed bitmaps, and the blit function doesn't do transparency or clipping.
Also this screenshot shows the engine with a 640x480 resolution.
I stayed up all night trying to code a floor-caster that wouldn't have any muls in the innermost loop. My first attempt failed, so I went back to my old method, and in the process found some other issues and fixed them... and now I think if I tried my optimization again it might work...
Anyway, I've recoded some small, non-speed-critical parts of the engine using the FPU... I was a little wary of learning it, but it's not so bad now.
So, there's now a textured floor in the demo -- right now it's just one texture repeated, but it won't be hard to implement a floor map,
and as for the ceiling it's just a matter of copy and paste with a little editing.
I think I will do depth shading next... then sprites... then movable blocks and doors... and somewhere in between I will write a map editor (and improve the existing 2D drawing functions).
You can download this version here: http://lithium.zext.net/asmcast.zip
A screenie:
(http://lithium.zext.net/asmcast5.png)
But since this is the laboratory (and not the workshop, as you would think from these last two posts), I'll briefly discuss my floor-mapping algorithms...
The current algorithm
For each frame, the distance from the origin for each y is computed and stored in a LUT.
The wall slices are drawn a column at a time; the directional vector for that x is scaled by the distance stored in the LUT for each y and added to the camera position, and the map coordinates are found accordingly... So basically there are two muls for each pixel (bad, but how bad?)
The faster algorithm
I'm not entirely sure it will work, but this algorithm is based on my assumption that for every row of mapped texture on the screen, the difference in map coordinates from (screen) x to x+1 will be the same as from x+1 to x+2, x+2 to x+3, and so on...
So I would, for each y:
Calculate the distance and position of the left-most ray (x = 0) for this y
Calculate the distance and position of the right-most ray (x = screen width) for this y
take the difference of those two vectors, and divide by the screen width to get the per-pixel step
Set the map coordinate to the position of the left-most ray; get the texture coordinate, draw the pixel, increment the map coordinate by the vector from the last step, increment x by 1, and loop until x = screen width
EDIT: I couldn't get my mind off it... so I tried it, and it worked. The entire problem with it wasn't actually a problem with it; in fact it worked fine in the first place -- it was another part of my engine that was making it look incorrect... now that that's fixed, it works great... that makes this all-nighter worthwhile ;)
Added depth shading today... It uses LUTs totalling about 192KB of memory... the only unfortunate part is that 3 bytes have to be read separately for each pixel from the LUT (which is actually three LUTs) -- is mov'ing a byte faster or slower than a dword?
Anyway, here's a screenshot:
(http://lithium.zext.net/asmcast6.png)
And the download link is the same:
http://lithium.zext.net/asmcast.zip
Me being a relative noob at the optimisation thing, I would have to say it depends on the instructions around it. You might only get 1 byte or word move per cycle (or 2...), or you might be able to squeeze out several per cycle. If you could point out a particular critical code section and list it here (perhaps an inner loop or something), someone much, much better than me would likely point you in the right direction. Have you taken a look at Agner Fog's optimisation docs yet?
Jason,
P.S. Wish I had more time to play with this at the moment myself, oh well back to study
One dword operation is faster than three byte operations -- try using 32 bits per pixel.
I was trying to change it to use 1024x1024 textures, but that breaks the sky mapping and outer walls.
No FPS counter? That could be useful when you let lots of people with different CPUs test it. I was interested to see if it slowed down much if you used a different hi-res texture for each and every block.
Unfortunately I can't use one dword operation... I was trying to set it up that way, but it would require an enormous LUT, unless I sacrificed a lot of colour depth.
My plan of attack now is to decrease the colour depth by a factor of four, and the number of depth levels by a factor of two, which will decrease the size of my LUT from 192KB to 24KB; then I will have it allocated as static memory, as opposed to dynamic.
As for the 1024x1024 textures... I theorized that it would be possible, but I didn't test it... (oops). I'll have to run some tests on that and see if I can fix it.
Oh, and: http://lithium.zext.net/asmcast_test.zip -- press F to get a MessageBox with the framerate. A warning though: the frame limiter is turned off, so be careful not to run outside of the level, or it will crash.
Avoid floats in the casting, simply because of the rounding errors and not just because they are less efficient. Absolutely take advantage of power-of-two sizes and power-of-two fixed point (signed 15.16 fixed point would be my choice).
The casting shouldn't be a big efficiency issue with only ScreenWidth casts per frame (maybe double that if you allow for some mirrored walls, something I've never seen in a raycaster but certainly possible). If you have (or are expecting) a lot of wide open spaces in the voxel map, then you might want to store the closest distance to the next voxel (so you can skip several at a time), but I don't think that will be a major benefit.
The rendering of the strips to your backbuffer (system RAM, I assume) could take advantage of the SSE "streaming store" movnti instruction. Additionally, you could be prefetchnta'ing the texels as you run down the strip.
The biggest issue is going to be the presentation method. There are lots of choices, but none of them should be much better than another, because you are going across the AGP/PCI bus. I'd stick with GDI and DIB sections or CreateBitmap() myself, simply because DirectDraw doesn't have the greatest support anymore and DirectX/OpenGL is way overcomplicated with little benefit for a simple presentation routine. SDL is an option, but you will run into the same DirectDraw issues since that's what it'll be using on the back end (unless SDL uses OpenGL on Windows now? It's been a while...)
Quote from: Rockoon on February 02, 2007, 09:45:31 PM
The biggest issue is going to be the presentation method. There are lots of choices, but none of them should be much better than another, because you are going across the AGP/PCI bus. I'd stick with GDI and DIB sections or CreateBitmap() myself, simply because DirectDraw doesn't have the greatest support anymore and DirectX/OpenGL is way overcomplicated with little benefit for a simple presentation routine. SDL is an option, but you will run into the same DirectDraw issues since that's what it'll be using on the back end (unless SDL uses OpenGL on Windows now? It's been a while...)
Why not an OpenGL/DX solution? Let the CPU cast the rays and tell the GPU what coordinates and UV coordinates to use to render each of 640 quads. The latest GPUs have support for bump mapping, which is what you need for realistic-looking brick walls.
Or you could render to system RAM, upload it as a texture from memory, and turn on all the antialiasing, texture filtering, trilinear filtering, etc.
Quote from: daydreamer on February 12, 2007, 05:48:57 AM
Why not an OpenGL/DX solution? Let the CPU cast the rays and tell the GPU what coordinates and UV coordinates to use to render each of 640 quads. The latest GPUs have support for bump mapping, which is what you need for realistic-looking brick walls.
Or you could render to system RAM, upload it as a texture from memory, and turn on all the antialiasing, texture filtering, trilinear filtering, etc.
Because there is little advantage to using OpenGL/DX besides having cleaner control over screen resolution. Each ray is a point sample, not an area sample, so it would be hard to generate quad texture vertices from the raycast data.
Quote from: Rockoon on February 12, 2007, 08:07:33 PM
Quote from: daydreamer on February 12, 2007, 05:48:57 AM
Why not an OpenGL/DX solution? Let the CPU cast the rays and tell the GPU what coordinates and UV coordinates to use to render each of 640 quads. The latest GPUs have support for bump mapping, which is what you need for realistic-looking brick walls.
Or you could render to system RAM, upload it as a texture from memory, and turn on all the antialiasing, texture filtering, trilinear filtering, etc.
Because there is little advantage to using OpenGL/DX besides having cleaner control over screen resolution. Each ray is a point sample, not an area sample, so it would be hard to generate quad texture vertices from the raycast data.
You can tell the hardware to point-sample the texture.
Are you confusing it with raytracing, where each ray is a point sample? Each ray results in rendering a 1-pixel-wide slice of the wall; you just address it with a value between 0.0f and 1.0f rather than his crappy 0-63 int. Vertically, you could set the flag for a tiled texture and set it to 0.0f at the top and 6.0f at the bottom, which means the texture repeats 6 times.
The x values could be initialized for 640 tiles, while ytop and ybottom are set differently from the usual vertical size, which is what makes it pseudo-3D.
Quote from: daydreamer on February 17, 2007, 02:43:27 PM
You can tell the hardware to point-sample the texture.
Are you confusing it with raytracing, where each ray is a point sample? Each ray results in rendering a 1-pixel-wide slice of the wall; you just address it with a value between 0.0f and 1.0f rather than his crappy 0-63 int. Vertically, you could set the flag for a tiled texture and set it to 0.0f at the top and 6.0f at the bottom, which means the texture repeats 6 times.
The x values could be initialized for 640 tiles, while ytop and ybottom are set differently from the usual vertical size, which is what makes it pseudo-3D.
I think you are confusing screen space with texel space.
Each ray cast relates to an infinitely thin world-space strip, which is used as a point sample for a 1-pixel-wide strip of screen space, and this will not map linearly to a 1-pixel anything in texel space.
In texel space that strip could represent an area 1 texel wide, 100 texels wide, or 1/100 texels wide, or any other value based on distance, scaling, and orientation factors.
The end result is that you are making slow draw calls to hardware (yes, draw calls and state changes are slow; that's why we batch geometry together) to do exactly what you would be doing rapidly in software... and without any of the control software gives.
Now if you wanted to raycast on the other side of the draw call (such as with displacement mapping), then I could probably agree with you.
As for his "crappy" 0..63: if he is using a fixed-point-to-voxel resolution of 64:1, then no amount of GPU tricks will get him (or you) more resolution from it... 64 points per voxel, and that's it.