Saving a 3D array into a 1D array to the Hard Drive?

Started by Coral88, January 23, 2006, 06:31:36 AM


Coral88

I am working with PowerBasic CC4 and am finishing writing my Neural Network Pgr (program).

http://www.powerbasic.com/support/forums/Forum5/HTML/003259-4.html   (see my code at the end of the posting)


SUB NNsave()
    'I removed some unnecessary code from this posting
    ERRCLEAR
    hFile = FREEFILE
    OPEN "H:\PBGuy_NN0_060122_1228.DTA" FOR RANDOM AS hFile LEN = SIZEOF(SglCell)
    IF ERR THEN
        PRINT "Unable to open data file H:\PBGuy_NN0_060122_1228.DTA -- Push any key"
        WAITKEY$ : EXIT SUB 'FUNCTION
    END IF
    FOR Cntr04% = 1 TO 2            ' 1 to 18 !WHEN I USE ALL THESE THREE VALUES TOGETHER PBCC4 HANGS
        FOR Cntr05% = 1 TO 150      ' 1 to 1500 !THE FILE SHOULD BE 192MB but it hangs at 53 MB
            FOR Cntr06% = 1 TO 1600 ' 1 TO 1600 !and TAKES 2 Hours to DO 53 MB
                ' When Cntr04% = 1 to 2 / Cntr05% = 1 to 150 / Cntr06% = 1 to 1600
                ' it takes 6 seconds to create this random file, 1.875 MB long
                ' (on Win2000, Celeron 2.8GHz Hyperthreaded, 7200 rpm 240 GB HD)
                SglCell& = AR1&(Cntr04%, Cntr05%, Cntr06%)
                PUT hFile, Cntr04% * Cntr05% * Cntr06%, SglCell
                ' Cntr04% * Cntr05% * Cntr06% was meant as the unique position
                ' of each cell in the RANDOM file; actually this is INCORRECT
            NEXT Cntr06%
        NEXT Cntr05%
    NEXT Cntr04%
    CLOSE hFile
    PRINT " Finished WRITING to Random file H:\PBGuy_NN0_060122_1228.DTA"
END SUB


1) My NN has a 3D array which is large.
The 3D array is composed of 18 layers of 2D arrays.
Each 2D array is 1500 x 1600 cells.
This is a total of 43 200 000 cells [18 * 1500 * 1600].
The NN is learning, so I need to train it for days and months.
The Pgr performs trillions upon trillions of data swaps.
This is why any bit of ASM code will help me save time, both while the Pgr runs and when saving my
array to my HD.

Because the array is held in memory in 3D, I have to save it to the hard drive from time to time.
Due to electrical storms, I have to do that each time I update any data.

Saving a 3D array to a HD, which stores data in 1D, is a tricky conversion.
How do I convert a 3D array to a 1D array?
Any code ideas?

For example, if I save to cell 2,10,20, it must go to a different place from cell 2,20,10.
I cannot just multiply 2 * 10 * 20.
I need to use a Gauss sequence, which I have not managed to find yet. But it does exist.
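A quick way to see why the plain product fails is to count how many distinct record numbers `layer*row*col` actually produces. The C sketch below is only an illustration (C rather than PowerBASIC; `distinct_product_records` is a hypothetical helper). For a tiny 2 x 3 x 4 array it finds 11 distinct records for 24 cells, so 13 cells are overwritten by later PUTs; the same thing happens at full size.

```c
#include <stdlib.h>

/* Counts how many distinct record numbers layer*row*col produces for
   1-based loops. Each distinct record ends up holding only the last cell
   written to it, so this is the number of cells that survive the save. */
static long distinct_product_records(long layers, long rows, long cols)
{
    long max = layers * rows * cols;            /* largest possible product */
    unsigned char *seen = calloc((size_t)max + 1, 1);
    long distinct = 0;
    if (seen == NULL)
        return -1;
    for (long l = 1; l <= layers; l++)
        for (long r = 1; r <= rows; r++)
            for (long c = 1; c <= cols; c++) {
                long pos = l * r * c;           /* the record number the PUT used */
                if (!seen[pos]) {
                    seen[pos] = 1;
                    distinct++;
                }
            }
    free(seen);
    return distinct;
}
```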

2) I also need to keep the integrity of my array on the HD by using some sort of checksum for each
row and column of the array,
so that in case of corruption from electrical storms, which are frequent in Brisbane, Australia, I can quickly rebuild
my array stored on my HD or DVD.
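Row and column checksums can be as simple as XOR parity. A minimal C sketch, assuming each layer is held row by row in a flat buffer (`layer_parity` and `locate_bad_cell` are hypothetical names): one damaged cell changes the parity of exactly one row and one column, which pinpoints it.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR parity of every row and every column of one rows x cols layer
   stored row by row in a flat buffer. */
static void layer_parity(const int32_t *cells, size_t rows, size_t cols,
                         uint32_t *row_par, uint32_t *col_par)
{
    for (size_t r = 0; r < rows; r++) row_par[r] = 0;
    for (size_t c = 0; c < cols; c++) col_par[c] = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++) {
            uint32_t v = (uint32_t)cells[r * cols + c];
            row_par[r] ^= v;
            col_par[c] ^= v;
        }
}

/* Compares saved and freshly computed parity. Returns 0 if everything
   matches, 1 if exactly one row and one column differ (one damaged cell,
   position stored in *bad_row / *bad_col), -1 for wider damage. */
static int locate_bad_cell(const uint32_t *saved_row, const uint32_t *now_row, size_t rows,
                           const uint32_t *saved_col, const uint32_t *now_col, size_t cols,
                           size_t *bad_row, size_t *bad_col)
{
    size_t nr = 0, nc = 0;
    for (size_t r = 0; r < rows; r++)
        if (saved_row[r] != now_row[r]) { *bad_row = r; nr++; }
    for (size_t c = 0; c < cols; c++)
        if (saved_col[c] != now_col[c]) { *bad_col = c; nc++; }
    if (nr == 0 && nc == 0)
        return 0;
    return (nr == 1 && nc == 1) ? 1 : -1;
}
```

The parity words would be saved next to each layer; after loading, recompute them and compare. XOR parity only reliably locates a single bad cell per layer, so anything wider still needs a backup copy.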

** PB Console Compiler 4 accepts ASM code **

Coral
Brisbane, Australia

gabor

Hi!

NN is a very interesting topic to code. :P Could you say a bit more about the general purpose of this NN? I could not find any info on this on the webpage you linked here...
My first question is: why don't you write all the critical parts in asm, not only the backup but the calculations too?
Another question (maybe I don't fully understand your problem): is it not enough to store the cells in their natural order?
cell(x,y,l) is saved in 3 nested loops

for l=1 to 18
{
 for x=1 to 1500
 {
  for y=1 to 1600
  {
   save cell(x,y,l)
  };
 };
};
I believe if you load the data exactly in the same order, there won't be any problems.
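gabor's loop order, sketched in C as an illustration only (`save_cells`, `load_cells` and the tiny dimensions are hypothetical). As long as the loader walks the same three loops in the same order, every value lands back in its own cell with no index conversion at all.

```c
#include <stdint.h>
#include <stdio.h>

/* Tiny dimensions for the sketch; the real array is 18 x 1500 x 1600. */
enum { LAYERS = 2, XS = 3, YS = 4 };

/* Writes cell(x,y,l) in gabor's order: layer outermost, then x, then y. */
static int save_cells(FILE *f, int32_t cell[LAYERS][XS][YS])
{
    for (int l = 0; l < LAYERS; l++)
        for (int x = 0; x < XS; x++)
            for (int y = 0; y < YS; y++)
                if (fwrite(&cell[l][x][y], sizeof cell[l][x][y], 1, f) != 1)
                    return -1;
    return 0;
}

/* Reads the cells back with exactly the same three loops. */
static int load_cells(FILE *f, int32_t cell[LAYERS][XS][YS])
{
    for (int l = 0; l < LAYERS; l++)
        for (int x = 0; x < XS; x++)
            for (int y = 0; y < YS; y++)
                if (fread(&cell[l][x][y], sizeof cell[l][x][y], 1, f) != 1)
                    return -1;
    return 0;
}
```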


Error protection is a good idea, but why only detect errors? With some extra info stored with the data, it is possible to restore damaged data...
Another form of protection would be to store each layer in a separate file instead of putting everything in just one file.
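The "restore damaged data" idea can be illustrated with an XOR parity word stored for each row: once the bad cell's row and column are known, XOR-ing the saved row parity with every other cell in that row gives back the lost value. A C sketch under that assumption (`repair_cell` is a hypothetical name, and it can rebuild only one bad cell per row):

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuilds cell (bad_row, bad_col) of a layer stored row by row, using the
   row's saved XOR parity: XOR-ing the parity with every other cell in that
   row leaves exactly the original value of the damaged cell. */
static int32_t repair_cell(const int32_t *cells, size_t cols,
                           size_t bad_row, size_t bad_col, uint32_t saved_row_parity)
{
    uint32_t v = saved_row_parity;
    for (size_t c = 0; c < cols; c++)
        if (c != bad_col)
            v ^= (uint32_t)cells[bad_row * cols + c];
    return (int32_t)v;
}
```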

To your posted code:
It seems to me that you are saving the data one by one. This is slow indeed. You should save in big blocks, say one row (1500 or 1600 cells) at a time. The reason for the crash is a mystery; there is not enough info to find it.
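Writing one row per call instead of one cell per call could look like this in C (a sketch; `save_by_rows` is a hypothetical name, and it assumes the array sits in memory row after row, as C arrays do):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes a layers x rows x cols array, stored contiguously row by row,
   with one fwrite per row (cols cells) instead of one per cell. */
static int save_by_rows(FILE *f, const int32_t *cells,
                        size_t layers, size_t rows, size_t cols)
{
    for (size_t row = 0; row < layers * rows; row++)
        if (fwrite(cells + row * cols, sizeof *cells, cols, f) != cols)
            return -1;
    return 0;
}
```

For the real array that is 18 * 1500 = 27000 calls of 1600 cells each instead of 43.2 million single-cell calls.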

Believe me, it pays to convert these parts to asm, and it is not that hard at all. If you know exactly what you want to do, it is very easy to write the asm code. Finding the most appropriate algo and then optimizing it is another issue. However, there are many helpful people here, so don't worry!

I offer my help if you decide to adapt some of your code. (Is it possible to link obj files created by asm into your PB4 projects?)


Greets, Gábor

Coral88

Thanks Gabor
>> why don't you write all the critical parts in asm,
because I don't know enough about MASM.


The general purpose of the NN is to discover whether underground water will dry out.
Email me mentioning this forum (I'll make my email visible for a few days) and I'll send you a detailed email about what my NN is about.


>> is it not satisfying to store the cells in their natural order:
>> cell(x,y,l) is saved in 3 nested loops

It is nearly impossible to work on the data in 1D because there are nodes and other layers.
I need to work in layers; even the nodes are in layers.

>>It seems to me that you are saving the data one by one.
>>This is slow indeed. You should do the saving in big blocks say, one row (1500 or 1600 cells) at a time.

Excellent idea, but I don't know how to do that in PB or ASM!
I asked once and PB support said it was not possible.

>>Believe me, it pays to convert this parts to asm and it is not that hard at all.
I believe you. I am prepared to have a go.
Actually, many years ago I worked a bit with A86, but stopped.

>>you exactly know what you want to do it is very easy to write an asm code.
I know exactly what I want and how to transform a 3D array into 1D, but my algorithm is not "elegant".
Many years ago I saw the proper algorithm, from Finland, but lost it.
It uses a sieving technique, but the proper way is a Gauss derivation.

>> I offer my help if you decide to adapt some of your code. (Is it possible to link obj files created by asm into your PB4 projects?
YES, thank you. I may take you up on it for the storage from 3D to 1D.
The only ASM problem could be the Win2000 Pro OS, but I'll see when I come to that.

Guy







MickD

Hi Coral,
I'm only learning asm myself (so ignore this if it's not right), but I've read that all arrays, regardless of dimension, are stored in 1D in memory anyway, so you're already there. There is a formula for accessing the different dimensions of your array data that I'm sure someone here could explain (it's in Paul Carter's asm tutorial). If you know the starting address, you're set.
hth.
Mick.

sluggy

Hi Coral,
as MickD said, your array is already stored in 1D in memory - an array is just a bunch of contiguous memory locations. You just *access* it in 3D, or 2D, or 4D, or whatever you set it up to be.

So all you have to do is find the starting location of your array, find its length, then write that chunk of memory to file. Using asm will not make this any faster, because there is very little code you have to write, you use APIs to do most of the work.
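In C terms (used here only as an illustration), sluggy's point looks like this: a 3D array is one contiguous block, so a single `fwrite` of `sizeof` the array saves it and a single `fread` restores it. `save_block`, `load_block` and `cell_t` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t cell_t;  /* a PowerBASIC LONG is a 32-bit integer */

/* Writes any array, whatever its number of dimensions, as one block. */
static int save_block(FILE *f, const void *base, size_t bytes)
{
    return fwrite(base, 1, bytes, f) == bytes ? 0 : -1;
}

/* Reads the block back into an array of the same shape. */
static int load_block(FILE *f, void *base, size_t bytes)
{
    return fread(base, 1, bytes, f) == bytes ? 0 : -1;
}
```

With C's row-major layout, cell `a[l][r][c]` of an `L x R x C` array sits at element offset `(l*R + r)*C + c` in memory and in the file, which is the kind of formula MickD mentions.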

dioxin

Coral,
   from the PBCC4 help file:
Quote
Binary files:

PUT #fNum&, [RecPos], Arr()
..

Arr()   When PUT is used on a binary file, the entire array specified by Arr() is written to the file.  With dynamic strings, the file is written in the PowerBASIC and/or VB packed string format.  If the string is shorter than 65535 bytes, a 2-byte length Word is followed by the string data.  Otherwise, a 2-byte value of 65535 is followed by a length Double-word (DWORD), then finally the string data.

With other data types, the entire data area is written as a single block.
In either case, it is presumed the file will be read with the complementary GET Array statement.

So who told you this wasn't possible? You can write the whole array in one go without any need for a conversion. It'll take under 2 seconds on any modern hard disk, so you can simply write your array to disk every hour and you'll never lose more than 60 minutes of calculation in the event of a crash.
2 seconds every hour is not a lot of time to spend on this.
If there is a crash, you just reload the whole array in a similar way with a single GET statement and restart from the last hourly backup.

The smallest disk drive you can get these days is around 40GB, so you can keep 600 backup copies of your array, i.e. 600 hours' worth of backups, before you need to start deleting old copies. A large 200GB hard disk, which is readily available, will store 4 months of backup data, probably enough to complete your job.

Paul.


dioxin

Coral,
I just tried it in PBCC4: filling your 18x1600x1500 array of LONGs took 8 seconds.
To write the entire file to disk took 3.3 seconds.
It's longer than I thought because I assumed bytes and it's actually LONGs = 4 times as much data.
Paul.

FUNCTION PBMAIN () AS LONG
DIM a&(1 TO 18,1 TO 1600,1 TO 1500)
t1#=TIMER

FOR r& = 1 TO 18
    FOR t& = 1 TO 1600
        FOR y& = 1 TO 1500
         a&(r&,t&,y&)=RND(1,2000000000)
        NEXT
    NEXT
NEXT

PRINT "time to create array="; TIMER-t1#

t1#=TIMER
fn&=FREEFILE
OPEN "c:\junk1" FOR BINARY AS fn&
  PUT fn&,1,a&()
CLOSE fn&

PRINT "time to write entire array to disk=";TIMER-t1#
WAITKEY$

END FUNCTION
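A back-of-the-envelope check of the backup sizes with 4-byte LONG cells, sketched in C (hypothetical helper names; decimal gigabytes, 1 GB = 10^9 bytes, are assumed):

```c
#include <stdint.h>

/* Size in bytes of one full backup of a layers x rows x cols array. */
static uint64_t backup_bytes(uint64_t layers, uint64_t rows, uint64_t cols,
                             uint64_t cell_bytes)
{
    return layers * rows * cols * cell_bytes;
}

/* Number of whole backups that fit on a disk of the given size. */
static uint64_t backups_that_fit(uint64_t disk_bytes, uint64_t one_backup)
{
    return disk_bytes / one_backup;
}
```

For 18 x 1500 x 1600 LONGs this gives 172 800 000 bytes per backup, 231 copies on a 40GB drive and 1157 on a 200GB drive, which is about 48 days of hourly backups.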
               

EduardoS

First, use zero-based arrays; they are easier to work with.
In your example (18 layers of 1500 x 1600, zero-based), store element (x, y, z) of the 3D array at position x*2400000 + y*1600 + z of the 1D array, where 2400000 = 1500*1600.
To get element n from the 1D array back into the 3D array:
(\ means integer division, MOD means remainder)
x = n \ 2400000
y = (n MOD 2400000) \ 1600
z = n MOD 1600

Simple, and it works for arrays of any dimension.
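EduardoS's mapping and its inverse for the 18 x 1500 x 1600 array, sketched in C (C's `/` and `%` play the roles of `\` and MOD; `to_flat`, `from_flat` and `LAYER_SIZE` are hypothetical names):

```c
#include <stdint.h>

enum { LAYERS = 18, ROWS = 1500, COLS = 1600 };
#define LAYER_SIZE ((uint32_t)ROWS * (uint32_t)COLS)  /* 2 400 000 cells per layer */

/* Zero-based (x, y, z) -> position n in the 1D array. */
static uint32_t to_flat(uint32_t x, uint32_t y, uint32_t z)
{
    return x * LAYER_SIZE + y * COLS + z;
}

/* Position n -> zero-based (x, y, z). */
static void from_flat(uint32_t n, uint32_t *x, uint32_t *y, uint32_t *z)
{
    *x = n / LAYER_SIZE;
    *y = (n % LAYER_SIZE) / COLS;
    *z = n % COLS;
}
```

For example, `to_flat(2, 10, 20)` is 4816020 while `to_flat(2, 20, 10)` is 4832010, so the two cells from the original question no longer collide, and the last cell maps to 43199999.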

P1

Quote: PUT hFile, Cntr04% * Cntr05% * Cntr06%, SglCell
If % stands for a 16-bit INTEGER, you may have overflowed the variable's range, and that could be why you crash at 53MB. It is also a terrible way to calculate a record pointer.

If & is a LONG (32-bit integer), that would make a better record pointer. And from what I can tell, you should be able to PUT the entire array and save some time and work.

Then you can CRC the file with a MASM sub if you like.
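A CRC of the saved data does not need MASM to start with. A plain bitwise CRC-32 in C is shown below as a sketch (`crc32_update` is a hypothetical name; the polynomial is the standard IEEE/zip one). It can be fed the file or the array in chunks, since each call continues from the previous CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320), the
   variant used by zip. Start with crc = 0 and pass the result of each call
   into the next one to checksum data in pieces. Short but slow; table-driven
   versions are several times faster. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```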

Regards,  P1  :8)

Mark_Larson

I have never used PowerBasic. Is there an option for showing the assembly language it generates? Most modern C compilers can generate assembly language files from your C files. If PowerBasic can do that, you can easily see the code it generates and judge how optimal it is.

Second, if you are on a more modern processor such as a P4 or above, you can use more advanced instruction sets such as SSE2 to possibly speed up your code. SSE2 can operate on 4 LONG values in parallel; MMX works on 2 LONG values in parallel.
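What "SSE2 operates on 4 LONGs in parallel" means in practice, shown with C intrinsics from `<emmintrin.h>`. This is a sketch: `add_cells` is hypothetical, and the SSE2 path is compiled only when the compiler defines `__SSE2__` (as GCC and Clang do on x86-64); otherwise the plain loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* dst[i] += src[i] for n LONG (32-bit) cells. With SSE2 one PADDD
   instruction adds four cells at once; the scalar loop handles the
   leftover cells and machines without SSE2. Values are assumed to stay
   within the 32-bit range. */
static void add_cells(int32_t *dst, const int32_t *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(a, b));
    }
#endif
    for (; i < n; i++)
        dst[i] += src[i];
}
```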

  I read on that forum that your "cell" is a LONG between 1 and 18 million. There are quite a few talented assembly language optimizers on this board. If you find an easy way to convert your Basic code to assembly (like a command-line switch), you can probably tempt them into helping out more easily. As it is, I don't fully understand PowerBasic code.

Mark
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

MickD

Could PowerBasic pass the address of the first element (and the last if required) in the array to an asm dll to work with?
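In principle a routine only needs the address of the first element and the cell count, because the whole array is one flat block. A hypothetical C routine of that shape is sketched below; how PowerBASIC obtains and passes that address is not shown and not assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine of the kind MickD describes: the caller passes only
   the address of the first cell and the number of cells, and the routine
   walks the whole array as one flat block. Here it counts the cells that
   fall outside [lo, hi], e.g. outside 1..18000000. */
static size_t count_out_of_range(const int32_t *first, size_t count,
                                 int32_t lo, int32_t hi)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++)
        if (first[i] < lo || first[i] > hi)
            bad++;
    return bad;
}
```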

Mark_Larson

This is your "validate" routine.  It calls a random number generator RND().  Random number generators have been optimized to death on the forums.  Search both the old and new forums ( the old forums are searchable from the top right link that says "old forum").


SUB Validate_AR1()
    LOCATE 1,0
    PRINT " " ' Clears previous writing to screen
    LOCATE 1,0 : COLOR 7,0
    PRINT " Validating_AR1() with 43 200 000 cells "

    CALL Time_A

    DIM Cntr04 AS LOCAL INTEGER ' Loop integer for iterating
    DIM Cntr05 AS LOCAL INTEGER ' Loop integer for iterating
    DIM Cntr06 AS LOCAL INTEGER ' Loop integer for iterating
    RANDOMIZE TIMER
    FOR Cntr04% = 1 TO 18
        FOR Cntr05% = 1 TO 1500
            FOR Cntr06% = 1 TO 1600
                ' Fill every cell with a random LONG between 1 and 18 000 000
                AR1&(Cntr04%, Cntr05%, Cntr06%) = RND(1, 18000000)
                'PRINT AR1&(Cntr04%, Cntr05%, Cntr06%)
            NEXT Cntr06%
        NEXT Cntr05%
    NEXT Cntr04%
END SUB
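As one example of the fast generators discussed on the forums, Marsaglia's xorshift32 needs only three shifts and three XORs per number. This C sketch is not a drop-in replacement for PowerBASIC's RND (`fill_cells` is hypothetical); it fills the cells with values in 1..18000000 as Validate_AR1 does.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 (Marsaglia, 2003). The state must never be 0. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills n cells with values in 1..18000000. The modulo reduction has a
   small bias, which is acceptable for test data. A zero seed is replaced
   by 1 because xorshift32 would stay at 0 forever. */
static void fill_cells(int32_t *cells, size_t n, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;
    for (size_t i = 0; i < n; i++)
        cells[i] = (int32_t)(xorshift32(&state) % 18000000u) + 1;
}
```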