

A BIG thank you

Started by Peter Presland, December 11, 2008, 04:27:15 PM


Peter Presland

Joe may recall suggesting I look at SQLitening in reply to a posting of mine on Chris Boss's site early last month. I'm very glad I did.

Briefly, I have hundreds of separate text files collected over several years containing financial time series tick data. Each one contains between 2 and 15 million records. My need is to parse these files into a form that is usable by my existing analytical software. I have made good progress in learning PB9 and have produced working prototypes of what I need. Apart from the PB9 learning curve, my biggest problem turned out to be processing time.

I started by using Input# and Write# statements to read from a text file and then, following minor conditional manipulation, write the result to a separate output text file. A typical 5 million record file took just over 40 minutes to process. So I tried SQLitening - another learning curve followed by serious disappointment. The same file took well over an hour to process.

Then I stumbled upon Pat Dooley's  'Slow insert speed' thread and BINGO !!! - it now takes just 3 minutes to process the same file, so I am one happy bunny.
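For readers arriving from search: the speedup in that thread comes from wrapping the whole bulk load in a single Begin/Commit transaction (Fred's reply below also refers to it), so SQLite syncs to disk once per batch instead of once per row. The thread's code is PowerBASIC, but the same technique can be sketched with Python's built-in sqlite3 module; the table and sample data here are invented for illustration:

```python
import sqlite3

# In-memory database stands in for the tick-data database on disk.
con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage transactions ourselves
con.execute("CREATE TABLE ticks (ts TEXT, price REAL, volume INTEGER)")

rows = [("2008-12-11 09:30:00", 101.25, 500)] * 10_000

# One explicit transaction around the whole load: SQLite syncs to disk
# once at COMMIT instead of once per INSERT.
con.execute("BEGIN")
con.executemany("INSERT INTO ticks VALUES (?, ?, ?)", rows)
con.execute("COMMIT")

count = con.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
print(count)  # prints 10000
```

On a real on-disk database the difference is dramatic, because without the explicit transaction every single INSERT is its own transaction with its own disk sync.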

It's still early days (for a rank beginner) but thanks a bunch to Joe for the steer and to Fred and whoever for a cracking product.

Fred Meier

December 12, 2008, 11:48:50 AM #1 Last Edit: December 12, 2008, 11:56:50 AM by Fred Meier
For even slightly faster original bulk loads, do slExe "pragma journal_mode=off" before the "Begin" transaction.  Check out all the available pragma settings at http://www.sqlite.org/pragma.html .  Setting page_size to 4096 is a good one.
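A rough sketch of Fred's two suggestions using Python's sqlite3 module (the table is invented for illustration; in SQLitening itself the pragmas would go through slExe as he describes):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)

# Both pragmas must be issued before the bulk load: page_size only takes
# effect before the first table is created (or after a VACUUM), and
# journal_mode=off is safe only for a load you can simply re-run on failure,
# since there is no rollback journal to recover from.
con.execute("PRAGMA page_size = 4096")
con.execute("PRAGMA journal_mode = OFF")

con.execute("CREATE TABLE ticks (ts TEXT, price REAL)")
con.execute("BEGIN")
con.executemany("INSERT INTO ticks VALUES (?, ?)",
                [(str(i), float(i)) for i in range(1000)])
con.execute("COMMIT")

page_size = con.execute("PRAGMA page_size").fetchone()[0]
print(page_size)  # prints 4096
```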

Peter Presland

Thanks Fred, I'll check it out. I was going to have a good look at the pragmas anyway because, with the defaults, I can only get it to import text data that is delimited with single quotes in addition to the commas, whereas all my text files are simply comma-delimited with no quotes (double or single). I've constructed the import statement to add the single quotes OK, but I would maybe gain a bit more speed if they weren't needed. Worth investigating anyway.

Is there anything obvious I could do in that respect?

Fred Meier

December 12, 2008, 06:36:45 PM #3 Last Edit: December 12, 2008, 06:38:40 PM by Fred Meier
I don't understand what you mean by "with the defaults, I can only get it to import text data that is delimited with single quotes in addition to the commas."
I would just read each record in the text file using Line Input# and then build and execute a SQL Insert command - is that what you are doing?  Nothing I know of would be any faster (maybe using the Parse command to load the entire text file into an array).  Maybe you can post some sample code.
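Fred's loop - read a record, build an INSERT, execute - looks roughly like this when sketched in Python with the stdlib csv and sqlite3 modules (the file contents and table are invented for illustration; the real PB9 code would use Line Input# as he says):

```python
import csv
import io
import sqlite3

# Stand-in for one comma-delimited tick file (no quotes around fields).
tick_file = io.StringIO(
    "2008-12-11 09:30:00,101.25,500\n"
    "2008-12-11 09:30:01,101.30,200\n"
)

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE ticks (ts TEXT, price REAL, volume INTEGER)")

# Read one record at a time and insert it, all inside one transaction.
con.execute("BEGIN")
for ts, price, volume in csv.reader(tick_file):
    con.execute("INSERT INTO ticks VALUES (?, ?, ?)",
                (ts, float(price), int(volume)))
con.execute("COMMIT")

loaded = con.execute("SELECT COUNT(*) FROM ticks").fetchone()[0]
print(loaded)  # prints 2
```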

Peter Presland

Fred

That last question about single quotes was a manifestation of beginner's ignorance. I thought it might have been possible to dispense with the single quotes completely. However, even if it were, it would only make the INSERT statement a bit shorter and simpler to construct, and would have little if any effect on execution speed. So something else learned.
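For anyone else puzzled by the same thing: single quotes are simply how SQL writes string literals, so any text value spliced into an INSERT statement needs them (with embedded quotes doubled). Bound parameters avoid the quoting entirely, because the value travels separately from the SQL text. A Python sketch of the difference (table invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE names (name TEXT)")

# Splicing text into the SQL itself requires single quotes, and any
# embedded quote must be doubled to escape it:
con.execute("INSERT INTO names VALUES ('O''Brien')")

# A bound parameter needs no quoting at all; the value is passed
# separately from the SQL text:
con.execute("INSERT INTO names VALUES (?)", ("O'Brien",))

matches = con.execute(
    "SELECT COUNT(*) FROM names WHERE name = ?", ("O'Brien",)
).fetchone()[0]
print(matches)  # prints 2
```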

I am currently reading each line with INPUT# (rather than LINE INPUT#) then building the SQL INSERT statement from the resulting array. It is a legacy of the code I used to write my manipulated input file to a sequential output file with WRITE#.

Haven't checked out the pragma adjustments yet but will let you know.

BTW, further to my SQLite Expert Pro observations, I reported a bug to them (the Start button on the file import dialog was not disabled during the import process, making it possible for a second click to be buffered and actioned on completion). They've already tweaked the import speed in the latest version, improving it by a factor of about 4, and offered me a discount for the bug report, so I'll be purchasing the Pro version.


Peter Presland

Tried the pragma tweaks but they didn't make much difference to the import speed. Anyway, at just under 3 minutes to process 5 million records, I'm happy with things as they are.