• Welcome to SQLitening Support Forum.
 

News:

Welcome to the SQLitening support forums!

Main Menu

Data Types In SQLite.

Started by Fred Meier, October 02, 2008, 06:07:23 PM

Previous topic - Next topic

Fred Meier

I find the way this is handled in SQLite to be one of the great features. 
I normally create columns with no data type, which will result in an
affinity of none, by using the following create syntax:
   Create Table T1 (C1, C2, C3)I have found no advantage in assigning column affinity using the following:
   Create Table T1 (C1 Integer, C2 Text, C3 Real)

The below was taken from http://www.sqlite.org/datatype3.html   
1. Storage Classes
   Each value stored in an SQLite database has one of the following storage classes:
   
      NULL. The value is a NULL value.
      INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
   
      REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
   
      TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16-LE).
   
      BLOB. The value is a blob of data, stored exactly as it was input.
   
   Any column except an INTEGER PRIMARY KEY may be used to store any type of
   value.  The exception to this rule is 'Strict Affinity Mode'. 
   
   Storage classes are initially assigned as follows:
   
   Values specified as literals as part of SQL statements are assigned
   storage class TEXT if they are enclosed by single or double quotes,
   INTEGER if the literal is specified as an unquoted number with no decimal
   point or exponent, REAL if the literal is an unquoted number with a
   decimal point or exponent and NULL if the value is a NULL.  Literals with
   storage class BLOB are specified using the X'ABCD' notation. 
   
   Values supplied using the sqlite3_bind_* APIs are assigned the storage
   class that is requested.

2. Column Affinity

   The type of a value is associated with the value itself, not with the
   column or variable in which the value is stored.  (This is sometimes
   called manifest typing.) All other SQL databases engines that we are aware
   of use the more restrictive system of static typing where the type is
   associated with the container, not the value. 
   
   SQLite support the concept of "type affinity" on columns.  The
   type affinity of a column is the recommended type for data stored in that
   column.  The key here is that the type is recommended, not required.  Any
   column can still store any type of data, in theory.  It is just that some
   columns, given the choice, will prefer to use one storage class over
   another.  The preferred storage class for a column is called its
   "affinity". 
   
   Each column in SQLite is assigned one of the following type affinities:
   
      TEXT
      NUMERIC
      INTEGER
      NONE
   
   A column with TEXT affinity stores all data using storage classes NULL,
   TEXT or BLOB.  If numerical data is inserted into a column with TEXT
   affinity it is converted to text form before being stored. 
   
   A column with NUMERIC affinity may contain values using all five storage
   classes.  When text data is inserted into a NUMERIC column, an attempt is
   made to convert it to an integer or real number before it is stored.  If
   the conversion is successful, then the value is stored using the INTEGER
   or REAL storage class.  If the conversion cannot be performed the value is
   stored using the TEXT storage class.  No attempt is made to convert NULL
   or blob values. 
   
   A column that uses INTEGER affinity behaves in the same way as a column
   with NUMERIC affinity, except that if a real value with no floating point
   component (or text value that converts to such) is inserted it is
   converted to an integer and stored using the INTEGER storage class. 
   
   A column with affinity NONE does not prefer one storage class over
   another.  It makes no attempt to coerce data before it is inserted. 

3. Determination Of Column Affinity

   The type affinity of a column is determined by the declared type of the
   column when the table is created, according to the following rules:
   
   If the datatype contains the string "INT" then it is assigned INTEGER affinity.
   
   If the datatype of the column contains any of the strings "CHAR", "CLOB",
   or "TEXT" then that column has TEXT affinity.  Notice that the type
   VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity. 
   
   If the datatype for a column contains the string "BLOB" or if no datatype
   is specified then the column has affinity NONE. 
   
   Otherwise, the affinity is NUMERIC.


Matt Humphreys

How does one enforce 'Strict Affinity Mode' option?
SQLite docs mention it but does anyone know if it has been implemented yet?

Fred Meier

It is not currently available. I just assumed there was a Pragma to set it.
Below are excerpts from the following link
http://www.mail-archive.com/sqlite-users@sqlite.org/msg31195.html

QuoteOn Feb 2, 2008, at 7:57 PM, Scott Chapman wrote:
I've looked high and low and can't find a way to invoke the other 2
affinity modes.  Are they available? I'm on 3.5.4.
QuoteD. Richard Hipp (Mr SQLite) reply
The concept of "strict" affinity mode was briefly discussed years
ago, but we never implemented it, having never seen any benefit
for such a thing.  Can you explain why you think strict affinity mode
might be beneficial to you?  If somebody can provide a good
enough rational to justify strict affinity mode, we might just put it
in.


mikedoty

#3
If a numeric value were accidently insert using quotes it sounds like the
numeric storage optimization might be missed.  Also, couldn't this add
to problems adding a column if a numeric was considered a string?
Also, if a numeric was considered a string sorting might also be wrong.
I'll have to do some testing without assigning as INTEGER to see if
the results come up the same if a numeric is accidently enclosed within quotes.
It sounds like a good check on numeric values.

Storage classes are initially assigned as follows:
   
   Values specified as literals as part of SQL statements are assigned
   storage class TEXT if they are enclosed by single or double quotes,
   INTEGER if the literal is specified as an unquoted number with no decimal
   point or exponent, REAL if the literal is an unquoted number with a
   decimal point or exponent and NULL if the value is a NULL.  Literals with
   storage class BLOB are specified using the X'ABCD' notation. 

Fred Meier

Using the following snippet:

   slExe "Create table T1(F1)"
   slExe "Insert into t1 values (10)
   slExe "Insert into t1 values ('11')
   slExe "Insert into t1 values (12)


"Select Sum(F1)" work correctly returning 33.
"Select F1 From T1 Order By F1" sorts numeric before text so result would be 10,12,11.
Of course within your PB code, all values are returned as strings using the Get Field routines so you can not tell (nor do I think you care) if it was stored as numeric or text.



mikedoty

I will be writing millions of 2-byte binary fields so it does make a difference.
The data should come back 10,11,12 if numeric.

mikedoty

#6
SELECT F1 FROM T1 ORDER BY F1

The values come back 10,11,12 if the the field was created F1 INTEGER.
This is very good news. 

Later note:
Numerics sort correctly if quoted or not quoted with F1 INTEGER.

Fred Meier

I don't believe I said that the order resulting in 10,12,11 does not mater.

mikedoty

#8
lol. 
This is fantastic because I store money as whole numbers and use
julian dates.  To me a penny is 1 and a dollar is 100.
These would have gobbled up disk space and would
not have sorted correctly.  I'm glad you brought up this subject!

Haven't tried this yet, but this should hopefully work:
List of people sorted by balance +Lastname + FirstName.

mikedoty

#9
Sorting works correctly with  numbers as INTEGER and strings:
SELECT BALANCE, LASTNAME, FIRSTNAME,ROWID FROM T1 ORDER BY BALANCE,LASTNAME,FIRSTNAME;


DROP TABLE IF EXISTS T1;
CREATE TABLE IF NOT EXISTS  T1 (FIRSTNAME,LASTNAME, BALANCE INTEGER);
INSERT INTO T1 VALUES  ('BURT','ZORRO',6);
INSERT INTO T1 VALUES  ('FRED','MEIER',6);
INSERT INTO T1 VALUES  ('MIKE','DOTY','6');
INSERT INTO T1 VALUES  ('HEIDI','KLUME','6');
INSERT INTO T1 VALUES  ('FRED','MEIER',6);
INSERT INTO T1 VALUES  ('MIKE','DOTY', 1);
INSERT INTO T1 VALUES  ('PAUL','SQUIRES',6);
INSERT INTO T1 VALUES  ('PAUL','SQUIRES',1);
INSERT INTO T1 VALUES  ('BURT','ZORRO',1);
INSERT INTO T1 VALUES  ('AL','ZORRO',1);
INSERT INTO T1 VALUES  ('GEORGE','BUSH',1);
INSERT INTO T1 VALUES  ('BARRACK', 'OBAMA',1);
INSERT INTO T1 VALUES  ('HILLARY', 'CLINTON',1);
INSERT INTO T1 VALUES  ('FRED','MEIER',1);

Bern Ertl

Quote from: Fred Meier on October 04, 2008, 06:57:50 PM... Of course within your PB code, all values are returned as strings using the Get Field routines so you can not tell (nor do I think you care) if it was stored as numeric or text.

I have a table that consists entirely of numeric fields (LONGs and a QUAD).  It would be nice if there was a way to retrieve the values directly as numerics and not have to use VAL() to convert strings with every read.  It seems like an unnecessary performance penalty when processing millions of of values.

Fred Meier

SQLite3.Dll only returns strings.  So a val() must occure if you need a Long or Quad.  There would be no advantage for SQLitening to do the val(). IMO it would only complicate SQLitening.

Bern Ertl


Fred Meier

#13
I was not correct in saying SQLite3.Dll only returns strings.  It can
return 4 and 8 byte integers as well as 8 byte floats.  Many additional
routines could have been added to SQLitening.Dll to do this (there would
be some problems with the 8 byte integer) but it was decided that
returning only strings was the simplest.

Using only strings as output is consistent with the fact that you can
only use strings as input in the Insert and Update statements. 

I ran a test for val() speed.  Converting "12345" a million times took
.07190208 seconds while converting "120345.6789" a million times took
.15760492 seconds and my PC is nothing special. 

QuoteI have a table that consists entirely of numeric fields (LONGs and a QUAD).
If I may assume that this table has many columns, then I would consider
defining just two columns, one for all the longs and another for all the
quads.  Use the powerful ability of PB's absolute arrays to read and
maintain them in memory and then use slExeBind to update them. 


Bern Ertl

Quote from: Fred Meier on January 29, 2009, 11:31:42 AM
I was not correct in saying SQLite3.Dll only returns strings.  It can
return 4 and 8 byte integers as well as 8 byte floats.  Many additional
routines could have been added to SQLitening.Dll to do this (there would
be some problems with the 8 byte integer) but it was decided that
returning only strings was the simplest.

My calculation matrix routine will need to process 6-8 LONG arrays that could contain 5-6 figure elements (xx,xxx to xxx,xxx elements) each.  It's going to be a server side process though, so I will be accessing the SQLite3.dll directly in a SQLiteningProcsX.DLL routine.  Good to know there might be an opportunity to improve performance there.

Perhaps you could start a new topic and expand a bit on the slExeBind idea?  Is there an example already posted somewhere?