Saturday, 28 February 2015

Databases SQLite3

It will be SQL week here at NeuralCode. If you are a database ninja please share your dbfu. If you're a noob like the rest of us, join in and we'll work this out together. Databases are neat and cool! Cool because they run lots of data really fast without suffering the sluggish performance of spreadsheets and Neat because fields can be defined very precisely and strictly to maintain data accuracy. A database can have many tables that relate to one another and store relevant information together, that is still available to other tables and to SQL queries. Databases can also store much more than text and numeric data. Images, video, audio, whole documents, and references, can all be stored along with the information about them. It is this feature that i really appealing when you are handling lots of complex data related to an experiment. It might just keep your information organised in a manner that will make it useful to others and still useful to you in five years time. Databases will save you when that excel file starts to slow down (each scroll down takes a few seconds) or is so complicated that no-one can remember what parts of the 15 sheets is involved in calculating the values for the 16th. Phil has some databases of word frequency from different sources. They are already sluggish (150,000 entries on the one excel file we were playing with on friday). There is some redundancy and even a little error. The aim is to combine all the information into a database and extract the information relevant to the word list Phil will use in his upcoming experiment.
If you are using a mac - open a terminal window and type
you will start sqlite3 on the command line. We will progress toward using sqlite3 from within the ipython notebook but now it might be easier to understand in this simple format.

There are two types of commands I'm going to show you. Some start with a
this is how you can give instructions to the database engine, the rest of the commands are in the SQL language and that is how we create and query the information in our database.
Right now, if you are at the sqlite prompt, you can prettify the output by entering the following
.headers on 
.mode columns
This will cause the output to include headers and align the output in columns. Much nicer and easier to read.

The CREATE TABLE statement is used to create a table in a database.
Tables are organized into rows and columns; and each table must have a name.
CREATE TABLE table_name
column_name1 data_type(size),
column_name2 data_type(size),
column_name3 data_type(size),
How to write output of an SQL query to a csv
sqlite> .mode csv
sqlite> .separator ,
sqlite> .output test_file_1.csv
sqlite> select * from tbl2;
sqlite> .exit
Psycholinguistic_databases Wintermute$ cat test_file_1.csv
Psycholinguistic_databases Wintermute$

import sqlite3
conn = sqlite3.connect("mydatabase.db") # or use :memory: to put it in RAM
cursor = conn.cursor()
# create a table
cursor.execute("""CREATE TABLE albums
                  (title text, artist text, release_date text, 
                   publisher text, media_type text)  """) 

# insert some data
cursor.execute("INSERT INTO albums VALUES ('Glow', 'Andy Hunter', '7/24/2012', 'Xplore Records', 'MP3')")
# save data to database
# insert multiple records using the more secure "?" method
albums = [('Exodus', 'Andy Hunter', '7/9/2002', 'Sparrow Records', 'CD'),
          ('Until We Have Faces', 'Red', '2/1/2011', 'Essential Records', 'CD'),
          ('The End is Where We Begin', 'Thousand Foot Krutch', '4/17/2012', 'TFKmusic', 'CD'),
          ('The Good Life', 'Trip Lee', '4/10/2012', 'Reach Records', 'CD')]
cursor.executemany("INSERT INTO albums VALUES (?,?,?,?,?)", albums)

import sqlite3
conn = sqlite3.connect("mydatabase.db")
cursor = conn.cursor()
sql = """
UPDATE albums 
SET artist = 'John Doe' 
WHERE artist = 'Andy Hunter'

import sqlite3
conn = sqlite3.connect("mydatabase.db")
cursor = conn.cursor()
sql = """
WHERE artist = 'John Doe'

import sqlite3
conn = sqlite3.connect("mydatabase.db")
#conn.row_factory = sqlite3.Row
cursor = conn.cursor()
sql = "SELECT * FROM albums WHERE artist=?"
cursor.execute(sql, [("Red")])
print cursor.fetchall()  # or use fetchone()
print "\nHere's a listing of all the records in the table:\n"
for row in cursor.execute("SELECT rowid, * FROM albums ORDER BY artist"):
    print row
print "\nResults from a LIKE query:\n"
sql = """
SELECT * FROM albums 
WHERE title LIKE 'The%'"""
print cursor.fetchall()

How do i import a xls file data into sqlite3

The table must exist first. i can create the blp-items table with

CREATE TABLE blp_items
(spelling TEXT, 
lexicality TEXT(1),  
rt REAL,
zscore REAL, 
accuracy REAL, 
rt_sd REAL, 
zscore_sd REAL,
accuracy_sd REAL
use the .import command
.separator ","
.import blp-items.csv blp_items
the .import function will import the headers from the csv file

SELECT * FROM blp_items LIMIT 10;

spelling    lexicality  rt          zscore      accuracy    rt_sd       zscore_sd   accuracy_sd
----------  ----------  ----------  ----------  ----------  ----------  ----------  -----------
spelling    lexicality  rt          zscore      accuracy
a/c         N           668.036363  0.32196203  0.76923076  180.088554  1.09726646  0.424052095
aband       N           706.5       0.48159747  0.925       170.172149  0.86149330  0.266746782
abayed      N           689.935483  0.15776501  0.86842105  281.050165  1.02394376  0.342569987
abbear      N           560.583333  -0.3614142  0.95        121.360472  0.52986732  0.220721427
abbears     N           611.410256  -0.1372765  0.975       169.857946  0.89889305  0.158113883
abbens      N           598.289473  -0.3845125  1.0         144.779964  0.59991825  0.0        
abchaim     N           591.763157  -0.1323686  1.0         128.560940  0.65425409  0.0        
abects      N           635.205128  -0.0324051  0.975       155.666002  0.64429156  0.158113883
abeme       N           591.055555  -0.2430494  0.94736842  182.908243  0.93264000  0.226294285

we can delete this line by

DELETE FROM blp_items WHERE lexicality='lexicality';

spelling    lexicality  rt              zscore        accuracy      rt_sd           zscore_sd     accuracy_sd 
----------  ----------  --------------  ------------  ------------  --------------  ------------  ------------
a/c         N           668.0363636364  0.3219620388  0.7692307692  180.0885547226  1.0972664605  0.4240520956
aband       N           706.5           0.4815974773  0.925         170.1721499131  0.8614933071  0.2667467828
abayed      N           689.935483871   0.1577650108  0.8684210526  281.0501658048  1.0239437649  0.3425699875
abbear      N           560.5833333333  -0.361414284  0.95          121.3604725012  0.5298673287  0.2207214279
abbears     N           611.4102564103  -0.137276547  0.975         169.8579465239  0.8988930562  0.158113883 
abbens      N           598.2894736842  -0.384512528  1.0           144.7799649972  0.5999182553  0.0         
abchaim     N           591.7631578947  -0.132368665  1.0           128.5609402685  0.6542540915  0.0         
abects      N           635.2051282051  -0.032405186  0.975         155.6660020156  0.6442915682  0.158113883 
abeme       N           591.0555555556  -0.243049439  0.9473684211  182.9082431079  0.9326400097  0.2262942859
abents      N           690.2564102564  0.4099671341  0.975         207.919356577   0.8366905183  0.1581138
and then dump the sql output

.output blp_items.sql

.dump blp_items