User Tools

Site Tools


newdevs:code:import_csv_into_db

This is an old revision of the document!


Import Data from a CSV file into database tables

The Concerto database isn't very large, and sometimes you need to test things with more data. If you can pull and anonymize production-level data (but not too much to crash your test server), you can copy it into place with a CSV file.

Finessing the Data Prior to Import

Notes:

  • If the table you're wanting to insert into uses an org, you'll want to edit your CSV file first to use the org IDs from Concerto.
  • Don't include the column for ID from your production data.
  • If there are duplicate key constraints, check for those in your CSV *first* before you create your tables. For example, acq.provider has a duplicate key value unique constraint of "provider_name_once_per_owner". If you add a column in your CSV for dupe checking, you can use a formula to count how many times it appears and thus find dupes.
=COUNTIFS($A$2:$A$1000,A2,$B$2:$B$1000,B2)

This assumes that Column A is the no-dupe column (name, in this case) and Column B is the org. So basically, count the number of times that the name in A2 appears with the same org that's in B2. Then if we fill that down the entire column, if you get anything above a 1 you have a dupe that you can filter for and remove.

TIP: If you're not sure if your table includes a uniqueness constraint, here's a grep of Evergreen for the word "CONSTRAINT" which can point you toward where you might encounter a uniqueness constraint on the table you're working on: https://git.evergreen-ils.org/?p=Evergreen.git&a=search&h=HEAD&st=grep&s=CONSTRAINT

Importing the Data

1. Log into the database as the root user.

2. Create a placeholder schema for your project.

CREATE SCHEMA tlittle;

3. Create a placeholder table for your data to start in, and name the columns–including their datatype–for your table.

create table tlittle.acqprolist 
(name text,
owner integer,
currency_type text,
code text, 
active boolean,
prepayment_required boolean,
default_copy_count integer);

Make sure to include, at minimum, any required fields in your final table. This example is based on acq.provider, so these are the required fields in acq.provider. Each name of the column is followed by its datatype, e.g. prepayment_required (the column name) boolean (the datatype).

4. Now we're going to populate our starter table from our CSV file. Make sure your CSV file is on the server somewhere you know. In this case, my file is in /home/tlittle and the filename is pines_acqprolist.csv.

Note: If you do not have a header in your csv file, don't include the "header" at the last part of this code.

copy tlittle.acqprolist from '/home/tlittle/pines_acqprolist.csv' csv header;

Then if you do a select * from tlittle.acqprolist you'll see it's now populated with the data from our CSV file.

5. To get that data into our Evergreen table, we need to copy it over.

insert into acq.provider (name,
owner,
currency_type,
code,
active,
prepayment_required,
default_copy_count) select 
name,
owner,
currency_type,
code,
active,
prepayment_required,
default_copy_count from tlittle.acqprolist;

So the second half is the SELECT–select all these from tlittle.acqprolist–which we then INSERT with the first portion.

If it's successful, you'll get the INSERT 0 [NUMBER OF ROWS] message. You can then do a select * from [YOUR TABLE] to see all your new data.

newdevs/code/import_csv_into_db.1677522665.txt.gz · Last modified: 2023/02/27 13:31 by tlittle

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International
CC Attribution-Share Alike 4.0 International Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki

© 2008-2022 GPLS and others. Evergreen is open source software, freely licensed under GNU GPLv2 or later.
The Evergreen Project is a U.S. 501(c)3 non-profit organization.