NOTE: These are older instructions for importing MARC records into Evergreen. If you are running the 2.3 or newer release of Evergreen, the official documentation is the place to start.

Importing bibliographic records

Several tools for importing bibliographic records into Evergreen can be found in ILS/Open-ILS/src/extras/import/. We explain the purpose of these tools first, and follow with an end-to-end example of importing the Project Gutenberg MARC records for electronic books.

A complete sample of scripts, instructions, and sample data for importing bib records and holdings can be found on the Evergreen downloads page.

Introducing the import tools

First, we'll introduce you to the import tools. Further down the page there are examples of how to use the tools to import bibliographic records into Evergreen.

Converting MARC records to Evergreen BRE JSON format

If you are starting with MARC records from your existing system or another source, use the marc2bre.pl script to create the JSON representation of a bibliographic record entry (hence bre) in Evergreen. marc2bre.pl can perform the following functions:

  • Converts MARC-8 encoded records to UTF-8 encoding
  • Converts MARC21 to MARC21XML
  • Selects the field holding the unique record number (common choices are '035' or '001'; check your records, as a supposedly unique field can turn out to contain duplicates, though marc2bre.pl will select a new unique identifier for subsequent duplicates)
  • Extracts certain pertinent fields for indexing and display purposes (along with the complete MARC21XML record)
  • Sets the ID number of the first record from this batch to be imported into the biblio.record_entry table (hint: run the following SQL to determine what this number should be to avoid conflicts):
    psql -U postgres evergreen
    # SELECT MAX(id)+1 FROM biblio.record_entry;
    • If you are processing multiple sets of MARC records with marc2bre.pl before loading the records into the database, you will need to keep track of the starting ID number for each subsequent batch of records that you are importing. For example, if you are processing three files of MARC records with 10000 records each into a clean database, you would use --startid 1, --startid 10001, and --startid 20001 parameters for each respective file (see the sketch after this list).
  • Ignores ('trash') fields that you do not want to retain in Evergreen
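
As a concrete sketch of the --startid bookkeeping described above (the batch file names here are hypothetical; the connection flags match the Project Gutenberg example later on this page):

perl marc2bre.pl --db_user postgres --db_host localhost --db_pw password \
   --db_name evergreen --startid 1 batch1.marc > ~/batch1.bre
perl marc2bre.pl --db_user postgres --db_host localhost --db_pw password \
   --db_name evergreen --startid 10001 batch2.marc > ~/batch2.bre
perl marc2bre.pl --db_user postgres --db_host localhost --db_pw password \
   --db_name evergreen --startid 20001 batch3.marc > ~/batch3.bre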

Note that if you use marc2bre.pl to convert your MARC records from the MARC-8 encoding to the UTF-8 encoding, it relies on the MARC::Charset Perl module to complete the conversion. When importing a large set of records, you can speed up the process by using a utility like marc4j or marcdumper to convert the records to MARC21XML and UTF-8 first, then running them through marc2bre.pl with the --marctype=XML flag to tell marc2bre.pl that the records are already in MARC21XML format with the UTF-8 encoding. If you take this approach, due to a current limitation of MARC::File::XML you have to do a horrible thing and ensure that there are no namespace prefixes in front of the element names. marc2bre.pl cannot parse the following example:

<?xml version="1.0" encoding="UTF-8" ?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="http://www.loc.gov/MARC/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <marc:record>
    <marc:leader>00677nam a2200193 a 4500</marc:leader>
    <marc:controlfield tag="001">H01-0000844</marc:controlfield>
    <marc:controlfield tag="007">t </marc:controlfield>
    <marc:controlfield tag="008">060420s1950    xx            000 u fre d</marc:controlfield>
    <marc:datafield tag="040" ind1=" " ind2=" ">
      <marc:subfield code="a">CaOHCU</marc:subfield>
      <marc:subfield code="b">fre</marc:subfield>
    </marc:datafield>
...

But marc2bre.pl can parse the same example with the namespace prefixes removed:

<?xml version="1.0" encoding="UTF-8" ?>
<collection xmlns:marc="http://www.loc.gov/MARC21/slim" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="http://www.loc.gov/MARC/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <record>
    <leader>00677nam a2200193 a 4500</leader>
    <controlfield tag="001">H01-0000844</controlfield>
    <controlfield tag="007">t </controlfield>
    <controlfield tag="008">060420s1950    xx            000 u fre d</controlfield>
    <datafield tag="040" ind1=" " ind2=" ">
      <subfield code="a">CaOHCU</subfield>
      <subfield code="b">fre</subfield>
    </datafield>
...
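
One quick and dirty way to strip the prefixes is a sed pass over the file (a rough sketch: it assumes the prefix is literally 'marc:' and that the records contain no other namespaced markup; like the example above, it leaves the xmlns:marc declaration on the collection element untouched):

sed -e 's/<marc:/</g' -e 's/<\/marc:/<\//g' records.xml > records-noprefix.xml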

Converting Evergreen BRE JSON format to Open-ILS JSON ingest format

Once you have your records in Evergreen's BRE JSON format, you then need to use direct_ingest.pl to convert the records into the generic ingest JSON format for Open-ILS. This step uses the open-ils.ingest application to extract the data that will be indexed in the database.
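
For example (the file names here are hypothetical, following on from the marc2bre.pl output):

perl direct_ingest.pl my_records.bre > ~/my_records.ingest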

Converting Open-ILS JSON ingest format to PostgreSQL SQL

Once you have your records in Open-ILS JSON ingest format, you then need to use pg_loader.pl to convert these records into a set of SQL statements that you can use to load the records into PostgreSQL. The values passed to the --order and --autoprimary command-line options (bre, mrd, mfr, etc.) map to class IDs defined in the IDL file (http://open-ils.org/cgi-bin/viewcvs.cgi/ILS/Open-ILS/examples/fm_IDL.xml).
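
A minimal sketch using the long-form option names and a subset of the class IDs from the Project Gutenberg example below (file names are hypothetical):

perl pg_loader.pl --order bre --order mrd --order mfr \
   --autoprimary mrd --autoprimary mfr --output=my_records < ~/my_records.ingest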

Adding metarecords to the database

Once you have loaded the records into PostgreSQL, you can create metarecord entries in the metabib.metarecord table by running the following SQL:

psql evergreen
# \i Evergreen/src/extras/import/quick_metarecord_map.sql

Metarecords are required to place holds on items, among other actions.
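
As a quick sanity check that the metarecords were created (assuming the stock metabib.metarecord table name):

psql evergreen
# SELECT COUNT(*) FROM metabib.metarecord;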

Adding copies to bibliographic records in Evergreen

Once you've loaded the bibliographic records in Evergreen, you can search and view the records in the staff client, but they will not be visible in the catalogue. By default, bibliographic records will not be visible in the catalogue until you add a copy representing a physical manifestation of that resource. You can add a copy manually through the staff client via the Holdings maintenance screen, but if you're bulk-importing MARC records you probably want to bulk load the associated copies, call numbers, and barcodes as well.

Importing volumes and copies from MARC21XML holdings

There is currently no simple method for importing holdings based on the contents of the MARC holdings field (852, as specified by http://www.loc.gov/marc/holdings/). However, a more or less automated method could be built that performs the following steps:

  1. Create a tab-delimited file that contains your holdings information.
  2. Create a staging table that matches the contents of your tab-delimited file.
  3. Insert the contents of your tab-delimited file into the table.
  4. Modify existing SQL scripts (existing where???) for item import to match the staging table that you just built.
  5. Run the SQL scripts to create the holdings in Evergreen (a rough sketch of steps 2 through 5 follows this list).
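
A very rough sketch of steps 2 through 5 follows; it is not a ready-made script. The staging table layout and file names are hypothetical, the creator/editor value 1 and the loan_duration and fine_level values of 2 are placeholders, and the asset.call_number and asset.copy column lists should be verified against your Evergreen version. You could save it as load_holdings.sql and run it with \i from psql:

-- Staging table matching a tab-delimited file of bib record ID,
-- owning library, call number label, and barcode.
CREATE TABLE staging_items (
    bib_id     BIGINT,
    owning_lib INT,
    callnum    TEXT,
    barcode    TEXT
);
\copy staging_items FROM 'holdings.tsv'

-- Create one call number per bib/library/label combination.
INSERT INTO asset.call_number (creator, editor, record, owning_lib, label)
SELECT DISTINCT 1, 1, bib_id, owning_lib, callnum FROM staging_items;

-- Attach one copy per barcode to the call numbers just created.
INSERT INTO asset.copy (circ_lib, creator, editor, call_number, barcode,
    loan_duration, fine_level)
SELECT s.owning_lib, 1, 1, acn.id, s.barcode, 2, 2
  FROM staging_items s
  JOIN asset.call_number acn
    ON acn.record = s.bib_id
   AND acn.owning_lib = s.owning_lib
   AND acn.label = s.callnum;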

If an ILS has the concept of "item categories", these may be mapped to Evergreen via statistical categories in the asset.stat_cat table. Note that statistical categories cannot be used as search filters; individual branches can define their own statistical categories, and can define their own statistical category entries for individual items. The best use case for statistical categories is probably gifts.
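
For example, following the same insert-then-look-up-the-ID pattern used elsewhere on this page (the category and entry names are hypothetical, the owner value 1 assumes the top-level organizational unit, and you should verify the asset.stat_cat and asset.stat_cat_entry column lists against your version):

psql -U postgres evergreen
# INSERT INTO asset.stat_cat (owner, name) VALUES (1, 'Item Category');
# SELECT id FROM asset.stat_cat WHERE name = 'Item Category';
# INSERT INTO asset.stat_cat_entry (stat_cat, owner, value)
    VALUES (<id returned above>, 1, 'Gift');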

Mike offered a basic example of a staging table import here: import via staging table

In 2009, Conifer placed their migration tools in the Conifer ILS-Contrib SVN repository, which might provide useful samples to augment the basic staging table import approach.

In 2010, Equinox contributed a set of migration utilities.

Making electronic resources visible in the catalogue

For electronic resources that should be visible in the catalogue without any copies, you must set the source column value in the biblio.record_entry row for the respective bibliographic record to a value that matches the ID of a config.bib_source row where the transcendant column is TRUE. Here's a practical example:

  1. Connect to your Evergreen database with psql (substitute username / database name as required):
    psql -U postgres evergreen
  2. Add a source for your electronic resources:
    # INSERT INTO config.bib_source(quality, source, transcendant) VALUES (50, 'Institutional repository', TRUE);
  3. Find the ID that was generated for your new source:
    # SELECT id FROM config.bib_source WHERE source = 'Institutional repository';
  4. Update the source column for your bibliographic record for the electronic resource (for the sake of the example, let's assume that the ID returned from the new source was 4, and that we know that the bib record with ID 75 is an electronic resource from your institutional repository):
    # UPDATE biblio.record_entry SET source = 4 WHERE id = 75;

That's all there is to it! :)

==FIXME== OUT OF DATE

Example: Importing the Project Gutenberg records

In this example, we will use all of the import tools that we previously described to load a set of MARC records into Evergreen.

The Project Gutenberg records are available in MARC format to enable access to all of their electronic books. It makes for a splendid set of data to load in Evergreen. To follow along, you can download the Project Gutenberg MARC records in .zip or .bz2 format.

Please note: command lines that end in \ are meant to be continued by the next line.

  1. perl marc2bre.pl --db_user postgres --db_host localhost --db_pw password \
     --db_name evergreen gutenberg.marc > ~/gutenberg.bre

    If, while running this command, you receive the error Can't locate object method "ignore_errors" via package "MARC::Charset" at marc2bre.pl line 27., you need to update the MARC::Charset Perl module to the latest version (1.0 at this point in time).

  2. perl direct_ingest.pl gutenberg.bre > ~/gutenberg.ingest
  3. perl pg_loader.pl -or bre -or mrd -or mfr -or mtfe -or mafe -or msfe -or mkfe -or msefe \
    -a mrd -a mfr -a mtfe -a mafe -a msfe -a mkfe -a msefe --output=gutenberg < ~/gutenberg.ingest

    This step will take a while to order the output properly (all those -or options) to avoid missing foreign keys before it actually dumps any content into gutenberg.sql - be patient :)

  4. psql -U postgres evergreen
    # \i ~/gutenberg.sql

    This should result in 14449 records being loaded into the biblio.record_entry table.

  5. psql -U postgres evergreen
    # \i Evergreen/src/extras/import/quick_metarecord_map.sql

    This creates the metarecord entries in the database that are necessary for placing holds and grouping volumes and editions. Of course, with the Gutenberg records there are no volumes, copies, ISBNs, or other metadata, so these features will not make any visible difference with this data set.

  6. Now, if you want to make these records visible in the Evergreen catalogue, you can set the value of their source column to 3 to tell Evergreen that these are Project Gutenberg records and need to be visible as online resources without having a copy attached (note that this UPDATE touches every row in biblio.record_entry, which is fine here because this test database contains only the Gutenberg records):
    psql -U postgres evergreen
    # UPDATE biblio.record_entry SET source = 3;

The '3' is valid because Project Gutenberg is one of the sources seeded in the default config.bib_source table. To verify, use

psql -U postgres evergreen
# SELECT * FROM config.bib_source;

Restoring your Evergreen database to an empty state

If you've done a test import of records and you want to quickly get Evergreen back to a pristine state, you can create a clean Evergreen database schema by performing the following:

  1. cd ILS/Open-ILS/src/sql/Pg/
  2. Rebuild the database schema:
    ./build-db.sh <db-hostname> <db-port> <db-name> <db-user> <db-password> <db-version>

For example:

cd ILS/Open-ILS/src/sql/Pg/
./build-db.sh localhost 5432 evergreen postgres evergreen 82

There will be some warnings and error messages; you can safely ignore them.
