Evergreen DokuWiki

Summary Report for Evergreen Performance Evaluation

OmniTI performed an Evergreen software evaluation on Evergreen this past fall on behalf of the Massachusetts Library Network Cooperative. The performance evaluation evaluated the Evergreen database, staff client and messaging framework. The study was mostly funded with a Library Services and Technology Act grant authorized through the Massachusetts Board of Library Commissioners along with some funds from MassLNC.

OmniTI release its report in December 2013. masslnc_evergreen_-_summary_of_recommendations.pdf

Following the release of the report, MassLNC had asked OmniTI to follow up on two additional questions that are outlined below with OmniTI's responses.

1. In the RFP, MassLNC had asked if there were ways to restore settings stored in the database to improve the relevancy of results retrieved through the OPAC without a serious cost to performance. https://bugs.launchpad.net/evergreen/+bug/844374 OmniTI shared the following thoughts with us:

The first was in looking at the bib search query, which passes the following in as the third argument to the search.query_parser_fts(int, int, text) function. The part that stood out with the query were sequential scans performed on both metabib.metarecord_source_map and metabib.record_attr.

The query planner/explain for that query passed into the function was:

evergreen=# explain SELECT
            m.source AS id,
            ARRAY [ m.source ] AS records,
            1.0 / (
                AVG (
                    (
                        1 ) )
                + 1 ) ::NUMERIC AS rel,
            1.0 / (
                AVG (
                    (
                        1 ) )
                + 1 ) ::NUMERIC AS rank,
            FIRST (
                mrd.attrs -> 'date1' ) AS tie_break
        FROM
            metabib.metarecord_source_map m
        INNER JOIN metabib.record_attr mrd ON (
                m.source = mrd.id )
        WHERE
            1 = 1
            AND (
                TRUE )
        GROUP BY
            1
        ORDER BY
            4 ASC NULLS LAST,
            5 DESC NULLS LAST,
            3 DESC
        LIMIT 10000 ;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                  QUERY PLAN                                                   │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=1159860.83..1159885.83 rows=10000 width=533)                                                     │
│   ->  Sort  (cost=1159860.83..1165766.53 rows=2362279 width=533)                                              │
│         Sort Key: ((1.0 / (avg(1) + 1::numeric))), (first((mrd.attrs -> 'date1'::text)))                      │
│         ->  HashAggregate  (cost=932046.04..991103.01 rows=2362279 width=533)                                 │
│               ->  Hash Join  (cost=68199.58..317853.50 rows=2362279 width=533)                                │
│                     Hash Cond: (mrd.id = m.source)                                                            │
│                     ->  Seq Scan on record_attr mrd  (cost=0.00..193549.79 rows=2362279 width=533)            │
│                     ->  Hash  (cost=38670.37..38670.37 rows=2362337 width=8)                                  │
│                           ->  Seq Scan on metarecord_source_map m  (cost=0.00..38670.37 rows=2362337 width=8) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Sequential scans on small tables (fully resident in memory) aren't really a red flag. Those two tables aren't exactly small, however. That may be an indication that the way they're being joined has some room for improvement. That could either be through more appropriate indexing or creating something along the lines of a rollup table, where the necessary join of the tables are pre-computed and stored in a separate physical table.

2. The second question related to a long-running query found in the logs of one of our production database that wasn't identified as a problem query in the original report. The production system was running 2.3 at the time, and the query in question was:

SELECT
    "ahr".id AS "id",
    "ahr".cut_in_line AS "cut_in_line",
    "ahr".request_time AS "request_time"
FROM
    ACTION.hold_request AS "ahr"
INNER
    JOIN ACTION.hold_copy_map AS "ahcm" ON (
        "ahcm".HOLD = "ahr".id )
INNER
    JOIN ACTION.hold_copy_map AS "ahcm2" ON (
        "ahcm2".target_copy = "ahcm".target_copy )
WHERE (
    "ahcm2".HOLD = 0 )
GROUP BY
    0,
    0,
    0
ORDER BY
    COALESCE (
        "ahr".cut_in_line,
        '' )
    DESC,
    "ahr".request_time;

OmniTI's recommendation:

SELECT "ahr".id AS "id", "ahr".cut_in_line AS "cut_in_line", "ahr".request_time AS "request_time"
   FROM ACTION.hold_request AS "ahr"
    INNER JOIN ACTION.hold_copy_map AS "ahcm" ON ( "ahcm".HOLD = "ahr".id )
   WHERE ( "ahcm".HOLD = 2255161 )
   GROUP BY 1, 2, 3 ORDER BY  "ahr".cut_in_line DESC NULLS LAST, "ahr".request_time;

Changes:

Removed self join to ACTION.hold_copy_map table because there is UNIQUE key index on (hold, target_copy, which makes the self joining redundant
Removed COALESCE function from ORDER BY clause. "ahr".cut_in_line column is boolean. If the value of the column is NULL, the COALESCE function is setting to '0' to make sure the NULLS are getting sorted last. Postgres provides "NULLS LAST" clause to achieve same functionality. This change is trivial but will reduce cost of executing coalesce function.