PostgreSQL as a Schemaless Database

Speaker:Chris Pettus
Date:2013-02-22
Site:thebuild.com
Twitter:@xof

Basics

  • Requires PostgreSQL 9.2
  • NoSQL == Schemaless
  • Documents vs. rows
  • Persistent object storage

Types

  • XML, JSON, hstore, relational

XML

  • Documents up to 2GB
  • Supports Xpath
  • No indexing :(

hstore

  • Heirarchial (docments link to documents)
  • Key/value store (strongs or other hstore objects/values)
  • GiST or GIN indexing on hstore values

JSON

  • Validates JSON going in
  • Indexable as text for strict comparison
  • Uses PL/V8
  • PL/V8 Pro-tips:
    • Use V8 that comes w/ PL/V8
    • Functions are not compiled by PL/V8 until first use
    • JSON injection is possible, stay vigilant
    • PL invocation is non-trivial (tangible overhead)

Comparison

  • Side-by-side comparison of various types
    • Stock 9.2.2 install
    • Will compare against MongoDB
    • No tuning on app or system

Strategy

  • Single-field tables w/ single type
  • Wrap queries in object-mapper
  • Index the column
  • Profit!

Performance

  • Test 1.78M records against MongoDB
  • Load/Disk/Query results ranked by fastest-to-slowest
  • For hstore, raw, GiST, and GIN are tested
  • For XML, JSON, MongoDB index type is expression index
Type Index Load Disk Query (pk)
relational pk 1 6 2
hstore GiST 2,5,7 2,3,7 5,6,7
XML expr 6 4 3
JSON expr 3 5 1
MongoDB expr 4 1 4
  • Query by name: hstore(GIN) is #1, JSON is #6
  • Query by pk: JSON is #1! (Thanks V8!)

Conclusions

  • Stock MongoDB is doesn’t seem to be more performant than PostgreSQL at querying documents when using accessor functions
    • Be realistic, test w/ real data
  • Index your accessor functions!
  • Avoid full-table scans
  • Consider hstore (most flexible)
  • GiST + GIN rock on high-entropy data (b-tree)
    • GIN outperforms GiST as dataset grows