Overview

Ferrum maintains dedicated search index tables so that FHIR search queries never need to scan raw JSONB resources. When a resource is created or updated, the server extracts searchable values using FHIRPath expressions defined in the search_parameters table and writes them into typed index tables.

By default, indexing runs inline: the CRUD operation indexes the resource synchronously before returning the HTTP response, so it is searchable immediately. This is the recommended mode for normal FHIR operations. For large bulk ingests where eventual consistency is acceptable, set fhir.search.inline_indexing: false to defer indexing to background workers. In that mode, the CRUD operation enqueues a background job and returns immediately; a separate worker process picks the job up, extracts values, and writes the index rows inside a single PostgreSQL transaction.

Index tables

Each FHIR search parameter type maps to a dedicated table:
| Table | Parameter type | Key payload columns |
| --- | --- | --- |
| search_string | string | value, value_normalized |
| search_token | token | system, code, display |
| search_token_identifier | token | type_system, type_code, value (:of-type triple) |
| search_date | date | start_date, end_date (half-open UTC range) |
| search_number | number | value (NUMERIC, lossless) |
| search_quantity | quantity | value, system, code, unit |
| search_reference | reference | target_type, target_id, canonical_url, … |
| search_uri | uri | value, value_normalized |
| search_text | special | PostgreSQL tsvector from narrative HTML |
| search_content | special | PostgreSQL tsvector from all string values |
All index tables are UNLOGGED for write throughput — data is recoverable via re-indexing from the authoritative resources table. Every row carries an entry_hash column. Inserts use ON CONFLICT (entry_hash) DO UPDATE so re-indexing the same resource is idempotent.
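The idempotent upsert can be sketched as follows. Which fields feed the real entry_hash, and the exact column list, are assumptions for illustration; only the table and the ON CONFLICT clause come from the description above.

```python
import hashlib

def entry_hash(resource_type: str, resource_id: str,
               parameter: str, value: str) -> str:
    # Hypothetical: a stable hash identifying one index row. The actual
    # inputs Ferrum hashes are not specified in this doc.
    key = f"{resource_type}|{resource_id}|{parameter}|{value}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Re-running the same insert hits ON CONFLICT and updates in place,
# so re-indexing the same resource is idempotent:
UPSERT_SQL = """
INSERT INTO search_string
    (entry_hash, resource_type, resource_id, parameter, value, value_normalized)
VALUES (%s, %s, %s, %s, %s, %s)
ON CONFLICT (entry_hash) DO UPDATE
SET value = EXCLUDED.value,
    value_normalized = EXCLUDED.value_normalized
"""
```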

Indexing pipeline

Resource write (create / update / patch)
    ↓
CRUD service enqueues indexing job (background mode; inline mode runs the same steps within the request)
    ↓
Worker picks up the job and performs:

1. Fetch active search parameters for the resource type
2. Begin transaction + pg_advisory_xact_lock(resource_type, resource_id)
3. Smart-delete stale index rows (only removed parameters)
4. Evaluate FHIRPath expressions → extract typed values
5. Bulk-insert into search_* tables (UNNEST array binding)
6. Update compartment memberships (CareTeam, Group, List)
7. Record index status → commit

Advisory locking

Concurrent workers (e.g. IndexingWorker and SearchParameterWorker) may try to index the same resource at the same time. Ferrum acquires a per-resource pg_advisory_xact_lock keyed on hash(resource_type, resource_id) at the start of the transaction to serialize these writes.
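pg_advisory_xact_lock takes a signed 64-bit integer, so the (resource_type, resource_id) pair must be folded into that key space. A minimal sketch of deriving such a key; the hash function here is an assumption, not Ferrum's actual choice:

```python
import hashlib
import struct

def advisory_lock_key(resource_type: str, resource_id: str) -> int:
    # Fold the pair into a signed 64-bit integer, the type
    # pg_advisory_xact_lock expects. SHA-256 truncation is illustrative.
    digest = hashlib.sha256(f"{resource_type}/{resource_id}".encode()).digest()
    return struct.unpack(">q", digest[:8])[0]  # signed big-endian int64

# Inside the indexing transaction (psycopg-style, illustrative):
#   cur.execute("SELECT pg_advisory_xact_lock(%s)",
#               (advisory_lock_key("Patient", "123"),))
```

Because the lock is transaction-scoped, PostgreSQL releases it automatically at commit or rollback, so no explicit unlock step is needed.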

Smart deletion

Rather than dropping all index rows before re-inserting, Ferrum compares the current set of indexed parameter names against the incoming set. Only parameters that were removed are deleted. Combined with ON CONFLICT DO UPDATE, this minimizes write amplification in the common case where parameters haven’t changed structurally.
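The comparison reduces to a set difference over parameter names; a sketch:

```python
def stale_parameters(currently_indexed: set[str], incoming: set[str]) -> set[str]:
    # Rows for these parameter names are deleted; everything else is
    # refreshed in place by the ON CONFLICT DO UPDATE upsert.
    return currently_indexed - incoming
```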

Value extraction

String normalization

Search strings are stored in both raw (value) and normalized (value_normalized) form. Normalization applies:
  1. NFKD Unicode decomposition
  2. Combining-mark removal (accent stripping)
  3. Lowercase
  4. Strip non-alphanumeric characters
The :exact modifier matches against value; the default match uses value_normalized.
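The four steps map directly onto Python's unicodedata; a sketch, assuming step 4 keeps only ASCII alphanumerics:

```python
import re
import unicodedata

def normalize_search_string(value: str) -> str:
    decomposed = unicodedata.normalize("NFKD", value)      # 1. NFKD decomposition
    stripped = "".join(c for c in decomposed
                       if not unicodedata.combining(c))    # 2. drop combining marks
    lowered = stripped.lower()                             # 3. lowercase
    return re.sub(r"[^a-z0-9]", "", lowered)               # 4. strip non-alphanumerics
```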

Date ranges

Every date value is converted to a half-open [start, end) UTC range reflecting its precision:
| Input | Range |
| --- | --- |
| 2024 | [2024-01-01T00:00Z, 2025-01-01T00:00Z) |
| 2024-03 | [2024-03-01T00:00Z, 2024-04-01T00:00Z) |
| 2024-03-15 | [2024-03-15T00:00Z, 2024-03-16T00:00Z) |
| 2024-03-15T10:30 | exact instant (sub-second if provided) |
Period types index both start and end; missing boundaries map to sentinel min/max datetimes.
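The precision-to-range mapping for date-only inputs can be sketched as below; time-of-day and Period handling are omitted for brevity:

```python
from datetime import datetime, timedelta, timezone

def date_range(value: str) -> tuple[datetime, datetime]:
    """Map a date-only FHIR value to a half-open [start, end) UTC range."""
    utc = timezone.utc
    parts = value.split("-")
    if len(parts) == 1:                              # YYYY → whole year
        y = int(parts[0])
        return (datetime(y, 1, 1, tzinfo=utc), datetime(y + 1, 1, 1, tzinfo=utc))
    if len(parts) == 2:                              # YYYY-MM → whole month
        y, m = int(parts[0]), int(parts[1])
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        return (datetime(y, m, 1, tzinfo=utc), datetime(ny, nm, 1, tzinfo=utc))
    # YYYY-MM-DD → whole day
    start = datetime(int(parts[0]), int(parts[1]), int(parts[2]), tzinfo=utc)
    return (start, start + timedelta(days=1))
```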

Reference indexing

References are parsed into structured fields (target_type, target_id, canonical_url, canonical_version) supporting relative, absolute, canonical, and fragment forms. When a Reference also carries an identifier, Ferrum mirrors it into search_token under the same parameter name to support the :identifier modifier.
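A simplified sketch of the classification; the regular expression and the returned field names are illustrative (they mirror the search_reference columns), and real FHIR reference parsing has more cases than shown here:

```python
import re

def parse_reference(ref: str) -> dict:
    if ref.startswith("#"):                       # fragment → contained resource
        return {"form": "fragment", "target_id": ref[1:]}
    if "|" in ref:                                # canonical with version
        url, version = ref.split("|", 1)
        return {"form": "canonical", "canonical_url": url,
                "canonical_version": version}
    m = re.match(r"^(?:(https?://.+)/)?([A-Z][A-Za-z]+)/([A-Za-z0-9.-]{1,64})$", ref)
    if m:                                         # relative or absolute literal
        base, rtype, rid = m.groups()
        return {"form": "absolute" if base else "relative",
                "target_type": rtype, "target_id": rid}
    return {"form": "canonical", "canonical_url": ref}   # bare canonical URL
```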

Bulk and batch strategies

Ferrum selects an indexing strategy based on the number of resources:
| Count | Strategy | Detail |
| --- | --- | --- |
| < 1 000 | Single batch | One transaction, parameters fetched once per type |
| 1 000 – 9 999 | Chunked batches | Split into 1 000-resource transactions |
| ≥ 10 000 | COPY FROM STDIN | PostgreSQL bulk-load protocol, 10k–50k rows/sec |
Thresholds are configurable via database.indexing_batch_size and database.bulk_threshold in config.yaml.
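The selection reduces to two threshold comparisons; a sketch whose defaults mirror the config.yaml values:

```python
def pick_strategy(count: int, batch_size: int = 1000,
                  bulk_threshold: int = 10000) -> str:
    # Strategy names are illustrative labels for the three modes above.
    if count >= bulk_threshold:
        return "copy_from_stdin"     # PostgreSQL COPY protocol
    if count >= batch_size:
        return "chunked_batches"     # batch_size-resource transactions
    return "single_batch"            # one transaction
```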

Re-indexing

The $reindex operation rebuilds search indexes for one or all resource types. Trigger it from the admin API:
```shell
# Re-index a single resource type
curl -X POST "http://localhost:8080/fhir/Patient/\$reindex" \
  -H "Content-Type: application/fhir+json"

# Re-index everything (enqueues one job per resource type)
curl -X POST "http://localhost:8080/fhir/\$reindex" \
  -H "Content-Type: application/fhir+json"
```
$reindex always runs as background jobs. Each job uses cursor-based pagination to process resources in batches of 500, keeping memory usage constant regardless of dataset size. The response includes the number of jobs enqueued and their IDs.
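The cursor technique is keyset pagination: each batch resumes after the last id seen rather than using OFFSET, so no batch rescans earlier rows and memory stays flat. A generic in-memory sketch of the idea (the real implementation pages over SQL, not a list):

```python
def paginate(ids: list[str], batch: int = 500):
    """Yield batches in id order, resuming after the last id of each batch."""
    cursor = ""                                  # empty string sorts before all ids
    while True:
        page = [i for i in sorted(ids) if i > cursor][:batch]
        if not page:
            return
        yield page
        cursor = page[-1]                        # keyset cursor, not an offset
```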

Caching

Two in-process caches reduce repeated work:
  • Search parameter cache — search parameter definitions per resource type, populated on first index and shared across all workers.
  • FHIRPath plan cache — compiled FHIRPath expression plans keyed by expression string, shared across all requests. Each distinct expression is compiled once per process lifetime.
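The plan cache behaves like memoization keyed on the expression string; a stand-in sketch using functools.lru_cache, where the "compiled plan" is a placeholder tuple rather than a real FHIRPath compiler:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_fhirpath(expression: str):
    # Placeholder for real plan compilation: the cached value is whatever
    # the compiler produces, computed once per distinct expression string.
    return ("compiled", expression)
```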

Configuration

Relevant settings in config.yaml:
```yaml
fhir:
  search:
    inline_indexing: true   # Index synchronously during CRUD (default)
    enable_text: true       # Index _text (narrative HTML)
    enable_content: true    # Index _content (all string values)

database:
  indexing_batch_size: 1000   # Resources per transaction chunk
  bulk_threshold: 10000       # Switch to COPY protocol above this
```
Set inline_indexing: false when you expect large data ingests and eventual consistency is acceptable. Admin operations like $reindex and package installs always use background workers regardless of this setting.

Known gaps

  • Composite search parameter indexing is not yet supported.
  • Date period overlap search logic is incomplete for edge cases.