Overview
Ferrum maintains dedicated search index tables so that FHIR search queries never need to scan raw JSONB resources. When a resource is created or updated, the server extracts searchable values using FHIRPath expressions defined in the search_parameters table and writes them into typed index tables.
By default, indexing runs inline — the CRUD operation indexes the resource
synchronously before returning the HTTP response, so it is searchable
immediately. This is the recommended mode for normal FHIR operations.
For large bulk ingests where eventual consistency is acceptable, you can set
fhir.search.inline_indexing: false to defer indexing to background workers.
In this mode, the CRUD operation returns immediately and enqueues a background
job. A separate worker process picks it up, extracts values, and writes the
index rows inside a single PostgreSQL transaction.
Index tables
Each FHIR search parameter type maps to a dedicated table:
| Table | Parameter type | Key payload columns |
|---|---|---|
| search_string | string | value, value_normalized |
| search_token | token | system, code, display |
| search_token_identifier | token | type_system, type_code, value (:of-type triple) |
| search_date | date | start_date, end_date (half-open UTC range) |
| search_number | number | value (NUMERIC, lossless) |
| search_quantity | quantity | value, system, code, unit |
| search_reference | reference | target_type, target_id, canonical_url, … |
| search_uri | uri | value, value_normalized |
| search_text | special | PostgreSQL tsvector from narrative HTML |
| search_content | special | PostgreSQL tsvector from all string values |
Each index row references its source row in the resources table.
Every row carries an entry_hash column. Inserts use
ON CONFLICT (entry_hash) DO UPDATE so re-indexing the same resource is
idempotent.
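The idempotent upsert can be sketched as follows, using SQLite in place of PostgreSQL so the example is self-contained; the table shape and the hash recipe are illustrative, not Ferrum's actual schema.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE search_string (
        entry_hash  TEXT PRIMARY KEY,
        resource_id TEXT,
        param_name  TEXT,
        value       TEXT
    )
""")

def index_string(resource_id: str, param_name: str, value: str) -> None:
    # Hash the identifying fields so re-indexing the same resource
    # always produces the same conflict key.
    entry_hash = hashlib.sha256(
        f"{resource_id}|{param_name}|{value}".encode()
    ).hexdigest()
    conn.execute(
        """
        INSERT INTO search_string (entry_hash, resource_id, param_name, value)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (entry_hash) DO UPDATE SET value = excluded.value
        """,
        (entry_hash, resource_id, param_name, value),
    )

# Indexing the same resource twice leaves exactly one row.
index_string("Patient/1", "family", "Smith")
index_string("Patient/1", "family", "Smith")
count = conn.execute("SELECT COUNT(*) FROM search_string").fetchone()[0]
```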
Indexing pipeline
Advisory locking
Concurrent workers (e.g. IndexingWorker and SearchParameterWorker) may try
to index the same resource at the same time. Ferrum acquires a per-resource
pg_advisory_xact_lock keyed on hash(resource_type, resource_id) at the start
of the transaction to serialize these writes.
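One way to derive such a key is sketched below; the exact hash recipe is an assumption, the point is only that the same (resource_type, resource_id) pair always yields the same signed 64-bit value, which is what pg_advisory_xact_lock accepts.

```python
import hashlib
import struct

def advisory_lock_key(resource_type: str, resource_id: str) -> int:
    """Derive a stable signed 64-bit key for pg_advisory_xact_lock (sketch)."""
    digest = hashlib.sha256(f"{resource_type}/{resource_id}".encode()).digest()
    # pg_advisory_xact_lock takes a signed bigint, so unpack 8 bytes as one.
    return struct.unpack(">q", digest[:8])[0]

# A worker would then run, inside its transaction:
#   SELECT pg_advisory_xact_lock(%s)   -- with advisory_lock_key(...)
key = advisory_lock_key("Patient", "123")
```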
Smart deletion
Rather than dropping all index rows before re-inserting, Ferrum compares the current set of indexed parameter names against the incoming set and deletes only the parameters that were removed. Combined with ON CONFLICT DO UPDATE, this minimizes write amplification in the common case where parameters haven’t changed structurally.
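The diff itself is a simple set operation; a minimal sketch (function and variable names are illustrative, not Ferrum internals):

```python
def plan_index_update(current_params: set, incoming_params: set):
    """Smart deletion: delete only parameters that disappeared,
    upsert everything in the incoming set via ON CONFLICT DO UPDATE."""
    to_delete = current_params - incoming_params
    to_upsert = incoming_params
    return to_delete, to_upsert

to_delete, to_upsert = plan_index_update(
    current_params={"family", "given", "birthdate"},
    incoming_params={"family", "given", "address-city"},
)
```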
Value extraction
String normalization
Search strings are stored in both raw (value) and normalized (value_normalized)
form. Normalization applies:
- NFKD Unicode decomposition
- Combining-mark removal (accent stripping)
- Lowercase
- Strip non-alphanumeric characters
The :exact modifier matches against value; the default (unmodified) match uses
value_normalized.
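The four normalization steps can be expressed directly with the standard library; this is a sketch of the recipe described above, not Ferrum's actual implementation:

```python
import unicodedata

def normalize(value: str) -> str:
    """Apply NFKD decomposition, accent stripping, lowercasing,
    and removal of non-alphanumeric characters."""
    # NFKD splits accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop combining marks (Unicode category Mn), i.e. strip accents.
    stripped = "".join(
        c for c in decomposed if unicodedata.category(c) != "Mn"
    )
    # Lowercase, then keep only alphanumeric characters.
    return "".join(c for c in stripped.lower() if c.isalnum())

result = normalize("Müller-Lüdenscheid")  # → "mullerludenscheid"
```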
Date ranges
Every date value is converted to a half-open [start, end) UTC range reflecting
its precision:
| Input | Range |
|---|---|
| 2024 | 2024-01-01T00:00Z – 2025-01-01T00:00Z |
| 2024-03 | 2024-03-01T00:00Z – 2024-04-01T00:00Z |
| 2024-03-15 | 2024-03-15T00:00Z – 2024-03-16T00:00Z |
| 2024-03-15T10:30 | exact instant (sub-second if provided) |
Period types index both start and end; missing boundaries map to sentinel
min/max datetimes.
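The precision-to-range mapping can be sketched like this; only the year, year-month, and full-date precisions from the table are shown, and the function name is illustrative:

```python
from datetime import datetime, timedelta, timezone

def date_range(value: str):
    """Map a date string to a half-open [start, end) UTC range (sketch)."""
    parts = value.split("-")
    if len(parts) == 1:                     # "2024" → whole year
        y = int(parts[0])
        start = datetime(y, 1, 1, tzinfo=timezone.utc)
        end = datetime(y + 1, 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 2:                   # "2024-03" → whole month
        y, m = int(parts[0]), int(parts[1])
        start = datetime(y, m, 1, tzinfo=timezone.utc)
        end = (datetime(y + 1, 1, 1, tzinfo=timezone.utc)
               if m == 12 else datetime(y, m + 1, 1, tzinfo=timezone.utc))
    else:                                   # "2024-03-15" → whole day
        start = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
        end = start + timedelta(days=1)
    return start, end
```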
Reference indexing
References are parsed into structured fields (target_type, target_id,
canonical_url, canonical_version) supporting relative, absolute, canonical,
and fragment forms. When a Reference also carries an identifier, Ferrum
mirrors it into search_token under the same parameter name to support the
:identifier modifier.
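A simplified parser for the reference forms above might look like this; the field names mirror the search_reference columns, but the parsing rules are a sketch, not Ferrum's actual logic:

```python
import re

def parse_reference(ref: str) -> dict:
    """Parse relative ("Patient/123"), absolute
    ("https://example.org/fhir/Patient/123"), and versioned-canonical
    ("http://example.org/StructureDefinition/foo|1.2.0") references."""
    fields = {"target_type": None, "target_id": None,
              "canonical_url": None, "canonical_version": None}
    if "|" in ref:
        # Canonical reference with an explicit version after "|".
        fields["canonical_url"], fields["canonical_version"] = ref.split("|", 1)
        return fields
    # Relative and absolute literal references end in Type/id.
    m = re.search(r"(?:^|/)([A-Z][A-Za-z]+)/([A-Za-z0-9\-\.]{1,64})$", ref)
    if m:
        fields["target_type"], fields["target_id"] = m.group(1), m.group(2)
    else:
        fields["canonical_url"] = ref
    return fields
```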
Bulk and batch strategies
Ferrum selects an indexing strategy based on the number of resources:
| Count | Strategy | Detail |
|---|---|---|
| < 1 000 | Single batch | One transaction, parameters fetched once per type |
| 1 000 – 9 999 | Chunked batches | Split into 1 000-resource transactions |
| ≥ 10 000 | COPY FROM STDIN | PostgreSQL bulk-load protocol, 10k–50k rows/sec |
Both thresholds are configurable via database.indexing_batch_size and
database.bulk_threshold in config.yaml.
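The selection logic amounts to two threshold comparisons; a sketch using the default values from the table (the function name is illustrative):

```python
# Defaults mirroring database.indexing_batch_size and
# database.bulk_threshold; real values come from config.yaml.
BATCH_SIZE = 1_000
BULK_THRESHOLD = 10_000

def choose_strategy(resource_count: int) -> str:
    """Pick an indexing strategy from the resource count (sketch)."""
    if resource_count >= BULK_THRESHOLD:
        return "copy_from_stdin"   # PostgreSQL COPY bulk-load protocol
    if resource_count >= BATCH_SIZE:
        return "chunked_batches"   # split into 1 000-resource transactions
    return "single_batch"          # one transaction
```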
Re-indexing
The $reindex operation rebuilds search indexes for one or all resource types.
Trigger it from the admin API.
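An invocation might look like the following; the paths are assumptions based on standard FHIR operation syntax, not documented Ferrum endpoints:

```
POST /fhir/$reindex              (all resource types)
POST /fhir/Patient/$reindex      (a single resource type)
```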
$reindex always runs as background jobs. Each job uses cursor-based pagination
to process resources in batches of 500, keeping memory usage constant regardless
of dataset size. The response includes the number of jobs enqueued and their IDs.
Caching
Two in-process caches reduce repeated work:
- Search parameter cache — search parameter definitions per resource type, populated on first index and shared across all workers.
- FHIRPath plan cache — compiled FHIRPath expression plans keyed by expression string, shared across all requests. Each distinct expression is compiled once per process lifetime.
Configuration
Relevant settings in config.yaml:
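A sketch of the relevant block, assuming the key names mentioned elsewhere in this document; the exact layout and defaults may differ:

```yaml
fhir:
  search:
    inline_indexing: true      # false defers indexing to background workers
database:
  indexing_batch_size: 1000    # chunk size for chunked batch transactions
  bulk_threshold: 10000        # switch to COPY FROM STDIN at this count
```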
Set fhir.search.inline_indexing: false when you expect large data ingests and
eventual consistency is acceptable. Admin operations like $reindex and package
installs always use background workers regardless of this setting.
Known gaps
- Composite search parameter indexing is not yet supported.
- Date period overlap search logic is incomplete for edge cases.
Related docs
- Search — query syntax and parameter types
- Performance — tuning index throughput
- Configuration — full config reference