
Relational Database Best Practices

A living engineering standard for schema design, SQL authoring, migrations, performance, and security. Treat every item here as a default — deviate only with explicit justification.


Naming Conventions

  • Tables — singular nouns, snake_case. The table represents a type, not a collection.

    ✅  ad_campaign, ad_placement, user_profile, product_review
    ❌  ad_campaigns, AdCampaign, tbl_users
    
  • Columns — snake_case, descriptive, no type suffixes.

    ✅  created_at, geo_location_lat, loyalty_point
    ❌  strName, dtCreatedDate, n_id
    
  • Timestamps — suffix with _at (preferred), or _datetime_utc / _datetime_local when storing both UTC and local values. The _on suffix is acceptable for date-only fields.

    created_at, updated_at, deleted_at
    created_datetime, modified_datetime, deleted_datetime
    event_datetime_utc, event_datetime_local
    
  • Booleans — prefix with a verb that forms a yes/no question: is_, has_, can_, should_.

    is_deleted, has_verified_email, can_edit, should_track_user
    
  • Foreign keys — name as <referenced_table>_id.

    user_id, ad_campaign_id, order_id
    
  • Primary keys — use id (not user_id on the user table itself — that creates ambiguity in JOINs).

  • Indexes — idx_<table>_<columns>. Unique constraints: uq_<table>_<columns>.

    idx_ad_placement_campaign_id
    uq_user_profile_email
    
  • Junction/association tables — name after both entities in alphabetical order.

    campaign_tag  (not tag_campaign, not campaign_tags_map)
    

Schema Design

Normalisation

  • Start at 3NF (Third Normal Form) by default. Each non-key column must depend on the whole key and nothing but the key.
  • Denormalise only for measured read performance gains, and document the trade-off explicitly.
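As an illustration, a table that repeats a parent attribute on every child row violates 3NF through a transitive dependency; splitting it restores the rule that every column depends on the key (table and column names here are hypothetical):

```sql
-- ❌ campaign_name depends on campaign_id, not on the placement's own key
CREATE TABLE ad_placement_denorm (
    id             BIGSERIAL PRIMARY KEY,
    campaign_id    BIGINT NOT NULL,
    campaign_name  TEXT   NOT NULL,  -- duplicated on every placement; drifts on rename
    slot           TEXT   NOT NULL
);

-- ✅ 3NF: campaign attributes live exactly once, on ad_campaign
CREATE TABLE ad_campaign (
    id    BIGSERIAL PRIMARY KEY,
    name  TEXT NOT NULL
);

CREATE TABLE ad_placement (
    id           BIGSERIAL PRIMARY KEY,
    campaign_id  BIGINT NOT NULL REFERENCES ad_campaign (id),
    slot         TEXT   NOT NULL
);
```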

Primary Keys

  • Prefer BIGSERIAL (PostgreSQL) / BIGINT AUTO_INCREMENT (MySQL) for single-node tables.
  • Use UUID (gen_random_uuid()) for distributed systems or when rows are exposed in URLs (prevents enumeration attacks).
  • Never use a mutable business value (email, phone, SSN) as a primary key.
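A sketch of both key styles side by side (PostgreSQL syntax; gen_random_uuid() is built in from PostgreSQL 13, earlier versions need the pgcrypto extension):

```sql
-- Single-node table: compact, ordered 64-bit keys
CREATE TABLE user_profile (
    id     BIGSERIAL PRIMARY KEY,
    email  TEXT NOT NULL UNIQUE
);

-- URL-exposed or distributed table: non-enumerable UUID keys
CREATE TABLE product_review (
    id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id  BIGINT NOT NULL REFERENCES user_profile (id),
    body     TEXT   NOT NULL
);
```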

Timestamps

  • Always store timestamps in UTC. Convert to local time in the application layer.
  • Every table must have created_at TIMESTAMPTZ NOT NULL DEFAULT now().
  • Mutable tables must have updated_at TIMESTAMPTZ NOT NULL DEFAULT now() — keep it current via a trigger or ORM hook.
  • Implement soft deletes with deleted_at TIMESTAMPTZ rather than physical deletes for any business-critical entity.
-- Standard audit columns on every table
id          BIGSERIAL    PRIMARY KEY,
created_at  TIMESTAMPTZ  NOT NULL DEFAULT now(),
updated_at  TIMESTAMPTZ  NOT NULL DEFAULT now(),
deleted_at  TIMESTAMPTZ  -- NULL means active
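One way to keep updated_at current at the database layer is a row-level trigger — a PostgreSQL sketch, with function and trigger names of our own choosing:

```sql
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();  -- overwrite whatever the client sent
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION requires PostgreSQL 11+; older versions use EXECUTE PROCEDURE
CREATE TRIGGER trg_ad_campaign_updated_at
BEFORE UPDATE ON ad_campaign
FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```

A trigger guarantees the column is maintained even for writes that bypass the ORM.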

Constraints — enforce at the database layer, not just application code

  • NOT NULL on every column that must have a value.
  • UNIQUE constraints for natural uniqueness (email, slug, external_id).
  • FOREIGN KEY constraints with explicit ON DELETE behaviour (RESTRICT is the safe default; never CASCADE without deliberate design).
  • CHECK constraints for bounded domains (e.g., CHECK (status IN ('active', 'paused', 'archived'))).
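Putting all four constraint types together on one illustrative table:

```sql
CREATE TABLE ad_campaign (
    id           BIGSERIAL    PRIMARY KEY,
    user_id      BIGINT       NOT NULL
                 REFERENCES user_profile (id) ON DELETE RESTRICT,
    name         TEXT         NOT NULL,
    external_id  TEXT         NOT NULL UNIQUE,
    status       TEXT         NOT NULL DEFAULT 'active'
                 CHECK (status IN ('active', 'paused', 'archived')),
    created_at   TIMESTAMPTZ  NOT NULL DEFAULT now()
);
```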

Nullability

  • Be deliberate: a NULL should mean "genuinely unknown", not "not applicable" or "we forgot".
  • Prefer NOT NULL with a sentinel default over nullable columns where the domain allows it.

Anti-patterns

  • Comma-separated values in a column — violates 1NF; can't index, filter, or JOIN efficiently. Use a junction table instead.
  • EAV (Entity-Attribute-Value) — no type safety, no constraints, catastrophic query performance. Use proper columns, or JSONB with a schema.
  • Storing JSON blobs for queryable data — can't index individual fields efficiently. Use structured columns; reserve JSONB for genuinely schemaless data.
  • SELECT * in application queries — fetches unused columns, breaks if the schema changes, bloats network I/O. Use an explicit column list.
  • Nullable foreign keys to model optionality — leak optional relationships into the core table. Use a separate association table.
  • INT primary keys on distributed tables — sequence contention and enumerable IDs. Use UUIDs or a distributed sequence.
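For the comma-separated-values anti-pattern, the fix is a junction table (names hypothetical) — one row per relationship, indexable in both directions:

```sql
-- ❌ a tag_ids TEXT column holding '3,17,42' cannot be indexed or joined

-- ✅ junction table, named after both entities in alphabetical order
CREATE TABLE campaign_tag (
    campaign_id  BIGINT NOT NULL REFERENCES ad_campaign (id),
    tag_id       BIGINT NOT NULL REFERENCES tag (id),
    PRIMARY KEY (campaign_id, tag_id)
);

-- the primary key covers campaign -> tags; this covers tag -> campaigns
CREATE INDEX idx_campaign_tag_tag_id ON campaign_tag (tag_id);
```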

SQL Authoring

Use RETURNING to eliminate round-trips

After an INSERT, UPDATE, or DELETE, use RETURNING to fetch the resulting row state in the same statement. This eliminates a follow-up SELECT and avoids a read-after-write race on the returned data.

-- Insert and immediately get the auto-generated id and defaults
INSERT INTO ad_targeting (ad_id, geo_location_lat, geo_location_long)
VALUES (7, 47.6062, -122.3321)
RETURNING id, created_at, geo_location_lat, geo_location_long;

-- Update and confirm new state
UPDATE ad_campaign
SET status = 'paused', updated_at = now()
WHERE id = 42
RETURNING id, status, updated_at;

Always use bind parameters (never string interpolation)

// ✅ Correct — parameterised query
PreparedStatement ps = conn.prepareStatement(
    "SELECT * FROM user_profile WHERE email = ?"
);
ps.setString(1, email);

// ❌ Wrong — SQL injection vector
String query = "SELECT * FROM user_profile WHERE email = '" + email + "'";

Explicit column lists on INSERT

-- ✅ Schema changes won't silently break this
INSERT INTO ad_campaign (name, status, created_at) VALUES (?, ?, now());

-- ❌ Breaks if column order or count changes
INSERT INTO ad_campaign VALUES (?, ?, now());

Prefer EXISTS over COUNT for existence checks

-- ✅ Short-circuits on first match — O(1) best case
SELECT EXISTS (SELECT 1 FROM user_profile WHERE email = ?);

-- ❌ Scans all matching rows to count them
SELECT COUNT(*) FROM user_profile WHERE email = ?;

Paginate large result sets

-- Keyset pagination (preferred — stable, index-friendly)
SELECT id, name, created_at
FROM ad_campaign
WHERE id > :last_seen_id
ORDER BY id
LIMIT 50;

-- Offset pagination (avoid beyond page ~100 — full scan cost grows linearly)
SELECT id, name FROM ad_campaign ORDER BY id LIMIT 50 OFFSET 5000;

Use transactions explicitly for multi-statement writes

BEGIN;
  -- the application reads the RETURNING value and binds it as :order_id below
  INSERT INTO customer_order (customer_id, total) VALUES (?, ?) RETURNING id;
  INSERT INTO order_line_item (order_id, sku, qty) VALUES (:order_id, ?, ?);
COMMIT;

Migrations & Version Control

  • Always version migrations — every schema change must have a numbered, ordered migration script.
  • Store the canonical schema in db/<engine>/ so any engineer can recreate the database locally from scratch with a single command.
  • Write migration scripts in db/<engine>/ using sequential numbering: 001-create-user.sql, 002-add-email-index.sql.
  • Use Liquibase (see liquibase/README.md) for tracked, audited, multi-environment deployments.
  • Never edit an already-applied migration. Create a new one instead. Liquibase detects checksum changes and will fail the deployment.
  • Every ALTER TABLE that adds a column to a populated table must either include a DEFAULT, or add the column as nullable first and enforce NOT NULL in a separate migration after backfilling.
  • For large tables, use CREATE INDEX CONCURRENTLY (PostgreSQL) to avoid locking reads during index creation.
-- ✅ Non-blocking index creation
CREATE INDEX CONCURRENTLY idx_ad_placement_campaign_id ON ad_placement (campaign_id);

-- ❌ Locks the entire table during creation
CREATE INDEX idx_ad_placement_campaign_id ON ad_placement (campaign_id);
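The safe pattern for adding a NOT NULL column to a large, populated table is three separate migrations (the column name and file names here are illustrative):

```sql
-- 003-add-priority.sql: add as nullable — a metadata-only change, no rewrite
ALTER TABLE ad_campaign ADD COLUMN priority INT;

-- 004-backfill-priority.sql: backfill existing rows (batch on very large tables)
UPDATE ad_campaign SET priority = 0 WHERE priority IS NULL;

-- 005-priority-not-null.sql: enforce the constraint once the backfill is complete
ALTER TABLE ad_campaign ALTER COLUMN priority SET NOT NULL;
```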

Indexing

  • Index all foreign key columns — missing FK indexes cause full-table scans on every JOIN and CASCADE operation.
  • Index every column used in a WHERE, ORDER BY, or JOIN ON clause that appears in high-frequency queries.
  • Use partial indexes to index only the rows that queries actually touch:
    CREATE INDEX idx_ad_campaign_active ON ad_campaign (created_at)
    WHERE is_deleted = false;
  • Use composite indexes, ordering columns to match the query's predicates: equality-filtered columns first, then range or sort columns.
  • Avoid over-indexing: each index adds write overhead on INSERT/UPDATE/DELETE. Audit unused indexes with:
    SELECT schemaname, tablename, indexname, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY schemaname, tablename;
  • Run EXPLAIN (ANALYZE, BUFFERS) before merging any query that touches a table with more than 10K rows.
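For example, a query that filters on equality and sorts by a range column wants the equality column first in the composite index (the index name follows the idx_ convention above):

```sql
-- Query: placements for one campaign, newest first
SELECT id, created_at
FROM ad_placement
WHERE campaign_id = :campaign_id
ORDER BY created_at DESC
LIMIT 20;

-- Matching index: equality column first, then the sort column —
-- the planner can satisfy both the filter and the ORDER BY from the index
CREATE INDEX idx_ad_placement_campaign_id_created_at
ON ad_placement (campaign_id, created_at DESC);
```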

Performance

  • Measure before optimising. Use EXPLAIN (ANALYZE, BUFFERS) to understand actual execution plans. Never guess.
  • Keep transactions short. Long-running transactions hold locks, block VACUUM, and cause replication lag.
  • Avoid N+1 queries — fetch related data in a single JOIN or a single IN (...) query rather than one query per row.
  • Use connection pooling (HikariCP, PgBouncer). Never open a new connection per request.
    • Pool size formula: (cpu_cores × 2) + effective_spindle_count
  • For batch inserts, use a single multi-row INSERT or COPY rather than individual row inserts:
    -- ✅ Single round-trip
    INSERT INTO visiting_user (name, email) VALUES
      ('alice', 'alice@example.com'),
      ('bob',   'bob@example.com'),
      ('carol', 'carol@example.com');
  • For read-heavy workloads, offload analytics queries to a read replica or a columnar store (Redshift, BigQuery). Never run long OLAP queries against the OLTP primary.
  • Set statement timeouts in your application connection to prevent runaway queries from starving the connection pool:
    SET statement_timeout = '30s';
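The N+1 point, sketched in SQL — fetch children for a whole page of parents in one query rather than one per row (the bind-parameter expansion is handled by the driver or ORM):

```sql
-- ❌ N+1: one round-trip per campaign on the page
--   SELECT id, slot FROM ad_placement WHERE campaign_id = 1;
--   SELECT id, slot FROM ad_placement WHERE campaign_id = 2;
--   ... N more queries ...

-- ✅ one round-trip for the whole page
SELECT campaign_id, id, slot
FROM ad_placement
WHERE campaign_id IN (:page_of_campaign_ids)
ORDER BY campaign_id;
```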

Security

  • Never interpolate user input into SQL strings. Always use parameterised queries / prepared statements.
  • Grant the minimum required privileges to each application role:
    -- Application service account: read/write only, no DDL
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM PUBLIC;
  • Use separate roles for migrations (DDL permissions) and application runtime (DML only).
  • Do not store plaintext secrets (passwords, API keys) in any column. Store hashed credentials (bcrypt, argon2) and encrypt sensitive PII at the application layer before writing.
  • Rotate database credentials via a secrets manager (AWS Secrets Manager, Vault). Never hardcode credentials in code or config files.
  • Enable SSL/TLS for all database connections in production. Reject plaintext connections at the server level (ssl = on, ssl_min_protocol_version = TLSv1.2).
  • Audit privileged operations with pg_audit (PostgreSQL) or equivalent.

Refs