Database decisions made early in a project can have lasting impacts on performance, scalability, and maintainability. After working with dozens of production databases, I've learned that the right patterns applied early save countless hours of refactoring later. Let's explore the patterns and practices that lead to scalable, maintainable data layers.
SQL vs NoSQL: A Decision Framework
The debate isn't about which is better – it's about which fits your use case. Both SQL and NoSQL databases have evolved significantly, and the lines between them have blurred. Here's my framework for making this critical decision:
- Choose SQL when: You need ACID transactions, complex queries with JOINs, data integrity is critical, reporting requirements exist
- Choose NoSQL when: Schema flexibility matters, horizontal scaling is needed, document-based data fits naturally, rapid prototyping is required
- Consider both when: Different parts of your system have different requirements – polyglot persistence is increasingly common
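To make the checklist concrete, here's a toy helper that encodes these rules as a scoring function. It's purely illustrative – the function name, inputs, and weights are my own inventions, not a real library – and no substitute for profiling your actual workload:

```javascript
// Hypothetical helper encoding the decision framework above.
// Inputs and scoring weights are illustrative assumptions only.
function suggestDatabase({
  needsTransactions = false, // multi-row ACID guarantees
  complexQueries = false,    // JOIN-heavy queries and reporting
  flexibleSchema = false,    // documents vary record to record
  horizontalScale = false    // must shard writes across nodes
} = {}) {
  let sqlScore = 0
  let nosqlScore = 0
  if (needsTransactions) sqlScore += 2
  if (complexQueries) sqlScore += 2
  if (flexibleSchema) nosqlScore += 2
  if (horizontalScale) nosqlScore += 1

  // Mixed requirements point toward polyglot persistence
  if (sqlScore > 0 && nosqlScore > 0) return 'polyglot'
  return sqlScore >= nosqlScore ? 'sql' : 'nosql'
}

console.log(suggestDatabase({ needsTransactions: true, complexQueries: true })) // 'sql'
console.log(suggestDatabase({ flexibleSchema: true, horizontalScale: true }))   // 'nosql'
console.log(suggestDatabase({ needsTransactions: true, flexibleSchema: true })) // 'polyglot'
```

In practice the real decision has more dimensions (team experience, hosting, existing infrastructure), but forcing yourself to enumerate requirements like this before picking a database is the point.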
- PostgreSQL: best for ACID transactions
- MongoDB: best for schema flexibility
- Redis: best for caching
- ClickHouse: best for analytics
PostgreSQL: The Swiss Army Knife
PostgreSQL has evolved into an incredibly versatile database. It handles JSON documents, full-text search, geospatial data, and time-series data. With extensions like pgvector, it even supports vector similarity search over AI embeddings.
```sql
-- Modern PostgreSQL is more than just relational
-- JSON support for flexible schema
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  metadata JSONB DEFAULT '{}',
  search_vector TSVECTOR,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Index JSON fields for performance
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- Full-text search
CREATE INDEX idx_products_search ON products USING GIN (search_vector);

-- Query JSON data
SELECT * FROM products
WHERE metadata->>'category' = 'electronics'
  AND (metadata->>'rating')::numeric > 4.0;

-- Full-text search
SELECT * FROM products
WHERE search_vector @@ plainto_tsquery('english', 'wireless keyboard');
```
MongoDB Schema Design
Document databases offer flexibility, but that doesn't mean no structure. Good schema design is still essential. The key decision is embedding vs referencing – get this wrong and you'll face performance issues at scale.
```javascript
// Schema design with Mongoose
import mongoose from 'mongoose'

// Embedding vs Referencing
// Embed when: data is accessed together, 1:few relationship, data doesn't change often
// Reference when: many:many relationship, data updates frequently, need to query independently
const orderSchema = new mongoose.Schema({
  orderNumber: { type: String, unique: true, required: true },
  customer: {
    // Denormalized for read performance
    name: { type: String, required: true },
    email: { type: String, required: true },
    // Reference for full customer data
    customerId: { type: mongoose.Schema.Types.ObjectId, ref: 'Customer' }
  },
  items: [{
    product: { type: mongoose.Schema.Types.ObjectId, ref: 'Product' },
    // Snapshot for order history (prices change)
    name: String,
    price: Number,
    quantity: Number,
    // Computed subtotal
    subtotal: Number
  }],
  shipping: {
    address: {
      street: String,
      city: String,
      state: String,
      zip: String,
      country: String
    },
    method: { type: String, enum: ['standard', 'express', 'overnight'] },
    cost: Number,
    tracking: String
  },
  payment: {
    method: { type: String, enum: ['card', 'paypal', 'crypto'] },
    status: { type: String, enum: ['pending', 'completed', 'failed', 'refunded'] },
    transactionId: String
  },
  totals: {
    subtotal: Number,
    tax: Number,
    shipping: Number,
    discount: Number,
    total: Number
  },
  status: {
    type: String,
    enum: ['pending', 'processing', 'shipped', 'delivered', 'cancelled'],
    default: 'pending'
  }
}, {
  timestamps: true,
  toJSON: { virtuals: true }
})

// Indexes for common queries
orderSchema.index({ 'customer.customerId': 1, createdAt: -1 })
orderSchema.index({ status: 1, createdAt: -1 })
// Note: orderNumber already gets a unique index from `unique: true` above;
// declaring it again would trigger Mongoose's duplicate-index warning
```
Indexing Strategies
Indexes are the difference between milliseconds and seconds. But more isn't always better – each index adds overhead to writes and storage. Profile your queries and index strategically.
```javascript
// MongoDB indexing strategies
// Single field - for simple queries
db.orders.createIndex({ "customer.email": 1 })

// Compound - for queries with multiple conditions
// Field order matters! Equality fields first, then sort, then range (the ESR rule)
db.orders.createIndex({ status: 1, createdAt: -1 })

// Partial - index only relevant documents
db.orders.createIndex(
  { createdAt: -1 },
  { partialFilterExpression: { status: { $in: ['pending', 'processing'] } } }
)

// Text - for search
db.products.createIndex({ name: 'text', description: 'text' })

// TTL - automatic deletion
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })
```

```sql
-- PostgreSQL indexing strategies
-- B-tree (default) - equality and range
CREATE INDEX idx_orders_created ON orders(created_at DESC);

-- Partial - for frequently filtered data
CREATE INDEX idx_active_orders ON orders(created_at)
WHERE status IN ('pending', 'processing');

-- Covering index - includes all needed columns
CREATE INDEX idx_orders_list ON orders(status, created_at DESC)
INCLUDE (order_number, total);

-- Expression index
CREATE INDEX idx_orders_month ON orders(DATE_TRUNC('month', created_at));

-- Analyze query performance
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 20;
```
Redis Caching Patterns
Caching can provide 10-100x performance improvements for read-heavy operations. Redis is the go-to choice for its versatility – it's not just a cache, but also a message broker, session store, and rate limiter.
```javascript
import Redis from 'ioredis'
import crypto from 'node:crypto'

const redis = new Redis(process.env.REDIS_URL)

// Cache-aside pattern (most common)
async function getUser(id) {
  const cacheKey = `user:${id}`

  // Try cache first
  const cached = await redis.get(cacheKey)
  if (cached) {
    return JSON.parse(cached)
  }

  // Cache miss - fetch from database
  const user = await db.users.findById(id)
  if (!user) return null

  // Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(user))
  return user
}

// Write-through pattern
async function updateUser(id, data) {
  const user = await db.users.findByIdAndUpdate(id, data, { new: true })
  // Update cache immediately
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user))
  return user
}

// Cache invalidation on delete
async function deleteUser(id) {
  await db.users.findByIdAndDelete(id)
  await redis.del(`user:${id}`)
  // Invalidate related caches
  await redis.del(`user:${id}:orders`)
  await redis.del(`user:${id}:preferences`)
}

// Rate limiting with Redis (fixed window)
async function rateLimit(key, limit, windowSeconds) {
  const current = await redis.incr(key)
  if (current === 1) {
    await redis.expire(key, windowSeconds)
  }
  return current <= limit
}

// Session management
async function createSession(userId, metadata) {
  const sessionId = crypto.randomUUID()
  const session = { userId, ...metadata, createdAt: Date.now() }
  await redis.setex(
    `session:${sessionId}`,
    86400 * 7, // 7 days
    JSON.stringify(session)
  )
  return sessionId
}
```
Prisma: Type-Safe Database Access
Prisma has revolutionized how we interact with databases in TypeScript applications. It provides type safety, auto-generated migrations, and an intuitive API.
```typescript
// schema.prisma
model User {
  id        String   @id @default(cuid())
  email     String   @unique
  name      String?
  orders    Order[]
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  @@index([email])
}

model Order {
  id        String      @id @default(cuid())
  user      User        @relation(fields: [userId], references: [id])
  userId    String
  items     OrderItem[]
  total     Decimal
  status    OrderStatus @default(PENDING)
  createdAt DateTime    @default(now())

  @@index([userId, createdAt(sort: Desc)])
  @@index([status])
}

// Type-safe queries
const ordersWithItems = await prisma.order.findMany({
  where: {
    userId: user.id,
    status: { in: ['PENDING', 'PROCESSING'] }
  },
  include: {
    items: {
      include: { product: true }
    }
  },
  orderBy: { createdAt: 'desc' },
  take: 10
})
```
Key Takeaways
- Design schemas based on access patterns, not just data structure
- Index queries you actually run, not every field – use EXPLAIN to analyze
- Implement caching at the service layer for flexibility
- Plan for cache invalidation from the start – it's harder than caching
- Use Prisma or similar ORMs for type safety in TypeScript projects
- Consider polyglot persistence – use the right database for each job
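The cache-invalidation takeaway deserves a sketch. Here is a minimal in-memory version of cache-aside with TTL and explicit invalidation – the TTLCache class and the loader callback are stand-ins I made up for illustration, playing the roles of Redis and a real database query:

```javascript
// Minimal in-memory stand-in for the cache-aside + invalidation pattern.
// TTLCache plays the role of Redis; the loader callback plays the database.
class TTLCache {
  constructor() {
    this.store = new Map()
  }

  set(key, value, ttlMs) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs })
  }

  get(key) {
    const entry = this.store.get(key)
    if (!entry) return undefined
    if (Date.now() > entry.expiresAt) {
      // Lazy expiration, similar to how Redis TTLs behave
      this.store.delete(key)
      return undefined
    }
    return entry.value
  }

  del(key) {
    this.store.delete(key)
  }
}

const cache = new TTLCache()

// Cache-aside: check the cache, fall back to the loader, then populate
async function getUserCached(id, loadUser) {
  const key = `user:${id}`
  const hit = cache.get(key)
  if (hit !== undefined) return hit

  const user = await loadUser(id)
  if (user) cache.set(key, user, 60_000) // cache for 1 minute
  return user
}

// Invalidate on every write path so reads never serve stale data
function invalidateUser(id) {
  cache.del(`user:${id}`)
}
```

The shape maps directly onto the Redis version earlier: `get`/`set`/`del` here correspond to `GET`/`SETEX`/`DEL` there. The hard part isn't the cache read path – it's remembering to call the invalidation on every code path that mutates the underlying data.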
- 100x: typical cache speedup
- 10ms: target query time
- 3: max JOIN depth
- 5: max indexes per table