Architecture
Understanding the ZapFS distributed storage system's architecture and data flow.
System Overview
ZapFS is a distributed object storage system with an S3-compatible API. The architecture separates concerns across specialized services:
┌──────────┐     ┌─────────────────┐     ┌─────────────┐     ┌──────────────┐
│  Client  │────▶│ Metadata Server │────▶│   Manager   │────▶│ File Servers │
└──────────┘     └─────────────────┘     └─────────────┘     └──────────────┘
                          │                                          │
   S3 API                 │ GetReplicationTargets                    │
   (HTTP)                 │   (gRPC, any node)                       │
                          ▼                                          │
                 ┌────────────────┐      PutObject /                 │
                 │ FileClientPool │──────GetObject (gRPC)───────────▶│
                 └────────────────┘                                  │
                          │                                          │
                          ▼                                          ▼
                 ┌────────────────┐                         ┌─────────────────┐
                 │  Metadata DB   │                         │ Storage Backend │
                 │    (Vitess)    │                         │ (Local/S3/etc)  │
                 └────────────────┘                         └─────────────────┘
Component Responsibilities
| Component | Responsibility | Consistency |
|---|---|---|
| Metadata Server | S3 API, object metadata, auth | Read from Vitess |
| Manager | Cluster topology, placement, Raft consensus | Leader for writes |
| File Server | Data I/O, replication, erasure coding | Local |
| Vitess | Durable object/bucket metadata | Sharded MySQL |
Metadata Server
The Metadata Server is the S3 API gateway. It handles all client requests and coordinates with other services:
- Exposes the HTTP S3 API (default port 8082); see the client sketch after this list
- Authenticates requests via AWS SigV4/SigV2 signatures
- Authorizes requests against IAM policies
- Routes data operations to File Servers via gRPC
- Stores object metadata in Vitess/MySQL
- Syncs IAM credentials from Manager in real-time
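Because the gateway speaks standard S3, any S3 SDK should be able to talk to it. The following is a minimal sketch using the AWS SDK for Go v2, assuming the default port 8082, path-style addressing, and an already-provisioned key pair; the endpoint, bucket, and credentials are placeholders.

```go
package main

import (
	"bytes"
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// Static credentials; requests are signed with SigV4 and verified by the
	// Metadata Server's authentication filter.
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider("ACCESS_KEY", "SECRET_KEY", "")),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Point the client at the Metadata Server's S3 endpoint (default port 8082).
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://localhost:8082")
		o.UsePathStyle = true
	})

	// A plain S3 PutObject; ZapFS handles placement and replication behind it.
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("demo-bucket"),
		Key:    aws.String("hello.txt"),
		Body:   bytes.NewReader([]byte("hello zapfs")),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("object written")
}
```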
Request Filter Chain
Request → RequestID → Validation → Authentication → Authorization → Handler
              ↓           ↓              ↓                ↓            ↓
          Generate    Validate      Verify AWS        Evaluate     Execute
          unique ID    headers    SigV4 signature   IAM policies   operation
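The chain can be pictured as ordinary HTTP middleware. The sketch below is purely illustrative and does not reflect the actual filter interfaces in pkg/metadata; the Filter type, Chain helper, and header name are assumed.

```go
package filters

import (
	"net/http"

	"github.com/google/uuid"
)

// Filter wraps a handler with one stage of the chain (hypothetical type).
type Filter func(http.Handler) http.Handler

// Chain composes filters so a request passes through them left to right:
// RequestID → Validation → Authentication → Authorization → handler.
func Chain(h http.Handler, filters ...Filter) http.Handler {
	for i := len(filters) - 1; i >= 0; i-- {
		h = filters[i](h)
	}
	return h
}

// RequestID tags each request with a unique ID so it can be traced across
// the Metadata Server, Manager, and File Servers.
func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Header.Set("X-Request-Id", uuid.NewString()) // assumed header name
		next.ServeHTTP(w, r)
	})
}
```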
Manager Cluster
The Manager cluster uses Raft consensus for distributed coordination:
- 3+ nodes for high availability (tolerates 1 failure with 3 nodes)
- Leader handles all writes; reads can go to any node
- Manages service registry (file servers, metadata servers)
- Makes placement decisions for data replication
- Stores IAM state (users, credentials, policies)
- Exposes Admin HTTP API (default port 8060)
Manager Operations
| Type | Operations | Routing |
|---|---|---|
| Writes | RegisterService, CreateCollection, IAM changes | Leader only |
| Reads | GetReplicationTargets, GetTopology, ListCollections | Any node |
File Server
File Servers handle actual data storage and retrieval:
- Content-addressed storage with SHA-256 deduplication (see the sketch after this list)
- Erasure coding (K data + M parity shards)
- Synchronous or asynchronous replication
- Reference-counted garbage collection
- Supports local filesystem or S3 backends
- gRPC streaming for efficient data transfer
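Content addressing underpins both deduplication and reference-counted GC: a chunk's storage key is the SHA-256 of its bytes, so identical data maps to a single stored copy. The sketch below illustrates the idea only; the store type and its methods are not the real pkg/file API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkAddress derives a chunk's storage key from its contents, so writing
// the same bytes twice yields the same key.
func chunkAddress(chunk []byte) string {
	sum := sha256.Sum256(chunk)
	return hex.EncodeToString(sum[:])
}

// store is a toy content-addressed store with reference counting; GC can
// reclaim a chunk once its reference count drops to zero.
type store struct {
	refs map[string]int    // address -> reference count
	data map[string][]byte // address -> chunk bytes
}

func (s *store) put(chunk []byte) string {
	addr := chunkAddress(chunk)
	if _, ok := s.data[addr]; !ok {
		s.data[addr] = chunk // first writer stores the bytes
	}
	s.refs[addr]++ // every writer takes a reference
	return addr
}

func main() {
	s := &store{refs: map[string]int{}, data: map[string][]byte{}}
	a := s.put([]byte("same bytes"))
	b := s.put([]byte("same bytes"))
	fmt.Println(a == b, s.refs[a]) // prints: true 2 (one stored copy, two references)
}
```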
Data Flow
PutObject Flow
1. Client → Metadata Server (S3 PutObject HTTP)
2. Metadata Server → Manager.GetReplicationTargets() [any node]
3. Manager returns file server addresses + backend IDs
4. Metadata Server → FileService.PutObject() [streaming gRPC]
5. File Server writes to local backend
6. File Server replicates to peer file servers (sync/async)
7. Metadata Server stores ObjectRef in DB
8. Return ETag to client
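The following sketch shows how the Metadata Server could stitch these steps together. Every type and method here (ManagerClient, FileClientPool, MetadataDB, ObjectRef, Targets) is a placeholder rather than the real proto or Go API.

```go
package putflow

import (
	"context"
	"io"
)

// Placeholder types standing in for the real pkg/manager, pkg/file, and
// pkg/metadata dependencies.
type ObjectRef struct{ ETag string }
type Targets struct{ Primary string }

type ManagerClient interface {
	// Read-only, so any Manager node may answer it.
	GetReplicationTargets(ctx context.Context, bucket, key string) (Targets, error)
}

type FileClient interface {
	PutObject(ctx context.Context, t Targets, body io.Reader) (ObjectRef, error)
}

type FileClientPool interface{ Get(addr string) FileClient }

type MetadataDB interface {
	SaveObjectRef(ctx context.Context, bucket, key string, ref ObjectRef) error
}

// putObject mirrors steps 2-8 of the flow above.
func putObject(ctx context.Context, mgr ManagerClient, files FileClientPool,
	db MetadataDB, bucket, key string, body io.Reader) (string, error) {

	// Steps 2-3: ask the Manager where the data should be placed.
	targets, err := mgr.GetReplicationTargets(ctx, bucket, key)
	if err != nil {
		return "", err
	}

	// Steps 4-6: stream the bytes to the primary file server, which
	// replicates to its peers synchronously or asynchronously.
	ref, err := files.Get(targets.Primary).PutObject(ctx, targets, body)
	if err != nil {
		return "", err
	}

	// Step 7: persist the ObjectRef in Vitess.
	if err := db.SaveObjectRef(ctx, bucket, key, ref); err != nil {
		return "", err
	}

	// Step 8: the ETag goes back to the S3 client.
	return ref.ETag, nil
}
```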
GetObject Flow
1. Client → Metadata Server (S3 GetObject HTTP)
2. Metadata Server looks up ObjectRef in DB
3. Metadata Server → FileService.GetObject() [streaming gRPC]
4. File Server reads from local backend
5. File Server streams chunks back
6. Metadata Server streams to client
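The read path in the same placeholder style; LookupObjectRef and the io.ReadCloser stream are assumptions standing in for the real metadata lookup and server-streaming gRPC call.

```go
package getflow

import (
	"context"
	"io"
)

// Placeholder types; not the real APIs.
type ObjectRef struct{ FileServer string }

type MetadataDB interface {
	LookupObjectRef(ctx context.Context, bucket, key string) (ObjectRef, error)
}

type FileClient interface {
	// GetObject is modeled as an io.ReadCloser for simplicity; the real call
	// is a server-streaming gRPC method.
	GetObject(ctx context.Context, ref ObjectRef) (io.ReadCloser, error)
}

type FileClientPool interface{ Get(addr string) FileClient }

// getObject mirrors steps 2-6 of the flow above.
func getObject(ctx context.Context, db MetadataDB, files FileClientPool,
	w io.Writer, bucket, key string) error {

	// Step 2: find which file server holds the object's chunks.
	ref, err := db.LookupObjectRef(ctx, bucket, key)
	if err != nil {
		return err
	}

	// Steps 3-5: open a streaming read from that file server.
	stream, err := files.Get(ref.FileServer).GetObject(ctx, ref)
	if err != nil {
		return err
	}
	defer stream.Close()

	// Step 6: relay chunks to the S3 client without buffering the whole
	// object in memory.
	_, err = io.Copy(w, stream)
	return err
}
```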
DeleteObject Flow
1. Client → Metadata Server (S3 DeleteObject HTTP)
2. Metadata Server marks ObjectRef deleted in DB
3. Return 204 to client (immediate)
4. GC worker later calls FileService.DeleteObject()
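A sketch of the delete path and the background GC pass, again with placeholder types; MarkObjectDeleted and DeletedObjectRefs are assumed names.

```go
package gcflow

import "context"

// Placeholder types; not the real APIs.
type ObjectRef struct{ FileServer string }

type MetadataDB interface {
	MarkObjectDeleted(ctx context.Context, bucket, key string) error
	// DeletedObjectRefs yields refs whose metadata rows were marked deleted.
	DeletedObjectRefs(ctx context.Context) <-chan ObjectRef
}

type FileClient interface {
	DeleteObject(ctx context.Context, ref ObjectRef) error
}

type FileClientPool interface{ Get(addr string) FileClient }

// deleteObject covers steps 2-3: only metadata is touched on the request
// path, so the 204 goes back to the client immediately.
func deleteObject(ctx context.Context, db MetadataDB, bucket, key string) error {
	return db.MarkObjectDeleted(ctx, bucket, key)
}

// gcWorker covers step 4: chunks are reclaimed later, and reference counting
// keeps a chunk alive while any live object still points at it.
func gcWorker(ctx context.Context, db MetadataDB, files FileClientPool) {
	for ref := range db.DeletedObjectRefs(ctx) {
		_ = files.Get(ref.FileServer).DeleteObject(ctx, ref)
	}
}
```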
Key Packages
| Package | Location | Purpose |
|---|---|---|
| metadata | pkg/metadata/ | S3 API gateway, request processing |
| manager | pkg/manager/ | Raft control plane, service registry |
| file | pkg/file/ | Chunk storage, replication, GC |
| iam | pkg/iam/ | Credential & policy management |
| storage | pkg/storage/ | Storage backends, indexing |
| s3api | pkg/s3api/ | S3 signature verification, types |
| cache | pkg/cache/ | High-performance LRU cache |
Default Ports
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Metadata | 8082 | HTTP | S3 API |
| Metadata | 8083 | gRPC | Internal API |
| Manager | 8050 | gRPC | Raft + Service API |
| Manager | 8060 | HTTP | Admin API |
| File Server | 8081 | gRPC | Data API |
Distributed Patterns
Leader-Aware Routing
The Manager client discovers the leader via a Ping RPC, caches it with a TTL, and automatically fails over on leader changes.
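A minimal sketch of the pattern; the Ping response shape, the ManagerNode interface, and the 5-second TTL are assumptions rather than the real pkg/manager client.

```go
package leaderroute

import (
	"context"
	"sync"
	"time"
)

// ManagerNode is a placeholder for a client connection to any Manager node.
type ManagerNode interface {
	Ping(ctx context.Context) (PingResponse, error)
}
type PingResponse struct{ LeaderAddr string }

type leaderCache struct {
	mu      sync.Mutex
	leader  string
	expires time.Time
}

// Leader returns the cached Raft leader address, re-resolving it via Ping
// (which any node can answer) once the TTL lapses.
func (c *leaderCache) Leader(ctx context.Context, anyNode ManagerNode) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.leader != "" && time.Now().Before(c.expires) {
		return c.leader, nil
	}
	resp, err := anyNode.Ping(ctx)
	if err != nil {
		return "", err
	}
	c.leader = resp.LeaderAddr
	c.expires = time.Now().Add(5 * time.Second) // assumed TTL
	return c.leader, nil
}

// Invalidate is called when a write is rejected with a not-leader error,
// forcing re-discovery on the next call (failover on leader change).
func (c *leaderCache) Invalidate() {
	c.mu.Lock()
	c.leader = ""
	c.mu.Unlock()
}
```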
Credential Streaming
Metadata services subscribe to IAM changes via gRPC streaming, maintaining a local cache with real-time updates.
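A sketch of the subscription loop, assuming a server-streaming RPC (called WatchCredentials here) and a cache with an Apply method; both names are placeholders.

```go
package iamsync

import "context"

// Placeholder stream and cache types; not the real pkg/iam API.
type CredentialUpdate struct{ AccessKey string }

type CredentialStream interface {
	Recv() (CredentialUpdate, error)
}

type ManagerClient interface {
	WatchCredentials(ctx context.Context) (CredentialStream, error)
}

type CredentialCache interface {
	Apply(u CredentialUpdate)
}

// syncCredentials keeps the Metadata Server's local credential cache current
// without polling: each streamed update (create, rotate, revoke) is applied
// as it arrives. The caller re-subscribes with backoff when the stream breaks.
func syncCredentials(ctx context.Context, mgr ManagerClient, cache CredentialCache) error {
	stream, err := mgr.WatchCredentials(ctx)
	if err != nil {
		return err
	}
	for {
		update, err := stream.Recv()
		if err != nil {
			return err
		}
		cache.Apply(update)
	}
}
```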
Service Registry
All services register with the Manager on startup. The Manager tracks heartbeats and removes dead services.
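A sketch of the registration side. RegisterService appears in the Manager operations table above; the Heartbeat RPC and the 10-second interval are assumptions.

```go
package registry

import (
	"context"
	"time"
)

// Placeholder types; not the real pkg/manager API.
type ServiceInfo struct{ ID, Addr, Kind string }

type ManagerClient interface {
	RegisterService(ctx context.Context, s ServiceInfo) error // leader-only write
	Heartbeat(ctx context.Context, id string) error           // assumed RPC
}

// registerAndHeartbeat registers a service on startup and then heartbeats
// until the context is cancelled. If heartbeats stop, the Manager drops the
// service from the registry and stops placing data on it.
func registerAndHeartbeat(ctx context.Context, mgr ManagerClient, self ServiceInfo) error {
	if err := mgr.RegisterService(ctx, self); err != nil {
		return err
	}
	ticker := time.NewTicker(10 * time.Second) // assumed heartbeat interval
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if err := mgr.Heartbeat(ctx, self.ID); err != nil {
				return err
			}
		}
	}
}
```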
Erasure Coding
Data is split into K data + M parity shards; objects remain recoverable after up to M node failures (default: 3+2).
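A worked example of the default 3+2 layout using the klauspost/reedsolomon library; the library choice is illustrative and not necessarily what pkg/file uses internally.

```go
package main

import (
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 3 data + 2 parity shards, matching the documented default.
	enc, err := reedsolomon.New(3, 2)
	if err != nil {
		log.Fatal(err)
	}

	data := []byte("object payload to protect against node loss")

	// Split pads the data and returns 5 shards: 3 data plus 2 (empty) parity.
	shards, err := enc.Split(data)
	if err != nil {
		log.Fatal(err)
	}
	// Encode fills in the 2 parity shards from the data shards.
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing up to M = 2 shards (e.g. two file servers down).
	shards[0], shards[4] = nil, nil
	if err := enc.Reconstruct(shards); err != nil { // rebuilds the missing shards
		log.Fatal(err)
	}

	ok, _ := enc.Verify(shards)
	log.Println("all shards verified:", ok)
}
```

With K=3 and M=2, any 3 of the 5 shards suffice to rebuild the object, so the storage overhead is 5/3 (about 1.67x) compared with 3x for triple replication.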