Architecture

An overview of the ZapFS distributed storage system architecture and data flow.

System Overview

ZapFS is a distributed object storage system with an S3-compatible API. The architecture separates concerns across specialized services:

┌──────────┐     ┌─────────────────┐     ┌─────────────┐     ┌──────────────┐
│  Client  │────▶│ Metadata Server │────▶│   Manager   │────▶│ File Servers │
└──────────┘     └─────────────────┘     └─────────────┘     └──────────────┘
     │                   │                      │                    │
     │  S3 API           │  GetReplicationTargets                    │
     │  (HTTP)           │  (gRPC, any node)    │                    │
     │                   │                      │                    │
     │                   ▼                      │                    │
     │           ┌───────────────┐              │                    │
     │           │ FileClientPool│──────────────┼───────────────────▶│
     │           └───────────────┘              │  PutObject/        │
     │                   │                      │  GetObject (gRPC)  │
     │                   │                      │                    │
     │                   ▼                      │                    ▼
     │           ┌───────────────┐              │           ┌────────────────┐
     │           │  Metadata DB  │              │           │ Storage Backend│
     │           │   (Vitess)    │              │           │ (Local/S3/etc) │
     │           └───────────────┘              │           └────────────────┘

Component Responsibilities

Component        Responsibility                               Consistency
Metadata Server  S3 API, object metadata, auth                Read from Vitess
Manager          Cluster topology, placement, Raft consensus  Leader for writes
File Server      Data I/O, replication, erasure coding        Local
Vitess           Durable object/bucket metadata               Sharded MySQL

Metadata Server

The Metadata Server is the S3 API gateway. It handles all client requests and coordinates with other services:

  • Exposes HTTP S3 API (default port 8082)
  • Authenticates requests via AWS SigV4/SigV2 signatures
  • Authorizes requests against IAM policies
  • Routes data operations to File Servers via gRPC
  • Stores object metadata in Vitess/MySQL
  • Syncs IAM credentials from Manager in real-time

Request Filter Chain

Request → RequestID → Validation → Authentication → Authorization → Handler
              ↓            ↓              ↓                ↓            ↓
           Generate    Validate       Verify AWS       Evaluate     Execute
           unique ID   headers      SigV4 signature   IAM policies  operation

Manager Cluster

The Manager cluster uses Raft consensus for distributed coordination:

  • 3+ nodes for high availability (tolerates 1 failure with 3 nodes)
  • Leader handles all writes; reads can go to any node
  • Manages service registry (file servers, metadata servers)
  • Makes placement decisions for data replication
  • Stores IAM state (users, credentials, policies)
  • Exposes Admin HTTP API (default port 8060)
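
The availability figure above follows from Raft's majority rule: a cluster of N nodes commits with N/2+1 votes, so it tolerates floor((N-1)/2) failures. A quick check:

```go
package main

import "fmt"

// tolerated returns how many node failures a Raft cluster of n nodes
// survives while still reaching a majority quorum.
func tolerated(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d nodes: quorum %d, tolerates %d failure(s)\n", n, n/2+1, tolerated(n))
	}
}
```

This is why 3 nodes is the minimum useful cluster size: 2 nodes have a quorum of 2 and tolerate nothing.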

Manager Operations

Type    Operations                                           Routing
Writes  RegisterService, CreateCollection, IAM changes       Leader only
Reads   GetReplicationTargets, GetTopology, ListCollections  Any node

File Server

File Servers handle actual data storage and retrieval:

  • Content-addressed storage with SHA-256 deduplication
  • Erasure coding (K data + M parity shards)
  • Synchronous or asynchronous replication
  • Reference-counted garbage collection
  • Supports local filesystem or S3 backends
  • gRPC streaming for efficient data transfer
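
As a rough illustration of the first and fourth bullets, here is a toy content-addressed store: chunks are keyed by their SHA-256 digest, so duplicate writes only bump a reference count, and a chunk is collected when the count hits zero. The `Store` type and its methods are invented for this sketch; ZapFS's real on-disk layout and gRPC surface are not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store keys chunks by SHA-256 digest and reference-counts them.
type Store struct {
	chunks map[string][]byte
	refs   map[string]int
}

func NewStore() *Store {
	return &Store{chunks: map[string][]byte{}, refs: map[string]int{}}
}

// Put stores data under its digest; a duplicate write only bumps the refcount.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.chunks[key]; !ok {
		s.chunks[key] = append([]byte(nil), data...)
	}
	s.refs[key]++
	return key
}

// Release drops one reference; the chunk is collected at zero.
func (s *Store) Release(key string) {
	s.refs[key]--
	if s.refs[key] <= 0 {
		delete(s.chunks, key)
		delete(s.refs, key)
	}
}

func main() {
	s := NewStore()
	a := s.Put([]byte("hello"))
	b := s.Put([]byte("hello"))        // deduplicated: same digest, same chunk
	fmt.Println(a == b, len(s.chunks)) // true 1
	s.Release(a)
	fmt.Println(len(s.chunks)) // 1 (one reference remains)
	s.Release(b)
	fmt.Println(len(s.chunks)) // 0
}
```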

Data Flow

PutObject Flow

1. Client → Metadata Server (S3 PutObject HTTP)
2. Metadata Server → Manager.GetReplicationTargets() [any node]
3. Manager returns file server addresses + backend IDs
4. Metadata Server → FileService.PutObject() [streaming gRPC]
5. File Server writes to local backend
6. File Server replicates to peer file servers (sync/async)
7. Metadata Server stores ObjectRef in DB
8. Return ETag to client
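
Condensed into code, steps 2-8 look roughly like this. The `cluster` type and its fields are placeholders for the real Manager/FileService/metadata-DB clients, and the hard-coded addresses stand in for a GetReplicationTargets response:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// cluster is a stand-in for the three services involved in a PutObject.
type cluster struct {
	targets map[string][]byte // file server address -> stored bytes
	objects map[string]string // "bucket/key" -> ETag (the ObjectRef)
}

func (c *cluster) putObject(bucket, key string, data []byte) string {
	// Steps 2-3: ask the Manager (any node) for replication targets.
	targets := []string{"fs-1:8081", "fs-2:8081"}
	// Steps 4-6: stream the object to each file server.
	for _, addr := range targets {
		c.targets[addr] = append([]byte(nil), data...)
	}
	// Step 7: record the ObjectRef in the metadata DB.
	etag := fmt.Sprintf("%x", md5.Sum(data)) // placeholder ETag computation
	c.objects[bucket+"/"+key] = etag
	// Step 8: return the ETag to the client.
	return etag
}

func main() {
	c := &cluster{targets: map[string][]byte{}, objects: map[string]string{}}
	etag := c.putObject("photos", "cat.jpg", []byte("binary data"))
	fmt.Println(etag == c.objects["photos/cat.jpg"], len(c.targets)) // true 2
}
```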

GetObject Flow

1. Client → Metadata Server (S3 GetObject HTTP)
2. Metadata Server looks up ObjectRef in DB
3. Metadata Server → FileService.GetObject() [streaming gRPC]
4. File Server reads from local backend
5. File Server streams chunks back
6. Metadata Server streams to client

DeleteObject Flow

1. Client → Metadata Server (S3 DeleteObject HTTP)
2. Metadata Server marks ObjectRef deleted in DB
3. Return 204 to client (immediate)
4. GC worker later calls FileService.DeleteObject()
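
A sketch of this tombstone-then-collect pattern, with invented names (`objectStore`, `runGC`) standing in for the metadata DB and the GC worker:

```go
package main

import "fmt"

// objectStore models metadata tombstones separately from the stored bytes:
// delete is a metadata-only mark; a later GC pass removes the data.
type objectStore struct {
	refs map[string]bool // key -> deleted?
	data map[string][]byte
}

// deleteObject marks the key deleted and returns immediately (step 2-3).
func (s *objectStore) deleteObject(key string) int {
	s.refs[key] = true
	return 204
}

// runGC removes the bytes for every tombstoned key (step 4).
func (s *objectStore) runGC() int {
	removed := 0
	for key, deleted := range s.refs {
		if deleted {
			delete(s.data, key) // the FileService.DeleteObject equivalent
			delete(s.refs, key)
			removed++
		}
	}
	return removed
}

func main() {
	s := &objectStore{
		refs: map[string]bool{"a": false, "b": false},
		data: map[string][]byte{"a": []byte("x"), "b": []byte("y")},
	}
	fmt.Println(s.deleteObject("a")) // 204
	fmt.Println(len(s.data))         // 2: bytes still on disk until GC
	fmt.Println(s.runGC())           // 1
	fmt.Println(len(s.data))         // 1
}
```

Decoupling the client response from physical deletion keeps DELETE latency low and lets the GC batch work.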

Key Packages

Package   Location       Purpose
metadata  pkg/metadata/  S3 API gateway, request processing
manager   pkg/manager/   Raft control plane, service registry
file      pkg/file/      Chunk storage, replication, GC
iam       pkg/iam/       Credential & policy management
storage   pkg/storage/   Storage backends, indexing
s3api     pkg/s3api/     S3 signature verification, types
cache     pkg/cache/     High-performance LRU cache

Default Ports

Service      Port  Protocol  Purpose
Metadata     8082  HTTP      S3 API
Metadata     8083  gRPC      Internal API
Manager      8050  gRPC      Raft + Service API
Manager      8060  HTTP      Admin API
File Server  8081  gRPC      Data API

Distributed Patterns

Leader-Aware Routing

The Manager client discovers the leader via a Ping RPC, caches the address with a TTL, and automatically fails over on leader changes.

Credential Streaming

Metadata services subscribe to IAM changes via gRPC streaming, maintaining a local credential cache that is updated in real time.
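
A sketch of the consuming side, with an in-process channel standing in for the gRPC stream; the `credEvent` fields are invented for illustration:

```go
package main

import "fmt"

// credEvent models one IAM change pushed over the stream.
type credEvent struct {
	AccessKey string
	Secret    string
	Deleted   bool
}

// applyStream keeps a local credential cache in sync with the event stream,
// then signals done when the stream closes.
func applyStream(cache map[string]string, events <-chan credEvent, done chan<- struct{}) {
	for ev := range events {
		if ev.Deleted {
			delete(cache, ev.AccessKey)
		} else {
			cache[ev.AccessKey] = ev.Secret
		}
	}
	close(done)
}

func main() {
	cache := map[string]string{}
	events := make(chan credEvent, 3)
	done := make(chan struct{})
	go applyStream(cache, events, done)

	events <- credEvent{AccessKey: "AKIA1", Secret: "s1"}
	events <- credEvent{AccessKey: "AKIA2", Secret: "s2"}
	events <- credEvent{AccessKey: "AKIA1", Deleted: true}
	close(events)
	<-done
	fmt.Println(len(cache), cache["AKIA2"]) // 1 s2
}
```

The push model means signature verification never blocks on a Manager round trip; the cache is already current.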

Service Registry

All services register with Manager on startup. Manager tracks heartbeats and removes dead services.

Erasure Coding

Data is split into K data + M parity shards, recoverable from up to M shard (node) failures (default: 3+2).
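
The M=1 case can be illustrated with plain XOR parity; real erasure coding (e.g. Reed-Solomon, as in the 3+2 default) generalizes the same idea to arbitrary M, where any K of the K+M shards rebuild the object. A minimal sketch:

```go
package main

import "fmt"

// xorParity XORs equal-length shards byte-by-byte into one parity shard.
func xorParity(shards [][]byte) []byte {
	parity := make([]byte, len(shards[0]))
	for _, s := range shards {
		for i, b := range s {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverShard rebuilds one lost data shard from the survivors plus parity:
// XORing everything that remains cancels out all but the missing shard.
func recoverShard(survivors [][]byte, parity []byte) []byte {
	return xorParity(append(survivors, parity))
}

func main() {
	data := [][]byte{[]byte("AA"), []byte("BB"), []byte("CC")} // K = 3
	parity := xorParity(data)                                  // M = 1

	lost := data[1]
	rebuilt := recoverShard([][]byte{data[0], data[2]}, parity)
	fmt.Println(string(rebuilt) == string(lost)) // true

	// Storage overhead for the 3+2 default: (K+M)/K = 5/3,
	// versus 3x for plain triple replication.
	fmt.Printf("%.2fx\n", float64(3+2)/3) // 1.67x
}
```

This is the trade-off behind the 3+2 default: two-failure tolerance at roughly 1.67x storage, where triple replication would cost 3x.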