MinIO

The canonical object store of our AI Data Lakehouse — S3-compatible, Kubernetes-native, and capable of sustained multi-gigabyte-per-second throughput for AI training, inference, and archival workloads.

AppSofa Lab·AI Data Lakehouse

Overview

MinIO is a high-performance, S3-compatible object storage system designed for AI and machine learning workloads. In the AppSofa AI Data Lakehouse, MinIO serves as the raw data layer — the single source of truth for all ingested data before it is transformed and distributed to downstream stores.

Because MinIO implements the S3 API, any tool, framework, or cloud service that can read from or write to Amazon S3 can do the same against our MinIO cluster, typically by changing only the endpoint URL and credentials. This lets it integrate with the broader AI toolchain, from PyTorch data loaders to LangChain document loaders.
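As a minimal sketch of what S3 compatibility looks like in practice, the snippet below points the standard boto3 S3 client at a MinIO endpoint instead of AWS. The endpoint URL, credentials, bucket, and file names are placeholders rather than values from our deployment, and boto3 is assumed to be installed.

```python
def minio_client_kwargs(endpoint_url, access_key, secret_key):
    """Build the keyword arguments that point an S3 client at a MinIO endpoint."""
    return {
        "endpoint_url": endpoint_url,            # e.g. "http://minio.example.internal:9000" (placeholder)
        "aws_access_key_id": access_key,         # MinIO access key
        "aws_secret_access_key": secret_key,     # MinIO secret key
    }


def make_minio_client(endpoint_url, access_key, secret_key):
    """Create a boto3 S3 client that talks to MinIO instead of AWS."""
    import boto3  # third-party; imported lazily so the helper above stays dependency-free
    return boto3.client("s3", **minio_client_kwargs(endpoint_url, access_key, secret_key))


# Usage (placeholder values; requires a reachable MinIO endpoint):
#   s3 = make_minio_client("http://minio.example.internal:9000", "ACCESS_KEY", "SECRET_KEY")
#   s3.upload_file("sample.pdf", "raw-data", "documents/sample.pdf")
```

Only the endpoint and credentials differ from an AWS setup; the upload call itself is the same one a tool would issue against Amazon S3.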

Role in the Lakehouse

Raw Data Store

All incoming data (documents, sensor streams, images, audio) lands in MinIO first. Downstream ETL pipelines read from MinIO to populate Elasticsearch, Qdrant, Oxigraph, and Neo4j.
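One way to keep a raw landing zone navigable is a date-partitioned key layout. The sketch below is illustrative only: the `raw/` prefix, the source names, and the `raw_object_key` helper are assumptions, not the layout our pipelines necessarily use.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def raw_object_key(source, filename, received_at):
    """Return a date-partitioned key such as 'raw/sensors/2024/05/17/reading.json'.

    `received_at` should be a timezone-aware datetime; it is normalised to UTC
    so that the partition does not depend on the ingesting host's time zone.
    """
    day = received_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    safe_name = PurePosixPath(filename).name  # drop any directory part of the upload path
    return f"raw/{source}/{day}/{safe_name}"
```

For example, `raw_object_key("sensors", "/tmp/reading.json", datetime(2024, 5, 17, 23, 30, tzinfo=timezone.utc))` yields `raw/sensors/2024/05/17/reading.json`, which downstream ETL jobs can list by prefix.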

Model Artifact Registry

Training checkpoints, fine-tuned model weights, ONNX exports, and quantized models are versioned in MinIO buckets — accessible to training jobs and inference servers alike.
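A simple way to version artifacts is to encode the model name and version in the object key and record a content hash alongside it. The following is a sketch under assumptions: the `models/` prefix and the `artifact_key` helper are hypothetical, and the SHA-256 digest is computed client-side for the caller to store as object metadata or in a manifest.

```python
import hashlib


def artifact_key(model_name, version, filename, payload):
    """Return (object_key, sha256_hex) for a model artifact.

    The key groups all files of one model version under a common prefix;
    the digest lets training jobs and inference servers verify what they download.
    """
    digest = hashlib.sha256(payload).hexdigest()
    key = f"models/{model_name}/{version}/{filename}"
    return key, digest
```

If S3 bucket versioning is enabled on the bucket, overwrites of the same key are additionally retained as separate object versions, so the two approaches can be combined.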

Vector Index Backups

Periodic snapshots of Qdrant collections and Neo4j graph exports are archived to MinIO, providing point-in-time recovery and offline analysis capabilities.
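Point-in-time recovery comes down to picking the newest snapshot taken at or before the desired moment. The sketch below assumes a hypothetical naming scheme in which each snapshot key ends in a UTC timestamp such as `20240517T020000Z.snapshot`; the real key format may differ.

```python
from datetime import datetime, timezone

SNAPSHOT_FORMAT = "%Y%m%dT%H%M%SZ"  # assumed timestamp format in the file name


def snapshot_time(key):
    """Parse the UTC timestamp from a key like 'backups/qdrant/docs/20240517T020000Z.snapshot'."""
    stem = key.rsplit("/", 1)[-1].split(".", 1)[0]
    return datetime.strptime(stem, SNAPSHOT_FORMAT).replace(tzinfo=timezone.utc)


def latest_snapshot_before(keys, target):
    """Return the newest snapshot key taken at or before `target`, or None if there is none."""
    candidates = [k for k in keys if snapshot_time(k) <= target]
    return max(candidates, key=snapshot_time) if candidates else None
```

The list of keys would typically come from listing the backup prefix in the bucket; the selected key is then downloaded and restored with the target system's own tooling.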

Document Corpus

Pre-RAG document chunks, PDFs, and markdown files are stored in MinIO. The ETL pipeline reads them, generates embeddings, and upserts vectors into Qdrant.
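The chunking step of such a pipeline can be sketched as a fixed-size window with overlap. This is an illustrative simplification: the character-based sizes are arbitrary defaults, and the embedding and Qdrant upsert steps are deliberately left out because they depend on the chosen model and client.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears intact in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

Each chunk would then be passed to an embedding model and upserted into Qdrant together with a reference to the source object key in MinIO.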

Performance & Scale

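MinIO's published benchmarks report over 325 GiB/s read (GET) and 165 GiB/s write (PUT) throughput on a 32-node NVMe cluster in distributed mode. Actual throughput depends on drives and network, but this headroom is sufficient for even the most data-intensive AI training runs on our NVIDIA GPU cluster.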
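To put the throughput figures above in perspective, a back-of-the-envelope calculation shows how long one full pass over a dataset would take at a given sustained rate. The 10 TiB dataset size is an arbitrary example, and real runs are limited by whichever of storage, network, or data loading is slowest.

```python
def read_time_seconds(dataset_gib, throughput_gib_per_s):
    """Ideal time to stream a dataset once at a sustained throughput (GiB and GiB/s)."""
    if throughput_gib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_gib / throughput_gib_per_s


# A 10 TiB dataset (10 * 1024 GiB) at the benchmarked 325 GiB/s read rate:
seconds = read_time_seconds(10 * 1024, 325)  # roughly 31.5 seconds
```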

Erasure coding
Bitrot protection
Encryption at rest
Multi-site replication
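Erasure coding trades raw capacity for fault tolerance: each object is split into data and parity shards spread across the drives of an erasure set. The sketch below shows the general arithmetic only; the drive count, parity level, and drive size are example values, not our cluster's configuration.

```python
def erasure_coding_summary(drives, parity, drive_tib):
    """Summarise capacity and fault tolerance for one erasure set.

    `drives` is the number of drives in the set and `parity` the number of
    parity shards per object (here restricted to between 1 and half the set).
    """
    if not 0 < parity <= drives // 2:
        raise ValueError("parity must be between 1 and drives // 2")
    data = drives - parity
    return {
        "data_shards": data,
        "usable_tib": data * drive_tib,       # capacity left after parity overhead
        "efficiency": data / drives,          # usable fraction of raw capacity
        "read_tolerance": parity,             # drives that can fail with data still readable
    }
```

With 16 drives of 10 TiB each and 4 parity shards, this gives 120 TiB usable out of 160 TiB raw (75% efficiency), while objects remain readable with up to 4 drives unavailable.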

Collaborate

Need an enterprise object storage layer for AI?

We design and deploy MinIO-based storage infrastructure for AI training, serving, and lakehouse workloads in on-premise and air-gapped environments.
