Overview
MinIO is a high-performance, S3-compatible object storage system designed for AI and machine learning workloads. In the AppSofa AI Data Lakehouse, MinIO serves as the raw data layer — the single source of truth for all ingested data before it is transformed and distributed to downstream stores.
Because MinIO implements the S3 API, any tool, framework, or cloud service that talks to Amazon S3 and accepts a custom endpoint can read from and write to our MinIO cluster. This lets it plug into the broader AI toolchain, from PyTorch data loaders to LangChain document loaders.
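As a minimal sketch of what that compatibility looks like in practice, the snippet below points the standard boto3 S3 client at a MinIO endpoint instead of AWS. The endpoint URL, the environment variable names, and the helper names are assumptions for illustration, not details of the AppSofa deployment.

```python
import os


def s3_client_kwargs(endpoint_url, access_key, secret_key):
    """Build the keyword arguments that redirect an S3 client to MinIO."""
    return {
        "endpoint_url": endpoint_url,       # MinIO endpoint instead of AWS
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": "us-east-1",         # MinIO's default region name
    }


def make_s3_client(endpoint_url=None):
    """Create a boto3 S3 client for MinIO (requires `pip install boto3`)."""
    # Imported lazily so the helper above has no third-party dependency.
    import boto3
    from botocore.config import Config

    kwargs = s3_client_kwargs(
        endpoint_url or os.environ.get("MINIO_ENDPOINT", "http://localhost:9000"),
        os.environ["MINIO_ACCESS_KEY"],     # hypothetical variable names
        os.environ["MINIO_SECRET_KEY"],
    )
    # Path-style addressing avoids the need for per-bucket DNS names.
    config = Config(s3={"addressing_style": "path"})
    return boto3.client("s3", config=config, **kwargs)
```

With a client built this way, ordinary S3 calls such as `upload_file("report.pdf", "raw-data", "documents/report.pdf")` would target MinIO; the bucket and key here are placeholders.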
Role in the Lakehouse
Raw Data Store
All incoming data (documents, sensor streams, images, audio) lands in MinIO first. Downstream ETL pipelines read from MinIO to populate Elasticsearch, Qdrant, Oxigraph, and Neo4j.
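One way to keep a raw landing zone navigable is a date-partitioned key layout. The sketch below is illustrative only: the `raw/` prefix, the `landing_key` helper, and the source names are hypothetical and not taken from the actual lakehouse.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def landing_key(source, filename, received_at=None):
    """Return an object key of the form raw/<source>/<YYYY>/<MM>/<DD>/<file>."""
    received_at = received_at or datetime.now(timezone.utc)
    # Keep only the file name so caller-side directories do not leak into keys.
    name = PurePosixPath(filename).name
    return f"raw/{source}/{received_at:%Y/%m/%d}/{name}"


# Example: a sensor reading received on 17 May 2024.
key = landing_key(
    "sensors",
    "tmp/reading-001.json",
    datetime(2024, 5, 17, tzinfo=timezone.utc),
)
print(key)  # raw/sensors/2024/05/17/reading-001.json
```

Partitioning by source and date lets a downstream ETL job list only the prefix it needs rather than scanning the whole bucket.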
Model Artifact Registry
Training checkpoints, fine-tuned model weights, ONNX exports, and quantized models are versioned in MinIO buckets — accessible to training jobs and inference servers alike.
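The text does not say how versioning is implemented. One plausible approach is S3 bucket versioning, which MinIO supports through the standard `put_bucket_versioning` call. The sketch below assumes that approach; the `publish_artifact` helper, the key layout, and the `sha256` metadata field are hypothetical. The `client` argument is expected to be a boto3-style S3 client.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def publish_artifact(client, bucket, model_name, path):
    """Upload a model file to a versioned bucket and return its object key."""
    path = Path(path)
    # Enabling versioning is idempotent; re-uploads of the same key then
    # create new object versions instead of overwriting the old one.
    client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    key = f"{model_name}/{path.name}"
    # Store the checksum as user metadata so consumers can verify downloads.
    client.upload_file(
        str(path),
        bucket,
        key,
        ExtraArgs={"Metadata": {"sha256": sha256_of(path)}},
    )
    return key
```

A training job could call `publish_artifact(client, "models", "sentiment-bert", "checkpoint.onnx")` after each export; the bucket and model names here are placeholders.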
Vector Index Backups
Periodic snapshots of Qdrant collections and Neo4j graph exports are archived to MinIO, providing point-in-time recovery and offline analysis capabilities.
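Point-in-time recovery comes down to finding the newest snapshot taken at or before a target moment. The sketch below shows one way to do that with timestamped object keys. The `backups/` prefix, the `.snapshot` suffix, and all helper names are assumptions for illustration.

```python
from datetime import datetime, timezone

STAMP = "%Y%m%dT%H%M%SZ"  # sortable UTC timestamp embedded in each key


def snapshot_key(store, name, taken_at):
    """Build a key such as backups/qdrant/docs/20240101T000000Z.snapshot.

    `taken_at` is expected to be a timezone-aware datetime.
    """
    stamp = taken_at.astimezone(timezone.utc).strftime(STAMP)
    return f"backups/{store}/{name}/{stamp}.snapshot"


def parse_taken_at(key):
    """Recover the UTC timestamp from a key produced by snapshot_key."""
    stamp = key.rsplit("/", 1)[-1].split(".")[0]
    return datetime.strptime(stamp, STAMP).replace(tzinfo=timezone.utc)


def latest_at_or_before(keys, target):
    """Return the newest snapshot key not later than `target`, or None."""
    candidates = [k for k in keys if parse_taken_at(k) <= target]
    return max(candidates, key=parse_taken_at, default=None)
```

In practice the list of keys would come from listing the backup prefix in MinIO, and the selected object would then be restored with the target store's own tooling.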
Document Corpus
Pre-RAG document chunks, PDFs, and Markdown files are stored in MinIO. The ETL pipeline reads them, generates embeddings, and upserts the resulting vectors into Qdrant.
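The read, embed, upsert flow described above can be sketched as follows. This is not the actual AppSofa pipeline: the helper names and payload fields are hypothetical, `s3` is assumed to be a boto3-style S3 client, and `embed` is any caller-supplied function that maps text to a vector. It also assumes the objects are UTF-8 text chunks; PDFs would need a text-extraction step first.

```python
import uuid


def iter_chunk_keys(s3, bucket, prefix):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def build_points(s3, bucket, keys, embed):
    """Read each chunk from object storage and turn it into a vector record."""
    for key in keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        text = body.decode("utf-8")
        yield {
            # A deterministic UUID per object makes repeated runs overwrite
            # the same point instead of creating duplicates.
            "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"s3://{bucket}/{key}")),
            "vector": embed(text),
            "payload": {"bucket": bucket, "key": key},
        }
```

Each record carries the id, vector, and payload that a Qdrant point needs; with the `qdrant-client` package they could be wrapped in `PointStruct` objects and passed to `QdrantClient.upsert`.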
Performance & Scale
MinIO's published benchmarks report over 325 GiB/s read and 165 GiB/s write throughput in distributed mode (32 NVMe nodes on 100 GbE networking). Throughput of that order is sufficient for data-intensive AI training runs on our NVIDIA GPU cluster.
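To put those figures in perspective, the short calculation below estimates how long a full pass over a dataset would take at the quoted aggregate throughput. The 10 TiB dataset size is a hypothetical example, and the results are best-case lower bounds that ignore client-side and network limits.

```python
def transfer_seconds(size_gib, throughput_gib_per_s):
    """Best-case time to move `size_gib` at a sustained aggregate throughput."""
    if throughput_gib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gib / throughput_gib_per_s


READ_GIB_PER_S = 325    # read figure quoted above
WRITE_GIB_PER_S = 165   # write figure quoted above
dataset_gib = 10 * 1024  # hypothetical 10 TiB training set

print(f"read:  {transfer_seconds(dataset_gib, READ_GIB_PER_S):.1f} s")   # 31.5 s
print(f"write: {transfer_seconds(dataset_gib, WRITE_GIB_PER_S):.1f} s")  # 62.1 s
```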
Collaborate
Need an enterprise object storage layer for AI?
We design and deploy MinIO-based storage infrastructure for AI training, serving, and lakehouse workloads in on-premise and air-gapped environments.