Sightwise

Scaling GPU-Intensive AI Workloads on Google Cloud

Client:

Sightwise

Industry:

Industrial AI / AI Manufacturing

Core Technologies:

Google Cloud Platform
Google Kubernetes Engine (GKE)
NVIDIA GPUs (L4, RTX PRO 6000, T4)
Cloud SQL
Google Cloud Storage
Artifact Registry

Background

Sightwise, a European AI company developing manufacturing applications, relied on an on-prem Kubernetes cluster to power synthetic data generation and ML workloads. These processes required significant GPU resources for ray tracing, simulation, and model training. As workloads grew, maintaining on-prem infrastructure became increasingly complex and limited the team's ability to scale GPU capacity quickly. The company needed a cloud platform that could support high-performance AI workloads while reducing operational overhead.

Challenges

Operational Maintenance Overhead: Self-hosted services such as S3-compatible object storage and GitLab CI/CD required constant management alongside on-prem hardware infrastructure.

High GPU Demand: Synthetic data generation and ray tracing workloads required powerful NVIDIA GPUs that were difficult to scale efficiently on-premises.

Limited Deployment Automation: The existing Kubernetes environment lacked automation and resilience, slowing deployments and ML experimentation.

Solutions Delivered

Zazmic designed and implemented a scalable AI infrastructure on Google Cloud:

GPU-Optimized GKE Clusters: Deployed a multi-node GKE cluster optimized for AI workloads using NVIDIA L4, RTX PRO 6000, and T4 GPUs to support compute-intensive tasks.
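As a rough sketch, a GPU node pool of this kind can be provisioned with a single gcloud command. The cluster name, zone, and sizing below are illustrative placeholders, not Sightwise's actual configuration:

```shell
# Illustrative only: creates a GKE node pool backed by NVIDIA L4 GPUs.
# Names, zone, and node counts are placeholders for this sketch.
gcloud container node-pools create gpu-l4-pool \
  --cluster=ai-workloads-cluster \
  --zone=europe-west4-a \
  --machine-type=g2-standard-8 \
  --accelerator=type=nvidia-l4,count=1,gpu-driver-version=default \
  --num-nodes=1 \
  --enable-autoscaling --min-nodes=0 --max-nodes=4
```

Autoscaling down to zero nodes is a common pattern for bursty GPU workloads such as synthetic data generation, since idle GPU nodes are expensive.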

Lift-and-Shift Cloud Migration: Migrated Kubernetes workloads to Google Cloud with minimal application changes, enabling rapid adoption while preserving existing workflows.

Managed Data & Storage Services: Moved databases to Cloud SQL and object storage to Google Cloud Storage, eliminating the maintenance burden of self-hosted infrastructure.
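For the object-storage side of such a migration, a one-off copy from a self-hosted S3-compatible bucket into Google Cloud Storage might look like the following. Bucket names are placeholders, and gsutil would need the S3 endpoint and credentials configured (e.g. in ~/.boto); for large or recurring transfers, Storage Transfer Service is usually the better fit:

```shell
# Illustrative only: mirrors an S3-compatible bucket into GCS.
# Bucket names are placeholders; -m parallelizes, -r recurses.
gsutil -m rsync -r s3://legacy-datasets gs://example-datasets
```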

GitOps CI/CD with ArgoCD: Implemented automated deployment pipelines using ArgoCD and Artifact Registry for consistent, repeatable releases.
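A GitOps setup like this typically registers each workload as an Argo CD Application pointing at a Git repository of manifests. The repo URL, path, and namespace below are hypothetical placeholders:

```shell
# Illustrative only: registers an Argo CD Application that continuously
# syncs manifests from Git into the cluster. All names are placeholders.
argocd app create ml-platform \
  --repo https://gitlab.example.com/sightwise/deploy.git \
  --path overlays/production \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace ml-platform \
  --sync-policy automated
```

With an automated sync policy, pushing an updated image tag (e.g. one built and stored in Artifact Registry) to the repo is enough to roll out a release, which is what makes deployments consistent and repeatable.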

Outcomes

The new cloud platform transformed the client’s AI infrastructure:

Resilient AI Platform

Highly available GKE environment for GPU-intensive workloads

Reduced Operational Overhead

Eliminated maintenance of self-hosted infrastructure

Optimized GPU Performance

Scalable NVIDIA GPU resources for faster training and synthetic data generation

Automated Deployments

GitOps pipelines ensure reliable, repeatable releases

Production-Ready ML Infrastructure

Scalable foundation for continued AI development

Conclusion

Zazmic delivered a scalable Google Cloud platform that enables Sightwise to run GPU-intensive AI workloads efficiently. By combining GKE, managed cloud services, and automated deployments, the solution reduces operational complexity while providing the performance and flexibility needed for advanced manufacturing AI development.

Ready to Transform Your Business?

Let's discuss how Zazmic can help you achieve similar results with AI and cloud solutions.