At the core of IREX.ai's product is a highly optimized module for real-time video stream inference using computer vision algorithms – from image recognition to object detection and classification. A key differentiator of the platform is its ability to efficiently run neural network inference on CPUs, particularly Intel Xeon Skylake and Cascade Lake, without requiring GPUs.
Moving Away from GPUs: The Path to Efficient Infrastructure
Traditionally, the AI industry, especially in computer vision, has relied on graphics processing units (GPUs) for both training and inference. However, deploying models to production on GPUs can be expensive and technically challenging. IREX.ai made a strategic bet on CPU-based inference, and that bet has proven to be a winning approach.
A critical step in this direction was the adoption of Post-Training Quantization: converting weights and activations from float32 to int8 without retraining the model. Even in the early stages, quantization reduced model sizes by a factor of four and eliminated the GPU dependency. This simplified the infrastructure, enabled CPU-only operation, and significantly cut costs.
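The float32-to-int8 conversion can be illustrated with a minimal NumPy sketch of affine (asymmetric) post-training quantization. The tensor sizes and helper names here are illustrative only, not IREX.ai's actual pipeline:

```python
import numpy as np

def quantize_tensor(x: np.ndarray):
    """Affine (asymmetric) post-training quantization of a float32 tensor to int8."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0  # spread the observed range over 256 int8 levels
    zero_point = int(round(-128.0 - x_min / scale))  # offset that maps x_min to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_tensor(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Approximate reconstruction of the original float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix

q, scale, zp = quantize_tensor(weights)

print(weights.nbytes // q.nbytes)  # → 4: int8 storage is one quarter of float32
# reconstruction error stays on the order of one quantization step (`scale`)
max_err = np.abs(dequantize_tensor(q, scale, zp) - weights).max()
```

Because the scale and zero point are computed from the already-trained weights, no retraining or labeled data is needed; this is what makes the approach "post-training."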
Optimization Results
Following quantization, CPU loads decreased by 10-12%, resulting in annual savings of tens of thousands of dollars on cloud platforms like AWS, Azure, and GCP. Additionally, stable inference with high video stream throughput enabled seamless scaling without compromising performance or quality.
As a result, the IREX.ai platform has successfully expanded into new markets, including the US, UK, Egypt, UAE, and others. This growth has been driven by a combination of technical optimization and a flexible, scalable architecture.
Technical Perspective: How the Optimizations Work
Neural networks can contain tens or even hundreds of millions of parameters in float32 format, requiring gigabytes of memory and substantial computational power. IREX.ai uses models such as YOLO, VGG, and ResNet50, which were not originally designed for CPU inference.
Quantization, particularly Post-Training Quantization, reduces the precision of these weights from float32 to int8, cutting model size by a factor of four and speeding up processing. This approach is particularly valuable in real-time video stream scenarios, where speed, stability, and efficiency are crucial.
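To sketch why int8 also speeds up compute (not just storage), the toy example below runs a matrix-vector product entirely in integer arithmetic, accumulating in int32 and applying a single float rescale at the end. This is the general pattern that int8-capable CPU instruction sets accelerate; the code itself is a plain NumPy illustration, not the platform's inference engine:

```python
import numpy as np

def quantize_symmetric(x: np.ndarray):
    """Symmetric per-tensor quantization: zero point fixed at 0, scale only."""
    scale = float(np.abs(x).max()) / 127.0  # map the largest magnitude to 127
    return np.round(x / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)        # toy activation vector
w = rng.standard_normal((64, 64)).astype(np.float32)  # toy weight matrix

qx, sx = quantize_symmetric(x)
qw, sw = quantize_symmetric(w)

# The dot products run entirely in integer arithmetic; accumulating in int32
# avoids overflow, and one float multiply rescales the result at the end.
acc = qw.astype(np.int32) @ qx.astype(np.int32)
y_quant = acc.astype(np.float32) * (sw * sx)

y_ref = w @ x  # float32 reference
rel_err = np.abs(y_quant - y_ref).max() / np.abs(y_ref).max()
```

Keeping the inner loop in int8/int32 is what lets vectorized CPU instructions process several times more values per cycle than float32, while the rescale preserves accuracy to within a small relative error.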
Scaling Through Engineering
IREX.ai's experience demonstrates that scaling doesn't always require additional computational power. Sometimes, a different perspective on the problem, leveraging optimizations like quantization, pruning, or clustering, can be just as effective.
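For instance, magnitude pruning, one of the optimizations mentioned above, can be sketched in a few lines: weights with the smallest absolute values are zeroed out, leaving a sparse model. The sparsity level and tensor here are arbitrary examples, not values from IREX.ai's models:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them are 0."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w).astype(w.dtype)

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.5)
print(round(float((pruned == 0).mean()), 2))  # → 0.5: half the weights removed
```

Sparse weights compress well and, with a sparse-aware runtime, skip multiplications by zero entirely; in practice pruning is usually followed by a short fine-tuning pass to recover any lost accuracy.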
Thanks to its optimized architecture and GPU independence, the platform is now deployed in high-impact projects – from detecting weapons in public spaces to locating missing persons. In collaboration with international partners, IREX.ai is developing solutions that make cities smarter and society safer.
Looking Ahead
The team continues to advance its infrastructure, exploring edge inference, AutoML, and other cutting-edge technologies. However, the shift to CPU inference and the adoption of quantization remain the foundation for sustainable growth and international expansion.