
Researchers build an encrypted routing layer for private AI inference

Apr 21, 2026  Twila Rosenbaum

Organizations in sectors such as healthcare and finance increasingly want to leverage large AI models without exposing sensitive data to the cloud servers that run them. Researchers have turned to an established cryptographic technique, secure multi-party computation (MPC), to address this need. The input is split into encrypted fragments distributed across multiple servers that share no information with one another, and those servers jointly compute the AI result without any of them ever seeing the raw data.
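The sharing step can be illustrated with additive secret sharing over a prime field. This is a generic sketch of MPC-style sharing, not the specific protocol used in the research; the modulus and values are illustrative:

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus, not from the paper

def share(value: int, n_servers: int = 2) -> list[int]:
    """Split a value into n additive shares that sum to it mod P.

    Any subset of fewer than n shares is uniformly random and
    reveals nothing about the original value.
    """
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the shared value."""
    return sum(shares) % P

# Addition works share-wise: each server adds its own shares locally,
# and the results reconstruct to x + y without any server seeing x or y.
x_shares = share(123)
y_shares = share(456)
local_sums = [(a + b) % P for a, b in zip(x_shares, y_shares)]
print(reconstruct(local_sums))  # 579
```

Real MPC frameworks also support multiplication and non-linear operations on shares, which is where the heavy encryption overhead for neural networks comes from.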

However, a significant challenge remains: speed. A typical mid-sized language model that can produce results in under a second may take over 60 seconds when processed through MPC due to the substantial encryption overhead.

Limitations of Existing Solutions

Previous efforts in private inference have concentrated on redesigning AI models to function more efficiently under encryption. While these initiatives have been helpful, they share a crucial structural limitation: each query, regardless of its complexity, is processed through the same model at a uniform cost.

In conventional AI deployment, a common optimization strategy involves directing simple queries to smaller, faster models while reserving larger, more expensive models for complex queries that genuinely require them. This routing method is standard in plaintext systems, but implementing it under encryption poses challenges since the routing decision typically requires access to the input, which must remain encrypted throughout.
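In plaintext systems, such routing can be as simple as estimating query difficulty before dispatch. A minimal sketch, where the model names, the word-count heuristic, and the threshold are all illustrative assumptions rather than anything from the paper:

```python
# Hypothetical model pool mirroring the sizes mentioned in the article;
# the routing heuristic and threshold are illustrative, not from the paper.
MODEL_POOL = {
    "small": {"params": 4_400_000},
    "large": {"params": 340_000_000},
}

def route_plaintext(query: str) -> str:
    """Route short queries to the small model, long ones to the large model.

    Word count is a crude complexity proxy; real routers use learned
    difficulty estimators. Under MPC this decision must itself be
    computed on encrypted inputs, which is what makes routing hard.
    """
    complexity = len(query.split())
    return "small" if complexity < 20 else "large"

print(route_plaintext("What is 2 + 2?"))  # small
```

The difficulty in the encrypted setting is that even this comparison against a threshold must run on secret-shared data, so no server ever learns the query or the routing outcome.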

Introducing SecureRouter

To tackle these challenges, researchers at the University of Central Florida have developed SecureRouter, a system that introduces input-adaptive routing for encrypted AI inference. It maintains a pool of models ranging from a compact model of roughly 4.4 million parameters to a larger model of roughly 340 million parameters. A lightweight routing component assesses each incoming encrypted query and selects the most suitable model from the pool, all while the data remains encrypted; the routing decision itself is never revealed in plaintext.

The router is designed to balance accuracy with computational cost, measuring cost in terms of encrypted execution time instead of the parameter counts typically used in plaintext systems. A load-balancing goal ensures that the router does not default to a single model for all queries, promoting efficient use of available resources.
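An objective of this kind can be sketched as a score that trades estimated accuracy against encrypted execution time, with a penalty on models that are already handling most of the traffic. All weights, accuracies, and timings below are illustrative assumptions, not values from the paper:

```python
def routing_score(acc_est: float, enc_time: float, usage_frac: float,
                  alpha: float = 0.5, beta: float = 0.3) -> float:
    """Higher is better: estimated accuracy minus a weighted
    encrypted-runtime cost and a load-balancing penalty that
    discourages sending every query to the same model."""
    return acc_est - alpha * enc_time - beta * usage_frac

# Illustrative pool: the small model is cheap but heavily loaded,
# the large model is more accurate but slow under encryption.
pool = {
    "small": {"acc": 0.80, "enc_time": 0.3, "usage": 0.7},
    "large": {"acc": 0.90, "enc_time": 1.0, "usage": 0.1},
}

best = max(pool, key=lambda m: routing_score(pool[m]["acc"],
                                             pool[m]["enc_time"],
                                             pool[m]["usage"]))
print(best)  # small: its score 0.44 beats the large model's 0.37
```

The usage term is what implements the load-balancing goal: as one model absorbs more queries, its score drops and the router starts spreading work across the pool.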

Performance Enhancements

When tested against SecFormer, a private inference system that relies on a fixed large model, SecureRouter reduced average inference time by a factor of 1.95 across five language understanding tasks. Speedups ranged from 1.83 times on the most demanding task to 2.19 times on the simplest, showing the router's ability to match model size to query complexity.

In comparison to executing a large model for every query, regardless of its complexity, the average speedup across eight benchmark tasks was 1.53 times. Most tasks achieved accuracy levels within a fraction of a percentage point of the large-model baseline. However, one task related to grammatical analysis experienced a more noticeable accuracy decline, indicating that some highly specialized tasks may be sensitive to being processed by a smaller model.

Minimal Overhead

Adding a routing layer to an encrypted inference system risks becoming a bottleneck in itself. In practice, however, the routing component uses about 39 MB of memory in a two-server setup, comparable to the 38 MB required by the smallest model in the pool; the largest model requires around 3,100 MB. The router adds roughly 4 seconds of inference time and about 1.86 GB of network communication, figures similar to running the smallest model on its own.

Practical Implications

SecureRouter does not require rebuilding existing infrastructure: it sits on top of current MPC frameworks and uses standard language model architectures available through common libraries. Simple queries are resolved quickly by smaller models, while more complex queries are escalated to larger ones. The client submitting a query sees only the final result and learns nothing about which model processed the request, preserving confidentiality.


Source: Help Net Security News

