Difference between Serving Techniques: BentoML and Seldon Core
BentoML and Seldon Core are both powerful tools for serving machine learning models in production environments. While they share similarities in their objectives, there are distinct differences in their approaches, handling, operations, inner structures, and performance characteristics.
Handling:
BentoML:
- BentoML focuses on creating a containerized environment for serving machine learning models. It allows packaging trained models along with their dependencies into Docker containers, enabling easy deployment and serving.
- BentoML offers flexibility in handling various machine learning frameworks, allowing users to deploy models built with TensorFlow, PyTorch, Scikit-learn, and others.
- It emphasizes simplicity and ease of use, providing a straightforward interface for packaging and deploying models without extensive configuration.
- Out of the box, BentoML does not provide load balancing; a single serving container can become a bottleneck or fail under heavy traffic unless it is paired with an external load balancer or autoscaler.
Seldon Core:
- Seldon Core provides a comprehensive platform for deploying and managing machine learning models at scale. It offers advanced features for model deployment, scaling, monitoring, and governance.
- Seldon Core leverages Kubernetes for orchestration, allowing seamless scaling of model deployments based on demand. It provides robust support for high availability and fault tolerance.
- Unlike BentoML, Seldon Core offers sophisticated features for model versioning, A/B testing, and canary deployments, enabling advanced deployment strategies and experimentation.
- Seldon Core’s architecture is designed to handle high request rates efficiently, with built-in mechanisms for load balancing and horizontal scaling. It excels in serving models reliably even under heavy workloads.
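As an illustration of the canary-style strategies mentioned above, a SeldonDeployment can declare two predictors with a traffic split between them. This is a minimal sketch; the deployment name, bucket paths, and model versions are placeholders:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: canary-example
spec:
  predictors:
  - name: main            # current production model
    traffic: 90           # 90% of requests
    replicas: 2
    graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://your-bucket/model-v1
      children: []
  - name: canary          # candidate model under evaluation
    traffic: 10           # 10% of requests
    replicas: 1
    graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://your-bucket/model-v2
      children: []
```

Once the canary's metrics look healthy, the traffic weights can be shifted gradually toward the new predictor.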
Operations:
BentoML:
- BentoML simplifies the process of packaging and deploying machine learning models, making it suitable for smaller-scale deployments and rapid prototyping.
- It provides a user-friendly interface for managing model versions and deployments, allowing users to easily update and roll back models as needed.
- BentoML’s operational capabilities are more focused on simplicity and ease of use, which may limit its suitability for complex production environments with stringent requirements.
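To make the version management concrete, the BentoML 0.13-era CLI exposes these operations directly. The service name and version tag below are illustrative:

```shell
# List saved services and their versions (BentoML 0.13 CLI)
bentoml list

# Serve the newest version of a service
bentoml serve RandomForestClassifierService:latest

# Serve a specific earlier version to roll back (version tag is illustrative)
bentoml serve RandomForestClassifierService:20240101120000_ABCDEF
```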
Seldon Core:
- Seldon Core caters to enterprise-grade deployments with a comprehensive set of operational features. It offers fine-grained control over model deployments, monitoring, and governance.
- The platform supports advanced operational workflows, including model governance, compliance, and auditing, making it suitable for regulated industries and large-scale deployments.
- Seldon Core integrates with various monitoring and logging systems, enabling comprehensive observability into model performance and behavior in production environments.
Inner Structure:
BentoML:
- Internally, BentoML utilizes lightweight containers to encapsulate machine learning models along with their dependencies and serving logic.
- It follows a modular architecture, allowing users to extend and customize functionality through plugins and extensions.
- BentoML’s architecture prioritizes simplicity and modularity, aiming for lightweight and flexible deployments.
Seldon Core:
- Seldon Core’s architecture is built on top of Kubernetes, leveraging its orchestration capabilities for managing model deployments at scale.
- It consists of multiple components, including the Seldon Core operator (which watches and reconciles SeldonDeployment resources), the service orchestrator that routes requests through each inference graph, and the model servers themselves, all running within Kubernetes clusters.
- Seldon Core’s architecture is designed for scalability, resilience, and extensibility, supporting complex deployment scenarios and integration with various ecosystem tools.
Performance:
Load Balancing Testing:
- Load balancing testing reveals differences in how BentoML and Seldon Core handle high request rates:
- Seldon Core demonstrates superior performance in handling a high volume of requests, thanks to its built-in mechanisms for load balancing and horizontal scaling.
- BentoML may experience crashes or degraded performance under heavy loads, especially without proper load balancing configurations.
Flexibility vs. Scalability:
- BentoML prioritizes flexibility and ease of use, making it suitable for smaller-scale deployments and rapid experimentation.
- Seldon Core emphasizes scalability, resilience, and advanced operational capabilities, making it ideal for large-scale, mission-critical deployments in enterprise environments.
Ecosystem Integration:
- Both BentoML and Seldon Core integrate with popular machine learning frameworks and tools, but Seldon Core offers deeper integration with Kubernetes and ecosystem components, providing a more comprehensive solution for production deployments.
Community and Support:
- Consider the community size, support offerings, and ecosystem partnerships associated with each platform, as these factors can impact long-term viability and adoption in enterprise environments.
Here’s a simple example demonstrating how you can serve a machine learning model using BentoML and Seldon Core:
BentoML Implementation:
# Importing required libraries
# Note: this uses the legacy BentoML 0.13 API (BentoService, adapters);
# BentoML 1.x replaced it with bentoml.Service.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from bentoml import BentoService, api, env, artifacts
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

# Defining the BentoML service
@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('model')])
class RandomForestClassifierService(BentoService):
    # Define an API endpoint for model inference
    @api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        return self.artifacts.model.predict(df)

# Prepare training data (the Iris dataset, as an example)
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model and save it using BentoML
model = RandomForestClassifier()
model.fit(X_train, y_train)

bento_service = RandomForestClassifierService()
bento_service.pack('model', model)
saved_path = bento_service.save()
print("Model saved in BentoML format:", saved_path)
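Once saved, the service can be served locally and queried over HTTP. These commands use the BentoML 0.13-era CLI; the port is BentoML's default and the JSON body (a list of feature rows) is illustrative:

```shell
# Serve the saved bundle locally on the default port 5000 (BentoML 0.13 CLI)
bentoml serve RandomForestClassifierService:latest

# In another terminal, send a batch of feature rows to the predict endpoint
curl -X POST http://127.0.0.1:5000/predict \
  -H "Content-Type: application/json" \
  -d '[[5.1, 3.5, 1.4, 0.2]]'
```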
Seldon Core Implementation (Kubernetes YAML):
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-model-deployment
spec:
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://your-bucket/your-model
      name: classifier
      parameters:
      - name: method
        type: STRING
        value: predict
    name: default
    replicas: 1
In the Seldon Core implementation, modelUri should point to the location where your serialized model is stored, such as a cloud storage bucket. Replace gs://your-bucket/your-model with the appropriate path.
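A manifest like the one above can then be applied and queried as follows. The filename is an assumption, and the host and port depend on your ingress setup; the example assumes the cluster's gateway has been port-forwarded to localhost:8080:

```shell
# Apply the SeldonDeployment manifest (filename is illustrative)
kubectl apply -f seldon-deployment.yaml

# Check that the deployment is available
kubectl get seldondeployment my-model-deployment

# Query the standard Seldon Core REST path:
#   /seldon/<namespace>/<deployment-name>/api/v1.0/predictions
curl -X POST \
  http://localhost:8080/seldon/default/my-model-deployment/api/v1.0/predictions \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}'
```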
In summary, while BentoML and Seldon Core share the common goal of serving machine learning models in production, they differ significantly in their handling, operations, inner structure, and performance characteristics. Choosing between the two depends on the specific requirements, scale, and complexity of the deployment environment, with Seldon Core offering the more robust solution for enterprise-grade deployments that require scalability, reliability, and advanced operational features.