MLOps: A Machine Learning Pipeline
Learn how to set up a complete MLOps pipeline for your ML projects.
Pipeline Configuration
1. Project Structure
project/
├── data/
│   ├── raw/
│   └── processed/
├── models/
│   ├── trained/
│   └── deployed/
├── src/
│   ├── data/
│   ├── features/
│   ├── models/
│   └── utils/
├── tests/
├── notebooks/
├── configs/
└── .github/
    └── workflows/
2. DVC Configuration
# dvc.yaml
stages:
  prepare:
    cmd: python src/data/prepare.py
    deps:
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python src/models/train.py
    deps:
      - data/processed
    outs:
      - models/trained
  evaluate:
    cmd: python src/models/evaluate.py
    deps:
      - models/trained
      - data/processed
Data Pipeline
1. Data Preparation
# src/data/prepare.py
import pandas as pd
from sklearn.model_selection import train_test_split

from src.features.engineering import engineer_features

def prepare_data():
    # Load the raw data
    data = pd.read_csv("data/raw/data.csv")

    # Clean (clean_data is assumed to be defined alongside this script)
    data = clean_data(data)

    # Feature engineering
    data = engineer_features(data)

    # Train/test split (fixed seed for reproducibility)
    train, test = train_test_split(data, test_size=0.2, random_state=42)

    # Save the processed splits
    train.to_csv("data/processed/train.csv", index=False)
    test.to_csv("data/processed/test.csv", index=False)
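The `clean_data` helper is not shown in the article; a minimal sketch of what it could look like, assuming duplicate rows and missing values are the main issues to handle:

```python
import pandas as pd

def clean_data(data: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: drop duplicate rows, then rows with missing values."""
    data = data.drop_duplicates()
    data = data.dropna()
    return data.reset_index(drop=True)
```

Real projects usually need more (type coercion, outlier handling, domain rules), but this is the shape of the function the pipeline expects.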
2. Feature Engineering
# src/features/engineering.py
from sklearn.preprocessing import StandardScaler

def engineer_features(data):
    # Create an interaction feature
    data["new_feature"] = data["feature1"] * data["feature2"]

    # Standardize the numeric features
    scaler = StandardScaler()
    data[["feature1", "feature2"]] = scaler.fit_transform(
        data[["feature1", "feature2"]]
    )
    return data
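Note that `fit_transform` above fits the scaler on whatever frame it receives. When the data is later split into train and test sets, it is safer to fit the scaler on the training split only, so that no test-set statistics leak into training. A minimal illustration (the values are made up for the example):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

# Fit on the training data only, then reuse the same statistics for test
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)  # uses the train mean/std, not its own
```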
Training Pipeline
1. MLflow Setup
# src/models/train.py
import mlflow
import mlflow.sklearn

def train_model():
    # Point MLflow at the tracking server
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("model_training")

    with mlflow.start_run():
        # Hyperparameters
        params = {
            "learning_rate": 0.1,
            "epochs": 100,
            "batch_size": 32,
        }

        # Training (train() is assumed to be defined elsewhere)
        model = train(params)

        # Log parameters, metrics, and the model artifact
        mlflow.log_params(params)
        mlflow.log_metrics({"accuracy": 0.95})  # placeholder; compute from validation data in practice
        mlflow.sklearn.log_model(model, "model")  # assuming a scikit-learn model
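The `train()` helper above is not shown. One possible sketch, assuming a scikit-learn classifier; the choice of `SGDClassifier` and the mapping of the pipeline's params onto its arguments are purely illustrative (here the data is passed explicitly, and `batch_size` has no direct equivalent):

```python
from sklearn.linear_model import SGDClassifier

def train(params, X, y):
    """Illustrative trainer: maps pipeline params onto an SGD classifier."""
    model = SGDClassifier(
        learning_rate="constant",
        eta0=params["learning_rate"],   # pipeline's learning_rate
        max_iter=params["epochs"],      # epochs roughly map to max_iter
        random_state=42,
    )
    model.fit(X, y)
    return model
```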
2. Model Evaluation
# src/models/evaluate.py
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate_model(model, test_data):
    # Separate features from the target before predicting
    X_test = test_data.drop(columns=["target"])
    y_test = test_data["target"]

    predictions = model.predict(X_test)

    metrics = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
    }
    return metrics
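To make these metrics visible to DVC (`dvc metrics show`, `dvc metrics diff`), the evaluation script can dump them to a JSON file; a minimal sketch using only the standard library (the `metrics.json` path is an assumption and would also need to be declared under `metrics:` in dvc.yaml):

```python
import json

def save_metrics(metrics, path="metrics.json"):
    """Write the metrics dict where DVC (and humans) can read it."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)

save_metrics({"accuracy": 0.95, "precision": 0.93, "recall": 0.91})
```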
Deployment and Monitoring
1. Docker Configuration
# Dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "src/app.py"]
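The Dockerfile's entrypoint, src/app.py, is not shown in the article. A minimal, dependency-free sketch of what it could be, using only the standard library; the `predict` placeholder and port 8000 are assumptions, and a real service would load the trained model and typically use a framework such as FastAPI:

```python
# src/app.py (illustrative): a tiny JSON prediction endpoint
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Placeholder for the real model; returns a dummy score."""
    return {"score": sum(features) / max(len(features), 1)}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON payload, run the (placeholder) model, answer with JSON
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# In the container, the server would be started with:
# HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```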
2. Monitoring with Prometheus
# src/monitoring/metrics.py
from prometheus_client import Counter, Histogram

# Prometheus metrics
PREDICTIONS = Counter("model_predictions_total", "Total number of predictions")
LATENCY = Histogram("model_latency_seconds", "Prediction latency in seconds")

def track_prediction(latency):
    PREDICTIONS.inc()
    LATENCY.observe(latency)
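`track_prediction` needs a latency measurement from somewhere; a small stdlib decorator can time each call and hand the duration to an observer such as `track_prediction` above. The observer is injected here so the sketch stays independent of prometheus_client:

```python
import time
from functools import wraps

def timed(observer):
    """Decorator factory: measure a call's duration and pass it to `observer`."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Observe even if the call raises
                observer(time.perf_counter() - start)
        return wrapper
    return decorate

# Usage: decorate the model's prediction entry point
# @timed(track_prediction)
# def serve_prediction(request): ...
```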
CI/CD with GitHub Actions
1. Training Pipeline
# .github/workflows/train.yml
name: Train Model

on:
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          # requirements.txt must include dvc for the step below
          pip install -r requirements.txt

      - name: Train model
        run: dvc repro train

      - name: Upload model
        uses: actions/upload-artifact@v4
        with:
          name: model
          path: models/trained
2. Deployment Pipeline
# .github/workflows/deploy.yml
name: Deploy Model

on:
  workflow_run:
    workflows: ["Train Model"]
    types:
      - completed

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download model
        uses: actions/download-artifact@v4
        with:
          name: model

      - name: Build and push Docker image
        run: |
          # Pushing requires a registry login and a fully qualified image tag
          docker build -t model-app .
          docker push model-app

      - name: Deploy to production
        run: |
          kubectl apply -f k8s/
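The deploy step assumes a k8s/ directory with Kubernetes manifests, none of which are shown. A minimal sketch of what k8s/deployment.yaml could contain; the names, replica count, image tag, and port are all illustrative:

```yaml
# k8s/deployment.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-app
  template:
    metadata:
      labels:
        app: model-app
    spec:
      containers:
        - name: model-app
          image: model-app:latest
          ports:
            - containerPort: 8000
```

A Service (and usually an Ingress) would sit in front of this Deployment to expose the prediction endpoint.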
Best Practices
- Versioning
  - Code
  - Data
  - Models
  - Configurations
- Monitoring
  - Metrics
  - Logs
  - Alerts
  - Performance
- Security
  - Secrets
  - Permissions
  - Validation
  - Audit
Conclusion
Key takeaways:
- An automated pipeline
- End-to-end versioning
- Effective monitoring
- Secure deployment
Recommendations:
- Automate your processes
- Track your metrics
- Document every change
- Keep training on MLOps
- Maintain quality
About InSkillCoach
Training and technology expert
A coach specializing in advanced technologies and AI, backed by GNeurone Inc.
Certifications:
- AWS Certified Solutions Architect – Professional
- Google Cloud certifications
- Microsoft Certified: DevOps Engineer Expert
- Certified Kubernetes Administrator (CKA)
- CompTIA Security+