January 19, 2026

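Introduction
An API gateway is a key component of a microservice architecture. We evolved ours from a simple Nginx reverse proxy into a full-featured API gateway system, and we learned a lot along the way.
At first, we used Nginx as a reverse proxy: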
    upstream backend {
        server app1:8080;
        server app2:8080;
        server app3:8080;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
        }
    }
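This worked fine while traffic was low. As the business grew, problems appeared:
- No unified authentication: every service had to implement its own login logic.
- No rate limiting: a single malicious user could bring down the whole system.
- No routing control: requests could not be routed dynamically based on their content.
- No observability: there was no way to trace a request across services.

We decided to build our own API gateway. Its core features include: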
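Unified authentication, implemented as a decorator on protected routes:

    from functools import wraps

    from flask import Flask, request

    app = Flask(__name__)


    def require_auth(f):
        """Reject the request with 401 unless it carries a valid token."""
        @wraps(f)
        def decorated(*args, **kwargs):
            token = request.headers.get("Authorization")
            # verify_token is the gateway's token check, defined elsewhere.
            if not token or not verify_token(token):
                return {"error": "Unauthorized"}, 401
            return f(*args, **kwargs)
        return decorated


    @app.route("/api/users")
    @require_auth
    def get_users():
        # proxy_to_backend forwards the request to a backend, defined elsewhere.
        return proxy_to_backend("user-service", request)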
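The article does not show `verify_token`, so the following is only a sketch of what such a check might look like, assuming a simple HMAC-signed token with an expiry time. The secret, the token layout, and the `issue_token` helper are assumptions made for illustration, not the gateway's actual scheme; a production gateway would more likely rely on an established format such as JWT.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret; in practice this would come from configuration.
SECRET_KEY = b"change-me"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(user_id: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Build a token: base64url("user_id:expiry") + "." + hex HMAC-SHA256."""
    current = time.time() if now is None else now
    payload = f"{user_id}:{int(current + ttl_seconds)}".encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def verify_token(header_value: str, now: Optional[float] = None) -> bool:
    """Return True only for a well-formed, correctly signed, unexpired token."""
    token = header_value.strip()
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    try:
        encoded_payload, signature = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded_payload.encode())
        _, expires_at = payload.decode().rsplit(":", 1)
        expiry = int(expires_at)
    except (ValueError, UnicodeDecodeError):
        return False  # malformed token
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(_sign(payload).encode(), signature.encode()):
        return False
    current = time.time() if now is None else now
    return current < expiry
```

The `now` parameter exists only so that expiry can be exercised without waiting; callers in the gateway would omit it.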
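Rate limiting:

    from ratelimit import RateLimitException, limits


    @limits(calls=100, period=60)  # at most 100 requests every 60 seconds
    def consume_request_slot():
        """Raise RateLimitException once the budget for the period is spent."""


    @app.before_request
    def rate_limit():
        # @limits keeps one counter per decorated function, so this budget is
        # shared by all clients rather than tracked per X-Client-ID.
        try:
            consume_request_slot()
        except RateLimitException:
            return {"error": "Too Many Requests"}, 429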
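Since the goal is to stop one abusive client without affecting the others, a per-client limit is usually what is wanted. Below is a minimal sliding-window sketch keyed by client ID, using only the standard library. The class name and the injectable `clock` are assumptions for illustration. State lives in one process, so with several gateway replicas the counters would have to move to a shared store.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `calls` requests per `period` seconds for each client."""

    def __init__(self, calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic):
        self.calls = calls
        self.period = period
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        with self._lock:
            hits = self._hits[client_id]
            # Drop timestamps that have fallen out of the window.
            while hits and now - hits[0] >= self.period:
                hits.popleft()
            if len(hits) >= self.calls:
                return False
            hits.append(now)
            return True
```

In a `before_request` hook, the gateway could call `limiter.allow(client_id)` with the value of the `X-Client-ID` header and return a 429 response when it yields `False`.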
Routing:

    ROUTES = {
        "/api/users": "user-service:8080",
        "/api/orders": "order-service:8080",
        "/api/products": "product-service:8080",
    }


    @app.route("/api/<path:path>", methods=["GET", "POST", "PUT", "DELETE"])
    def route_request(path):
        full_path = f"/api/{path}"
        # Exact match only: /api/users/42 does not match the /api/users entry.
        backend = ROUTES.get(full_path)
        if not backend:
            return {"error": "Not Found"}, 404
        return proxy_to_backend(backend, request)
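A route table keyed by exact path cannot serve nested paths such as `/api/users/42`. One common alternative is longest-prefix matching, sketched below. The `resolve_backend` helper is hypothetical and not part of the original gateway; the table reuses the entries shown above.

```python
from typing import Dict, Optional

ROUTES: Dict[str, str] = {
    "/api/users": "user-service:8080",
    "/api/orders": "order-service:8080",
    "/api/products": "product-service:8080",
}


def resolve_backend(path: str,
                    routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the backend whose prefix is the longest match for `path`."""
    best: Optional[str] = None
    for prefix in routes:
        # Match the prefix itself or anything below it, but not /api/usersx.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

A linear scan is fine for a handful of routes; a much larger table would probably call for a trie or a sorted prefix list.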
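Request tracing:

    import uuid

    from flask import g


    @app.before_request
    def add_trace_id():
        # Reuse the caller's trace ID, or start a new trace at the gateway.
        # Flask's request.headers is immutable, so the ID is kept on flask.g;
        # proxy_to_backend must send it to the backend as X-Trace-ID.
        g.trace_id = request.headers.get("X-Trace-ID") or str(uuid.uuid4())


    @app.after_request
    def log_request(response):
        # An earlier hook may have short-circuited before the ID was set.
        trace_id = g.get("trace_id", "-")
        print(f"Trace-ID: {trace_id}, "
              f"Method: {request.method}, "
              f"Path: {request.path}, "
              f"Status: {response.status_code}")
        return response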
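For the trace ID to reach backend services, it has to be attached to the outgoing request rather than the incoming one. The sketch below shows one way the forwarding step might assemble its headers. `build_outbound_headers` is a hypothetical helper, and the set of dropped headers is the commonly cited hop-by-hop list plus `Host`, which is an assumption about what this gateway should strip.

```python
from typing import Dict, Mapping

# Headers that describe a single connection and should not be forwarded.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def build_outbound_headers(incoming: Mapping[str, str],
                           trace_id: str) -> Dict[str, str]:
    """Copy end-to-end headers and attach the gateway's trace ID."""
    skipped = HOP_BY_HOP | {"host", "x-trace-id"}
    outbound = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in skipped
    }
    # Set the trace ID last so exactly one X-Trace-ID header is sent.
    outbound["X-Trace-ID"] = trace_id
    return outbound
```

The result can be passed as the `headers` argument of whatever HTTP client performs the forwarding.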
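After the first version of the gateway had been running for a while, it turned out to be a single point of failure. We reworked it for high availability: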
Multiple replicas, run as a Kubernetes Deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api-gateway
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api-gateway
      template:
        metadata:
          labels:
            app: api-gateway
        spec:
          containers:
          - name: gateway
            image: api-gateway:v1.0
            ports:
            - containerPort: 8080
            resources:
              requests:
                memory: "256Mi"
                cpu: "100m"
              limits:
                memory: "512Mi"
                cpu: "500m"
Load balancing across the replicas, through a Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: api-gateway
    spec:
      type: LoadBalancer
      selector:
        app: api-gateway
      ports:
      - protocol: TCP
        port: 80
        targetPort: 8080
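Retries with backoff for calls to backends:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry


    def create_session_with_retry():
        session = requests.Session()
        retry = Retry(
            total=3,              # at most 3 retries per request
            backoff_factor=0.5,   # exponential backoff between retries
            # Also retry on these statuses (idempotent methods only by default).
            status_forcelist=[500, 502, 503, 504],
        )
        adapter = HTTPAdapter(max_retries=retry)
        session.mount("http://", adapter)
        session.mount("https://", adapter)
        return session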
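Caching user profiles in memory:

    from functools import lru_cache


    @lru_cache(maxsize=1000)
    def get_user_profile(user_id):
        # lru_cache never expires entries, so a cached profile can go stale
        # until it is evicted or the process restarts.
        return proxy_to_backend("user-service", f"/users/{user_id}")


    @app.route("/api/users/<user_id>")
    def fetch_user(user_id):
        return get_user_profile(user_id)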
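Because `lru_cache` has no notion of expiry, cached backend responses can stay stale indefinitely. A cache with a time-to-live is one way around this. The `TTLCache` class below is a hypothetical, single-process sketch (not thread-safe as written) and is not something the original gateway is known to use.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache loader results for `ttl_seconds`, holding at most `maxsize` keys."""

    def __init__(self, ttl_seconds: float, maxsize: int = 1000,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.maxsize = maxsize
        self.clock = clock
        # key -> (expiry time, value); dicts preserve insertion order.
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable,
                    loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = loader(key)  # miss or expired: go to the backend
        if key not in self._entries and len(self._entries) >= self.maxsize:
            # Evict the entry that was inserted or refreshed longest ago.
            self._entries.pop(next(iter(self._entries)))
        self._entries.pop(key, None)
        self._entries[key] = (now + self.ttl, value)
        return value
```

A route handler could then call something like `cache.get_or_load(user_id, load_profile)`, where `load_profile` wraps the backend call.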
Fanning out backend calls concurrently:

    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=10)


    @app.route("/api/batch")
    def batch_request():
        # Submit every backend call first so they run in parallel.
        futures = [
            executor.submit(proxy_to_backend, service)
            for service in ["service1", "service2", "service3"]
        ]
        # result() blocks until the call finishes and re-raises its exception.
        results = [f.result() for f in futures]
        return {"results": results}
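In a fan-out like this, one slow or failing backend can fail the whole batch, because `result()` re-raises the worker's exception. The sketch below captures errors per service and bounds how long the gateway waits for each result. `fan_out` and its parameters are hypothetical names; note that a timeout here stops the wait but does not cancel the call that is already running.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable

executor = ThreadPoolExecutor(max_workers=10)


def fan_out(services: Iterable[str], call: Callable[[str], Any],
            timeout: float = 2.0) -> Dict[str, Dict[str, Any]]:
    """Call every service in parallel and report success or failure for each."""
    futures = {name: executor.submit(call, name) for name in services}
    results: Dict[str, Dict[str, Any]] = {}
    for name, future in futures.items():
        try:
            results[name] = {"ok": True, "data": future.result(timeout=timeout)}
        except Exception as exc:  # backend error or timeout
            results[name] = {
                "ok": False,
                "error": f"{type(exc).__name__}: {exc}",
            }
    return results
```

Returning a per-service status lets the caller decide whether a partial result is acceptable.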
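On an international team, the gateway's error logs and alert messages need to be available in multiple languages. We use Transync AI (同言翻译) to automatically translate the gateway's error messages and documentation, so that team members around the world can understand and resolve problems quickly.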
Monitoring with Prometheus metrics:

    import time

    from prometheus_client import Counter, Histogram, start_http_server

    # Request counter
    request_count = Counter(
        "gateway_requests_total", "Total requests", ["method", "path", "status"]
    )

    # Request latency histogram
    request_duration = Histogram(
        "gateway_request_duration_seconds", "Request duration"
    )


    @app.before_request
    def start_timer():
        request.start_time = time.time()


    @app.after_request
    def record_metrics(response):
        duration = time.time() - request.start_time
        request_count.labels(
            method=request.method,
            path=request.path,
            status=response.status_code,
        ).inc()
        request_duration.observe(duration)
        return response


    # Start the Prometheus metrics server
    start_http_server(8081)

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| QPS | 5000 | 20000 | +300% |
| P99 latency | 500 ms | 50 ms | -90% |
| Availability | 99.5% | 99.95% | +0.45 pp |
| Failure recovery time | 10 min | 30 s | -95% |
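Taking the API gateway from a simple reverse proxy to a full-featured system was full of challenges, but those challenges are what made our architecture more robust and efficient.
We hope this article gives you some ideas. If you are building an API gateway too, feel free to share your experience!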