Spring Cloud: Other Topics
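79. How do you monitor your services and handle alerting?
Answer:
We collect metrics with Prometheus and use Grafana for dashboards and alerting. The metrics cover response time, error rate, request volume, and system resource usage. When a metric crosses its threshold, the responsible people are notified by email, SMS, or Slack. We also use ELK for log analysis so that problems can be located quickly.
Analysis:
A monitoring system is the team's "thousand-mile eyes": it lets us spot service anomalies right away, before small problems grow into major incidents. When designing monitoring, the key is not to focus only on technical metrics but also to watch business health, such as order success rate and payment latency. Our team initially monitored only CPU and memory, found that business anomalies often went unreported, and then gradually added business-level monitoring.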
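To make "business health" concrete, here is a minimal plain-Java sketch of an order success rate metric. The class `OrderSuccessRate` and its method names are hypothetical and not part of the original setup; in a Spring Boot service the same counts would more likely be Micrometer counters scraped by Prometheus.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical business metric: share of order attempts that succeeded. */
class OrderSuccessRate {
    private final LongAdder succeeded = new LongAdder();
    private final LongAdder failed = new LongAdder();

    void recordSuccess() { succeeded.increment(); }

    void recordFailure() { failed.increment(); }

    /** Returns the success ratio in [0, 1]; 1.0 while nothing has been recorded. */
    double ratio() {
        long ok = succeeded.sum();
        long total = ok + failed.sum();
        return total == 0 ? 1.0 : (double) ok / total;
    }
}

class OrderSuccessRateDemo {
    public static void main(String[] args) {
        OrderSuccessRate rate = new OrderSuccessRate();
        for (int i = 0; i < 9; i++) {
            rate.recordSuccess();
        }
        rate.recordFailure();
        System.out.println("order success rate = " + rate.ratio());
    }
}
```

A drop in this ratio can signal a problem even while CPU and memory look healthy, which is the gap described above.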
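For alerting, the lesson we learned is "better too few alerts than noisy ones." Once there are too many alerts, everyone goes numb and the real failures get ignored. We review alert rules regularly, merge duplicates, and tune thresholds. We also tie alerts to automated operations such as automatic service restarts and automatic scale-out.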
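One common way to cut alert noise is to fire only when a condition has held for a sustained period, which is the idea behind the `for:` clause in Prometheus alert rules. The sketch below illustrates that logic in plain Java; `SustainedThresholdAlert` is a hypothetical name, and this is not how Prometheus implements it internally.

```java
import java.time.Duration;
import java.time.Instant;

/** Fires only after the value has stayed above the threshold for a minimum duration. */
class SustainedThresholdAlert {
    private final double threshold;
    private final Duration holdFor;
    private Instant breachedSince; // null while the value is within bounds

    SustainedThresholdAlert(double threshold, Duration holdFor) {
        this.threshold = threshold;
        this.holdFor = holdFor;
    }

    /** Feeds one sample; returns true when the alert should be firing. */
    boolean observe(double value, Instant at) {
        if (value <= threshold) {
            breachedSince = null; // condition cleared, so the timer resets
            return false;
        }
        if (breachedSince == null) {
            breachedSince = at; // first sample of a new breach
        }
        return Duration.between(breachedSince, at).compareTo(holdFor) >= 0;
    }
}

class SustainedThresholdAlertDemo {
    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        SustainedThresholdAlert alert = new SustainedThresholdAlert(2.0, Duration.ofMinutes(5));
        System.out.println(alert.observe(3.0, t0));                  // breach just started
        System.out.println(alert.observe(3.0, t0.plusSeconds(300))); // breach held for 5 minutes
    }
}
```

A short spike that clears before the hold time never fires, so only persistent problems reach the on-call engineer.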
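Building a monitoring system is a continuously evolving process: as the business grows and the team matures, what is monitored and how it is monitored keep changing. The end goal is to let the team detect and locate problems in the shortest possible time and keep the business running smoothly.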
Monitoring configuration example:
# Prometheus configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alert_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
scrape_configs:
  - job_name: 'spring-boot-apps'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 10s  # overrides the global 15s for this job
    static_configs:
      - targets: ['app1:8080', 'app2:8080', 'app3:8080']
Alert rule configuration:
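# Alert rules
groups:
  - name: spring-boot-alerts
    rules:
      - alert: HighResponseTime
        expr: http_server_requests_seconds_max > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "Service {{ $labels.instance }} has high response time"
      - alert: HighErrorRate
        # Share of requests answered with a 5xx status over the last 5 minutes.
        # Spring Boot's default Micrometer metrics expose no errors_total series,
        # so the ratio is derived from the status label of the request counter.
        expr: |
          sum by (instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum by (instance) (rate(http_server_requests_seconds_count[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Service {{ $labels.instance }} has high error rate"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "Service {{ $labels.instance }} is not responding"
Custom metrics collection: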
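import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class CustomMetricsCollector {
    private final MeterRegistry meterRegistry;
    private final Counter orderCounter;
    private final Timer orderProcessingTimer;

    public CustomMetricsCollector(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        this.orderCounter = Counter.builder("orders.total")
            .description("Total number of orders")
            .register(meterRegistry);
        this.orderProcessingTimer = Timer.builder("orders.processing.time")
            .description("Order processing time")
            .register(meterRegistry);
        // Business gauge: registered once. Micrometer calls the function on every
        // scrape, so no @Scheduled refresh is needed.
        Gauge.builder("users.active", this, CustomMetricsCollector::getActiveUsersCount)
            .description("Active users count")
            .register(meterRegistry);
    }

    @EventListener
    public void handleOrderCreated(OrderCreatedEvent event) {
        orderCounter.increment();
        Timer.Sample sample = Timer.start(meterRegistry);
        try {
            // Order processing logic
            processOrder(event.getOrder());
        } finally {
            // Record the duration even when processing throws
            sample.stop(orderProcessingTimer);
        }
    }

    // processOrder(...) and getActiveUsersCount() are application-specific and omitted here.
}
80. How do you collect logs from your services?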
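Answer:
We use Logback as the logging framework. Filebeat collects the log files and ships them to Elasticsearch for storage and indexing, and Kibana provides visualization and analysis. The stack supports full-text search, complex queries, and aggregations, so problems can be located quickly.
Analysis:
The logging system is the "black box" of a distributed architecture; without it, troubleshooting is like the blind men feeling an elephant. The hard part of log collection is that with many services and many nodes, log formats and contents easily diverge. We standardized the log format from the start: every service emits structured JSON, which makes later search and analysis straightforward.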
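To show what "one structured JSON object per log line" looks like, here is a minimal plain-Java sketch. `JsonLogLine` is a hypothetical helper written only for illustration; in practice an encoder such as logstash-logback-encoder produces these lines, and the field names below (`@timestamp`, `level`, `logger_name`, `message`) are assumed to match that encoder's defaults.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that renders one log event as a single JSON line. */
class JsonLogLine {
    static String format(Instant timestamp, String level, String logger,
                         String message, Map<String, String> context) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("@timestamp", timestamp.toString());
        fields.put("level", level);
        fields.put("logger_name", logger);
        fields.put("message", message);
        fields.putAll(context); // extra fields such as service or traceId

        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    /** Escapes the characters that would break a JSON string literal. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Because newlines inside the message are escaped, every event stays on one physical line, which keeps line-based collectors simple.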
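Technically, log collection has to be both timely and reliable: logs must not be lost, and they must not be delayed for long. We run Filebeat as a sidecar collector, with Elasticsearch for storage and indexing and Kibana for visualization. We ran into full disks and index bloat, and later added log archiving and hot/cold tiered storage, which saves money while keeping history searchable.
Logs are not only for troubleshooting; they also support security auditing, user behavior analysis, and even business decisions. For example, log analysis showed us that certain endpoints were being hit by fraudulent bulk orders, and we put risk controls in place in time. The value of a logging system goes well beyond "look something up when there is a problem."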
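The abuse-detection use case above can be sketched as a simple count over parsed log records. The names `AccessLog` and `AbuseDetector`, and the fixed per-client threshold, are assumptions made for illustration; a production setup would more likely run this as an Elasticsearch aggregation or a streaming job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parsed access-log record. */
final class AccessLog {
    final String clientId;
    final String endpoint;

    AccessLog(String clientId, String endpoint) {
        this.clientId = clientId;
        this.endpoint = endpoint;
    }
}

class AbuseDetector {
    /** Returns "clientId endpoint" keys that appear more than maxCalls times. */
    static List<String> suspicious(List<AccessLog> logs, int maxCalls) {
        Map<String, Integer> calls = new HashMap<>();
        for (AccessLog log : logs) {
            calls.merge(log.clientId + " " + log.endpoint, 1, Integer::sum);
        }
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> e : calls.entrySet()) {
            if (e.getValue() > maxCalls) {
                flagged.add(e.getKey());
            }
        }
        Collections.sort(flagged); // stable, readable output order
        return flagged;
    }
}
```

The flagged keys can then feed a risk-control rule, for example rate limiting that client on that endpoint.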
Logging configuration:
<!-- Logback configuration -->
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
      <providers>
        <timestamp/>
        <logLevel/>
        <loggerName/>
        <message/>
        <mdc/>
        <stackTrace/>
      </providers>
    </encoder>
  </appender>
  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/application.log</file>
    <!-- %i together with maxFileSize requires the size-and-time based policy -->
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
      <fileNamePattern>logs/application.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
      <maxFileSize>100MB</maxFileSize>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
      <providers>
        <timestamp/>
        <logLevel/>
        <loggerName/>
        <message/>
        <mdc/>
        <stackTrace/>
      </providers>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
    <appender-ref ref="FILE"/>
  </root>
</configuration>
Filebeat configuration:
# Filebeat configuration
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/spring-boot/*.log
    fields:
      service: user-service
      environment: production
    fields_under_root: true
    # The services write one JSON object per line (stack traces are embedded in a
    # JSON field), so decode JSON here. A date-prefixed multiline pattern would
    # wrongly merge JSON lines, because none of them starts with a date.
    json.keys_under_root: true
    json.add_error_key: true
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "spring-boot-logs-%{+yyyy.MM.dd}"
setup.template.name: "spring-boot-logs"
setup.template.pattern: "spring-boot-logs-*"
setup.ilm.enabled: false
Log analysis query:
// Elasticsearch query example
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "level": "ERROR"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h",
              "lte": "now"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "service": "user-service"
          }
        }
      ]
    }
  },
  "aggs": {
    "error_count": {
      "terms": {
        "field": "logger_name.keyword",
        "size": 10
      }
    },
    "error_timeline": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m"
      }
    }
  }
}
81. What is your understanding of DDD?
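Answer:
DDD stands for Domain-Driven Design; put simply, it keeps the code consistent with the business. Its core idea is to understand the business domain deeply and let that understanding guide the software design, rather than forcing the business to adapt to the technology.
DDD has several important concepts. The domain model is the core and reflects the business rules. The aggregate root is the root of an aggregate and is responsible for maintaining its consistency. Entities have a unique identity, while value objects do not. Domain services handle business logic that spans entities. All of these concepts exist to express the business better and bring the code closer to its essence.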
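The entity versus value object distinction comes down to how equality is defined. Here is a minimal plain-Java sketch; `Address` and `Customer` are hypothetical types chosen only for illustration.

```java
import java.util.Objects;

/** Value object: no identity, immutable, equal when all attributes are equal. */
final class Address {
    private final String city;
    private final String street;

    Address(String city, String street) {
        this.city = city;
        this.street = street;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Address)) return false;
        Address other = (Address) o;
        return city.equals(other.city) && street.equals(other.street);
    }

    @Override
    public int hashCode() {
        return Objects.hash(city, street);
    }
}

/** Entity: has a unique identifier, and equality depends on that identifier only. */
final class Customer {
    private final long id;
    private Address address; // may change over time without changing identity

    Customer(long id, Address address) {
        this.id = id;
        this.address = address;
    }

    void moveTo(Address newAddress) {
        this.address = newAddress;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Customer && ((Customer) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```

Two `Address` instances with the same city and street are interchangeable, while a `Customer` stays the same customer after moving to a new address.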
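Analysis:
DDD is not just a technical architecture but a way of thinking: it asks us to step out of the technical perspective and truly understand the business. An interviewer who asks this question wants to see whether the candidate thinks in business terms and can use a domain model to guide technical design.
The core value of DDD lies in the "ubiquitous language," which lets business people and engineers speak the same language. In an order system, for example, concepts such as "order," "product," and "inventory" must mean the same thing at the business level and in the code. Aggregate root design reflects business boundaries: the order aggregate contains order items, the shipping address, and so on, and changes to this data either all succeed or all fail together.
In real projects, DDD has to be adopted step by step. It may start with simple entity design, and as the understanding of the business deepens, value objects, domain services, domain events, and other concepts are introduced gradually. When splitting microservices, the bounded context concept is especially useful: it helps find suitable service boundaries and avoids excessive coupling between services.
The difficulty of DDD is not understanding the concepts but putting them into practice. The team needs a solid understanding of the business, product managers and engineers need to work closely together, and business modeling and code refactoring have to be continuous. Once it lands successfully, though, code readability and maintainability improve dramatically.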
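Domain model design:
// Aggregate root example.
// AggregateRoot, OrderId, CustomerInfo, OrderItem, OrderStatus, DomainException,
// DomainEvents, and the Money.ZERO constant are project types assumed to exist elsewhere.
// JPA annotations come from jakarta.persistence (javax.persistence on older stacks).
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Entity
@Table(name = "orders")
public class Order implements AggregateRoot<OrderId> {
    @EmbeddedId
    private OrderId id;

    @Embedded
    private CustomerInfo customerInfo;

    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    @JoinColumn(name = "order_id")
    private List<OrderItem> items = new ArrayList<>(); // initialized so addItem never hits null

    @Enumerated(EnumType.STRING)
    private OrderStatus status;

    @Embedded
    private Money totalAmount;

    // Domain methods
    public void addItem(Product product, int quantity) {
        OrderItem item = new OrderItem(product, quantity);
        items.add(item);
        recalculateTotal();
    }

    public void confirm() {
        if (status != OrderStatus.CREATED) {
            throw new DomainException("Order cannot be confirmed");
        }
        status = OrderStatus.CONFIRMED;
        DomainEvents.publish(new OrderConfirmedEvent(this));
    }

    public void cancel() {
        if (status == OrderStatus.SHIPPED) {
            throw new DomainException("Shipped order cannot be cancelled");
        }
        status = OrderStatus.CANCELLED;
        DomainEvents.publish(new OrderCancelledEvent(this));
    }

    // Read-only view: items can only be changed through the aggregate root
    public List<OrderItem> getItems() {
        return Collections.unmodifiableList(items);
    }

    public Money getTotalAmount() {
        return totalAmount;
    }

    private void recalculateTotal() {
        totalAmount = items.stream()
            .map(OrderItem::getSubtotal)
            .reduce(Money.ZERO, Money::add);
    }
}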
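// Value object example
import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

@Embeddable
public class Money {
    private BigDecimal amount;
    private Currency currency;

    protected Money() { } // no-arg constructor required by JPA

    public Money(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }

    public Money add(Money other) {
        if (!this.currency.equals(other.currency)) {
            throw new DomainException("Cannot add different currencies");
        }
        return new Money(this.amount.add(other.amount), this.currency);
    }

    public Money multiply(int quantity) {
        return new Money(this.amount.multiply(BigDecimal.valueOf(quantity)), this.currency);
    }

    // Value objects compare by value; compareTo ignores scale (1.0 equals 1.00)
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money other = (Money) o;
        return amount.compareTo(other.amount) == 0 && currency.equals(other.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(amount.stripTrailingZeros(), currency);
    }
}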
// Domain service example
import org.springframework.stereotype.Service;

@Service
public class OrderDomainService {
    private final InventoryService inventoryService;
    private final PaymentService paymentService;

    public OrderDomainService(InventoryService inventoryService,
                              PaymentService paymentService) {
        this.inventoryService = inventoryService;
        this.paymentService = paymentService;
    }

    public void processOrder(Order order) {
        // Check inventory
        for (OrderItem item : order.getItems()) {
            if (!inventoryService.isAvailable(item.getProductId(), item.getQuantity())) {
                throw new DomainException("Insufficient inventory");
            }
        }
        // Process payment
        PaymentResult result = paymentService.processPayment(order.getTotalAmount());
        if (!result.isSuccess()) {
            throw new DomainException("Payment failed: " + result.getErrorMessage());
        }
        // Update inventory by reserving the ordered quantities
        for (OrderItem item : order.getItems()) {
            inventoryService.reserve(item.getProductId(), item.getQuantity());
        }
        order.confirm();
    }
}
DDD in microservices:
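In a microservice architecture, each microservice corresponds to one bounded context, which makes service boundaries explicit. We group related entities and value objects into aggregates to ensure data consistency. Domain events provide loosely coupled communication between services, and domain services handle business logic that spans aggregates, avoiding direct dependencies between aggregates. We also use the repository pattern to encapsulate data access logic and keep the domain model clean.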
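The combination of a repository and domain events can be sketched in plain Java as follows. Every type here (`OrderConfirmed`, `EventBus`, `OrderRepository`, `InMemoryOrderRepository`, `OrderService`) is hypothetical, and the in-process bus only stands in for what would usually be a message broker between services.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical domain event raised when an order is confirmed. */
final class OrderConfirmed {
    final String orderId;

    OrderConfirmed(String orderId) {
        this.orderId = orderId;
    }
}

/** Minimal in-process event bus standing in for a message broker. */
final class EventBus {
    private final List<Consumer<OrderConfirmed>> subscribers = new ArrayList<>();

    void subscribe(Consumer<OrderConfirmed> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(OrderConfirmed event) {
        subscribers.forEach(s -> s.accept(event));
    }
}

/** Repository abstraction: the domain layer depends only on this interface. */
interface OrderRepository {
    void save(String orderId, String status);

    Optional<String> findStatus(String orderId);
}

/** In-memory implementation; a real one would wrap a database. */
final class InMemoryOrderRepository implements OrderRepository {
    private final Map<String, String> statuses = new HashMap<>();

    @Override
    public void save(String orderId, String status) {
        statuses.put(orderId, status);
    }

    @Override
    public Optional<String> findStatus(String orderId) {
        return Optional.ofNullable(statuses.get(orderId));
    }
}

/** Order context: persists the state change, then announces it. */
final class OrderService {
    private final OrderRepository repository;
    private final EventBus eventBus;

    OrderService(OrderRepository repository, EventBus eventBus) {
        this.repository = repository;
        this.eventBus = eventBus;
    }

    void confirm(String orderId) {
        repository.save(orderId, "CONFIRMED");
        eventBus.publish(new OrderConfirmed(orderId));
    }
}
```

An inventory context would subscribe with something like `bus.subscribe(e -> reserveStock(e.orderId))`, where `reserveStock` is again a hypothetical handler. The order context never calls inventory code directly, which is the loose coupling described above.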