
[FEAT] Introduce zero-downtime deployment #352

Open
unifolio0 wants to merge 2 commits into develop from feat/#351

Conversation

@unifolio0
Contributor

Closes #351

Work done

Screenshots

Notes

@unifolio0 unifolio0 self-assigned this Apr 12, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR introduces a zero-downtime deployment setup for stable service operation and builds a Slack notification system for rapid incident response. A Traefik-based Blue-Green deployment structure minimizes service interruption during deploys, and a custom logging filter and layout provide a readable error-reporting environment.

Highlights

  • Zero-downtime deployment: Built a Blue-Green deployment environment using Traefik, and added deploy and rollback scripts to automate it.
  • Slack error alert integration: Added logback-slack-appender so that server errors trigger real-time Slack notifications.
  • Log deduplication: Implemented RateLimitingTurboFilter with a 1-minute cooldown so that a repeatedly occurring error does not flood Slack with alerts.


Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/cd-api-dev.yml
    • .github/workflows/cd-api-prod.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


Blue and Green in dance,
Traefik guides the traffic flow,
Errors caught in Slack,
Stability starts to grow,
Deployment without a woe.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Apr 12, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Error notifications via Slack
    • Production deployments switched to a zero-downtime (Blue-Green) strategy
    • Rate limiting added to prevent duplicate notifications for repeated error logs
    • Post-deployment health check added to verify service status
  • Improvements

    • Deployment reliability improved by optimizing the production infrastructure configuration

Overview

This change introduces zero-downtime (blue-green) deployment infrastructure and Slack integration for error logging. It adds Traefik-based routing, deploy/rollback scripts, Slack notifications, and log rate limiting.

Walkthrough

A blue-green deployment architecture for zero-downtime deploys is introduced, along with Traefik routing, deploy/rollback scripts, Slack-based error notifications, and a log rate-limiting filter. Slack webhook environment variables are configured for the production and development environments.

Changes

Cohort / File(s) | Summary
Slack integration infrastructure
build.gradle, src/main/java/.../RateLimitingTurboFilter.java, src/main/java/.../SlackErrorLayout.java, src/main/resources/logback-spring.xml
Adds the Logback Slack appender dependency, implements a TurboFilter to rate-limit error logs, adds a custom layout class for Slack message formatting, and configures the async Slack appender with per-profile activation.
Slack configuration
src/main/resources/application-dev.yml, src/main/resources/application-prod.yml
Configures the Slack webhook URL, channel, and environment identifier per dev/prod environment.
Blue-green deployment scripts
docker/prod/api/deploy.sh, docker/prod/api/rollback.sh
Adds Bash scripts for zero-downtime deployment and rollback, with a health-check loop, a wait for Traefik routing to stabilize, and graceful shutdown handling.
Production infrastructure
docker/prod/api/docker-compose-prod.yml, docker/prod/api/traefik/traefik.yml
Removes Nginx in favor of Traefik-based routing, splits the API into blue/green services, adds health checks, and creates the Traefik configuration file.
Development and CI/CD configuration
docker/dev/docker-compose-dev.yml, .github/workflows/cd-api-dev.yml, .github/workflows/cd-api-prod.yml
Adds SLACK_WEBHOOK_URL_DEV to the dev environment and the corresponding env variable to the cd-api-dev workflow, and fully reworks the cd-api-prod workflow to run the deploy script and verify health checks.
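The deploy script's health-check loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual deploy.sh: the timeout values, the function names, and the health_status stub (standing in for `docker inspect --format='{{.State.Health.Status}}'`) are all assumptions made here so the sketch runs without Docker.

```shell
#!/bin/sh
# Sketch of a blue-green health-check loop (assumed names/values).
# The real script would read container health via:
#   docker inspect --format='{{.State.Health.Status}}' "$container"
# A stub stands in for that call so the sketch runs anywhere.

HEALTH_TIMEOUT=60
HEALTH_INTERVAL=1   # kept short so the sketch finishes quickly

CHECKS=0
health_status() {
    # Stub: report "starting" twice, then "healthy".
    CHECKS=$((CHECKS + 1))
    if [ "$CHECKS" -lt 3 ]; then STATUS="starting"; else STATUS="healthy"; fi
}

wait_healthy() {
    elapsed=0
    while [ "$elapsed" -lt "$HEALTH_TIMEOUT" ]; do
        health_status
        if [ "$STATUS" = "healthy" ]; then
            echo "container healthy after ${elapsed}s"
            return 0
        fi
        sleep "$HEALTH_INTERVAL"
        elapsed=$((elapsed + HEALTH_INTERVAL))
    done
    echo "health check timed out after ${HEALTH_TIMEOUT}s" >&2
    return 1
}

wait_healthy
HEALTHY=$?
```

The polling-until-timeout shape is the important part: the loop either returns success as soon as the container reports healthy, or fails after HEALTH_TIMEOUT so the caller can roll back.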

Sequence diagrams

sequenceDiagram
    participant GHA as GitHub Actions
    participant Docker as Docker Daemon
    participant Traefik
    participant API_Blue as API (Blue)
    participant API_Green as API (Green)
    
    GHA->>Docker: Pull compose files & deploy.sh
    GHA->>Docker: Check active container (blue/green)
    alt Neither active
        GHA->>Docker: Start Traefik
    else One active
        GHA->>Docker: Traefik already running
    end
    
    GHA->>Docker: Start target API service<br/>(blue/green profile)
    Docker->>API_Blue: Container up
    
    loop Health Check
        Docker->>API_Blue: Poll /actuator/health
        API_Blue-->>Docker: 200 OK (healthy)
    end
    
    GHA->>Traefik: Wait for routing stabilization
    Traefik->>API_Blue: Route requests
    
    GHA->>Docker: Stop previous container<br/>(graceful + force remove)
    Docker->>API_Green: Stop & remove
    
    GHA->>Docker: curl health verification
    Docker-->>GHA: Success
sequenceDiagram
    participant App as Application
    participant Logback as Logback
    participant RateFilter as RateLimitingTurboFilter
    participant Appender as SlackAppender
    participant Slack
    
    App->>Logback: Log ERROR event
    Logback->>RateFilter: decide(ERROR)
    
    alt Within cooldown window
        RateFilter-->>Logback: DENY
        Logback-->>App: Log suppressed
    else Outside cooldown window
        RateFilter-->>Logback: NEUTRAL (allow)
        Logback->>Appender: ASYNC_SLACK appender
        Appender->>Appender: Format with SlackErrorLayout
        Appender->>Slack: POST webhook request
        Slack-->>Appender: 200 OK
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • nak-honest
  • kargowild

🐰 Blue and green take turns in dance,
Zero-downtime deploys let the service flow,
Slack's voice announces each error,
Logs are filtered wisely,
Farewell at last, operational nightmares! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The PR title '[FEAT] 무중단 배포 도입' accurately reflects the core changes: blue-green deployment scripts, Traefik routing, and zero-downtime deployment automation.
Description check | ✅ Passed | The PR description contains only the default template with no concrete content, but the '#351' issue reference ties it to the intent of the changes.
Linked Issues check | ✅ Passed | Linked issue #351 is likewise a bare template without concrete requirements or acceptance criteria, but the commit messages and changes clearly implement zero-downtime deployment and Slack error notifications.
Out of Scope Changes check | ✅ Passed | All changes relate directly to the zero-downtime deployment infrastructure (blue-green, Traefik, deploy scripts) or operational features (Slack notifications), so everything is in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements a Blue-Green deployment strategy using Traefik and Docker Compose, supported by new deployment and rollback scripts. It also integrates a Slack notification system for error logging, including a custom layout and a rate-limiting filter to manage alert frequency. Feedback focuses on enhancing code quality and maintainability, specifically recommending the use of thread-safe DateTimeFormatter for date handling, utilizing YAML anchors to eliminate configuration redundancy, and removing sudo from deployment scripts to improve portability. Additionally, it is suggested to use variables for timeout values in scripts to ensure consistency.

public class SlackErrorLayout extends LayoutBase<ILoggingEvent> {

private static final int MAX_STACK_TRACE_LENGTH = 2000;
private static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

high

SimpleDateFormat is not thread-safe. Layout.doLayout can be called concurrently from multiple threads, so the date format may be corrupted or an exception may be thrown. On Java 8+, the thread-safe java.time.format.DateTimeFormatter is recommended instead.

Suggested change
private static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private static final java.time.format.DateTimeFormatter DATE_FORMAT = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(java.time.ZoneId.systemDefault());

sb.append(":label: *Request ID:* `").append(requestId).append("`\n");
}

sb.append(":clock3: *Time:* ").append(DATE_FORMAT.format(new Date(event.getTimeStamp()))).append("\n");

high

Since the code now uses DateTimeFormatter, the date-formatting logic must be updated accordingly.

Suggested change
sb.append(":clock3: *Time:* ").append(DATE_FORMAT.format(new Date(event.getTimeStamp()))).append("\n");
sb.append(":clock3: *Time:* ").append(DATE_FORMAT.format(java.time.Instant.ofEpochMilli(event.getTimeStamp()))).append("\n");

container_name: nginx
ports:
- "80:80"
kokomen-api-green:

medium

The kokomen-api-green service configuration is nearly identical to kokomen-api-blue. Defining the shared settings with YAML anchors (&) and aliases (*) would remove the duplication and improve maintainability.

# Step 1: Traefik이 실행 중인지 확인
if ! docker ps -q -f name=traefik | grep -q .; then
log_info "Step 0: Traefik 시작"
sudo -E docker compose -f $COMPOSE_FILE up -d traefik

medium

Using sudo inside the script ties it to the privilege setup of the execution environment and hurts portability. The usual recommendation is to add the deployment user to the docker group so that commands can run without sudo.

Suggested change
sudo -E docker compose -f $COMPOSE_FILE up -d traefik
docker compose -f $COMPOSE_FILE up -d traefik
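If dropping sudo outright is not an option, the privilege model can at least be made uniform by choosing the docker invocation once and reusing it everywhere. The sketch below is an illustration, not code from the PR; the DOCKER_CMD variable name is an assumption.

```shell
#!/bin/sh
# Detect once whether plain `docker` works; fall back to `sudo -E docker`.
# Every later call then uses $DOCKER_CMD, so all steps share one privilege model.
if docker ps >/dev/null 2>&1; then
    DOCKER_CMD="docker"
else
    DOCKER_CMD="sudo -E docker"
fi
echo "using: $DOCKER_CMD"

# Example usages (commented out, since this sketch may run without Docker):
# $DOCKER_CMD compose -f "$COMPOSE_FILE" up -d traefik
# $DOCKER_CMD ps -q -f name=kokomen-api-blue
```

With this in place, no step silently assumes a different privilege level than the one that started the containers.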


# 현재 활성 컨테이너 종료
log_info "현재 컨테이너 종료: kokomen-api-$CURRENT"
docker stop -t 65 "kokomen-api-$CURRENT" || true

medium

The wait time (65) used when stopping the existing container is hard-coded. For consistency with deploy.sh, it should be defined as a variable at the top of the script.
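A minimal sketch of that suggestion, hoisting the hard-coded stop timeout into a named variable. The variable name GRACEFUL_SHUTDOWN_WAIT mirrors the one the review mentions for deploy.sh, but this snippet itself is an assumption, not the PR's rollback.sh (the real docker call is left as a comment so the sketch runs without Docker):

```shell
#!/bin/sh
# Define the graceful-shutdown wait once at the top of the script
# instead of hard-coding 65 at the call site.
GRACEFUL_SHUTDOWN_WAIT=65

stop_container() {
    name="$1"
    # Real script would run: docker stop -t "$GRACEFUL_SHUTDOWN_WAIT" "$name" || true
    echo "would stop $name with a ${GRACEFUL_SHUTDOWN_WAIT}s grace period"
}

stop_container "kokomen-api-blue"
```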

@github-actions

Test Results

 50 files   50 suites   1m 16s ⏱️
279 tests 278 ✅ 1 💤 0 ❌
281 runs  280 ✅ 1 💤 0 ❌

Results for commit d1c8d66.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/cd-api-prod.yml (1)

53-62: ⚠️ Potential issue | 🟠 Major

The deploy job re-fetches the latest state of main instead of the commit that was built.

Lines 59-62 check out/pull main's HEAD at deploy time, so the deploy may combine the image built by the build job with deploy.sh/docker-compose-prod.yml/traefik.yml from a different revision. The blue-green switch logic assumes the image and the deploy scripts come from the same commit, so pinning to ${{ github.sha }} is safer.

Suggested fix
-          git fetch origin main
-          git checkout main
+          git fetch origin "${GITHUB_SHA}"
+          git checkout --detach "${GITHUB_SHA}"
           git sparse-checkout set docker/prod/api
-          git pull origin main
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-api-prod.yml around lines 53 - 62, The deployment step
"Pull docker compose and deployment files" currently checks out and pulls the
latest main, causing deploy to potentially use a different revision than the
build; change it to fetch and checkout the exact build commit (use the GitHub
Actions commit variable ${{ github.sha }}) instead of pulling main so the
deployed docker/prod/api files (docker-compose-prod.yml, deploy.sh, traefik.yml)
match the built image; specifically, replace the git checkout/pull of main with
a fetch of the specific commit and checkout that commit (or use
FETCH_HEAD/$GITHUB_SHA) after sparse-checkout set, ensuring the workflow
variable ${{ github.sha }} is used to lock the revision.
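The commit-pinning idea above can be sketched as follows. This is a runnable simulation under stated assumptions, not the workflow itself: the local repos stand in for the remote, and GITHUB_SHA is set manually here where GitHub Actions would provide it.

```shell
#!/bin/sh
# Sketch: pin the deploy-side checkout to the exact built commit rather than
# main's current HEAD. Local repos simulate the situation so the sketch runs.
set -e

demo=$(mktemp -d)

# "origin": main moves on after the built commit.
git init -q "$demo/origin-repo"
git -C "$demo/origin-repo" -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "built commit"
BUILT_SHA=$(git -C "$demo/origin-repo" rev-parse HEAD)
git -C "$demo/origin-repo" -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "newer commit on main"

# Deploy side: detach at the built commit instead of pulling main.
git clone -q "$demo/origin-repo" "$demo/deploy-checkout"
GITHUB_SHA="$BUILT_SHA"
# In CI an explicit `git fetch origin "$GITHUB_SHA"` may be needed first;
# the clone here already contains the commit, so a detached checkout suffices.
git -C "$demo/deploy-checkout" checkout -q --detach "$GITHUB_SHA"

DEPLOYED_SHA=$(git -C "$demo/deploy-checkout" rev-parse HEAD)
echo "deployed at: $DEPLOYED_SHA"
```

The deployed revision now matches the built commit even though main has since moved on, which is exactly the property the blue-green switch relies on.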
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cd-api-prod.yml:
- Around line 93-96: The "Verify deployment" step currently swallows failures by
using `curl -sf ... || echo ...`; change it so a failed health check causes the
job to fail by replacing the swallow logic with an explicit non-zero exit path
(e.g., run the same curl and on failure print the message and exit 1). Locate
the step named "Verify deployment" and the line containing `curl -sf
http://localhost:80/actuator/health || echo "Health check endpoint not
responding"` and update it to print the error message and then exit non-zero so
the workflow fails and can trigger rollback/alerting.

In `@build.gradle`:
- Line 57: Replace the incompatible Logback Slack appender dependency
'com.github.maricn:logback-slack-appender:1.6.1' with the maintained fork
'com.cyfrania:logback-slack-appender:1.2' in the Gradle dependencies so the app
uses a Logback 1.5.x–compatible, actively maintained artifact; update the
dependency declaration where the current implementation line contains
'com.github.maricn:logback-slack-appender:1.6.1' to use
'com.cyfrania:logback-slack-appender:1.2' instead.

In `@docker/prod/api/deploy.sh`:
- Around line 13-21: The script mixes plain docker calls with sudo -E docker
compose causing failures on hosts where docker requires sudo; define a single
command variable (e.g., DOCKER_CMD) that detects whether sudo is needed (try
running `docker ps` and fall back to `sudo -E docker`) and use that variable
everywhere instead of raw `docker` or `sudo -E docker compose`, and update all
affected helpers (get_active, wait_healthy, the old-container cleanup logic and
any docker compose invocations) to invoke $DOCKER_CMD (and $DOCKER_CMD compose)
so all docker operations run under the same privilege model.
- Line 4: COMPOSE_FILE is set to a relative file name
("docker-compose-prod.yml") which breaks when deploy.sh is run outside
docker/prod/api; change the assignment in deploy.sh so COMPOSE_FILE points to
the docker-compose-prod.yml located next to the script by resolving the script's
directory (i.e., base the path on the script's directory rather than the current
working directory) so references to "docker-compose-prod.yml" always work
regardless of where the script is executed.

In `@docker/prod/api/docker-compose-prod.yml`:
- Around line 49-57: The Traefik labels currently use identical router/service
names (traefik.http.routers.api.* and traefik.http.services.api.*) so blue and
green containers are merged into one LB pool; change the strategy to perform
real blue-green switches by either (a) assigning distinct service/router names
per color (e.g., traefik.http.services.api-blue.* and
traefik.http.services.api-green.* with matching
traefik.http.routers.api-blue.rule / api-green.rule), or (b) toggling
traefik.enable=true/false on the inactive container from your deployment scripts
(deploy.sh / rollback.sh), or (c) use router priority and switch the active
router during deploy; update the docker-compose labels and the deploy/rollback
scripts to implement one of these approaches so only the active color receives
traffic.

In `@docker/prod/api/rollback.sh`:
- Around line 12-29: Mixed use of sudo causes failures when Docker requires
elevated privileges; make Docker invocations consistent by introducing a single
command variable (e.g., DOCKER_CMD) that conditionally includes "sudo -E" and
use that variable everywhere (replace direct calls in get_active,
check_container_exists and all other docker ps/inspect/start/stop/rm/docker
compose usages across the script) so active color detection, health checks and
container cleanup all run with the same privilege context.
- Around line 5-6: The current HEALTH_TIMEOUT (60) and HEALTH_INTERVAL (5) are
shorter than the last possible check implied by the Compose healthcheck settings
(start_period: 40s, interval: 10s), which can produce false negatives; increase
HEALTH_TIMEOUT based on Compose's start_period and interval. Update the
HEALTH_TIMEOUT/HEALTH_INTERVAL variables at the top of the script and the
rollback logic that depends on the loop timing (that block and near lines
37-47); recompute HEALTH_TIMEOUT from start_period + interval (comfortably above
40s + 10s = 50s) or set a larger value matching the documented Compose settings
(e.g. 70s) so the rollback timeout outlives the last Compose health check.
- Line 4: The COMPOSE_FILE variable in rollback.sh is currently a bare filename
which breaks when the script is run outside docker/prod/api; update rollback.sh
to build an explicit path for COMPOSE_FILE (e.g., derive the script directory
with dirname "$0" or resolve the repository root) and join it with
"docker-compose-prod.yml" so the script always points to the correct file
regardless of current working directory; change the COMPOSE_FILE assignment in
rollback.sh accordingly (refer to the COMPOSE_FILE variable and the rollback.sh
script).

In
`@src/main/java/com/samhap/kokomen/global/logging/RateLimitingTurboFilter.java`:
- Around line 11-44: The current RateLimitingTurboFilter class (decide method)
is a TurboFilter that returns DENY for rate-limited ERROR events which prevents
all appenders (Slack, file, console) from receiving them; change this to an
appender-level Filter implementation (e.g., create RateLimitingFilter that
extends ch.qos.logback.core.filter.Filter and implements decide/decideInternal)
and attach that filter only to the Slack appender so only Slack notifications
are rate-limited; update places that register RateLimitingTurboFilter to instead
add the new RateLimitingFilter to the Slack appender configuration and keep the
global TurboFilter logic removed or converted to a NEUTRAL-only TurboFilter if
global behavior is still needed.
- Around line 29-37: The current pattern in RateLimitingTurboFilter using
lastLogTimes.get(errorKey) followed by lastLogTimes.put(errorKey, now) is not
atomic and can race; replace that logic by using lastLogTimes.compute(errorKey,
...) to atomically inspect the previous timestamp and decide whether to update
to now or keep the old value so the cooldownMillis check is done atomically;
update the code around buildErrorKey(...) / lastLogTimes / cooldownMillis so
compute returns the correct stored timestamp and use its result to return
FilterReply.DENY when the cooldown has not expired, otherwise allow logging and
set the timestamp to now.

In `@src/main/java/com/samhap/kokomen/global/logging/SlackErrorLayout.java`:
- Around line 13-15: SlackErrorLayout currently defines DATE_FORMAT as a shared
SimpleDateFormat which is not thread-safe; replace it with a thread-safe
java.time DateTimeFormatter (e.g. DateTimeFormatter.ofPattern("yyyy-MM-dd
HH:mm:ss")) and update all code in SlackErrorLayout that uses DATE_FORMAT to
format timestamps via java.time types (Instant/LocalDateTime/ZonedDateTime with
a ZoneId) instead of SimpleDateFormat; keep MAX_STACK_TRACE_LENGTH as-is and
ensure the new DateTimeFormatter is a static final constant so formatting is
immutable and safe in multithreaded contexts.

In `@src/main/resources/logback-spring.xml`:
- Around line 49-58: The AsyncAppender named ASYNC_SLACK is currently using
blocking behavior (default neverBlock=false) which can delay request threads
when the queue is full; update the ASYNC_SLACK appender configuration to enable
non-blocking mode by adding the neverBlock element with value true (i.e., set
neverBlock to true on the AsyncAppender definition) so events are dropped
instead of blocking request processing; ensure you modify the <appender
name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender"> block (the
AsyncAppender configuration that references SLACK) to include
<neverBlock>true</neverBlock> and keep includeCallerData, queueSize and
discardingThreshold as appropriate.
- Around line 9-12: The global turboFilter using
com.samhap.kokomen.global.logging.RateLimitingTurboFilter currently denies
repeated ERROR events for all appenders, causing FILE logs to be dropped; update
the logging config so rate-limiting applies only to the Slack appender: remove
or disable the <turboFilter> block that registers RateLimitingTurboFilter
globally and instead add an equivalent rate-limit filter configuration inside
the SLACK appender (or create a separate appender-ref/group for SLACK that uses
RateLimitingTurboFilter), ensuring the FILE appender remains unfiltered so ERROR
events still reach file logs.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 845eb313-365c-4e95-9725-e52a7abad74e

📥 Commits

Reviewing files that changed from the base of the PR and between 3b14cc3 and d1c8d66.

📒 Files selected for processing (13)
  • .github/workflows/cd-api-dev.yml
  • .github/workflows/cd-api-prod.yml
  • build.gradle
  • docker/dev/docker-compose-dev.yml
  • docker/prod/api/deploy.sh
  • docker/prod/api/docker-compose-prod.yml
  • docker/prod/api/rollback.sh
  • docker/prod/api/traefik/traefik.yml
  • src/main/java/com/samhap/kokomen/global/logging/RateLimitingTurboFilter.java
  • src/main/java/com/samhap/kokomen/global/logging/SlackErrorLayout.java
  • src/main/resources/application-dev.yml
  • src/main/resources/application-prod.yml
  • src/main/resources/logback-spring.xml

Comment on lines +93 to +96
- name: Verify deployment
run: |
sleep 5
curl -sf http://localhost:80/actuator/health || echo "Health check endpoint not responding"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Even when deployment verification fails, the workflow is marked as successful.

Line 96's curl -sf ... || echo ... swallows the health-check failure, so the job finishes green even if the service is unhealthy after the routing switch. As the verification step of a zero-downtime deploy, it should fail explicitly here and feed into the rollback/alerting flow.

Suggested fix
       - name: Verify deployment
         run: |
           sleep 5
-          curl -sf http://localhost:80/actuator/health || echo "Health check endpoint not responding"
+          curl -sf http://localhost:80/actuator/health
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Verify deployment
run: |
sleep 5
curl -sf http://localhost:80/actuator/health || echo "Health check endpoint not responding"
- name: Verify deployment
run: |
sleep 5
curl -sf http://localhost:80/actuator/health
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-api-prod.yml around lines 93 - 96, The "Verify
deployment" step currently swallows failures by using `curl -sf ... || echo
...`; change it so a failed health check causes the job to fail by replacing the
swallow logic with an explicit non-zero exit path (e.g., run the same curl and
on failure print the message and exit 1). Locate the step named "Verify
deployment" and the line containing `curl -sf
http://localhost:80/actuator/health || echo "Health check endpoint not
responding"` and update it to print the error message and then exit non-zero so
the workflow fails and can trigger rollback/alerting.
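The fail-fast variant the comment asks for can be sketched like this. The URL is the one from the workflow; the check_health stub stands in for the real curl call (left as a comment) so the sketch runs without a server, and the VERIFY_RC variable is an assumption made for illustration.

```shell
#!/bin/sh
# Fail-fast deployment verification: a failed health check must fail the job.
HEALTH_URL="http://localhost:80/actuator/health"

# Stub standing in for: curl -sf "$HEALTH_URL" >/dev/null
# Returns success here so the sketch is runnable anywhere.
check_health() {
    # curl -sf "$HEALTH_URL" >/dev/null
    true
}

if check_health; then
    echo "deployment verified"
    VERIFY_RC=0
else
    echo "Health check endpoint not responding" >&2
    VERIFY_RC=1   # in the workflow this would be: exit 1
fi
```

The key difference from the original step is that the failure branch sets a non-zero status instead of only echoing, so the job can fail and trigger rollback/alerting.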

implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'ch.qos.logback.contrib:logback-json-classic:0.1.5'
implementation 'ch.qos.logback.contrib:logback-jackson:0.1.5'
implementation 'com.github.maricn:logback-slack-appender:1.6.1'

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is com.github.maricn:logback-slack-appender:1.6.1 compatible with Logback 1.5.x and Spring Boot 3.5.x?

💡 Result:

No, com.github.maricn:logback-slack-appender:1.6.1 is likely not fully compatible with Logback 1.5.x (used by Spring Boot 3.5.x). The original repository is archived since Feb 2024 with no updates since 2021, and the maintainer recommends the maintained fork https://github.com/cyfrania/logback-slack-appender, which explicitly supports Logback 1.3.x/1.4.x but notes Java 11+ for 1.4.x (Spring Boot 3.5.x requires Java 21+). The original appender depends on Logback via ${logback.version} (unspecified concrete version), but lacks confirmation of testing with Logback 1.5.x changes (e.g., supplier patterns). Potential configuration or API incompatibilities may arise, as seen in other Logback appenders with Spring Boot 3.x. Use the cyfrania fork (v1.2, 2024) for better compatibility.


🏁 Script executed:

cat build.gradle

Repository: samhap-soft/kokomen-backend

Length of output: 1905


Replace the Logback appender library, which is incompatible with Spring Boot 3.5.3

Line 57's com.github.maricn:logback-slack-appender:1.6.1 is not compatible with Logback 1.5.x as used by Spring Boot 3.5.3. The upstream repository has also been archived since February 2024, with no updates since 2021. Because logging sits directly on the application initialization path, this can cause runtime errors at deploy time.

Use the maintained fork com.cyfrania:logback-slack-appender:1.2 instead; it was updated recently and is compatible with the current stack.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build.gradle` at line 57, Replace the incompatible Logback Slack appender
dependency 'com.github.maricn:logback-slack-appender:1.6.1' with the maintained
fork 'com.cyfrania:logback-slack-appender:1.2' in the Gradle dependencies so the
app uses a Logback 1.5.x–compatible, actively maintained artifact; update the
dependency declaration where the current implementation line contains
'com.github.maricn:logback-slack-appender:1.6.1' to use
'com.cyfrania:logback-slack-appender:1.2' instead.

#!/bin/bash
set -e

COMPOSE_FILE="docker-compose-prod.yml"

⚠️ Potential issue | 🟠 Major

The compose file path depends on the current working directory.

This script, too, cannot find docker-compose-prod.yml when run outside docker/prod/api. A deploy script should resolve paths relative to its own directory so that it works regardless of where it is invoked.

🔧 Suggested fix
-COMPOSE_FILE="docker-compose-prod.yml"
+SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
+COMPOSE_FILE="$SCRIPT_DIR/docker-compose-prod.yml"
📝 Committable suggestion


Suggested change
COMPOSE_FILE="docker-compose-prod.yml"
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
COMPOSE_FILE="$SCRIPT_DIR/docker-compose-prod.yml"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/prod/api/deploy.sh` at line 4, COMPOSE_FILE is set to a relative file
name ("docker-compose-prod.yml") which breaks when deploy.sh is run outside
docker/prod/api; change the assignment in deploy.sh so COMPOSE_FILE points to
the docker-compose-prod.yml located next to the script by resolving the script's
directory (i.e., base the path on the script's directory rather than the current
working directory) so references to "docker-compose-prod.yml" always work
regardless of where the script is executed.

Comment on lines +13 to +21
get_active() {
if docker ps -q -f name=kokomen-api-blue | grep -q .; then
echo "blue"
elif docker ps -q -f name=kokomen-api-green | grep -q .; then
echo "green"
else
echo "none"
fi
}

⚠️ Potential issue | 🟠 Major

Mixed sudo usage means deployment breaks midway on servers that require Docker privileges.

docker compose is launched with sudo -E, while the status checks, health checks, and cleanup steps use plain docker. On a server where the deploy user is not in the docker group, this setup makes the get_active, wait_healthy, and old-container cleanup steps all fail.

🔧 Suggested fix
+DOCKER="sudo -E docker"
...
-    if docker ps -q -f name=kokomen-api-blue | grep -q .; then
+    if $DOCKER ps -q -f name=kokomen-api-blue | grep -q .; then
...
-        status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
+        status=$($DOCKER inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
...
-    if ! docker ps -q -f name=traefik | grep -q .; then
+    if ! $DOCKER ps -q -f name=traefik | grep -q .; then
...
-        sudo -E docker compose -f $COMPOSE_FILE up -d traefik
+        $DOCKER compose -f "$COMPOSE_FILE" up -d traefik
...
-    sudo -E docker compose -f $COMPOSE_FILE --profile $TARGET up -d "kokomen-api-$TARGET"
+    $DOCKER compose -f "$COMPOSE_FILE" --profile "$TARGET" up -d "kokomen-api-$TARGET"
...
-        docker rm -f "kokomen-api-$TARGET" 2>/dev/null || true
+        $DOCKER rm -f "kokomen-api-$TARGET" 2>/dev/null || true
...
-        docker stop -t $GRACEFUL_SHUTDOWN_WAIT "$OLD" || true
-        docker rm -f "$OLD" 2>/dev/null || true
+        $DOCKER stop -t "$GRACEFUL_SHUTDOWN_WAIT" "$OLD" || true
+        $DOCKER rm -f "$OLD" 2>/dev/null || true

Also applies to: 23-44, 67-75, 79-82, 93-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/prod/api/deploy.sh` around lines 13 - 21, The script mixes plain
docker calls with sudo -E docker compose causing failures on hosts where docker
requires sudo; define a single command variable (e.g., DOCKER_CMD) that detects
whether sudo is needed (try running `docker ps` and fall back to `sudo -E
docker`) and use that variable everywhere instead of raw `docker` or `sudo -E
docker compose`, and update all affected helpers (get_active, wait_healthy, the
old-container cleanup logic and any docker compose invocations) to invoke
$DOCKER_CMD (and $DOCKER_CMD compose) so all docker operations run under the
same privilege model.

Comment on lines +49 to +57
labels:
- "traefik.enable=true"
- "traefik.http.routers.api.rule=Host(`api.kokomen.kr`)"
- "traefik.http.routers.api.entrypoints=web"
- "traefik.http.services.api.loadbalancer.server.port=8080"
- "traefik.http.services.api.loadbalancer.healthcheck.path=/actuator/health"
- "traefik.http.services.api.loadbalancer.healthcheck.port=8081"
- "traefik.http.services.api.loadbalancer.healthcheck.interval=5s"
- "traefik.http.services.api.loadbalancer.healthcheck.timeout=3s"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "docker-compose-prod.yml" -o -name "deploy.sh" -o -name "rollback.sh" | head -20

Repository: samhap-soft/kokomen-backend

Length of output: 1905


🏁 Script executed:

# Get the full structure of the docker directory
fd -t f "docker" | grep -E "(compose|deploy|rollback)" | head -30



🏁 Script executed:

# Look for the specific file mentioned in the review
cat docker/prod/api/docker-compose-prod.yml 2>/dev/null || find . -name "docker-compose-prod.yml" -exec cat {} \;



🌐 Web query:

Traefik identical labels multiple containers blue-green deployment behavior

💡 Result:

Traefik's behavior with multiple Docker containers having identical labels in blue-green deployments depends on how the labels are configured, particularly the router and service names. By default, Traefik creates one router and one service per container based on container-specific identifiers (like container name or ID). Identical router rules (e.g., same Host(example.com)) across multiple containers lead to configuration conflicts, as Traefik detects multiple routers with the same rule but different services, resulting in errors like "Router defined multiple times with different configurations" and failure to route properly. To enable load balancing across multiple containers with identical routing needs (useful for scaling or canary/blue-green with weights), all containers must explicitly share the same service name via labels like traefik.http.services.shared-service.loadbalancer.server.port=80, and reference it in routers with traefik.http.routers.myrouter.service=shared-service. Traefik then aggregates matching containers into a single load-balanced service with servers for each container instance. For strict blue-green deployments (zero-downtime version switching without concurrent load balancing): - Use separate containers/services for blue/green with distinct router names or priorities (e.g., traefik.http.routers.app-blue.rule=Host(app.localhost); traefik.http.routers.app-blue.priority=10 vs green with higher priority). - Enable/disable via traefik.enable=true/false or weights (e.g., traefik.http.services.app.loadbalancer.weight=100 for active, 0 for inactive). - Switch by updating labels (recreating containers), health checks, or dynamic file provider configs pointing to @docker services. This avoids conflicts while allowing atomic traffic shifts. Official docs confirm Traefik creates services with servers per container instance when configurations match, supporting load balancing natively when service names align. 
Conflicts arise only from mismatched router/service declarations across identical-rule containers.



🌐 Web query:

Traefik http.routers http.services same labels multiple backends load balancing

💡 Result:

In Traefik, to configure an HTTP router (e.g., http.routers.http) to load balance across multiple backends using the same service labels, use the Docker provider with identical service names across multiple containers. Traefik automatically aggregates containers with the same service name into a single loadBalancer service, adding each container's instance as a server. Key steps: 1. Define the same service name (e.g., my-service) on multiple containers via labels like traefik.http.services.my-service.loadbalancer.server.port=8080. 2. Point one or more routers to that service name using traefik.http.routers.http.service=my-service. 3. Traefik detects matching service configurations and load balances traffic across all containers exposing that service (using strategies like wrr by default). Example Docker Compose for two whoami containers load balanced by router "http": version: '3' services: whoami-1: image: traefik/whoami labels: - traefik.enable=true - traefik.http.routers.http.rule=Host(example.com) - traefik.http.routers.http.service=my-service - traefik.http.services.my-service.loadbalancer.server.port=80 whoami-2: image: traefik/whoami labels: - traefik.enable=true - traefik.http.routers.http.service=my-service # Router points to shared service - traefik.http.services.my-service.loadbalancer.server.port=80 # Same service name deploy: replicas: 1 # Scale as needed Traefik creates a loadBalancer for my-service with servers from both containers. Without explicit router labels on the second container, Traefik auto-creates routers if rules are defined, but explicit shared service ensures aggregation. For static/file provider, explicitly list servers: http: services: my-service: loadBalancer: servers: - url: http://backend1:80 - url: http://backend2:80 Load balancing strategies (wrr, p2c, etc.) configurable via traefik.http.services.my-service.loadbalancer.strategy=wrr. This works in Traefik v2+ and v3 (current as of 2026), per official docs.



🌐 Web query:

Traefik blue-green deployment routing cutover strategy

💡 Result:

Traefik supports blue-green deployment routing through its dynamic configuration and weighted services, enabling zero-downtime cutovers by switching traffic between blue (current) and green (new) environments. Core Strategy: Maintain two identical environments: blue (live) and green (staging new version). Deploy and test the new version in green while blue handles all production traffic. Once validated, perform cutover by updating Traefik routing to direct 100% traffic to green. Rollback by switching back to blue. Cutover Methods by Environment: 1. Docker/Docker Compose (using labels): - Enable/disable services via labels like traefik.enable=true/false. - Update docker-compose.yml with env vars: BLUE_ENABLED=true, GREEN_ENABLED=false initially. - Deploy: docker compose up -d - Test green, then cutover: BLUE_ENABLED=false GREEN_ENABLED=true; docker compose up -d - Traefik auto-detects changes and switches instantly. - Example labels: traefik.http.services.app-blue.loadbalancer.server.port=3000 traefik.enable=${BLUE_ENABLED:-true} 2. File Provider (dynamic config): - Define separate services for blue/green. - Router points to active service: service: green@file - Edit config file (watched by Traefik), change service reference, save for instant switch. 3. Kubernetes (CRDs recommended): - Deploy blue/green deployments + services (e.g., app-blue-svc, app-green-svc). - Use IngressRoute with TraefikService for weighted routing: apiVersion: traefik.io/v1alpha1 kind: TraefikService metadata: name: bg-service spec: weighted: services: - name: app-blue-svc port: 80 weight: 100 # Initially 100% blue - name: app-green-svc port: 80 weight: 0 - IngressRoute references bg-service. - Cutover: Update weights (100->0 blue, 0->100 green), apply. Supports gradual shift (90/10 -> 50/50 -> 0/100). - Health checks ensure only healthy services receive traffic. Additional Best Practices: - Use health checks (Traefik auto-removes unhealthy servers). 
- Weighted Round Robin (WRR) for gradual cutover or canary. - Traffic mirroring for green testing (duplicate requests to green while blue responds). - Tools like Flagger automate with metrics-based promotion. - Monitor via Traefik dashboard. This leverages Traefik's service discovery and hot-reload for zero-downtime.



The current Traefik configuration load-balances traffic across both backends at the same time. This is not a true blue-green switchover.

Attaching the same traefik.http.routers.api.* and traefik.http.services.api.* labels to both the blue and green containers makes Traefik merge them into a single load-balancing pool and split traffic between them. A true zero-downtime blue-green cutover needs one of the following:

  • Active/inactive control via traefik.enable=true/false
  • A distinct service name per color
  • Router priorities, with the active router switched at cutover

Without such a switching mechanism in deploy.sh/rollback.sh, the actual behavior is closer to a rolling/canary deployment in which both versions receive traffic simultaneously.

Also applies to: 96-104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/prod/api/docker-compose-prod.yml` around lines 49 - 57, The Traefik
labels currently use identical router/service names (traefik.http.routers.api.*
and traefik.http.services.api.*) so blue and green containers are merged into
one LB pool; change the strategy to perform real blue-green switches by either
(a) assigning distinct service/router names per color (e.g.,
traefik.http.services.api-blue.* and traefik.http.services.api-green.* with
matching traefik.http.routers.api-blue.rule / api-green.rule), or (b) toggling
traefik.enable=true/false on the inactive container from your deployment scripts
(deploy.sh / rollback.sh), or (c) use router priority and switch the active
router during deploy; update the docker-compose labels and the deploy/rollback
scripts to implement one of these approaches so only the active color receives
traffic.

Comment on lines +11 to +44
public class RateLimitingTurboFilter extends TurboFilter {

    private static final int MAX_CACHE_SIZE = 1000;

    private final ConcurrentHashMap<String, Long> lastLogTimes = new ConcurrentHashMap<>();
    private long cooldownMillis = 60000;

    public void setCooldownSeconds(int seconds) {
        this.cooldownMillis = seconds * 1000L;
    }

    @Override
    public FilterReply decide(Marker marker, Logger logger, Level level,
                              String format, Object[] params, Throwable t) {
        if (level != Level.ERROR) {
            return FilterReply.NEUTRAL;
        }

        String errorKey = buildErrorKey(logger, format, t);
        long now = System.currentTimeMillis();

        Long lastTime = lastLogTimes.get(errorKey);
        if (lastTime != null && (now - lastTime) < cooldownMillis) {
            return FilterReply.DENY;
        }

        lastLogTimes.put(errorKey, now);

        if (lastLogTimes.size() > MAX_CACHE_SIZE) {
            cleanupOldEntries(now);
        }

        return FilterReply.NEUTRAL;
    }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -type f -name "*.xml" | grep -i logback | head -20



🏁 Script executed:

rg "RateLimitingTurboFilter" -A 5 -B 5



🏁 Script executed:

fd -type f -name "logback*.xml"



🏁 Script executed:

rg "TurboFilter" -A 5 -B 5



🏁 Script executed:

rg "appender" -t xml | head -30



🌐 Web query:

Logback TurboFilter vs Filter appender level difference

💡 Result:

TurboFilters and appender-level Filters (like ThresholdFilter) in Logback serve different purposes in the logging pipeline, with TurboFilters operating earlier and more globally than appender-specific filters. Key differences: Scope: TurboFilters apply to the entire logging context (all loggers), while Filters are attached to specific appenders. Timing: TurboFilters are invoked on every logging request, before LoggingEvent creation (using parameters like level, logger, format), enabling high-performance early rejection. Regular Filters act after LoggingEvent creation, within the appender's doAppend method. Level handling: Appender level thresholds (or ThresholdFilter) determine the minimum level events reaching that appender will be processed/output. Logger levels enable/disable requests via the basic selection rule (request level >= effective logger level). TurboFilters can override or supplement these by deciding ACCEPT/DENY/NEUTRAL early based on dynamic criteria (e.g., DynamicThresholdFilter uses MDC for per-user levels), potentially bypassing logger/appender level checks. Processing order (from architecture): 1. TurboFilters (early), 2. Logger effective level check, 3. LoggingEvent creation, 4. Appender filters (including ThresholdFilter). Example: A TurboFilter can deny DEBUG logs for certain users before logger level evaluation, while a ThresholdFilter on an appender denies below INFO after the event reaches it. Appender "level" often refers to ThresholdFilter's or appender threshold, filtering post-event.



Implemented as a TurboFilter, this drops every ERROR log, not just the Slack ones.

A TurboFilter runs globally, before appender selection, so returning DENY here discards the ERROR event for the file and console appenders as well as Slack. If the intent is "rate-limit only the Slack notifications," implement it as a regular filter attached to the Slack appender.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/samhap/kokomen/global/logging/RateLimitingTurboFilter.java`
around lines 11 - 44, The current RateLimitingTurboFilter class (decide method)
is a TurboFilter that returns DENY for rate-limited ERROR events which prevents
all appenders (Slack, file, console) from receiving them; change this to an
appender-level Filter implementation (e.g., create RateLimitingFilter that
extends ch.qos.logback.core.filter.Filter and implements decide/decideInternal)
and attach that filter only to the Slack appender so only Slack notifications
are rate-limited; update places that register RateLimitingTurboFilter to instead
add the new RateLimitingFilter to the Slack appender configuration and keep the
global TurboFilter logic removed or converted to a NEUTRAL-only TurboFilter if
global behavior is still needed.

Comment on lines +29 to +37
        String errorKey = buildErrorKey(logger, format, t);
        long now = System.currentTimeMillis();

        Long lastTime = lastLogTimes.get(errorKey);
        if (lastTime != null && (now - lastTime) < cooldownMillis) {
            return FilterReply.DENY;
        }

        lastLogTimes.put(errorKey, now);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/main/java/com/samhap/kokomen/global/logging/RateLimitingTurboFilter.java





🌐 Web query:

Java ConcurrentHashMap race condition get-put pattern atomicity thread safety

💡 Result:

ConcurrentHashMap provides thread-safe individual get and put operations, but the common get-put pattern (e.g., if (!map.containsKey(key)) map.put(key, value)) is NOT atomic and susceptible to race conditions in multithreaded environments. Individual operations like get, put, putIfAbsent, etc., are atomic and thread-safe. Retrievals (get) reflect the most recently completed updates due to happens-before guarantees: an update operation for a key happens-before any subsequent non-null retrieval for that key. However, composing get followed by put (or similar patterns like check-then-act) creates a race window where multiple threads can pass the check concurrently before any completes the put, leading to duplicate computations/insertions or lost updates. To achieve atomicity for get-put patterns: - Use computeIfAbsent(key, k -> computeValue): Atomically checks absence and computes/inserts if needed (Java 8+). - Use putIfAbsent(key, value) for simple insertion if absent. - For counters: map.computeIfAbsent(key, k -> new AtomicInteger).incrementAndGet. Official docs confirm: "Retrieval operations generally do not block, so may overlap with update operations." Compound user logic requires atomic methods. Examples from sources demonstrate races in naive patterns and fixes via atomic methods.



The duplicate-suppression check has a race condition that needs fixing.

The get()-then-put() pattern is not atomic: when the same error arrives concurrently, multiple threads can all pass the cooldown check. Handle the check and the update in one step with compute() so the rate limiting actually holds.

🔧 Suggested fix
+import java.util.concurrent.atomic.AtomicBoolean;
...
-        Long lastTime = lastLogTimes.get(errorKey);
-        if (lastTime != null && (now - lastTime) < cooldownMillis) {
-            return FilterReply.DENY;
-        }
-
-        lastLogTimes.put(errorKey, now);
+        AtomicBoolean allowed = new AtomicBoolean(false);
+        lastLogTimes.compute(errorKey, (key, lastTime) -> {
+            if (lastTime == null || (now - lastTime) >= cooldownMillis) {
+                allowed.set(true);
+                return now;
+            }
+            return lastTime;
+        });

         if (lastLogTimes.size() > MAX_CACHE_SIZE) {
             cleanupOldEntries(now);
         }

-        return FilterReply.NEUTRAL;
+        return allowed.get() ? FilterReply.NEUTRAL : FilterReply.DENY;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/samhap/kokomen/global/logging/RateLimitingTurboFilter.java`
around lines 29 - 37, The current pattern in RateLimitingTurboFilter using
lastLogTimes.get(errorKey) followed by lastLogTimes.put(errorKey, now) is not
atomic and can race; replace that logic by using lastLogTimes.compute(errorKey,
...) to atomically inspect the previous timestamp and decide whether to update
to now or keep the old value so the cooldownMillis check is done atomically;
update the code around buildErrorKey(...) / lastLogTimes / cooldownMillis so
compute returns the correct stored timestamp and use its result to return
FilterReply.DENY when the cooldown has not expired, otherwise allow logging and
set the timestamp to now.
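Outside logback, the atomicity claim is easy to check with plain JDK types. Below is a minimal sketch (CooldownDemo and shouldLog are hypothetical names, not part of the PR) that hits one error key from many threads at the same instant:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class CooldownDemo {
    static final ConcurrentHashMap<String, Long> lastLogTimes = new ConcurrentHashMap<>();
    static final long cooldownMillis = 60_000;

    // Atomic check-and-update: at most one caller per key passes per cooldown window.
    static boolean shouldLog(String errorKey, long now) {
        AtomicBoolean allowed = new AtomicBoolean(false);
        lastLogTimes.compute(errorKey, (key, lastTime) -> {
            if (lastTime == null || (now - lastTime) >= cooldownMillis) {
                allowed.set(true);
                return now;      // start a new cooldown window
            }
            return lastTime;     // keep the old window; this caller is rate-limited
        });
        return allowed.get();
    }

    public static void main(String[] args) throws Exception {
        int threads = 32;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger passed = new AtomicInteger();
        long now = System.currentTimeMillis(); // same timestamp for all threads
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await(); // release all threads at once
                if (shouldLog("NullPointerException@MemberService", now)) {
                    passed.incrementAndGet();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(passed.get()); // exactly 1: compute() serializes per-key access
    }
}
```

Because compute() runs atomically per key, exactly one thread starts the cooldown window; with the racy get()/put() version, several threads could pass the check before any of them writes.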

Comment on lines +13 to +15
private static final int MAX_STACK_TRACE_LENGTH = 2000;
private static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "SlackErrorLayout.java" | head -5



🏁 Script executed:

cat -n src/main/java/com/samhap/kokomen/global/logging/SlackErrorLayout.java



🏁 Script executed:

rg "SimpleDateFormat" -t java -B 2 -A 2



🏁 Script executed:

# Check Logback LayoutBase documentation and threading model
web_search SimpleDateFormat thread safety issue Java

Repository: samhap-soft/kokomen-backend

Length of output: 1905


🏁 Script executed:

# Verify DateTimeFormatter is the recommended replacement
web_search DateTimeFormatter thread safe replacement SimpleDateFormat Java

Repository: samhap-soft/kokomen-backend

Length of output: 1905


🌐 Web query:

SimpleDateFormat thread safety issue concurrency Java

💡 Result:

SimpleDateFormat in Java is not thread-safe due to its use of mutable internal state, particularly a shared Calendar instance field that stores intermediate results during formatting and parsing operations. When multiple threads access the same instance concurrently, they can interfere with each other's operations, leading to corrupted data, incorrect date strings, or exceptions like NumberFormatException. Official Java documentation (Java SE 24 & JDK 24, and earlier versions) explicitly states: "Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally." It also recommends DateTimeFormatter as an immutable, thread-safe alternative. The issue arises because methods like format and parse modify instance fields without synchronization. For example, one thread might call calendar.clear and calendar.add during parse, but another thread could interrupt and alter the calendar state mid-operation, mixing results from different dates. Workarounds for SimpleDateFormat include: - Create a new instance per use (thread-safe but slower due to object creation). - Use ThreadLocal to give each thread its own instance. - Synchronize access externally (e.g., synchronized block), but this creates bottlenecks. Best practice: Since Java 8, use java.time.format.DateTimeFormatter, which is immutable and thread-safe, suitable for static final use. Example: private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd"); String formatted = date.format(FORMATTER); This remains true as of Java 24 (2025 documentation), with no changes to SimpleDateFormat's thread-safety status.



🌐 Web query:

DateTimeFormatter thread safe replacement SimpleDateFormat Java

💡 Result:

The thread-safe replacement for SimpleDateFormat in Java is DateTimeFormatter from the java.time.format package (introduced in Java 8). Unlike SimpleDateFormat, which mutates internal state (like Calendar) and is not thread-safe, DateTimeFormatter is immutable and explicitly documented as thread-safe, allowing safe use as static final instances across threads. Key advantages: - Thread-safe: Can be shared globally without synchronization. - Part of modern java.time API: Use with LocalDate, LocalDateTime, etc., instead of legacy Date/Calendar. - Similar pattern syntax, but note differences (e.g., use 'u' for year instead of 'y' in some cases for proleptic years). Example usage: import java.time.LocalDate; import java.time.format.DateTimeFormatter; private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd"); public LocalDate parse(String dateStr) { return LocalDate.parse(dateStr, FORMATTER); } Official Oracle documentation (Java SE 24, 22, 8) confirms: "This class is immutable and thread-safe." Multiple sources (Stack Overflow, blogs) recommend this migration for multi-threaded environments like web apps.



🌐 Web query:

Logback LayoutBase threading model concurrent access

💡 Result:

Logback's LayoutBase is an abstract base class for layouts that does not implement the doLayout method, leaving it abstract for subclasses. LayoutBase itself has no synchronization and manages only simple shared fields like headers and footers, which are set during configuration (single-threaded) and read-only afterward. Layouts extending LayoutBase, such as PatternLayout (via PatternLayoutBase), are designed for concurrent access. Their doLayout implementations create a fresh StringBuilder per call (e.g., new StringBuilder(256) in writeLoopOnConverters), avoiding shared mutable state. Past bugs with shared/recycled StringBuilders (e.g., LOGBACK-1270 fixed in 1.1.11, LOGBACK-427) have been resolved in modern versions (1.5.28 as of sources). Logback ensures thread-safety at the appender level: AppenderBase.doAppend is synchronized, serializing calls to layout.doLayout for standard appenders. AsyncAppenderBase uses UnsynchronizedAppenderBase but processes events single-threaded in its worker, so layout.doLayout is still not concurrent. Layouts are thus safe for concurrent doLayout calls when used with async appenders or multiple appenders. Official docs confirm Logback is thread-safe overall, with layouts performing efficiently without inherent synchronization needs due to per-call immutability.



Sharing a static SimpleDateFormat can corrupt timestamps under concurrency.

SimpleDateFormat mutates internal state and is not thread-safe: concurrent access lets threads interfere through the shared Calendar instance, producing garbled output or exceptions. Switching to DateTimeFormatter, which is immutable and thread-safe, removes the risk.

🔧 Suggested fix
-import java.text.SimpleDateFormat;
-import java.util.Date;
+import java.time.Instant;
+import java.time.ZoneId;
+import java.time.format.DateTimeFormatter;

-    private static final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
+    private static final DateTimeFormatter DATE_FORMAT =
+            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
+                    .withZone(ZoneId.systemDefault());

-        sb.append(":clock3: *Time:* ").append(DATE_FORMAT.format(new Date(event.getTimeStamp()))).append("\n");
+        sb.append(":clock3: *Time:* ")
+                .append(DATE_FORMAT.format(Instant.ofEpochMilli(event.getTimeStamp())))
+                .append("\n");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/samhap/kokomen/global/logging/SlackErrorLayout.java` around
lines 13 - 15, SlackErrorLayout currently defines DATE_FORMAT as a shared
SimpleDateFormat which is not thread-safe; replace it with a thread-safe
java.time DateTimeFormatter (e.g. DateTimeFormatter.ofPattern("yyyy-MM-dd
HH:mm:ss")) and update all code in SlackErrorLayout that uses DATE_FORMAT to
format timestamps via java.time types (Instant/LocalDateTime/ZonedDateTime with
a ZoneId) instead of SimpleDateFormat; keep MAX_STACK_TRACE_LENGTH as-is and
ensure the new DateTimeFormatter is a static final constant so formatting is
immutable and safe in multithreaded contexts.
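The replacement can be exercised in isolation with plain java.time. A minimal sketch follows (UTC is pinned here only to make the output deterministic — the actual layout would use the system zone, as in the suggested diff):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampDemo {
    // Immutable and thread-safe: safe to share as a static final constant.
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    public static void main(String[] args) {
        // ILoggingEvent#getTimeStamp() returns epoch millis; epoch 0 used for the demo
        long eventTimeStamp = 0L;
        System.out.println(DATE_FORMAT.format(Instant.ofEpochMilli(eventTimeStamp)));
    }
}
```

Unlike SimpleDateFormat, this constant can be hit by any number of layout threads concurrently without external synchronization.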

Comment on lines +9 to +12
    <!-- TurboFilter to prevent duplicate alerts (1-minute cooldown for ERROR level) -->
    <turboFilter class="com.samhap.kokomen.global.logging.RateLimitingTurboFilter">
        <cooldownSeconds>60</cooldownSeconds>
    </turboFilter>

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/main/resources/logback-spring.xml | head -50



🏁 Script executed:

find . -name "RateLimitingTurboFilter.java" -o -name "RateLimitingTurboFilter.kt" | head -5



🏁 Script executed:

rg "class RateLimitingTurboFilter" -A 20



🌐 Web query:

Logback TurboFilter behavior appender filtering

💡 Result:

Logback TurboFilters provide high-performance, context-wide filtering of logging requests, invoked early in the logging pipeline before LoggingEvent creation and logger level checks. They are not attached to specific appenders but apply globally across the entire LoggerContext. Key behavior: - TurboFilters are called every time a logging request is issued, including during SLF4J fluent API methods like atInfo (with limited info: logger and level) and in the final log method for enabled statements (with full details: marker, logger, level, format, params, throwable). - The chain processes filters sequentially; the first ACCEPT or DENY decision short-circuits further processing and determines if the event proceeds (ACCEPT skips to event creation, DENY drops immediately, NEUTRAL continues). - If TurboFilters allow the request (NEUTRAL or ACCEPT), it undergoes logger effective level check, LoggingEvent creation, then appender invocation where per-appender regular filters apply. - TurboFilters cannot directly filter output to specific appenders; for appender-specific filtering, use regular Filter instances attached to appenders. A TurboFilter DENY prevents the event from reaching any appender. Configuration example from docs: username alice ACCEPT DENY ... This globally accepts logs only if MDC contains 'username=alice', affecting all appenders. As of Logback 1.5.28 (current per API docs), TurboFilters remain pre-appender gatekeepers for efficiency.



The Slack dedupe filter drops the file logs along with the Slack alerts.

The turboFilter on line 10 is a global filter that runs ahead of every appender. When RateLimitingTurboFilter DENYs a repeated ERROR, the event never reaches the FILE appender (or any other) either — not just SLACK. The current setup is not "dedupe Slack only" but a global block on repeated errors, so the file logs used for incident analysis end up incomplete.

If Slack-only filtering is the goal, move the rate limit into the Slack appender itself, or restructure the config so the turboFilter does not affect the FILE appender.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/resources/logback-spring.xml` around lines 9 - 12, The global
turboFilter using com.samhap.kokomen.global.logging.RateLimitingTurboFilter
currently denies repeated ERROR events for all appenders, causing FILE logs to
be dropped; update the logging config so rate-limiting applies only to the Slack
appender: remove or disable the <turboFilter> block that registers
RateLimitingTurboFilter globally and instead add an equivalent rate-limit filter
configuration inside the SLACK appender (or create a separate appender-ref/group
for SLACK that uses RateLimitingTurboFilter), ensuring the FILE appender remains
unfiltered so ERROR events still reach file logs.

Comment on lines +49 to +58
    <!-- Slack async appender (async + ERROR filter) -->
    <appender name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="SLACK"/>
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>ERROR</level>
        </filter>
        <queueSize>500</queueSize>
        <discardingThreshold>0</discardingThreshold>
        <includeCallerData>true</includeCallerData>
    </appender>

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and examine its content
find . -name "logback-spring.xml" -type f



🌐 Web query:

Logback AsyncAppender blocking behavior default neverBlock parameter

💡 Result:

The default value of the neverBlock parameter for Logback's AsyncAppender is false. When neverBlock is false (default), the append method blocks the calling thread by using putUninterruptibly on the BlockingQueue if the queue is full, ensuring no log events are lost unless explicitly discarded based on other thresholds. When set to true, it uses offer, which is non-blocking and may drop events if the queue is full.



🌐 Web query:

Logback AsyncAppender discardingThreshold queue full blocking behavior

💡 Result:

Logback's AsyncAppender uses an ArrayBlockingQueue to buffer log events asynchronously. A single worker thread dequeues and processes events. When appending an event: - If remainingCapacity < discardingThreshold (default: queueSize/5) AND the event is discardable (TRACE/DEBUG/INFO by default), it is discarded immediately without queuing. - Otherwise, the event is added to the queue via put(event): - If neverBlock=true, uses non-blocking offer: drops if full. - If neverBlock=false (default), uses putUninterruptibly: blocks the calling thread until space is available or interrupted. Thus, by default (neverBlock=false), when the queue is full, the calling thread blocks on put until the worker thread dequeues events, making it non-discarding but potentially blocking. Set discardingThreshold=0 to disable discarding, or neverBlock=true to avoid blocking (but risk drops via offer). queueSize defaults to 256.



The AsyncAppender's default blocking behavior can stall request handling.

The AsyncAppender on line 50 defaults to neverBlock=false, so when the queue fills up it blocks the producer threads (the request-handling threads) via putUninterruptibly. Slack is a secondary notification channel, so stalling request processing for it is not acceptable. Adding <neverBlock>true</neverBlock> switches to non-blocking mode: if the queue is full, events are dropped, but request threads never block.

Example fix
     <appender name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender">
         <appender-ref ref="SLACK"/>
         <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
             <level>ERROR</level>
         </filter>
         <queueSize>500</queueSize>
         <discardingThreshold>0</discardingThreshold>
+        <neverBlock>true</neverBlock>
         <includeCallerData>true</includeCallerData>
     </appender>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

```xml
<!-- Slack Async Appender (async + ERROR filter) -->
<appender name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="SLACK"/>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>ERROR</level>
    </filter>
    <queueSize>500</queueSize>
    <discardingThreshold>0</discardingThreshold>
    <includeCallerData>true</includeCallerData>
</appender>
```

After:

```xml
<!-- Slack Async Appender (async + ERROR filter) -->
<appender name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="SLACK"/>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>ERROR</level>
    </filter>
    <queueSize>500</queueSize>
    <discardingThreshold>0</discardingThreshold>
    <neverBlock>true</neverBlock>
    <includeCallerData>true</includeCallerData>
</appender>
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/resources/logback-spring.xml` around lines 49-58, the AsyncAppender named ASYNC_SLACK currently uses the default blocking behavior (neverBlock=false), which can delay request threads when the queue is full. Update the `<appender name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender">` block (the AsyncAppender configuration that references SLACK) to include `<neverBlock>true</neverBlock>`, so that events are dropped instead of blocking request processing. Keep includeCallerData, queueSize, and discardingThreshold as appropriate.

Development

Successfully merging this pull request may close these issues.

[FEAT] 무중단 배포 도입
