CLAUDE.md

Installation

pip3 install -r requirements.txt
python3 -m playwright install chromium

Environment setup

Copy .env.example to create one file per environment:

cp .env.example .env.local       # local development
cp .env.example .env.production  # production

When starting the server, select the environment with APP_ENV (default: local):

APP_ENV=production uvicorn main:app --host 0.0.0.0 --port 8000
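
How main.py actually loads these files is not shown here. As a minimal sketch, assuming the file name is simply `.env.` followed by the value of APP_ENV, the selection could look like this (`env_file_for` and `parse_env_text` are hypothetical helper names, not part of the project):

```python
import os
from pathlib import Path


def env_file_for(app_env=None):
    """Return the env file path for the given (or current) APP_ENV."""
    # Fall back to "local" when APP_ENV is unset, matching the documented default.
    env = app_env or os.environ.get("APP_ENV", "local")
    return Path(f".env.{env}")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    path = env_file_for()
    print(f"would load settings from {path}")
```

A real project would more likely rely on a dotenv library; the hand-rolled parser only keeps the sketch dependency-free.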

File structure

main.py       - FastAPI server + APScheduler (weekly scheduler)
scraper.py    - scraping core + CLI entry point
models.py     - Product dataclass
parsers.py    - parsing functions (name/quantity, price, rating)
storage.py    - storage functions (JSON / CSV / SQL)

Running the API server

uvicorn main:app --host 0.0.0.0 --port 8000
  • GET / - scheduler status and the result of the last run
  • POST /run - trigger a scrape manually, right away

Schedule: runs automatically every Monday at midnight (KST)
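
To make the schedule concrete, here is a standard-library sketch that computes the next Monday 00:00 KST after a given moment. The project itself uses APScheduler for this; `next_weekly_run` is a hypothetical helper that only illustrates the timing rule, not how the scheduler is configured.

```python
from datetime import datetime, timedelta, timezone

# KST is UTC+9 with no daylight saving time, so a fixed offset is enough.
KST = timezone(timedelta(hours=9), name="KST")


def next_weekly_run(now):
    """Return the next Monday 00:00 KST strictly after `now` (tz-aware)."""
    local = now.astimezone(KST)
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday; days_ahead is 0 when today is Monday.
    days_ahead = (7 - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= local:
        # This Monday's midnight has already passed, so move one week on.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_weekly_run(datetime.now(timezone.utc)).isoformat())
```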

Standalone CLI usage

# Default: saves to products/<YYYY-MM-DD>/<category>.json + SQL
python3 scraper.py "https://www.costco.co.kr/Foods/RiceGrains/c/cos_10.1"

# Specify the output path explicitly
python3 scraper.py "https://www.costco.co.kr/Foods/RiceGrains/c/cos_10.1" --output products.csv
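
The two invocations above suggest a positional URL plus an optional `--output` flag. The sketch below shows one way that could be wired up with argparse, along with a guess at how the default path is built. How scraper.py really derives `<category>` is not documented here; taking the path segment just before `/c/` is purely an assumption, and `build_parser` and `default_output_path` are hypothetical names.

```python
import argparse
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def build_parser():
    """CLI with a required category URL and an optional output path."""
    parser = argparse.ArgumentParser(description="Scrape one category URL.")
    parser.add_argument("url", help="category page URL")
    parser.add_argument("--output", help="explicit output path (optional)")
    return parser


def default_output_path(url, today):
    """Build products/<YYYY-MM-DD>/<category>.json for the given URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # ASSUMPTION: the category is the path segment right before "c".
    if "c" in parts and parts.index("c") > 0:
        category = parts[parts.index("c") - 1]
    else:
        category = "category"
    return Path("products") / today.isoformat() / f"{category}.json"


if __name__ == "__main__":
    args = build_parser().parse_args()
    target = Path(args.output) if args.output else default_output_path(args.url, date.today())
    print(f"would write results to {target}")
```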

Scraper structure

Data flow:

  1. scrape_url() - launches the browser and iterates over all pages
  2. _page_url() - builds the page URL (page 1 → base URL, page N → ?page=N-1)
  3. scrape_page() - scrolls to trigger lazy loading, then parses li.product-list-item
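
The page-numbering rule in step 2 is easy to get wrong by one, so here is a small sketch of it. `page_url` is a hypothetical stand-in for `_page_url()`; the real function's signature and its handling of existing query strings are not shown in this document.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url, page_number):
    """Page 1 is the base URL; page N (N >= 2) adds ?page=N-1."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    if page_number == 1:
        return base_url
    parts = urlparse(base_url)
    # Preserve any existing query parameters and set the zero-based page index.
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page_number - 1)
    return urlunparse(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = "https://www.costco.co.kr/Foods/RiceGrains/c/cos_10.1"
    for n in (1, 2, 3):
        print(n, page_url(base, n))
```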

CSS selectors (costco.co.kr Angular app):

| Field | Selector | Fallback |
| --- | --- | --- |
| Product container | li.product-list-item | li[class*='product'] |
| Product name | a.lister-name .notranslate | a.lister-name |
| Price | .original-price .product-price-amount | (none) |
| Rating | .star-ratings-css[aria-label] | (none) |
| Image | picture source[type='image/webp'] | picture img |
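
The primary/fallback pairs above can be read as "try each selector in order and keep the first hit". A minimal sketch of that lookup follows. It is library-agnostic on purpose: `query` is any callable that takes a selector string and returns a match or a falsy value. `SELECTORS` and `first_match` are hypothetical names, not taken from scraper.py.

```python
# Selector lists mirror the table: primary first, then the fallback if any.
SELECTORS = {
    "container": ["li.product-list-item", "li[class*='product']"],
    "name": ["a.lister-name .notranslate", "a.lister-name"],
    "price": [".original-price .product-price-amount"],
    "rating": [".star-ratings-css[aria-label]"],
    "image": ["picture source[type='image/webp']", "picture img"],
}


def first_match(query, field):
    """Return the first truthy result of query(selector) for the field."""
    for selector in SELECTORS[field]:
        result = query(selector)
        if result:
            return result
    return None


if __name__ == "__main__":
    # Stand-in for a real page: only the fallback name selector matches.
    fake_page = {"a.lister-name": "Sample product"}
    print(first_match(fake_page.get, "name"))
```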

Product schema:

name, quantity, price, rating, review_count, image_url, product_url
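
As an illustration, the field list could map onto a dataclass like the one below. Only the field names come from this document; the types and the choice to make everything except `name` optional are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Product:
    # Field names follow the schema above; types are assumed.
    name: str
    quantity: Optional[str] = None
    price: Optional[int] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    image_url: Optional[str] = None
    product_url: Optional[str] = None


if __name__ == "__main__":
    item = Product(name="Sample rice", price=12345)
    print(asdict(item))
```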

SQL schema (products table):

name, quantity, price, rating, review_count, image_url, product_url, created_at
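
For reference, a table with these columns could be created and filled as sketched below. The document does not say which database engine or column types storage.py uses; SQLite, the types, and the CURRENT_TIMESTAMP default for created_at are all assumptions, and `insert_product` is a hypothetical helper.

```python
import sqlite3

# Column names follow the schema above; types and defaults are assumed.
CREATE_PRODUCTS = """
CREATE TABLE IF NOT EXISTS products (
    name TEXT NOT NULL,
    quantity TEXT,
    price INTEGER,
    rating REAL,
    review_count INTEGER,
    image_url TEXT,
    product_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def insert_product(conn, row):
    """Insert one product; created_at is filled in by the database."""
    conn.execute(
        "INSERT INTO products "
        "(name, quantity, price, rating, review_count, image_url, product_url) "
        "VALUES (:name, :quantity, :price, :rating, :review_count, "
        ":image_url, :product_url)",
        row,
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_PRODUCTS)
    insert_product(conn, {
        "name": "Sample rice", "quantity": "10kg", "price": 12345,
        "rating": 4.5, "review_count": 10,
        "image_url": None, "product_url": None,
    })
    print(conn.execute("SELECT name, created_at FROM products").fetchall())
```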