This project implements an end-to-end analytics pipeline to evaluate vendor performance, inventory efficiency, procurement concentration, and profitability using large-scale transactional data.
Raw data from purchases, sales, pricing, invoices, and inventory snapshots is ingested into a centralized database, transformed into an analytics-ready vendor summary table, and analyzed to generate actionable business insights supported by statistical validation.
The project mirrors a real-world data analyst / analytics engineer workflow and emphasizes:
- Scalable ingestion
- Clean data modeling
- Business-driven KPIs
- Statistical and visual analysis
Organizations working with multiple vendors face challenges such as:
- Over-dependence on a few suppliers
- Capital locked in slow-moving inventory
- Unclear relationship between sales volume and profitability
- Difficulty identifying high-potential brands
This project answers:
- Which vendors and brands drive most revenue?
- Where is procurement risk concentrated?
- Which products are profitable but under-performing?
- Does bulk purchasing reduce unit cost?
- Is there a statistically significant difference between high- and low-performing vendors?
Vendor-Inventory-Analytics/ β βββ data/ β βββ purchases.csv β βββ purchase_prices.csv β βββ vendor_invoice.csv β βββ sales.csv β βββ begin_inventory.csv β βββ end_inventory.csv β βββ (large files excluded from GitHub) β βββ logs/ β βββ ingestion_db.log β βββ get_vendor_summary.log β βββ ingestion_db.py βββ get_vendor_summary.py βββ analytics.ipynb βββ vendor_performance_analysis.ipynb βββ inventory.db βββ README.md
β οΈ Note
Two large CSV files were excluded from GitHub due to size constraints.
The pipeline automatically ingests them when added locally.
| Dataset | Description |
|---|---|
purchases |
Purchase transactions (quantity, dollars, vendor, brand) |
purchase_prices |
Brand-level pricing & volume |
sales |
Sales quantity, revenue, excise tax |
vendor_invoice |
Freight & invoice-level costs |
begin_inventory |
Opening inventory snapshot |
end_inventory |
Closing inventory snapshot |
- Automatically ingest all CSV files from
data/ - Load them into a centralized SQLite database (
inventory.db) - Ensure idempotent, repeatable execution
if_exists="replace"β safe re-runs- Automatic table naming from filenames
- Centralized logging for observability
INFO - Ingesting purchases.csv in db INFO - Ingesting sales.csv in db INFO - Ingestion Completed Successfully Total time taken: ~1.8 minutes
Create a vendorβbrandβlevel summary table to avoid repeated joins on large transactional data.
- Total Purchase Quantity & Dollars
- Total Sales Quantity & Dollars
- Freight & Excise Costs
- Gross Profit
- Profit Margin
- Stock Turnover
- Sales-to-Purchase Ratio
This table serves as the single source of truth for analytics and dashboards.
- Validates dataset scale (up to 12.8M rows)
- Confirms ingestion success
- Performs early sanity checks on schema and distributions
This ensures downstream analysis is built on clean and verified data.
- Loss-making SKUs identified (negative gross profit)
- Slow-moving inventory detected (zero sales)
- Premium pricing outliers observed
- Strong correlation between purchase & sales quantity (β 0.999)
- Weak correlation between price and profitability
- High-volume products operate on lower margins
- 198 brands identified with:
- High profit margins
- Low sales volume
- Sales highly concentrated among global vendors
- Flagship brands dominate revenue contribution
- Top 10 vendors contribute ~65.7% of total purchases
- Indicates procurement dependency risk
- Larger order sizes significantly reduce unit cost
- Confirms economies of scale
- ~$2.71M capital locked in unsold inventory
- Excess stock observed even among top vendors
- 95% confidence interval analysis
- Welchβs t-test confirms:
- Statistically significant difference in profit margins
between high- and low-sales vendors (p < 0.001)
- Statistically significant difference in profit margins
- High-volume vendors trade margin for scale
- Low-volume vendors maintain premium pricing
- Procurement risk is highly concentrated
- Inventory inefficiencies persist across vendors
- Bulk purchasing is a strong cost optimization lever
- Promote high-margin, low-volume brands
- Diversify vendor base to reduce dependency
- Optimize pricing for high-volume SKUs
- Reduce excess inventory for low-turnover vendors
- Leverage bulk procurement strategically
pip install pandas numpy matplotlib seaborn sqlalchemy scipy
python ingestion_db.py
python get_vendor_summary.py
Then open:
analytics.ipynb
vendor_performance_analysis.ipynb