OBJECTIVE: The Credit-Card-Fraud Detection pipeline is an end-to-end machine learning project that predicts fraudulent credit-card transactions for a bank.
DESIGN:
The pipeline uses a Cassandra database to store the credit-card-fraud dataset, which serves as the source of training and testing data for the models. The trained models (one for Kafka, one for REST) are deployed to make predictions on live transaction data.
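As a rough sketch of how training data could be pulled out of Cassandra, the snippet below assumes a hypothetical keyspace `fraud` and table `transactions` whose columns are the Kaggle CSV headers lower-cased (Cassandra lower-cases unquoted identifiers, so `V1` becomes `v1`). It uses the DataStax `cassandra-driver` package; the keyspace, table, column names and both helper functions are illustrative assumptions, not the project's actual code.

```python
"""Sketch: load training data from Cassandra (hypothetical keyspace/table)."""
from typing import Iterable, List, Mapping, Sequence, Tuple

# Assumed column names: Kaggle CSV headers (Time, V1..V28, Amount, Class),
# lower-cased as Cassandra does for unquoted identifiers.
FEATURE_COLUMNS: List[str] = ["time"] + [f"v{i}" for i in range(1, 29)] + ["amount"]
LABEL_COLUMN = "class"


def rows_to_xy(
    rows: Iterable[Mapping[str, object]],
    feature_columns: Sequence[str] = FEATURE_COLUMNS,
    label_column: str = LABEL_COLUMN,
) -> Tuple[List[List[float]], List[int]]:
    """Split dict-like rows into a feature matrix X and a label vector y."""
    X: List[List[float]] = []
    y: List[int] = []
    for row in rows:
        X.append([float(row[c]) for c in feature_columns])
        y.append(int(row[label_column]))
    return X, y


def fetch_training_data(hosts=("127.0.0.1",), keyspace="fraud", table="transactions"):
    """Read every row of the (hypothetical) table and return (X, y)."""
    # Imported lazily so rows_to_xy can be used without the driver installed.
    from cassandra.cluster import Cluster
    from cassandra.query import dict_factory

    cluster = Cluster(list(hosts))
    try:
        session = cluster.connect(keyspace)
        session.row_factory = dict_factory  # rows come back as plain dicts
        rows = session.execute(f"SELECT * FROM {table}")  # driver pages results
        return rows_to_xy(rows)
    finally:
        cluster.shutdown()
```

Keeping the row conversion separate from the driver call makes it easy to reuse the same feature ordering for the Kafka and REST models.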
Tools:
- Database: Cassandra
- ML estimators: SGDClassifier, RandomForestClassifier and an SVM classifier, with the best-performing estimator selected
- Preprocessing: StandardScaler
- Sampling: imbalanced-learn (SMOTE, SMOTEENN), since the data is highly imbalanced
- Models: one for the Kafka client and one for the REST (Flask) interface
Source of Data: URL: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?resource=download (Kaggle API command: kaggle datasets download -d mlg-ulb/creditcardfraud)
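After downloading and unzipping the archive, a quick look at the class balance shows why resampling is needed. The sketch below assumes the extracted file is `creditcard.csv` with a `Class` column (1 = fraud, 0 = legitimate), as in the Kaggle dataset; the helper name `class_balance` is hypothetical and only the standard library is used.

```python
"""Sketch: measure class imbalance in the downloaded Kaggle CSV."""
import csv
from collections import Counter
from typing import Dict


def class_balance(path: str = "creditcard.csv", label_column: str = "Class") -> Dict[str, float]:
    """Count legitimate vs fraudulent rows and return the fraud ratio."""
    counts: Counter = Counter()
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            # int(float(...)) accepts both "1" and "1.0" style labels.
            counts[int(float(row[label_column]))] += 1
    total = sum(counts.values())
    return {
        "legitimate": counts[0],
        "fraud": counts[1],
        "fraud_ratio": counts[1] / total if total else 0.0,
    }
```

The Kaggle dataset description reports 492 frauds out of 284,807 transactions (about 0.172%), so a classifier that always predicts "legitimate" would already exceed 99.8% accuracy.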
KAFKA MODEL:
- A simulated Kafka producer publishes transaction records (each new transaction serialised as JSON)
- A Kafka consumer subscribes to the topic 'credit-card-tx'
- Whenever the consumer receives a message, the transaction data is used to predict whether the transaction is fraudulent
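The producer/consumer flow above could be sketched with the `kafka-python` client as follows. The broker address `localhost:9092`, the function names and the idea of passing the feature order explicitly are assumptions for illustration; only the topic name comes from this README. The Kafka imports are lazy so the JSON and scoring helpers work without a running broker.

```python
"""Sketch: Kafka producer/consumer around a trained fraud model."""
import json
from typing import Any, Dict, Iterable, Sequence

TOPIC = "credit-card-tx"


def serialize_tx(tx: Dict[str, Any]) -> bytes:
    """Encode a transaction dict as UTF-8 JSON for the Kafka message value."""
    return json.dumps(tx).encode("utf-8")


def deserialize_tx(raw: bytes) -> Dict[str, Any]:
    """Decode a Kafka message value back into a transaction dict."""
    return json.loads(raw.decode("utf-8"))


def score_message(model: Any, raw: bytes, feature_order: Sequence[str]) -> int:
    """Return the model's prediction (1 = fraud) for one raw Kafka message."""
    tx = deserialize_tx(raw)
    features = [[float(tx[name]) for name in feature_order]]
    return int(model.predict(features)[0])


def run_producer(transactions: Iterable[Dict[str, Any]], bootstrap: str = "localhost:9092") -> None:
    """Simulate live traffic by publishing each transaction to the topic."""
    from kafka import KafkaProducer  # kafka-python, imported lazily

    producer = KafkaProducer(bootstrap_servers=bootstrap, value_serializer=serialize_tx)
    for tx in transactions:
        producer.send(TOPIC, tx)
    producer.flush()
    producer.close()


def run_consumer(model: Any, feature_order: Sequence[str], bootstrap: str = "localhost:9092") -> None:
    """Score every message on the topic as it arrives (blocks indefinitely)."""
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(TOPIC, bootstrap_servers=bootstrap)
    for message in consumer:
        label = score_message(model, message.value, feature_order)
        print(f"offset={message.offset} fraud={label}")
```

Passing the same `feature_order` that was used at training time keeps the column order of live transactions consistent with what the model expects.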
REST (Flask) MODEL:
- The model's predict function is exposed behind a REST interface
- Clients call the REST API with the transaction features as parameters
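A minimal Flask sketch of the REST interface described above might look like this. The `/predict` route, the `create_app` factory, and accepting features either as query parameters or as a JSON body are assumptions for illustration, not the project's actual API.

```python
"""Sketch: expose a trained model's predict function over REST with Flask."""
from typing import Any, Sequence

from flask import Flask, jsonify, request


def create_app(model: Any, feature_order: Sequence[str]) -> Flask:
    """Build a Flask app whose /predict route returns the model's prediction."""
    app = Flask(__name__)

    @app.route("/predict", methods=["GET", "POST"])
    def predict():
        # Accept features as a JSON body (POST) or as query parameters (GET).
        params = request.get_json(silent=True) or request.args
        missing = [name for name in feature_order if name not in params]
        if missing:
            return jsonify(error="missing features", missing=missing), 400
        try:
            row = [[float(params[name]) for name in feature_order]]
        except (TypeError, ValueError):
            return jsonify(error="features must be numeric"), 400
        return jsonify(fraud=int(model.predict(row)[0]))

    return app
```

With a trained model loaded, `create_app(model, features).run(port=5000)` would serve predictions at a URL such as `http://localhost:5000/predict?V1=-1.2&Amount=2500`; returning HTTP 400 for missing or non-numeric features keeps malformed requests from reaching the model.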
Code flow:

