A cloud-native solution for processing, summarizing, and consolidating IoT traffic data from geographically distributed devices using AWS services.
This project implements a serverless architecture to process IoT traffic data collected from multiple enterprise branches. The solution automatically detects anomalies and bottlenecks in network traffic to guide infrastructure investment decisions.
- Scalable Processing: Serverless Lambda functions handle variable workloads without infrastructure management
- Resilient Pipeline: SQS queues ensure reliable message delivery and processing even during worker unavailability
- Efficient Storage: Minimal cloud storage footprint with automatic cleanup of processed files
- Real-time Processing: Immediate processing of uploaded IoT traffic files
- Consolidated Analytics: Aggregated traffic statistics with statistical measures (average, standard deviation)
The solution consists of three main components:
- Uploads IoT traffic CSV files from branch locations to S3
- Sends notification messages to SQS queue to trigger processing
- Runs locally or on branch infrastructure
- Triggered by SQS messages when new files are uploaded
- Reads CSV files from S3
- Summarizes traffic data per source-destination IP pair per day:
- Calculates total flow duration
- Counts total forward packets
- Sends summarized data to consolidation queue
- Processes summarized traffic from SQS queue
- Updates DynamoDB with consolidated statistics:
- Running totals and counts
- Average flow duration
- Average packet counts
- Standard deviation calculations
- Retrieves consolidated traffic data from DynamoDB
- Exports results to CSV file for analysis
- Runs locally to generate final reports
- S3: Raw IoT traffic file storage
- SQS: Message queues for decoupled processing
new-file-queue: Triggers summarizationconsolidate-queue: Triggers consolidation
- Lambda: Serverless compute for workers
- SummarizeWorker (Java 17)
- ConsolidatorWorker (Java 17)
- DynamoDB: Consolidated traffic statistics storage
- CloudWatch: Logging and monitoring
- AWS Account with appropriate IAM permissions
- AWS CLI configured with credentials
- Java 17 or later
- Maven 3.6+
- Bash shell
Before running any scripts, you must configure the AWS CLI with your AWS credentials. This is required for the solution to access your AWS account.
Run the following command to configure your AWS credentials:
aws configureYou will be prompted to enter:
- AWS Access Key ID: Your AWS access key (starts with
AKIA...) - AWS Secret Access Key: Your AWS secret key
- Default region: The AWS region where you want to deploy (e.g.,
us-east-1,eu-west-1) - Default output format: Leave as
json(or press Enter for default)
Example:
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json
To verify your AWS CLI is properly configured, run:
aws sts get-caller-identityThis should return your AWS account ID, user ARN, and username. If this command fails, your credentials are not properly configured.
- Never commit credentials to version control
- Keep your access keys secure - treat them like passwords
- Consider using IAM roles for production deployments instead of long-term access keys
- Rotate your access keys regularly
Download and extract the project
cd iot-traffic-processor-mainmvn clean packageThis creates a fat JAR (target/iot-traffic-processor.jar) containing all dependencies for Lambda deployment.
Run the full deployment script to set up all AWS infrastructure and deploy Lambda functions:
./full.shThis script performs a complete one-time setup:
- Creates a new S3 bucket for raw IoT traffic files
- Creates SQS queues (
new-file-queueandconsolidate-queue) - Creates DynamoDB table (
IoTTrafficConsolidated) - Builds the Java application
- Deploys SummarizeWorker Lambda function
- Deploys ConsolidatorWorker Lambda function
- Configures event source mappings to connect SQS queues to Lambda functions
Run this script only once at the beginning to initialize your infrastructure.
If you prefer to set up infrastructure manually:
./create-infra.shThen deploy Lambda functions individually:
./deploy-summarize.sh
./deploy-consolidator.shUse the upload script to upload IoT traffic files. Run this script every time you want to upload a new file:
./upload.sh path/to/your/iot-traffic-file.csvWhat this script does:
- Uploads the CSV file to S3 bucket
- Sends a notification message to the SQS queue
- Automatically triggers the SummarizeWorker Lambda function
- Initiates the complete processing pipeline
Example:
./upload.sh data-20221205.csvYou can upload multiple files sequentially. Each file will be processed independently through the pipeline.
Watch Lambda logs in real-time:
# Monitor SummarizeWorker
aws logs tail /aws/lambda/SummarizeWorker --follow
# Monitor ConsolidatorWorker
aws logs tail /aws/lambda/ConsolidatorWorker --followOr view in AWS Console:
- CloudWatch → Log Groups →
/aws/lambda/SummarizeWorker - CloudWatch → Log Groups →
/aws/lambda/ConsolidatorWorker
After processing completes (typically 1-2 minutes), export consolidated traffic data:
mvn exec:java -Dexec.mainClass=com.iotproject.ExportClientThis generates a CSV file with the consolidated traffic statistics:
SrcIP: Source IP addressDstIP: Destination IP addressDate: Traffic dateTotalDuration: Total flow duration for the dayTotalPackets: Total forward packets for the dayCount: Number of traffic records aggregatedAvgDuration: Average flow durationAvgPackets: Average forward packets
Branch IoT Device
↓
Upload Client (./upload.sh)
↓
S3 Bucket (raw/)
↓
SQS new-file-queue
↓
SummarizeWorker Lambda
↓
SQS consolidate-queue
↓
ConsolidatorWorker Lambda
↓
DynamoDB (IoTTrafficConsolidated)
↓
Export Client (mvn exec:java)
↓
CSV Report
IoT traffic CSV files should contain the following columns:
Src IP: Source IP addressDst IP: Destination IP addressFlow Duration: Duration of the traffic flowTot Fwd Pkt: Total forward packets
Example:
Src IP,Dst IP,Flow Duration,Tot Fwd Pkt
192.168.1.10,10.0.0.5,1500,45
192.168.1.11,10.0.0.6,2300,67
iot-traffic-processor-main/
├── src/main/java/com/iotproject/
│ ├── UploadClient.java # Branch upload application
│ ├── SummarizeWorker.java # Lambda: summarizes traffic
│ ├── ConsolidatorWorker.java # Lambda: consolidates statistics
│ ├── ExportClient.java # Export consolidated data
│ └── App.java # Utility class
├── pom.xml # Maven configuration
├── create-infra.sh # Infrastructure setup
├── full.sh # Complete deployment
├── deploy-summarize.sh # Deploy SummarizeWorker
├── deploy-consolidator.sh # Deploy ConsolidatorWorker
├── upload.sh # Upload IoT traffic files
├── bucket-name.txt # Saved S3 bucket name
├── region.txt # Saved AWS region
└── README.md # This file
The following environment variables are automatically set by deployment scripts:
IOT_BUCKET: S3 bucket name for raw traffic filesNEW_FILE_QUEUE_URL: SQS queue URL for new file notificationsCONSOLIDATE_QUEUE_URL: SQS queue URL for consolidation tasksAWS_REGION: AWS region for all resources
- Runtime: Java 17
- Memory: 512 MB (SummarizeWorker), 512 MB (ConsolidatorWorker)
- Timeout: 60 seconds
- Batch Size: 1 (SummarizeWorker), 10 (ConsolidatorWorker)
The solution ensures processing reliability through:
- SQS Message Queues: Decouples components and provides message persistence
- Visibility Timeout: Messages remain in queue if Lambda fails (600 seconds)
- Batch Item Failures: Failed messages are automatically retried
- Event Source Mapping: Lambda automatically polls SQS for new messages
- Processing Speed: Files are processed within seconds of upload
- Scalability: Lambda automatically scales to handle concurrent uploads
- Storage Optimization: Raw files can be deleted after summarization to minimize costs
- DynamoDB: Uses on-demand billing for variable workloads
-
Verify event source mapping exists:
aws lambda list-event-source-mappings --function-name SummarizeWorker
-
Check SQS queue has messages:
aws sqs receive-message --queue-url <queue-url>
-
Verify IAM role has necessary permissions
- Check CloudWatch logs for error messages
- Verify S3 bucket and DynamoDB table exist
- Confirm CSV file format matches expected schema
- Check Lambda memory and timeout settings
-
Wait at least 1-2 minutes for processing to complete
-
Verify DynamoDB table contains data:
aws dynamodb scan --table-name IoTTrafficConsolidated
-
Check for Lambda execution errors in CloudWatch
- AWS SDK v2: 2.25.41
- Apache Commons CSV: 1.10.0
- SLF4J: 2.0.9
- JUnit: 4.13.2 (testing)
- Maven Shade Plugin: 3.5.0 (fat JAR creation)
Run unit tests:
mvn testTo remove all AWS resources:
# Delete Lambda functions
aws lambda delete-function --function-name SummarizeWorker
aws lambda delete-function --function-name ConsolidatorWorker
# Delete SQS queues
aws sqs delete-queue --queue-url <new-file-queue-url>
aws sqs delete-queue --queue-url <consolidate-queue-url>
# Delete DynamoDB table
aws dynamodb delete-table --table-name IoTTrafficConsolidated
# Delete S3 bucket (empty it first)
aws s3 rm s3://<bucket-name> --recursive
aws s3 rb s3://<bucket-name>This project is provided as-is for educational purposes.
For issues or questions:
- Check CloudWatch logs for detailed error messages
- Verify AWS credentials and permissions
- Ensure all prerequisites are installed
- Review the troubleshooting section above