A multithreaded Java program for processing JSON files with Orders and generating statistics in XML format.
- Description
- Basic entities
- File examples
- Available attributes
- Threads Performance Benchmark Summary
- Project architecture
- Installation and Run
The program processes a set of JSON files with orders, counts occurrences of each value of the selected attribute, and generates an XML report sorted by count in descending order.
- ✅ Multithreaded file processing
- ✅ Support for 8 attributes for statistics
- ✅ Separation of tags by delimiters (,|;#)
- ✅ Sorting results in descending order
- ✅ Thread-safe processing with ConcurrentHashMap
- ✅ Input data validation
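
The tag separation listed above could be sketched as follows. This is a minimal illustration, not the project's actual code: the `TagSplitter` class and its `split` method are hypothetical names, and trimming whitespace and dropping empty tokens are assumptions about the intended behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are not taken from the project.
final class TagSplitter {
    // Character class covering the four delimiters: comma, pipe, semicolon, hash.
    private static final Pattern DELIMITERS = Pattern.compile("[,|;#]");

    private TagSplitter() {}

    // Splits a raw tags string into trimmed, non-empty tags (assumed behavior).
    static List<String> split(String tags) {
        if (tags == null || tags.isBlank()) {
            return List.of();
        }
        return Arrays.stream(DELIMITERS.split(tags))
                .map(String::trim)
                .filter(tag -> !tag.isEmpty())
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(split("gift, urgent|promo;vip#new"));
    }
}
```

With this approach an order tagged "gift, urgent" contributes one count to gift and one to urgent, which matches the tags report in the file examples.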
The main business entity of the system.
public class Order {
private String id;
private Customer customer;
private String status;
private String tags;
private String paymentMethod;
private double amount;
private long createdAt;
}
Contains the customer's data.
public class Customer {
private String id;
private String fullName;
private String email;
private String phone;
private String city;
}
Represents a single statistics record.
public class StatisticItem {
private String value; // Attribute value (e.g., “Lviv”)
private Integer count; // Number of occurrences
}
The root element of an XML document.
@JacksonXmlRootElement(localName = "statistics")
public class StatisticsWrapper {
@JacksonXmlElementWrapper(localName = "items")
@JacksonXmlProperty(localName = "item")
private List<StatisticItem> items; // List of all statistical records
}
Saves the program settings.
public class ApplicationConfig {
private String inputDirectory;
private String outputDirectory;
private String attribute;
private int threadPoolSize;
}
[
{
"id": "ord-001",
"customer": {
"id": "cust-101",
"fullName": "Vasyl Cotop",
"email": "vasyl@example.com",
"phone": "+380501112233",
"city": "Lviv"
},
"status": "NEW",
"tags": "gift, urgent",
"paymentMethod": "card",
"amount": 499.99,
"createdAt": 1731600000
},
{
"id": "ord-002",
"customer": {
"id": "cust-102",
"fullName": "Petro Poroh",
"email": "petro@example.com",
"phone": "+380501112233",
"city": "Kyiv"
},
"status": "DONE",
"tags": "gift",
"paymentMethod": "card",
"amount": 499.99,
"createdAt": 1731600000
},
{
"id": "ord-003",
"customer": {
"id": "cust-103",
"fullName": "Zelya Boba",
"email": "zelya@example.com",
"phone": "+380501112233",
"city": "Kyiv"
},
"status": "NEW",
"tags": "gift, urgent, newCustomer",
"paymentMethod": "card",
"amount": 499.99,
"createdAt": 1731600000
}
]
<?xml version='1.0' encoding='UTF-8'?>
<statistics>
<items>
<item>
<value>Kyiv</value>
<count>2</count>
</item>
<item>
<value>Lviv</value>
<count>1</count>
</item>
</items>
</statistics>
<?xml version='1.0' encoding='UTF-8'?>
<statistics>
<items>
<item>
<value>gift</value>
<count>3</count>
</item>
<item>
<value>urgent</value>
<count>2</count>
</item>
<item>
<value>newCustomer</value>
<count>1</count>
</item>
</items>
</statistics>
<?xml version='1.0' encoding='UTF-8'?>
<statistics>
<items>
<item>
<value>NEW</value>
<count>2</count>
</item>
<item>
<value>DONE</value>
<count>1</count>
</item>
</items>
</statistics>
| Attribute | Description | Example |
|---|---|---|
| id | Customer ID | cust-101 |
| status | Order status | NEW, DONE, CANCELED |
| tags | Tags (separated) | gift, urgent, promo |
| paymentMethod | Payment method | card, cash, PayPal |
| fullName | Customer fullName | Vasyl Cotop |
| email | Customer email | vasyl@example.com |
| phone | Customer phone | +380501112233 |
| city | Customer city | Lviv |
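
One way to turn an attribute name from the table above into a value read from an order is a lookup table of extractor functions. The sketch below is only an illustration under stated assumptions: `AttributeResolver` and `resolve` are hypothetical names, the records are simplified stand-ins for the project's `Order` and `Customer` classes, `id` is mapped to the customer ID as the table describes, and `IllegalArgumentException` stands in for the project's `InvalidAttributeException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; names and structure are not taken from the project.
final class AttributeResolver {
    // Simplified stand-ins for the project's model classes.
    record Customer(String id, String fullName, String email, String phone, String city) {}
    record Order(String id, Customer customer, String status, String tags, String paymentMethod) {}

    // Attribute name -> function that reads the matching value from an order.
    private static final Map<String, Function<Order, String>> EXTRACTORS = new LinkedHashMap<>();

    static {
        EXTRACTORS.put("id", order -> order.customer().id()); // customer ID, per the table
        EXTRACTORS.put("status", Order::status);
        EXTRACTORS.put("tags", Order::tags);
        EXTRACTORS.put("paymentMethod", Order::paymentMethod);
        EXTRACTORS.put("fullName", order -> order.customer().fullName());
        EXTRACTORS.put("email", order -> order.customer().email());
        EXTRACTORS.put("phone", order -> order.customer().phone());
        EXTRACTORS.put("city", order -> order.customer().city());
    }

    private AttributeResolver() {}

    // Returns the raw attribute value, or fails fast for an unsupported name.
    static String resolve(Order order, String attribute) {
        Function<Order, String> extractor = EXTRACTORS.get(attribute);
        if (extractor == null) {
            // Stand-in for the project's InvalidAttributeException.
            throw new IllegalArgumentException("Unsupported attribute: " + attribute);
        }
        return extractor.apply(order);
    }
}
```

A lookup table keeps the list of supported attributes in one place, so validating user input and extracting values share the same source of truth.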
The benchmark was executed on a synthetic dataset designed to simulate large-scale JSON processing:
- Files: 100 JSON files
- Records per file: 10,000
- Total records: 1,000,000
- Average file size: ~3.4 MB
- Total dataset size: ~340 MB
- Record structure: Each record represents an Order object containing nested Customer information. This structure includes multiple string fields, IDs, contact data, and nested objects, making it representative of real-world e-commerce/order-processing workloads.
The following table presents the execution time, memory consumption, and speedup ratio for processing a large JSON dataset using different thread counts:
| Threads | Time (s) | Mem (MB) | Speedup |
|---|---|---|---|
| 1 | 3.462 | 99.904 | 1.00x |
| 2 | 2.169 | 157.668 | 1.60x |
| 4 | 1.587 | 155.025 | 2.18x |
| 6 | 1.540 | 153.367 | 2.25x |
| 8 | 1.591 | 130.650 | 2.18x |
The highest speedup (2.25x) was achieved with 6 threads. Gains flatten beyond 4 threads and regress slightly at 8, which is consistent with CPU saturation on the 4 physical cores and increased thread contention.
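
The Speedup column can be reproduced from the Time column as the single-thread time divided by the time with N threads. The sketch below recomputes it from the table's values and also derives parallel efficiency (speedup divided by thread count), which is an extra metric not reported in the original table. `SpeedupTable` and its method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the Speedup column from the measured times.
final class SpeedupTable {
    private SpeedupTable() {}

    // Speedup relative to the single-thread baseline: T(1) / T(n).
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    // Parallel efficiency: speedup divided by the number of threads.
    static double efficiency(double baselineSeconds, double seconds, int threads) {
        return speedup(baselineSeconds, seconds) / threads;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4, 6, 8};
        double[] seconds = {3.462, 2.169, 1.587, 1.540, 1.591}; // Time (s) column
        for (int i = 0; i < threads.length; i++) {
            System.out.printf(Locale.ROOT, "%d threads: speedup %.2fx, efficiency %.0f%%%n",
                    threads[i],
                    speedup(seconds[0], seconds[i]),
                    100 * efficiency(seconds[0], seconds[i], threads[i]));
        }
    }
}
```

By this measure efficiency falls from roughly 55% at 4 threads to about 37% at 6, so the extra threads buy very little additional throughput.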
All benchmarks were executed on the following system configuration:
CPU: AMD Ryzen 5 3550H
Cores: 4
Threads: 8
Base Clock: 2.1 GHz
Boost Clock: 3.7 GHz
Cache: 2MB L2 + 4MB L3
MEM: 16GB DDR4-2400
DISK: NVMe SSD
OS: Linux (kernel 5.x)
JVM: OpenJDK 21
com.halmber
├── config
│   ├── ApplicationConfig            # Config
│   └── ConsoleInputHandler          # Console Input Handler
├── exception
│   └── InvalidAttributeException    # Exception for Invalid Attributes
├── factory
│   └── statistics
│       ├── StatisticItemFactory     # Factory for StatisticItem
│       └── StatisticsWrapperFactory # Factory for StatisticsWrapper
├── model
│   ├── Customer                     # Customer Model
│   ├── Order                        # Order Model
│   └── statistics
│       ├── StatisticItem            # Statistic Item
│       └── StatisticsWrapper        # XML Statistics Wrapper
└── service
    ├── FileService                  # File Management Service
    ├── JsonFileReader               # JSON Reading
    ├── XmlFileWriter                # XML sorted Writing
    └── order
        ├── ProcessingService        # Multi-threaded Processing
        ├── StatisticProcessor       # Statistic Processing
        └── StatisticsService        # Orchestrator
Input JSON Files
↓
FileService (list files)
↓
ProcessingService (ExecutorService with N threads)
↓
JsonFileReader (parse JSON)
↓
StatisticProcessor (aggregate statistics)
↓
ConcurrentHashMap (thread-safe storage)
↓
XmlFileWriter (sort + serialize)
↓
Output XML File
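
The core of the flow above (a fixed thread pool, a shared ConcurrentHashMap, and a final descending sort) might look like the sketch below. It is a simplified illustration rather than the project's implementation: `PipelineSketch` and `count` are hypothetical names, each inner list stands in for the attribute values already parsed from one JSON file, and file listing, JSON parsing, and XML serialization are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the aggregation step; not the project's actual classes.
final class PipelineSketch {
    private PipelineSketch() {}

    // Counts values from all "files" in parallel and returns entries sorted by count, descending.
    static List<Map.Entry<String, Integer>> count(List<List<String>> files, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> tasks = new ArrayList<>();
            for (List<String> file : files) {
                // One task per file; merge() performs an atomic per-key increment.
                tasks.add(CompletableFuture.runAsync(
                        () -> file.forEach(value -> counts.merge(value, 1, Integer::sum)), pool));
            }
            tasks.forEach(CompletableFuture::join); // wait until every file is processed
        } finally {
            pool.shutdown();
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(entry -> Map.entry(entry.getKey(), entry.getValue()))
                .toList();
    }

    public static void main(String[] args) {
        List<List<String>> files = List.of(List.of("Lviv", "Kyiv"), List.of("Kyiv"));
        System.out.println(count(files, 2));
    }
}
```

Because `merge` updates each key atomically, worker threads can share one map without explicit locks. The order of entries with equal counts is not defined in this sketch.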
- JDK 21+
- Maven 3.8+
- IntelliJ IDEA (optional, for running from IDE)
You can also run the project directly from IntelliJ IDEA:
- Open the project in IntelliJ IDEA.
- Make sure the correct JDK (21+) is selected.
- Open the Main class.
- Right-click on the class → Run 'Main.main()'.
All dependencies are automatically resolved by the IDE.
> mvn clean compile exec:java
This command compiles the project and runs it directly using Maven. All dependencies are automatically included.
mvn clean package
Note: This JAR does not include dependencies, so it cannot be run directly with java -jar because the project relies on external libraries.
java -jar target/java-core-profitsoft-internship-1.0-SNAPSHOT.jar
This will work only if a fat JAR that includes the external dependencies is built.
After launching the project, follow the instructions in the console.
====== Order Statistics Configuration ======
Available attributes: id | status | tags | paymentMethod | fullName | email | phone | city |
Enter input directory path (default: src/main/resources/):
Enter attribute name (default example: id): city
Enter threads pool size (default: 8): 8
Configuration set:
Input directory: src/main/resources/
Attribute: city
Output directory: src/main/resources/outputFiles
Threads count: 8
====== END Order Statistics Configuration END ======
mvn test
MIT License
halmber