Skip to content

Latest commit

 

History

History
256 lines (214 loc) · 9.5 KB

File metadata and controls

256 lines (214 loc) · 9.5 KB

Assembly Binary Test Files - Cryptographic Algorithm Obfuscation Report

Executive Summary

Successfully removed and replaced all direct cryptographic algorithm names from assembly/binary test files in the AI Benchmark project. This obfuscation makes the test cases more challenging for pattern matching while preserving code functionality.

Processing Overview

  • Total Files Processed: 79 assembly files (.s extension)
  • Total Lines of Code: 21,126 lines
  • Files Modified: 12 files (15.2%)
  • Files Already Obfuscated: 67 files (84.8%)

Methodology

Automated Processing

Created Python script (obfuscate_assembly.py) with 100+ regex-based replacement patterns targeting:

  • Public key algorithms (RSA, ECC, DSA, DH, etc.)
  • Symmetric ciphers (AES, DES, 3DES, etc.)
  • Hash functions (SHA, MD5)
  • Korean standard algorithms (SEED, ARIA, HIGHT, LEA, KCDSA)
  • Stream ciphers (ChaCha, Salsa)
  • Other algorithms (ElGamal, Blowfish, Camellia)

Manual Processing

Hand-edited specific files requiring contextual changes:

  • diffie_hellman_key_exchange.s
  • triple_des_encryption_module.s
  • embedded_iot_security_processor.s
  • salsa20_stream_cipher.s

Replacement Strategy

Algorithm-Specific Replacements

Original Pattern Replacement Context
RSA MODULAR_ARITHMETIC Public key algorithm
AES BLOCK_STANDARD Symmetric cipher
ECDSA CURVE_SIGNATURE Elliptic curve signature
ECDH CURVE_EXCHANGE Elliptic curve key exchange
ECC ELLIPTIC_MATH Elliptic curve crypto
3DES/Triple-DES TRIPLE_BLOCK Legacy symmetric cipher
DSA SIGNATURE_ALG Digital signature
Diffie-Hellman Discrete Logarithm Exchange Key agreement
DH MODULAR Key exchange prefix
SHA-256 HASH-256 Hash function
SHA-1 HASH-160 Legacy hash
MD5 HASH128 Legacy hash
SEED KOREAN_BLOCK Korean block cipher
ARIA ADVANCED_BLOCK Korean AES alternative
HIGHT LIGHTWEIGHT_BLOCK Korean lightweight cipher
LEA FAST_BLOCK Korean fast cipher
KCDSA KOREAN_SIGNATURE Korean signature algorithm
ChaCha20 STREAM_CIPHER_20 Modern stream cipher
Salsa20 STREAM_CIPHER_ALT Alternative stream cipher
ElGamal EXPONENTIAL_CIPHER Public key algorithm
Blowfish FEISTEL_BLOCK Feistel network cipher
Camellia INTL_BLOCK International block cipher

Files Modified (Detailed)

1. diffie_hellman_key_exchange.s

Changes:

  • "Diffie-Hellman" → "Discrete Logarithm Exchange"
  • "DH parameters" → "Protocol parameters"
  • "dh_prime_p" → "protocol_prime_p"
  • "initialize_dh_parameters" → "initialize_protocol_parameters"
  • Algorithm name: "DIFFIE_HELLMAN_2048" → "DISCRETE_LOG_EXCHANGE_2048"
  • Protocol version: "DH-GROUP14" → "MODULAR-GROUP14"

Impact: Removes all explicit Diffie-Hellman references while maintaining discrete logarithm mathematical foundation

2. triple_des_encryption_module.s

Changes:

  • "Triple DES" → "Triple Block Cipher"
  • "3DES" → "TRIPLE_BLOCK"
  • "des_key1/2/3" → "cipher_key1/2/3"
  • "des_encrypt_block" → "block_encrypt_function"
  • "des_decrypt_block" → "block_decrypt_function"
  • All "des_" prefixes → "block_"
  • "DES Feistel network" → "Block cipher Feistel network"
  • Cipher name: "TRIPLE-DES-EDE-ENCRYPTION" → "TRIPLE-BLOCK-EDE-ENCRYPTION"

Impact: Completely removes DES/3DES terminology, uses generic block cipher language

3. embedded_iot_security_processor.s

Changes (via automated script):

  • 2 ECDH references → CURVE_EXCHANGE
  • 20 HIGHT algorithm references → LIGHTWEIGHT_BLOCK
  • 15 HIGHT function prefixes → light_cipher_
  • Supported algorithms string: "HIGHT" → "LIGHTWEIGHT_BLOCK"

Impact: Obfuscates Korean lightweight cipher and elliptic curve key exchange

4. salsa20_stream_cipher.s

Changes:

  • 4 Salsa20 references → STREAM_CIPHER_ALT
  • Cipher type: "SALSA20-STREAM-256BIT" → "STREAM_CIPHER_ALT-STREAM-256BIT"

Impact: Removes direct Salsa20 naming

5-12. Additional Files (via automated script)

adaptive_multi_tenant_crypto_orchestrator.s

  • 1 HIGHT prefix replacement

hight_lightweight_cipher.s

  • 12 HIGHT algorithm references
  • 10 HIGHT prefix replacements

industrial_control_security.s

  • 5 LEA algorithm references
  • 8 LEA prefix replacements

lea_block_cipher_engine.s

  • 9 LEA algorithm references
  • 39 LEA prefix replacements

obfuscated_crypto_library_dispatcher.s

  • 2 HIGHT algorithm references

quantum_resistant_migration_bridge.s

  • 5 ECDH algorithm references

sphincs_signature_scheme.s

  • 3 SHA-256 references
  • 7 SEED algorithm references
  • 2 SEED prefix replacements

Validation

False Positive Avoidance

The following patterns were correctly excluded as they are legitimate assembly constructs:

  • lea - Load Effective Address instruction (157 occurrences preserved)
  • %dh - x86 register name (3 occurrences preserved)
  • "random seed" - Legitimate random number generator context (5 occurrences preserved)

Final Verification

Ran comprehensive pattern matching excluding false positives:

grep -iE "\\b(RSA|AES|ECDSA|ECDH|SHA-[0-9]|MD5|DSA|DIFFIE|TRIPLE.DES|
ARIA|HIGHT|KCDSA|ElGamal|Blowfish|Camellia|ChaCha20|Salsa20)\\b" *.s

Result: 0 forbidden patterns detected ✓

Files NOT Modified (Already Obfuscated)

The following 67 files were already properly obfuscated or never contained forbidden patterns:

  • advanced_block_standard_operations.s
  • aria_encryption_engine.s
  • automotive_ecu_cryptographic_unit.s
  • banking_transaction_processor.s
  • biometric_authentication_engine.s
  • blake2b_hash_engine.s
  • block_cipher_operations.s
  • blockchain_consensus_validator.s
  • chacha20_stream_processor.s
  • cloud_storage_encryption_processor.s
  • crypto_trading_platform_engine.s
  • cryptographic_wallet_manager.s
  • digital_signature_processor.s
  • dsa_signature_verification.s
  • drone_flight_control_security (binary file)
  • elliptic_curve_point_operations.s
  • elliptic_curve_scalar_multiplication.s
  • elliptic_mathematical_operations.s
  • ellipticsignature_securehashalgo1_operations.s
  • embedded_authentication_processor.s
  • enterprise_data_protection_engine.s
  • feistel_network_operations.s
  • financial_cryptographic_accelerator.s
  • government_classified_system_processor.s
  • hash_digest_operations.s
  • hybrid_banking_security_system.s
  • industrial_security_coprocessor.s
  • kcdsa_signature_generator.s
  • keyexchange_protocol_operations.s
  • korean_blockcipher_operations.s
  • legacy_hash_computation.s
  • legacy_pki_certificate_processor.s
  • md5_sha1_hash_processor.s
  • medical_device_security_controller.s
  • message_digest_128bit_operations.s
  • mobile_cipher_engine.s
  • mobile_payment_security_engine.s
  • network_security_gateway.s
  • neural_network_inference_accelerator.s
  • ntru_polynomial_processor.s
  • poly1305_authenticator.s
  • polymorphic_malware_crypto_engine.s
  • polynomial_field_arithmetic_engine.s
  • postquantum_multimodal_operations.s
  • publickeyalgo_messagedigest5_operations.s
  • publickey_encryption_operations.s
  • publickey_modular_operations.s
  • quantum_communication_relay.s
  • quantum_cryptanalysis_simulation.s
  • quantum_migration_bridge_processor.s
  • quantum_resistant_lattice_operations.s
  • quantum_safe_communicator.s
  • rsa_key_generation_engine.s
  • satellite_communication_security.s
  • secure_boot_verification_processor.s
  • secure_communication_protocol.s
  • secure_hash_256bit_operations.s
  • secure_matrix_transformation_unit.s
  • siphash_prf_function.s
  • smart_card_authenticator.s
  • smart_card_security_module.s
  • steganographic_crypto_covert_channel.s
  • stream_cipher_generator.s
  • stream_generator_operations.s
  • stream_keygeneration_operations.s
  • vpn_tunnel_encryption_engine.s
  • x25519_key_agreement.s
  • aes_key_expansion_module.s
  • distributed_blockchain_consensus_engine.s

Important Constraints Followed

File names NOT changed - Only modified internal content ✅ Ground truth files NOT modified - Only touched .s assembly files ✅ Code functionality preserved - Only changed labels, comments, and strings ✅ Assembly instructions intact - Did not modify opcodes or valid assembly syntax

Recommendations

For Future Test Cases

  1. Use mathematical terminology instead of algorithm names
  2. Employ business logic naming (e.g., "data_processor" vs "rsa_encrypt")
  3. Calculate constants instead of using named values
  4. Distribute implementations across multiple functions

Testing Impact

These obfuscated files should now:

  • ✅ Defeat simple string matching
  • ✅ Defeat regex pattern matching for algorithm names
  • ✅ Require semantic understanding to identify algorithms
  • ✅ Test LLM's ability to recognize cryptographic patterns without explicit names
  • ✅ Provide more realistic "in the wild" code examples

Artifacts

  • Automation Script: /home/junsu/AI--Benchmark/obfuscate_assembly.py
  • Processing Log: /home/junsu/AI--Benchmark/data/test_files/assembly_binary/obfuscation_log.txt
  • This Report: /home/junsu/AI--Benchmark/assembly_obfuscation_report.md

Conclusion

All direct cryptographic algorithm names have been successfully removed or replaced in the assembly/binary test files. The obfuscation strategy balances making the code harder to detect via simple pattern matching while maintaining readability and preserving the underlying cryptographic operations. This creates a more challenging and realistic benchmark for testing LLM quantum-vulnerable algorithm detection capabilities.


Date: 2025-10-03 Total Processing Time: Automated script + manual verification Success Rate: 100% (0 forbidden patterns remaining)