Adding generate module for complex grammars. #1
seems more appropriate for general audiences and future proofs the package api.
|
But it's not generation, it's conversion. All the information required is in the input. I wouldn't rename until there is a clear second case to generalize from. |
|
Fair enough, holding off on the merge until then. We can circle back at a later date. |
A multiply-trace augment, ^X, and a data-issue-chance augment, %dX. These can be combined with each other.
|
So I have made some additions to the form of the delimited traces. Each delimited trace can have some augments attached. Thoughts on this direction?
from koalas.generate import gen_log
# generate from lists
variant_a = ["a b e f || ^20"]
variant_b = ["a b e c d b f || ^30"]
# each generated trace could have a data issue
variant_c = ["a b c e d b f || ^20 %d25"]
variants = variant_a + variant_b + variant_c
log = gen_log(*variants)
print(log)
# show some __repr__
print(log.__repr__())
print(log.language().pop().__repr__())
print(log.directly_follow_relations().__repr__())
Which produces the following:
[<a,b,e,f>^20,<a,b,e,c,d,b,f>^30,<a,b,c,e,d,b,f>^16,<a,b,c,e,b,f>^1,<e,b,c,a,d,b,f>^1,<a,b,c,d,e,b,f>^1,<a,b,c,e,d,f,b>^1]
EventLog(
[Trace(['a','b','e','f'])] * 20+
[Trace(['a','b','e','c','d','b','f'])] * 30+
[Trace(['a','b','c','e','d','b','f'])] * 16+
[Trace(['a','b','c','e','b','f'])] * 1+
[Trace(['e','b','c','a','d','b','f'])] * 1+
[Trace(['a','b','c','d','e','b','f'])] * 1+
[Trace(['a','b','c','e','d','f','b'])] * 1
)
Trace(['a','b','c','e','d','f','b'])
FlowLanguage([
DirectlyFlowsPair(left='SOURCE',right='a',freq=69),
DirectlyFlowsPair(left='a',right='b',freq=69),
DirectlyFlowsPair(left='b',right='e',freq=50),
DirectlyFlowsPair(left='e',right='f',freq=20),
DirectlyFlowsPair(left='f',right='END',freq=69),
DirectlyFlowsPair(left='e',right='c',freq=30),
DirectlyFlowsPair(left='c',right='d',freq=31),
DirectlyFlowsPair(left='d',right='b',freq=47),
DirectlyFlowsPair(left='b',right='f',freq=49),
DirectlyFlowsPair(left='b',right='c',freq=20),
DirectlyFlowsPair(left='c',right='e',freq=18),
DirectlyFlowsPair(left='e',right='d',freq=17),
DirectlyFlowsPair(left='e',right='b',freq=3),
DirectlyFlowsPair(left='SOURCE',right='e',freq=1),
DirectlyFlowsPair(left='c',right='a',freq=1),
DirectlyFlowsPair(left='a',right='d',freq=1),
DirectlyFlowsPair(left='d',right='e',freq=1),
DirectlyFlowsPair(left='d',right='f',freq=1),
DirectlyFlowsPair(left='f',right='b',freq=1),
DirectlyFlowsPair(left='b',right='END',freq=1),
]) |
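The frequencies in the FlowLanguage output above can be cross-checked by hand. The following is a minimal sketch in plain Python, not the koalas implementation: the function name `directly_follows` and the (trace, frequency) tuple layout are hypothetical, and only the SOURCE and END markers are taken from the output itself.

```python
from collections import Counter


def directly_follows(variants):
    """Count directly-follows pairs over (trace, frequency) variants."""
    counts = Counter()
    for trace, freq in variants:
        # Pad each trace with the artificial start and end markers
        # that appear in the printed output.
        padded = ["SOURCE", *trace, "END"]
        for left, right in zip(padded, padded[1:]):
            counts[(left, right)] += freq
    return counts


# The seven variants from the printed log, with their multiplicities.
variants = [
    ("a b e f".split(), 20),
    ("a b e c d b f".split(), 30),
    ("a b c e d b f".split(), 16),
    ("a b c e b f".split(), 1),
    ("e b c a d b f".split(), 1),
    ("a b c d e b f".split(), 1),
    ("a b c e d f b".split(), 1),
]

pairs = directly_follows(variants)
print(pairs[("a", "b")])  # 69
print(pairs[("e", "b")])  # 3
```

Each pair is weighted by the multiplicity of the variant it occurs in, so rare variants introduced by the data issue roll show up as the pairs with a frequency of 1.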
|
It's neat, but I have some questions:
For the hat operator ^, why limit it to dtlogs? Seems like it would work on any koalas log?
I can sort of guess what the formatting parameter does, but I don't understand this example. Is it so you can hold particular types (e.g. ints) in the log object? Are you planning to mix types?
Why is gen_log() expecting varargs and not a list?
Are you expecting logs on the filesystem to look like this?
|
We have two modules for generating logs.
dtlog is a quick and simple way to generate simplified logs without any fuss, while generate offers alternative approaches with many options for generating traces and/or specifying the data that should be generated for events.
Features
dtlog to remain the same as before.
generate to offer more complex trace generation patterns.
generate.generate_log allows for the creation of a log using augmented patterns (see example 1)
generate.generate_from_grammar allows for the creation of a log using a grammar-based approach (see example 2)
Augmented patterns
Example 1
See the code snippet below for an example. The augmented patterns for multiplying a trace and rolling for a data issue are shown below. These can be combined to generate many traces, each with a roll for a data issue.
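To make the two augments concrete, here is a rough sketch of how a pattern such as "a b c e d b f || ^20 %d25" could be parsed and expanded. It is not the code from this pull request: `parse_pattern` and `expand` are hypothetical names, reading %dX as an X percent chance per generated trace is an assumption, and the adjacent swap is only a stand-in for whatever data issues the module actually introduces.

```python
import random
import re


def parse_pattern(pattern):
    """Split 'a b e f || ^20 %d25' into (events, copies, issue_chance)."""
    trace_part, _, augment_part = pattern.partition("||")
    events = trace_part.split()
    copies, issue_chance = 1, 0.0
    for token in augment_part.split():
        if m := re.fullmatch(r"\^(\d+)", token):
            copies = int(m.group(1))  # ^X: repeat the trace X times
        elif m := re.fullmatch(r"%d(\d+)", token):
            # %dX: assumed to mean an X percent chance of a data issue
            issue_chance = int(m.group(1)) / 100
        else:
            raise ValueError(f"unknown augment: {token!r}")
    return events, copies, issue_chance


def expand(pattern, rng):
    """Generate the traces for one pattern, rolling once per copy."""
    events, copies, issue_chance = parse_pattern(pattern)
    traces = []
    for _ in range(copies):
        trace = list(events)
        if len(trace) > 1 and rng.random() < issue_chance:
            # Stand-in data issue: swap two adjacent events.
            i = rng.randrange(len(trace) - 1)
            trace[i], trace[i + 1] = trace[i + 1], trace[i]
        traces.append(trace)
    return traces


rng = random.Random(0)
traces = expand("a b c e d b f || ^20 %d25", rng)
print(len(traces))
```

Because the roll happens once per generated copy, combining ^X with %dX yields X traces of which only some deviate from the base variant.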
Grammar-based approach
TODOs
Grammar
Example 2
See the example code snippet below for using the grammar to generate a log. This example has a system with two xor choices: the first has a discriminative cut on d1, and the latter has a somewhat discriminative cut on d4. Histograms are shown afterwards to showcase the effect of each shift on process attributes.
Histogram for d_1 after 'A'



Histogram for d_2 and d_3 after 'A'
Histogram for d4 after 'E'
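The difference between a discriminative and a somewhat discriminative cut can be illustrated without the grammar itself. The sketch below is purely hypothetical: the branch activities B, C, F and G, the normal distributions, and the shift sizes are all assumptions chosen for illustration, and none of it uses the generate module. It shifts d1 strongly and d4 weakly depending on the branch taken, then measures how well a threshold at 0 recovers the branch.

```python
import random


def simulate_trace(rng):
    """Return (trace, attributes) for one hypothetical case."""
    trace, attrs = ["A"], {}
    # First xor: a large shift in d1 separates the branches almost entirely.
    first = rng.choice(["B", "C"])
    attrs["d1"] = rng.gauss(-5.0 if first == "B" else 5.0, 1.0)
    trace += [first, "E"]
    # Second xor: a small shift in d4 leaves the two branches overlapping.
    second = rng.choice(["F", "G"])
    attrs["d4"] = rng.gauss(-0.5 if second == "F" else 0.5, 1.0)
    trace.append(second)
    return trace, attrs


def accuracy(cases, attr, position, low_branch):
    """Share of cases where thresholding attr at 0 recovers the branch."""
    hits = [
        (attrs[attr] < 0) == (trace[position] == low_branch)
        for trace, attrs in cases
    ]
    return sum(hits) / len(hits)


rng = random.Random(42)
cases = [simulate_trace(rng) for _ in range(2000)]
print(accuracy(cases, "d1", 1, "B"))  # close to 1: discriminative
print(accuracy(cases, "d4", 3, "F"))  # well below 1: somewhat discriminative
```

In histogram terms, the large shift gives two clearly separated peaks for d1, while the small shift gives two heavily overlapping peaks for d4.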