HOWTO: How can one re-use Generator definitions? #46

Open
canthony opened this issue Jun 15, 2021 · 9 comments

canthony commented Jun 15, 2021

How can one define a "basic" generator template in a .ben.xml file and re-use it in multiple scenarios within that same file?

As an example, I am generating data for 3 different classes of organisation, where address and bank account generation are the same, but other attributes are overridden (and new attributes are added).

Currently, I can only see cut-and-paste reuse as an option, or define a Generator class in Java - which is doable, of course, but it's not terribly clear how to map the XML elements/attributes to the constituent generator classes...

<generate type="buyer" count="{buyers_count}" consumer="buyers.csv">
  <variable name="bank_data" source="bank.ent.csv" distribution="random"/>
  <variable name="swift_suffix" pattern="\d{2}[A-Z]{3}"/>
  <variable name="cg" generator="CompanyNameGenerator" dataset="GB" locale="en_GB"/>
  
[…]

  <attribute name="addressLine1" script="ag.houseNumber + ' ' + ag.street"/>
  <attribute name="city" script="ag.city.name"/>
  <attribute name="postcode" script="ag.postalCode"/>
</generate>


<generate type="seller" count="{seller_count}" consumer="sellers.csv">
  <variable name="bank_data" source="bank.ent.csv" distribution="random"/>
  <variable name="swift_suffix" pattern="\d{2}[A-Z]{3}"/>
  <variable name="cg" generator="CompanyNameGenerator" dataset="GB" locale="en_GB"/>
  
[…]

  <attribute name="addressLine1" script="ag.houseNumber + ' ' + ag.street"/>
  <attribute name="city" script="ag.city.name"/>
  <attribute name="postcode" script="ag.postalCode"/>
</generate>

Ideally, one would be able to define a basic generator and reuse it as a source for a variable, e.g.

  <generate type="company" consumer="NoConsumer">
     <variable name="bank_data" source="bank.ent.csv" distribution="random"/>
     <variable name="swift_suffix" pattern="\d{2}[A-Z]{3}"/>
     <variable name="cg" generator="CompanyNameGenerator" dataset="GB" locale="en_GB"/>
  
[…]

     <attribute name="addressLine1" script="ag.houseNumber + ' ' + ag.street"/>
     <attribute name="city" script="ag.city.name"/>
     <attribute name="postcode" script="ag.postalCode"/>
   </generate>   
  
  <!-- or <generator name="company">...</generator> -->
  
  <generate type="seller" count="{seller_count}" consumer="sellers.csv">
     <variable name="c" source="company"/>

     <attribute name="addressLine1" script="c.addressLine1"/>
     <attribute name="city" script="c.city"/>
     <attribute name="postcode" script="c.postcode"/>
  </generate>
  
  <generate type="buyer" count="{buyer_count}" consumer="buyers.csv">
     <variable name="c" source="company"/>

     <attribute name="addressLine1" script="c.addressLine1"/>
     <attribute name="city" script="c.city"/>
     <attribute name="postcode" script="c.postcode"/>
  </generate>

   
@ake2l ake2l self-assigned this Jun 15, 2021
@ake2l ake2l added the question Further information is requested label Jun 15, 2021
Member

ake2l commented Jun 15, 2021

In general, copy and paste is always an option for the XML part. In your case, though, you can create a custom generator, as you already suggested. Take the PersonGenerator as an example and check what its return object looks like: the PersonGenerator returns a Person object.
You would create a BankAccount object as the return value and then build your generator along the same lines as PersonGenerator.
With that you should be nearly done.

Create a bean, for example:

<bean id="account_uk" class="BankAccountGenerator">
    <property name="country" value="uk"/>
</bean>
<bean id="account_us" class="BankAccountGenerator">
    <property name="country" value="us"/>
</bean>

Or you can set the properties inline like this:

<variable name="cg" generator="new BankAccountGenerator({country='us'})"/>

Or you may be able to reuse the existing XML attributes such as dataset="GB" or locale="en_GB" and read them in your BankAccountGenerator class to configure the generator.

Author

canthony commented Jun 15, 2021

What is the equivalent of

     <variable name="bank_data" source="bank.ent.csv" distribution="random"/>

to be used in the custom generator?

There's quite a lot of functionality tied up in <variable>, and it's not clear how to reproduce it.

Member

ake2l commented Jun 15, 2021

Actually, the PersonGenerator does exactly what you need: it also loads CSV files in the background and uses distribution functions to pick data from a list.

Member

ake2l commented Jun 15, 2021

Something important I should add: some generators are structured in datasets, and these datasets are reflected in CSV files. GivenNameGenerator is one example; such classes are children of WeightedDatasetCSVGenerator. You can find the related datasets that get loaded here: https://github.com/rapiddweller/rapiddweller-benerator-ce/tree/development/src/main/resources/com/rapiddweller/domain/person
In these files the first column is the value and the second column is the weight.

I hope this helps to make things clearer.
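As a rough illustration of the value/weight layout described above, here is a minimal plain-Java sketch of weighted selection. It is not Benerator's WeightedDatasetCSVGenerator and does not use the Benerator SPI; the class name WeightedPicker, the comma separator, and the rule that a missing weight counts as 1 are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not a Benerator class): picks values in proportion to their weights. */
class WeightedPicker {
    private final List<String> values = new ArrayList<>();
    private final List<Double> cumulative = new ArrayList<>();
    private final Random random;
    private double total = 0.0;

    /** Each line is expected to look like "value,weight"; the separator is an assumption. */
    WeightedPicker(List<String> csvLines, Random random) {
        this.random = random;
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split(",", 2);
            // Assumption of this sketch: a missing weight column means weight 1.
            double weight = cols.length > 1 ? Double.parseDouble(cols[1].trim()) : 1.0;
            if (weight <= 0) {
                continue; // values with no weight can never be picked
            }
            total += weight;
            values.add(cols[0].trim());
            cumulative.add(total); // running sum used for the lookup in next()
        }
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no weighted values found");
        }
    }

    /** Draws a point in [0, total) and returns the value whose cumulative range contains it. */
    String next() {
        double r = random.nextDouble() * total;
        for (int i = 0; i < cumulative.size(); i++) {
            if (r < cumulative.get(i)) {
                return values.get(i);
            }
        }
        return values.get(values.size() - 1); // guard against rounding at the upper edge
    }
}
```

With lines such as "Anna,3" and "Ben,1", next() should return the first value roughly three times as often as the second.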

Member

ake2l commented Jun 15, 2021

I would probably build a simple generator class, handle the distribution logic in that class using Java's random utilities, and, when it comes to using the generator inside my specs (.ben.xml), go for this approach:

<variable name="cg" generator="new BankAccountGenerator({country='us'})"/>
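A simple generator class of the kind suggested here might look roughly like the sketch below. It is plain Java and deliberately does not implement Benerator's Generator interface, so wiring it into a descriptor would still need the real SPI. The BankAccount fields, the 8-digit account number, and the default country are assumptions; the SWIFT suffix follows the `\d{2}[A-Z]{3}` pattern from the original descriptor.

```java
import java.util.Random;

/** Plain value object returned by the generator; the field names are illustrative. */
class BankAccount {
    private final String country;
    private final String accountNumber;
    private final String swiftSuffix;

    BankAccount(String country, String accountNumber, String swiftSuffix) {
        this.country = country;
        this.accountNumber = accountNumber;
        this.swiftSuffix = swiftSuffix;
    }

    public String getCountry() { return country; }
    public String getAccountNumber() { return accountNumber; }
    public String getSwiftSuffix() { return swiftSuffix; }
}

/** Hypothetical sketch; it does NOT implement Benerator's Generator interface. */
class BankAccountGenerator {
    private final Random random;
    private String country = "uk"; // assumed default for this example

    BankAccountGenerator() {
        this(new Random());
    }

    BankAccountGenerator(Random random) {
        this.random = random;
    }

    /** Bean-style setter so that a property such as country="us" could be injected. */
    public void setCountry(String country) {
        this.country = country;
    }

    /** Builds one account: 8 random digits plus a suffix shaped like \d{2}[A-Z]{3}. */
    public BankAccount generate() {
        return new BankAccount(country, randomChars(8, '0', 10),
                randomChars(2, '0', 10) + randomChars(3, 'A', 26));
    }

    /** Returns n characters drawn uniformly from the range starting at first. */
    private String randomChars(int n, char first, int rangeSize) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((char) (first + random.nextInt(rangeSize)));
        }
        return sb.toString();
    }
}
```

Returning a bean with getters mirrors the way the thread describes PersonGenerator returning a Person object, so a script could read individual properties from the result.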

Author

canthony commented Jun 15, 2021

Thanks - although PersonGenerator (and AddressGenerator and friends) are all weighted CSV generators: they have just one data column and a weight column.

My example is actually an entity CSV source (see the name bank.ent.csv). I need multiple columns, not just a single one.

(Sorry, not trying to be deliberately obtuse!)

Member

ake2l commented Jun 15, 2021

The PersonGenerator actually generates a Person object, which is why you can write:

<import domains="person"/>

<generate type="user" count="5" consumer="ConsoleExporter">
    <variable name="person" generator="PersonGenerator" dataset="FR" locale="fr" />
    <attribute name="salutation" source="person.salutation" />
    <attribute name="name" script="person.givenName + ' ' + person.familyName" />
</generate>

You get a Person object whose attributes you can access, for example with person.familyName.

My understanding is that this is the data structure you want to generate with a custom generator:

  <generate type="company" consumer="NoConsumer">
     <variable name="bank_data" source="bank.ent.csv" distribution="random"/>
     <variable name="swift_suffix" pattern="\d{2}[A-Z]{3}"/>
     <variable name="cg" generator="CompanyNameGenerator" dataset="GB" locale="en_GB"/>
  
[…]

     <attribute name="addressLine1" script="ag.houseNumber + ' ' + ag.street"/>
     <attribute name="city" script="ag.city.name"/>
     <attribute name="postcode" script="ag.postalCode"/>
   </generate>   

and access via

<variable name="customGen" generator="new BankAccountGenerator({country='us'})"/>
<attribute name="iban" script="customGen.iban"/>

Please tell me if I have misunderstood what you want to achieve.

Author

canthony commented Jun 15, 2021

My apologies, I probably am not being entirely clear!

I want to have a Company generator - generating a bank account, and a company name, and an address.
I want to be able to re-use the company generator for different types of company, and override some attributes.

The CompanyGenerator - if it is to be a Java Generator, which it sounds like it needs to be - would be like PersonGenerator, and delegate to lots of other Generators (address generators, company name generators, regex generators).
One of those generators would need to be able to generate an entity from "bank.ent.csv", randomly distributed. This is not a weighted CSV generator but, I think, something that selects random elements from a CSVEntitySource. I am absolutely sure that it is possible - behind the covers, that's what <variable name="bank_data" source="bank.ent.csv" distribution="random"/> is doing - but I can't see how!
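To make the delegation idea concrete, here is a minimal plain-Java sketch of a company generator assembled from smaller parts, where a variant can override individual parts. It is not based on Benerator's SPI: the class CompanyTemplate, its method names, and the use of java.util.function.Supplier as a stand-in for a delegate generator are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical composite: each attribute is produced by a Supplier standing in for a delegate generator. */
class CompanyTemplate {
    private final Map<String, Supplier<?>> parts = new LinkedHashMap<>();

    /** Registers or replaces the delegate that produces one attribute. */
    CompanyTemplate with(String attribute, Supplier<?> supplier) {
        parts.put(attribute, supplier);
        return this;
    }

    /** Copies the template so that a variant can override parts without touching the base. */
    CompanyTemplate copy() {
        CompanyTemplate variant = new CompanyTemplate();
        variant.parts.putAll(parts);
        return variant;
    }

    /** Asks every delegate for a value and collects the results as one entity. */
    Map<String, Object> generate() {
        Map<String, Object> entity = new LinkedHashMap<>();
        parts.forEach((name, supplier) -> entity.put(name, supplier.get()));
        return entity;
    }
}
```

A base template would register the shared parts (address, bank account, name), and a buyer variant would call copy() followed by with("name", ...) to swap in a different name generator while keeping everything else.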

That would lead to something like the following. I am not wildly happy with the repeated and duplicated
attribute definitions in the <generate> elements for buyer and seller, but I guess I can put up with it.

 <bean id="MyCompanyGenerator" class="MyCompanyGenerator"/>

<generate type="seller" consumer="sellers.csv">
   <variable name="company" generator="MyCompanyGenerator"/>

   <attribute name="name" script="company.name"/>
   <attribute name="addressLine1" script="company.addressLine1"/>
   <attribute name="postcode" script="company.postcode"/>     
      
   <attribute name="sortCode" script="company.bank.sortCode"/>
   <attribute name="accountNumber" script="company.bank.accountNumber"/>
   <attribute name="swift" script="company.bank.swift"/>
   <attribute name="iban" script="company.bank.iban"/>
 </generate>
 
  <generate type="buyer" consumer="buyers.csv">
   <variable name="company" generator="MyCompanyGenerator"/>
   <variable name="buyerNameGenerator" generator="BuyerNameGenerator"/>
   <!-- different name generator here -->
   <attribute name="name" script="buyerNameGenerator"/>
   <attribute name="addressLine1" script="company.addressLine1"/>
   <attribute name="postcode" script="company.postcode"/>     
      
   <attribute name="sortCode" script="company.bank.sortCode"/>
   <attribute name="accountNumber" script="company.bank.accountNumber"/>
   <attribute name="swift" script="company.bank.swift"/>
   <attribute name="iban" script="company.bank.iban"/>
 </generate>

In summary : I think the "difficulty" I am having is that the <variable> element is so powerful :
It can do pattern generation (which I think maps to the RegexString generator)
It can do value selection
It can read entities and select them randomly or weighted
It can do more!
And yet there isn't any documentation (or factory class or similar) that says "This feature here maps to this Generator class"

In this particular case, I think I want to know how to select/generate a random entity from a CSV file, and use it in a custom generator.
More widely, I think that the SPI and basic generators could do with more explanation.
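For the narrower question of selecting a random multi-column entity from a CSV file inside custom code, a minimal plain-Java sketch could look like this. It does not use Benerator's own CSV entity source classes, whose API is not shown in this thread. The class name RandomCsvEntityPicker is hypothetical, the file is assumed to have a header row, and the naive comma split does not handle quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper (not Benerator's CSV entity source): loads rows once, then picks uniformly. */
class RandomCsvEntityPicker {
    private final List<Map<String, String>> rows = new ArrayList<>();
    private final Random random;

    /** Assumes the first line is a header; splits on commas without quote handling. */
    RandomCsvEntityPicker(Reader csv, Random random) {
        this.random = random;
        try (BufferedReader reader = new BufferedReader(csv)) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                throw new IllegalArgumentException("CSV has no header row");
            }
            String[] header = headerLine.split(",");
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < header.length; i++) {
                    row.put(header[i].trim(), i < cells.length ? cells[i].trim() : "");
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("CSV has no data rows");
        }
    }

    /** Returns one whole row (column name to value), chosen uniformly at random. */
    Map<String, String> next() {
        return rows.get(random.nextInt(rows.size()));
    }
}
```

A custom company generator could hold one such picker and copy columns like sortCode and accountNumber from the returned row into its bank object, keeping the values of one row together.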

And finally, it would be good if you could reuse some template generator in the xml, and then override individual attributes.

Don't want much, do I? :-)

Member

ake2l commented Jun 15, 2021

Thank you for your input! I am going to think about how to realize this kind of template generator.
I have also put it on my list to extend the manual's <variable> section to describe more of the architecture behind it.

I can also recommend using JavaScript in the Benerator context: you can interact with Benerator objects through polyglot.
I started to implement this and it works pretty well.

For example, using the database connection to query data more dynamically:

<iterate type="test" source="{js:db.queryEntities('testEntity', getQuery(), null)}" consumer="ConsoleExporter">

or generating data by using JavaScript functions:

<variable name="company" script="{js:MyCompanyGenerator()}"/>

@ake2l ake2l added documentation Improvements or additions to documentation enhancement New feature or request labels Jun 17, 2021
@ake2l ake2l removed documentation Improvements or additions to documentation question Further information is requested labels Aug 6, 2021
@PeterBrinkhoff PeterBrinkhoff added this to the Release 2.1.0 milestone Sep 25, 2021