ModelSet

ModelSet is a labelled dataset of software models.

This repository contains:

- The ModelSet databases with the labelled datasets. See the Downloading ModelSet section for more information.
- The scripts to create the databases and generate the release. See the Building ModelSet section for more information.

You can find more information about ModelSet in the following Open Access paper: https://link.springer.com/article/10.1007%2Fs10270-021-00929-3.

Downloading ModelSet

To download ModelSet, follow these steps:

1. Download the latest release from https://github.com/modelset/modelset-dataset/releases
2. Unzip the package somewhere in your workspace.
3. The decompressed package has the following structure:
```
+ datasets
  + dataset.ecore/data/ecore.db
  + dataset.genmymodel/data/genmymodel.db
+ graph
+ raw-data
+ txt
```
3.1. The datasets folder contains the databases with the labelled models. The .db files are SQLite databases containing the information about the models.

The database schema includes a table called model with model data (i.e., unique identifier, source repository and filename) and a table called metadata with label data (i.e., unique identifier and a JSON object with the label information). The following figure illustrates the database schema:
```
+-------------------+     +---------------------------------+
|       model       |     |             metadata            |
+-------------------+     +---------------------------------+
| id : VARCHAR {PK} |     | id : VARCHAR {PK, FK(model.id)} |
| source : VARCHAR  |     | json : TEXT                     |
| filename : TEXT   |     +---------------------------------+
+-------------------+
```
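
As a minimal sketch of how the two tables fit together, the snippet below recreates the schema from the figure in a throwaway in-memory SQLite database, inserts one made-up row, and joins both tables to decode the label JSON. The table and column names are taken from the figure and should be checked against the actual .db file; the helper `load_labels`, the sample row, and the `category` key are hypothetical and only serve the illustration.

```python
import json
import sqlite3

# Schema as drawn in the figure above (assumption: the real .db files may use
# different table or column names, so inspect them before relying on this).
SCHEMA = """
CREATE TABLE model (
    id       VARCHAR PRIMARY KEY,
    source   VARCHAR,
    filename TEXT
);
CREATE TABLE metadata (
    id   VARCHAR PRIMARY KEY REFERENCES model(id),
    json TEXT
);
"""


def load_labels(conn):
    """Return {model id: (filename, decoded label object)} by joining both tables."""
    rows = conn.execute(
        "SELECT mo.id, mo.filename, mm.json "
        "FROM model mo JOIN metadata mm ON mo.id = mm.id"
    )
    return {model_id: (filename, json.loads(raw)) for model_id, filename, raw in rows}


# Throwaway in-memory database with one made-up row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO model VALUES (?, ?, ?)",
    ("example/1", "hypothetical-source", "example.ecore"),
)
conn.execute(
    "INSERT INTO metadata VALUES (?, ?)",
    ("example/1", json.dumps({"category": ["example"]})),
)
labels = load_labels(conn)
print(labels["example/1"])
```

To try this against a downloaded database, replace `":memory:"` with the path to the .db file and skip the schema creation and the two inserts.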
3.2. The graph folder contains the graph representation of the models.

3.3. The raw-data folder contains the models serialized in XMI.

3.4. The txt folder includes the strings of the models (e.g., to train simple NLP models).
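
After unzipping, a quick sanity check of the layout described above could look like the following sketch. The folder names come from the listing above; the helper `inspect_package` is hypothetical, and the demo builds a fake, deliberately incomplete package in a temporary directory rather than touching a real download.

```python
import tempfile
from pathlib import Path

# Top-level folders listed in the package structure above.
EXPECTED_FOLDERS = ("datasets", "graph", "raw-data", "txt")


def inspect_package(root):
    """Return (missing top-level folders, sorted .db paths relative to root)."""
    root = Path(root)
    missing = [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
    databases = sorted(
        p.relative_to(root).as_posix() for p in (root / "datasets").rglob("*.db")
    )
    return missing, databases


# Demo on a fake package: the txt folder is left out on purpose.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("graph", "raw-data"):
        (root / name).mkdir()
    for db in ("dataset.ecore/data/ecore.db", "dataset.genmymodel/data/genmymodel.db"):
        path = root / "datasets" / db
        path.parent.mkdir(parents=True)
        path.touch()
    missing, databases = inspect_package(root)

print("missing folders:", missing)
print("databases:", databases)
```

For a real download, call `inspect_package` with the directory where the release was unzipped.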

Querying ModelSet via Java JDBC

Once you have downloaded ModelSet, you can use JDBC to query the databases.

For instance, the following code joins the model and metadata information and prints the identifier and label metadata of each model:

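```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueryModelSet {
  public static void main(String[] args) throws SQLException {
    // Needs a SQLite JDBC driver on the classpath; check table/column names against your .db file.
    try (Connection dataset = DriverManager.getConnection("jdbc:sqlite:/path/to/dbfile");
         PreparedStatement stm = dataset.prepareStatement(
             "select mo.id, mo.filename, mm.metadata from models mo join metadata mm on mo.id = mm.id");
         ResultSet rs = stm.executeQuery()) {
      while (rs.next()) {
        String id = rs.getString(1);
        String filename = rs.getString(2);
        String metadata = rs.getString(3); // JSON string with the label information
        System.out.println(id + ": " + metadata);
      }
    }
  }
}
```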

Using ModelSet in Python

To use ModelSet in a typical Python/Jupyter setting, we recommend using the modelset-py Python library that we have developed. Visit the corresponding repository for more information.

Examples

We provide some examples of how to use ModelSet in the examples repository.

Building ModelSet

Note: these steps are only required if you want to create a new release of ModelSet. If you just want to use ModelSet, you can download the latest release (see the Downloading ModelSet section in this file).

To create the ModelSet release, you have to follow these steps:

1. Execute ./bin/download-data.sh to retrieve the model files, which will be stored in the raw-data folder. These files are not stored here because they are already published in existing GitHub repositories.
2. Execute ./bin/generate.sh to generate additional artifacts.
3. Execute ./bin/build.sh to build the ModelSet release package.

The ModelSet release package will have the name modelset.zip.
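
If you prefer to drive the build from Python, the three steps above can be chained as in the sketch below. It assumes the scripts are invoked from the repository root and that each one exits with a non-zero status on failure; the helper `build_release` and its `run` parameter are hypothetical and not part of the repository.

```python
import subprocess

# The three scripts named in the steps above, in the order they must run.
BUILD_STEPS = ("./bin/download-data.sh", "./bin/generate.sh", "./bin/build.sh")


def build_release(run=subprocess.run):
    """Run each build script in order, stopping at the first failure.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError as soon as a script exits with a non-zero status.
    The `run` parameter lets a stand-in runner be injected (e.g. for a dry run).
    """
    for script in BUILD_STEPS:
        run([script], check=True)
```

Calling `build_release()` from the repository root runs the scripts for real; passing a custom `run` callable allows a dry run that only records the commands.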

Citation

If you find this dataset useful, please consider citing its associated paper: https://link.springer.com/article/10.1007/s10270-021-00929-3

@article{lopez2021modelset,
  title   = {{ModelSet: a dataset for machine learning in model-driven engineering}},
  author  = {L{\'o}pez, Jos{\'e} Antonio Hern{\'a}ndez and 
            C{\'a}novas Izquierdo, Javier Luis and 
            Cuadrado, Jes{\'u}s S{\'a}nchez},
  journal = {Softw. Syst. Model.},
  volume  = {21},
  number  = {3},
  pages   = {967--986},
  year    = {2022},
  url     = {https://doi.org/10.1007/s10270-021-00929-3},
}

Contributing

We welcome contributions of all kinds, including extensions to the dataset, new empirical studies, and new features. If you want to contribute to ModelSet, please review our contribution guidelines and our governance model.

Note that we have a code of conduct that we expect project participants to adhere to. Please read it before contributing.

License

This dataset is licensed under the GNU Lesser General Public License v3.0.
