
Set up the Content Database

The Script SQL folder contains various files that help build the content database. To run them, go to the Database/Interactive SQL tab.

1. Set up the structure

If this is your first instantiation of the database, use the global script cdb_global_v2.sql.

If you are updating an existing database, the scripts you need can be found in each specific folder.

2. Static data

Some tables have to be filled in order for the project to work, such as:

Named entities
Modality

3. Statistics Explained Data

As before, if this is your first instantiation of the database, use the global script cdb_global_v2.sql. If it is an update, the scripts needed can be found in the Statistics Explained folder. Launch the scripts in the following order:

Once the structure is set, you can launch the following files to fill the modality tables:

Once the database is set, you can start launching the various spiders.

4. Eurostat glossary

Regarding the structure: if you used the cdb_global_v2.sql file, you can go straight to the data insertion part. If not, go to the Estat13k folder and launch the following scripts:

To gather the glossary, we did not scrape the data. Instead, we used the bulk download option and created SQL queries from it.

First, launch the modality queries (estat13k_modalities_data.sql).

Then launch estat13k_glossary_data.sql. To do so, use the following Jupyter Notebook: cdb_insert.ipynb

Finally, add the last queries: estat13k_stat_and_measurement_unit_data.sql
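The notebook cdb_insert.ipynb is not reproduced in this README. As a rough illustration only, the sketch below shows one way a large SQL file could be inserted statement by statement, with a commit after each batch. The function names (split_statements, run_in_batches), the batch size, and the use of sqlite3 as a stand-in connection are all assumptions made for this example; they are not taken from the notebook. The splitter assumes standard single-quoted SQL string literals and does not handle comments or backslash escapes.

```python
import sqlite3  # stand-in only; the actual target database is not specified here


def split_statements(sql_text):
    """Split a SQL script on ';', ignoring semicolons inside '...' literals.

    A doubled quote ('') inside a literal toggles the flag twice, so the
    in-string state is unchanged afterwards.
    """
    statements, current, in_string = [], [], False
    for ch in sql_text:
        if ch == "'":
            in_string = not in_string
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_in_batches(conn, sql_text, batch_size=500):
    """Execute every statement, committing after each group of `batch_size`.

    `conn` is any DB-API style connection (hypothetical here).
    Returns the number of statements executed.
    """
    statements = split_statements(sql_text)
    cursor = conn.cursor()
    for start in range(0, len(statements), batch_size):
        for stmt in statements[start:start + batch_size]:
            cursor.execute(stmt)
        conn.commit()
    return len(statements)


if __name__ == "__main__":
    # Hypothetical demo data, not the real glossary content.
    demo_sql = """
    CREATE TABLE glossary_demo (id INTEGER, definition TEXT);
    INSERT INTO glossary_demo VALUES (1, 'GDP; gross domestic product');
    INSERT INTO glossary_demo VALUES (2, 'It''s a unit');
    """
    demo_conn = sqlite3.connect(":memory:")
    print(run_in_batches(demo_conn, demo_sql, batch_size=2))
```

Committing per batch keeps each transaction small, which is one plausible reason to use a notebook rather than pasting a heavy file into an interactive SQL tab.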

5. CodeList and datasets

Regarding the structure: if you used the cdb_global_v2.sql file, you can go straight to the data insertion part. If not, go to the CodeList and datasets folder and launch the following script:

estat_codelist_datasets.sql

As before, we did not scrape the following data; we first downloaded the raw data and created SQL queries to fill the database.

The first step is to launch estat_codelist_label_data.sql. Then, using cdb_insert.ipynb, launch each file estat_dictionnary_code_batchX.sql, X=1,...,5.

At this stage, the codelists and codes are all in the content database. However, we found that some codes have to be added to the time dictionary for our work on the datasets to function. You will find the elements to add in the estat_dictionnary_code_data_time_addition.sql file.

Then you can add some datasets. Launch the estat_dataset_label_data.sql file first, and then estat_dataset_code_data.sql to create the links between datasets and codelists. If the last file is too heavy, the cdb_insert.ipynb notebook can be used.
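The order described in this section can be written down as a small checklist. The sketch below only builds the list of file names in the order given above and reports which ones are missing from a folder; it does not execute anything. The function names (section5_load_order, missing_scripts) and the folder path in the usage line are hypothetical, introduced for illustration.

```python
from pathlib import Path


def section5_load_order(used_global_script=True, n_batches=5):
    """Return the CodeList and datasets scripts in the order described above.

    If the structure was not created with cdb_global_v2.sql, the structure
    script estat_codelist_datasets.sql comes first.
    """
    order = [] if used_global_script else ["estat_codelist_datasets.sql"]
    order.append("estat_codelist_label_data.sql")
    # Batch files estat_dictionnary_code_batchX.sql, X = 1..n_batches.
    order += [f"estat_dictionnary_code_batch{x}.sql" for x in range(1, n_batches + 1)]
    order += [
        "estat_dictionnary_code_data_time_addition.sql",
        "estat_dataset_label_data.sql",
        "estat_dataset_code_data.sql",
    ]
    return order


def missing_scripts(folder, order):
    """List the scripts from `order` that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in order if not (folder / name).is_file()]


if __name__ == "__main__":
    # "CodeList and datasets" is assumed here to be a local folder path.
    for name in section5_load_order():
        print(name)
    print(missing_scripts("CodeList and datasets", section5_load_order()))
```

Checking for missing files before starting avoids a half-filled database when one of the batch files has not been downloaded.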

6. Taxonomy, Terminology, Topic Model

For each of these, the Script SQL folder contains a folder of the same name, which holds the structure of the needed tables.

7. The CDB schema

Please see CDB schema. File CDB_tables.docx briefly describes the main tables in the Content Database, i.e. those that were actually used in the Use Cases. Other tables which were left unused are not included in this description.
