
Knowledge Engineering Project 2021: SemScript

logo.png

1. Knowledge Engineering

1.1 Competency questions

In order to create the ontology, we started from the following competency questions:

  1. Which characters speak in German at least once?

  2. Which characters say the sentence x?

  3. What are the numbers of the scenes that take place in a New York taxicab at night?

  4. Which pages are occupied by scene number x?

  5. In which part of the day does scene number x take place?

  6. How many times is the word "Facebook" used?

  7. What is the starting page of scene x?

  8. How many times are two characters x and y in the same scene?

  9. Which movies are in the dataset?
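
As a hedged illustration, the first question could be answered by running a SPARQL query over one converted movie with the Jena APIs. The sketch below is only a sketch: the namespace and the property names (ss:says, ss:hasLanguage) are assumptions, not the actual terms of the ontology.

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class CompetencyQuestion1 {
        public static void main(String[] args) {
            // Load one movie already converted to Turtle (see the Data Linking section below).
            Model model = ModelFactory.createDefaultModel()
                    .read("src/main/resources/movies_in_rdf/the_social_network.ttl");

            // "Which characters speak in German at least once?" -- prefix and properties are hypothetical.
            String q = "PREFIX ss: <http://example.org/semscript#> "
                     + "SELECT DISTINCT ?character WHERE { "
                     + "  ?character a ss:Character ; ss:says ?sentence . "
                     + "  ?sentence ss:hasLanguage \"german\" . }";

            try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), model)) {
                ResultSetFormatter.out(System.out, qe.execSelect());
            }
        }
    }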

1.2 Ontology structure

The ontology is written in OWL and its structure is the following:

  • Movie
  • Character
  • ParticularScene
    • ActionScene
    • CharacterScene
    • DualDialogueScene
    • TransitionScene
  • Scene

Where Movie, Character, ParticularScene and Scene are the four main classes and ActionScene, CharacterScene, DualDialogueScene and TransitionScene are subclasses of ParticularScene.

A representation of the classes generated with OntoGraf:

ontology.png

Movie represents a movie. It has only the title property and is represented as a named graph.

Scene represents a main scene of the movie. A scene has properties describing the location and the time in which it takes place, as well as information about the pages it occupies in the screenplay.

Character represents a character of the movie, with its name and the sentences it says during the movie. Furthermore, it has an associated hasModifier property which provides information about the visibility of the character in a scene (see Screenplay).

Finally, ParticularScene is just a superclass of ActionScene, CharacterScene, DualDialogueScene and TransitionScene. The suffix -Scene should not be misinterpreted: in our model, a ParticularScene is the basic unit of a Scene, and it can be of 4 different types according to what happens in that unit (a dialogue, an action or a transition).
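
As a small sketch of how the ontology could be inspected programmatically with the Jena APIs, the snippet below loads SemScript.owl and lists its named classes; the file location (working directory) is an assumption.

    import org.apache.jena.ontology.OntClass;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.ontology.OntModelSpec;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.util.iterator.ExtendedIterator;

    public class ListOntologyClasses {
        public static void main(String[] args) {
            // Plain in-memory OWL model, no reasoner attached.
            OntModel ontology = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
            ontology.read("SemScript.owl"); // path relative to the working directory (assumption)

            // Prints the named classes: Movie, Character, Scene, ParticularScene and its subclasses.
            ExtendedIterator<OntClass> classes = ontology.listClasses();
            while (classes.hasNext()) {
                OntClass c = classes.next();
                if (c.getLocalName() != null) {
                    System.out.println(c.getLocalName());
                }
            }
        }
    }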

2. Data Linking

  1. The screenplays in PDF format can be downloaded from the following website;

  2. In order to obtain a format that can be easily translated to RDF, we used a PDF to JSON converter for screenplays from SMASH-CUT;

  3. RDF triples can be obtained from the JSON format through RML rules. There exists a more human-friendly language called YARRRML (a sub-language of YAML) which allows writing RML rules in a less tricky way. The online tool Matey has been used to test the YARRRML rules. The rules.yaml file contains the YARRRML rules, whereas the yarrrml-parser library can be used to convert this file into rules.rml. In order to do this, open a terminal and run the following command:

    yarrrml-parser -i rules.yaml -o rules.rml

  4. rmlmapper-4.9.0.jar, taken from the public repository, executes RML rules to generate Linked Data. It has been used to create the RDF triples from the JSON files (screenplays). The command is:

    java -jar rmlmapper-4.9.0.jar -m rules.rml -o SemScript/src/main/resources/movies_in_rdf/the_social_network.rdf

    With the above command, the rules.rml file is used to transform the the_social_network.json file (contained in /movies) into the the_social_network.rdf file. It should be remarked that rmlmapper outputs a file with ANSI encoding, while the Jena APIs work with UTF-8, so the file must be re-encoded. Moreover, the client class works with .ttl files, so the extension of the output file (the_social_network.rdf) has to be changed to .ttl. A sketch of this re-encoding and renaming step is shown right after this list.
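
A minimal sketch of the re-encoding and renaming step, assuming the ANSI output uses the windows-1252 code page (adjust the charset and paths to your setup):

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ReencodeOutput {
        public static void main(String[] args) throws Exception {
            // Output of rmlmapper (ANSI-encoded); windows-1252 is an assumed code page.
            Path rdf = Paths.get("SemScript/src/main/resources/movies_in_rdf/the_social_network.rdf");
            // The same content re-encoded as UTF-8 and saved with the .ttl extension for the client class.
            Path ttl = rdf.resolveSibling("the_social_network.ttl");

            String content = new String(Files.readAllBytes(rdf), Charset.forName("windows-1252"));
            Files.write(ttl, content.getBytes(StandardCharsets.UTF_8));
        }
    }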

Note: inside embedded_client_server_multiple_movies there is a server which can be launched; it creates a dataset (without using reasoners) with named graphs: each movie is a named graph and can be queried with these queries.
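
To illustrate how one movie's named graph could be queried in that dataset, a hedged sketch follows; the endpoint URL, dataset name and graph URI are all assumptions, not the actual identifiers used by that server.

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSetFormatter;

    public class NamedGraphQuerySketch {
        public static void main(String[] args) {
            // Endpoint, dataset name and graph URI are all assumptions; adapt them to the running server.
            String endpoint = "http://localhost:3030/movies";
            String q = "PREFIX ss: <http://example.org/semscript#> "
                     + "SELECT ?scene WHERE { "
                     + "  GRAPH <http://example.org/movies/the_social_network> { ?scene a ss:Scene } }";

            try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, q)) {
                ResultSetFormatter.out(System.out, qe.execSelect());
            }
        }
    }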

3. Semantic Web Application

The competency questions in the competency_questions.txt file have been converted to the SPARQL queries contained in the queries folder. They can be executed in different ways:

  1. run the EmbeddedServer class (namely a Fuseki server) with the proper name of the movie to query, and then run the ClientForEmbeddedServer class, which connects to the EmbeddedServer and runs the queries. Both classes use the Jena APIs. In particular, the EmbeddedServer class builds a Model for the ontology SemScript.owl and one for the data (for example, the_social_network.ttl) and uses the standard OWL reasoner of Jena to infer new data. The client connects to it on port 3031 (a custom value, since the Fuseki server usually runs on port 3030) and submits the queries (see the Jena sketch after this list).
  2. another way is to download the fuseki-server on your pc (download here) and then run it (on Windows, use the fuseki-server command from the CLI inside the downloaded directory). Now a browser can be used to connect to it (localhost:3030) and run the queries through the UI. A variant of this method is to run the Fuseki server in the same way but, instead of using the UI, run the queries with the outSPARQL plugin for IntelliJ (add a SPARQL endpoint in the plugin settings specifying the location, namely port 3030, then open a .rq file and run the query).
  3. a last method is to run the Fuseki server on your pc (maybe using this image), configure it as specified in 2), and then use the Jupyter notebook QueriesSPARQL.ipynb to run the queries.
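
A minimal sketch of how the EmbeddedServer/ClientForEmbeddedServer pair described in 1) could look with the Jena and Fuseki APIs; the file paths, the dataset name and the use of ReasonerRegistry.getOWLReasoner() are assumptions based on the description above, not the project's actual code.

    import org.apache.jena.fuseki.main.FusekiServer;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.reasoner.ReasonerRegistry;

    public class EmbeddedServerSketch {
        public static void main(String[] args) {
            // One model for the ontology and one for the movie data (paths are assumptions).
            Model ontology = ModelFactory.createDefaultModel().read("SemScript.owl");
            Model data = ModelFactory.createDefaultModel()
                    .read("src/main/resources/movies_in_rdf/the_social_network.ttl");

            // Wrap the union of the two models with Jena's standard OWL reasoner to infer new data.
            InfModel inferred = ModelFactory.createInfModel(
                    ReasonerRegistry.getOWLReasoner(), ontology.union(data));

            // Expose the inferred model on the custom port 3031 mentioned above.
            FusekiServer server = FusekiServer.create()
                    .port(3031)
                    .add("/the_social_network", DatasetFactory.wrap(inferred)) // dataset name is an assumption
                    .build();
            server.start();

            // Client side: submit a query to the endpoint (here just a sanity-check count).
            String endpoint = "http://localhost:3031/the_social_network";
            String q = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }";
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, q)) {
                ResultSetFormatter.out(System.out, qe.execSelect());
            }
            server.stop();
        }
    }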

Some last notes:

Tools
