Semantics and Object Property Databases #3210
strikaco started this conversation in General Discussion
Replies: 0 comments
I've been spending some time delving into Semantics and Linguistics, and reading the source code of every old DikuMUD I can find in an attempt to create an open-source version of Scribblenauts.
As best I can tell, the backend of Scribblenauts ("Objectnaut") is less a database and more an ontology, very similar to the Prototypes system we have. There is no direct inheritance tree, just a loose hierarchy of concepts that any downstream object can inherit from as needed. It's as if Tags ("Adjectives") were bundles of properties: assigning a Tag to something would imbue it with those properties, and removing the Tag would remove them. Prototypes don't quite work this way (and neither do Tags, unfortunately), but this is the general idea. Scripts come close, but I don't think that system scales well enough for something like this, so I'll probably end up serializing property sets in the Tag.data field or something, using category to namespace part-of-speech.
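Here is a rough sketch of the "Tag as a bundle of properties" idea in plain Python. It does not use Evennia's actual Tag or Prototype API; `TAG_BUNDLES`, `Thing` and the example adjectives are all made up for illustration, and the rule that a later tag overrides an earlier one on conflict is just one possible choice.

```python
# Hypothetical adjective bundles: each tag name maps to the properties it grants.
TAG_BUNDLES = {
    "wooden": {"flammable": True, "floats": True},
    "heavy": {"weight": 100, "floats": False},
}


class Thing:
    """An object whose properties are derived entirely from its tags."""

    def __init__(self, name):
        self.name = name
        self.tags = []  # insertion order decides which bundle wins on conflict

    def add_tag(self, tag):
        if tag not in self.tags:
            self.tags.append(tag)

    def remove_tag(self, tag):
        if tag in self.tags:
            self.tags.remove(tag)

    @property
    def properties(self):
        # Recomputed on every access, so removing a tag removes its properties.
        merged = {}
        for tag in self.tags:
            merged.update(TAG_BUNDLES.get(tag, {}))
        return merged


crate = Thing("crate")
crate.add_tag("wooden")
crate.add_tag("heavy")      # "heavy" overrides "floats" from "wooden"
crate.remove_tag("heavy")   # properties fall back to the "wooden" bundle
```

Because nothing is copied onto the object, there is no cleanup to do when a tag goes away; the cost is recomputing the merge on each lookup, which may or may not matter at scale.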
The details of my research are boring, but here are a few things I discovered that might be of interest to others:
Semantic Feature Production Norms Dataset
https://huggingface.co/datasets/metaeval/semantic-feature-production-norms/viewer/metaeval--semantic-feature-production-norms/train
What the hell is this?
What sort of Tags might an Ambulance need?
I didn't come up with these; this dataset helpfully includes them. This downloadable dataset will give you a list of colors, components and other features of thousands of common objects. Anyone could easily parse this into Prototype dicts if they so desired. I'd do it myself, but I'm not sold on expressing some things (like number of wheels) as a string, so some discretion is needed in deciding whether to add such features as attributes or tags.
(Personally, I'm leaning toward loading this into some sort of graph or dictionary, then using it for lookups as needed instead of directly as a Prototype. Then implementation of properties as tags or attributes is left up to the dev, and the library could be used from Typeclasses too.)
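To make the lookup idea concrete, here is a minimal sketch that sorts features into tags and attributes. The rows below are hand-written stand-ins, not actual records from the dataset, and I'm assuming (concept, feature) pairs with underscore-separated feature strings; the real column layout may differ. `build_lookup` and the `has_<number>_<noun>` rule are hypothetical.

```python
import re
from collections import defaultdict

# Hand-written sample rows (NOT real dataset records); assumed shape: (concept, feature).
ROWS = [
    ("ambulance", "has_4_wheels"),
    ("ambulance", "is_white"),
    ("ambulance", "has_a_siren"),
    ("ambulance", "used_for_emergencies"),
]

# Features like "has_4_wheels" carry a number, so they become typed attributes.
NUMERIC = re.compile(r"^has_(\d+)_(\w+)$")


def build_lookup(rows):
    """Group features by concept, splitting numeric ones out as attributes."""
    lookup = defaultdict(lambda: {"tags": set(), "attributes": {}})
    for concept, feature in rows:
        match = NUMERIC.match(feature)
        if match:
            count, noun = match.groups()
            lookup[concept]["attributes"][noun] = int(count)
        else:
            lookup[concept]["tags"].add(feature)
    return dict(lookup)


lookup = build_lookup(ROWS)
ambulance = lookup["ambulance"]
# ambulance["attributes"] holds the wheel count as an int;
# ambulance["tags"] holds the remaining features as plain strings.
```

Keeping this as a standalone dictionary means a Prototype, a Typeclass or anything else can query it and decide for itself how to store the result.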
I have no idea what the licensing status of this is so copy at your own risk.
ConceptNet
https://conceptnet.io/
What is this?
"ConceptNet is a freely-available semantic network, designed to help computers understand the meanings of words that people use."
It's like a multilingual version of the above, with even more data. There is no easy CSV dump of the data that I could find, but you can compile the dataset and run an instance of your own if you wish. Or use the web UI/API.
Using Ambulance as an example again:
https://conceptnet.io/c/en/ambulance
https://api.conceptnet.io/c/en/ambulance
It'll show you things you can do with an Ambulance, related terms, etc. The API is a bit hairy (JSON-LD format) but maybe someone with more patience can achieve something with it.
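For anyone who wants to poke at the API, here is a small sketch that flattens a response into relation groups. It assumes the response has a top-level `edges` list whose items carry `rel`, `start` and `end` objects with a `label`, plus a `weight`, which is how the JSON-LD looked to me; treat those field names as assumptions and check them against a live response. `fetch_concept` and `group_edges` are my own helper names, and `SAMPLE` is hand-written, not captured API output.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_concept(term, lang="en"):
    """Fetch the JSON-LD document for a term. Makes a network call; not run here."""
    url = f"https://api.conceptnet.io/c/{lang}/{term}"
    with urlopen(url) as response:
        return json.load(response)


def group_edges(document):
    """Return {relation label: [(start label, end label, weight), ...]}."""
    grouped = defaultdict(list)
    for edge in document.get("edges", []):
        relation = edge["rel"]["label"]
        start = edge["start"]["label"]
        end = edge["end"]["label"]
        grouped[relation].append((start, end, edge.get("weight", 1.0)))
    return dict(grouped)


# Hand-written stand-in for a response, trimmed to the fields used above.
SAMPLE = {
    "edges": [
        {"rel": {"label": "UsedFor"},
         "start": {"label": "ambulance"},
         "end": {"label": "transporting patients"},
         "weight": 2.0},
        {"rel": {"label": "IsA"},
         "start": {"label": "ambulance"},
         "end": {"label": "vehicle"},
         "weight": 1.0},
    ]
}

grouped = group_edges(SAMPLE)
# Against the live API you would call: group_edges(fetch_concept("ambulance"))
```

Grouping by relation makes it easy to map something like `UsedFor` onto verbs or commands and `IsA` onto parent concepts, though how those relations should translate into game mechanics is an open question.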
Semantic Domains
https://semdom.org/v4/1
What the hell is this?
Ever get to thinking what sorts of things you might need to model in a game? Especially one involving different languages, and the words needed to express concepts across them?
You need something like a Swadesh list: a list of universal concepts that every language has some notion of. The common versions run to roughly 100-200 words (man, woman, animal, me, you, disease, etc.).
Semantic Domains is that concept, but extended for professional use. Use it like a checklist of objects and concepts to consider implementing, and how they might be categorized/classified in relation to each other. How do you handle weather? Rain? Wind? The concept of Religion?
This one is way more abstract than the above two. The goal here is to look at the different domains and plan ahead for how you want to model things like Emotions or Marriage Proposals.
schema.org
https://schema.org/docs/full.html
Honorable mention; not as good as Semantic Domains due to mostly being limited to web-related terms, but this could also be a strength for a MUD based around modern/digital/cyberpunk settings. Provides a fairly standardized set of attributes for most digital concepts.