Diary and Other Texts

A diary of research results, plus an assortment of other theoretical and experimental texts, can be found here, listed in order of importance and relevance.

  • Grammar Induction -- AGI 2022 theoretical paper on how to induce grammatical structure from unstructured sensory data. (10 pages.) Matching slides and transcript for the talk. This repeats much of the talk given at INLP-2021, Explainable Patterns, which was very thinly attended.

  • Grammar Experiments -- INLP-2022 report on experimental results from the above research program. (10 pages.) Matching slides and transcript.

  • Neural-Net vs. Symbolic Machine Learning -- (2018) An attempt to build a bridge between these two styles of machine learning. The ideas in this paper are foundational for the work being performed in this git repo. (75 pages.)

  • Stitching -- (2018) An explanation of how word-vectors can be "stitched together" in a non-linear fashion, so as to extract grammatical (syntactic) as well as semantic content in the vectors. Describes one stage of the processing pipeline implemented in this git repo. (13 pages.)

  • The Distribution of English Language Word Pairs -- (2009) A report on the statistical distribution of word-pairs for English. These experimental results provided the foundational impetus for this project. (13 pages.)

  • Word Pair Distributions -- (2015-2019) A more formal, more complete version of the above. It extracts results and graphs from the diary, placing them in a single paper, in a more-or-less coherent presentation. Attempts to provide "the last word" on this topic, from the point of view of this project. (31 pages.)

  • Connector Set Distributions -- (2017) Report of experimental results characterizing the statistical properties and distribution of disjuncts ("jigsaw pieces") and their connectors, extracted from text corpora of varying sizes. (55 pages.)

  • Meaning as Inverse Interpretation -- (2019) In model theory, an "interpretation" provides the "semantic" content of a "syntactic" model. This essay argues that humans find "meaning" in the inverse direction: by extracting the syntactic structure from the jumbled, disordered stream of sensory input. (6 pages.)

  • Messaging -- (2021) A loose jumble of ideas connecting communications, grammar, and the statistical mechanics of Link Grammar. Attempts to write down a partition function. (9 pages.)

  • Grammar Evaluation Results -- (2019) Evaluation of the Link Grammar dictionaries obtained from the ULL/Kolonin corpora and learning experiments. (31 pages.)

  • Learning a Lexis -- (2019) Outline of an unwritten paper that was going to describe this project as a whole. (5 pages.)

  • Drafts -- (2015-2017) A directory containing assorted short notes and rough drafts. Some of these are extracts from the diary that were requested during specific conversations.

  • Reading List -- A bibliography of notable papers that provide background and inspiration for the research being done here.

  • Entropy -- Collection of basic definitions and notation used elsewhere in these texts. (10 pages.)

The Diary Itself

The actual diary. All the good stuff is in here. Most of these files include a short executive summary of their contents.

Data and Figures

The following directories contain figures, illustrations, data, and graphs supporting the various texts. See the diary for a review and explanation of what the datasets are illustrating.

Tools

Scheme scripts used to process data and create graphs. See the diary for a review and explanation of how and when to use these tools. They are very specific to reporting results for the diary, and are not otherwise used in the data processing pipeline.