benvcutilli/InText

Bibtex and/or Biblatex (both can be found at [23177a][4e47d2]) are systems that work with
LaTeX[864888] to let authors cite within their papers without too much boilerplate work, in
addition to other features (although I believe the two are not that separate in their citing
functionality, in that a lot of it is already built into LaTeX). The goal of InText is to achieve
this outside the PDF realm, in the plaintext realm (I forget what told me that I couldn't do this
with [23177a][864888][4e47d2]). InText therefore follows [23177a][864888][4e47d2] in these ways:
    1. The user puts all reference data into a single plaintext file (read by [23177a][4e47d2];
       InText uses [448ee7][032ace] format). The program works from this file (i) to create the
       in-text citations and (ii) as the file to which those in-text citations point the user for
       information about the cited references (what differs from [23177a][864888][4e47d2] is
       that the bibliography doesn't end up in the same file as the rest of the work). In this
       file, each reference is given a key whose value is the information for that reference; the
       information can be split into as many keys and values as needed (although
       [23177a][864888][4e47d2] tends, in my experience, to require specific keys -- such as
       "title" and "author" -- and specific formats for values; InText instead shows all
       information as-is, including keys).
          a. Each reference is used in in-text citations via a marker identifier in the file. The
             identifier contains no spacing (InText also restricts it to letters and numbers,
             thereby choosing a subset of what [23177a][4e47d2] allows; it can only be
             ASCII[0f35a8] because this is the least likely to cause issues for programming
             language parsers). To show that an identifier is being used, it is surrounded by
             unique text ("delimiters") that indicates this. The whole package is called a
             "marker" (our terminology, not theirs as far as I know, but "\cite[]{}" of
             [4e47d2, 3.9.1][23177a] did give us the idea of some kind of marker). Markers are
             placed where the in-text citations should end up so that InText (and Bibtex and
             LaTeX) knows which references need to be cited there. In InText, one marker's key
             can refer to multiple references at one point in a file; in the file described by
             (1), this is achieved by stating, in the "markers" dictionary ([23177a][4e47d2]
             doesn't have this), which marker key (which is alphanumeric, probably to be similar
             to other types of keys in this program) relates to which list of references. In
             [23177a][864888][4e47d2], by contrast, the marker identifiers are the reference keys
             found in the .bib file, and the user lists each key in the marker itself.
          b. Markers support locators if needed; however, InText can take multiple locators, one
             for each reference used in the marker, while, as far as I can tell,
             [4e47d2, 3.9.1][23177a] can't. InText also (probably uniquely) allows multiple
             locators for the same reference in one marker, simply by having the marker refer to
             the reference twice, one of which carries the new locator. It follows that using a
             reference multiple times per marker is allowed as well, whereas in
             [4e47d2, 3.9.1][23177a] it may not be.
    2. The program doesn't modify the original files; a program that did so isn't really feasible,
       as it would likely lead to too much confusion (although, now that I think about it, I may
       have just thought of a way to do it; we'll see).
    3. Bibliographies can contain citations too, and the bibliography itself is written by the
       program. However, the citations may only appear within each object under "references",
       which is on par with [23177a][864888][4e47d2]. What isn't on par is that
       [23177a][864888][4e47d2], I think, doesn't allow this within any of the labels contained
       in the reference entry, whereas InText does. That same exception goes for
       [d3fda1][548611].
    4. [23177a][864888][4e47d2] puts everything into a single file, whereas InText effectively
       copies the original files over and puts in-text citations in the copies.
    5. All input (the .tex-[864888] and .bib-file[23177a][4e47d2] equivalents) is
       plaintext.
    6. If a reference isn't cited, it doesn't end up mentioned in the "references" file.
    7. The labels shown in the bibliography for each reference are generated by
       SHA-256[97f69a, page 21]. The input to that function is the data supplied by the user for
       that reference. Hashing keeps the labels consistent, which avoids Git[29ad7a] or
       something else picking up spurious changes due to randomly generated keys changing all the
       time or what-have-you. SHA-256 was chosen because SHA-1[97f69a, page 18] is known to have
       collision issues (though very minor, and I forget my reference for all of this). The final
       in-text citations show these hashes (but shortened, as inspired[29ad7a] by how Git does
       this sometimes, I think) so that the reader knows which references are being cited.
    8. I suppose I should credit both [23177a][864888][4e47d2] and maybe [d3fda1][548611] for giving
       me the idea to create a program that does this whole
       in-text-citation-thing-with-a-list-of-references/cite-like-you're-writing-a-paper thing, but
       they didn't come up with it themselves with the exception of it being a computer program
       instead of someone manually doing everything.
This kind of behavior is also intended to be very [d3fda1][548611]-like, except that that program
is a WYSIWYG editor, whereas we assume we are just working with plaintext. Further,
[d3fda1][548611] is only similar in that the in-text citations (with the same properties as those
found in [23177a][864888][4e47d2]) and the bibliography are created and formatted by the program.
I am not quite sure how configuration was done, such as how you put the in-text citation locations
into the file, or what information goes into the bibliography and in what way. I think both were
similar to [23177a][864888][4e47d2], with two caveats. For the latter, Office -- to the best of my
knowledge -- presents you with a window to put all the information into fields, but the field
titles themselves cannot be changed (or can only be minimally changed). For the former, I'm not
quite sure how the user picks which reference needs to be cited at that location. Locators are also
possible in Office. Also, Office probably doesn't require a full pass over all the files each time
you want to add an in-text citation or create a bibliography, and all citations are visible in the
file you are editing; no new files are created (the WYSIWYG part). Finally, I am not sure whether
you can cite within the bibliography itself.
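
To make 1a and 1b concrete, here is a minimal sketch of how a "markers" dictionary could map one
marker key to several references, each with its own optional locator. The "references" and
"markers" names come from the description above; the "reference" and "locator" field names, the
example keys and values, and resolve_marker() are assumptions made for illustration, not InText's
actual file format or code.

```python
import json

# Hypothetical reference file. Only "references" and "markers" are named in
# this README; the shape of each marker entry is an assumption.
REFERENCE_FILE = """
{
    "references": {
        "refA": {"title": "Example title A", "author": "Example author A"},
        "refB": {"title": "Example title B"}
    },
    "markers": {
        "intro1": [
            {"reference": "refA", "locator": "page 10"},
            {"reference": "refA", "locator": "page 42"},
            {"reference": "refB"}
        ]
    }
}
"""


def resolve_marker(data, marker_key):
    """Return (reference key, reference info, locator or None) for each entry."""
    # 1a: identifiers are restricted to ASCII letters and numbers.
    if not (marker_key.isascii() and marker_key.isalnum()):
        raise ValueError(f"marker key must be ASCII letters and digits: {marker_key!r}")
    resolved = []
    for entry in data["markers"][marker_key]:
        key = entry["reference"]
        # 1b: the locator is optional and belongs to this one entry, so the
        # same reference can appear twice with different locators.
        resolved.append((key, data["references"][key], entry.get("locator")))
    return resolved


data = json.loads(REFERENCE_FILE)
citations = resolve_marker(data, "intro1")
for key, info, locator in citations:
    print(key, locator, info)
```

Listing "refA" twice is what gives the single marker two locators for the same reference, which is
the behavior 1b describes.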

We take a page from Git[29ad7a] as well (a strong possibility at least; I don't remember the
moment of inspiration), but we invert it, by
    9. requiring that all non-reader-facing files (in our case, the ones with the markers) be within
       a folder[cbdcd9, "gitfile" definition] so that nobody sees them. Instead, readers are to
       look at the files with citations in them, whose filesystem tree structure matches that of
       the interior folder, except that they sit outside that folder, on the same level as it.
       Therefore, the "commits" are actually not within that folder, while the working tree IS
       there, which is the opposite of what Git does. Hence, "inverted". This also avoids an
       issue I previously had, where I tried to solve this by giving the path of each file a
       suffix that InText would recognize. The issue was that this would cause code editors like
       [f10e01] to think the files were in an unrecognized language, which changes how the text
       is colored (very inconvenient).
   10. To detect users accidentally modifying the output files made by InText, each of those files
       is hashed (using [97f69a, page 21]) to figure out whether it has changed since the last time
       InText was run. This is similar to Git in that, I think, Git hashes are generated based on
       the file content, so a different hash implies a different state of the repository.
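
The check in 10 can be sketched roughly as follows, assuming the hash of each output file is
recorded after a run and compared on the next one. Where InText actually keeps those hashes is not
stated here, so the "recorded" dictionary, the file name, and both function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def modified_outputs(recorded):
    """Return the paths that are missing or whose hash differs from the record."""
    return [
        path
        for path, digest in recorded.items()
        if not Path(path).exists() or sha256_of_file(path) != digest
    ]


with tempfile.TemporaryDirectory() as folder:
    output = Path(folder) / "chapter.txt"  # hypothetical output file
    output.write_text("text with a citation [abc123]\n", encoding="utf-8")

    # Record the hash as if a run had just finished.
    recorded = {str(output): sha256_of_file(output)}
    unchanged = modified_outputs(recorded)

    # Simulate someone editing the output by hand.
    output.write_text("edited by hand\n", encoding="utf-8")
    changed = modified_outputs(recorded)

print(unchanged, changed)
```

Comparing digests rather than timestamps means that rewriting a file with identical content does
not count as a modification.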

11. The order in which references appear in the bibliography is really just determined by how
    [9c4e1f] reads the file that contains their information. The same goes for the order of the
    information within each reference. It might be exactly the same as the order in the file;
    I'm not sure.
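
As a small check of the ordering question in 11: if the file is read with Python's standard json
module (an assumption here; the text above only cites [9c4e1f]), objects are loaded into dicts,
which keep the order in which keys appear in the file. The keys and values below are made up.

```python
import json

# Hypothetical reference data, deliberately not in alphabetical order.
text = '{"references": {"zeta": {"title": "T", "author": "A"}, "alpha": {"year": "Y"}}}'

data = json.loads(text)

# Iterating a dict yields keys in insertion order, and json inserts them
# in the order they are written in the document.
order = list(data["references"])
fields = list(data["references"]["zeta"])
print(order, fields)
```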

12. In order to put a hash in text files, [7fb673, "hexdigest()"; version 3.9 documentation only]
    said that it makes sense to call .hexdigest(), as the result will be readable.

To see which parts apply to where in this file, the numbers/letters above are used below to refer
to them.

Everything in here is cited by InText[e26f68]. "^^^" was chosen to indicate markers because I am
not aware of JSON[448ee7][032ace] or Python[023bdc] (used for this project, by the way) using that
for anything. It was previously chosen for the same ASCII reasons as in 1a, but that's not the case
anymore, though I still stuck with it. What was written by a human can be found in "uncited", with
the exception of intextconfiguration.json. A file like "intextconfiguration.json" and the folder
"intext" itself are required by the aforementioned program. Git[29ad7a] is also used for version
control, so the .git folder is generated by it. "references" was generated by InText.
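
A rough sketch of how "^^^" delimiters could be found and replaced in a plaintext file. It assumes
a marker is written as "^^^", an ASCII alphanumeric identifier, then "^^^" again; the exact marker
syntax (including how locators are written) is not spelled out here, so this pattern, the marker
keys, the label values, and both function names are hypothetical.

```python
import re

# Assumed marker shape: ^^^identifier^^^ with ASCII letters and digits only.
MARKER = re.compile(r"\^\^\^([A-Za-z0-9]+)\^\^\^")


def find_markers(text):
    """Return the identifiers of all markers in the text, in order."""
    return [match.group(1) for match in MARKER.finditer(text)]


def replace_markers(text, labels):
    """Replace each marker with one bracketed label per cited reference."""

    def substitute(match):
        return "".join(f"[{label}]" for label in labels[match.group(1)])

    return MARKER.sub(substitute, text)


source = "Hashing is used here^^^hashing1^^^ and version control there^^^vcs1^^^."
labels = {"hashing1": ["0a1b2c"], "vcs1": ["3d4e5f", "6a7b8c"]}

found = find_markers(source)
cited = replace_markers(source, labels)
print(cited)
```

The replacement returns a new string and leaves the input untouched, in line with item 2 (the
original files are not modified).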


For this program (and as is common in, and a good idea from, other projects), the best documentation
can be found in examples, which is, in this case, our test suite "sample". To run it, use

    python intextconfiguration.json sample/uncited

from this folder (it doesn't matter where this is invoked, as long as you have the paths right in
the arguments). Note that the first parameter's contents are JSON[448ee7][032ace] where the only
types used are objects, strings within those objects, or arrays; nothing else is supported. You need
to run this with Python version 3.12 or newer, or at least that's what I know it works with. No
additional dependencies are required.
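
The type restriction on the configuration file can be checked with a short recursive walk. This is
only a sketch: check_supported() is a hypothetical helper, not part of InText, and it assumes
strings are acceptable inside arrays as well as inside objects.

```python
import json


def check_supported(value, path="$"):
    """Raise TypeError if value contains anything but objects, arrays, and strings."""
    if isinstance(value, str):
        return
    if isinstance(value, dict):
        for key, item in value.items():
            check_supported(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            check_supported(item, f"{path}[{index}]")
    else:
        # Numbers, true/false, and null all end up here.
        raise TypeError(f"unsupported JSON value at {path}: {value!r}")


# Hypothetical configuration contents.
accepted = json.loads('{"a": ["b", {"c": "d"}]}')
check_supported(accepted)

try:
    check_supported(json.loads('{"a": {"count": 1}}'))
    rejected = False
except TypeError as error:
    rejected = True
    print(error)
```

Writing numbers as strings (for example "2024" rather than 2024) keeps a file within the supported
types.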


On GitHub[4e6ec9], these were the initial options I chose:

  No README
  No Gitignore
  No license
  No description
  Named "InText"
