Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patch] Denumberized maps #106

Open
wdoekes opened this issue Jun 5, 2022 · 18 comments
Open

[patch] Denumberized maps #106

wdoekes opened this issue Jun 5, 2022 · 18 comments

Comments

@wdoekes
Copy link
Contributor

wdoekes commented Jun 5, 2022

Hi!

netradiant has a feature called "denumberized maps" where it will not save Entity and Brush number comments in the map file.

I ported it to netradiant-custom. Maybe you'd like to include it.
https://github.com/wdoekes/nrcradiant-deb/blob/main/patches/nrcradiant-denumberized-maps.patch

Cheers,
Walter

@Garux
Copy link
Owner

Garux commented Jun 8, 2022

Thanks, i've seen this commit, but got not very fond of it.
Main purpose is apparently reducing diff noise, while using VCS for map files.
Removing comments for this specific use may be done by tools like sed, which has complexity similar to resaving a map with this option.
Diff has no much value without visual representation anyhow, this would require a tool to extract removed/added/altered primitives.
Committer also likely never had actually mapped, otherwise he would know, how numbering may be useful for debugging issues by checking textual form of primitives. Tools report their numbers, ye. Mapping with this option enabled is not the best idea.
So comments removal is better to happen right before commit, if pretty diffs are wanted. I think editor is not proper tool, which should care about this by having extra dirt in the code.

@wdoekes
Copy link
Contributor Author

wdoekes commented Jun 8, 2022

Hi Garux,

I have indeed seen that the numbers aid in debugging.

Having some kind of pre-commit hook to remove numbers however is a pain. This would help when looking back at previous commits, but not when looking at the current uncommitted state.

Personally however, I would opt to create a small script that outputs a numberized version, while leaving the on-disk file "denumberized". Such a script should be trivial with the current map file layout.

@wdoekes
Copy link
Contributor Author

wdoekes commented Jun 8, 2022

Also:

Committer also likely never had actually mapped, otherwise he would know, [..]

But for newbies to map making, it can be advantageous to see exactly which changes in the map file is done by an edit in de UI. At this point my goal is not making intricate maps, but (a) interacting in map making with others and (b) seeing how all this stuff works.

@eGax
Copy link

eGax commented Jun 8, 2022

@Garux sed is great for this kind of thing.

'Committer also likely never had actually mapped, otherwise he would know, how numbering may be useful for debugging issues by checking textual form of primitives'

That could be yes.

@ensiform
Copy link

ensiform commented Jun 8, 2022

Maybe I'm wrong now but, doesn't the old versions of radiant not strictly keep the same order of entities on save? I know that is what happens with brush faces, they don't always get written in the same order.

@wdoekes
Copy link
Contributor Author

wdoekes commented Jun 8, 2022

@ensiform : your memory serves you correct. GtkRadiant loads brushes and entities in reverse order, causing it to flip on every save.
https://github.com/wdoekes/gtkradiant-deb/blob/main/patches/radiant-load-map-in-saved-order.patch

@wdoekes
Copy link
Contributor Author

wdoekes commented Jun 8, 2022

Here's that numbering "oneliner" in perl:

#!/usr/bin/env perl
sub pe{print "// entity ".$e++."\n";$b=0;};
sub pb{print "// brush ".$b++."\n"};
$d=0;
for(<>){
/^{/ && $d eq 0 && pe;
/^{/ && $d eq 1 && pb;
/^{/ && $d++;
/^}/ && $d--;
print;
}

Removing numbers with sed would've been easy. I did not dare try sed to add them though.

$ head grufcasa.map 

{
"classname" "worldspawn"
{
( -112 120 32 ) ( -112 -16 32 ) ( -112 -16 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -80 120 32 ) ( -112 120 32 ) ( -112 120 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -80 -7 32 ) ( -80 129 32 ) ( -80 129 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -112 -7 32 ) ( -80 -7 32 ) ( -80 -7 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -104 -16 64 ) ( -104 120 64 ) ( -72 120 64 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -80 120 16 ) ( -112 120 16 ) ( -112 -16 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0

If you want to see the numbers, just add them when you need them:

$ ./numbering.pl < grufcasa.map | head

// entity 0
{
"classname" "worldspawn"
// brush 0
{
( -112 120 32 ) ( -112 -16 32 ) ( -112 -16 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -80 120 32 ) ( -112 120 32 ) ( -112 120 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -80 -7 32 ) ( -80 129 32 ) ( -80 129 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0
( -112 -7 32 ) ( -80 -7 32 ) ( -80 -7 16 ) grufcasa/muur 0 0 0 0.5 0.5 0 0 0

@illwieckz
Copy link
Contributor

In such patch the option is disabled by default, meaning the tool will write those debug comments by default, like in 1999.

Main purpose is apparently reducing diff noise, while using VCS for map files. Diff has no much value without visual representation anyhow, this would require a tool to extract removed/added/altered primitives.

Diff isn't about visual representation otherwise there would not been much value about diff indeed. Diff is meant to be produced and read by tools throughout the workflow to solve various issues like merging, rebasing, cherry-picking, or reverting, for example to merge edits done by multiple people on the same tree while working on a decentralized ways. This is meant to enable collaborative work.

90% of the time the user don't have to look at the diff at all, the tools process the changes and deal with the diffs for the user and automatically solves the various problems

With git rebase you can reorder commits or duplicate a commit or a set of commit from one branch to another. If one commit maps with debug comments the unmodified data gets rewritten and will conflict with rewritten unmodified data from other commits being moved before or after another one.

With git merge you can merge two sets of changes that were done by two people on the same map. If Git can't handle the merge itself and you need to do a three-way merge you can use meld that can fix 80% of merge conflicts just by clicking a button. And if it still doesn't work then you can resort at looking at the visual representation of the diff but then you don't need to know what it does, you just have a column on the left saying what author1 has added and removed, a column on the right saying what author2 has added and removed, and you click buttons to delete removed things in the center and to move added things from the right and from the left in the center. This is very high level and you even don't have to know what the content of the diff means. Of course if people modified the same brushes, then you have an harder problem, but for 90% of the change, the tools solve the problems for you. If one commits maps with debug comments one cannot do that, all of the edits may become merge conflicts.

With git cherry-pick you can apply out-of-branch patches. If maps are not numberized, only the actual change is applied, otherwise dozens of thousands of lines may be rewritten.

With git revert you can actually cancel a commit, only rewriting what you want to cancel, not dozens of thousands of lines.

Let's look a this simple commit, it deletes 3 brushes, edits 3 others and adds one:

When maps are denumberized, the diff is 52 lines long: https://pastebin.com/VePdj1Sn

When maps are numberized, the diff is 2938 lines long, for the exact same change: https://pastebin.com/Xu3NhV9E

Those extra 2886 lines (+5550%) contributes in nothing to the data actually being stored.

But the diff is no only 2938 lines long, it spans over 7000 lines. Editing anything else within those 7000 lines and you'll likely get a merge conflict, requiring a manual intervention, something a project should avoid at all costs.

Let's imagine that in the meantime someone else deleted a brush that was stored on the top of the file. Even the lines the quoted patches deleted were already rewritten by the contributions from the other one, with same lines not containing the same content because brushes numbers following added and removed brushes change on every edit.

Let's imagine that in the meantime someone else deleted or added a brush at the bottom of the file. The above patch would not be mergeable because the lines were rewritten.

Let's imagine that in a year one wants to revert that patch? With numberized maps, between the patch and the revert dozens of thousands of lines would have been rewritten multiple dozens of times, tools would have very hard time to do a fully automated revert. That would be a nightmare.

VCS like Git are not only storage systems, they're tools to enable collaborative work (some would say “interacting in map making with others”), to provide builtin mechanisms to solve problems like merging, rebasing, cherry-picking and reverting.

Storing something in Git isn't just like clicking a “Save” button in a document editor, it's about crafting changes for collaborative work and prepare for code exchange and conflict solving.

On that purpose, people working in a team are encouraged to do multiple small commits to make easier to revert/merge/cherry-pick/rebase them by isolating potential conflicts to small chunks. Storing numberized maps in repositories would just make every commit modifying thousands and thousands of lines, even when deleting a single brush.

Some editors (like DarkRadiant) are even starting to integrate Git in Radiant itself.

Outside of the convenience of having good diffs when working among a team of game developers, there is also a problem with file space. Git does not store patches but stores whole files for every commit. Fortunately it also deduplicates some chunks, but if modifying a brush rewrites every following brush comments for thousand and thousand lines, this gets in the way of Git and every workflow based on it or requiring it.

Let's look at some maps:

Map(s) Numberized size Denumberized size Number comments size
Official Unvanquished maps 105.94MB (105942991) 103.93MB (103935599) 2MB (2007392), 1.9%
Thunder Unvanquished map 15.8MB (15791106) 16.1MB (16135600) 344KB (344494), 2.2%
Officials Xonotic maps 75MB (75017019) 73.5MB (73516585) 1.5MB (1500434), 2%
Bromine Xonotic map 6.37MB (6375302) 6.26MB (6260345) 115KB (114957), 1.8%

Along breaking collaborative workflows, it just adds 2% of the file size on every commit, file size that contributes in nothing to the actual data.

With that example of the Thunder map, it's possible that every 3 map commits, 1MB would be wasted for debug metadata that can be reconstructed at any time. And well, only wasting 1MB every three commits would be lucky because I don't know if Git can deduplicate so small chunks. Here we're talking about modifying one line every 8 lines. Plus it would break Git mechanisms as said before.

Deleting a single brush and rewriting thousands of lines and storing the change by wasting 350KB and breaking version control solving tools is a very inefficient workflow.

The need for writing denumberized maps is faced multiple time a day, every time a commit is done, while the need for writing numberized maps is only needed in some corner case, not every day. That's why needing other tools is painful. Even git hooks are not really fine for this as it increases the complexity of the Mapping environment. At the same time getting maps numberized when needed is just a toggle away (if the user choses to disable numbering by default).

So, well, no one is saying debug number comments are useless. What is useless is to store them for every brush on every commit, especially since it breaks collaborative working with VCS tools. On the other hand those useful debug numbers can be regenerated at any time from every existing version of a map just by loading the map in NetRadiant and storing it again, or using a 10 line script like @wdoekes wrotes above, or using some other tools that already exists.

For example the esquirel tool that is part of the Urcheon toolbox can denumberize/renumberize maps this way:

# renumberize map (numberized format is still the default one)
esquirel map --input-map castle.map --output-map castle.map
# denumberize map
esquirel map --input-map castle.map --disable-numbering --output-map castle.map

This tool may look a bit over-powered compared to the simple script @wdoekes proposed but it's part of larger tool meant to do other things like mass-modification of file paths in maps or even edit already compiled bsp (like editing surfaceparms, texture paths, etc.)

In fact denumbering and renumbering maps was one of the very first feature of this tool, this feature was added back in 2015 (7 years ago), it was actually used for validating the denumberized map based workflow over years, The commit for the feature was then added to NetRadiant in 2018 (4 years ago).

Game development teams are in big need for denumberized maps as it is required to make collaboration possible. The Xonotic game map repository stores denumberized maps in repository since more than a decade (example, this denumbering commit is 12 years old). Adding the option to fullfill that need to upstream NetRadiant was long awaited, and having it means contributors don't need any other tool.

Unvanquished also uses denumberized maps in their repositories. Version control and denumberized maps are standard needs for any team project. As long as one is not alone, such one can avoid the requirement for that long. If the tool doesn't produce something compatible with collaborative workflow, users would have to write workarounds in third party tools and will complain about the tools getting in their way.

This comes with many other features that are tailored for collaborative working (multiple people working on the same data repository) and for mass editing of maps (one people working on many data repositories) and driven build of data: denumberized map, pakpath, pk3dir, out-of-tree compilation… Without such features editors and tools get in the way of users.

Anytime one needs those entity/brush comments, they're just a toggle away. Anyone can regenerate them if needed and when needed from every map revision and without any other tool than NetRadiant.

Note that on the same purpose of improving the workflow and specially making NetRadiant more usable for teams the opportunity to add patches to make map writing producing less diff noises may be evaluated, like -90 being 270, -0 being 0, 0.00000000 being 0, etc. Because as stated before diff isn't about visual representation, it's about enabling tools making collaborative work possible.

So, those comments are debug metadata, they can be reconstructed at anytime from any version of a map, one can just load the map in Radiant and save it again to recover the debug metadata. There is nothing being lost when committing denumberized maps in a repository. And it's an option for those who needs it.

Committer also likely never had actually mapped, otherwise he would know, how numbering may be useful for debugging issues by checking textual form of primitives.

Committer is known to be a maintainer of upstream NetRadiant, teaming with Xonotic to improve team-based map editing workflow. He is also project head of the Unvanquished game project, responsible for designing and improving data management and contribution workflows including maps. In such context he is known for editing and maintaining a dozen of maps that are actually released. He is also known for porting, editing and maintaining other dozens of maps (see Interstellar Oasis project), so even if he is not the original author of those maps, he has thousands of hours of mapping of experience over multiple years on more than 30 maps and he then not only knows very well the needs of a collaborative project, but he also knows well the needs for being able to edit more than one file a time, accross time. He is author of the Urcheon software suite which is tailored for mass-building of assets (think about CMake for maps, textures, models, etc.). He is also known to do code exchange among various Radiant editors (GtkRadiant, NetRadiant, DarkRadiant, AARadiant…) tailoring patches like this one on purpose. Also, he isn't new to the scene, one can find his name on that page from 2007 about a student project about generating maps for Tremulous (15 years ago). One may notice Unvanquished is a continuation of Tremulous… Maybe the committer also reads this thread? who knows? He probably doesn't know everything (far from that) but that would be nice if he can share some knowledge or experience… 😉️

@Garux
Copy link
Owner

Garux commented Aug 13, 2022

Collaborative mapping sounds loud, i try to evaluate practical sense. As in git may be just wrong tool for this.
Apparently it serves very well for pinching two brushes in an accomplished map and having history of this.
What if one collaborator create one func_group out of map brushes and other the other, will rebase work in a click?
Does rebase check if brushes conflict in 3D space?
Or lets imagine that real mapping has started and you create one room (with all those mapper's trinkets) and i create the other. Will git handle such collaboration?
As in, will it at least resolve entity targetnames conflicts?

@illwieckz
Copy link
Contributor

Nothing is magic in this world, there are problems that are solved by having better diffs, other problems that are solved with workflow ans methodology, sometime such working methods being made possible by having better diffs.

One can notice similar problems on other topics like software development: when trying to merge two patches from two people modifying the same code in a different way, for example two patches changing the signature of the same function for different purposes. One may have to rewrite from scratch the things to take care of the two needs. But making sure everyone pulls before modifying something, communication about each others' intentions, merging to the same upstream frequently, etc. helps to reduce the amount of unsolvable use case. This isn't specific to mapping. I doubt there is a single topic where all merges can be done blindly, though people don't throw everything away because of that and people appreciate being able to solve most of the other problems. I have some good example in my mind of some piece of software being very hard to merge and requiring to rewrite by hand many of the early commits because those commits were never thought to be be merged elsewhere and Git looked to have just been used as a storage system (the later ones are easier to deal with, thankfully). But despite this I don't throw away Git neither other ways to write code and I don't give up at crafting better commits in every other software project I contribute to. And I don't throw away the idea of merging to begin with.

Those problems are not specific to mapping at all, probably all kind of contribution-based works face their own variant of those problems. For example I can feel the pain of people needing to merge two SVG edits with a text-based format that is not purposed to be diffable the line way. The same would happen with video or audio edition lists, etc. The same happens with every format where editing software usually adds metadata on save like writing in the file the software editing name, the date of the saving, and other random things that do not contribute to the actual data, or even write the exact same data in a different order every time. People can start describing and applying method of working targetting both software and humans to makes things easy.

Speaking of the specific map merging topic, alongside the help of denumberizing maps, some merge problems may also be solved by slightly changing some ways the editor write the map without breaking the format, for example the fact that every added brush is added at the end of the brush list is calling for trouble when two people add brushes, even if, thanks to the nature of the map itself, one can usually solve the merge problem by both adding the new things. One may want to implement another algorithm than “always append new things at the end” to make things easier on that front. That's not for today though.

The ability to write denumberized maps is a strong need expressed for more than a decade by established projects and active people who had time and opportunities to actually discover the need and evaluate the costs and the benefits of some options to fulfill the need. Doing this does not magically provides solutions for every problem but it avoids turning the most basic things into complex problems and it can be used as a basis for method of working that helps reduce the impact of other problems. On the contrary not doing this magically turns everything into a nightmare, like a single brush removal leading to thousands of lines being rewritten.

Unfortunately Radiant makes very strong assumptions that people work alone, do not contribute to someone else project, do not receive contributions from someone else, do not use version control, only work on one map (some other Radiant may even assume people only work on one game), only have one project per game, only need one folder, do not need to have a source data folder that is not built engine folder (and has administrator writing permission to system folders), do not work with library of assets, build in source repository, use the same formats in source and in released game, etc. But step by step we can fix those issues, most of the time without breaking any backward compatibility.

Nothing is perfect then, nothing is magical, but one cannot put “save the world first” as a requirement for “do a first step in the right direction”. The ability to write denumberized maps is a first step that opens many ways and unlock many conveniences. I would be very enthusiastic in working together to make those things better for team-based projects. 👍️

@Garux
Copy link
Owner

Garux commented Aug 14, 2022

Wrt code, better VCS would be intellisense-driven one, which is not trivial to implement and work with.
While map format is simple enough to have specialized tracking of changes and conflicts checking.
Imo for maps git is only fine to keep history.

@illwieckz
Copy link
Contributor

While map format is simple enough to have specialized tracking of changes and conflicts checking.

Only If denumberized.

Imo for maps git is only fine to keep history.

It's never good idea to assume Git is only fine to keep history, whatever the topic, that's the best way to miss opportunities to do useful things, especially when a format is convenient enough to use lines and blocks. And yes, on that side (denumberized) maps are even simpler than code as maps are just lists of things and most of the time even the order doesn't count (ordering is mostly about keeping things in the right block they belong to), so Git is perfect for maps (if denumberized).

@wdoekes
Copy link
Contributor Author

wdoekes commented Aug 14, 2022

@Garux: I'm sure you agree that even if the patch is not useful to you, it could still be included if it benefits others?

[...] extra dirt in the code.

Of course you don't want dirt in the code. But maybe you can be persuaded if the patch was prettier? I myself would rather see an Options struct being passed, than a boolean for every Write option. That might clean things up sufficiently for you to accept it?

@illwieckz
Copy link
Contributor

If someone comes with a nicer implementation then it's nicer (hehe), but I just kindly remind that it's good if if we don't break that much the ability to exchange commits among trees, so it's better if nicer things are not that intrusive. 😉️

@Garux
Copy link
Owner

Garux commented Aug 14, 2022

For specialized map VCS comments aren't a problem, would be ignored natively.
In the case of collaborative coding, i'd say no word about adding this option, even if not liking it much. But here i prefer to defer this, furthermore there were zero requests from mappers.
Googled sed solution for the sake of interest: https://stackoverflow.com/a/26541575/16652886 sed -e 's|//.*||g'

@wdoekes
Copy link
Contributor Author

wdoekes commented Aug 27, 2022

I'm sorry you feel that way.

As for the sed-solution, the correct syntax would be:

sed -e '/^\/\//d'
$ for x in //comment 1 2 //comment 3; do echo "$x"; done | sed -e '/^\/\//d'
1
2
3

But as mentioned ad nauseam already: using perl for the inverse is better.

#!/usr/bin/env perl
sub pe{print "// entity ".$e++."\n";$b=0;};
sub pb{print "// brush ".$b++."\n"};
$d=0;
for(<>){
/^{/ && $d eq 0 && pe;
/^{/ && $d eq 1 && pb;
/^{/ && $d++;
/^}/ && $d--;
print;
}

@Garux
Copy link
Owner

Garux commented Aug 27, 2022

@wdoekes your PR is still pending, it breaks maps saving in the form it is.

@wdoekes
Copy link
Contributor Author

wdoekes commented Aug 27, 2022

You're talking about #108 I assume?

Oh, I had not noticed that it did not work for new files. That's awkward. Apparently I have only tested it on existing maps...

Let me look into that.

[pid 843803] newfstatat(AT_FDCWD, "/home/walter/.q3a/baseq3/maps/bla.map", 0x7ffe324ac3d0, 0) = -1 ENOENT (No such file or directory)
[pid 843803] newfstatat(AT_FDCWD, "/home/walter/.q3a/baseq3/maps/bla.map", 0x7ffe324ac310, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
[pid 843803] access("/home/walter/.q3a/baseq3/maps/bla.map", W_OK) = -1 ENOENT (No such file or directory)
[pid 843803] write(9, "map path is not writeable: \"/hom"..., 67) = 67

vs

[pid 843803] newfstatat(AT_FDCWD, "/home/walter/.q3a/baseq3/maps/grufcasa.map", {st_mode=S_IFREG|0664, st_size=1952800, ...}, 0) = 0
[pid 843803] newfstatat(AT_FDCWD, "/home/walter/.q3a/baseq3/maps/grufcasa.map", {st_mode=S_IFLNK|0777, st_size=51, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 843803] openat(AT_FDCWD, "/home/walter/.q3a/baseq3/maps/grufcasa.map", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 45
[pid 843803] write(9, "Open file /home/walter/.q3a/base"..., 73) = 73

Looks like I incorrectly removed the !file_exists( fullpath.c_str() ) there. Fixing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants