Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation, diff table headers, implement #324, and refactor logic into function #325

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Sternbach-Software
Copy link

@Sternbach-Software Sternbach-Software commented Nov 1, 2022

This change is Reviewable

@Sternbach-Software
Copy link
Author

Fixes #324

@Sternbach-Software
Copy link
Author

The diff table headers add column labels, showing which file is associated with which column, and assigning the title "Similarity score" to the middle column.

simple_node.go Outdated
//if both Ancestry sources, only check if their _APID is the same
if node.Tag().String() == "Source" && tag.String() == "Source" {
found := false
ancestry:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be below the loop? Otherwise a match would cause an infinite loop. This is so fundamental it will need some unit tests for this case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? This equality check is only true for sources, so there is no need to run a loop if it is not a source. Hand-checked this on my gedcom and it works great. Looking back, not sure why I didn't just return true in the loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to pass a command line param to SimpleNode.Equals() to toggle on and off the ancestry source matching. How should I pass it to Equals(), other than flags.Lookup("ancestry-source-matching"), which indexes a map every time equals is called... (presumably bad performance)?

node_diff.go Outdated Show resolved Hide resolved
html/diff_page.go Outdated Show resolved Hide resolved
@elliotchance
Copy link
Owner

@shmueldabomb441 just waiting for your code update to rereview

@Sternbach-Software
Copy link
Author

Ok.

@Sternbach-Software
Copy link
Author

I realized that in my GoLand, settings, it runs go fmt every build. Then isn't this the right format?

Sternbach-Software and others added 2 commits November 5, 2022 22:36
…t check

2. Make table responsive by setting viewport to device width and adding table class `table-responsive` (elliotchance#327)
3. Add GEDCOM paths to DiffPage
4. Add how many are only in the right, left, and both GEDCOMs (elliotchance#327)
5. Fix error in node_diff.go documentation
6. Optimize Ancestry source matching and test
Copy link
Author

@Sternbach-Software Sternbach-Software left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d2dc486

@Sternbach-Software
Copy link
Author

@elliotchance do I need to do anything like create another pull request for you to approve d2dc486 ?

@elliotchance
Copy link
Owner

I think there's come confusion here between the comments, so I'll clarify:

  1. There's no need to add a new CLI argument. The matching logic is fine to add to SimpleNode.Equals because that's what it's for.
  2. You need to add a unit test for SimpleNode.Equals to show that it works. I know you tested it manually, I'm not saying it doesn't work but we write unit tests to clarify behavior and prevent regressions in the future. Add a new test case here: https://github.com/elliotchance/gedcom/blob/master/simple_node_test.go#L29

@Sternbach-Software
Copy link
Author

Sternbach-Software commented Nov 7, 2022

  1. I personally wanted to add a flag (even just for myself), because it is potentially less efficient to check for Ancestry sources. If I am not using an Ancestry tree anyway, there is no need to run those double loops (usually only one or two elements are looped over, but still) and all those equality checks and String() calls.

  2. I added automated unit tests to simple_node_test.go in my most recent commit mentioned above. Link to test (it is the last change in the commit, at the bottom of the list on GitHub - sorry I forgot to mention that in the commit message).

I made it a separate test because I couldn't figure out how to integrate it in the already existing function (I tried for a very long time). This is my first time ever being exposed to Go (I think I did pretty well though), and so I did not know how to resolve the errors I was getting. I also didn't 100% understand the logic of the function (I got most of it), which contributed to making it hard to add another case in that function for the Ancestry case. I can add a call in SimpleNode_equals_test() to assert.True(AncestryEquals_Test()) or copy-paste the code from the function if you want, but it would look out of place with the rest of the function.

I agree it makes most sense to have it in the test of the equals method, but because of the above challenges, I would just copy paste it in. Though, I can hear an argument why it may be slightly better to have a separate function for this edge case (I am quite familiar with regression testing, TDD, etc.).

@Sternbach-Software
Copy link
Author

I would like to add a flag to ignore sources altogether in SimpleNode.Equals(), because in my import from ancestry to geni, every single person with a source in Ancestry will inevitably be marked as different from the one in geni, but all I care about are the actual facts (e.g. birth/death date, birth/death location, etc.), because my primary goal on Geni is to connect more people with the World Family Tree, and you rarely need sources like census records for that. So, I don't want to see differring sources, only core GEDCOM data, like BIRT, DEAT, etc.

If I add this, will you accept a PR?

@@ -64,6 +64,51 @@ func TestSimpleNode_Equals(t *testing.T) {
}
}

func TestAncestryNode_Equals(t *testing.T) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This test name implies that there is a type called AncestryNode with a method called Equals. So, its easy to miss when jumping between production code and tests. It's best that you move this test to a subtest to the function above:

func TestSimpleNode_Equals(t *testing.T) {
  // ...

  t.Run("AncestryNode", func(t *testing.T) {
	//test when source ids are the same
	original := GetAncestryIndividual("@S291470533@", "@S291470520@", "@S291470520@", "@S291470520@", "@S291470520@")
	assert.True(t, gedcom.DeepEqualNodes(gedcom.newDocumentFromString(original).Nodes(), gedcom.newDocumentFromString(original).Nodes()))

	//test when source ids are different, but _apid stays the same
	left := gedcom.newDocumentFromString(GetAncestryIndividual("@S222222222@", "@S444444444@", "@S666666666@", "@S888888888@", "@S111111111@"))
	right := gedcom.newDocumentFromString(GetAncestryIndividual("@S333333333@", "@S555555555@", "@S777777777@", "@S999999999@", "@S000000000@"))
	assert.True(t, gedcom.DeepEqualNodes(left.Nodes(), right.Nodes()))
  })
}

@@ -88,6 +89,22 @@ func (node *SimpleNode) Equals(node2 Node) bool {
return false
}

useAncestrySourceMatching := flag.Lookup("ancestry-source-matching").Value.String() //indexes a map CommandLine.formal
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot lookup flags like this. While this might work when using it through the CLI, this can also be used a library. It's also just a leaky abstraction to write it this way. You have three choices:

  1. Make the extra comparison logic permanent. I'm fine with that - but there are still caveats to consider below.
  2. If you want/need to control the logic you will need to provide that through another Equals method with options, for example:
type SimpleNodeEqualsOptions struct {
    // UseAncestrySourceMatching ...
    UseAncestrySourceMatching bool
}

func (node *SimpleNode) Equals(node2 Node) bool {
    return node.EqualsOptions(node2, SimpleNodeEqualsOptions{})
}

func (node *SimpleNode) EqualsOptions(node2 Node, o SimpleNodeEqualsOptions) bool {
    // ...

    if o.UseAncestrySourceMatching && node.Tag().String() == "Source" && tag.String() == "Source" {
    }
}
  1. This is probably the best option for you, use a FilterFunction (
    type FilterFunction func(node Node) (newNode Node, traverseChildren bool)
    ) with a CLI argument.

SimpleNode.Equals only will only apply to nodes that do not have a custom type with an Equals method of their own. This is an important consideration because this is generic logic that you may want to apply to all nodes on top of any logic.

If you want this to apply globally (and be controlled from the CLI) you might find it easier to do preprocessing first instead. That is, based on a CLI flag, you preprocess the files so when the comparison happens the logic has already been normalized.

We do in this already using FilterFunctions. For example the -only-official CLI flag (https://github.com/elliotchance/gedcom/blob/master/filter_flags.go#L57-L58) triggers the files to remove non-official GEDCOM tags so they don't exist for the comparison:

func OfficialTagFilter() FilterFunction {
. That file also contains some other filters examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants