DOC: Options and descriptions in "Merge Twin" filter/documentation #923

Closed
2 tasks done
StopkaKris opened this issue Apr 19, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@StopkaKris

Is there an existing issue for this?

  • I have searched the existing issues, known issues in release notes, and documentation.

Brief Description of the Documentation Issue or Improvement

There are two questions/discrepancies I have about this filter and documentation.

  1. "Use Seed for Random Generation". Does this refer to the IDs assigned to the parent grains and whether they should be randomized in some specific manner? The default "Seed Value" is 5489 but it is unclear how this number affects the numbering of parent IDs. Is there any reason this should be different from the "Randomize Feature IDs" option in Segment Features (Misorientation) or Segment Features (Scalar) filters? It seems as if that would suffice for this filter.
  2. I am confused by the option to "Use Non-Contiguous Neighbors" in this filter. Does this mean that the filter should not only consider cells that share a face, but also cells that share an edge or vertex? When searching the DREAM.3D NX documentation, only this filter (9.30) and Create Geometry (7.29) return results for the phrase "Non-Contiguous". When the checkbox is selected, the "Required Feature Data" array "Contiguous Neighbor List" changes to "Non-Contiguous Neighbor List", but it is unclear how this list is determined, and there does not appear to be a filter that generates it. I see that previous versions of DREAM.3D also have this checkbox in Merge Twins, so perhaps this was just a copy-and-paste decision over to the NX version. Either way, this should be clarified.

Best regards,
Krzysztof Stopka

Version

DREAM3D NX (version 7)

What section of the documentation did you encounter the discrepancy in? [Further details may be required during triage process]

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@StopkaKris StopkaKris added the documentation Improvements or additions to documentation label Apr 19, 2024
@nyoungbq nyoungbq self-assigned this Apr 19, 2024
@nyoungbq
Contributor

Hello Krzysztof,

I will answer your questions to the best of my ability. I do not have a background in materials science, so I will have to direct you to @imikejackson for the specific implications that the randomness in the algorithm has on the output.

However, I will do my best to explain how the seed is applied in the algorithm itself. The value supplied here is fed to our random number generator, which generates a sequence of numbers. This sequence determines the order in which specific elements of the NeighborList are accessed. Each accessed element is then fed to a grouping algorithm, so the order likely has an impact on the end result, and the correct sequence has not been well defined in the science so far. My guess is that the algorithm uses randomness to provide a less biased prediction of the resulting object, since the factors involved are either not fully understood or too complex to model.

The default value of 5489 is just the default seed supplied by the C++ standard random number library. The seed option is exposed so you can generate consistent results: every run where the filter has the same seed and inputs will generate the same output. If you do not care about reproducibility, you can uncheck Use Seed For Random Generation and we seed the algorithm with a different number on every run. (The seed is stored in an array in the data structure regardless of whether the option is on or off, so you can still reproduce a run, but you will need Python because the seed is a larger number than the GUI can accept for Seed Value.) **The actual Seed Value is arbitrary and is just there for reproducibility. It should not affect the IDs of the geometry.**
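The reproducibility point above can be sketched in a few lines. This is only an illustration in Python, not the filter's C++ code: `visit_order` is a hypothetical helper, and Python seeds its generator differently from the C++ standard library, so the actual numbers will not match what DREAM3D-NX produces for the same seed.

```python
import random

DEFAULT_SEED = 5489  # the default Seed Value discussed in this thread


def visit_order(num_features, seed=None):
    """Return a shuffled order in which feature indices would be visited.

    Hypothetical helper for illustration only. With an integer seed the
    order is reproducible; with seed=None Python seeds the generator from
    OS entropy, so the order changes from run to run.
    """
    rng = random.Random(seed)
    order = list(range(1, num_features + 1))
    rng.shuffle(order)
    return order


first = visit_order(10, DEFAULT_SEED)
second = visit_order(10, DEFAULT_SEED)   # same seed, same inputs
unseeded = visit_order(10)               # analogous to unchecking the option

print(first == second)  # same seed and inputs give the same order
print(unseeded)         # typically differs between program runs
```

The seed only selects which reproducible sequence is used; every feature index still appears exactly once in the order.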

As for Use Non-Contiguous Neighbors, it is well beyond my wheelhouse. All I know is that it affects the way the grouping happens: the "Non-Contiguous NeighborList" value gets fed to the grouping algorithm. I will try to dig up a paper for you.

Also, I will update the documentation to be clearer about the impact of the seed value.

I will message Mike to pop in here and give you a better rundown from a materials standpoint.

Nathan

@StopkaKris
Author

Hi Nathan,

Thank you for the detailed explanation. To follow up on the random seed, I was examining how the Segment Features (Misorientation) filter works. If the Randomize Feature IDs box is unchecked, then the feature IDs start at 1 on one side of the model and increase sequentially based on the X, Y and Z directions. If Randomize Feature IDs is checked, then the feature IDs generated are, in fact, random. However, for multiple runs of a pipeline such as "(08) Small IN100 Full Reconstruction", it produces identical feature IDs. So my question is (and perhaps this is something @imikejackson will have to chime in on): is the purpose of Randomize Feature IDs the ability to assign random feature IDs every time the filter is executed (which is not what it is doing now), or is it the ability to just ensure feature IDs are not sequentially numbered based on grain position in the model?

That leads to my original thoughts on the Use Seed For Random Generation option: is there any reason Merge Twins should not have identical functionality to filters such as Segment Features (Misorientation) when "randomly" assigning feature IDs?

It is also worthwhile to mention that the synthetic filter Pack Primary Phases (in versions before DREAM.3D NX) was the only other filter I recall that had the same aspect of "randomness" when assigning feature IDs, although I realize this was an aspect of generating a unique, novel microstructure every time it was executed (definitely for the "Generate Features" option but probably not for the "Already Have Features" option).

Best regards,
Krzysztof Stopka

@nyoungbq
Contributor

Hey Krzysztof,

I looked over the Segment Features (Misorientation) filter, and I think I have an answer for you.

**The actual use of randomization of feature ids is just for visualization, because without it a gradient appears when coloring that makes it hard to visually distinguish between individual features (especially in larger datasets).** The ids themselves are used solely for indexing and labeling the features of the object; they do not affect the outcomes of the algorithm itself. Thus, they are consistently randomized from run to run via a hardcoded static seed (the default seed: 5489). This is done to make it simple for the user to analyze changes in the algorithm, so they can visually see the changes in the underlying data from run to run at a given id. The answer to your question about the intention of randomizing the ids is the latter, "the ability to just ensure feature IDs are not sequentially numbered based on grain position in the model".
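As a rough illustration of that idea (not the actual filter code), the sketch below relabels per-cell feature ids with a permutation drawn from a fixed seed. The function name `randomize_feature_ids` and the toy `cells` array are hypothetical, and id 0 is assumed to mean "unassigned" and is left untouched.

```python
import random


def randomize_feature_ids(feature_ids, seed=5489):
    """Relabel features with a seeded permutation of 1..max_id.

    Illustrative sketch only. Id 0 is assumed to mean "unassigned" and
    always maps to 0. The fixed seed makes the relabeling repeatable.
    """
    max_id = max(feature_ids)
    new_ids = list(range(1, max_id + 1))
    random.Random(seed).shuffle(new_ids)
    lookup = [0] + new_ids  # lookup[old_id] gives the new id
    return [lookup[old] for old in feature_ids]


# Toy per-cell feature ids, numbered sequentially by position.
cells = [1, 1, 2, 2, 3, 0, 3, 4, 4, 4]
shuffled = randomize_feature_ids(cells)
print(shuffled)
```

Cells that shared an id before the relabeling still share one afterwards, so only the labels (and therefore the colors in a visualization) change, not the segmentation.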

To answer your other question: under the covers of both Merge Twins and Segment Features (Misorientation), an identical seed (the default seed: 5489) is used for the randomization of the feature ids. Thus, their randomization should be roughly equivalent. However, looking at the Merge Twins code, I can see that the random generator usage is sloppy, and it is quite possible that it is not being seeded properly. If Use Seed for Random Generation is checked, the feature ids should be consistent between runs provided the same Seed Value is supplied. **If this is not happening, I will look into it further.**

To answer your original question as clearly as I can: _Is there any reason this should be different from the "Randomize Feature IDs" option in Segment Features (Misorientation) or Segment Features (Scalar) filters?_

The Seed Value in Merge Twins affects the outcome of the underlying algorithm, not the randomization of the feature ids. No specific Seed Value is generally better than the others, but some seed values may produce results closer to the ground truth for a specific dataset. The randomness aims to simulate a seemingly complex chaotic system that will likely not be consistent across different datasets. Thus, experimentation with the seed value is recommended but not strictly necessary, since from a probability perspective all seeds are equally likely to produce the ground truth. The Seed Value exists so you can recreate the sequence of execution for the algorithm; if you are getting different results from run to run with the same seed, something is wrong. In Segment Features (Misorientation) no randomness is involved in the actual algorithm, so the seed is not exposed as a parameter.

Note: there is no way to disable randomization of ids in the Merge Twins filter, so they will always be shuffled. The option to disable it in Segment Features (Misorientation), however, was exposed for one reason or another. Therein lies the confusion, I believe.

Hopefully that helps clarify, and I will look into cleaning up the Merge Twins code.

Nathan

@StopkaKris
Author

Hi Nathan,

That is a great explanation. For the record, the Use Seed for Random Generation did produce consistent feature IDs between consecutive runs with the same Seed Value. Thank you again for your time.

Best regards,
Krzysztof Stopka

@imikejackson
Contributor

@StopkaKris Nathan has done a great job of explaining everything. I hope your questions are fully answered at this point. We can update the docs with some screenshots of the differences in the randomness, along with more details from this issue thread, to better explain.

I am sure when we ported a few of those filters, they are pretty much a "copy & paste" with a whole lot of clean up involved. (APIs between the 2 versions are very different).

To reiterate from @nyoungbq's explanation:

Looking deeper into the "MergeTwins" algorithm, the use of a random number to determine the featureParentIds in the getSeed() function probably isn't needed. We can probably just start at the first featureId and work through the list of features. If the order did matter to the algorithm then I would really have to call into question the actual algorithm being used. It was written a very long time ago, by "I'm not sure who". If you check the box to "Randomize the Parent Ids" then the final parent ids values are just getting shuffled exactly like we do in the "Segment Features Misorientation".
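The point about visiting order can be illustrated with a toy model. The sketch below is not the MergeTwins implementation: it assumes the grouping simply grows a group from a starting feature across neighbors that satisfy a symmetric "related" test, and every name in it (`group_features`, `partition`, `twin_pairs`) is hypothetical. Under that assumption, the order in which starting features are picked changes only the parent id numbering, not which features end up grouped together.

```python
import random


def group_features(neighbors, is_related, order):
    """Assign a parent id to each feature by growing groups from start features.

    neighbors:  dict mapping feature id -> list of neighboring feature ids
    is_related: symmetric predicate deciding whether two neighbors merge
    order:      sequence in which candidate start features are tried
    """
    parent = {}
    next_parent = 1
    for start in order:
        if start in parent:
            continue  # already absorbed into an earlier group
        parent[start] = next_parent
        stack = [start]
        while stack:
            current = stack.pop()
            for nb in neighbors[current]:
                if nb not in parent and is_related(current, nb):
                    parent[nb] = next_parent
                    stack.append(nb)
        next_parent += 1
    return parent


def partition(parent):
    """Collapse a parent-id mapping into a set of groups, ignoring numbering."""
    groups = {}
    for feature, pid in parent.items():
        groups.setdefault(pid, set()).add(feature)
    return {frozenset(g) for g in groups.values()}


# Toy neighbor list and a made-up set of "twin related" pairs.
neighbors = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
twin_pairs = {frozenset(p) for p in [(1, 2), (2, 4)]}


def is_related(a, b):
    return frozenset((a, b)) in twin_pairs


sequential = group_features(neighbors, is_related, sorted(neighbors))

shuffled_order = sorted(neighbors)
random.Random(5489).shuffle(shuffled_order)
randomized = group_features(neighbors, is_related, shuffled_order)

print(partition(sequential) == partition(randomized))
```

If the real merge criterion depended on the starting feature (for example, comparing every candidate against the first feature's orientation rather than against its immediate neighbor), the order could matter, which is exactly the situation that would call the algorithm into question.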

I wouldn't know what to tell anyone about getting a "non-contiguous neighbor list". I don't know what filter would create those. My guess is that this became some sort of odd off-shoot from the microtexture codes.

Looking deeper into the version 6.5/6.6 DREAM3D codes, MergeTwins inherits from "GroupFeatures", but then looking at what else inherits from GroupFeatures, I see it is "GroupMicroTextureRegions" and "MergeColonies". So my guess is that "MergeTwins" probably needs to inherit from "SegmentFeatures" instead so that it does not inherit from the "GroupFeatures" which got sprinkled with micro-texture related codes and no one noticed.

Well, this was an interesting run down memory lane. We will get the Merge Twins code cleaned up. Have to figure out how to test the codes once we do clean them up though.

@StopkaKris
Author

StopkaKris commented Apr 23, 2024

Sounds great, Mike, thanks again!

Best regards,
Krzysztof Stopka

@nyoungbq
Contributor

The code was cleaned up in this PR:

#955
