
Commit 135d1c5

Improved documentation of match_instruments and generate_crosswalk_table functions
1 parent 4bf398d commit 135d1c5

4 files changed: +53 -10 lines changed
R/generate_crosswalk_table.R

Lines changed: 15 additions & 3 deletions
@@ -22,11 +22,22 @@
 
 #' Generate Crosswalk Table Function
 #'
-#' Generate a crosswalk table for a list of instruments, given the similarity matrix that came out of the match function.
-#' A crosswalk is a list of pairs of variables from different studies that can be harmonised.
+#' This function generates a crosswalk table using a list of instruments and a similarity matrix,
+#' produced by the \code{\link{match_instruments}} function.
 #'
+#' @description
+#' A crosswalk is a table that lists matched variables from different studies or instruments,
+#' enabling data harmonization across datasets.
+#'
+#' @details
+#' A crosswalk is a mapping between conceptually similar items (e.g., survey questions or variables)
+#' from different instruments. It is used to identify and align comparable variables across datasets
+#' that use different formats or wordings. This is especially useful in meta-analysis, data integration,
+#' and comparative research, where consistent constructs need to be analyzed across multiple sources.
+#'
+#' The similarity matrix passed to this function is usually obtained from \code{\link{match_instruments}}.
 #' @param instruments The original list of instruments, each containing a question. The sum of the number of questions in all instruments is the total number of questions which should equal both the width and height of the similarity matrix.
-#' @param similarity The cosine similarity matrix from Harmony
+#' @param similarity The cosine similarity matrix output by the \code{\link{match_instruments}} function.
 #' @param threshold The minimum threshold that we consider a match. This is applied to the absolute match value. So if a question pair has similarity 0.2 and threshold = 0.5, then that question pair will be excluded. Leave as None if you don't want to apply any thresholding.
 #' @param is_allow_within_instrument_matches Defaults to False. If this is set to True, we include crosswalk items that originate from the same instrument, which would otherwise be excluded by default.
 #' @param is_enforce_one_to_one Defaults to False. If this is set to True, we force all variables in the crosswalk table to be matched with exactly one other variable.
@@ -61,6 +72,7 @@
 #'
 #' @export
 #' @author Alex Nikic
+#' @author Omar Hassoun
 
 
 generate_crosswalk_table <- function(
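
The documented parameters (a square similarity matrix, an absolute-value threshold, and the within-instrument filter) can be illustrated with a small base-R sketch. This is not the package's implementation: `sketch_crosswalk`, its argument names, the output column names, and the toy data are all hypothetical, and one-to-one enforcement is left out.

```r
# Hypothetical helper for illustration only; not the code of
# generate_crosswalk_table() in the package.
sketch_crosswalk <- function(question_ids, instrument_ids, similarity,
                             threshold = NULL,
                             allow_within_instrument = FALSE) {
  n <- length(question_ids)
  # The matrix must be n x n, one row and column per question.
  stopifnot(nrow(similarity) == n, ncol(similarity) == n,
            length(instrument_ids) == n)

  # Visit each unordered pair once: upper triangle, diagonal excluded.
  pairs <- which(upper.tri(similarity), arr.ind = TRUE)
  score <- similarity[pairs]

  keep <- rep(TRUE, nrow(pairs))
  if (!is.null(threshold)) {
    # The threshold applies to the absolute similarity value.
    keep <- keep & abs(score) >= threshold
  }
  if (!allow_within_instrument) {
    # Drop pairs whose two questions come from the same instrument.
    keep <- keep &
      instrument_ids[pairs[, "row"]] != instrument_ids[pairs[, "col"]]
  }

  first <- pairs[keep, "row"]
  second <- pairs[keep, "col"]
  data.frame(
    question1 = question_ids[first],
    question2 = question_ids[second],
    match_score = score[keep],
    stringsAsFactors = FALSE
  )
}

# Toy data: two instruments (A and B) with two questions each.
question_ids <- c("A1", "A2", "B1", "B2")
instrument_ids <- c("A", "A", "B", "B")
similarity <- matrix(c(
  1.0,  0.9, 0.8,  0.2,
  0.9,  1.0, 0.1, -0.7,
  0.8,  0.1, 1.0,  0.3,
  0.2, -0.7, 0.3,  1.0
), nrow = 4, byrow = TRUE)

crosswalk <- sketch_crosswalk(question_ids, instrument_ids, similarity,
                              threshold = 0.5)
print(crosswalk)
```

With a threshold of 0.5, the pair A2/B2 is kept even though its score is -0.7, because the filter uses the absolute value; the pair A1/A2 is dropped because both questions belong to instrument A.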

R/match_instruments.R

Lines changed: 11 additions & 2 deletions
@@ -27,10 +27,19 @@
 #'
 #' @param instruments A list of instruments to be matched.
 #' @param topics A list of topics with which to tag the questions. Default is empty.
-#' @param is_negate A boolean value to toggle question negation. Default is TRUE.
+#' @param is_negate A boolean indicating whether to apply negation-based preprocessing. Default is TRUE.
+#'
+#' This option addresses a common limitation in large language model (LLM) embeddings, where antonyms (e.g., "happy" and "sad") may be treated as similar due to contextual overlap.
+#' When \code{is_negate = TRUE}, the function prepends negation terms such as "not" or "didn't" to the input sentences and evaluates whether this increases or decreases their cosine similarity.
+#' If the similarity increases after negation, the model interprets the sentences as antonyms and returns a negative similarity score.
+#'
+#' When \code{is_negate = FALSE}, negation is skipped and most similarity values returned will be positive.
+#'
+#' The Harmony API defaults to \code{is_negate = TRUE}, as some users prefer detecting antonymy through negative similarity values, while others may prefer only positive scores.
+#'
 #' @param clustering_algorithm A string value to select the clustering algorithm to use. Must be one of: "affinity_propagation", "kmeans", "deterministic", "hdbscan". Default is "affinity_propagation".
 #'
-#' @return A list of matched instruments returned from the 'Harmony Data API'.
+#' @return A list containing the matched instruments retrieved from the Harmony Data API. The returned object includes attributes such as the similarity matrix, identified clusters, associated cluster topics, and other relevant metadata.
 #'
 #' @examples
 #' \donttest{
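
The `is_negate` behaviour described in the added documentation can be sketched with toy vectors in base R. Everything here is an assumption made for illustration: the real negation step runs on the Harmony API side against sentence embeddings, whereas `signed_similarity`, the hand-written two-dimensional vectors, and the choice to return the negated larger score are hypothetical.

```r
# Cosine similarity between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical sketch of the sign logic. vec_a_negated stands in for the
# embedding of sentence A with a negation term such as "not" prepended.
signed_similarity <- function(vec_a, vec_b, vec_a_negated, is_negate = TRUE) {
  plain <- cosine_similarity(vec_a, vec_b)
  if (!is_negate) {
    return(plain)  # negation skipped: the raw similarity is returned
  }
  negated <- cosine_similarity(vec_a_negated, vec_b)
  # Assumption: if negating A brings it closer to B, treat the pair as
  # antonyms and report a negative score.
  if (negated > plain) -negated else plain
}

# Toy "embeddings" chosen by hand, not produced by any model.
happy <- c(1, 0)
sad <- c(0.6, 0.8)
not_happy <- c(0, 1)

with_negation <- signed_similarity(happy, sad, not_happy, is_negate = TRUE)
without_negation <- signed_similarity(happy, sad, not_happy, is_negate = FALSE)
print(c(with_negation = with_negation, without_negation = without_negation))
```

In this toy case the plain similarity of "happy" and "sad" is 0.6, while the negated "happy" scores 0.8 against "sad", so the sketch reports the pair as antonyms with a negative score.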

man/generate_crosswalk_table.Rd

Lines changed: 17 additions & 3 deletions
Generated file, diff not rendered.

man/match_instruments.Rd

Lines changed: 10 additions & 2 deletions
Generated file, diff not rendered.
