Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run SERGIO with CRISPRa/i gene perturbations. #13

Open
djinnome opened this issue Feb 7, 2024 · 7 comments
Open

Run SERGIO with CRISPRa/i gene perturbations. #13

djinnome opened this issue Feb 7, 2024 · 7 comments
Assignees

Comments

@djinnome
Copy link
Member

djinnome commented Feb 7, 2024

So far, we can simulate the E. coli regulatory network using SERGIO in a "wild-type" scenario.

To simulate the E. coli regulatory network under a CRISPRa/i perturbation, we need to disconnect the target TF from its regulators, and either increase the production rate (CRISPRa) or decrease the production rate (CRISPRi).

Disconnecting a TF from its regulators essentially makes the TF a master regulator.

Since we have a separate input file for master regulators and a separate intput file for regulators that are targets of other regulators, the requested action is to move the TF from the input_file_targets to input_file_regs.

The file format description for these two files is described here:

SERGIO/SERGIO/sergio.py

Lines 132 to 136 in 501c569

# 2- input_file_taregts: a csv file, one row per targets. Columns: Target Idx, #regulators,
# regIdx1,...,regIdx(#regs), K1,...,K(#regs), coop_state1,...,
# coop_state(#regs)
# 3- input_file_regs: a csv file, one row per master regulators. Columns: Master regulator Idx,
# production_rate1,...,productions_rate(#bins)

Since # 4- input_file_taregts should not contain any line for master regulators
we need to cut the line that refers to the target TF. But when we paste it into input_file_regs, we just need to proved the index of the target TF, and a production rate, one for each "bin". The production rate should be small for CRISPRi and large for CRISPRa compared to the production rate of other TFs.

@djinnome
Copy link
Member Author

djinnome commented Mar 5, 2024

  1. With no noise
  2. Technical noise:
    """""""""""""""""""""""""""""""""""""""
    "" This part is to add technical noise
    """""""""""""""""""""""""""""""""""""""
    def outlier_effect(self, scData, outlier_prob, mean, scale):
    """
    This function
    """
    out_indicator = np.random.binomial(n = 1, p = outlier_prob, size = self.nGenes_)
    outlierGenesIndx = np.where(out_indicator == 1)[0]
    numOutliers = len(outlierGenesIndx)
    #### generate outlier factors ####
    outFactors = np.random.lognormal(mean = mean, sigma = scale, size = numOutliers)
    ##################################
    scData = np.concatenate(scData, axis = 1)
    for i, gIndx in enumerate(outlierGenesIndx):
    scData[gIndx,:] = scData[gIndx,:] * outFactors[i]
    return np.split(scData, self.nBins_, axis = 1)
    def lib_size_effect(self, scData, mean, scale):
    """
    This functions adjusts the mRNA levels in each cell seperately to mimic
    the library size effect. To adjust mRNA levels, cell-specific factors are sampled
    from a log-normal distribution with given mean and scale.
    scData: the simulated data representing mRNA levels (concentrations);
    np.array (#bins * #genes * #cells)
    mean: mean for log-normal distribution
    var: var for log-normal distribution
    returns libFactors ( np.array(nBin, nCell) )
    returns modified single cell data ( np.array(nBin, nGene, nCell) )
    """
    #TODO make sure that having bins does not intefere with this implementation
    ret_data = []
    libFactors = np.random.lognormal(mean = mean, sigma = scale, size = (self.nBins_, self.nSC_))
    for binExprMatrix, binFactors in zip(scData, libFactors):
    normalizFactors = np.sum(binExprMatrix, axis = 0 )
    binFactors = np.true_divide(binFactors, normalizFactors)
    binFactors = binFactors.reshape(1, self.nSC_)
    binFactors = np.repeat(binFactors, self.nGenes_, axis = 0)
    ret_data.append(np.multiply(binExprMatrix, binFactors))
    return libFactors, np.array(ret_data)
    def dropout_indicator(self, scData, shape = 1, percentile = 65):
    """
    This is similar to Splat package
    Input:
    scData can be the output of simulator or any refined version of it
    (e.g. with technical noise)
    shape: the shape of the logistic function
    percentile: the mid-point of logistic functions is set to the given percentile
    of the input scData
    returns: np.array containing binary indactors showing dropouts
    """
    scData = np.array(scData)
    scData_log = np.log(np.add(scData,1))
    log_mid_point = np.percentile(scData_log, percentile)
    prob_ber = np.true_divide (1, 1 + np.exp( -1*shape * (scData_log - log_mid_point) ))
    binary_ind = np.random.binomial( n = 1, p = prob_ber)
    return binary_ind
    def convert_to_UMIcounts (self, scData):
    """
    Input: scData can be the output of simulator or any refined version of it
    (e.g. with technical noise)
    """
    return np.random.poisson (scData)
  3. Generate t-SNE plots of the transcriptomics generated from the pertubation, compare with wild-type

@djinnome
Copy link
Member Author

djinnome commented Mar 5, 2024

CRISPRi of fadR, tyrR, and lacI.

@djinnome
Copy link
Member Author

djinnome commented Mar 5, 2024

convert UMIcounts to anndata format: https://anndata.readthedocs.io/en/latest/

from anndata -> t-SNE

@djinnome
Copy link
Member Author

djinnome commented Mar 5, 2024

The code to map gene indexes to gene names is here:

https://github.com/PNNL-CompBio/SERGIO/blob/MN_ConIndependence-/GRN_Analysis/ParseInteractions.ipynb

@djinnome
Copy link
Member Author

djinnome commented Mar 5, 2024

compare the same 100 genes in MicroSPLIT to the 100 genes in SERGIO.
In other words, filter the MicroSPLIT data to just the 100 genes in the SERGIO 100 gene DAG (including FadR) so that the SERGIO t-SNE can be compared to the MicroSPIT t-SNE

@djinnome
Copy link
Member Author

  • Add wild-type conditions
  • Drop-out only at 15%, 50% and 85%
  • UMI counts to ann-data
  • More cells (~1000)

@yhan0410
Copy link

The code to map gene indexes to gene names is here:

https://github.com/PNNL-CompBio/SERGIO/blob/MN_ConIndependence-/GRN_Analysis/ParseInteractions.ipynb

Genes 'rutA', 'rutB', 'rutE', 'rutD', 'rutC' are not detected in the MicroSPLIT data.
Gene 'gmr' is named 'pdeR' now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

3 participants