fomightez/bendit-binder

bendit-binder

badge

launch with settings for Nathan: badge

tl;dr:
Click any launch bendit badge on this page to run a command line-based, standalone version of bend.it inside your browser.


Bend.it: software to predict DNA curvature from DNA sequences.

Bend.it software to predict DNA curvature from DNA sequences combined with the power of the Jupyter ecosystem served via MyBinder.org.

If you are just trying to analyze a few sequences, go to the bend.it Server.

This repository is for running a command line-based, standalone version of bend.it inside your browser in a Jupyter environment provided by MyBinder.org.
Additionally, having bend.it working inside the Jupyter environment with interactive Python adds some convenient features, which are illustrated in the notebooks. A utility script for moving command line-based bendIt results into Python is demonstrated, and the pipeline illustrates adding more full-featured, modern, and interactive plotting, as discussed here.


Software

The bend.it software is already installed in each active session launched from this repo. The software itself is available directly from the authors at the Bend.it® Standalone version page at the Bend.it site.

The software is described in this scientific article, and much of the theory behind the bend.it software is described in this scientific article.

The authors request users cite:

DNA analysis servers: plot.it, bend.it, model.it and IS.
Vlahovicek, K., Kaján, L. and Pongor, S.
Nucleic Acids Res. 2003 31(13), pp. 3686-3687. PMID: 12824394

Clarifying Software Attribution: I, Wayne, am not involved in the bend.it software at all. Those listed above are the developers and distributors of bend.it. See their materials. I simply set up this repository to make the software useable on the command line reproducibly without installation headaches.

I, Wayne, did code a Python-based utility for use with the results from the standalone version of bendIt; it is available here and utilized in the notebooks in this repository to process the results and allow easily converting the results to other forms.

Usage

If you are just trying to analyze a few sequences, go to the bend.it Server.

This repository is set up to allow running the command line, standalone version of bend.it software after pressing the launch bendit button above or below.

In the notebooks that can be launched, I have added some examples illustrating how to use the program and process the results easily with Python and convert to other forms. Once you have a sense of how it works, it will be easy to open a new session and upload your own sequences and run the cells to let the pipeline process them.

Details accompanying the Methods section of Knutson laboratory publication, Munoff et al., originally utilizing this pipeline

Here we provide an environment and pipeline stitched together with Jupyter and Python, and served via MyBinder.org, featuring the standalone program from Vlahovicek et al. (2003) at the core to calculate values for bendability and curvature of the sequences in a high-throughput manner. The implementation is designed to minimize required human interaction while guiding the processing of any number of sequences (FASTA format) into associated data, additional metrics, plots, and reports of bendability and curvature, all packaged into a single compressed archive. The settings used in the accompanying manuscript were developed in the course of initial efforts using the form available at the bend.it Server (http://pongor.itk.ppke.hu/dna/bend_it.html#/bendit_form). The preferred settings adopted are the default settings in the associated Jupyter notebook and script. Specifically:

  • For both curvature and bendability, the window size setting is three.
  • 'Consensus' is used with the 'Scale' option, the equivalent of adjusting the curvature parameter on the bend.it Server form to 'Consensus scale (DNase I + nucleosome positioning data)'.
  • The 'B' setting is used for the complexity flag, the equivalent of selecting 'Bendability' from the drop-down under 'Plot options' on the bend.it Server form. The other choices in that drop-down on the form are 'G+C content' and 'Complexity'.

Accommodating short sequences in conjunction with the standalone version of bend.it software & assessment of the implementation:

  • The standalone version of bend.it software fails with a segmentation fault when given a short sequence (~75 bp) directly; the cause is unclear. The online bend.it Server doesn't exhibit this behavior and processes 75 bp sequences just fine. To work around the issue, so that the standalone version could still be used to scale up processing to hundreds of sequences, the sequences to examine were repeated in multiples of 21 so that the total length of the sequence input exceeded 1041 bp. This workaround has been verified with numerous sequences to give the same results as the bend.it Server to within 0.0001. In fact, I never saw a difference larger than 0.0001; any difference at all was very rare and, most importantly, was only ever observed in the predicted curvature value for the last position.

  • In limited testing, nearly 850 sequences (75 bp each), along with their associated files and reports, were processed in about 14 minutes with the standard pipeline.
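
The repeat-based workaround for short sequences described above can be sketched roughly as follows. This is only an illustration and not the code from bendIt_analysis.py: the function name pad_by_repeating, the constants, and in particular the reading of 'multiples of 21' as 'the number of copies is a multiple of 21' are assumptions made for the example.

```python
# Assumption: the total input length must exceed this value (figure quoted above).
MIN_TOTAL_LENGTH = 1041
# Assumption: copies of the sequence are added in blocks of 21.
REPEAT_STEP = 21


def pad_by_repeating(sequence, min_total=MIN_TOTAL_LENGTH, step=REPEAT_STEP):
    """Repeat `sequence` until the result is longer than `min_total`.

    The number of copies is always a multiple of `step`.
    Returns a tuple of (padded_sequence, number_of_copies).
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    copies = step
    # Keep adding blocks of `step` copies until the length threshold is exceeded.
    while len(sequence) * copies <= min_total:
        copies += step
    return sequence * copies, copies


# Hypothetical 75 bp cassette used only to exercise the function.
short_sequence = "ACGTA" * 15
padded, copies = pad_by_repeating(short_sequence)
print(len(short_sequence), copies, len(padded))
```

Under these assumptions a 75 bp sequence is repeated 21 times, and a shorter sequence is repeated 42 or more times as needed to pass the length threshold.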

ADDITIONAL FEATURES BUILT IN:
  • Runs right in any browser with no software installation necessary, thanks to MyBinder.org serving a temporary, active Jupyter session with an environment defined by configuration files included in this repository. (See the MyBinder site for more information about Binder/MyBinder.)

  • Constant upstream and downstream sequences flanking all the sequences to be processed can be specified. The provided sequences are thus treated as variable 'cassettes' in the analysis. (Either 'constant' flanking sequence can also be set to an empty string.)

  • Allows for easy dragging and dropping of files from local computer into the session to allow for adding input sequences in FASTA format.

  • Processes any number of sequences, even if provided in multiple files, as long as the sequences are in FASTA format and the extension of each file matches a typical FASTA extension.

  • Seaborn plots are produced to supplant the default gnuplot plots made while the standalone version of bend.it software runs. The Seaborn plots aesthetically match the Excel versions of the plots used for visualization in the initial stages, before scaling became imperative. Besides being more visually refined than the default gnuplot plots, the Seaborn plots offer advanced portability/storage and customization options in the Python/Jupyter ecosystem.

  • Because the dataframe underlying each plot is saved in both a compressed and tabular text form (tab-delimited), plots can be generated on-demand later, or the collected data used with other plotting software.

  • A notebook that serves as a 'Guide to accessing and reviewing the bendit results' is included that covers what the default pipeline produces and how to access it. Furthermore, it touches on how to use Jupyter/Python to conveniently view and further process all the data.

  • There are several settings for customizing what gets archived at the conclusion of the runs.

  • Upon update of the main branch, GitHub Actions automatically generate git branches with certain settings differing from the defaults. With the branches made, the environment with those settings in place can be specifically launched via MyBinder. For example, there is a lightweight setting option where the archive is streamlined to not include several of the typical items that add up to a lot of space in the archive, such as a notebook with all the Seaborn plots, separate image files from each plot, and the raw gnuplots. Additionally, in the lightweight setting the logs are scrubbed of the records of the underlying steps collected in the course of processing each individual sequence at run time.

  • A notebook showing how to run the standalone version of bend.it software in the launched sessions is also present in the repository.

  • The jupyter-archive extension is present to make it convenient to make archives of entire directories using the JupyterLab interface, without need to write and execute a command.
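
As a rough illustration of the 'cassette' idea from the feature list above, the sketch below reads FASTA records and wraps each one in constant flanking sequences. The function names read_fasta_records and add_flanks, the flank values, and the example records are all hypothetical; the pipeline's actual handling lives in bendIt_analysis.py and may differ.

```python
def read_fasta_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def add_flanks(cassette, upstream="", downstream=""):
    """Place constant flanks around a variable cassette; either flank may be empty."""
    return f"{upstream}{cassette}{downstream}"


# Made-up records standing in for the contents of an uploaded FASTA file.
example_lines = [">seq1 demo", "acgt", "ACGT", "", ">seq2", "TTTT"]
merged = {
    name: add_flanks(seq, upstream="GG", downstream="CC")
    for name, seq in read_fasta_records(example_lines)
}
print(merged)
```

Passing an empty string for either flank leaves that side of the cassette unchanged, which mirrors the option mentioned in the list.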

Customization of the pipeline

The main notebook exposes a few settings and values that can be adjusted before processing is triggered, so that the standalone bendIt program runs with settings different from the defaults offered by the implementation at bendit-binder. Changes to the notebook settings apply only to runs in the current session; they will not carry over to future sessions unless additional steps are taken to propagate the changes to the notebook. Because only a limited subset of the options that the standalone bendIt program accommodates is exposed this way, users may quickly find that they need finer-grained control. I therefore discuss the specifics of propagating customizations to future sessions below in the context of the associated script, bendIt_analysis.py. Readers should bear in mind that the process for making changes persistent is much the same for settings available in the notebook, except that the altered file will be the notebook, index.ipynb, rather than the script.

If the settings offered directly in the notebook don't allow for addressing the specific adjustment you seek, a level of very specific customization of the settings used in the standalone bendIt program runs can be achieved by editing the associated script, bendIt_analysis.py. The key section of the script controlling calling the bendIt standalone program to analyze each sequence is the following around the 700th line of the script:

os.system(f"bendIt -s {fasta_file_name_for_merged} -o \
   {output_file_suffix} -c {curvature_window_size} -b \
   {other_metric_reporting_window_size} -g \
   {report_with_curvature_settings_corrspndnce[report_with_curvature]} \
   --xmin {curvature_window_size} --xmax {end_range}")

Everything in the script before that is leading up to that call, and everything after it deals with formatting the results in the desired way. Users looking for fine-grained control will therefore want to edit that block of code or adjust the settings for the arguments named there. For example, in the current implementation both the 'Curvature window size', the equivalent of the --cwindows flag in the bendIt standalone options, and the 'Bendability/G+C content/Complexity window size', the equivalent of the --bwindows flag in the bendIt standalone options, share the same value, inherited from window_size, which is set in the notebook that calls this script as part of the pipeline. Either of those window sizes could be uncoupled from the other by editing the code in that block or somewhere earlier in the script, such as where curvature_window_size,other_metric_reporting_window_size =(window_size,window_size) occurs.
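
One way to make such edits easier to reason about is to build the command as an argument list and hand it to subprocess instead of formatting a shell string. The sketch below is not the script's code: build_bendit_command and run_bendit are hypothetical helpers, the file names and numeric values are placeholders, and passing 'B' to -g is an assumption based on the 'B' setting described earlier. Only the flags already shown in the call above are used.

```python
import shlex
import subprocess


def build_bendit_command(fasta_path, output_path, curvature_window,
                         other_window, plot_flag, x_min, x_max):
    """Assemble the bendIt call as a list, one element per argument."""
    return [
        "bendIt",
        "-s", str(fasta_path),
        "-o", str(output_path),
        "-c", str(curvature_window),  # curvature window size
        "-b", str(other_window),      # bendability/G+C/complexity window size
        "-g", str(plot_flag),
        "--xmin", str(x_min),
        "--xmax", str(x_max),
    ]


def run_bendit(command):
    """Run the command; check=True raises CalledProcessError on any non-zero
    exit status, which includes termination by a signal such as a segfault."""
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Placeholder values; the two window sizes are deliberately uncoupled here.
command = build_bendit_command(
    "merged.fa", "result.txt",
    curvature_window=3, other_window=5,
    plot_flag="B",  # assumed value for -g, see the note above
    x_min=3, x_max=200,
)
print(shlex.join(command))  # inspect the call before running it with run_bendit
```

Unlike os.system, which only returns an exit status that is easy to ignore, this form surfaces a failed bendIt run as an exception, and the two window sizes are independent arguments from the start.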

Similarly, the 'Scale' flag could be added to the code block shown above. The Usage block shows that 'Consensus' is the default when the flag is not supplied in a call to the standalone bendIt software, which is why 'Consensus' is currently used for the equivalent of the curvature parameter. The 'Scale' flag would therefore need to be added only if another option is to be specified.

The full listing of options/flags that can be adjusted is shown in the Usage block for the standalone bendIt software.

Customization of the bendIt_analysis.py script code will either be temporary for that current session or could be propagated to an alternative launch source so as to become the new default settings for specific launches.

The temporary nature of the served sessions means that users won't have to worry about reverting after trying things. They do, however, need to take care to record anything useful worked out in those trials if they want the change propagated to an alternative offering of the environment and pipeline.

To make propagated changes, users can fork the GitHub repository and make changes to the code in the fork. Then point the Binder launch URL at the fork and the MyBinder service will build and launch a session with the customized code persisting in that session and any subsequent sessions launched from the specific URL. Custom launch badges may be made to signal the difference.
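
A launch URL and badge for a fork can be put together along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, 'your-username' is a placeholder for the fork owner, 'main' is assumed as the branch, and the labpath query parameter is used to open a notebook in JupyterLab. The URL pattern and badge image are the ones the MyBinder.org site generates for GitHub repositories.

```python
from urllib.parse import quote


def binder_launch_url(owner, repo, ref="main", notebook=None):
    """Build a MyBinder launch URL for a GitHub repository at a given ref."""
    # Slashes inside a branch name must be percent-encoded, hence safe="".
    url = f"https://mybinder.org/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook:
        # Open the named notebook in JupyterLab once the session starts.
        url += f"?labpath={quote(notebook)}"
    return url


def binder_badge_markdown(url, label="Binder"):
    """Return Markdown for a clickable launch badge pointing at `url`."""
    return f"[![{label}](https://mybinder.org/badge_logo.svg)]({url})"


# Placeholder fork owner; replace with the account that holds the fork.
fork_url = binder_launch_url("your-username", "bendit-binder", notebook="index.ipynb")
print(fork_url)
print(binder_badge_markdown(fork_url, label="launch customized bendit"))
```

Pasting the printed Markdown into the fork's README gives a badge that launches the customized environment rather than the original one.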


Technical Reminder for Those Modifying this Repository

Because GitHub Actions update branches after each push, executing a git pull right after any push from local will bring in recent changes. I try to execute a git pull shortly after each push to keep the local version consistent with the GitHub version; however, it isn't necessary. Fortunately, I find that pushing changes to the main branch without pulling first doesn't cause a block, or even a warning, that the remote has unincorporated changes.

Service Providing Active Sessions in Your Browser

This repository is set up to make use of the Binder service offered by MyBinder.org. See their site for more information about Binder.

