Machine learning solutions for genomic signal extraction and analysis

Genomics technologies, such as ATAC-seq, ChIP-seq, and DNase-seq, have revolutionised molecular biology, generating a complete genome’s worth of signal in a single assay. The challenge is no longer data generation; it is extracting biological meaning from such massively complex datasets effectively and reproducibly. While other tools approach this problem with simple statistical tests, our novel machine learning model uses a convolutional neural network, local genomic enrichment measurements, and Poisson-based significance testing from multiple viewpoints, all integrated using a multilayer perceptron to give each region a probability of being a true biological signal. We hand-labelled 499Mb of genomic data, built 5,000 models, and tested with over 100 unique users from labs around the world. And because it’s built on the powerful MLV visualisation software, results can easily be visualised and shared with collaborators or reviewers. The culmination of these efforts is a peak caller that can extract more from your data, create interactive charts, and improve interpretability - all while simplifying the analysis process.

Create A Project

Find and Score Peaks

INPUTS:

bigWig coverage file

Find genomic locations of enriched regions using an uploaded coverage track, and score them using LanceOtron’s deep neural network

Find and Score Peaks with Inputs

INPUTS:

bigWig coverage file

bigWig input file

Find genomic locations of enriched regions, score them using LanceOtron’s deep neural network, and calculate significance of enrichment beyond a control input track

Score Peaks

INPUTS:

bigWig coverage file

BED file of called peaks

Upload a list of genomic locations, and associated coverage track, to be scored using LanceOtron’s deep neural network

Preparing bigWig Files

All LanceOtron modules require a bigWig file to supply the model with coverage data. We recommend converting BAM files directly to bigWig files with deepTools using the following command:
bamCoverage --bam filename.bam.sorted -o filename.bw --extendReads -bs 1 --normalizeUsing RPKM
The options used in this command are important, as they affect the shape of peaks and therefore the neural network's assessment. Extending the reads out to the fragment length gives a more accurate picture of coverage (N.B. for paired-end sequencing the extension length is determined automatically, whereas single-end tracks require the user to specify the `--extendReads` length), as does using a bin size of 1 (the `-bs` flag). We recommend RPKM normalisation, as this was also used for the training data.
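The paired-end and single-end cases described above can be sketched as a small shell helper. This is an illustration only: the function name `bamcoverage_cmd` and the fragment length of 200 are hypothetical and not part of LanceOtron or deepTools, and file paths are assumed to contain no spaces. The helper prints the command rather than running it, so it can be checked first.

```shell
# Hypothetical helper (not part of LanceOtron or deepTools).
# Prints the recommended bamCoverage command for a sorted BAM file.
#   $1 = sorted BAM file
#   $2 = output bigWig file
#   $3 = fragment length in bp (optional; supply it for single-end data)
bamcoverage_cmd() {
  bam="$1"
  out="$2"
  fraglen="${3:-}"
  if [ -n "$fraglen" ]; then
    # Single-end: the extension length must be given explicitly.
    echo "bamCoverage --bam $bam -o $out --extendReads $fraglen -bs 1 --normalizeUsing RPKM"
  else
    # Paired-end: the extension length is determined automatically.
    echo "bamCoverage --bam $bam -o $out --extendReads -bs 1 --normalizeUsing RPKM"
  fi
}

# Paired-end example
bamcoverage_cmd filename.bam.sorted filename.bw

# Single-end example, assuming a 200 bp fragment length (replace with your own estimate)
bamcoverage_cmd filename.bam.sorted filename.bw 200
```

Once the printed command looks right, it can be run by piping the output to `sh`, provided deepTools is installed.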

VIDEOS

Watch a video demonstrating the basic functionality of LanceOtron

Introduction to LanceOtron

Find and Score Peaks with Input

Tutorial - Exploring Peak Calls

Featured Projects

CTCF ChIP-seq in primary spleen cells

Find and Score Peaks with Inputs using data from ENCODE experiment ENCSR692ILH. The control input track has been overlaid and thumbnail images were created, allowing users to quickly scan the quality of the peak call. Intersections have also been calculated between GenoSTAN annotated promoters and enhancers.

H3K27ac ChIP-seq in 22Rv1 cells

Score Peaks project from ENCODE experiment ENCSR391NPE. Here two replicates were originally peak called using the MACS2 peak caller, and only regions present in both calls were used. Despite the extensive quality control measures carried out, numerous false positives remain.
