Benchmarking database performance for genomic data

08/16/2020
by Matloob Khushi, et al.

Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Genomic operations such as identifying overlapping or non-overlapping regions, or finding the nearest gene annotations, are common research needs. The data can be stored in a database system for easy management; however, no database currently offers a comprehensive built-in algorithm for identifying overlapping regions. I have therefore developed a region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking showed that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data upload were also faster in PostgreSQL, although the general search performance of the two databases was almost equivalent. In addition, the algorithm was used to report pairwise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, and HNF4G was found to co-locate significantly with the cohesin subunit STAG1 (SA1).
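To make the overlap operation concrete, the following is a minimal sketch of a generic SQL interval-overlap join. It is not the RegMap algorithm, whose details are not given here. The table names `a` and `b`, the column names `chrom`, `start` and `stop`, the helper `find_overlaps`, and the use of closed intervals are all assumptions made for illustration. SQLite is used only so that the example is self-contained; the benchmark itself compares PostgreSQL and MySQL.

```python
import sqlite3


def find_overlaps(regions_a, regions_b):
    """Return pairs of overlapping regions from two datasets.

    Each region is a (chrom, start, stop) tuple with closed coordinates
    (an assumption of this sketch). Two regions on the same chromosome
    overlap when each one starts no later than the other one ends.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (chrom TEXT, start INTEGER, stop INTEGER)")
    conn.execute("CREATE TABLE b (chrom TEXT, start INTEGER, stop INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", regions_a)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", regions_b)

    rows = conn.execute(
        """
        SELECT a.chrom, a.start, a.stop, b.start, b.stop
        FROM a
        JOIN b
          ON a.chrom = b.chrom
         AND a.start <= b.stop   -- a begins before b ends
         AND b.start <= a.stop   -- b begins before a ends
        ORDER BY a.chrom, a.start, b.start
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical example regions, not data from the paper.
    binding_sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    histone_marks = [("chr1", 150, 250), ("chr1", 700, 800), ("chr2", 300, 400)]
    for row in find_overlaps(binding_sites, histone_marks):
        print(row)
```

A plain join like this compares every pair of regions on the same chromosome, so how a database indexes and executes it is likely to matter a great deal for large datasets, which is the kind of difference a benchmark between database systems would expose.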

