Dudchenko juicebox youtube
1/18/2024

YaHS: yet another Hi-C scaffolding tool

Overview

YaHS is a scaffolding tool that uses Hi-C data. It relies on a new algorithm for contig joining detection, which considers the topological distribution of Hi-C signals in order to distinguish real interaction signals from mapping noise. YaHS has been tested on a wide range of genome assemblies. Compared with other Hi-C scaffolding tools, it usually generates more contiguous scaffolds, especially with a higher N90 and a lower L90. It is also very fast: it takes less than 5 minutes to reconstruct the human genome from an assembly of 5,483 contigs with ~45X Hi-C data. See the poster presented at the Biodiversity Genomics 2021 conference for more information.

You need a C compiler, GNU make and the zlib development files installed. Download the source code from the YaHS repository or with git clone, then type make in the source code directory to compile.

YaHS has two required inputs: a FASTA file of contig sequences, which must be indexed (with samtools faidx, for example), and a BAM/BED/BIN/PA5 file with the alignment results of Hi-C reads to the contigs. Many pipelines are available to generate the alignment file, such as the Arima Genomics mapping pipeline, the Omni-C mapping pipeline and HiC-Pro.

It is recommended to mark PCR/optical duplicates in the BAM file before feeding it to YaHS. Several tools are available for marking duplicates, such as bammarkduplicates2 from biobambam2 and MarkDuplicates from Picard.

For BED format input, each line should contain at least four columns: the name of the contig the read mapped to, the start position of the alignment, the end position of the alignment, and the read name. The first and second reads of a pair may optionally be marked by a '/1' or '/2' suffix on the read name. The fifth column, if numeric, is treated as the mapping quality score; all information after the fifth column is ignored. The two reads of each pair should be placed on two consecutive lines (i.e., the BED file should be sorted by read name, the fourth column). A BED file can be generated from the BAM file with bedtools bamtobed, for example. There is no need to convert BAM to BED unless you want to compare YaHS with other tools.

YaHS also accepts pair format input (PA5). Each line in a pair format file should contain at least five columns describing the mapping of one read pair: the read pair name, the name of the contig the first read mapped to, the mapping position of the first read, the name of the contig the second read mapped to, and the mapping position of the second read. The sixth and seventh columns, if numeric, are treated as the mapping quality scores of the first and second reads respectively; all information after the seventh column is ignored.

The BIN format is a binary format specific to YaHS. If the input file is not in BIN format, the first step of YaHS is to convert it to BIN format. This saves running time, because multiple rounds of file I/O are needed during the scaffolding process. If you have already run YaHS and need to rerun it, the BIN file in the output directory can be reused to save some time, although the saving may be only a few minutes.

YaHS determines the input file format from the file name extension (case insensitive). YaHS also provides the parameter --file-type to specify the input file format explicitly; accepted values are BAM, BED, PA5 and BIN (case insensitive). This parameter must be set when the input is piped in, because file name extension detection fails in that case.

NOTE 1: The input BAM file can be either sorted by read name (samtools sort with the -n option) or unsorted.
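The BED input rules above (at least four tab-separated columns, an optional '/1' or '/2' mate suffix, a numeric fifth column read as mapping quality, and mates on consecutive lines) can be illustrated with a short Python sketch that groups BED lines into read pairs. This is not YaHS's own parser, which is written in C; the function names (`bed_pairs`, `_parse_bed_line`) and the choice to silently skip a read whose mate is not on the next line are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# One aligned read: (contig, start, end, mapq); mapq is None when column 5
# is absent or not numeric.
Read = Tuple[str, int, int, Optional[int]]


def _strip_mate_suffix(name: str) -> str:
    """Drop an optional '/1' or '/2' suffix from a read name."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def _parse_bed_line(line: str) -> Tuple[str, Read]:
    """Split one tab-separated BED line into (pair name, read)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 4:
        raise ValueError(f"BED line needs at least 4 columns: {line!r}")
    contig, start, end, name = cols[0], int(cols[1]), int(cols[2]), cols[3]
    # Column 5 counts as MAPQ only if numeric; columns after 5 are ignored.
    mapq = int(cols[4]) if len(cols) > 4 and cols[4].isdigit() else None
    return _strip_mate_suffix(name), (contig, start, end, mapq)


def bed_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, Read, Read]]:
    """Yield (pair_name, read1, read2) for mates on consecutive lines.

    Hypothetical behaviour: a read whose name differs from the next line's
    name is treated as unpaired and dropped.
    """
    pending: Optional[Tuple[str, Read]] = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        name, read = _parse_bed_line(line)
        if pending is not None and pending[0] == name:
            yield name, pending[1], read
            pending = None
        else:
            pending = (name, read)
```

For example, the tab-separated lines `ctg1 100 250 r1/1 60 +` and `ctg2 900 1050 r1/2 12 -` yield one pair named `r1` with mapping qualities 60 and 12. Pairing on adjacent lines is also why the BED file needs to be sorted by read name: sorting by contig would scatter the two mates.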
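The PA5 column layout described above can be sketched in the same way as a small line parser. The `PairRecord` field names and the `parse_pa5_line` function are hypothetical, splitting on any whitespace is an assumption (the text does not state the delimiter), and header or comment lines are not handled here.

```python
from typing import List, NamedTuple, Optional


class PairRecord(NamedTuple):
    """One read pair from a PA5-style line (field names are illustrative)."""
    name: str
    contig1: str
    pos1: int
    contig2: str
    pos2: int
    mapq1: Optional[int]
    mapq2: Optional[int]


def _optional_int(cols: List[str], i: int) -> Optional[int]:
    # Columns 6 and 7 count as mapping qualities only when present and numeric.
    if len(cols) > i and cols[i].isdigit():
        return int(cols[i])
    return None


def parse_pa5_line(line: str) -> PairRecord:
    """Parse one pair line; anything after the seventh column is ignored."""
    cols = line.split()
    if len(cols) < 5:
        raise ValueError(f"pair line needs at least 5 columns: {line!r}")
    return PairRecord(
        name=cols[0],
        contig1=cols[1],
        pos1=int(cols[2]),
        contig2=cols[3],
        pos2=int(cols[4]),
        mapq1=_optional_int(cols, 5),
        mapq2=_optional_int(cols, 6),
    )
```

In a line such as `read8 ctg1 5 ctg1 900 + -`, the sixth and seventh columns hold strand symbols rather than numbers, so both mapping qualities come back as None, in line with the "if they are numeric values" rule.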
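The input format rule described above (an explicit --file-type value wins, otherwise the case-insensitive file name extension decides, and piped input needs the explicit type) can be expressed as a small Python function. The extension table and the name `resolve_input_format` are assumptions for illustration; the exact extensions YaHS matches may differ.

```python
import os
from typing import Optional

# Assumed extension table; the text only says the extension decides.
_EXTENSIONS = {".bam": "BAM", ".bed": "BED", ".pa5": "PA5", ".bin": "BIN"}
_ACCEPTED = set(_EXTENSIONS.values())


def resolve_input_format(path: str, file_type: Optional[str] = None) -> str:
    """Return BAM, BED, PA5 or BIN for an input path.

    An explicit file_type (mirroring --file-type) takes precedence and is
    matched case insensitively. Otherwise the extension of path is used,
    also case insensitively. Piped input such as "-" or "/dev/stdin" has no
    usable extension, so it raises unless file_type is given.
    """
    if file_type is not None:
        ft = file_type.upper()
        if ft not in _ACCEPTED:
            raise ValueError(f"unsupported file type: {file_type!r}")
        return ft
    ext = os.path.splitext(path)[1].lower()
    try:
        return _EXTENSIONS[ext]
    except KeyError:
        raise ValueError(
            f"cannot infer format from {path!r}; set the file type explicitly"
        ) from None
```

Raising an error instead of guessing reflects the statement that extension detection fails for piped input, which is exactly the case where the explicit parameter is required.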