NGS
I am curious about genomics and bioinformatics research, so I took the Genomic Data Science Specialization offered by Johns Hopkins University on Coursera.
Courses include Introduction to Genomics Technologies, Command Line Tools for Genomics, Python for Genomics and Algorithms for DNA Sequencing. Certificates are listed at the end of the page.
Course Notes
Introduction to Genomics
- A summary of genomics topics, combining my notes from the Introduction to Genomics course at JHU and the Applied Computational Genomics course taught by Dr. Aaron Quinlan at the University of Utah, with some of my own interpretation added.
Common File Formats and Software in Genomics
A review of common file formats in genomics (SAM/BAM, BED, GFF3/GFF/GTF, VCF/BCF, FASTA/FASTQ), plus 3 course projects (adapted for clarity).
The course material came out in 2014, and some of the software it uses is now obsolete. It made me realize how fast computational genomics moves, and how crucial it is to keep learning new algorithms and tools.
Python for Genomics Data Science
- The course gives a brief introduction to Python and some broad advice on coding style.
- I keep a Python notes page for quick lookups.
Algorithms for DNA Sequencing
The course covers the math and practical implementation of algorithms for DNA sequence alignment and assembly. It was hands down the most fun I’ve had in the course series.
- Naive exact matching algorithm & coding project: simple and straightforward.
- Boyer-Moore matching algorithm & coding project: the art of giving up.
- Indexing the genome: index-assisted matching with a sorted list and a hash table; subsequences vs substrings.
- Approximate matching: Hamming and edit distance; pigeonhole principle.
- Coding example: k-mers, sorted list, binary search, pigeonhole principle and subsequences.
- Edit distance, recursive algorithm: reduce the problem to smaller edit-distance sub-problems and solve them recursively.
- Smith-Waterman dynamic programming: remembering the answers to the smaller sub-problems to massively reduce repeated calculations.
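
The contrast between the recursive and the dynamic programming approach can be sketched in Python. This is a minimal illustration, not the course's code: the function names are made up for this example, and the table-based version computes a global edit distance rather than a Smith-Waterman local alignment, which uses the same table-filling idea with a different scoring scheme.

```python
def edit_distance_recursive(a: str, b: str) -> int:
    """Plain recursion: correct, but slow because sub-problems are recomputed."""
    if not a:
        return len(b)  # insert every remaining character of b
    if not b:
        return len(a)  # delete every remaining character of a
    mismatch = 0 if a[-1] == b[-1] else 1
    return min(
        edit_distance_recursive(a[:-1], b[:-1]) + mismatch,  # match or substitute
        edit_distance_recursive(a[:-1], b) + 1,              # delete from a
        edit_distance_recursive(a, b[:-1]) + 1,              # insert into a
    )


def edit_distance_dp(a: str, b: str) -> int:
    """Dynamic programming: each prefix pair is solved once and stored in a table."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # distance from a[:i] to the empty string
    for j in range(cols):
        table[0][j] = j  # distance from the empty string to b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j - 1] + mismatch,  # match or substitute
                table[i - 1][j] + 1,             # delete from a
                table[i][j - 1] + 1,             # insert into a
            )
    return table[-1][-1]
```

Both functions return the same value for the same inputs. The recursive version takes time that grows exponentially with the input length, while the table version does work proportional to `len(a) * len(b)`.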