CMSC423: Bioinformatic Algorithms, Databases, and Tools (Fall 2012)

Essential details

Time: TR 11-12:15
Location: CSIC 1121

Instructor: Todd Treangen (treangen at cs) x5-7395
Office hours: Tuesdays 12:30-2pm or by appointment
Office address: AVW 3223
Alternate office (by appointment): 3120B Biomolecular Sciences Building (bldg #296).
Building is usually locked. Call me from the intercom and I'll buzz you in.


TA: Milad Gholami (mgholami at cs)
TA office hours: Mondays and Wednesdays 9-11am
TA office: AVW 1112


Lecture schedule



CS Distinguished Lecture: Efficient communication and storage vs accurate variant calls in high throughput sequencing: two sides of the same coin


Course description

Computers have revolutionized modern biological research by providing biologists with the means to manage and analyze the large amounts of data generated through high-throughput experiments. This course provides a practical introduction to the main algorithms, databases, and tools used in bioinformatics, while also providing insight into the biological problems being addressed. The course will cover public databases such as GenBank and PDB, software tools such as BLAST, and their underlying theory and algorithms. Students will learn to perform a number of useful tasks in analyzing sequence data and managing bioinformatic databases, with a focus on problems of current relevance in biological research.

You will also learn algorithms that apply to areas of computer science beyond bioinformatics, such as clustering, string matching, and basic machine learning.

This course is designed to complement BSCI 348S, Comparative Bioinformatics.


Prerequisites

CMSC 351 or permission of the instructor (note: CMSC 351 will not be waived for CS students). Strong programming skills are required. No background in biology is required. If you are unsure whether you meet these requirements, please contact me.

Recommended Textbooks:


An Introduction to Bioinformatics Algorithms

Neil C. Jones and Pavel A. Pevzner

1st ed. 2004.

ISBN: 978-0-262-10106-6


Algorithms on Strings, Trees, and Sequences

Dan Gusfield

1st ed. 1997.

ISBN: 978-0-521-58519-4


Biological Sequence Analysis

Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison

1st ed. 1998.

ISBN: 978-0-521-62971-3


Analysis of Phylogenetics and Evolution with R

Emmanuel Paradis

2nd ed. 2012.

ISBN: 978-1-4614-1742-2

Course topics

The course will cover the following main areas. A detailed lecture schedule is available on the Lecture schedule page.

  • Introduction to molecular biology

  • Bioinformatic databases

  • Sequence alignment: exact and inexact string matching, multiple sequence alignments

  • Phylogenetic tree construction

  • Gene prediction and annotation

  • Genome sequencing and assembly

  • Metagenomics

Coursework and grading

Regular homework assignments will consist of one or more of the following: (i) exercises from the recommended textbooks; (ii) small programming assignments; (iii) "discovery" exercises using publicly available bioinformatics tools. In addition, all students must complete a programming project.

The final grade will combine the grades for the homework, project, midterm, and final exam. In addition, class participation will be taken into account for extra credit. The breakdown of your final grade is shown below.

Homework - 10%
Projects - 30%
Midterm - 25%
Final - 35%

Unless otherwise indicated in class, most assignments will be handed out on Thursdays and are due at the beginning of the following Tuesday's class. Remember that the TA office hours are on Mondays and Wednesdays, so stop by to see Milad if you have any questions about your assignments.

Assignments submitted late will not be accepted! If for reasons outside your control you will not be able to submit an assignment on time, see me as soon as possible to discuss an alternate deadline.

Attendance policy

This course follows the University's attendance policy. In short, if you will miss class for any reason, you should let me know in advance unless this is not possible (e.g., sudden illness). In any case, please let me know (e-mail is OK) as soon as you are aware that you will not be able to attend a class. I will work with you to help you catch up on homework or exams if you have to miss any lectures.

Academic integrity

I expect that the students taking this class fully adhere to the Code of Academic Integrity. Please read this document in full if you have not already done so. The University of Maryland, College Park has a nationally recognized Code of Academic Integrity, administered by the Student Honor Council. This Code sets standards for academic integrity at Maryland for all undergraduate and graduate students. As a student you are responsible for upholding these standards for this course. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism. For more information on the Code of Academic Integrity or the Student Honor Council, please visit

To further exhibit your commitment to academic integrity, remember to sign the Honor Pledge on all examinations and assignments: "I pledge on my honor that I have not given or received any unauthorized assistance on this examination (assignment)."
