University of Maryland Nathan Edwards
Center for Bioinformatics and Computational Biology

Research

Proteomics

Most of my research effort goes into proteomics: determining which proteins, and how much of each, are present in a particular biological sample using mass spectrometry. Working with the Fenselau lab, I hope to help deliver on the promise of mass spectrometry-based proteomics, making it a reliable high-throughput bioinformatics protocol like gene expression, genotyping, and sequencing.

I'm currently working to construct amino-acid sequence databases that better reflect the sequences of peptides observed in peptide identification workflows. Tandem mass spectra resist identification by sequence database search engines for a variety of reasons, one of which is that current protein sequence databases do not contain the sequences of all the peptides observed experimentally. My research tackles this problem by integrating putative peptide sequences from genomic data sources and by compressing the resulting sequences so that search times remain feasible. See the Peptide Sequence Databases project page for more information.

I'm also working on bioinformatics tools for rapid microorganism identification by mass spectrometry. Together with Fernando Pineda, at Johns Hopkins Bloomberg School of Public Health, I run the Rapid Microorganism Identification DataBase (RMIDB), which matches protein and peptide masses derived from bacterial genome sequences with the peaks of mass spectra. The RMIDB permits the user to define arbitrary subsets of the universe of bacterial proteins corresponding to proteins known to be naturally abundant, or selected for by sample preparation. The RMIDB captures Pfam and TIGRFAM protein family annotations; species, genus, and organism annotations; UniProt keyword annotations; and the initial methionine loss post-translational modification for each biomarker. Both intact protein and tryptic peptide biomarkers are supported. Access to the current prototype is limited to the University of Maryland campus by default, but off-campus access is available by request.
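As a rough illustration of the kind of mass matching described above (this is not the RMIDB implementation), the Python sketch below computes a neutral mass for a protein sequence, with and without its initial methionine, and reports which observed peaks fall within a tolerance. The function names, the toy sequence, the tolerance, and the assumption that peaks are already expressed as neutral masses are all illustrative. Monoisotopic residue masses are used for simplicity; intact protein work would more likely use average masses.

```python
# Standard monoisotopic amino-acid residue masses, in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def neutral_mass(sequence):
    """Neutral monoisotopic mass of an unmodified sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidate_masses(sequence):
    """Mass as encoded and, if applicable, after initial methionine loss."""
    masses = [("intact", neutral_mass(sequence))]
    if sequence.startswith("M") and len(sequence) > 1:
        masses.append(("Met-loss", neutral_mass(sequence[1:])))
    return masses


def match_peaks(proteins, peaks, tolerance=0.5):
    """Return (peak, protein_id, form) for every peak within tolerance.

    proteins: dict mapping a (hypothetical) identifier to a sequence.
    peaks: observed neutral masses in daltons (an assumption of this sketch).
    """
    matches = []
    for peak in peaks:
        for protein_id, sequence in proteins.items():
            for form, mass in candidate_masses(sequence):
                if abs(mass - peak) <= tolerance:
                    matches.append((peak, protein_id, form))
    return matches
```

Restricting the `proteins` dictionary to a chosen subset, for example proteins from one protein family, would correspond loosely to the user-defined subsets mentioned above.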

Tools

PCR Match, Primer Match

A set of tools for quickly finding a large number of short nucleotide sequences in large genomic databases. The tools offer many options for constraining acceptable alignments and for input/output formats, and automatically optimize the sequence search strategy for the chosen search parameters. See the Primer Match project page for more information.
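To give a feel for the problem, here is a minimal Python sketch of one common way to look up many short sequences in a longer one: build a k-mer index once, then verify each candidate position. It handles exact matches on both strands only and is not the Primer Match algorithm; the function names and the choice of `k` are assumptions, and every query is assumed to be at least `k` bases long.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def find_primers(genome, primers, k=4):
    """Return (primer, strand, 0-based position) for each exact occurrence."""
    index = build_kmer_index(genome, k)
    hits = []
    for primer in primers:
        for strand, query in (("+", primer), ("-", reverse_complement(primer))):
            # Seed with the first k bases, then confirm the full query.
            for pos in index.get(query[:k], []):
                if genome.startswith(query, pos):
                    hits.append((primer, strand, pos))
    return hits
```

The index is built once and shared across all queries, which is what makes searching a large number of short sequences cheaper than scanning the genome once per query.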

PyMsXML

A Python script that converts vendor-specific mass spectrometry file formats to open XML file formats. Applied Biosystems Q-Star, Mariner, Voyager, and 4700 are supported, as are the mzXML and mzData XML file formats. See the PyMsXML project page for more information.

PyNISTPL

A set of Python scripts and a module that interface with the NIST MS search engine to provide easy-to-use command-line searching and construction of peptide spectrum libraries. See the PyNISTPL project page for more information.

Protein Insertion for Mascot Searches

A set of tools to insert protein information into the results of a Mascot search. When searching a sequence database, such as a compressed amino-acid sequence database, that provides no protein context, the protein information for each peptide sequence must be inserted after the fact. These tools parse the Mascot result file, search for the peptide sequences in a sequence database containing the protein information, and output a new Mascot result file with the protein information inserted. See the Sequence Database Compression for Peptide Identification project page for more information.
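The lookup step at the heart of this process can be sketched in a few lines of Python. The example below only maps peptide sequences back to the proteins that contain them; it does not parse or write Mascot result files, and the function name and the dictionary-based protein database are assumptions made for illustration.

```python
def map_peptides_to_proteins(peptides, proteins):
    """Find every protein, and position, in which each peptide occurs.

    peptides: iterable of peptide sequences.
    proteins: dict mapping a protein accession to its sequence.
    Returns a dict: peptide -> list of (accession, 1-based start position).
    """
    mapping = {}
    for peptide in peptides:
        locations = []
        for accession, sequence in proteins.items():
            start = sequence.find(peptide)
            while start != -1:
                locations.append((accession, start + 1))
                # Continue one residue later so overlapping hits are found.
                start = sequence.find(peptide, start + 1)
        mapping[peptide] = locations
    return mapping
```

A peptide that maps to an empty list has no protein context in the chosen database, which is worth reporting rather than silently dropping.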

BitTorrent for clusters

When staging large input files to the many nodes of a cluster, the traditional network copy approach (scp, rcp, NFS) quickly overwhelms the machine holding the original copy of the input file. BitTorrent, a peer-to-peer file distribution protocol, is naturally suited to file distribution when the file is large and many clients want the file simultaneously, which is precisely the scenario when staging large input files for a distributed cluster job. See the BitTorrent for clusters project page for more information.

Data

Compressed Amino-acid Sequence Databases

Popular sequence databases used by tandem mass spectrometry search engines, compressed with the sequence database compression for peptide identification algorithm, are available for download.

Incoming

To send me data, please use the incoming ftp directory: ftp://ftp.umiacs.umd.edu/incoming/nedwards. You will not be able to see this directory's contents, or download its files, but you can do an FTP put.
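For those who prefer to script the upload, a minimal Python sketch using the standard `ftplib` module follows. It assumes the server accepts anonymous logins, which is not stated above; the function name and the `ftp_factory` parameter (present only so the function can be exercised without a network connection) are illustrative. The host and directory are taken from the URL above.

```python
import os
from ftplib import FTP


def upload_to_incoming(local_path,
                       host="ftp.umiacs.umd.edu",
                       directory="/incoming/nedwards",
                       ftp_factory=FTP):
    """Upload one file with an FTP put and return the remote file name."""
    remote_name = os.path.basename(local_path)
    ftp = ftp_factory(host)
    try:
        ftp.login()          # anonymous login (an assumption of this sketch)
        ftp.cwd(directory)
        with open(local_path, "rb") as handle:
            ftp.storbinary("STOR " + remote_name, handle)
        ftp.quit()
    finally:
        ftp.close()          # safe to call even after quit()
    return remote_name
```

Because the directory contents cannot be listed, the script cannot confirm the upload by looking for the file afterwards; a transfer that completes without raising an exception is the only signal available.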

Original created by John Fuetsch
Questions and comments to Nathan Edwards