Research
Proteomics
Most of my research effort goes into the field of proteomics:
determining which proteins, and how much of each, are present in a
particular biological sample, using mass spectrometry. Working with
the Fenselau lab, I hope to help deliver on the promise of
mass-spectrometry-based proteomics, making it as reliable a
high-throughput bioinformatics protocol as gene expression profiling,
genotyping and sequencing.
I'm currently working to construct amino-acid sequence databases that
better reflect the sequences of peptides observed in peptide
identification workflows. Tandem mass spectra resist identification by
sequence database search engines for a variety of reasons, one of
which is that current protein sequence databases do not contain the
sequences of all the peptides observed experimentally. My research
tackles this problem by integrating putative peptide sequences from
genomic data sources, and by compressing the resulting sequences so
that search times remain feasible. See the Peptide Sequence
Databases project page for more information.
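To make the compression idea concrete, here is a minimal Python sketch. It assumes that only peptides up to a fixed length k matter, collects every distinct length-k window of the input sequences, and greedily chains windows that overlap by k-1 residues. Each window still occurs somewhere in the output, but shared stretches are stored only once. This greedy heuristic is an illustrative stand-in, not the algorithm behind the published databases; the function `compress` and the `*` segment separator are hypothetical.

```python
from collections import defaultdict

SEP = "*"  # hypothetical separator placed between unrelated segments


def compress(sequences, k):
    """Return a string in which every distinct length-k window of the
    input sequences occurs at least once.

    Illustrative greedy heuristic only, not the published algorithm.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    kmers, short = set(), set()
    for s in sequences:
        if len(s) < k:
            short.add(s)  # too short to contain a window; keep it whole
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])

    # Index k-mers by their (k-1)-residue prefix for quick extension.
    by_prefix = defaultdict(list)
    for km in sorted(kmers):
        by_prefix[km[:-1]].append(km)

    # Prefer to start segments at k-mers that no other k-mer can precede.
    suffixes = {km[1:] for km in kmers}
    starts = sorted(kmers, key=lambda km: (km[:-1] in suffixes, km))

    remaining = set(kmers)
    segments = []
    for start in starts:
        if start not in remaining:
            continue
        remaining.discard(start)
        seg = start
        while True:
            # Look for an unused k-mer overlapping the last k-1 residues.
            candidates = by_prefix[seg[-(k - 1):]]
            nxt = None
            while candidates:
                cand = candidates.pop()
                if cand in remaining:
                    nxt = cand
                    break
            if nxt is None:
                break
            remaining.discard(nxt)
            seg += nxt[-1]  # extend the segment by a single residue
        segments.append(seg)
    segments.extend(sorted(short))
    return SEP.join(segments)
```

For example, `compress(["PEPTIDEK", "PEPTIDER"], 4)` returns `"PEPTIDER*IDEK"`: 13 characters in place of 16 residues, with all six distinct 4-mers preserved. The separator keeps windows that straddle two unrelated segments from looking like real peptides; a real tool would also need to account for windows longer than k, which chaining can create.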
I'm also working on bioinformatics tools for rapid
microorganism identification by mass spectrometry. Together with
Fernando Pineda, at the Johns Hopkins Bloomberg School of Public
Health, I run the Rapid Microorganism Identification DataBase
(RMIDB), which matches protein and peptide masses derived from
bacterial genome sequences against the peaks of mass spectra. The
RMIDB lets the user define arbitrary subsets of the universe of
bacterial proteins, for example proteins known to be naturally
abundant or proteins selected for by sample preparation. For each
biomarker, the RMIDB captures Pfam and TIGRFAM protein family
annotations; species, genus, and organism annotations; UniProt keyword
annotations; and initial methionine loss, a post-translational
modification. Both intact protein and tryptic peptide biomarkers are
supported. Access to the current prototype is limited to the
University of Maryland campus by default, but off-campus access is
available on request.
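The mass-matching idea can be sketched in a few lines of Python. This is a minimal illustration, not the RMIDB implementation: it assumes monoisotopic masses and singly protonated ions throughout (intact protein peaks are often reported as average masses, which this sketch ignores), trypsin cleaving after K or R except before P, and a hypothetical default tolerance of 100 ppm. The function names are made up for this example.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def mh_mass(seq):
    """Singly protonated [M+H]+ monoisotopic mass of a sequence."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + PROTON


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def biomarkers(protein):
    """Intact protein and tryptic peptides, with and without initial Met loss."""
    forms = {protein}
    if protein.startswith("M") and len(protein) > 1:
        forms.add(protein[1:])  # initial methionine removed
    out = set(forms)
    for form in forms:
        out.update(tryptic_peptides(form))
    return out


def match_peaks(peaks, proteins, tol_ppm=100.0):
    """Return (peak, protein name, biomarker) triples within tolerance.

    tol_ppm is a hypothetical default, not an RMIDB setting.
    """
    hits = []
    for name, protein in proteins.items():
        for marker in biomarkers(protein):
            mass = mh_mass(marker)
            for peak in peaks:
                if abs(peak - mass) / mass * 1e6 <= tol_ppm:
                    hits.append((peak, name, marker))
    return hits
```

Restricting `proteins` to a chosen subset (say, abundant ribosomal proteins) mirrors the RMIDB's user-defined subsets: fewer candidate biomarkers means fewer chance matches.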
Tools
PCR Match, Primer Match
A set of tools for quickly finding large numbers of short nucleotide
sequences in large genomic databases. Many options are available for
constraining acceptable alignments and for choosing input and output
formats. The tools automatically optimize the search strategy for the
given search parameters.
See the Primer Match project page for more information.
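The core problem, finding many short sequences at once, can be illustrated with a simple Python sketch. It handles exact matches only, on both strands, by hashing every query and its reverse complement and scanning the genome once per distinct query length. Primer Match itself supports constrained inexact alignments and chooses its own search strategy, which this sketch does not attempt to reproduce; `find_primers` and `revcomp` are hypothetical names.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primers(genome, primers):
    """Exact matches of many short sequences on both strands.

    Returns sorted (primer, position, strand) triples with 0-based
    positions on the forward strand. One scan per distinct primer length.
    """
    by_length = defaultdict(dict)
    for p in primers:
        table = by_length[len(p)]
        table.setdefault(p, []).append((p, "+"))
        table.setdefault(revcomp(p), []).append((p, "-"))
    hits = []
    for length, table in by_length.items():
        for i in range(len(genome) - length + 1):
            # Dictionary lookup replaces comparing each primer in turn.
            for primer, strand in table.get(genome[i:i + length], []):
                hits.append((primer, i, strand))
    return sorted(hits)
```

The dictionary makes the cost of each genome position independent of the number of primers, which is what makes searching for thousands of queries in one pass practical.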
PyMsXML
A Python script that converts vendor-specific mass spectrometry file formats to open XML file formats. Applied Biosystems Q-Star, Mariner, Voyager, and 4700 files are supported as input, and the mzXML and mzData XML file formats as output.
See the PyMsXML project page for more information.
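Reading vendor files is beyond a short example, but the output side can be sketched. In mzXML, a spectrum's peaks are stored as base64-encoded, interleaved m/z and intensity values; the sketch below assumes the 32-bit, network (big-endian) byte-order variant and is not taken from PyMsXML's code. The function names are hypothetical.

```python
import base64
import struct


def encode_peaks(mz, intensity):
    """Base64 of interleaved big-endian 32-bit floats (m/z, intensity)."""
    values = []
    for m, i in zip(mz, intensity):
        values.extend((m, i))
    raw = struct.pack(">%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text):
    """Inverse of encode_peaks: return a list of (m/z, intensity) pairs."""
    raw = base64.b64decode(text)
    values = struct.unpack(">%df" % (len(raw) // 4), raw)
    return list(zip(values[0::2], values[1::2]))


def peaks_element(mz, intensity):
    """Minimal mzXML <peaks> element for one spectrum (sketch only)."""
    return ('<peaks precision="32" byteOrder="network" pairOrder="m/z-int">'
            + encode_peaks(mz, intensity) + "</peaks>")
```

Values are rounded to 32-bit precision on encoding, so a round trip is exact only for numbers representable as single-precision floats.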
PyNISTPL
A set of Python scripts and a module that interface with the NIST MS
Search engine to provide easy-to-use command-line searching and
construction of peptide spectrum libraries.
See the PyNISTPL project page for more information.
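Building a peptide spectrum library typically starts from spectra written in a plain-text library format. The sketch below assumes the NIST MSP text layout with only `Name`, `Comment`, and `Num peaks` fields, one peak per line, and blank lines between records; it does not show how PyNISTPL talks to the NIST MS Search program, and `msp_record` and `write_msp` are hypothetical names.

```python
def msp_record(name, peaks, comment=None):
    """Format one spectrum as an MSP-style text record (sketch)."""
    lines = ["Name: %s" % name]
    if comment:
        lines.append("Comment: %s" % comment)
    lines.append("Num peaks: %d" % len(peaks))
    for mz, intensity in peaks:
        lines.append("%.4f\t%.1f" % (mz, intensity))
    return "\n".join(lines) + "\n"


def write_msp(records):
    """Join (name, peaks[, comment]) records with blank lines between them."""
    return "\n".join(msp_record(*r) for r in records)
```

A peptide spectrum is conventionally named by sequence and charge, for example `msp_record("PEPTIDEK/2", peaks)`.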
Protein Insertion for Mascot Searches
A set of tools to insert protein information into the results of a
Mascot search. When searching a sequence database, such as a
compressed amino-acid sequence database, that provides no protein
context, the protein information for each peptide sequence must be
inserted after the fact. These tools parse the Mascot result file,
search for the peptide sequences in a sequence database containing the
protein information, and output a new Mascot result file with the
protein information inserted.
See the Sequence Database Compression
for Peptide Identification project page for more information.
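The lookup step at the heart of protein insertion can be sketched as follows, assuming the protein-annotated database is a FASTA file and that exact substring matching is enough. Parsing and rewriting Mascot result files is not shown, and `read_fasta` and `protein_context` are hypothetical names; a large database would call for an index rather than the linear scan used here.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {accession: sequence}."""
    proteins, acc, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if acc is not None:
                proteins[acc] = "".join(chunks)
            # Accession is the first whitespace-delimited header token.
            acc, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if acc is not None:
        proteins[acc] = "".join(chunks)
    return proteins


def protein_context(peptides, proteins):
    """Map each peptide to the sorted accessions of proteins containing it."""
    return {pep: sorted(acc for acc, seq in proteins.items() if pep in seq)
            for pep in peptides}
```

A peptide can map to several proteins, or to none if it came from a genomic source absent from the annotated database; both cases need to be reported in the rewritten result file.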
BitTorrent for clusters
When staging large input files to the many nodes of a cluster, the
traditional network copy approach (scp, rcp, NFS) quickly overwhelms
the machine holding the original copy of the input file. BitTorrent, a
peer-to-peer file distribution protocol, is naturally suited to
distributing a large file that many clients want simultaneously,
which is precisely the scenario when staging large input files for a
distributed cluster job.
See the BitTorrent for clusters project page for more information.
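The advantage can be quantified with the standard idealized lower bounds on distribution time from networking textbooks. A single server must push N full copies through its own uplink, while a swarm adds every node's upload capacity. The sketch assumes identical peer upload rates and ignores protocol overhead; the numbers are illustrative, not measurements of the cluster tool.

```python
def client_server_time(n, file_bits, server_up, min_down):
    """Lower bound (s) for one server sending the file to n nodes."""
    return max(n * file_bits / server_up, file_bits / min_down)


def p2p_time(n, file_bits, server_up, min_down, peer_up):
    """Lower bound (s) when the n nodes also upload pieces to each other."""
    total_up = server_up + n * peer_up
    return max(file_bits / server_up,
               file_bits / min_down,
               n * file_bits / total_up)
```

For a hypothetical 1 GB (8e9-bit) input staged to 100 nodes over 1 Gbit/s links, the client-server bound is 800 s, while the peer-to-peer bound stays at 8 s, the time to move one copy over one link.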
Data
Compressed Amino-acid Sequence Databases
Popular sequence databases used by tandem mass spectrometry search
engines, compressed with the sequence database compression for
peptide identification algorithm, are available for download.
Incoming
To send me data, please use the incoming FTP directory: ftp://ftp.umiacs.umd.edu/incoming/nedwards. You will not be able to list this directory's contents or download its files, but you can upload files to it with an FTP put.
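For scripted uploads, a minimal sketch using Python's standard `ftplib` follows. It assumes anonymous login is accepted and that changing into the write-only directory is permitted; if the server refuses the `cwd`, storing with the full path may work instead. The `upload` helper and its `ftp_class` parameter (which allows substituting a stand-in for testing) are hypothetical.

```python
import ftplib
import os

HOST = "ftp.umiacs.umd.edu"
DIRECTORY = "incoming/nedwards"


def upload(path, host=HOST, directory=DIRECTORY, ftp_class=ftplib.FTP):
    """Upload one file anonymously with an FTP put (STOR).

    The directory is write-only, so listing or downloading will fail.
    """
    name = os.path.basename(path)
    with ftp_class(host) as ftp:
        ftp.login()  # no arguments: anonymous login
        ftp.cwd(directory)
        with open(path, "rb") as fh:
            ftp.storbinary("STOR " + name, fh)
    return name
```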