Census

From Proteomics Wiki

Census is a comprehensive quantitative analysis tool for mass spectrometry-based proteomics.


GENERAL INFORMATION

High-throughput mass spectrometry data from shotgun proteomics experiments require an efficient, automated way to analyze large amounts of quantitative data. We introduced a software tool called Census that facilitates automated quantitative analysis using either stable isotope labeling or an isotope-free strategy. Using high-resolution, high mass accuracy data from an LTQ-Orbitrap hybrid mass spectrometer as input for Census, we were able to quantify roughly three times as many peptides as with our previous software package (RelEx). While some of the increase can be attributed to the benefits inherent to the instrumentation, improvements in Census are also responsible. One reason for the increase in accurately quantified peptides is that Census minimizes the contributions of interfering peaks and chemical noise by taking advantage of the high mass accuracy of the Orbitrap, applying a small mass tolerance to each isotopic peak. In addition, a dynamic peak finding algorithm is employed. This algorithm makes use of database search results for improved accuracy and quantification efficiency. Finally, a weighted mean of the peptide ratios is calculated to determine each protein ratio.
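The effect of a small mass tolerance on interfering peaks can be illustrated with a minimal sketch. This is not Census code: the function names, the ppm tolerance values, and the example scan are all assumptions chosen for illustration.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def extract_intensity(peaks, target_mz, tol_ppm=10.0):
    """Sum the intensity of peaks whose m/z lies within tol_ppm of target_mz.

    peaks is a list of (mz, intensity) tuples from a single MS1 scan.
    """
    low, high = ppm_window(target_mz, tol_ppm)
    return sum(intensity for mz, intensity in peaks if low <= mz <= high)


# Hypothetical scan: the expected isotopic peak near 500.2500 plus an
# interfering peak at 500.2700 and an unrelated peak further away.
scan = [(500.2500, 1000.0), (500.2700, 400.0), (500.7515, 600.0)]

narrow = extract_intensity(scan, 500.2502, tol_ppm=10.0)   # tight window
wide = extract_intensity(scan, 500.2502, tol_ppm=100.0)    # loose window
print(narrow, wide)
```

With the tight window only the expected peak contributes, whereas the loose window also picks up the interfering peak at 500.2700.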

Requirements

Software requirements

Java (JDK) version 6 or later. See http://java.sun.com/
Census has been developed in Java and is therefore operating system independent. It can be deployed on a personal Windows desktop using a graphical user interface, or on a high performance server and run from the console, making it accessible to multiple users.

Spectral file formats

MS1/MS2 files (McDonald, W.H. et al. Rapid Commun. Mass Spectrom. 18, pp. 2162-2168 (2004)).
Download RawXtractor to generate MS1/MS2 files. RawXtractor currently supports Thermo Fisher instruments only.
mzXML. See the ISB web site for more information. Scripts are available to generate mzXML from various instrument types.

Protein/Peptide search output formats

DTASelect output format
pepXML

census-out.txt file header

PLINE

  LOCUS: Protein accession from protein database file
  AVERAGE_RATIO: average ratio of peptide ratios
  AVERAGE_RATIO_REV: reverse of average ratio
  STANDARD_DEVIATION: standard deviation of peptide ratios
  STANDARD_DEVIATION_REV: standard deviation of reverse peptide ratios.
  WEIGHTED_AVERAGE: weighted average of peptide ratios based on R scores
  PEPTIDE_NUM: number of peptides
  SPEC_COUNT: spectral count
  LSPEC_COUNT: spectral count from light peptides
  HSPEC_COUNT: spectral count from heavy peptides
  AREA_RATIO: average ratio from peptide peak area
  DESCRIPTION: protein description
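As a rough illustration of how the protein-level summary columns relate to the peptide ratios, the sketch below computes an average, a standard deviation, and a weighted average. It is not the Census implementation: the function name is hypothetical, the use of the sample standard deviation is an assumption, and the weights are simply passed in by the caller (here taken to be per-peptide regression scores) because the exact weighting scheme is not specified above.

```python
import math


def protein_summary(ratios, weights):
    """Summarize peptide ratios for one protein.

    ratios  -- peptide ratios (light/heavy)
    weights -- one non-negative weight per peptide (assumed here to be
               derived from the regression score; an illustrative choice)
    """
    n = len(ratios)
    mean = sum(ratios) / n
    # Sample standard deviation; defined as 0.0 for a single peptide.
    if n > 1:
        sd = math.sqrt(sum((r - mean) ** 2 for r in ratios) / (n - 1))
    else:
        sd = 0.0
    weighted = sum(r * w for r, w in zip(ratios, weights)) / sum(weights)
    return {
        "AVERAGE_RATIO": mean,
        "STANDARD_DEVIATION": sd,
        "WEIGHTED_AVERAGE": weighted,
        "PEPTIDE_NUM": n,
    }


# Hypothetical peptides: the well-correlated peptide dominates the
# weighted average, pulling it toward its own ratio.
summary = protein_summary([1.0, 2.0, 3.0], [0.9, 0.5, 0.1])
print(summary)
```

Down-weighting poorly correlated peptides keeps a single noisy measurement from shifting the protein ratio as much as it would in a plain average.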

SLINE

  UNIQUE: unique peptide or not
  SEQUENCE: peptide sequence
  RATIO: peptide ratio based on linear regression (LR). The ratio is light divided by heavy.
  REV_SLOPE_RATIO: reverse ratio based on LR. This value may not be exactly 1/RATIO because it is obtained from the reverse regression slope rather than by inverting RATIO
  PROBABILITY_SCORE: probability score based on LR
  REGRESSION_FACTOR: regression score (r)
  DETERMINANT_FACTOR: r x r (coefficient of determination, r^2)
  XCorr: correlation score from search engine
  deltaCN: delta mass from search engine
  SAM_INT: light peptide peak area from reconstructed chromatogram
  REF_INT: heavy peptide peak area from reconstructed chromatogram
  AREA_RATIO: SAM_INT/REF_INT
  PROFILE_SCORE: fitting score comparing the peak with a Gaussian distribution
  FILE_NAME: spectral (raw) file name
  SCAN: scan number
  CS: charge state
  BEST_ENRICH_CORR: the best correlation score among all enrichment models evaluated (15N enrichment calculation only)
  BEST_ENRICH_DELCN: the difference between the best and second best correlation scores (15N enrichment calculation only)
  CORR_ONE_PLUS: the correlation score of the enrichment value immediately after the best (15N enrichment calculation only)
  CORR_ONE_MINUS: the correlation score of the enrichment value immediately before the best (15N enrichment calculation only)
  ENRICHMENT: atom percent excess (a.p.e.) value (15N enrichment calculation only)
  ENRICHMENT_MR: 15N remaining ratio (15N enrichment calculation only)
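The relationship between RATIO, REV_SLOPE_RATIO, REGRESSION_FACTOR and DETERMINANT_FACTOR can be sketched with ordinary least squares on paired light and heavy chromatogram intensities. This is an assumption about the general form of the calculation, not a copy of the Census code, and the function name and data are hypothetical. It shows why the reverse slope equals 1/RATIO only when the fit is perfect: the product of the two slopes is r^2.

```python
import math


def regression_ratios(light, heavy):
    """Least-squares slopes and correlation for paired intensities.

    Returns (slope, rev_slope, r, r_squared) where slope is the
    light-on-heavy regression slope (light/heavy ratio) and rev_slope
    is the heavy-on-light regression slope.
    """
    n = len(light)
    mean_l = sum(light) / n
    mean_h = sum(heavy) / n
    s_ll = sum((l - mean_l) ** 2 for l in light)
    s_hh = sum((h - mean_h) ** 2 for h in heavy)
    s_lh = sum((l - mean_l) * (h - mean_h) for l, h in zip(light, heavy))
    slope = s_lh / s_hh          # regress light on heavy
    rev_slope = s_lh / s_ll      # regress heavy on light
    r = s_lh / math.sqrt(s_ll * s_hh)
    return slope, rev_slope, r, r * r


# Hypothetical chromatogram points for one peptide (heavy, then light).
heavy = [10.0, 20.0, 30.0, 40.0, 50.0]
light = [21.0, 39.0, 62.0, 79.0, 101.0]

slope, rev_slope, r, r2 = regression_ratios(light, heavy)
print(slope, rev_slope, 1.0 / slope, r2)
```

Because `slope * rev_slope` equals `r2`, the reverse slope falls below `1 / slope` whenever the correlation is imperfect, and the gap grows as the fit gets worse.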

Composite score

1. Traditionally, Census generates a determinant score (from regression) for each peptide for quality control. It also generates profile scores for singleton peptides.
2. The composite score combines quality peptides (determinant score, R^2, greater than 0.5) with singleton peptides.
3. For each protein, Census calculates the median intensity value of all associated peptides using a leave-one-out procedure. If the intensity of the left-out peptide is lower than 20% of the median value, the peptide is considered an outlier.
4. Census calculates the peak area ratio for each peptide and assigns the median of these ratios to the protein.
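Steps 3 and 4 above can be sketched as follows. This is a minimal illustration, not the Census implementation: the function names are hypothetical, the intensity used for the outlier test is supplied by the caller, and excluding outliers before taking the median ratio is an assumption, since the steps do not say so explicitly.

```python
from statistics import median


def flag_outliers(intensities, fraction=0.2):
    """Leave-one-out outlier test on peptide intensities.

    A peptide is flagged when its intensity is lower than `fraction`
    times the median intensity of the remaining peptides.
    """
    flags = []
    for i, value in enumerate(intensities):
        others = intensities[:i] + intensities[i + 1:]
        if not others:
            # A single peptide has nothing to be compared against.
            flags.append(False)
            continue
        flags.append(value < fraction * median(others))
    return flags


def composite_protein_ratio(intensities, area_ratios, fraction=0.2):
    """Median area ratio of the peptides that pass the outlier test.

    Assumption: flagged peptides are dropped before the median is taken.
    Returns None if no peptide remains.
    """
    flags = flag_outliers(intensities, fraction)
    kept = [ratio for ratio, is_outlier in zip(area_ratios, flags)
            if not is_outlier]
    return median(kept) if kept else None


# Hypothetical protein with four peptides; the last one is very weak.
intensities = [1000.0, 1200.0, 900.0, 100.0]
area_ratios = [2.0, 2.2, 1.8, 9.0]

flags = flag_outliers(intensities)
ratio = composite_protein_ratio(intensities, area_ratios)
print(flags, ratio)
```

Using the median rather than the mean keeps the protein ratio stable even if a low-quality peptide slips past the outlier test.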

Citation

The Census citation
A quantitative analysis software tool for mass spectrometry-based proteomics. Sung Kyu Park, John D Venable, Tao Xu, John R Yates III. Nature Methods, 2008, 5, 319-322.


Census home page

Please visit the Census web page at http://fields.scripps.edu/census