# Census

Census is a comprehensive quantitative analysis tool for mass spectrometry based proteomics.


## GENERAL INFORMATION

High-throughput mass spectrometry data from shotgun proteomics experiments necessitate an efficient and automated way to analyze large amounts of quantitative data. We introduced a software tool called Census that facilitates automated quantitative analysis using either stable isotope labeling or an isotope-free strategy. Using high-resolution, high-mass-accuracy data from an LTQ-Orbitrap hybrid mass spectrometer as input for Census, we were able to quantify roughly three times as many peptides as with our previous software package (i.e., RelEx). While some of the increase can be attributed to the benefits inherent to the instrumentation, improvements in Census are also responsible. One reason for the increase in accurately quantified peptides is that Census minimizes the contributions of interfering peaks and chemical noise: it takes advantage of the high mass accuracy of the Orbitrap by applying a small mass accuracy tolerance to each isotopic peak. In addition, a dynamic peak finding algorithm is employed. This algorithm makes use of database search results for improved accuracy and quantification efficiency. Finally, a weighted mean of the peptide ratios is calculated to determine each protein ratio.
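To make the idea of a small mass accuracy tolerance concrete, here is a minimal sketch of selecting the signal for one isotopic peak within a parts-per-million (ppm) window. It is an illustration only, not Census's actual code: the function names, the `(m/z, intensity)` peak representation, and the 10 ppm default are all assumptions.

```python
def within_ppm(observed_mz, theoretical_mz, tol_ppm):
    """Return True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


def isotope_peak_intensity(peaks, theoretical_mz, tol_ppm=10.0):
    """Sum the intensity of all (m/z, intensity) peaks inside the ppm window.

    The 10 ppm default is an assumed value chosen for illustration.
    """
    return sum(
        intensity
        for mz, intensity in peaks
        if within_ppm(mz, theoretical_mz, tol_ppm)
    )


# Hypothetical scan: two peaks close to m/z 500.000 and one interfering peak.
scan = [(500.000, 100.0), (500.004, 50.0), (500.020, 900.0)]
signal = isotope_peak_intensity(scan, theoretical_mz=500.0)
```

With a narrow window, the interfering peak at m/z 500.020 (40 ppm away) is excluded, whereas a wide window suited to low-resolution data would let it dominate the measurement.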

## Requirements

Software requirement

Java (JDK) version 6 or later. See http://java.sun.com/

Census has been developed in Java and is, therefore, operating system independent. It can be deployed on a personal Windows desktop using a graphical user interface, or on a high-performance server and run from the console, making it accessible to multiple users.

Spectral file formats

MS1/MS2 files (McDonald, W.H. et al. Rapid Commun. Mass Spectrom. 18, pp2162-2168 (2004))

Download RawXtractor to generate MS1/MS2 files. RawXtractor currently supports Thermo Fisher instruments only.

mzXML. More information is available at the ISB web site. There are scripts available to generate mzXML from various instrument types.

Protein/Peptide search output formats

DTASelect output format

pepXML

## census-out.txt file header

### PLINE

- LOCUS: protein accession from the protein database file
- AVERAGE_RATIO: average of the peptide ratios
- AVERAGE_RATIO_REV: reverse of the average ratio
- STANDARD_DEVIATION: standard deviation of the peptide ratios
- STANDARD_DEVIATION_REV: standard deviation of the reverse peptide ratios
- WEIGHTED_AVERAGE: weighted average of the peptide ratios, based on R scores
- PEPTIDE_NUM: number of peptides
- SPEC_COUNT: spectral count
- LSPEC_COUNT: spectral count from light peptides
- HSPEC_COUNT: spectral count from heavy peptides
- AREA_RATIO: average ratio from peptide peak areas
- DESCRIPTION: protein description
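The WEIGHTED_AVERAGE field can be pictured with a small sketch. This page only states that the weighting is based on R scores, so using each peptide's determinant factor (r x r) as its weight is an assumption made here for illustration. The function name is hypothetical.

```python
def weighted_average_ratio(ratios, weights):
    """Weighted mean of peptide ratios.

    `weights` is assumed to hold one non-negative score per peptide
    (for example r x r); the real Census weighting may differ.
    """
    if len(ratios) != len(weights):
        raise ValueError("ratios and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(r * w for r, w in zip(ratios, weights)) / total_weight


# Two hypothetical peptides: the better-fitting one pulls the protein
# ratio towards its own value.
protein_ratio = weighted_average_ratio([1.0, 2.0], [1.0, 3.0])
```

Compared with the plain AVERAGE_RATIO, a weighted mean lets peptides with a poor fit contribute less to the protein-level value.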

### SLINE

- UNIQUE: unique peptide or not
- SEQUENCE: peptide sequence
- RATIO: peptide ratio based on linear correlation. The ratio is light divided by heavy.
- REV_SLOPE_RATIO: reverse ratio based on linear correlation (LR). This value may not be exactly 1/RATIO because of the reverse slope formula in LR.
- PROBABILITY_SCORE: probability score based on LR
- REGRESSION_FACTOR: regression score (r)
- DETERMINANT_FACTOR: r x r
- XCorr: correlation score from search engine
- deltaCN: delta mass from search engine
- SAM_INT: light peptide peak area from the reconstructed chromatogram
- REF_INT: heavy peptide peak area from the reconstructed chromatogram
- AREA_RATIO: SAM_INT/REF_INT
- PROFILE_SCORE: fitting score comparing the peak with a Gaussian distribution
- FILE_NAME: spectral (raw) file name
- SCAN: scan number
- CS: charge state
- BEST_ENRICH_CORR: the best correlation score among all possible enrichment models (15N enrichment calculation only)
- BEST_ENRICH_DELCN: the difference between the best and second-best correlation scores (15N enrichment calculation only)
- CORR_ONE_PLUS: the correlation score of the enrichment value right after the best (15N enrichment calculation only)
- CORR_ONE_MINUS: the correlation score of the enrichment value right before the best (15N enrichment calculation only)
- ENRICHMENT: a.p.e. value (15N enrichment calculation only)
- ENRICHMENT_MR: 15N remaining ratio (15N enrichment calculation only)
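The note that the reverse ratio may not be exactly the reciprocal of the ratio follows from a general property of least-squares regression: the slope of y on x multiplied by the slope of x on y equals r x r, which is below 1 unless the fit is perfect. The sketch below demonstrates this with made-up chromatogram intensities. It uses ordinary least squares with an intercept, which is an assumption; the exact regression form Census uses is not described here.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of centered cross-products and squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    return sxy / sxx


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical light and heavy intensities across four scans.
heavy = [1.0, 2.0, 3.0, 4.0]
light = [2.1, 3.9, 6.2, 7.8]

ratio = slope(heavy, light)       # light relative to heavy
rev_ratio = slope(light, heavy)   # heavy relative to light
r = correlation(heavy, light)
determinant = r * r               # corresponds to DETERMINANT_FACTOR
```

For these values `ratio * rev_ratio` equals `determinant`, so `rev_ratio` is slightly smaller than `1 / ratio`; the gap widens as the fit gets worse.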

## Composite score

1. Traditionally, Census generates a determinant score (regression) for each peptide for quality control. It also generates profile scores for singleton peptides.

2. The idea of the composite score is to combine quality peptides (those with a determinant score, R^2, greater than 0.5) with singleton peptides.

3. For each protein, Census calculates the median intensity value of all associated peptides using a leave-one-out procedure. ~~If the left out peptide intensity is lower than 20% of the median value, then the peptide is considered as an outlier.~~

4. Census calculates the peak area ratio for each peptide and assigns the median of these ratios to the protein.
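Steps 2 and 4 above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Census implementation: the dictionary keys (`sam_int`, `ref_int`, `r2`, `singleton`) and the function name are hypothetical, singleton peptides are accepted without any profile score check, and the leave-one-out step is omitted because its outlier rule is not specified here.

```python
from statistics import median


def protein_area_ratio(peptides, min_r2=0.5):
    """Median peak area ratio over quality and singleton peptides.

    Each peptide is a dict with:
      sam_int   - light peak area
      ref_int   - heavy peak area
      r2        - determinant score (ignored for singletons)
      singleton - True if the peptide is a singleton
    Returns None when no peptide qualifies.
    """
    kept = [
        p for p in peptides
        if p["singleton"] or (p["r2"] is not None and p["r2"] > min_r2)
    ]
    # Skip peptides with no heavy signal to avoid division by zero.
    ratios = [p["sam_int"] / p["ref_int"] for p in kept if p["ref_int"] > 0]
    return median(ratios) if ratios else None


# Hypothetical peptides for one protein.
example = [
    {"sam_int": 200.0, "ref_int": 100.0, "r2": 0.9, "singleton": False},
    {"sam_int": 400.0, "ref_int": 100.0, "r2": None, "singleton": True},
    {"sam_int": 1000.0, "ref_int": 10.0, "r2": 0.2, "singleton": False},
    {"sam_int": 300.0, "ref_int": 100.0, "r2": 0.8, "singleton": False},
]
composite = protein_area_ratio(example)
```

The peptide with R^2 of 0.2 is dropped, and taking the median of the remaining ratios keeps a single extreme peptide from dominating the protein value.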

## Citation

The Census citation

A quantitative analysis software tool for mass spectrometry-based proteomics. Sung Kyu Park, John D Venable, Tao Xu, John R Yates III, Nature Methods, 2008, 5, 319-322


## Census home page

Please visit the Census web page at http://fields.scripps.edu/census