
Grouping a TCGA somatic mutation file by patient

  • I use COAD (colon adenocarcinoma) data here.
  • The file format is MAF (Mutation Annotation Format).
  • The MAF file contains a 'Tumor_Sample_Barcode' column.
    The barcode is structured as follows:

TCGA-SiteID-PatientID-SampleID-PortionID-PlateID-CenterID

TCGA barcode

  • SiteID (or TSS) stands for "tissue source site"; it identifies the hospital or institute that contributed the sample and the related study name.
    (Refer to the "Code Tables Report" for more details.)
  • A patient (or participant) is identified by the first three hyphen-separated fields, e.g., TCGA-02-0001.
  • Refer to the "TCGA barcode" page for more details.
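
The barcode layout described above can be sketched in a few lines of Python. This is only an illustration: the `Barcode` tuple, the helper names, and the example barcode string are assumptions made for this sketch, not part of the MAF file or of any TCGA tooling.

```python
from collections import namedtuple

# Field names mirror the layout TCGA-SiteID-PatientID-SampleID-PortionID-PlateID-CenterID.
# The namedtuple itself is a hypothetical helper for this sketch.
Barcode = namedtuple(
    "Barcode",
    ["project", "site", "patient", "sample", "portion", "plate", "center"],
)

def parse_barcode(barcode):
    """Split a full seven-field barcode into named fields."""
    fields = barcode.split("-")
    if len(fields) != 7 or fields[0] != "TCGA":
        raise ValueError("not a full seven-field TCGA barcode: %r" % barcode)
    return Barcode(*fields)

def patient_id(barcode):
    """Return the first three fields joined by hyphens."""
    return "-".join(barcode.split("-")[:3])

# Example barcode chosen for illustration only.
example = "TCGA-02-0001-01C-01D-0182-01"
parsed = parse_barcode(example)
print(parsed.site, parsed.patient)
print(patient_id(example))
```

Splitting on the hyphen is enough here because every field is hyphen-free; the regular expression used later in this post does the same job directly on the DataFrame column.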

In [1]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
In [2]:
print( "pandas: %s"%pd.__version__ )
print( "numpy: %s"%np.__version__ )
pandas: 0.17.0
numpy: 1.10.1
In [3]:
df = pd.read_table("hgsc.bcm.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf")
In [4]:
df.columns
Out[4]:
Index([u'Hugo_Symbol', u'Entrez_Gene_Id', u'Center', u'Ncbi_Build', u'Chrom',
       u'Start_Position', u'End_Position', u'Strand',
       u'Variant_Classification', u'Variant_Type', u'Reference_Allele',
       u'Tumor_Seq_Allele1', u'Tumor_Seq_Allele2', u'Dbsnp_Rs',
       u'Dbsnp_Val_Status', u'Tumor_Sample_Barcode',
       u'Matched_Norm_Sample_Barcode', u'Match_Norm_Seq_Allele1',
       u'Match_Norm_Seq_Allele2', u'Tumor_Validation_Allele1',
       u'Tumor_Validation_Allele2', u'Match_Norm_Validation_Allele1',
       u'Match_Norm_Validation_Allele2', u'Verification_Status',
       u'Validation_Status', u'Mutation_Status', u'Sequencing_Phase',
       u'Sequence_Source', u'Validation_Method', u'Score', u'Bam_File',
       u'Sequencer', u'Tumor_Sample_UUID', u'Matched_Norm_Sample_UUID',
       u'File_Name', u'Archive_Name', u'Line_Number'],
      dtype='object')

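We need missense mutations only. As a first step, drop the rows that describe a nucleotide insertion or deletion: in those rows an allele column holds '-', so we replace '-' with NaN and then drop every row that contains a NaN. Note that this does not filter by mutation class: Silent and Nonsense_Mutation rows are still present afterwards, and any row with '-' or a missing value in any other column is dropped as well.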


In [5]:
df.replace('-', np.nan, inplace=True)
C:\Users\dwlee\Anaconda\envs\py27\lib\site-packages\pandas\core\common.py:449: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask = arr == x
In [6]:
df.dropna(how='any', inplace=True)
In [7]:
# NaN never compares equal to itself, so test with isnull() rather than `== np.nan`
df.isnull().any()
Out[7]:
Hugo_Symbol                      False
Entrez_Gene_Id                   False
Center                           False
Ncbi_Build                       False
Chrom                            False
Start_Position                   False
End_Position                     False
Strand                           False
Variant_Classification           False
Variant_Type                     False
Reference_Allele                 False
Tumor_Seq_Allele1                False
Tumor_Seq_Allele2                False
Dbsnp_Rs                         False
Dbsnp_Val_Status                 False
Tumor_Sample_Barcode             False
Matched_Norm_Sample_Barcode      False
Match_Norm_Seq_Allele1           False
Match_Norm_Seq_Allele2           False
Tumor_Validation_Allele1         False
Tumor_Validation_Allele2         False
Match_Norm_Validation_Allele1    False
Match_Norm_Validation_Allele2    False
Verification_Status              False
Validation_Status                False
Mutation_Status                  False
Sequencing_Phase                 False
Sequence_Source                  False
Validation_Method                False
Score                            False
Bam_File                         False
Sequencer                        False
Tumor_Sample_UUID                False
Matched_Norm_Sample_UUID         False
File_Name                        False
Archive_Name                     False
Line_Number                      False
dtype: bool
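
Dropping rows that contain '-' removes indels but does not select a mutation class. If only missense mutations are wanted, one option is to filter on the 'Variant_Classification' column. The sketch below uses a small made-up DataFrame in place of the real MAF (the gene name GENE_X is a placeholder), so treat it as an illustration rather than the processing used in this post.

```python
import pandas as pd

# Toy rows standing in for the MAF; only the columns needed here are included.
toy = pd.DataFrame({
    "Hugo_Symbol": ["EMR3", "BRE", "TTN", "GENE_X"],
    "Variant_Classification": [
        "Missense_Mutation", "Silent", "Nonsense_Mutation", "Frame_Shift_Del",
    ],
    "Variant_Type": ["SNP", "SNP", "SNP", "DEL"],
})

# Boolean mask: True only for rows classified as missense.
is_missense = toy["Variant_Classification"] == "Missense_Mutation"
missense = toy[is_missense]
print(missense)
```

The same mask could be applied to the full DataFrame, e.g. `df[df.Variant_Classification == 'Missense_Mutation']`.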

To extract the patient ID from each barcode, we can use a simple regular expression.


In [8]:
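import re
# Capture the first three fields of the barcode: TCGA-SiteID-PatientID
p = re.compile(r'(TCGA-\w+?-\w+?)-')
def find_patient(barcode):
    m = p.search(barcode)
    if m is not None:
        return m.group(1)
    else:
        return np.nan  # no match: mark the patient as missing
# end of def
list_patients = df.Tumor_Sample_Barcode.apply(find_patient)
print(list_patients)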
9         TCGA-AM-5820
20        TCGA-AM-5820
40        TCGA-AM-5820
44        TCGA-AM-5820
53        TCGA-AM-5820
55        TCGA-AM-5820
61        TCGA-AM-5821
63        TCGA-CA-6718
65        TCGA-AM-5821
69        TCGA-AM-5821
80        TCGA-A6-6140
81        TCGA-AM-5820
84        TCGA-CM-6674
87        TCGA-F4-6856
92        TCGA-D5-6930
93        TCGA-D5-6930
99        TCGA-AM-5821
100       TCGA-AM-5820
103       TCGA-AM-5821
113       TCGA-AM-5820
127       TCGA-AD-6889
142       TCGA-AM-5821
146       TCGA-AM-5821
149       TCGA-AM-5821
154       TCGA-AZ-5403
160       TCGA-AM-5821
165       TCGA-CK-4950
166       TCGA-G4-6298
171       TCGA-AA-3697
172       TCGA-AA-3712
              ...     
114324    TCGA-AM-5820
114332    TCGA-AM-5821
114333    TCGA-AM-5821
114334    TCGA-AM-5821
114335    TCGA-AM-5821
114336    TCGA-AM-5821
114337    TCGA-AM-5821
114342    TCGA-AM-5821
114353    TCGA-AM-5821
114360    TCGA-CK-4952
114365    TCGA-AM-5820
114369    TCGA-AA-3662
114370    TCGA-AA-3713
114371    TCGA-AM-5821
114372    TCGA-AY-6386
114373    TCGA-D5-6926
114387    TCGA-AM-5821
114402    TCGA-AM-5821
114403    TCGA-AM-5821
114409    TCGA-AY-6386
114434    TCGA-G4-6588
114443    TCGA-AZ-6601
114444    TCGA-DM-A0XD
114448    TCGA-AZ-6598
114458    TCGA-AM-5820
114460    TCGA-AM-5820
114461    TCGA-CM-6680
114462    TCGA-AM-5820
114464    TCGA-AM-5820
114466    TCGA-AU-6004
Name: Tumor_Sample_Barcode, dtype: object
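
As an alternative to applying a Python function row by row, pandas can run the regular expression over the whole column with `Series.str.extract`. The sketch below assumes a recent pandas (the `expand` argument is not available in very old releases), and the barcode suffixes are made up for illustration.

```python
import pandas as pd

barcodes = pd.Series([
    "TCGA-AM-5820-01A-01D-1650-10",  # suffix fields are invented for this example
    "TCGA-CA-6718-01A-11D-1835-10",
    "not-a-barcode",
])

# One capture group and expand=False return a Series; rows without a match become NaN.
patients = barcodes.str.extract(r"^(TCGA-[^-]+-[^-]+)-", expand=False)
print(patients)
```

Rows that do not look like a TCGA barcode end up as NaN, which makes them easy to spot with `isnull()`.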

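We can check how many patients are included in this COAD dataset. Each row is one mutation record, so the number of rows is much larger than the number of patients.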


In [9]:
num_patient_data = len(list_patients)
num_patients = len(set(list_patients))
print("Number of barcodes for patients %d" % num_patient_data)
print("Number of patients:  %d" % num_patients)
print("Is there any redundant information for a single patient:  %s" % (num_patient_data != num_patients))
Number of barcodes for patients 22167
Number of patients:  214
Is there any redundant information for a single patient:  True
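
The same counts can also be obtained with built-in pandas methods, which additionally give the number of rows per patient. A minimal sketch on a toy Series (the values are taken from the output above, but the Series itself is made up):

```python
import pandas as pd

patients = pd.Series(
    ["TCGA-AM-5820", "TCGA-AM-5820", "TCGA-AM-5821", "TCGA-CA-6718"]
)

num_rows = len(patients)                   # one entry per mutation record
num_patients = patients.nunique()          # distinct patient IDs (NaN is ignored)
rows_per_patient = patients.value_counts() # mutation records per patient

print(num_rows, num_patients)
print(rows_per_patient)
```

Unlike `len(set(...))`, `nunique()` does not count missing values, which matters if some barcodes failed to match.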

To perform a "group-by" operation, we add a new column for the patient ID.


In [10]:
df['Patient'] = list_patients
In [11]:
df[['Hugo_Symbol', 'Patient']][:10]
Out[11]:
    Hugo_Symbol       Patient
9          A1CF  TCGA-AM-5820
20          A2M  TCGA-AM-5820
40        A2ML1  TCGA-AM-5820
44        A2ML1  TCGA-AM-5820
53        A2ML1  TCGA-AM-5820
55        A2ML1  TCGA-AM-5820
61       A4GALT  TCGA-AM-5821
63       A4GALT  TCGA-CA-6718
65       A4GALT  TCGA-AM-5821
69        A4GNT  TCGA-AM-5821

As shown above, a single gene (e.g., A2ML1) of a patient can harbor multiple mutations.


In [12]:
gb = df.groupby('Patient')
In [13]:
gb['Hugo_Symbol'].count()
Out[13]:
Patient
TCGA-A6-2671      6
TCGA-A6-2675      8
TCGA-A6-4105      8
TCGA-A6-5656     14
TCGA-A6-5657      3
TCGA-A6-5659     16
TCGA-A6-5660     17
TCGA-A6-5661     65
TCGA-A6-5662      6
TCGA-A6-5664     14
TCGA-A6-5665     70
TCGA-A6-5666      9
TCGA-A6-5667    111
TCGA-A6-6137     11
TCGA-A6-6138      8
TCGA-A6-6140    389
TCGA-A6-6141    120
TCGA-A6-6142      5
TCGA-A6-6648      7
TCGA-A6-6649     13
TCGA-A6-6650     27
TCGA-A6-6651     26
TCGA-A6-6652      9
TCGA-A6-6653     65
TCGA-A6-6654     17
TCGA-A6-6780     74
TCGA-A6-6781    119
TCGA-A6-6782     24
TCGA-AA-3489      9
TCGA-AA-3492     93
               ... 
TCGA-F4-6809     12
TCGA-F4-6854     16
TCGA-F4-6855     44
TCGA-F4-6856     70
TCGA-G4-6293     69
TCGA-G4-6294     27
TCGA-G4-6295      8
TCGA-G4-6297    158
TCGA-G4-6298    215
TCGA-G4-6299     11
TCGA-G4-6302    233
TCGA-G4-6303     12
TCGA-G4-6304     21
TCGA-G4-6306      7
TCGA-G4-6307      6
TCGA-G4-6309     40
TCGA-G4-6310     11
TCGA-G4-6311    165
TCGA-G4-6314     11
TCGA-G4-6315     12
TCGA-G4-6317     13
TCGA-G4-6320     71
TCGA-G4-6321    274
TCGA-G4-6322      9
TCGA-G4-6323      9
TCGA-G4-6586     56
TCGA-G4-6588     91
TCGA-G4-6625      8
TCGA-G4-6626     17
TCGA-G4-6628     75
Name: Hugo_Symbol, dtype: int64
In [14]:
gb['Hugo_Symbol'].describe()
Out[14]:
Patient             
TCGA-A6-2671  count           6
              unique          5
              top          NEFH
              freq            2
TCGA-A6-2675  count           8
              unique          8
              top         PSKH2
              freq            1
TCGA-A6-4105  count           8
              unique          7
              top          IRF5
              freq            2
TCGA-A6-5656  count          14
              unique         13
              top         THAP8
              freq            2
TCGA-A6-5657  count           3
              unique          3
              top          KRAS
              freq            1
TCGA-A6-5659  count          16
              unique         16
              top          KRAS
              freq            1
TCGA-A6-5660  count          17
              unique         17
              top       CACNA1B
              freq            1
TCGA-A6-5661  count          65
              unique         65
                         ...   
TCGA-G4-6321  top          SSPO
              freq            3
TCGA-G4-6322  count           9
              unique          9
              top         MASP2
              freq            1
TCGA-G4-6323  count           9
              unique          9
              top          KRAS
              freq            1
TCGA-G4-6586  count          56
              unique         54
              top        DYRK1B
              freq            2
TCGA-G4-6588  count          91
              unique         91
              top       SLCO6A1
              freq            1
TCGA-G4-6625  count           8
              unique          8
              top          KRAS
              freq            1
TCGA-G4-6626  count          17
              unique         17
              top         ZNF99
              freq            1
TCGA-G4-6628  count          75
              unique         74
              top           TTN
              freq            2
dtype: object
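
If only the number of mutation records and the number of distinct genes per patient are needed, a compact table can be built with `agg`. The sketch below uses a toy DataFrame; the column name `extra_hits` is a hypothetical label introduced here for mutations that fall in an already-mutated gene.

```python
import pandas as pd

toy = pd.DataFrame({
    "Patient": ["TCGA-AM-5820"] * 4 + ["TCGA-AM-5821"] * 2,
    "Hugo_Symbol": ["A1CF", "A2ML1", "A2ML1", "A2ML1", "A4GALT", "A4GNT"],
})

# count   = mutation records per patient
# nunique = distinct mutated genes per patient
summary = toy.groupby("Patient")["Hugo_Symbol"].agg(["count", "nunique"])

# Records beyond the first one in each gene (hypothetical helper column).
summary["extra_hits"] = summary["count"] - summary["nunique"]
print(summary)
```

This gives one row per patient instead of the four-row block per patient that `describe()` prints.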

Now, we make a subdirectory named "patient" and save one DataFrame per patient in the MAF file format.


In [15]:
!mkdir patient
In [16]:
!ls
candra_input_tcga_coad.txt
group_tcga_somuts_each_patient.ipynb
hgsc.bcm.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf
parse_tcga_somuts.html
parse_tcga_somuts.ipynb
patient
In [17]:
for name, group in gb:
    group.to_csv("patient/tcga_coad_somuts_%s.maf"%(name), sep='\t', index=False)
    
In [18]:
!ls patient
tcga_coad_somuts_TCGA-A6-2671.maf
tcga_coad_somuts_TCGA-A6-2675.maf
tcga_coad_somuts_TCGA-A6-4105.maf
tcga_coad_somuts_TCGA-A6-5656.maf
tcga_coad_somuts_TCGA-A6-5657.maf
tcga_coad_somuts_TCGA-A6-5659.maf
tcga_coad_somuts_TCGA-A6-5660.maf
tcga_coad_somuts_TCGA-A6-5661.maf
tcga_coad_somuts_TCGA-A6-5662.maf
tcga_coad_somuts_TCGA-A6-5664.maf
tcga_coad_somuts_TCGA-A6-5665.maf
tcga_coad_somuts_TCGA-A6-5666.maf
tcga_coad_somuts_TCGA-A6-5667.maf
tcga_coad_somuts_TCGA-A6-6137.maf
tcga_coad_somuts_TCGA-A6-6138.maf
tcga_coad_somuts_TCGA-A6-6140.maf
tcga_coad_somuts_TCGA-A6-6141.maf
tcga_coad_somuts_TCGA-A6-6142.maf
tcga_coad_somuts_TCGA-A6-6648.maf
tcga_coad_somuts_TCGA-A6-6649.maf
tcga_coad_somuts_TCGA-A6-6650.maf
tcga_coad_somuts_TCGA-A6-6651.maf
tcga_coad_somuts_TCGA-A6-6652.maf
tcga_coad_somuts_TCGA-A6-6653.maf
tcga_coad_somuts_TCGA-A6-6654.maf
tcga_coad_somuts_TCGA-A6-6780.maf
tcga_coad_somuts_TCGA-A6-6781.maf
tcga_coad_somuts_TCGA-A6-6782.maf
tcga_coad_somuts_TCGA-AA-3489.maf
tcga_coad_somuts_TCGA-AA-3492.maf
tcga_coad_somuts_TCGA-AA-3502.maf
tcga_coad_somuts_TCGA-AA-3510.maf
tcga_coad_somuts_TCGA-AA-3511.maf
tcga_coad_somuts_TCGA-AA-3655.maf
tcga_coad_somuts_TCGA-AA-3660.maf
tcga_coad_somuts_TCGA-AA-3662.maf
tcga_coad_somuts_TCGA-AA-3663.maf
tcga_coad_somuts_TCGA-AA-3697.maf
tcga_coad_somuts_TCGA-AA-3712.maf
tcga_coad_somuts_TCGA-AA-3713.maf
tcga_coad_somuts_TCGA-AD-5900.maf
tcga_coad_somuts_TCGA-AD-6548.maf
tcga_coad_somuts_TCGA-AD-6888.maf
tcga_coad_somuts_TCGA-AD-6889.maf
tcga_coad_somuts_TCGA-AD-6890.maf
tcga_coad_somuts_TCGA-AD-6895.maf
tcga_coad_somuts_TCGA-AD-6899.maf
tcga_coad_somuts_TCGA-AD-6901.maf
tcga_coad_somuts_TCGA-AD-6963.maf
tcga_coad_somuts_TCGA-AD-6964.maf
tcga_coad_somuts_TCGA-AD-6965.maf
tcga_coad_somuts_TCGA-AM-5820.maf
tcga_coad_somuts_TCGA-AM-5821.maf
tcga_coad_somuts_TCGA-AU-3779.maf
tcga_coad_somuts_TCGA-AU-6004.maf
tcga_coad_somuts_TCGA-AY-5543.maf
tcga_coad_somuts_TCGA-AY-6196.maf
tcga_coad_somuts_TCGA-AY-6197.maf
tcga_coad_somuts_TCGA-AY-6386.maf
tcga_coad_somuts_TCGA-AZ-4315.maf
tcga_coad_somuts_TCGA-AZ-4323.maf
tcga_coad_somuts_TCGA-AZ-4615.maf
tcga_coad_somuts_TCGA-AZ-4616.maf
tcga_coad_somuts_TCGA-AZ-4682.maf
tcga_coad_somuts_TCGA-AZ-5403.maf
tcga_coad_somuts_TCGA-AZ-5407.maf
tcga_coad_somuts_TCGA-AZ-6598.maf
tcga_coad_somuts_TCGA-AZ-6599.maf
tcga_coad_somuts_TCGA-AZ-6600.maf
tcga_coad_somuts_TCGA-AZ-6601.maf
tcga_coad_somuts_TCGA-AZ-6603.maf
tcga_coad_somuts_TCGA-AZ-6605.maf
tcga_coad_somuts_TCGA-AZ-6607.maf
tcga_coad_somuts_TCGA-CA-5796.maf
tcga_coad_somuts_TCGA-CA-5797.maf
tcga_coad_somuts_TCGA-CA-6716.maf
tcga_coad_somuts_TCGA-CA-6717.maf
tcga_coad_somuts_TCGA-CA-6718.maf
tcga_coad_somuts_TCGA-CA-6719.maf
tcga_coad_somuts_TCGA-CK-4947.maf
tcga_coad_somuts_TCGA-CK-4948.maf
tcga_coad_somuts_TCGA-CK-4950.maf
tcga_coad_somuts_TCGA-CK-4952.maf
tcga_coad_somuts_TCGA-CK-5912.maf
tcga_coad_somuts_TCGA-CK-5913.maf
tcga_coad_somuts_TCGA-CK-5914.maf
tcga_coad_somuts_TCGA-CK-5915.maf
tcga_coad_somuts_TCGA-CK-5916.maf
tcga_coad_somuts_TCGA-CM-4743.maf
tcga_coad_somuts_TCGA-CM-4744.maf
tcga_coad_somuts_TCGA-CM-4746.maf
tcga_coad_somuts_TCGA-CM-4747.maf
tcga_coad_somuts_TCGA-CM-4748.maf
tcga_coad_somuts_TCGA-CM-4750.maf
tcga_coad_somuts_TCGA-CM-4752.maf
tcga_coad_somuts_TCGA-CM-5341.maf
tcga_coad_somuts_TCGA-CM-5344.maf
tcga_coad_somuts_TCGA-CM-5348.maf
tcga_coad_somuts_TCGA-CM-5349.maf
tcga_coad_somuts_TCGA-CM-5860.maf
tcga_coad_somuts_TCGA-CM-5861.maf
tcga_coad_somuts_TCGA-CM-5862.maf
tcga_coad_somuts_TCGA-CM-5863.maf
tcga_coad_somuts_TCGA-CM-5864.maf
tcga_coad_somuts_TCGA-CM-5868.maf
tcga_coad_somuts_TCGA-CM-6161.maf
tcga_coad_somuts_TCGA-CM-6162.maf
tcga_coad_somuts_TCGA-CM-6163.maf
tcga_coad_somuts_TCGA-CM-6164.maf
tcga_coad_somuts_TCGA-CM-6165.maf
tcga_coad_somuts_TCGA-CM-6166.maf
tcga_coad_somuts_TCGA-CM-6168.maf
tcga_coad_somuts_TCGA-CM-6169.maf
tcga_coad_somuts_TCGA-CM-6170.maf
tcga_coad_somuts_TCGA-CM-6171.maf
tcga_coad_somuts_TCGA-CM-6172.maf
tcga_coad_somuts_TCGA-CM-6674.maf
tcga_coad_somuts_TCGA-CM-6675.maf
tcga_coad_somuts_TCGA-CM-6676.maf
tcga_coad_somuts_TCGA-CM-6677.maf
tcga_coad_somuts_TCGA-CM-6678.maf
tcga_coad_somuts_TCGA-CM-6679.maf
tcga_coad_somuts_TCGA-CM-6680.maf
tcga_coad_somuts_TCGA-D5-5537.maf
tcga_coad_somuts_TCGA-D5-5538.maf
tcga_coad_somuts_TCGA-D5-5539.maf
tcga_coad_somuts_TCGA-D5-5540.maf
tcga_coad_somuts_TCGA-D5-5541.maf
tcga_coad_somuts_TCGA-D5-6529.maf
tcga_coad_somuts_TCGA-D5-6531.maf
tcga_coad_somuts_TCGA-D5-6532.maf
tcga_coad_somuts_TCGA-D5-6533.maf
tcga_coad_somuts_TCGA-D5-6534.maf
tcga_coad_somuts_TCGA-D5-6535.maf
tcga_coad_somuts_TCGA-D5-6536.maf
tcga_coad_somuts_TCGA-D5-6537.maf
tcga_coad_somuts_TCGA-D5-6538.maf
tcga_coad_somuts_TCGA-D5-6539.maf
tcga_coad_somuts_TCGA-D5-6540.maf
tcga_coad_somuts_TCGA-D5-6541.maf
tcga_coad_somuts_TCGA-D5-6898.maf
tcga_coad_somuts_TCGA-D5-6920.maf
tcga_coad_somuts_TCGA-D5-6922.maf
tcga_coad_somuts_TCGA-D5-6924.maf
tcga_coad_somuts_TCGA-D5-6926.maf
tcga_coad_somuts_TCGA-D5-6927.maf
tcga_coad_somuts_TCGA-D5-6928.maf
tcga_coad_somuts_TCGA-D5-6929.maf
tcga_coad_somuts_TCGA-D5-6930.maf
tcga_coad_somuts_TCGA-D5-6931.maf
tcga_coad_somuts_TCGA-D5-6932.maf
tcga_coad_somuts_TCGA-D5-7000.maf
tcga_coad_somuts_TCGA-DM-A0X9.maf
tcga_coad_somuts_TCGA-DM-A0XD.maf
tcga_coad_somuts_TCGA-DM-A0XF.maf
tcga_coad_somuts_TCGA-DM-A1D0.maf
tcga_coad_somuts_TCGA-DM-A1D4.maf
tcga_coad_somuts_TCGA-DM-A1D6.maf
tcga_coad_somuts_TCGA-DM-A1D7.maf
tcga_coad_somuts_TCGA-DM-A1D8.maf
tcga_coad_somuts_TCGA-DM-A1D9.maf
tcga_coad_somuts_TCGA-DM-A1DA.maf
tcga_coad_somuts_TCGA-DM-A1DB.maf
tcga_coad_somuts_TCGA-DM-A1HA.maf
tcga_coad_somuts_TCGA-DM-A282.maf
tcga_coad_somuts_TCGA-DM-A285.maf
tcga_coad_somuts_TCGA-DM-A28C.maf
tcga_coad_somuts_TCGA-DM-A28F.maf
tcga_coad_somuts_TCGA-DM-A28G.maf
tcga_coad_somuts_TCGA-DM-A28H.maf
tcga_coad_somuts_TCGA-DM-A28K.maf
tcga_coad_somuts_TCGA-DM-A28M.maf
tcga_coad_somuts_TCGA-F4-6459.maf
tcga_coad_somuts_TCGA-F4-6460.maf
tcga_coad_somuts_TCGA-F4-6461.maf
tcga_coad_somuts_TCGA-F4-6463.maf
tcga_coad_somuts_TCGA-F4-6569.maf
tcga_coad_somuts_TCGA-F4-6570.maf
tcga_coad_somuts_TCGA-F4-6703.maf
tcga_coad_somuts_TCGA-F4-6704.maf
tcga_coad_somuts_TCGA-F4-6805.maf
tcga_coad_somuts_TCGA-F4-6806.maf
tcga_coad_somuts_TCGA-F4-6807.maf
tcga_coad_somuts_TCGA-F4-6808.maf
tcga_coad_somuts_TCGA-F4-6809.maf
tcga_coad_somuts_TCGA-F4-6854.maf
tcga_coad_somuts_TCGA-F4-6855.maf
tcga_coad_somuts_TCGA-F4-6856.maf
tcga_coad_somuts_TCGA-G4-6293.maf
tcga_coad_somuts_TCGA-G4-6294.maf
tcga_coad_somuts_TCGA-G4-6295.maf
tcga_coad_somuts_TCGA-G4-6297.maf
tcga_coad_somuts_TCGA-G4-6298.maf
tcga_coad_somuts_TCGA-G4-6299.maf
tcga_coad_somuts_TCGA-G4-6302.maf
tcga_coad_somuts_TCGA-G4-6303.maf
tcga_coad_somuts_TCGA-G4-6304.maf
tcga_coad_somuts_TCGA-G4-6306.maf
tcga_coad_somuts_TCGA-G4-6307.maf
tcga_coad_somuts_TCGA-G4-6309.maf
tcga_coad_somuts_TCGA-G4-6310.maf
tcga_coad_somuts_TCGA-G4-6311.maf
tcga_coad_somuts_TCGA-G4-6314.maf
tcga_coad_somuts_TCGA-G4-6315.maf
tcga_coad_somuts_TCGA-G4-6317.maf
tcga_coad_somuts_TCGA-G4-6320.maf
tcga_coad_somuts_TCGA-G4-6321.maf
tcga_coad_somuts_TCGA-G4-6322.maf
tcga_coad_somuts_TCGA-G4-6323.maf
tcga_coad_somuts_TCGA-G4-6586.maf
tcga_coad_somuts_TCGA-G4-6588.maf
tcga_coad_somuts_TCGA-G4-6625.maf
tcga_coad_somuts_TCGA-G4-6626.maf
tcga_coad_somuts_TCGA-G4-6628.maf
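
The directory creation and the export loop above rely on shell commands (`!mkdir`, `!ls`), which depend on the platform. A portable variant can be sketched with the standard library. This sketch assumes Python 3 (for `exist_ok`) and writes a toy DataFrame into a temporary directory, so none of the paths below are the ones used in this post.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({
    "Hugo_Symbol": ["A1CF", "A2M", "A4GALT"],
    "Patient": ["TCGA-AM-5820", "TCGA-AM-5820", "TCGA-AM-5821"],
})

# Temporary location used only for this example.
out_dir = os.path.join(tempfile.mkdtemp(), "patient")
os.makedirs(out_dir, exist_ok=True)  # no error if the directory already exists

written = []
for name, group in toy.groupby("Patient"):
    path = os.path.join(out_dir, "tcga_coad_somuts_%s.maf" % name)
    group.to_csv(path, sep="\t", index=False)  # tab-separated, no index column
    written.append(path)

print(sorted(os.listdir(out_dir)))
```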
In [19]:
df = pd.read_table("patient/tcga_coad_somuts_TCGA-A6-2671.maf")
df
Out[19]:
Hugo_Symbol Entrez_Gene_Id Center Ncbi_Build Chrom Start_Position End_Position Strand Variant_Classification Variant_Type ... Validation_Method Score Bam_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID File_Name Archive_Name Line_Number Patient
0 BRE 9577 hgsc.bcm.edu 37 2 28464192 28464192 + Silent SNP ... none . . Illumina HiSeq 98a2eb02-7bcf-4134-ab7e-943391710e98 03be77e5-a923-4d2a-b134-ed11e2ff1190 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 10879 TCGA-A6-2671
1 EMR3 84658 hgsc.bcm.edu 37 19 14749131 14749131 + Missense_Mutation SNP ... none . . Illumina HiSeq 98a2eb02-7bcf-4134-ab7e-943391710e98 03be77e5-a923-4d2a-b134-ed11e2ff1190 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 30975 TCGA-A6-2671
2 FCRL2 127943 hgsc.bcm.edu 37 1 157737148 157737148 + Silent SNP ... none . . Illumina HiSeq 98a2eb02-7bcf-4134-ab7e-943391710e98 03be77e5-a923-4d2a-b134-ed11e2ff1190 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 35669 TCGA-A6-2671
3 NEFH 4744 hgsc.bcm.edu 37 22 29885567 29885567 + Silent SNP ... none . . Illumina HiSeq 98a2eb02-7bcf-4134-ab7e-943391710e98 03be77e5-a923-4d2a-b134-ed11e2ff1190 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 64467 TCGA-A6-2671
4 NEFH 4744 hgsc.bcm.edu 37 22 29885594 29885594 + Silent SNP ... none . . Illumina HiSeq 98a2eb02-7bcf-4134-ab7e-943391710e98 03be77e5-a923-4d2a-b134-ed11e2ff1190 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 64552 TCGA-A6-2671
5 TTN 7273 hgsc.bcm.edu 37 2 179611406 179611406 + Nonsense_Mutation SNP ... none . . Illumina HiSeq 98a2eb02-7bcf-4134-ab7e-943391710e98 03be77e5-a923-4d2a-b134-ed11e2ff1190 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 104304 TCGA-A6-2671

6 rows × 38 columns
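
Note that the table has 38 columns: the 37 original MAF columns plus the added 'Patient' column, which was saved along with the data. If the per-patient files should keep the original column layout, the helper column can be dropped before writing. The sketch below shows the idea on a trimmed toy frame, with an in-memory buffer in place of a file; it assumes a pandas version that supports `drop(columns=...)`.

```python
import io

import pandas as pd

# Toy per-patient group; gene names and IDs come from the table above,
# but the frame is trimmed to three columns for illustration.
group = pd.DataFrame({
    "Hugo_Symbol": ["BRE", "EMR3"],
    "Entrez_Gene_Id": [9577, 84658],
    "Patient": ["TCGA-A6-2671", "TCGA-A6-2671"],
})

buf = io.StringIO()  # in-memory stand-in for a file under patient/
group.drop(columns=["Patient"]).to_csv(buf, sep="\t", index=False)

buf.seek(0)
restored = pd.read_csv(buf, sep="\t")  # tab-separated read, like pd.read_table
print(list(restored.columns))
```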

In [20]:
df = pd.read_table("patient/tcga_coad_somuts_TCGA-A6-6653.maf")
df
Out[20]:
Hugo_Symbol Entrez_Gene_Id Center Ncbi_Build Chrom Start_Position End_Position Strand Variant_Classification Variant_Type ... Validation_Method Score Bam_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID File_Name Archive_Name Line_Number Patient
0 ADTRP 0 hgsc.bcm.edu 37 6 11766527 11766527 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 2779 TCGA-A6-6653
1 ANAPC5 51433 hgsc.bcm.edu 37 12 121764935 121764935 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 4413 TCGA-A6-6653
2 ANXA11 311 hgsc.bcm.edu 37 10 81917431 81917431 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 5272 TCGA-A6-6653
3 ATP11A 23250 hgsc.bcm.edu 37 13 113514616 113514616 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 8226 TCGA-A6-6653
4 BRAF 673 hgsc.bcm.edu 37 7 140453136 140453136 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 10691 TCGA-A6-6653
5 C12orf51 283450 hgsc.bcm.edu 37 12 112607395 112607395 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 11718 TCGA-A6-6653
6 CAB39L 81617 hgsc.bcm.edu 37 13 49951208 49951208 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 14146 TCGA-A6-6653
7 CARS2 79587 hgsc.bcm.edu 37 13 111329326 111329326 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 15266 TCGA-A6-6653
8 CCDC77 84318 hgsc.bcm.edu 37 12 520944 520944 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 16497 TCGA-A6-6653
9 CDK5RAP2 55755 hgsc.bcm.edu 37 9 123215894 123215894 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 18039 TCGA-A6-6653
10 CHPF2 0 hgsc.bcm.edu 37 7 150932583 150932583 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 19424 TCGA-A6-6653
11 CLASP1 23332 hgsc.bcm.edu 37 2 122187744 122187744 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 19933 TCGA-A6-6653
12 CLMN 79789 hgsc.bcm.edu 37 14 95669909 95669909 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 20345 TCGA-A6-6653
13 COL6A3 1293 hgsc.bcm.edu 37 2 238283448 238283448 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 21878 TCGA-A6-6653
14 COL9A2 1298 hgsc.bcm.edu 37 1 40773119 40773119 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 22032 TCGA-A6-6653
15 DCLK3 85443 hgsc.bcm.edu 37 3 36778711 36778711 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 25371 TCGA-A6-6653
16 E2F7 144455 hgsc.bcm.edu 37 12 77417804 77417804 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 29690 TCGA-A6-6653
17 EGFR 1956 hgsc.bcm.edu 37 7 55224338 55224338 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 30246 TCGA-A6-6653
18 ELMOD3 84173 hgsc.bcm.edu 37 2 85598245 85598245 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 30754 TCGA-A6-6653
19 ENPP6 133121 hgsc.bcm.edu 37 4 185138760 185138760 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 31139 TCGA-A6-6653
20 EPB41L1 2036 hgsc.bcm.edu 37 20 34761778 34761778 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 31294 TCGA-A6-6653
21 EXOC3L2 90332 hgsc.bcm.edu 37 19 45731019 45731019 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 32451 TCGA-A6-6653
22 FBXL20 84961 hgsc.bcm.edu 37 17 37420535 37420535 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 35245 TCGA-A6-6653
23 FLG 2312 hgsc.bcm.edu 37 1 152282228 152282228 + Nonsense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 36411 TCGA-A6-6653
24 FYCO1 79443 hgsc.bcm.edu 37 3 46010078 46010078 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 37900 TCGA-A6-6653
25 HEATR2 54919 hgsc.bcm.edu 37 7 810217 810217 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 42884 TCGA-A6-6653
26 ITPKB 3707 hgsc.bcm.edu 37 1 226924291 226924291 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 47883 TCGA-A6-6653
27 KEL 3792 hgsc.bcm.edu 37 7 142640916 142640916 + Nonsense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 49644 TCGA-A6-6653
28 KIAA1549 57670 hgsc.bcm.edu 37 7 138564331 138564331 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 50490 TCGA-A6-6653
29 NLRP11 204801 hgsc.bcm.edu 37 19 56320376 56320376 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 65599 TCGA-A6-6653
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
35 PCDHA12 56137 hgsc.bcm.edu 37 5 140256920 140256920 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 71962 TCGA-A6-6653
36 PROX1 5629 hgsc.bcm.edu 37 1 214170717 214170717 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 79051 TCGA-A6-6653
37 PTPRZ1 5803 hgsc.bcm.edu 37 7 121653419 121653419 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 80729 TCGA-A6-6653
38 RAG1 5896 hgsc.bcm.edu 37 11 36595451 36595451 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 81561 TCGA-A6-6653
39 RANGAP1 5905 hgsc.bcm.edu 37 22 41652732 41652732 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 81767 TCGA-A6-6653
40 RHO 6010 hgsc.bcm.edu 37 3 129247785 129247785 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 83417 TCGA-A6-6653
41 RNF213 57674 hgsc.bcm.edu 37 17 78346491 78346491 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 84105 TCGA-A6-6653
42 RYR1 6261 hgsc.bcm.edu 37 19 38942433 38942433 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 85467 TCGA-A6-6653
43 SCN2A 6326 hgsc.bcm.edu 37 2 166245158 166245158 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 86578 TCGA-A6-6653
44 SEC16A 9919 hgsc.bcm.edu 37 9 139354291 139354291 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 87111 TCGA-A6-6653
45 SESN2 83667 hgsc.bcm.edu 37 1 28599898 28599898 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 87801 TCGA-A6-6653
46 SLC13A2 9058 hgsc.bcm.edu 37 17 26817481 26817481 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 89372 TCGA-A6-6653
47 SLC6A15 55117 hgsc.bcm.edu 37 12 85255750 85255750 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 91105 TCGA-A6-6653
48 SLC6A19 340024 hgsc.bcm.edu 37 5 1201813 1201813 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 91142 TCGA-A6-6653
49 SLC7A2 6542 hgsc.bcm.edu 37 8 17401161 17401161 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 91252 TCGA-A6-6653
50 SMARCE1 6605 hgsc.bcm.edu 37 17 38792665 38792665 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 92011 TCGA-A6-6653
51 SMG6 23293 hgsc.bcm.edu 37 17 2203354 2203354 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 92178 TCGA-A6-6653
52 TATDN2 9797 hgsc.bcm.edu 37 3 10320651 10320651 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 96994 TCGA-A6-6653
53 TFIP11 24144 hgsc.bcm.edu 37 22 26888033 26888033 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 98464 TCGA-A6-6653
54 TGM3 7053 hgsc.bcm.edu 37 20 2297850 2297850 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 98634 TCGA-A6-6653
55 TMEM130 222865 hgsc.bcm.edu 37 7 98457888 98457888 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 99785 TCGA-A6-6653
56 TMEM8A 58986 hgsc.bcm.edu 37 16 427080 427080 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 100439 TCGA-A6-6653
57 TNFRSF9 3604 hgsc.bcm.edu 37 1 7995117 7995117 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 100888 TCGA-A6-6653
58 TRPM4 54795 hgsc.bcm.edu 37 19 49692315 49692315 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 102978 TCGA-A6-6653
59 TSC22D4 81628 hgsc.bcm.edu 37 7 100074938 100074938 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 103228 TCGA-A6-6653
60 TTN 7273 hgsc.bcm.edu 37 2 179464365 179464365 + Missense_Mutation SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 104116 TCGA-A6-6653
61 USP26 83844 hgsc.bcm.edu 37 X 132160291 132160291 + Missense_Mutation SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 106224 TCGA-A6-6653
62 WFIKKN2 124857 hgsc.bcm.edu 37 17 48918218 48918218 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 108388 TCGA-A6-6653
63 WFS1 7466 hgsc.bcm.edu 37 4 6303727 6303727 + Silent SNP ... none . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 108401 TCGA-A6-6653
64 ZNF217 7764 hgsc.bcm.edu 37 20 52193041 52193041 + Silent SNP ... Illumina . . Illumina HiSeq 5682a4fe-0500-42a7-9dff-de34773d728c 831f15d4-35c1-4352-9e8c-c19247abc7d6 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 111090 TCGA-A6-6653

65 rows × 38 columns

In [21]:
df = pd.read_table("patient/tcga_coad_somuts_TCGA-G4-6625.maf")
df
Out[21]:
Hugo_Symbol Entrez_Gene_Id Center Ncbi_Build Chrom Start_Position End_Position Strand Variant_Classification Variant_Type ... Validation_Method Score Bam_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID File_Name Archive_Name Line_Number Patient
0 BAGE 574 hgsc.bcm.edu 37 21 11097574 11097574 + Missense_Mutation SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 9424 TCGA-G4-6625
1 CASZ1 54897 hgsc.bcm.edu 37 1 10713765 10713765 + Silent SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 15476 TCGA-G4-6625
2 CBX8 57332 hgsc.bcm.edu 37 17 77768947 77768947 + Silent SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 15654 TCGA-G4-6625
3 KRAS 3845 hgsc.bcm.edu 37 12 25398285 25398285 + Missense_Mutation SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 52052 TCGA-G4-6625
4 RALY 22913 hgsc.bcm.edu 37 20 32664866 32664866 + Missense_Mutation SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 81682 TCGA-G4-6625
5 SLC35E4 339665 hgsc.bcm.edu 37 22 31032920 31032920 + Silent SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 90439 TCGA-G4-6625
6 TP53 7157 hgsc.bcm.edu 37 17 7578190 7578190 + Missense_Mutation SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 101567 TCGA-G4-6625
7 ZNF579 163033 hgsc.bcm.edu 37 19 56090076 56090076 + Silent SNP ... none . . Illumina HiSeq 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf b865dc34-9bb7-425a-ad9e-4838cce2e088 hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 112735 TCGA-G4-6625

8 rows × 38 columns


The number of mutation records differs considerably between patients: TCGA-A6-6653 has 65 rows, whereas TCGA-G4-6625 has only 8.
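To compare patients without opening each per-patient file one by one, the rows can be counted directly from the `Patient` column. The snippet below is only a minimal sketch: the `toy` DataFrame is a small hand-made stand-in (not real TCGA data), and it assumes the combined table carries the same `Patient` and `Variant_Classification` columns shown in the outputs above.

```python
import pandas as pd

# Toy stand-in for the combined MAF table (illustrative rows, not real TCGA data).
toy = pd.DataFrame({
    "Hugo_Symbol": ["KRAS", "TP53", "TTN", "RHO", "RYR1"],
    "Variant_Classification": [
        "Missense_Mutation", "Missense_Mutation",
        "Missense_Mutation", "Missense_Mutation", "Silent",
    ],
    "Patient": [
        "TCGA-G4-6625", "TCGA-G4-6625",
        "TCGA-A6-6653", "TCGA-A6-6653", "TCGA-A6-6653",
    ],
})

# Number of rows (mutation records) per patient.
per_patient = toy.groupby("Patient").size()

# Breakdown by variant class per patient; combinations that never occur become 0.
by_class = pd.crosstab(toy["Patient"], toy["Variant_Classification"])

print(per_patient)
print(by_class)
```

With the real data, replacing `toy` by the full DataFrame should give one count per patient, which makes it easy to spot patients with unusually many or few mutations.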

