Group the TCGA somatic mutation file by patient
- I use the COAD (colon adenocarcinoma) data here.
- The file format is MAF (Mutation Annotation Format).
- The MAF file contains a 'Tumor_Sample_Barcode' column.
The barcode is a hyphen-separated string of fields:
- SiteID (or TSS) stands for "tissue source site". It identifies the hospital or institute that contributed the sample, together with the related study name. (Refer to the "Code Tables Report" for more details.)
- A patient (or participant) is identified by the first three hyphen-separated fields, for example TCGA-02-0001.
- Refer to the "TCGA barcode" page for more details.
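To make the barcode layout concrete, here is a minimal sketch that splits a full aliquot barcode into its fields and rebuilds the patient ID from the first three. The example barcode `TCGA-02-0001-01C-01D-0182-01`, the field names, and the helper names `parse_barcode` and `patient_id` are my own assumptions for illustration; they are not taken from the MAF file used below.

```python
from collections import namedtuple

# Field names are illustrative labels for the seven hyphen-separated parts
# of a full aliquot barcode (an assumption, not read from the MAF file).
Barcode = namedtuple(
    "Barcode",
    ["project", "tss", "participant", "sample_vial",
     "portion_analyte", "plate", "center"],
)


def parse_barcode(barcode):
    """Split a full 7-field aliquot barcode into named fields."""
    fields = barcode.split("-")
    if len(fields) != 7:
        raise ValueError("expected 7 hyphen-separated fields, got %d" % len(fields))
    return Barcode(*fields)


def patient_id(barcode):
    """Join the first three fields (project, TSS, participant)."""
    return "-".join(barcode.split("-")[:3])


example = "TCGA-02-0001-01C-01D-0182-01"  # assumed example barcode
parsed = parse_barcode(example)
print(parsed.tss)           # 02
print(patient_id(example))  # TCGA-02-0001
```

Shorter barcodes (for example sample-level ones) would fail the 7-field check in `parse_barcode`, but `patient_id` still works on them because it only looks at the first three fields.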
In [1]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
In [2]:
print( "pandas: %s"%pd.__version__ )
print( "numpy: %s"%np.__version__ )
pandas: 0.17.0
numpy: 1.10.1
In [3]:
df = pd.read_table("hgsc.bcm.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf")
In [4]:
df.columns
Out[4]:
Index([u'Hugo_Symbol', u'Entrez_Gene_Id', u'Center', u'Ncbi_Build', u'Chrom', u'Start_Position', u'End_Position', u'Strand', u'Variant_Classification', u'Variant_Type', u'Reference_Allele', u'Tumor_Seq_Allele1', u'Tumor_Seq_Allele2', u'Dbsnp_Rs', u'Dbsnp_Val_Status', u'Tumor_Sample_Barcode', u'Matched_Norm_Sample_Barcode', u'Match_Norm_Seq_Allele1', u'Match_Norm_Seq_Allele2', u'Tumor_Validation_Allele1', u'Tumor_Validation_Allele2', u'Match_Norm_Validation_Allele1', u'Match_Norm_Validation_Allele2', u'Verification_Status', u'Validation_Status', u'Mutation_Status', u'Sequencing_Phase', u'Sequence_Source', u'Validation_Method', u'Score', u'Bam_File', u'Sequencer', u'Tumor_Sample_UUID', u'Matched_Norm_Sample_UUID', u'File_Name', u'Archive_Name', u'Line_Number'], dtype='object')
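We want missense mutations only, so first drop the rows that contain a nucleotide insertion or deletion. Such rows carry '-' in an allele column, so replacing '-' with NaN lets dropna() remove them. Note that dropna(how='any') also removes rows with a missing value in any other column, and that the remaining rows still include other point-mutation classes such as Silent and Nonsense_Mutation.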
In [5]:
df.replace('-', np.nan, inplace=True)
C:\Users\dwlee\Anaconda\envs\py27\lib\site-packages\pandas\core\common.py:449: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison mask = arr == x
In [6]:
df.dropna(how='any', inplace=True)
In [7]:
df.isnull().any()  # NaN never compares equal to itself, so (df == np.nan) is always False
Out[7]:
Hugo_Symbol                      False
Entrez_Gene_Id                   False
Center                           False
Ncbi_Build                       False
Chrom                            False
Start_Position                   False
End_Position                     False
Strand                           False
Variant_Classification           False
Variant_Type                     False
Reference_Allele                 False
Tumor_Seq_Allele1                False
Tumor_Seq_Allele2                False
Dbsnp_Rs                         False
Dbsnp_Val_Status                 False
Tumor_Sample_Barcode             False
Matched_Norm_Sample_Barcode      False
Match_Norm_Seq_Allele1           False
Match_Norm_Seq_Allele2           False
Tumor_Validation_Allele1         False
Tumor_Validation_Allele2         False
Match_Norm_Validation_Allele1    False
Match_Norm_Validation_Allele2    False
Verification_Status              False
Validation_Status                False
Mutation_Status                  False
Sequencing_Phase                 False
Sequence_Source                  False
Validation_Method                False
Score                            False
Bam_File                         False
Sequencer                        False
Tumor_Sample_UUID                False
Matched_Norm_Sample_UUID         False
File_Name                        False
Archive_Name                     False
Line_Number                      False
dtype: bool
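Dropping rows by '-' removes indels only indirectly. If the goal is strictly missense mutations, a more explicit option is to filter on the 'Variant_Type' and 'Variant_Classification' columns, both of which appear in the column list above. The sketch below uses a small made-up DataFrame (the rows are illustrative, not taken from the COAD file) and is written to run on a current pandas rather than the 0.17 used in this notebook.

```python
import pandas as pd

# Toy rows standing in for a MAF file; the values are made up for illustration.
toy = pd.DataFrame({
    "Hugo_Symbol": ["KRAS", "TTN", "APC", "BRAF"],
    "Variant_Classification": [
        "Missense_Mutation", "Silent", "Frame_Shift_Del", "Missense_Mutation",
    ],
    "Variant_Type": ["SNP", "SNP", "DEL", "SNP"],
})

# Keep single-nucleotide substitutions only (this drops INS and DEL rows).
snp_only = toy[toy["Variant_Type"] == "SNP"]

# Narrow further to missense mutations.
missense = snp_only[snp_only["Variant_Classification"] == "Missense_Mutation"]

print(missense["Hugo_Symbol"].tolist())  # ['KRAS', 'BRAF']
```

Filtering this way leaves the other columns untouched, so a row is not lost just because an unrelated field happens to be empty.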
In [8]:
import re
p = re.compile(r'(TCGA-\w+?-\w+?)-')
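def find_patient(barcode):
    # Capture the first three hyphen-separated fields of the barcode.
    m = p.search(barcode)
    if m is not None:
        return m.group(1)
    else:
        return np.nan  # no match: mark the patient as missing
# end of def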
list_patients = df.Tumor_Sample_Barcode.apply(find_patient)
print list_patients
9 TCGA-AM-5820 20 TCGA-AM-5820 40 TCGA-AM-5820 44 TCGA-AM-5820 53 TCGA-AM-5820 55 TCGA-AM-5820 61 TCGA-AM-5821 63 TCGA-CA-6718 65 TCGA-AM-5821 69 TCGA-AM-5821 80 TCGA-A6-6140 81 TCGA-AM-5820 84 TCGA-CM-6674 87 TCGA-F4-6856 92 TCGA-D5-6930 93 TCGA-D5-6930 99 TCGA-AM-5821 100 TCGA-AM-5820 103 TCGA-AM-5821 113 TCGA-AM-5820 127 TCGA-AD-6889 142 TCGA-AM-5821 146 TCGA-AM-5821 149 TCGA-AM-5821 154 TCGA-AZ-5403 160 TCGA-AM-5821 165 TCGA-CK-4950 166 TCGA-G4-6298 171 TCGA-AA-3697 172 TCGA-AA-3712 ... 114324 TCGA-AM-5820 114332 TCGA-AM-5821 114333 TCGA-AM-5821 114334 TCGA-AM-5821 114335 TCGA-AM-5821 114336 TCGA-AM-5821 114337 TCGA-AM-5821 114342 TCGA-AM-5821 114353 TCGA-AM-5821 114360 TCGA-CK-4952 114365 TCGA-AM-5820 114369 TCGA-AA-3662 114370 TCGA-AA-3713 114371 TCGA-AM-5821 114372 TCGA-AY-6386 114373 TCGA-D5-6926 114387 TCGA-AM-5821 114402 TCGA-AM-5821 114403 TCGA-AM-5821 114409 TCGA-AY-6386 114434 TCGA-G4-6588 114443 TCGA-AZ-6601 114444 TCGA-DM-A0XD 114448 TCGA-AZ-6598 114458 TCGA-AM-5820 114460 TCGA-AM-5820 114461 TCGA-CM-6680 114462 TCGA-AM-5820 114464 TCGA-AM-5820 114466 TCGA-AU-6004 Name: Tumor_Sample_Barcode, dtype: object
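The same extraction can probably be done without a Python-level function by using the vectorized string methods on the column. This is a sketch under assumptions: the three barcodes are made up for illustration (only the patient prefixes appear in the output above), and the pattern is anchored at the start of the string, which is slightly stricter than the `p.search` call used in the cell.

```python
import pandas as pd

# Made-up barcodes for illustration; the last one is deliberately malformed.
barcodes = pd.Series([
    "TCGA-AM-5820-01A-01D-1650-10",
    "TCGA-A6-6140-01A-11D-1771-10",
    "not-a-barcode",
])

# With one capture group and expand=False, str.extract returns a Series.
# Rows that do not match become NaN.
patients = barcodes.str.extract(r"^(TCGA-[^-]+-[^-]+)-", expand=False)

print(patients.tolist())
```

Non-matching rows come back as NaN, so they can be found afterwards with `patients.isnull()`.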
In [9]:
num_patient_data = len(list_patients)
num_patients = len(set(list_patients))
print "Number of barcodes for patients", num_patient_data
print "Number of patients: ", num_patients
print "Is there any redundant information for a single patient: ", num_patient_data != num_patients
Number of barcodes for patients 22167
Number of patients:  214
Is there any redundant information for a single patient:  True
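One caveat with `len(set(...))`: a missing value counts as its own element of the set, whereas `Series.nunique()` ignores NaN by default. The toy Series below (made-up values, one NaN on purpose) shows the difference, and `value_counts()` gives the number of rows per patient in one call.

```python
import numpy as np
import pandas as pd

# Toy patient column with one missing value, for illustration only.
patients = pd.Series(["TCGA-AM-5820", "TCGA-AM-5820", "TCGA-AM-5821", np.nan])

n_set = len(set(patients))      # NaN is counted as an element of the set
n_unique = patients.nunique()   # NaN is ignored by default
per_patient = patients.value_counts()

print(n_set)     # 3
print(n_unique)  # 2
print(per_patient)
```

In the notebook the two counts should agree as long as every barcode matched the pattern, since no patient value is then missing.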
In [10]:
df['Patient'] = list_patients
In [11]:
df[['Hugo_Symbol', 'Patient']][:10]
Out[11]:
| | Hugo_Symbol | Patient |
---|---|---|
9 | A1CF | TCGA-AM-5820 |
20 | A2M | TCGA-AM-5820 |
40 | A2ML1 | TCGA-AM-5820 |
44 | A2ML1 | TCGA-AM-5820 |
53 | A2ML1 | TCGA-AM-5820 |
55 | A2ML1 | TCGA-AM-5820 |
61 | A4GALT | TCGA-AM-5821 |
63 | A4GALT | TCGA-CA-6718 |
65 | A4GALT | TCGA-AM-5821 |
69 | A4GNT | TCGA-AM-5821 |
In [12]:
gb = df.groupby('Patient')
In [13]:
gb['Hugo_Symbol'].count()
Out[13]:
Patient TCGA-A6-2671 6 TCGA-A6-2675 8 TCGA-A6-4105 8 TCGA-A6-5656 14 TCGA-A6-5657 3 TCGA-A6-5659 16 TCGA-A6-5660 17 TCGA-A6-5661 65 TCGA-A6-5662 6 TCGA-A6-5664 14 TCGA-A6-5665 70 TCGA-A6-5666 9 TCGA-A6-5667 111 TCGA-A6-6137 11 TCGA-A6-6138 8 TCGA-A6-6140 389 TCGA-A6-6141 120 TCGA-A6-6142 5 TCGA-A6-6648 7 TCGA-A6-6649 13 TCGA-A6-6650 27 TCGA-A6-6651 26 TCGA-A6-6652 9 TCGA-A6-6653 65 TCGA-A6-6654 17 TCGA-A6-6780 74 TCGA-A6-6781 119 TCGA-A6-6782 24 TCGA-AA-3489 9 TCGA-AA-3492 93 ... TCGA-F4-6809 12 TCGA-F4-6854 16 TCGA-F4-6855 44 TCGA-F4-6856 70 TCGA-G4-6293 69 TCGA-G4-6294 27 TCGA-G4-6295 8 TCGA-G4-6297 158 TCGA-G4-6298 215 TCGA-G4-6299 11 TCGA-G4-6302 233 TCGA-G4-6303 12 TCGA-G4-6304 21 TCGA-G4-6306 7 TCGA-G4-6307 6 TCGA-G4-6309 40 TCGA-G4-6310 11 TCGA-G4-6311 165 TCGA-G4-6314 11 TCGA-G4-6315 12 TCGA-G4-6317 13 TCGA-G4-6320 71 TCGA-G4-6321 274 TCGA-G4-6322 9 TCGA-G4-6323 9 TCGA-G4-6586 56 TCGA-G4-6588 91 TCGA-G4-6625 8 TCGA-G4-6626 17 TCGA-G4-6628 75 Name: Hugo_Symbol, dtype: int64
In [14]:
gb['Hugo_Symbol'].describe()
Out[14]:
Patient TCGA-A6-2671 count 6 unique 5 top NEFH freq 2 TCGA-A6-2675 count 8 unique 8 top PSKH2 freq 1 TCGA-A6-4105 count 8 unique 7 top IRF5 freq 2 TCGA-A6-5656 count 14 unique 13 top THAP8 freq 2 TCGA-A6-5657 count 3 unique 3 top KRAS freq 1 TCGA-A6-5659 count 16 unique 16 top KRAS freq 1 TCGA-A6-5660 count 17 unique 17 top CACNA1B freq 1 TCGA-A6-5661 count 65 unique 65 ... TCGA-G4-6321 top SSPO freq 3 TCGA-G4-6322 count 9 unique 9 top MASP2 freq 1 TCGA-G4-6323 count 9 unique 9 top KRAS freq 1 TCGA-G4-6586 count 56 unique 54 top DYRK1B freq 2 TCGA-G4-6588 count 91 unique 91 top SLCO6A1 freq 1 TCGA-G4-6625 count 8 unique 8 top KRAS freq 1 TCGA-G4-6626 count 17 unique 17 top ZNF99 freq 1 TCGA-G4-6628 count 75 unique 74 top TTN freq 2 dtype: object
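The long `describe()` output can be condensed into one row per patient by aggregating only the statistics of interest. The sketch below is an assumption-laden illustration: the DataFrame is a toy stand-in with placeholder patient labels, and the column names `n_mutations` and `n_genes` are my own.

```python
import pandas as pd

# Toy stand-in for the grouped data; patient labels are placeholders.
toy = pd.DataFrame({
    "Patient": ["P1", "P1", "P1", "P2", "P2"],
    "Hugo_Symbol": ["NEFH", "NEFH", "TTN", "KRAS", "BRAF"],
})

# One row per patient: number of mutation rows and number of distinct genes.
summary = toy.groupby("Patient")["Hugo_Symbol"].agg(["count", "nunique"])
summary.columns = ["n_mutations", "n_genes"]

print(summary)
```

A patient whose `n_mutations` exceeds `n_genes` has at least one gene hit more than once, which matches the `count` versus `unique` comparison in the output above.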
Now we make a subdirectory named "patient" and save one DataFrame per patient in the MAF file format.
In [15]:
!mkdir patient
In [16]:
!ls
candra_input_tcga_coad.txt group_tcga_somuts_each_patient.ipynb hgsc.bcm.edu__Illumina_Genome_Analyzer_DNA_Sequencing_level2.maf parse_tcga_somuts.html parse_tcga_somuts.ipynb patient
In [17]:
for name, group in gb:
    group.to_csv("patient/tcga_coad_somuts_%s.maf"%(name), sep='\t', index=False)
In [18]:
!ls patient
tcga_coad_somuts_TCGA-A6-2671.maf tcga_coad_somuts_TCGA-A6-2675.maf tcga_coad_somuts_TCGA-A6-4105.maf tcga_coad_somuts_TCGA-A6-5656.maf tcga_coad_somuts_TCGA-A6-5657.maf tcga_coad_somuts_TCGA-A6-5659.maf tcga_coad_somuts_TCGA-A6-5660.maf tcga_coad_somuts_TCGA-A6-5661.maf tcga_coad_somuts_TCGA-A6-5662.maf tcga_coad_somuts_TCGA-A6-5664.maf tcga_coad_somuts_TCGA-A6-5665.maf tcga_coad_somuts_TCGA-A6-5666.maf tcga_coad_somuts_TCGA-A6-5667.maf tcga_coad_somuts_TCGA-A6-6137.maf tcga_coad_somuts_TCGA-A6-6138.maf tcga_coad_somuts_TCGA-A6-6140.maf tcga_coad_somuts_TCGA-A6-6141.maf tcga_coad_somuts_TCGA-A6-6142.maf tcga_coad_somuts_TCGA-A6-6648.maf tcga_coad_somuts_TCGA-A6-6649.maf tcga_coad_somuts_TCGA-A6-6650.maf tcga_coad_somuts_TCGA-A6-6651.maf tcga_coad_somuts_TCGA-A6-6652.maf tcga_coad_somuts_TCGA-A6-6653.maf tcga_coad_somuts_TCGA-A6-6654.maf tcga_coad_somuts_TCGA-A6-6780.maf tcga_coad_somuts_TCGA-A6-6781.maf tcga_coad_somuts_TCGA-A6-6782.maf tcga_coad_somuts_TCGA-AA-3489.maf tcga_coad_somuts_TCGA-AA-3492.maf tcga_coad_somuts_TCGA-AA-3502.maf tcga_coad_somuts_TCGA-AA-3510.maf tcga_coad_somuts_TCGA-AA-3511.maf tcga_coad_somuts_TCGA-AA-3655.maf tcga_coad_somuts_TCGA-AA-3660.maf tcga_coad_somuts_TCGA-AA-3662.maf tcga_coad_somuts_TCGA-AA-3663.maf tcga_coad_somuts_TCGA-AA-3697.maf tcga_coad_somuts_TCGA-AA-3712.maf tcga_coad_somuts_TCGA-AA-3713.maf tcga_coad_somuts_TCGA-AD-5900.maf tcga_coad_somuts_TCGA-AD-6548.maf tcga_coad_somuts_TCGA-AD-6888.maf tcga_coad_somuts_TCGA-AD-6889.maf tcga_coad_somuts_TCGA-AD-6890.maf tcga_coad_somuts_TCGA-AD-6895.maf tcga_coad_somuts_TCGA-AD-6899.maf tcga_coad_somuts_TCGA-AD-6901.maf tcga_coad_somuts_TCGA-AD-6963.maf tcga_coad_somuts_TCGA-AD-6964.maf tcga_coad_somuts_TCGA-AD-6965.maf tcga_coad_somuts_TCGA-AM-5820.maf tcga_coad_somuts_TCGA-AM-5821.maf tcga_coad_somuts_TCGA-AU-3779.maf tcga_coad_somuts_TCGA-AU-6004.maf tcga_coad_somuts_TCGA-AY-5543.maf tcga_coad_somuts_TCGA-AY-6196.maf tcga_coad_somuts_TCGA-AY-6197.maf 
tcga_coad_somuts_TCGA-AY-6386.maf tcga_coad_somuts_TCGA-AZ-4315.maf tcga_coad_somuts_TCGA-AZ-4323.maf tcga_coad_somuts_TCGA-AZ-4615.maf tcga_coad_somuts_TCGA-AZ-4616.maf tcga_coad_somuts_TCGA-AZ-4682.maf tcga_coad_somuts_TCGA-AZ-5403.maf tcga_coad_somuts_TCGA-AZ-5407.maf tcga_coad_somuts_TCGA-AZ-6598.maf tcga_coad_somuts_TCGA-AZ-6599.maf tcga_coad_somuts_TCGA-AZ-6600.maf tcga_coad_somuts_TCGA-AZ-6601.maf tcga_coad_somuts_TCGA-AZ-6603.maf tcga_coad_somuts_TCGA-AZ-6605.maf tcga_coad_somuts_TCGA-AZ-6607.maf tcga_coad_somuts_TCGA-CA-5796.maf tcga_coad_somuts_TCGA-CA-5797.maf tcga_coad_somuts_TCGA-CA-6716.maf tcga_coad_somuts_TCGA-CA-6717.maf tcga_coad_somuts_TCGA-CA-6718.maf tcga_coad_somuts_TCGA-CA-6719.maf tcga_coad_somuts_TCGA-CK-4947.maf tcga_coad_somuts_TCGA-CK-4948.maf tcga_coad_somuts_TCGA-CK-4950.maf tcga_coad_somuts_TCGA-CK-4952.maf tcga_coad_somuts_TCGA-CK-5912.maf tcga_coad_somuts_TCGA-CK-5913.maf tcga_coad_somuts_TCGA-CK-5914.maf tcga_coad_somuts_TCGA-CK-5915.maf tcga_coad_somuts_TCGA-CK-5916.maf tcga_coad_somuts_TCGA-CM-4743.maf tcga_coad_somuts_TCGA-CM-4744.maf tcga_coad_somuts_TCGA-CM-4746.maf tcga_coad_somuts_TCGA-CM-4747.maf tcga_coad_somuts_TCGA-CM-4748.maf tcga_coad_somuts_TCGA-CM-4750.maf tcga_coad_somuts_TCGA-CM-4752.maf tcga_coad_somuts_TCGA-CM-5341.maf tcga_coad_somuts_TCGA-CM-5344.maf tcga_coad_somuts_TCGA-CM-5348.maf tcga_coad_somuts_TCGA-CM-5349.maf tcga_coad_somuts_TCGA-CM-5860.maf tcga_coad_somuts_TCGA-CM-5861.maf tcga_coad_somuts_TCGA-CM-5862.maf tcga_coad_somuts_TCGA-CM-5863.maf tcga_coad_somuts_TCGA-CM-5864.maf tcga_coad_somuts_TCGA-CM-5868.maf tcga_coad_somuts_TCGA-CM-6161.maf tcga_coad_somuts_TCGA-CM-6162.maf tcga_coad_somuts_TCGA-CM-6163.maf tcga_coad_somuts_TCGA-CM-6164.maf tcga_coad_somuts_TCGA-CM-6165.maf tcga_coad_somuts_TCGA-CM-6166.maf tcga_coad_somuts_TCGA-CM-6168.maf tcga_coad_somuts_TCGA-CM-6169.maf tcga_coad_somuts_TCGA-CM-6170.maf tcga_coad_somuts_TCGA-CM-6171.maf tcga_coad_somuts_TCGA-CM-6172.maf 
tcga_coad_somuts_TCGA-CM-6674.maf tcga_coad_somuts_TCGA-CM-6675.maf tcga_coad_somuts_TCGA-CM-6676.maf tcga_coad_somuts_TCGA-CM-6677.maf tcga_coad_somuts_TCGA-CM-6678.maf tcga_coad_somuts_TCGA-CM-6679.maf tcga_coad_somuts_TCGA-CM-6680.maf tcga_coad_somuts_TCGA-D5-5537.maf tcga_coad_somuts_TCGA-D5-5538.maf tcga_coad_somuts_TCGA-D5-5539.maf tcga_coad_somuts_TCGA-D5-5540.maf tcga_coad_somuts_TCGA-D5-5541.maf tcga_coad_somuts_TCGA-D5-6529.maf tcga_coad_somuts_TCGA-D5-6531.maf tcga_coad_somuts_TCGA-D5-6532.maf tcga_coad_somuts_TCGA-D5-6533.maf tcga_coad_somuts_TCGA-D5-6534.maf tcga_coad_somuts_TCGA-D5-6535.maf tcga_coad_somuts_TCGA-D5-6536.maf tcga_coad_somuts_TCGA-D5-6537.maf tcga_coad_somuts_TCGA-D5-6538.maf tcga_coad_somuts_TCGA-D5-6539.maf tcga_coad_somuts_TCGA-D5-6540.maf tcga_coad_somuts_TCGA-D5-6541.maf tcga_coad_somuts_TCGA-D5-6898.maf tcga_coad_somuts_TCGA-D5-6920.maf tcga_coad_somuts_TCGA-D5-6922.maf tcga_coad_somuts_TCGA-D5-6924.maf tcga_coad_somuts_TCGA-D5-6926.maf tcga_coad_somuts_TCGA-D5-6927.maf tcga_coad_somuts_TCGA-D5-6928.maf tcga_coad_somuts_TCGA-D5-6929.maf tcga_coad_somuts_TCGA-D5-6930.maf tcga_coad_somuts_TCGA-D5-6931.maf tcga_coad_somuts_TCGA-D5-6932.maf tcga_coad_somuts_TCGA-D5-7000.maf tcga_coad_somuts_TCGA-DM-A0X9.maf tcga_coad_somuts_TCGA-DM-A0XD.maf tcga_coad_somuts_TCGA-DM-A0XF.maf tcga_coad_somuts_TCGA-DM-A1D0.maf tcga_coad_somuts_TCGA-DM-A1D4.maf tcga_coad_somuts_TCGA-DM-A1D6.maf tcga_coad_somuts_TCGA-DM-A1D7.maf tcga_coad_somuts_TCGA-DM-A1D8.maf tcga_coad_somuts_TCGA-DM-A1D9.maf tcga_coad_somuts_TCGA-DM-A1DA.maf tcga_coad_somuts_TCGA-DM-A1DB.maf tcga_coad_somuts_TCGA-DM-A1HA.maf tcga_coad_somuts_TCGA-DM-A282.maf tcga_coad_somuts_TCGA-DM-A285.maf tcga_coad_somuts_TCGA-DM-A28C.maf tcga_coad_somuts_TCGA-DM-A28F.maf tcga_coad_somuts_TCGA-DM-A28G.maf tcga_coad_somuts_TCGA-DM-A28H.maf tcga_coad_somuts_TCGA-DM-A28K.maf tcga_coad_somuts_TCGA-DM-A28M.maf tcga_coad_somuts_TCGA-F4-6459.maf tcga_coad_somuts_TCGA-F4-6460.maf 
tcga_coad_somuts_TCGA-F4-6461.maf tcga_coad_somuts_TCGA-F4-6463.maf tcga_coad_somuts_TCGA-F4-6569.maf tcga_coad_somuts_TCGA-F4-6570.maf tcga_coad_somuts_TCGA-F4-6703.maf tcga_coad_somuts_TCGA-F4-6704.maf tcga_coad_somuts_TCGA-F4-6805.maf tcga_coad_somuts_TCGA-F4-6806.maf tcga_coad_somuts_TCGA-F4-6807.maf tcga_coad_somuts_TCGA-F4-6808.maf tcga_coad_somuts_TCGA-F4-6809.maf tcga_coad_somuts_TCGA-F4-6854.maf tcga_coad_somuts_TCGA-F4-6855.maf tcga_coad_somuts_TCGA-F4-6856.maf tcga_coad_somuts_TCGA-G4-6293.maf tcga_coad_somuts_TCGA-G4-6294.maf tcga_coad_somuts_TCGA-G4-6295.maf tcga_coad_somuts_TCGA-G4-6297.maf tcga_coad_somuts_TCGA-G4-6298.maf tcga_coad_somuts_TCGA-G4-6299.maf tcga_coad_somuts_TCGA-G4-6302.maf tcga_coad_somuts_TCGA-G4-6303.maf tcga_coad_somuts_TCGA-G4-6304.maf tcga_coad_somuts_TCGA-G4-6306.maf tcga_coad_somuts_TCGA-G4-6307.maf tcga_coad_somuts_TCGA-G4-6309.maf tcga_coad_somuts_TCGA-G4-6310.maf tcga_coad_somuts_TCGA-G4-6311.maf tcga_coad_somuts_TCGA-G4-6314.maf tcga_coad_somuts_TCGA-G4-6315.maf tcga_coad_somuts_TCGA-G4-6317.maf tcga_coad_somuts_TCGA-G4-6320.maf tcga_coad_somuts_TCGA-G4-6321.maf tcga_coad_somuts_TCGA-G4-6322.maf tcga_coad_somuts_TCGA-G4-6323.maf tcga_coad_somuts_TCGA-G4-6586.maf tcga_coad_somuts_TCGA-G4-6588.maf tcga_coad_somuts_TCGA-G4-6625.maf tcga_coad_somuts_TCGA-G4-6626.maf tcga_coad_somuts_TCGA-G4-6628.maf
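The `!mkdir` and `!ls` shell escapes depend on the notebook environment. A portable variant can create the directory and build the paths with the `os` module. This is a sketch on a toy DataFrame; it writes into a temporary directory (my assumption, so the example does not touch the working directory) and reuses the file-name pattern from the loop above.

```python
import os
import tempfile

import pandas as pd

# Toy data standing in for the filtered MAF DataFrame.
toy = pd.DataFrame({
    "Hugo_Symbol": ["NEFH", "TTN", "KRAS"],
    "Patient": ["TCGA-A6-2671", "TCGA-A6-2671", "TCGA-A6-5657"],
})

# Temporary parent directory is an assumption made for this example only.
out_dir = os.path.join(tempfile.mkdtemp(), "patient")
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)

written = []
for name, group in toy.groupby("Patient"):
    path = os.path.join(out_dir, "tcga_coad_somuts_%s.maf" % name)
    # Tab-separated, no index column, as in the loop above.
    group.to_csv(path, sep="\t", index=False)
    written.append(os.path.basename(path))

print(sorted(written))
```

Because the helper column 'Patient' is written along with the original columns, the saved files have one column more than the input MAF.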
In [19]:
df = pd.read_table("patient/tcga_coad_somuts_TCGA-A6-2671.maf")
df
Out[19]:
| | Hugo_Symbol | Entrez_Gene_Id | Center | Ncbi_Build | Chrom | Start_Position | End_Position | Strand | Variant_Classification | Variant_Type | ... | Validation_Method | Score | Bam_File | Sequencer | Tumor_Sample_UUID | Matched_Norm_Sample_UUID | File_Name | Archive_Name | Line_Number | Patient |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | BRE | 9577 | hgsc.bcm.edu | 37 | 2 | 28464192 | 28464192 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 98a2eb02-7bcf-4134-ab7e-943391710e98 | 03be77e5-a923-4d2a-b134-ed11e2ff1190 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 10879 | TCGA-A6-2671 |
1 | EMR3 | 84658 | hgsc.bcm.edu | 37 | 19 | 14749131 | 14749131 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 98a2eb02-7bcf-4134-ab7e-943391710e98 | 03be77e5-a923-4d2a-b134-ed11e2ff1190 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 30975 | TCGA-A6-2671 |
2 | FCRL2 | 127943 | hgsc.bcm.edu | 37 | 1 | 157737148 | 157737148 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 98a2eb02-7bcf-4134-ab7e-943391710e98 | 03be77e5-a923-4d2a-b134-ed11e2ff1190 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 35669 | TCGA-A6-2671 |
3 | NEFH | 4744 | hgsc.bcm.edu | 37 | 22 | 29885567 | 29885567 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 98a2eb02-7bcf-4134-ab7e-943391710e98 | 03be77e5-a923-4d2a-b134-ed11e2ff1190 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 64467 | TCGA-A6-2671 |
4 | NEFH | 4744 | hgsc.bcm.edu | 37 | 22 | 29885594 | 29885594 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 98a2eb02-7bcf-4134-ab7e-943391710e98 | 03be77e5-a923-4d2a-b134-ed11e2ff1190 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 64552 | TCGA-A6-2671 |
5 | TTN | 7273 | hgsc.bcm.edu | 37 | 2 | 179611406 | 179611406 | + | Nonsense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 98a2eb02-7bcf-4134-ab7e-943391710e98 | 03be77e5-a923-4d2a-b134-ed11e2ff1190 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 104304 | TCGA-A6-2671 |
6 rows × 38 columns
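Rather than checking each file by eye, a quick sanity check after reading one back is that every row belongs to the patient named in the file name. The sketch below first writes a tiny stand-in file to a temporary directory (an assumption so that it runs on its own; the three columns are a subset chosen for illustration), then reads it with `pd.read_csv(..., sep="\t")`, which behaves like `pd.read_table` for tab-separated files.

```python
import os
import tempfile

import pandas as pd

# Write a tiny stand-in for one per-patient file (illustrative subset of columns).
path = os.path.join(tempfile.mkdtemp(), "tcga_coad_somuts_TCGA-A6-2671.maf")
pd.DataFrame({
    "Hugo_Symbol": ["BRE", "EMR3"],
    "Variant_Classification": ["Silent", "Missense_Mutation"],
    "Patient": ["TCGA-A6-2671", "TCGA-A6-2671"],
}).to_csv(path, sep="\t", index=False)

loaded = pd.read_csv(path, sep="\t")

# Recover the patient ID from the file name by stripping prefix and suffix.
expected = os.path.basename(path)[len("tcga_coad_somuts_"):-len(".maf")]

single_patient = (
    loaded["Patient"].nunique() == 1
    and loaded["Patient"].iloc[0] == expected
)
print(single_patient)  # True
```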
In [20]:
df = pd.read_table("patient/tcga_coad_somuts_TCGA-A6-6653.maf")
df
Out[20]:
| | Hugo_Symbol | Entrez_Gene_Id | Center | Ncbi_Build | Chrom | Start_Position | End_Position | Strand | Variant_Classification | Variant_Type | ... | Validation_Method | Score | Bam_File | Sequencer | Tumor_Sample_UUID | Matched_Norm_Sample_UUID | File_Name | Archive_Name | Line_Number | Patient |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ADTRP | 0 | hgsc.bcm.edu | 37 | 6 | 11766527 | 11766527 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 2779 | TCGA-A6-6653 |
1 | ANAPC5 | 51433 | hgsc.bcm.edu | 37 | 12 | 121764935 | 121764935 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 4413 | TCGA-A6-6653 |
2 | ANXA11 | 311 | hgsc.bcm.edu | 37 | 10 | 81917431 | 81917431 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 5272 | TCGA-A6-6653 |
3 | ATP11A | 23250 | hgsc.bcm.edu | 37 | 13 | 113514616 | 113514616 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 8226 | TCGA-A6-6653 |
4 | BRAF | 673 | hgsc.bcm.edu | 37 | 7 | 140453136 | 140453136 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 10691 | TCGA-A6-6653 |
5 | C12orf51 | 283450 | hgsc.bcm.edu | 37 | 12 | 112607395 | 112607395 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 11718 | TCGA-A6-6653 |
6 | CAB39L | 81617 | hgsc.bcm.edu | 37 | 13 | 49951208 | 49951208 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 14146 | TCGA-A6-6653 |
7 | CARS2 | 79587 | hgsc.bcm.edu | 37 | 13 | 111329326 | 111329326 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 15266 | TCGA-A6-6653 |
8 | CCDC77 | 84318 | hgsc.bcm.edu | 37 | 12 | 520944 | 520944 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 16497 | TCGA-A6-6653 |
9 | CDK5RAP2 | 55755 | hgsc.bcm.edu | 37 | 9 | 123215894 | 123215894 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 18039 | TCGA-A6-6653 |
10 | CHPF2 | 0 | hgsc.bcm.edu | 37 | 7 | 150932583 | 150932583 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 19424 | TCGA-A6-6653 |
11 | CLASP1 | 23332 | hgsc.bcm.edu | 37 | 2 | 122187744 | 122187744 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 19933 | TCGA-A6-6653 |
12 | CLMN | 79789 | hgsc.bcm.edu | 37 | 14 | 95669909 | 95669909 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 20345 | TCGA-A6-6653 |
13 | COL6A3 | 1293 | hgsc.bcm.edu | 37 | 2 | 238283448 | 238283448 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 21878 | TCGA-A6-6653 |
14 | COL9A2 | 1298 | hgsc.bcm.edu | 37 | 1 | 40773119 | 40773119 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 22032 | TCGA-A6-6653 |
15 | DCLK3 | 85443 | hgsc.bcm.edu | 37 | 3 | 36778711 | 36778711 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 25371 | TCGA-A6-6653 |
16 | E2F7 | 144455 | hgsc.bcm.edu | 37 | 12 | 77417804 | 77417804 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 29690 | TCGA-A6-6653 |
17 | EGFR | 1956 | hgsc.bcm.edu | 37 | 7 | 55224338 | 55224338 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 30246 | TCGA-A6-6653 |
18 | ELMOD3 | 84173 | hgsc.bcm.edu | 37 | 2 | 85598245 | 85598245 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 30754 | TCGA-A6-6653 |
19 | ENPP6 | 133121 | hgsc.bcm.edu | 37 | 4 | 185138760 | 185138760 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 31139 | TCGA-A6-6653 |
20 | EPB41L1 | 2036 | hgsc.bcm.edu | 37 | 20 | 34761778 | 34761778 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 31294 | TCGA-A6-6653 |
21 | EXOC3L2 | 90332 | hgsc.bcm.edu | 37 | 19 | 45731019 | 45731019 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 32451 | TCGA-A6-6653 |
22 | FBXL20 | 84961 | hgsc.bcm.edu | 37 | 17 | 37420535 | 37420535 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 35245 | TCGA-A6-6653 |
23 | FLG | 2312 | hgsc.bcm.edu | 37 | 1 | 152282228 | 152282228 | + | Nonsense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 36411 | TCGA-A6-6653 |
24 | FYCO1 | 79443 | hgsc.bcm.edu | 37 | 3 | 46010078 | 46010078 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 37900 | TCGA-A6-6653 |
25 | HEATR2 | 54919 | hgsc.bcm.edu | 37 | 7 | 810217 | 810217 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 42884 | TCGA-A6-6653 |
26 | ITPKB | 3707 | hgsc.bcm.edu | 37 | 1 | 226924291 | 226924291 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 47883 | TCGA-A6-6653 |
27 | KEL | 3792 | hgsc.bcm.edu | 37 | 7 | 142640916 | 142640916 | + | Nonsense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 49644 | TCGA-A6-6653 |
28 | KIAA1549 | 57670 | hgsc.bcm.edu | 37 | 7 | 138564331 | 138564331 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 50490 | TCGA-A6-6653 |
29 | NLRP11 | 204801 | hgsc.bcm.edu | 37 | 19 | 56320376 | 56320376 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 65599 | TCGA-A6-6653 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
35 | PCDHA12 | 56137 | hgsc.bcm.edu | 37 | 5 | 140256920 | 140256920 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 71962 | TCGA-A6-6653 |
36 | PROX1 | 5629 | hgsc.bcm.edu | 37 | 1 | 214170717 | 214170717 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 79051 | TCGA-A6-6653 |
37 | PTPRZ1 | 5803 | hgsc.bcm.edu | 37 | 7 | 121653419 | 121653419 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 80729 | TCGA-A6-6653 |
38 | RAG1 | 5896 | hgsc.bcm.edu | 37 | 11 | 36595451 | 36595451 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 81561 | TCGA-A6-6653 |
39 | RANGAP1 | 5905 | hgsc.bcm.edu | 37 | 22 | 41652732 | 41652732 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 81767 | TCGA-A6-6653 |
40 | RHO | 6010 | hgsc.bcm.edu | 37 | 3 | 129247785 | 129247785 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 83417 | TCGA-A6-6653 |
41 | RNF213 | 57674 | hgsc.bcm.edu | 37 | 17 | 78346491 | 78346491 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 84105 | TCGA-A6-6653 |
42 | RYR1 | 6261 | hgsc.bcm.edu | 37 | 19 | 38942433 | 38942433 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 85467 | TCGA-A6-6653 |
43 | SCN2A | 6326 | hgsc.bcm.edu | 37 | 2 | 166245158 | 166245158 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 86578 | TCGA-A6-6653 |
44 | SEC16A | 9919 | hgsc.bcm.edu | 37 | 9 | 139354291 | 139354291 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 87111 | TCGA-A6-6653 |
45 | SESN2 | 83667 | hgsc.bcm.edu | 37 | 1 | 28599898 | 28599898 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 87801 | TCGA-A6-6653 |
46 | SLC13A2 | 9058 | hgsc.bcm.edu | 37 | 17 | 26817481 | 26817481 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 89372 | TCGA-A6-6653 |
47 | SLC6A15 | 55117 | hgsc.bcm.edu | 37 | 12 | 85255750 | 85255750 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 91105 | TCGA-A6-6653 |
48 | SLC6A19 | 340024 | hgsc.bcm.edu | 37 | 5 | 1201813 | 1201813 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 91142 | TCGA-A6-6653 |
49 | SLC7A2 | 6542 | hgsc.bcm.edu | 37 | 8 | 17401161 | 17401161 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 91252 | TCGA-A6-6653 |
50 | SMARCE1 | 6605 | hgsc.bcm.edu | 37 | 17 | 38792665 | 38792665 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 92011 | TCGA-A6-6653 |
51 | SMG6 | 23293 | hgsc.bcm.edu | 37 | 17 | 2203354 | 2203354 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 92178 | TCGA-A6-6653 |
52 | TATDN2 | 9797 | hgsc.bcm.edu | 37 | 3 | 10320651 | 10320651 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 96994 | TCGA-A6-6653 |
53 | TFIP11 | 24144 | hgsc.bcm.edu | 37 | 22 | 26888033 | 26888033 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 98464 | TCGA-A6-6653 |
54 | TGM3 | 7053 | hgsc.bcm.edu | 37 | 20 | 2297850 | 2297850 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 98634 | TCGA-A6-6653 |
55 | TMEM130 | 222865 | hgsc.bcm.edu | 37 | 7 | 98457888 | 98457888 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 99785 | TCGA-A6-6653 |
56 | TMEM8A | 58986 | hgsc.bcm.edu | 37 | 16 | 427080 | 427080 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 100439 | TCGA-A6-6653 |
57 | TNFRSF9 | 3604 | hgsc.bcm.edu | 37 | 1 | 7995117 | 7995117 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 100888 | TCGA-A6-6653 |
58 | TRPM4 | 54795 | hgsc.bcm.edu | 37 | 19 | 49692315 | 49692315 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 102978 | TCGA-A6-6653 |
59 | TSC22D4 | 81628 | hgsc.bcm.edu | 37 | 7 | 100074938 | 100074938 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 103228 | TCGA-A6-6653 |
60 | TTN | 7273 | hgsc.bcm.edu | 37 | 2 | 179464365 | 179464365 | + | Missense_Mutation | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 104116 | TCGA-A6-6653 |
61 | USP26 | 83844 | hgsc.bcm.edu | 37 | X | 132160291 | 132160291 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 106224 | TCGA-A6-6653 |
62 | WFIKKN2 | 124857 | hgsc.bcm.edu | 37 | 17 | 48918218 | 48918218 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 108388 | TCGA-A6-6653 |
63 | WFS1 | 7466 | hgsc.bcm.edu | 37 | 4 | 6303727 | 6303727 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 108401 | TCGA-A6-6653 |
64 | ZNF217 | 7764 | hgsc.bcm.edu | 37 | 20 | 52193041 | 52193041 | + | Silent | SNP | ... | Illumina | . | . | Illumina HiSeq | 5682a4fe-0500-42a7-9dff-de34773d728c | 831f15d4-35c1-4352-9e8c-c19247abc7d6 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 111090 | TCGA-A6-6653 |
65 rows × 38 columns
In [21]:
df = pd.read_table("patient/tcga_coad_somuts_TCGA-G4-6625.maf")
df
Out[21]:
| Hugo_Symbol | Entrez_Gene_Id | Center | Ncbi_Build | Chrom | Start_Position | End_Position | Strand | Variant_Classification | Variant_Type | ... | Validation_Method | Score | Bam_File | Sequencer | Tumor_Sample_UUID | Matched_Norm_Sample_UUID | File_Name | Archive_Name | Line_Number | Patient |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | BAGE | 574 | hgsc.bcm.edu | 37 | 21 | 11097574 | 11097574 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 9424 | TCGA-G4-6625 |
1 | CASZ1 | 54897 | hgsc.bcm.edu | 37 | 1 | 10713765 | 10713765 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 15476 | TCGA-G4-6625 |
2 | CBX8 | 57332 | hgsc.bcm.edu | 37 | 17 | 77768947 | 77768947 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 15654 | TCGA-G4-6625 |
3 | KRAS | 3845 | hgsc.bcm.edu | 37 | 12 | 25398285 | 25398285 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 52052 | TCGA-G4-6625 |
4 | RALY | 22913 | hgsc.bcm.edu | 37 | 20 | 32664866 | 32664866 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 81682 | TCGA-G4-6625 |
5 | SLC35E4 | 339665 | hgsc.bcm.edu | 37 | 22 | 31032920 | 31032920 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 90439 | TCGA-G4-6625 |
6 | TP53 | 7157 | hgsc.bcm.edu | 37 | 17 | 7578190 | 7578190 | + | Missense_Mutation | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 101567 | TCGA-G4-6625 |
7 | ZNF579 | 163033 | hgsc.bcm.edu | 37 | 19 | 56090076 | 56090076 | + | Silent | SNP | ... | none | . | . | Illumina HiSeq | 9fe0746c-9ad2-4cea-85fd-9023d9f3a2bf | b865dc34-9bb7-425a-ad9e-4838cce2e088 | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1.somatic.maf | hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.Level_2.1.5.0 | 112735 | TCGA-G4-6625 |
8 rows × 38 columns
We can see that the number of mutations differs among patients: 65 rows for TCGA-A6-6653 versus 8 rows for TCGA-G4-6625.
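The per-patient comparison can also be done without opening each file. The following is a minimal sketch, assuming the patient ID is the first three hyphen-separated fields of `Tumor_Sample_Barcode`, as described at the top of this post. The helper `patient_id` and the toy DataFrame are hypothetical: the two patient IDs come from the outputs above, but the remaining barcode fields are made up for illustration.

```python
import pandas as pd


def patient_id(barcode):
    """Return the first three hyphen-separated fields of a TCGA barcode."""
    return "-".join(barcode.split("-")[:3])


# Toy stand-in for the MAF table. The barcode suffixes after the
# patient ID are invented for illustration only.
toy = pd.DataFrame({
    "Hugo_Symbol": ["KRAS", "TP53", "TTN", "RYR1", "SMG6"],
    "Tumor_Sample_Barcode": [
        "TCGA-G4-6625-01A-11D-0000-10",
        "TCGA-G4-6625-01A-11D-0000-10",
        "TCGA-A6-6653-01A-11D-0000-10",
        "TCGA-A6-6653-01A-11D-0000-10",
        "TCGA-A6-6653-01A-11D-0000-10",
    ],
})

# Derive the patient ID, then count mutation rows per patient.
toy["Patient"] = toy["Tumor_Sample_Barcode"].map(patient_id)
counts = toy.groupby("Patient").size()
print(counts)
```

Applied to the full MAF DataFrame instead of the toy one, the same `groupby("Patient").size()` call would give one mutation count per patient in a single Series.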