Matching a specific gene list against TCGA data

head(rowData(colon_data))
DataFrame with 6 rows and 10 columns
                     source     type     score     phase            gene_id      gene_type
                   <factor> <factor> <numeric> <integer>        <character>    <character>
ENSG00000000003.15   HAVANA     gene        NA        NA ENSG00000000003.15 protein_coding
ENSG00000000005.6    HAVANA     gene        NA        NA  ENSG00000000005.6 protein_coding
ENSG00000000419.13   HAVANA     gene        NA        NA ENSG00000000419.13 protein_coding
ENSG00000000457.14   HAVANA     gene        NA        NA ENSG00000000457.14 protein_coding
ENSG00000000460.17   HAVANA     gene        NA        NA ENSG00000000460.17 protein_coding
ENSG00000000938.13   HAVANA     gene        NA        NA ENSG00000000938.13 protein_coding
                     gene_name       level     hgnc_id          havana_gene
                   <character> <character> <character>          <character>
ENSG00000000003.15      TSPAN6           2  HGNC:11858 OTTHUMG00000022002.2
ENSG00000000005.6         TNMD           2  HGNC:17757 OTTHUMG00000022001.2
ENSG00000000419.13        DPM1           2   HGNC:3005 OTTHUMG00000032742.2
ENSG00000000457.14       SCYL3           2  HGNC:19285 OTTHUMG00000035941.6
ENSG00000000460.17    C1orf112           2  HGNC:25565 OTTHUMG00000035821.9
ENSG00000000938.13         FGR           2   HGNC:3697 OTTHUMG00000003516.3

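# Gene symbols annotated in the TCGA object
gene_names <- rowData(colon_data)$gene_name
# Keep only the genes of interest that also appear in the TCGA annotation
matched_genes <- significant_gene_names[significant_gene_names %in% gene_names]
print(matched_genes)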

# Subset the expression matrix to the rows whose gene symbol is in the matched list
expr_data_filtered <- assay(colon_data)[rowData(colon_data)$gene_name %in% matched_genes, , drop = FALSE]
print(expr_data_filtered)
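
One thing to watch for: `%in%` is case-sensitive, and some symbols in the annotation contain lower-case letters (for example `C1orf112` in the output above). The following is a minimal sketch of case-insensitive matching, not necessarily how the original analysis handled it. It uses base R only. `gene_annot` and `expr` are small made-up stand-ins for `rowData(colon_data)` and `assay(colon_data)`, and the gene list and expression values are hypothetical.

```r
# Toy stand-in for rowData(colon_data): Ensembl IDs and gene symbols (values assumed)
gene_annot <- data.frame(
  gene_id   = c("ENSG00000000003.15", "ENSG00000000005.6",
                "ENSG00000000419.13", "ENSG00000000460.17"),
  gene_name = c("TSPAN6", "TNMD", "DPM1", "C1orf112"),
  stringsAsFactors = FALSE
)

# Toy stand-in for assay(colon_data): genes in rows, samples in columns (made-up counts)
expr <- matrix(
  c(10, 0, 25, 3,
    12, 1, 30, 4),
  nrow = 4,
  dimnames = list(gene_annot$gene_id, c("sample_A", "sample_B"))
)

# Hypothetical gene list with inconsistent case and one symbol that is not annotated
significant_gene_names <- c("tspan6", "C1ORF112", "NOT_A_GENE")

# Upper-case both sides so that the comparison ignores case
query_upper <- toupper(significant_gene_names)
annot_upper <- toupper(gene_annot$gene_name)

# Logical row index of annotated genes that appear in the query list
keep <- annot_upper %in% query_upper

# Subset the matrix and label rows by gene symbol instead of Ensembl ID
expr_filtered <- expr[keep, , drop = FALSE]
rownames(expr_filtered) <- gene_annot$gene_name[keep]

# Report query genes that could not be found in the annotation
missing_genes <- significant_gene_names[!(query_upper %in% annot_upper)]

print(expr_filtered)
print(missing_genes)
```

Listing the unmatched genes makes it easier to spot symbols that were renamed or misspelled before moving on to downstream analysis.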


