transposed dataframe in loom_to_pandas, added docs, test for main-methods,...

Merged Paul Kuehnel requested to merge dev into main
@@ -24,6 +24,128 @@
pass
return pandas_data_frame
#### 1.2.2 Structure of the Loom file
##### Column attributes:
The Loom file `AllelicExpressionPatterns-mouse-brain-SS2.loom`
contains the following column attributes (`ds.ca[:][:]`):
ACCUMULATION_LEVEL : ['All Reads' 'All Reads' 'All Reads' 'All Reads']
ALIGNED_READS : [28986068. 20683055. 23186931. 30404657.]
AT_DROPOUT : [33.606559 33.648426 31.444722 31.786082]
Aligned 0 time : [3858339. 3420164. 3943858. 5300103.]
Aligned 1 time : [3004200. 1998692. 2355992. 3128781.]
Aligned >1 times : [8369360. 5468656. 6224005. 8085766.]
BAD_CYCLES.UNPAIRED : [0. 0. 0. 0.]
CODING_BASES : [22092807. 15526779. 17208608. 25332878.]
CORRECT_STRAND_READS : [0. 0. 0. 0.]
CellID : ['0b6988c1-c071-4ab5-bf8a-d5b64216dfba'
'aa51e744-6d4d-4ca4-af86-447b96cf41e6'
'ad82581a-49cc-45c1-88b1-530b04967bbb'
'cd102dbd-687f-4b9e-8d54-e4d67e4e5230']
ESTIMATED_LIBRARY_SIZE : ['' '' '' '']
GC_DROPOUT : [0.007767 0.007839 0.008992 0.00765 ]
GC_NC_0_19 : [0.300196 0.416359 0.262513 0.417735]
GC_NC_20_39 : [0.430949 0.43365 0.448884 0.466797]
GC_NC_40_59 : [1.217783 1.220995 1.229549 1.203061]
GC_NC_60_79 : [6.157629 5.993127 5.527977 5.838945]
GC_NC_80_100 : [1.397801 1.402828 1.086991 1.446325]
IGNORED_READS : [0. 0. 0. 0.]
INCORRECT_STRAND_READS : [0. 0. 0. 0.]
INTERGENIC_BASES : [2.66441294e+08 1.73987218e+08 2.19471711e+08 2.70343529e+08]
INTRONIC_BASES : [2.51751308e+08 1.60297885e+08 2.06044305e+08 2.47486347e+08]
MEAN_READ_LENGTH.UNPAIRED : [43. 43. 43. 43.]
MEDIAN_3PRIME_BIAS : [0. 0. 0.097329 0. ]
MEDIAN_5PRIME_BIAS : [0. 0. 0.075903 0. ]
MEDIAN_5PRIME_TO_3PRIME_BIAS : [0. 0. 0. 0.]
MEDIAN_CV_COVERAGE : [1.660858 1.693836 1.737947 1.635959]
NUM_R1_TRANSCRIPT_STRAND_READS : [490303. 377463. 409884. 583749.]
NUM_R2_TRANSCRIPT_STRAND_READS : [488136. 370909. 405170. 573631.]
NUM_UNEXPLAINED_READS : [31919. 24297. 27091. 36571.]
Overall alignment rate : [0.7467 0.6859 0.6851 0.6791]
PCT_ADAPTER.UNPAIRED : [1.7e-05 4.5e-05 4.0e-06 5.1e-05]
PCT_CHIMERAS.UNPAIRED : [0. 0. 0. 0.]
PCT_CODING_BASES : [0.038417 0.040993 0.036428 0.04321 ]
PCT_CORRECT_STRAND_READS : [0. 0. 0. 0.]
PCT_INTERGENIC_BASES : [0.463312 0.459357 0.464585 0.461125]
PCT_INTRONIC_BASES : [0.437768 0.423215 0.436162 0.422138]
PCT_MRNA_BASES : [0.082171 0.094531 0.084057 0.095176]
PCT_PF_READS.UNPAIRED : [1. 1. 1. 1.]
PCT_PF_READS_ALIGNED.UNPAIRED : [0.881757 0.813683 0.881304 0.830194]
PCT_PF_READS_IMPROPER_PAIRS.UNPAIRED : [0. 0. 0. 0.]
PCT_R1_TRANSCRIPT_STRAND_READS : [0.501107 0.504379 0.502892 0.504371]
PCT_R2_TRANSCRIPT_STRAND_READS : [0.498893 0.495621 0.497108 0.495629]
PCT_READS_ALIGNED_IN_PAIRS.UNPAIRED : [0. 0. 0. 0.]
PCT_RIBOSOMAL_BASES : [0.016861 0.023054 0.015303 0.02171 ]
PCT_USABLE_BASES : [0.072148 0.076479 0.073736 0.078576]
PCT_UTR_BASES : [0.043754 0.053538 0.047629 0.051966]
PERCENT_DUPLICATION : [0.492939 0.49767 0.403599 0.501733]
PF_ALIGNED_BASES : [5.75079892e+08 3.78762240e+08 4.72403450e+08 5.86269510e+08]
PF_ALIGNED_BASES.UNPAIRED : [5.75079892e+08 3.78762240e+08 4.72403450e+08 5.86269510e+08]
PF_BASES : [6.54971657e+08 4.68163016e+08 5.38525765e+08 7.10129950e+08]
PF_HQ_ALIGNED_BASES.UNPAIRED : [4.38357134e+08 2.76406611e+08 3.67474488e+08 4.39738733e+08]
PF_HQ_ALIGNED_Q20_BASES.UNPAIRED : [4.33645315e+08 2.73014984e+08 3.63416370e+08 4.34145083e+08]
PF_HQ_ALIGNED_READS.UNPAIRED : [10239973. 6467597. 8587383. 10286865.]
PF_HQ_ERROR_RATE.UNPAIRED : [0.001803 0.001784 0.001877 0.001872]
PF_HQ_MEDIAN_MISMATCHES.UNPAIRED : [0. 0. 0. 0.]
PF_INDEL_RATE.UNPAIRED : [8.3e-05 8.5e-05 8.7e-05 8.7e-05]
PF_MISMATCH_RATE.UNPAIRED : [0.001636 0.001611 0.001726 0.001699]
PF_NOISE_READS.UNPAIRED : [0. 0. 0. 0.]
PF_READS.UNPAIRED : [15231899. 10887512. 12523855. 16514650.]
PF_READS_ALIGNED.UNPAIRED : [13430833. 8858988. 11037322. 13710365.]
PF_READS_IMPROPER_PAIRS.UNPAIRED : [0. 0. 0. 0.]
READS_ALIGNED_IN_PAIRS.UNPAIRED : [0. 0. 0. 0.]
READS_USED : ['ALL' 'ALL' 'ALL' 'ALL']
READ_PAIRS_EXAMINED : [0. 0. 0. 0.]
READ_PAIR_DUPLICATES : [0. 0. 0. 0.]
READ_PAIR_OPTICAL_DUPLICATES : [0. 0. 0. 0.]
RIBOSOMAL_BASES : [ 9696629. 8731924. 7229332. 12727871.]
SECONDARY_OR_SUPPLEMENTARY_RDS : [15555235. 11824067. 12149609. 16694292.]
STRAND_BALANCE.UNPAIRED : [0.506264 0.510764 0.506729 0.509339]
TOTAL_CLUSTERS : [30787134. 22711579. 24673464. 33208942.]
TOTAL_READS.UNPAIRED : [15231899. 10887512. 12523855. 16514650.]
Total reads : [15231899. 10887512. 12523855. 16514650.]
UNMAPPED_READS : [1801066. 2028524. 1486533. 2804285.]
UNPAIRED_READS_EXAMINED : [13430833. 8858988. 11037322. 13710365.]
UNPAIRED_READ_DUPLICATES : [6620584. 4408852. 4454653. 6878943.]
UTR_BASES : [25162010. 20277998. 22500309. 30465947.]
WINDOW_SIZE : [100. 100. 100. 100.]
alignable reads : [11373560. 7467348. 8579997. 11214547.]
cell_names : ['0b6988c1-c071-4ab5-bf8a-d5b64216dfba'
'aa51e744-6d4d-4ca4-af86-447b96cf41e6'
'ad82581a-49cc-45c1-88b1-530b04967bbb'
'cd102dbd-687f-4b9e-8d54-e4d67e4e5230']
filtered reads : [0. 0. 0. 0.]
input_id : ['0b6988c1-c071-4ab5-bf8a-d5b64216dfba'
'aa51e744-6d4d-4ca4-af86-447b96cf41e6'
'ad82581a-49cc-45c1-88b1-530b04967bbb'
'cd102dbd-687f-4b9e-8d54-e4d67e4e5230']
input_id_metadata_field : ['sequencing_process_provenance_document_id'
'sequencing_process_provenance_document_id'
'sequencing_process_provenance_document_id'
'sequencing_process_provenance_document_id']
input_name : ['SRX1461161' 'SRX1461159' 'SRX1461158' 'SRX1461160']
input_name_metadata_field : ['sequencing_input_biomaterial_core_biomaterial_id'
'sequencing_input_biomaterial_core_biomaterial_id'
'sequencing_input_biomaterial_core_biomaterial_id'
'sequencing_input_biomaterial_core_biomaterial_id']
multiple mapped : [2966138. 2188525. 2354858. 3104526.]
strand : [1. 1. 1. 1.]
total alignments : [38685001. 25821562. 29374024. 37701360.]
total reads : [15231899. 10887512. 12523855. 16514650.]
unalignable reads : [3858339. 3420164. 3943858. 5300103.]
uncertain reads : [8369360. 5468656. 6224005. 8085766.]
unique aligned : [8407422. 5278823. 6225139. 8110021.]
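
The read counts in this listing can be cross-checked against each other. The sketch below copies six of the attributes by hand and tests three relations that the numbers suggest: `total reads` = `unalignable reads` + `alignable reads`, `alignable reads` = `unique aligned` + `multiple mapped`, and `Overall alignment rate` = `alignable reads` / `total reads` (rounded to four decimals). The attribute semantics are not documented here, so reading them as a partition of the reads is an assumption based on the values alone.

```python
# Values copied from the column-attribute listing above (one entry per cell).
total_reads = [15231899, 10887512, 12523855, 16514650]      # "total reads"
unalignable = [3858339, 3420164, 3943858, 5300103]          # "unalignable reads"
alignable = [11373560, 7467348, 8579997, 11214547]          # "alignable reads"
unique_aligned = [8407422, 5278823, 6225139, 8110021]       # "unique aligned"
multiple_mapped = [2966138, 2188525, 2354858, 3104526]      # "multiple mapped"
overall_rate = [0.7467, 0.6859, 0.6851, 0.6791]             # "Overall alignment rate"

checks = []
for i in range(len(total_reads)):
    # Assumed relation 1: every read is either alignable or unalignable.
    splits_total = unalignable[i] + alignable[i] == total_reads[i]
    # Assumed relation 2: alignable reads map either uniquely or multiple times.
    splits_alignable = unique_aligned[i] + multiple_mapped[i] == alignable[i]
    # Assumed relation 3: the reported rate is the alignable fraction,
    # compared with a tolerance because the listing rounds to four decimals.
    rate_matches = abs(alignable[i] / total_reads[i] - overall_rate[i]) < 5e-5
    checks.append(splits_total and splits_alignable and rate_matches)

print(checks)
```

All three relations hold for the four cells listed, which supports this reading of the attribute names without confirming it.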
##### Row attributes:
The Loom file `AllelicExpressionPatterns-mouse-brain-SS2.loom`
contains the following row attributes (`ds.ra[:][:]`):
Gene : ['ENSMUSG00000000001.4' 'ENSMUSG00000000003.15' 'ENSMUSG00000000028.15' ... 'ENSMUSG00000118391.1' 'ENSMUSG00000118392.1' 'ENSMUSG00000118393.1']
ensembl_ids : ['ENSMUSG00000000001.4' 'ENSMUSG00000000003.15' 'ENSMUSG00000000028.15' ... 'ENSMUSG00000118391.1' 'ENSMUSG00000118392.1' 'ENSMUSG00000118393.1']
gene_names : ['ENSMUSG00000000001.4' 'ENSMUSG00000000003.15' 'ENSMUSG00000000028.15' ... 'ENSMUSG00000118391.1' 'ENSMUSG00000118392.1' 'ENSMUSG00000118393.1']
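
The listings show that rows are genes and columns are cells. The merge request title mentions a transposed DataFrame in `loom_to_pandas`, but the function body is not shown here, so the following is only a sketch of what such a transpose does, assuming pandas is available. The IDs are taken from the listings above. The counts are placeholder values, not data from the file.

```python
import pandas as pd

# Gene IDs (row attribute `Gene`) and cell IDs (column attribute `CellID`),
# copied from the listings above; only the first few are used here.
genes = ["ENSMUSG00000000001.4", "ENSMUSG00000000003.15", "ENSMUSG00000000028.15"]
cells = ["0b6988c1-c071-4ab5-bf8a-d5b64216dfba", "aa51e744-6d4d-4ca4-af86-447b96cf41e6"]

# Placeholder counts (NOT real data): one row per gene, one column per cell,
# which is the orientation the Loom attributes imply.
counts = [[0, 1], [2, 3], [4, 5]]

genes_by_cells = pd.DataFrame(counts, index=genes, columns=cells)

# Transposing puts one cell per row and one gene per column, the usual
# samples-by-features layout for dimensionality reduction and clustering.
cells_by_genes = genes_by_cells.T

print(cells_by_genes.shape)
```

After the transpose each row can be joined with the per-cell column attributes, since both are indexed by cell ID.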
## 2. Auto Encoder
* read up on this topic
@@ -38,4 +160,4 @@ Goal: visualization of the clusters
* e.g. t-SNE and UMAP: which one makes sense, and how many dimensions do we want to reduce to? Probably 2.
* Then apply the cluster analysis to the 2-dimensional data?
* Which algorithm? k-means, for example?
* [Document on clustering algorithms](https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/LLWA03/fgml/final/Kirchner.pdf)
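
To make the k-means option concrete, here is a minimal sketch of Lloyd's algorithm on 2-dimensional points, using only the Python standard library. The `kmeans` helper and the `embedding` coordinates are hypothetical. The points stand in for the output of a t-SNE or UMAP reduction and are not derived from the Loom file. In practice a library implementation would likely be preferable.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm for k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    # Start from k distinct input points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D coordinates standing in for a t-SNE/UMAP embedding.
embedding = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(embedding, k=2)
print(labels)
```

The number of clusters `k` has to be chosen up front, which is one of the open questions to settle when comparing k-means against the alternatives in the linked document.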