================================================================================
Output files by directory
================================================================================

Images
============

The rule graph (located in `images/`) of the workflow:

.. figure:: images/rulegraph.png
    :alt: Rule graph of the GPSW workflow (dPSI analysis)

    Rule graph of the GPSW workflow (dPSI analysis)

Logs
======

The `logs/` directory contains the log files of the workflow.

.. dropdown:: Directory structure
    :icon: info
    :color: primary

    .. code-block:: text

        logs/
        ├── bowtie2
        │   └── index.log
        ├── calculate_psi
        │   └── Test_vs_Control
        │       └── hit-th1.25_prop_th0.4_pen_th4.log
        ├── count
        │   ├── aggregate_counts.log
        │   ├── Control_1.log
        │   ├── Control_2.log
        │   ├── Control_3.log
        │   ├── Control_4.log
        │   ├── Control_5.log
        │   ├── Control_6.log
        │   ├── Test_1.log
        │   ├── Test_2.log
        │   ├── Test_3.log
        │   ├── Test_4.log
        │   ├── Test_5.log
        │   └── Test_6.log
        ├── create_fasta.log
        ├── cutadapt
        │   ├── Control_1.log
        │   ├── Control_2.log
        │   ├── Control_3.log
        │   ├── Control_4.log
        │   ├── Control_5.log
        │   ├── Control_6.log
        │   ├── Test_1.log
        │   ├── Test_2.log
        │   ├── Test_3.log
        │   ├── Test_4.log
        │   ├── Test_5.log
        │   └── Test_6.log
        ├── fastqc
        │   ├── Control_1.log
        │   ├── Control_2.log
        │   ├── Control_3.log
        │   ├── Control_4.log
        │   ├── Control_5.log
        │   ├── Control_6.log
        │   ├── Test_1.log
        │   ├── Test_2.log
        │   ├── Test_3.log
        │   ├── Test_4.log
        │   ├── Test_5.log
        │   └── Test_6.log
        ├── missed-rgrnas.log
        ├── multiqc
        │   └── multiqc.log
        ├── plot-alignment-rate.log
        ├── plot-coverage.log
        ├── plot_histograms
        │   └── hit-th1.25_prop_th0.4_pen_th4_Test_vs_Control.log
        ├── plot_psi
        │   ├── dotplot_hit-th1.25_prop_th0.4_pen_th4_Test_vs_Control.log
        │   └── hit-th1.25_prop_th0.4_pen_th4_Test_vs_Control.log
        └── snakemake
            └── 2025-07-16_11-49-52_snakemake.log

        10 directories, 48 files

z-score calculation log
--------------------------

A particularly informative log file is `logs/calculate_psi/Test_vs_Control/hit-th1.25_prop_th0.4_pen_th4.log`, which records the normalisation and filtering of the barcodes and the subsequent z-score calculation. This file is generated by the `calculate_psi` rule and contains the following information:

.. code-block:: text

    INFO:2025-07-16 11:50:29:Filtering data for Test vs Control
    INFO:2025-07-16 11:50:29: Barcodes present pre-filtering: 135057
    INFO:2025-07-16 11:50:29: Barcodes with no counts in any sample: 18557
    INFO:2025-07-16 11:50:29: Largest sample: Test_4 with 147375.0 reads
    INFO:2025-07-16 11:50:29: Barcodes with low counts in both reference and test condition: 97854
    INFO:2025-07-16 11:50:29: Barcodes with no counts for Test in any bin: 0
    INFO:2025-07-16 11:50:29: ORFs removed that have only one barcode after filtering: 0
    INFO:2025-07-16 11:50:29: ORFs removed with less than 2 barcodes after filtering: 0
    INFO:2025-07-16 11:50:29: Number of barcodes present post-filtering: 14559
    INFO:2025-07-16 11:50:29: Marking twin peaked barcodes in:
    INFO:2025-07-16 11:50:29: Test
    INFO:2025-07-16 11:50:33: Control
    INFO:2025-07-16 11:50:38: Barcodes marked as having twin peaks: 4062
    INFO:2025-07-16 11:50:38: ORFs removed with less than 2 barcodes after removing barcodes with twin peaks: 2632
    INFO:2025-07-16 11:50:38:Computing PSI values for Test vs Control
    INFO:2025-07-16 11:50:39:Calculating z-scores
    INFO:2025-07-16 11:50:39:Correcting z-scores for number of barcodes
    INFO:2025-07-16 11:50:39: Median number of good barcodes: 3.0
    INFO:2025-07-16 11:50:39:Correcting z-scores for intra ORF variability
    INFO:2025-07-16 11:50:39:Correcting z-scores for deltaPSI
    INFO:2025-07-16 11:50:39:Scaling z-scores
    INFO:2025-07-16 11:50:39:Calculating proportions of reads in bins
    INFO:2025-07-16 11:50:39:Writing barcode-level results to results/psi/hit-th1.25_prop_th0.4_pen_th4/Test_vs_Control_barcode.summary.csv
    INFO:2025-07-16 11:50:39:Calling hits
    INFO:2025-07-16 11:50:39: Number of stabilised ORFs in Test_vs_Control: 5
    INFO:2025-07-16 11:50:39: Number of destabilised ORFs in Test_vs_Control: 13
    INFO:2025-07-16 11:50:39:Ranking hits
    INFO:2025-07-16 11:50:39:Writing ranked results to results/psi/hit-th1.25_prop_th0.4_pen_th4/Test_vs_Control_gene.summary.csv
    INFO:2025-07-16 11:50:39:Done

Resources
==========

The `resources/` directory contains the following files and directory:

- `.csv`: a CSV file with the ORF metadata, including the ORF ID, gene name, and other relevant information (placed by the user).
- `.fasta`: a FASTA file with the ORF sequences (generated by GPSW).
- `bowtie2_index/`: a directory containing the Bowtie2 index files for the ORF sequences (generated by GPSW).

Results
==============

The output files of the workflow are stored in the `results/` directory. The structure of the output files is as follows:

.. dropdown:: Directory structure
    :icon: info
    :color: primary

    .. code-block:: text

        results/
        ├── count
        │   └── counts-aggregated.tsv
        ├── psi
        │   └── hit-th1.25_prop_th0.4_pen_th4
        │       ├── Test_vs_Control_barcode.summary.csv
        │       └── Test_vs_Control_gene.summary.csv
        ├── psi_plots
        │   └── hit-th1.25_prop_th0.4_pen_th4
        │       ├── Test_vs_Control
        │       │   ├── destabilised
        │       │   │   ├── AP2M1_IOH21478.pdf
        │       │   │   ├── C3orf36_D52948.pdf
        │       │   │   ├── C6orf201_U11005.pdf
        │       │   │   ├── CDK16_D4804.pdf
        │       │   │   ├── CXorf40B_IOH9866.pdf
        │       │   │   ├── EIF3I_IOH3628.pdf
        │       │   │   ├── GNAS_IOH39616.pdf
        │       │   │   ├── INHBE_U3932.pdf
        │       │   │   ├── MAPK8IP2_U13451.pdf
        │       │   │   ├── PRPH2_IOH61916.pdf
        │       │   │   ├── RPS18_IOH41520.pdf
        │       │   │   ├── SCAMP1_IOH12951.pdf
        │       │   │   └── UBL5_U5662.pdf
        │       │   └── stabilised
        │       │       ├── 0_U14469.pdf
        │       │       ├── APOA2_IOH7290.pdf
        │       │       ├── CHST9_IOH80001.pdf
        │       │       ├── SLC31A2_U13112.pdf
        │       │       └── XKR8_IOH14631.pdf
        │       ├── Test_vs_Control_dotplot.pdf
        │       ├── Test_vs_Control_dpsi_histogram.pdf
        │       ├── Test_vs_Control_dpsi_sd_histogram.pdf
        │       └── Test_vs_Control_psi_histogram.pdf
        ├── qc
        │   ├── alignment-rates.pdf
        │   ├── fastqc
        │   │   ├── Control_1_fastqc.zip
        │   │   ├── Control_1.html
        │   │   ├── Control_2_fastqc.zip
        │   │   ├── Control_2.html
        │   │   ├── Control_3_fastqc.zip
        │   │   ├── Control_3.html
        │   │   ├── Control_4_fastqc.zip
        │   │   ├── Control_4.html
        │   │   ├── Control_5_fastqc.zip
        │   │   ├── Control_5.html
        │   │   ├── Control_6_fastqc.zip
        │   │   ├── Control_6.html
        │   │   ├── Test_1_fastqc.zip
        │   │   ├── Test_1.html
        │   │   ├── Test_2_fastqc.zip
        │   │   ├── Test_2.html
        │   │   ├── Test_3_fastqc.zip
        │   │   ├── Test_3.html
        │   │   ├── Test_4_fastqc.zip
        │   │   ├── Test_4.html
        │   │   ├── Test_5_fastqc.zip
        │   │   ├── Test_5.html
        │   │   ├── Test_6_fastqc.zip
        │   │   └── Test_6.html
        │   ├── missed-barcodes.pdf
        │   ├── multiqc.html
        │   └── sequence-coverage.pdf
        └── trimmed
            ├── Control_1.qc.txt
            ├── Control_2.qc.txt
            ├── Control_3.qc.txt
            ├── Control_4.qc.txt
            ├── Control_5.qc.txt
            ├── Control_6.qc.txt
            ├── Test_1.qc.txt
            ├── Test_2.qc.txt
            ├── Test_3.qc.txt
            ├── Test_4.qc.txt
            ├── Test_5.qc.txt
            └── Test_6.qc.txt

        11 directories, 65 files

Count
--------------------------------------------------------------------------------

The `count` directory contains the aggregated, non-normalised counts of barcodes across all conditions and bins (`counts-aggregated.tsv`).

.. list-table::
    :header-rows: 1
    :widths: 25 10 10 3 3 3 3 3 3 3 3 3 3 3 3

    * - barcode_id
      - orf_id
      - gene
      - Control_1
      - Control_2
      - Control_3
      - Control_4
      - Control_5
      - Control_6
      - Test_1
      - Test_2
      - Test_3
      - Test_4
      - Test_5
      - Test_6
    * - 1_IOH10003_2802_PLD2
      - IOH10003
      - PLD2
      - 0
      - 2
      - 0
      - 11
      - 12
      - 0
      - 0
      - 5
      - 3
      - 11
      - 9
      - 0
    * - 2_IOH10003_2802_PLD2
      - IOH10003
      - PLD2
      - 1
      - 3
      - 1
      - 11
      - 12
      - 3
      - 0
      - 3
      - 6
      - 13
      - 7
      - 3
    * - 3_IOH10003_2802_PLD2
      - IOH10003
      - PLD2
      - 0
      - 29
      - 8
      - 51
      - 126
      - 66
      - 17
      - 7
      - 36
      - 0
      - 12
      - 0

PSI
--------------------------------------------------------------------------------

For each combination of hit threshold, proportion threshold and penalty factor, the `psi` directory contains the following files:

- ``Test_vs_Control_barcode.summary.csv``: a CSV file with barcode-level results.

  .. list-table::
     :header-rows: 1

     * - barcode_id
       - orf_id
       - gene
       - Control_1
       - Control_2
       - Control_3
       - Control_4
       - Control_5
       - Control_6
       - Test_1
       - Test_2
       - Test_3
       - Test_4
       - Test_5
       - Test_6
       - SOB_Control
       - SOB_Test
       - num_barcodes
       - twin_peaks
       - good_barcodes
       - PSI_Control
       - PSI_Test
       - PSI_Control_mean
       - PSI_Test_mean
       - deltaPSI
       - delta_PSI_mean
       - delta_PSI_SD
       - z_score
       - z_score_corr
     * - 18_IOH10009_315_C9orf80
       - IOH10009
       - C9orf80
       - 0.0
       - 0.047
       - 0.047
       - 0.0
       - 0.428
       - 0.476
       - 0.034
       - 0.0
       - 0.0
       - 0.103
       - 0.551
       - 0.310
       - 21.0
       - 29.0
       - 4
       - False
       - 3
       - 5.238
       - 5.068
       - 5.184
       - 5.116
       - -0.169
       - -0.068
       - 0.311
       - 0.366
       - 1.010
     * - 19_IOH10009_315_C9orf80
       - IOH10009
       - C9orf80
       - 0.0
       - 0.066
       - 0.133
       - 0.0
       - 0.267
       - 0.533
       - 0.0
       - 0.043
       - 0.0
       - 0.043
       - 0.391
       - 0.521
       - 15.0
       - 23.0
       - 4
       - False
       - 3
       - 5.066
       - 5.347
       - 5.184
       - 5.116
       - 0.281
       - -0.068
       - 0.311
       - 0.366
       - 1.010
     * - 20_IOH10009_315_C9orf80
       - IOH10009
       - C9orf80
       - 0.045
       - 0.090
       - 0.0
       - 0.272
       - 0.272
       - 0.318
       - 0.0
       - 0.0
       - 0.05
       - 0.2
       - 0.65
       - 0.1
       - 22.0
       - 20.0
       - 4
       - True
       - 3
       - NA
       - NA
       - 5.184
       - 5.116
       - NA
       - -0.068
       - 0.311
       - 0.366
       - 1.010

- ``Test_vs_Control_gene.summary.csv``: a CSV file with the gene-level results. Among other things, this file contains the z-score for each gene, whether the gene is stabilised or destabilised in the test condition compared to the control condition, and an associated ranking.

  .. list-table::
     :header-rows: 1

     * - orf_id
       - gene
       - delta_PSI_mean
       - good_barcodes
       - stabilised
       - destabilised
       - z_score_corr
       - stabilised_rank
       - destabilised_rank
     * - IOH10176
       - TYROBP
       - 1.637
       - 3
       - True
       - False
       - 1.3896
       - 31
       - NA
     * - IOH10333
       - C10orf54
       - 1.647
       - 3
       - True
       - False
       - 1.572
       - 22
       - NA
     * - IOH11069
       - UBD
       - -1.882
       - 2
       - False
       - True
       - -3.717
       - NA
       - 1

PSI Plots
--------------------------------------------------------------------------------

For each combination of hit threshold, proportion threshold and penalty factor, the `psi_plots` directory contains the following files and subdirectories:

- `Test_vs_Control`: contains the following subdirectories:

  * `destabilised`: contains PDF files with the barcode profiles for each destabilised gene in the test condition.
  * `stabilised`: contains PDF files with the barcode profiles for each stabilised gene in the test condition.

  Example of barcode profile:

  .. figure:: images/profile.png
     :alt: Barcode profile for a stabilised gene in the test condition

     Barcode profile for a stabilised gene in the test condition.

- `Test_vs_Control_dotplot.pdf`: a PDF file with a dot plot of the z-scores for each gene in the test condition compared to the control condition.

  .. figure:: images/dotplot.png
     :alt: Dot plot of z-scores for each gene in the test condition compared to the control condition

     Dot plot of z-scores for each gene in the test condition compared to the control condition.

  .. note::

     Proteins whose :math:`|dPSI_i|` is smaller than the mean :math:`|dPSI_i|` are omitted from the plots to avoid visual clutter around the origin.

- `Test_vs_Control_psi_histogram.pdf`: a PDF file with a histogram of the :math:`\Psi_i` values for all genes.

  .. figure:: images/psi_histogram.png
     :alt: Histogram of PSI values

     Histogram of PSI values.

- `Test_vs_Control_dpsi_histogram.pdf`: a PDF file with a histogram of the :math:`dPSI_i` values for all genes.

  .. figure:: images/dpsi_histogram.png
     :alt: Histogram of delta PSI values

     Histogram of delta PSI values.

- `Test_vs_Control_dpsi_sd_histogram.pdf`: a PDF file with a histogram of the :math:`dPSI_i` SD values for all genes.

  .. figure:: images/sd_histogram.png
     :alt: Histogram of delta PSI SD values

     Histogram of delta PSI SD values.

- `Test_vs_Control_sob_histogram.pdf`: a PDF file with a histogram of the sum of barcodes (SOB) values for all genes per condition.

  .. figure:: images/sob_histogram.png
     :alt: Histogram of SOB values

     Histogram of SOB values. Only SOB values up to the 99.9th percentile are shown in the histogram.

QC
--------------------------------------------------------------------------------

Alignment rates of individual samples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`Bowtie2` alignment rates for each sample are visualised in the `alignment-rates.pdf` file.

.. figure:: images/alignment-rates.png
    :alt: Alignment rates of individual samples

    Alignment rates of individual samples.

Missed barcodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The `missed-barcodes.pdf` file contains a plot of the number of barcodes that were not detected in each bin for each condition. This is useful to identify bins with low coverage or issues with barcode detection.

.. figure:: images/missed-barcodes.png
    :alt: Missed barcodes

    Missed barcodes.

Sequence coverage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The `sequence-coverage.pdf` file contains a plot of the sequence coverage across all bins for each condition. This is useful to identify bins with low coverage or issues with barcode detection.

.. figure:: images/sequence-coverage.png
    :alt: Sequence coverage

    Sequence coverage.

The sequence coverage is calculated by dividing the number of sequencing reads in each bin by the total number of barcodes in the ORF library.

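
The coverage calculation described above can be sketched in a few lines of Python. This is only an illustration, not the code GPSW runs: the helper name ``sequence_coverage`` is hypothetical, the summed barcode counts of a sample column stand in for the number of sequencing reads in that bin, and the number of rows in the count table stands in for the total number of barcodes in the ORF library. The three rows are copied from the `counts-aggregated.tsv` example shown earlier.

```python
import csv
import io

# Header and rows copied from the counts-aggregated.tsv example in this document.
SAMPLES = [f"Control_{i}" for i in range(1, 7)] + [f"Test_{i}" for i in range(1, 7)]
HEADER = ["barcode_id", "orf_id", "gene"] + SAMPLES
ROWS = [
    ["1_IOH10003_2802_PLD2", "IOH10003", "PLD2", 0, 2, 0, 11, 12, 0, 0, 5, 3, 11, 9, 0],
    ["2_IOH10003_2802_PLD2", "IOH10003", "PLD2", 1, 3, 1, 11, 12, 3, 0, 3, 6, 13, 7, 3],
    ["3_IOH10003_2802_PLD2", "IOH10003", "PLD2", 0, 29, 8, 51, 126, 66, 17, 7, 36, 0, 12, 0],
]
EXAMPLE_TSV = "\n".join("\t".join(str(v) for v in row) for row in [HEADER] + ROWS)


def sequence_coverage(tsv_text, meta_columns=("barcode_id", "orf_id", "gene")):
    """Return {sample: summed counts / number of barcodes} (hypothetical helper).

    Assumption: every column not listed in ``meta_columns`` is one bin of one
    condition, and every row is one barcode of the library.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    sample_columns = [c for c in reader.fieldnames if c not in meta_columns]
    totals = dict.fromkeys(sample_columns, 0)
    n_barcodes = 0
    for row in reader:
        n_barcodes += 1
        for column in sample_columns:
            totals[column] += int(row[column])
    if n_barcodes == 0:
        raise ValueError("no barcodes found in the count table")
    return {column: total / n_barcodes for column, total in totals.items()}


coverage = sequence_coverage(EXAMPLE_TSV)
for sample, value in coverage.items():
    print(f"{sample}\t{value:.2f}")
```

With a real library the denominator would be the full number of barcodes in the ORF metadata file rather than the three rows used here, so the values printed by this sketch are not meaningful coverage estimates.
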
MultiQC report
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The `multiqc.html` file contains a summary of the quality control metrics for the trimmed reads.

.. figure:: images/multiqc.png
    :alt: MultiQC report

    MultiQC report.

Trimmed
--------------------------------------------------------------------------------

The `trimmed` directory contains the quality control files for each sample after trimming with `cutadapt`. These files contain information about the number of reads before and after trimming, the number of reads that were discarded, and the number of reads that were kept.

.. dropdown:: Example of a trimmed sample quality control file
    :icon: info
    :color: primary

    .. code-block:: text

        This is cutadapt 4.9 with Python 3.12.10
        Command line parameters: --cores 4 -g CCAGTAGGTCCACTATGAGT -l 20 -q 20 --discard-untrimmed -o results/trimmed/Control_1.fastq.gz reads/Control_1.fastq.gz
        Processing single-end reads on 4 cores ...
        Finished in 0.842 s (4.809 µs/read; 12.48 M reads/minute).

        === Summary ===

        Total reads processed:                 175,000
        Reads with adapters:                   174,492 (99.7%)
        Reads written (passing filters):       174,492 (99.7%)

        Total basepairs processed:    26,250,000 bp
        Quality-trimmed:                  27,953 bp (0.1%)
        Total written (filtered):      3,489,639 bp (13.3%)

        === Adapter 1 ===

        Sequence: CCAGTAGGTCCACTATGAGT; Type: regular 5'; Length: 20; Trimmed: 174492 times

        Minimum overlap: 3
        No. of allowed errors:
        1-9 bp: 0; 10-19 bp: 1; 20 bp: 2

        Overview of removed sequences
        length  count   expect  max.err error counts
        3       133     2734.4  0       133
        4       1       683.6   0       1
        18      3       0.0     1       2 1
        19      47      0.0     1       6 38 3
        20      19984   0.0     2       17434 2236 314
        21      19934   0.0     2       17471 2165 298
        22      24169   0.0     2       21274 2541 354
        23      25389   0.0     2       22189 2779 421
        24      23155   0.0     2       20282 2529 344
        25      24034   0.0     2       21051 2524 459
        26      20808   0.0     2       18067 2310 431
        27      16792   0.0     2       14576 1895 321
        28      34      0.0     2       13 18 3
        31      1       0.0     2       1
        32      1       0.0     2       0 0 1
        33      1       0.0     2       1
        36      1       0.0     2       1
        37      1       0.0     2       1
        38      1       0.0     2       1
        40      1       0.0     2       1
        47      1       0.0     2       1
        57      1       0.0     2       0 1

Output with multiple test conditions
======================================

When running the workflow with multiple test conditions, additional files will be created in *results/*.

PCA plot
--------------------------------------------------------------------------------

The `qc/pca_plot.pdf` file contains a PCA plot of the ORF counts for all conditions. This plot is useful to visualise the overall distribution of ORF counts across different conditions and to identify potential outliers.

.. figure:: images/pca.png
    :alt: PCA plot

    PCA plot of the ORF counts for all conditions.

Heatmap of :math:`dPSI_i` values of all comparisons
---------------------------------------------------------------------------------

The `psi_plots` directory contains PDF/CSV files with the heatmap data of :math:`dPSI_i` values for each ORF called as a hit in any of the comparisons.

.. figure:: images/heatmap.png
    :alt: Heatmap of DeltaPSI values

    Heatmap of :math:`dPSI_i` values for all comparisons.

.. note::

    As the clustering algorithm does not allow missing data (some genes are not found in all comparisons), missing data is replaced with :math:`dPSI_i = 0`.
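
The zero-filling mentioned in the note can be illustrated with a minimal Python sketch. It is not the workflow's own implementation: the function name ``build_heatmap_matrix`` and the second comparison ``Other_vs_Control`` with its values are made up for illustration, while the ``Test_vs_Control`` values are taken from the gene-level example table above.

```python
def build_heatmap_matrix(dpsi_by_comparison):
    """Return (comparisons, {orf_id: [dPSI per comparison]}).

    Hypothetical helper: ORFs missing from a comparison get dPSI = 0 so that
    the matrix has no gaps and can be passed to a clustering routine.
    """
    comparisons = sorted(dpsi_by_comparison)
    # Union of all ORFs seen in at least one comparison.
    orf_ids = sorted(set().union(*(d.keys() for d in dpsi_by_comparison.values())))
    matrix = {
        orf_id: [dpsi_by_comparison[c].get(orf_id, 0.0) for c in comparisons]
        for orf_id in orf_ids
    }
    return comparisons, matrix


dpsi = {
    # delta_PSI_mean values from the gene-level example table.
    "Test_vs_Control": {"IOH10176": 1.637, "IOH11069": -1.882},
    # Invented comparison and values, for illustration only.
    "Other_vs_Control": {"IOH10176": 0.9, "IOH10333": 1.2},
}

comparisons, matrix = build_heatmap_matrix(dpsi)
print("orf_id", *comparisons, sep="\t")
for orf_id, values in matrix.items():
    print(orf_id, *values, sep="\t")
```

A consequence of this design is that a value of 0 in the heatmap can mean either "no change" or "not measured in this comparison"; the CSV file alone does not distinguish the two.
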