Gensub in Snakemake's shell command

Elysee

# md5sum on fastq folder on cluster
rule md5sum_fastq_cluster:
    input:
        path_cluster+'/'+project_name+'/'+project_name+'.csv'
    output:
        path_cluster+'/'+project_name+'/'+'md5sum.txt'
    shell:
        """find {path_cluster}/{project_name} -type f -name "*.fastq.gz" -exec md5sum {{}} + | awk '{{print $1, gensub( ".*/", "", $2 )}}' | sort > {output}"""


# md5sum on fastq folder on remote server
rule md5sum_fastq_SAN:
    input:
        copyFASTQdone
    output:
        SFTPsan.remote(server_san+path_san+'/'+project_name+'/md5sum.txt')
    shell:
        """ssh imrb@{server_san} "find {path_san}/{project_name} -type f -name '*.fastq.gz' -exec md5sum {{}} + | awk '{{print \$1, gensub( ".*/", "", \$2 )}}' | sort" > {output}"""

--------------------------------------------------------------------------
awk: cmd. line:1: {print $1, gensub( .*/, , $2 )}
awk: cmd. line:1:                    ^ syntax error
awk: cmd. line:1: {print $1, gensub( .*/, , $2 )}

Apparently my gensub syntax is wrong.
Before I added gensub, the two rules had these two shell commands:

"""find {path_cluster}/{project_name} -type f -name "*.fastq.gz" -exec md5sum {{}} + | awk '{{print $1}}' | sort > {output}"""

"""ssh imrb@{server_san} "find {path_san}/{project_name} -type f -name '*.fastq.gz' -exec md5sum {{}} + | awk '{{print \$1}}' | sort" > {output}"""

Both of them work. I just can't find the correct syntax now that I have added gensub.
I need gensub to do essentially what basename does: delete the directory part of the file path.
I did test the awk/gensub command outside Snakemake, and it works there.
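The gensub call is meant to act like basename on the path in $2. The Python sketch below checks that on the sample md5sum.txt lines from this question, with Python's re and os.path standing in for awk (the function names intended and as_written are hypothetical). It also shows a subtlety: gawk's signature is gensub(regexp, replacement, how [, target]), so in gensub(".*/", "", $2) the path is passed as how and the target defaults to the whole record $0. Because .*/ is greedy, the match then runs from the start of the line (hash and space included) to the last slash, and print $1, ... still prints the hash followed by the file name. Writing gensub(".*/", "", "g", $2) would state the intent more directly.

```python
import os
import re

# Lines from the "md5sum.txt before gensub" listing in the question.
BEFORE = [
    "01afd3f2bf06d18c5609b2c2c963eddf /data/imrb/Data/200122_GSC/14-CTRL50TMZ1907192_S11_R2_001.fastq.gz",
    "03e353c316aef09c748aa2363db95599 /data/imrb/Data/200122_GSC/15-11650TMZ1907192_S12_R2_001.fastq.gz",
    "1ba21b8be882bcb62c464ba515800ca4 /data/imrb/Data/200122_GSC/1-CTRL120719_S1_R2_001.fastq.gz",
]


def intended(line: str) -> str:
    """print $1, basename($2): strip the directory part from the path field."""
    md5, path = line.split()  # awk's $1 and $2 (these paths contain no spaces)
    return f"{md5} {os.path.basename(path)}"


def as_written(line: str) -> str:
    """Model of print $1, gensub(".*/", "", $2): $2 is `how`, the target is $0."""
    md5 = line.split()[0]
    # The greedy .*/ can match only once per line: from the start to the last "/".
    return f"{md5} {re.sub(r'.*/', '', line)}"


for line in BEFORE:
    # Both readings give the same result on this data.
    assert intended(line) == as_written(line)
    print(intended(line))
```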

Just in case, here is the file my rule generates:

# md5sum.txt before gensub
01afd3f2bf06d18c5609b2c2c963eddf /data/imrb/Data/200122_GSC/14-CTRL50TMZ1907192_S11_R2_001.fastq.gz
03e353c316aef09c748aa2363db95599 /data/imrb/Data/200122_GSC/15-11650TMZ1907192_S12_R2_001.fastq.gz
1ba21b8be882bcb62c464ba515800ca4 /data/imrb/Data/200122_GSC/1-CTRL120719_S1_R2_001.fastq.gz

# md5sum.txt after gensub
01afd3f2bf06d18c5609b2c2c963eddf 14-CTRL50TMZ1907192_S11_R2_001.fastq.gz
03e353c316aef09c748aa2363db95599 15-11650TMZ1907192_S12_R2_001.fastq.gz
1ba21b8be882bcb62c464ba515800ca4 1-CTRL120719_S1_R2_001.fastq.gz
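When the quoting gets this fragile, the same "after gensub" listing can also be produced without a shell pipeline. The sketch below is a hypothetical alternative, not what the thread ends up using: it assumes the FASTQ folder is readable from wherever Python runs (for the SAN rule, that would mean running it on the remote server), and md5_of and md5_listing are illustrative names. Path.rglob roughly mirrors find -name "*.fastq.gz", and Path.name plays the role of basename.

```python
import hashlib
import sys
from pathlib import Path
from typing import List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large FASTQ files never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_listing(folder: str) -> List[str]:
    """'<md5> <file name>' for every *.fastq.gz below folder, sorted by code point (like LC_ALL=C sort)."""
    fastqs = (p for p in Path(folder).rglob("*.fastq.gz") if p.is_file())
    # p.name is already the basename, so no gensub/regex step is needed.
    return sorted(f"{md5_of(p)} {p.name}" for p in fastqs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python md5_listing.py /path/to/fastq_folder > md5sum.txt
    print("\n".join(md5_listing(sys.argv[1])))
```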

Elysee

Thanks to dariober, I found the correct syntax for each rule.

For the first rule, I needed to escape the double quotes used in the awk program:

rule md5sum_fastq_cluster:
    input:
        path_cluster+'/'+project_name+'/'+project_name+'.csv'
    output:
        path_cluster+'/'+project_name+'/'+'md5sum.txt'
    shell:
        """find {path_cluster}/{project_name} -type f -name "*.fastq.gz" -exec md5sum {{}} + | awk '{{print $1, gensub( \".*/\", \"\", $2 )}}' | sort > {output}"""
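Every shell command in this thread writes literal braces twice ({{}} for find -exec, {{print ...}} for awk). Snakemake fills the {placeholders} of a shell command using Python's format-string syntax, so a single { would be read as the start of a placeholder. The sketch below uses plain str.format as a stand-in for Snakemake's substitution, applied to the pre-gensub cluster command from the question; the values given for path_cluster, project_name and output are hypothetical.

```python
# Pre-gensub cluster command from the question, as written in the Snakefile.
TEMPLATE = """find {path_cluster}/{project_name} -type f -name "*.fastq.gz" -exec md5sum {{}} + | awk '{{print $1}}' | sort > {output}"""

# Hypothetical values; Snakemake would take these from the Snakefile variables and the rule's output.
rendered = TEMPLATE.format(
    path_cluster="/path/to/cluster",
    project_name="my_project",
    output="md5sum.txt",
)
# {{}} became {} for find -exec, and {{print $1}} became {print $1} for awk.
print(rendered)

# With single braces, format() treats {print $1} as a placeholder named "print $1".
try:
    "awk '{print $1}' > {output}".format(output="md5sum.txt")
except KeyError as missing:
    print("unresolved placeholder:", missing)
```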

For the second rule, whose command is passed to ssh inside double quotes, I needed to escape the double quotes twice (\\") and to escape $1 and $2 with a backslash (\$1, \$2):

rule md5sum_fastq_SAN:
    input:
        copyFASTQdone
    output:
        SFTPsan.remote(server_san+path_san+'/'+project_name+'/md5sum.txt')
    shell:
        """ssh imrb@{server_san} "find {path_san}/{project_name} -type f -name '*.fastq.gz' -exec md5sum {{}} + | awk '{{print \$1, gensub( \\".*/\\", \\"\\", \$2 )}}' | sort" > {output}"""
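The double escaping in the SAN rule can be traced layer by layer. In the sketch below, the host, paths and output name are hypothetical, and shell_dquote_body is a simplified, hypothetical model of how bash reads a double-quoted word (not a full shell parser). Python consumes the first backslash of each \\", the local shell consumes the remaining one, and the remote shell finally sees plain double quotes inside the single-quoted awk program; the \$ likewise stops the local shell from expanding $1 and $2. This also explains the error in the question: there, the unescaped " before .*/ ends the ssh argument early, so the remote awk receives gensub( .*/, , $2 ) with the quotes gone.

```python
def shell_dquote_body(body: str) -> str:
    """Simplified model of how bash reads the inside of a double-quoted word:
    a backslash before $, `, " or \\ is removed, backslash-newline vanishes,
    and any other backslash stays. Parameter and command expansion are NOT
    modelled, which is fine here because every $ in the command is escaped."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == "\\" and nxt in ("$", "`", '"', "\\"):
            out.append(nxt)  # escaped special character: keep it, drop the backslash
            i += 2
        elif ch == "\\" and nxt == "\n":
            i += 2  # line continuation
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Layer 1: the Snakefile literal. Python turns \\" into \" and \\$ into \$.
# (The Snakefile writes \$ with a single backslash; Python keeps that unknown
# escape as \$, with a warning in recent versions, so the string is the same.)
TEMPLATE = """ssh imrb@{server_san} "find {path_san}/{project_name} -type f -name '*.fastq.gz' -exec md5sum {{}} + | awk '{{print \\$1, gensub( \\".*/\\", \\"\\", \\$2 )}}' | sort" > {output}"""

# Layer 2: Snakemake-style placeholder substitution (hypothetical values).
local_cmd = TEMPLATE.format(
    server_san="san.example.org",
    path_san="/path/to/san",
    project_name="my_project",
    output="md5sum.txt",
)

# Layer 3: the local bash hands the double-quoted word to ssh, undoing \" and \$.
start = local_cmd.index('"') + 1
end = local_cmd.rindex('" >')
remote_cmd = shell_dquote_body(local_cmd[start:end])

# Layer 4: this is what the remote shell runs; the awk program is in single
# quotes there, so ".*/" and "" reach gensub unchanged.
print(remote_cmd)
```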