BioMysteryBench problems (99)

Total: 99 · human-solvable: 76 · human-difficult: 23

Each entry has the question, allowed network domains, data files staged into the agent's work_dir, and the answer rubric. Rubrics are spoilers — do not show to a solving agent.

hb001 — human-solvable

Question

Which human organ is this single-cell RNA-seq dataset derived from?

Allowed network domains: conda.anaconda.org, repo.anaconda.com, ncbi.nlm.nih.gov, ftp.ncbi.nlm.nih.gov, ensembl.org, ftp.ensembl.org, hgdownload.soe.ucsc.edu, uniprot.org, bioconductor.org, pypi.org, bioconda.github.io, cran.r-project.org, cran.rstudio.com, ftp.ebi.ac.uk

Data files (staged into the agent's work_dir from hb001.zip · download zip):

cells.tsv.gz — 76.4 KB
counts.mtx.gz — 214.4 MB
genes.tsv.gz — 184.6 KB

Total: 3 file(s), 214.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Answer is the lung. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb002 — human-solvable

Question

What bacterium is found in this sequenced dataset? Provide the bacterium's scientific name as the answer.

Data files (staged into the agent's work_dir from hb002.zip · download zip):

anonymized_genome_1_.fasta — 4.2 MB

Total: 1 file(s), 4.2 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Bacillus licheniformis. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb003 — human-solvable

Question

Name one gene that was knocked out in the experimental sample compared to the control samples. Provide the gene symbol.

Data files (staged into the agent's work_dir from hb003.zip · download zip):

gene_info.gz — 1.29 GB
norm_counts_TPM.tsv — 1.1 MB
raw_counts.tsv — 845.0 KB

Total: 3 file(s), 1.29 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Accept any one of the knocked-out genes: ITGAV. The answer must name one gene from this list (gene symbol). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb004 — human-solvable

Question

Which eukaryotic host organism was this metagenomic sample taken from? Provide the scientific species name (e.g., Homo sapiens).

Data files (staged into the agent's work_dir from hb004.zip · download zip):

reads_1.fastq — 1.20 GB
reads_2.fastq — 1.20 GB

Total: 2 file(s), 2.39 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Homo sapiens (human). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb006 — human-difficult

Question

In this human intestinal biopsy RNA-seq dataset generated using Illumina HiSeq 2000 platform with strand-specific mRNA library preparation, which samples were collected from patients diagnosed with ulcerative colitis? Provide your answer as a list of sample identifiers (e.g., sample_1, sample_2, etc.).

Data files (staged into the agent's work_dir from hb006.zip · download zip):

data2.csv — 36.5 MB
processed_disease_data.csv — 4.9 KB

Total: 2 file(s), 36.5 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer mentions all of the following samples were collected from patients with ulcerative colitis: Sample_1, Sample_2, Sample_3, Sample_4, Sample_5, Sample_6, Sample_7, Sample_8, Sample_9, Sample_10, Sample_11, Sample_12, Sample_13, Sample_14, Sample_15, Sample_16, Sample_17, Sample_18, Sample_19, Sample_20, Sample_21, Sample_22, Sample_23, Sample_24, Sample_25, Sample_26, Sample_27, Sample_28, Sample_29, Sample_30, Sample_31, Sample_32, Sample_33, Sample_34, Sample_35, Sample_36, Sample_37, Sample_38, Sample_39, Sample_40, Sample_41, Sample_42, Sample_43, Sample_44, Sample_45, Sample_46, Sample_47, Sample_48, Sample_49, Sample_50, Sample_51, Sample_52, Sample_53, Sample_54, Sample_55, Sample_56, Sample_57, Sample_58, Sample_59, Sample_60, Sample_61, Sample_62, Sample_63, Sample_64, Sample_65, Sample_66, Sample_67, Sample_68, Sample_69, Sample_70, Sample_71, Sample_72, Sample_73, Sample_74. Must get at least 95% correct. Score 1.0 if at least 95% correct, 0.0 otherwise. No partial credit.

hb008 — human-solvable

Question

What mouse reference genome were the RNA-seq reads aligned to? Provide the genome assembly name (e.g., UCSC-style name such as mm10 or GRC name such as GRCm38).

Data files (staged into the agent's work_dir from hb008.zip · download zip):

archive.tar.gz.part-aa — 500.0 MB
archive.tar.gz.part-ab — 500.0 MB
archive.tar.gz.part-ac — 500.0 MB
archive.tar.gz.part-ad — 500.0 MB
archive.tar.gz.part-ae — 500.0 MB
archive.tar.gz.part-af — 199.2 MB

Total: 6 file(s), 2.64 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. mm10 (aka GRCm38). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
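Several entries, including this one, stage their data as archive.tar.gz.part-aa, archive.tar.gz.part-ab, and so on. The sketch below shows one way to put such pieces back together. It assumes the parts are plain byte-wise chunks of a single gzip-compressed tar file (as a tool like split would produce) and that lexicographic order of the suffixes is the original order; neither is stated in this listing. The function names reassemble and list_members are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble(work_dir, prefix="archive.tar.gz.part-"):
    """Concatenate the split parts in lexicographic order into one archive.

    Assumption: the parts are raw byte chunks of one .tar.gz file.
    """
    work = Path(work_dir)
    parts = sorted(work.glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {work}")
    # "archive.tar.gz.part-" -> "archive.tar.gz"
    target = work / prefix.removesuffix(".part-")
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return target


def list_members(archive):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

For entries that keep parts in subdirectories (for example cotton_r1/ and cotton_r2/ in hb045), the same function would be called once per directory.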

hb009 — human-solvable

Question

What genetic disease does Sample_02 have? Provide the common name of the genetic disease.

Data files (staged into the agent's work_dir from hb009.zip · download zip):

counts_tpm-2019-12-06.csv — 9.4 MB

Total: 1 file(s), 9.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Sample 2 has Fragile X. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb010 — human-difficult

Question

What amplicon sequencing primer(s) was/were used to characterize the microbial community of this dataset? List the target marker gene region(s) (e.g., 16S, 18S, ITS) used for amplicon sequencing.

Data files (staged into the agent's work_dir from hb010.zip · download zip):

hb010-sequences.fasta — 7.2 KB

Total: 1 file(s), 7.2 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Answer mentions all of the following: 16S, 18S, ITS. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb011 — human-solvable

Question

This human RNA‑seq dataset has an unknown sequencing setup. Using only the raw FASTQ file(s), determine: (1) the read length (in base pairs); (2) whether the data are single-end or paired-end; (3) the sequencing depth (number of reads, or read pairs if paired-end). Provide your answer in the following format (exactly): read length (integer in bp), configuration (e.g. single) and depth (in millions, rounded to 2 decimal places, e.g. 15.50M). For a paired-end configuration, provide the two read lengths as follows: R1 (READ LENGTH bp)/R2 (READ LENGTH bp). Format your answer exactly as: 'R1 (XXXbp)/R2 (XXXbp), paired, XX.XXM' for paired-end or 'XXX bp, single, XX.XXM' for single-end.

Data files (staged into the agent's work_dir from hb011.zip · download zip):

hb011.zip — 427.7 MB

Total: 1 file(s), 427.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is read length: R1 (110bp), R2 (112bp); configuration: paired; depth: 22.73M. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
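The measurements and the answer format requested in the hb011 question can be sketched as below. This is an illustration, not the benchmark's reference solution: fastq_stats and format_answer are hypothetical names, the code assumes standard four-line FASTQ records (no wrapped sequences), and it reports the most common read length, which may not be the right summary if the reads were trimmed.

```python
import gzip
from collections import Counter


def fastq_stats(path):
    """Return (most common read length, number of reads) for one FASTQ file.

    Assumption: every record spans exactly four lines.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                lengths[len(line.strip())] += 1
    if not lengths:
        raise ValueError(f"no reads found in {path}")
    return lengths.most_common(1)[0][0], sum(lengths.values())


def format_answer(r1_len, reads, r2_len=None):
    """Format the result in the style the question asks for."""
    depth = f"{reads / 1e6:.2f}M"  # depth in millions, two decimal places
    if r2_len is None:
        return f"{r1_len} bp, single, {depth}"
    return f"R1 ({r1_len}bp)/R2 ({r2_len}bp), paired, {depth}"
```

For paired-end data, fastq_stats would be run on each mate file and the read count of either file used as the number of read pairs.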

hb012 — human-solvable

Question

What sample is the mother of sample X? What sample is the father? Answer should be comma separated: mother comes first, father comes second.

Data files (staged into the agent's work_dir from hb012.zip · download zip):

Sample_01.vcf.gz — 505.1 KB
Sample_01.vcf.gz.tbi — 95.4 KB
Sample_02.vcf.gz — 526.4 KB
Sample_02.vcf.gz.tbi — 96.7 KB
Sample_03.vcf.gz — 547.9 KB
Sample_03.vcf.gz.tbi — 97.0 KB
Sample_04.vcf.gz — 577.3 KB
Sample_04.vcf.gz.tbi — 95.4 KB
Sample_05.vcf.gz — 509.8 KB
Sample_05.vcf.gz.tbi — 96.5 KB
Sample_06.vcf.gz — 603.8 KB
Sample_06.vcf.gz.tbi — 96.2 KB
Sample_07.vcf.gz — 573.2 KB
Sample_07.vcf.gz.tbi — 95.8 KB
Sample_08.vcf.gz — 510.6 KB
Sample_08.vcf.gz.tbi — 96.5 KB
Sample_09.vcf.gz — 514.9 KB
Sample_09.vcf.gz.tbi — 96.8 KB
Sample_10.vcf.gz — 581.3 KB
Sample_10.vcf.gz.tbi — 95.8 KB
Sample_11.vcf.gz — 561.1 KB
Sample_11.vcf.gz.tbi — 97.1 KB
Sample_12.vcf.gz — 546.7 KB
Sample_12.vcf.gz.tbi — 97.2 KB
Sample_13.vcf.gz — 529.0 KB
Sample_13.vcf.gz.tbi — 96.6 KB
Sample_14.vcf.gz — 587.2 KB
Sample_14.vcf.gz.tbi — 96.0 KB
Sample_15.vcf.gz — 543.5 KB
Sample_15.vcf.gz.tbi — 96.8 KB
Sample_16.vcf.gz — 588.3 KB
Sample_16.vcf.gz.tbi — 96.0 KB
Sample_17.vcf.gz — 541.1 KB
Sample_17.vcf.gz.tbi — 96.9 KB
Sample_18.vcf.gz — 531.3 KB
Sample_18.vcf.gz.tbi — 95.0 KB
Sample_19.vcf.gz — 537.5 KB
Sample_19.vcf.gz.tbi — 96.9 KB
Sample_20.vcf.gz — 530.7 KB
Sample_20.vcf.gz.tbi — 94.8 KB
Sample_21.vcf.gz — 541.9 KB
Sample_21.vcf.gz.tbi — 97.2 KB
Sample_22.vcf.gz — 603.6 KB
Sample_22.vcf.gz.tbi — 96.6 KB
Sample_23.vcf.gz — 530.2 KB
Sample_23.vcf.gz.tbi — 96.8 KB
Sample_24.vcf.gz — 532.0 KB
Sample_24.vcf.gz.tbi — 96.6 KB
Sample_25.vcf.gz — 546.4 KB
Sample_25.vcf.gz.tbi — 97.0 KB
Sample_26.vcf.gz — 583.9 KB
Sample_26.vcf.gz.tbi — 95.4 KB
Sample_27.vcf.gz — 575.1 KB
Sample_27.vcf.gz.tbi — 95.3 KB
Sample_28.vcf.gz — 596.4 KB
Sample_28.vcf.gz.tbi — 96.3 KB
Sample_29.vcf.gz — 549.5 KB
Sample_29.vcf.gz.tbi — 97.0 KB
Sample_30.vcf.gz — 575.4 KB
Sample_30.vcf.gz.tbi — 95.2 KB
Sample_31.vcf.gz — 574.3 KB
Sample_31.vcf.gz.tbi — 95.8 KB
Sample_32.vcf.gz — 574.9 KB
Sample_32.vcf.gz.tbi — 95.5 KB
Sample_33.vcf.gz — 583.4 KB
Sample_33.vcf.gz.tbi — 95.8 KB
Sample_34.vcf.gz — 545.2 KB
Sample_34.vcf.gz.tbi — 97.2 KB
Sample_35.vcf.gz — 535.9 KB
Sample_35.vcf.gz.tbi — 94.9 KB
Sample_X.vcf.gz — 514.1 KB
Sample_X.vcf.gz.tbi — 95.8 KB

Total: 72 file(s), 22.8 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is "Sample_13, Sample_05". Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb013 — human-solvable

Question

Using HGVS nomenclature, what is the name of the human gene being evaluated in this data set? Provide the official gene symbol.

Data files (staged into the agent's work_dir from hb013.zip · download zip):

Dataset1.csv — 25.7 KB

Total: 1 file(s), 25.7 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is SLC6A1. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb014 — human-difficult

Question

What is the mitochondrial haplogroup of this patient/sample?

Data files (staged into the agent's work_dir from hb014.zip · download zip):

read_1.fastq.gz — 3.6 MB
read_2.fastq.gz — 3.9 MB
reference_anonymized.fasta — 16.7 KB

Total: 3 file(s), 7.5 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is mitochondrial haplogroup: L0. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb015 — human-solvable

Question

Which sample is SAMPLE_02's offspring?

Data files (staged into the agent's work_dir from hb015.zip · download zip):

cleaned_genotypes.vcf.gz — 62.9 MB

Total: 1 file(s), 62.9 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is SAMPLE_03. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb016 — human-solvable

Question

In this Bicyclus anynana (butterfly) RNA-seq dataset, samples were collected across five distinct developmental time points. To verify the developmental chronology, order the stages from earliest to latest. For each stage, group together the sample IDs of all biological replicates. Format your answer as a list where each line contains the developmental stage name followed by its replicate sample IDs in square brackets (e.g., Stage_0: [Sample_01, Sample_02]; Stage_1: [Sample_03, Sample_04, Sample_05]).

Data files (staged into the agent's work_dir from hb016.zip · download zip):

butterfly_data.csv — 1.5 MB

Total: 1 file(s), 1.5 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is: Stage_0: [Sample_01, Sample_02, Sample_03]; Stage_1: [Sample_04, Sample_05, Sample_06]; Stage_2: [Sample_07, Sample_08, Sample_09]; Stage_3: [Sample_10, Sample_11, Sample_12]; Stage_4: [Sample_13, Sample_14, Sample_15]. Accept any stage naming (e.g., numbered or descriptive like Larva/Pupa) as long as sample groupings and chronological order are correct. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb017 — human-solvable

Question

This RNA-sequencing dataset contains gene expression profiles from mouse tibial muscle collected across multiple timepoints after injury. Each sample has been anonymized and assigned to one of two genetic groups: Group_1 and Group_2. One group represents Wild-Type (WT) mice, and the other represents a transgenic NSE-BMP4 mouse model. Which samples belong to the transgenic mouse model group? Format your answer as a list of sample IDs in square brackets (e.g., [Sample_01, Sample_02, Sample_17]).

Data files (staged into the agent's work_dir from hb017.zip · download zip):

cleaned_counts.csv — 2.1 MB

Total: 1 file(s), 2.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Sample_10, Sample_11, Sample_12, Sample_13, Sample_14, Sample_15, Sample_16, Sample_17, Sample_18, Sample_19, Sample_20, Sample_21. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb018 — human-solvable

Question

What mouse reference genome were the RNA-seq reads aligned to? Provide the reference genome name using the UCSC-style identifier (e.g., mm9, mm10).

Data files (staged into the agent's work_dir from hb018.zip · download zip):

archive.tar.gz.part-aa — 500.0 MB
archive.tar.gz.part-ab — 500.0 MB
archive.tar.gz.part-ac — 500.0 MB
archive.tar.gz.part-ad — 500.0 MB
archive.tar.gz.part-ae — 500.0 MB
archive.tar.gz.part-af — 199.2 MB

Total: 6 file(s), 2.64 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. mm10 (aka GRCm38). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb019 — human-solvable

Question

What viral species is the human patient infected with? Provide the answer using ICTV viral abbreviations (e.g. ZIKV, YFV, HCV, etc).

Data files (staged into the agent's work_dir from hb019.zip · download zip):

hb019_subsampled_data.fastq — 1.9 MB

Total: 1 file(s), 1.9 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. EBOV. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb020 — human-solvable

Question

What organism does this crystal structure belong to? Provide the answer using binomial nomenclature (e.g. Canis lupus, Mus musculus, Danio rerio, etc).

Data files (staged into the agent's work_dir from hb020.zip · download zip):

data_scrubbed.cif — 2.8 MB

Total: 1 file(s), 2.8 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Homo sapiens. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb021 — human-difficult

Question

This human dataset contains DNA methylation profiles generated using the Illumina 450k array across multiple tissues. Each tissue exhibits a distinct, tissue-specific methylation signature. Based on these patterns, determine which internal organ the following anonymized sample IDs belong to (replicates): Sample IDs: [Sample_01, Sample_02, Sample_03, Sample_04]. Provide your answer as the name of a single organ.

Data files (staged into the agent's work_dir from hb021.zip · download zip):

processed_methylation_matrix.csv — 311.9 MB

Total: 1 file(s), 311.9 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Pancreas. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb022 — human-difficult

Question

This dataset contains gene expression profiles from human pancreatic cancer cell lines. The samples have been anonymized as Sample_01, Sample_02, ... and divided into two experimental conditions: Condition_X and Condition_Y. One of these conditions represents cells treated with the drug Erastin, which is known to induce metabolic stress. Which samples correspond to the Erastin-treated group? Format the answer as a list of sample identifiers exactly matching those in the dataset (e.g., [Sample_01, Sample_02, ...]).

Data files (staged into the agent's work_dir from hb022.zip · download zip):

anonymized_expression_matrix.csv — 1.5 MB

Total: 1 file(s), 1.5 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is: 'Sample_01', 'Sample_02', 'Sample_03', 'Sample_04', 'Sample_05', 'Sample_06', 'Sample_07', 'Sample_08'. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb023 — human-solvable

Question

From the data provided, which samples were derived from seawater and which from sediment? Format the answer as a list of which samples (and their IDs) are seawater and which are sediment (e.g. Seawater (#X): 1, 2,... Sediment (#Y): ...)

Data files (staged into the agent's work_dir from hb023.zip · download zip):

data.csv — 264.7 KB

Total: 1 file(s), 264.7 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Seawater (12): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12; Sediment (11): 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb024 — human-difficult

Question

This dataset involves microbial communities from tissues of non-human species. How many tissue groups are collected, and from how many different species? If possible, additionally address which host species and from what tissues. Provide counts as integers, species as scientific names, and list tissues explicitly.

Data files (staged into the agent's work_dir from hb024.zip · download zip):

1.fasta — 65.6 KB
10.fasta — 65.6 KB
11.fasta — 65.7 KB
12.fasta — 65.7 KB
13.fasta — 65.7 KB
14.fasta — 65.9 KB
15.fasta — 65.7 KB
16.fasta — 65.8 KB
17.fasta — 65.7 KB
18.fasta — 65.7 KB
2.fasta — 65.8 KB
3.fasta — 52.0 KB
4.fasta — 62.6 KB
5.fasta — 65.4 KB
6.fasta — 65.4 KB
7.fasta — 65.4 KB
8.fasta — 65.5 KB
9.fasta — 65.4 KB

Total: 18 file(s), 1.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is: 2 different tissues (mouth and gut) from 3 different species (3 snakes: Boiga dendrophila, Trimeresurus flavomaculatus, Laticauda laticauda). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb025 — human-difficult

Question

Using columns 3 and 4, which genes are the subject of single-locus modification, as indicated by a vector where the NCBI RefSeq identifier for the 5' arm is identical to the NCBI RefSeq identifier for the 3' arm regardless of whether the specific vector listed is a CRISPR or Targeting vector? Provide the answer as a list of official gene symbols.

Data files (staged into the agent's work_dir from hb025.zip · download zip):

clone_seq.csv — 21.7 KB

Total: 1 file(s), 21.7 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is 'Cacna1c', 'Camk2d', 'Ctnna3', 'Fhit', 'Fmr1', 'Macrod2', 'Mecp2', 'Prkn'. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
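The comparison described in the hb025 question (the same RefSeq identifier in columns 3 and 4) can be sketched as below. The layout of clone_seq.csv is not given in this listing, so everything about it here is an assumption: a header row is present, the gene symbol sits in the first column, and columns 3 and 4 hold bare identifiers that can be compared as strings. The function name single_locus_genes and its parameters are hypothetical.

```python
import csv


def single_locus_genes(path, gene_col=0, arm5_col=2, arm3_col=3):
    """Return sorted gene symbols whose 5' and 3' arm identifiers match.

    Column indexes are zero-based, so columns 3 and 4 are indexes 2 and 3.
    Assumptions: a header row exists and gene symbols are in gene_col.
    """
    genes = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) <= max(gene_col, arm5_col, arm3_col):
                continue  # ignore short or blank rows
            five = row[arm5_col].strip()
            three = row[arm3_col].strip()
            if five and five == three:
                genes.add(row[gene_col].strip())
    return sorted(genes)
```

The vector type (CRISPR or Targeting) is deliberately not inspected, since the question says it should not matter.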

hb026 — human-solvable

Question

From this adult mouse snATAC-seq fragments file, what major organ does the sample derive from? Provide the answer as an organ name from the set of primary organs that belong to an organ system, as described by the MGI - Adult Mouse Anatomy Browser (e.g. lung, brain, blood, liver, etc).

Data files (staged into the agent's work_dir from hb026.zip · download zip):

hb026_subsampled_data.tsv — 2.0 MB

Total: 1 file(s), 2.0 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. heart. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb027 — human-difficult

Question

Based on the gene expression profiles in this anonymized microarray dataset derived from human ovarian tissue, determine the number of tumor samples vs normal samples. Report the counts as 'X tumor samples and Y normal samples' where X and Y are integers.

Data files (staged into the agent's work_dir from hb027.zip · download zip):

expression_matrix.tsv — 160.4 MB

Total: 1 file(s), 160.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is that the dataset contains 173 tumor samples and 217 normal samples. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb028 — human-solvable

Question

What disease do these samples have in common? Provide the common disease name.

Data files (staged into the agent's work_dir from hb028.zip · download zip):

Metabolomics_Data_Set.xlsx — 25.7 KB

Total: 1 file(s), 25.7 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Type 2 Diabetes. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb029 — human-solvable

Question

Which of the hippocampus RNA-seq samples came from sleep deprived mice? List the sample names (e.g., sample1, sample2) for the sleep deprived mice.

Data files (staged into the agent's work_dir from hb029.zip · download zip):

hb029_counts_cleaned.csv — 2.2 MB

Total: 1 file(s), 2.2 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is NSD: sample3, sample4, sample7, sample8. SD: sample1, sample2, sample5, sample6. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb030 — human-difficult

Question

Which embryonic cell types are the following RNA-seq experiments from, in order of the counts file? There are 2 replicates from 5 different tissues in Xenopus tropicalis. Provide the answers as Xenopus gastrula fate-map region names (e.g., Animal Cap, Marginal Zone, Organizer).

Data files (staged into the agent's work_dir from hb030.zip · download zip):

GeneExpression_tropicalis.txt — 26.1 MB
Xtropicalisv9.0.Named.primaryTrs.gff3 — 25.6 MB
rsem_v9_counts.txt — 1.6 MB

Total: 3 file(s), 53.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is (in order T1-T5): Animal Cap, Dorsal, Lateral, Vegetal Pole, Ventral. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb031 — human-solvable

Question

RNA-seq analysis of these human intestinal organoid samples (virus_rep1, virus_rep2, virus_rep3) infected with an RNA virus shows viral reads. Which viral species infected these samples? Provide your answer as the complete species name from NCBI Taxonomy (e.g., Hepatitis A virus, Rotavirus A, Human astrovirus)

Data files (staged into the agent's work_dir from hb031.zip · download zip):

datafiles_reduced.zip — 1.0 MB

Total: 1 file(s), 1.0 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Norovirus GII.4 (or just Norovirus). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb032 — human-solvable

Question

What genetic abnormality does sample X have when compared to samples Y and Z? Provide the answer as the standard name for the chromosomal abnormality (e.g., Trisomy X, Monosomy X).

Data files (staged into the agent's work_dir from hb032.zip · download zip):

data.csv — 3.8 MB

Total: 1 file(s), 3.8 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Trisomy 21. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb033 — human-solvable

Question

In this zebrafish RNA-seq dataset, identify which samples come from heart tissue. Format your answer as a list of samples, such as: Sample_1, Sample_2, etc.

Data files (staged into the agent's work_dir from hb033.zip · download zip):

zebrafish_TPM_anonymized.csv — 3.6 MB

Total: 1 file(s), 3.6 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Sample_03. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb035 — human-difficult

Question

What is the name of the most common microRNA associated with the genes given in this data? (Give the name using microRNA nomenclature of the following form: 1. species, 2. miRNA family and member, and 3. strand, i.e., the 5' or 3' arm of the precursor hairpin, e.g., hsa-miR-102b-3p.)

Data files (staged into the agent's work_dir from hb035.zip · download zip):

miRNA_data.csv — 81.5 KB

Total: 1 file(s), 81.5 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is hsa-let-7b-5p. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb036 — human-difficult

Question

The dataset comes from a research orchard and consists of microbial genomic samples taken from trees grown for an experiment. When preparing the samples for analysis, some became contaminated with Agrobacterium fabrum from another experiment. Which samples are contaminated? Provide the answer by simply listing contaminated samples (e.g., sample_1, sample_3, etc.).

Data files (staged into the agent's work_dir from hb036.zip · download zip):

hb036_dataset_reduced.fastq — 149.3 MB

Total: 1 file(s), 149.3 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is sample_2. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb038 — human-solvable

Question

Which of the bigWig files are from ChIP samples and which are from input controls? List the bigWig file names (e.g., sample1.bw) grouped by their type: ChIP samples and input controls.

Data files (staged into the agent's work_dir from hb038.zip · download zip):

hb038.zip — 327.8 MB

Total: 1 file(s), 327.8 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is sample2.bw and sample4.bw are ChIP and sample1.bw and sample3.bw are input. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb039 — human-solvable

Question

Based on the gene expression signatures in this Affymetrix Mouse Genome 430 2.0 (GPL1261) microarray dataset from Mus musculus, which organ does Sample_X originate from? Provide the organ name.

Data files (staged into the agent's work_dir from hb039.zip · download zip):

sample_X.CEL.gz — 6.1 MB

Total: 1 file(s), 6.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Heart. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb040 — human-solvable

Question

The dataset contains gene expression measurements for cornea samples, with six columns: three mock-infected samples (cornea_mock_1, cornea_mock_2, cornea_mock_3) and three virus-infected samples (cornea_viral_1, cornea_viral_2, cornea_viral_3). For each gene, expression was measured in either mock-infected or virus-infected samples. Other than HSV-1, what virus are these samples infected with? Provide the virus name or standard abbreviation (e.g., HIV-1, influenza A).

Data files (staged into the agent's work_dir from hb040.zip · download zip):

anonymized_mock_vs_cornea_samples.csv — 724.7 KB

Total: 1 file(s), 724.7 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. SARS-CoV-2. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb041 — human-solvable

Question

What fruit was this methylation dataset generated from? Provide the common name of the fruit.

Data files (staged into the agent's work_dir from hb041.zip · download zip):

methylation_fold_matrix.csv — 263.4 KB

Total: 1 file(s), 263.4 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is tomato. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb043 — human-solvable

Question

The following dataset comes from a mixed cohort of male and female post-mortem prefrontal cortex samples. Using genes with known differential sex expression (e.g., XIST, RPS4Y1, EIF1AY, DDX3Y, UTY), identify which samples came from female subjects. Simply list all female samples as the answer (e.g., samples 1, 3, 5, 7, etc.).

Data files (staged into the agent's work_dir from hb043.zip · download zip):

alcohol.txt — 22.6 MB

Total: 1 file(s), 22.6 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is samples 5, 6, 10, 11, 14, 15, 17, 28, 29, 33, 34, 37, 38, 40, 47, 48. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
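The marker-gene approach named in the hb043 question can be sketched as below. This is a crude illustration rather than a validated method: it calls a sample female when XIST expression exceeds the mean of the available Y-linked markers, which assumes all genes are on a comparable scale. The function name call_female and the nested-dictionary input are hypothetical, and parsing of alcohol.txt is left out because its layout is not described here.

```python
# Y-linked marker genes listed in the question.
Y_GENES = ("RPS4Y1", "EIF1AY", "DDX3Y", "UTY")


def call_female(expr):
    """Return the samples called female.

    expr maps gene symbol -> {sample name: expression value}.
    Rule (an assumption): female if XIST > mean of the Y-linked markers.
    """
    if "XIST" not in expr:
        raise ValueError("XIST is required")
    y_present = [g for g in Y_GENES if g in expr]
    if not y_present:
        raise ValueError("at least one Y-linked marker is required")
    females = []
    for sample, xist in expr["XIST"].items():
        y_mean = sum(expr[g][sample] for g in y_present) / len(y_present)
        if xist > y_mean:
            females.append(sample)
    return females
```

In practice it would be worth plotting XIST against the Y-linked markers first, since samples with low signal on both may not be callable.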

hb044 — human-solvable

Question

Which set of peaks from histone mark ChIP-seq is from H3K4me3, and which is from H3K4me1? Genome alignment is mm10. Format answer as "peakset1:h3k4mex, peakset2:h3k4mey[...]". Label all four peaksets (peakset1 through peakset4) in your answer.

Data files (staged into the agent's work_dir from hb044.zip · download zip):

peakset1.bed — 5.8 MB
peakset2.bed — 1.6 MB
peakset3.bed — 5.1 MB
peakset4.bed — 1.6 MB

Total: 4 file(s), 14.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is peakset1:h3k4me1, peakset2:h3k4me3, peakset3:h3k4me1, peakset4:h3k4me3. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb045 — human-difficult

Question

Based on the read files from this Gossypium hirsutum transcriptomics dataset, what tissue type is this data sampled from? Provide the tissue type as a simple anatomical term (e.g., leaf, root, stem).

Data files (staged into the agent's work_dir from hb045.zip · download zip):

cotton_r1/archive.tar.gz.part-aa — 500.0 MB
cotton_r1/archive.tar.gz.part-ab — 500.0 MB
cotton_r1/archive.tar.gz.part-ac — 184.6 MB
cotton_r2/archive.tar.gz.part-aa — 500.0 MB
cotton_r2/archive.tar.gz.part-ab — 500.0 MB
cotton_r2/archive.tar.gz.part-ac — 221.5 MB

Total: 6 file(s), 2.35 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Root tissue. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb046 — human-solvable

Question

Given the mass spectrometry data, identify the taxonomic family of the organism from which the data is derived. Provide the scientific name of the taxonomic family (e.g., Muridae, Hominidae).

Data files (staged into the agent's work_dir from hb046.zip · download zip):

archive.tar.gz.part-aa — 500.0 MB
archive.tar.gz.part-ab — 500.0 MB
archive.tar.gz.part-ac — 500.0 MB
archive.tar.gz.part-ad — 500.0 MB
archive.tar.gz.part-ae — 35.5 MB

Total: 5 file(s), 1.99 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Muridae. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb047 — human-difficult

Question

Given the ATAC-seq dataset obtained from hiPSC samples, identify the six samples that were treated with one of two variant class I HDAC inhibitor drugs. Provide the answer as a comma separated sample list (e.g. Sample_1, Sample_2, Sample_3, etc).

Data files (staged into the agent's work_dir from hb047.zip · download zip):

data_scrubbed.txt — 6.0 MB

Total: 1 file(s), 6.0 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Sample_1, Sample_2, Sample_3, Sample_7, Sample_8, Sample_9. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb048 — human-solvable

Question

What parent drug was administered to this patient given the untargeted mass spectrometry data? Provide the answer as its generic drug name.

Data files (staged into the agent's work_dir from hb048.zip · download zip):

data_scrubbed.mzML — 42.1 MB

Total: 1 file(s), 42.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Diphenhydramine. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb049 — human-solvable

Question

From the microbial data provided, identify the habitat from which these samples were collected.

Data files (staged into the agent's work_dir from hb049.zip · download zip):

data_1_.csv — 206.6 KB

Total: 1 file(s), 206.6 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer should identify the habitat as coastal/intertidal/marine. Accept: coast, intertidal zone, marine, rocky intertidal, tidepool, or any answer that correctly identifies a coastal/marine intertidal habitat. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb050 — human-solvable

Question

From the microbial dataset provided, identify what human-caused pollutant(s) is/are present in this environment based on the core taxa. Provide the pollutant type or category, and if applicable, list specific chemical compounds using standard chemical names or formulas.

Data files (staged into the agent's work_dir from hb050.zip · download zip):

data.csv — 38.9 KB

Total: 1 file(s), 38.9 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is aquaculture pollutants, namely nitrogen, but it can be as specific as ammonium (NH₄⁺), nitrate (NO₃⁻), and phosphate (PO₄³⁻). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb052 — human-difficult

Question

This microarray dataset contains expression data from various ovarian and pancreatic cancer biopsies run on GPL 887 and GPL 4133 platforms, respectively. Based on the expression profile of each sample, assign a sex (male, female, or unknown) to each sample. The final output should be in the format of “Sample; sex” (e.g., Sample 1; male).

Data files (staged into the agent's work_dir from hb052.zip · download zip):

Cancers.csv — 18.9 MB
refseq_to_symbol_map.csv — 687.4 KB

Total: 2 file(s), 19.6 MB

Answer rubric (spoiler)

Award full credit if at least 95% of sample sex assignments are correct. The answer is "Sample_1 ; female Sample_2 ; male Sample_3 ; unknown Sample_4 ; unknown Sample_5 ; male Sample_6 ; male Sample_7 ; unknown Sample_8 ; female Sample_9 ; unknown Sample_10 ; female Sample_11 ; female Sample_12 ; male Sample_13 ; male Sample_14 ; male Sample_15 ; female Sample_16 ; unknown Sample_17 ; male Sample_18 ; female Sample_19 ; male Sample_20 ; male Sample_21 ; female Sample_22 ; female Sample_23 ; female Sample_24 ; female Sample_25 ; male Sample_26 ; male Sample_27 ; female Sample_28 ; female Sample_29 ; unknown Sample_30 ; female Sample_31 ; male Sample_32 ; male Sample_33 ; unknown Sample_34 ; female Sample_35 ; unknown Sample_36 ; female Sample_37 ; unknown Sample_38 ; female Sample_39 ; female Sample_40 ; female Sample_41 ; female Sample_42 ; female Sample_43 ; unknown Sample_44 ; female Sample_45 ; male Sample_46 ; unknown Sample_47 ; unknown Sample_48 ; female Sample_49 ; unknown" Score 1.0 if at least 95% correct, 0.0 otherwise. No partial credit.
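The 95% threshold in this rubric can be made concrete with a small scoring helper. The sketch below is only an illustration of the rule as worded above, not the benchmark's actual grader; the function names and the regular expression used to parse "Sample_N ; sex" pairs are assumptions:

```python
import re


def parse_sex_assignments(text):
    """Map sample number -> sex from text such as 'Sample_1 ; female Sample 2; male'."""
    pattern = r"Sample[_ ]?(\d+)\s*;\s*(female|male|unknown)"
    pairs = re.findall(pattern, text, flags=re.IGNORECASE)
    return {int(number): sex.lower() for number, sex in pairs}


def score_with_threshold(expected_text, answer_text, threshold=0.95):
    """Return 1.0 if at least `threshold` of the expected assignments match, else 0.0."""
    expected = parse_sex_assignments(expected_text)
    answer = parse_sex_assignments(answer_text)
    if not expected:
        raise ValueError("no expected assignments could be parsed")
    # A sample missing from the answer counts as incorrect.
    correct = sum(1 for sample, sex in expected.items() if answer.get(sample) == sex)
    return 1.0 if correct / len(expected) >= threshold else 0.0
```

With 49 samples, this rule tolerates at most two wrong or missing assignments (47/49 is about 95.9%, while 46/49 is about 93.9%).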

hb053 — human-difficult

Question

The transcriptome of Brachypodium distachyon was sequenced under a specific stress. Based on the sequences of differentially expressed genes, what was the perturbation? Provide the stress type as a short descriptive phrase (e.g., 'drought stress', 'cold stress').

Data files (staged into the agent's work_dir from hb053.zip · download zip):

B.distachyon_DEG_scrubbed.txt — 53.5 KB

Total: 1 file(s), 53.5 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is heat stress. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

hb054 — human-solvable

Question

The following PRO-seq datasets were collected from different samples of mouse cells. Some of the samples were treated with a drug that causes the endogenous NELF-B protein to be rapidly degraded within the samples. Identify which of the sample(s) were treated with the NELF-B-degrading drug. Provide the answer as a comma-separated list of sample/strand identifiers, in alphanumeric order (e.g., Sample_1_minus, Sample_1_plus).

Data files (staged into the agent's work_dir from hb054.zip · download zip):

Sample_1_minus.bw — 84.6 MB
Sample_1_plus.bw — 80.2 MB
Sample_2_minus.bw — 81.5 MB
Sample_2_plus.bw — 77.6 MB
Sample_3_minus.bw — 24.5 MB
Sample_3_plus.bw — 24.5 MB

Total: 6 file(s), 372.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The answer is Sample_3_minus and Sample_3_plus. Answers that include any other samples (e.g. Sample_2) are incorrect. Score 1.0 only if the answer is exactly those two samples, 0.0 otherwise. No partial credit.

rec1vycgih4bavtur — human-difficult

Question

Name 1 gene on chromosome 19 that is knocked out in this mouse tumor cell line. Provide the official gene symbol.

Data files (staged into the agent's work_dir from rec1vycgih4bavtur.zip · download zip):

read1.fastq.gz — 629.8 MB
read2.fastq.gz — 654.9 MB

Total: 2 file(s), 1.25 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Gm38656, Scgb2a2, Ms4a12, Gm57661, Gm41844. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec35farlwqz6kmy7 — human-solvable

Question

What freshly FACS-sorted cell type of PBMC (cellType in the SingleCellExperiment object) is this bulk RNA-seq count data most likely derived from? Provide the exact cellType label as it appears in the SingleCellExperiment object.

Data files (staged into the agent's work_dir from rec35farlwqz6kmy7.zip · download zip):

anonymized_pbmc_reference.rds — 734.2 MB
bulk_RNA_seq_counts.csv — 1021.7 KB

Total: 2 file(s), 735.2 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Treg. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec4kcr3oroe3jc1j — human-solvable

Question

Using the provided normalized RNA-sequencing time-series data, identify which gene displays circadian rhythmicity (24-hour periodicity). Provide your answer as the gene name exactly as it appears in the dataset (e.g., GENE1).

Data files (staged into the agent's work_dir from rec4kcr3oroe3jc1j.zip · download zip):

RHYTHMIC.txt — 22.3 KB

Total: 1 file(s), 22.3 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is GENE15, the gene displaying circadian rhythmicity, formatted as the gene name (e.g., GENE15). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec5qx7nedrwk4zog — human-solvable

Question

Identify the type of cDNA (complementary DNA) sequence from a whole transcriptome RNA sequencing library. Provide the type of cDNA as a general element category name (e.g., a transposable element family or transcript class).

Data files (staged into the agent's work_dir from rec5qx7nedrwk4zog.zip · download zip):

cDNA_seq.fa — 6.0 KB

Total: 1 file(s), 6.0 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is LINE, acceptable as LINE, L1, L1HS, or LINE transposable element. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec5xuqc70ithi19c — human-solvable

Question

During sample collection for a multi-tissue gene expression study, several sample labeling errors occurred, and specific tissue subtype labels were masked with generic identifiers (e.g., brain subsample 1, 2, 3). Given RNA-seq expression profiles for samples from multiple donors, identify the masked tissue labels, and determine the true donor and tissue identity for each sample. Report your answer as a table with columns: Sample, Tissue, and Sex (use tissue subtype names such as 'Cortex', 'Cerebellum', etc., and Sex as 'Male' or 'Female').

Data files (staged into the agent's work_dir from rec5xuqc70ithi19c.zip · download zip):

expression_matrix.csv — 20.0 MB
sample_metadata_masked.csv — 2.4 KB

Total: 2 file(s), 20.0 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is:

Sample, Tissue, Sex
S004, Atrial, Female
S013, Cerebellum, Female
S015, Blood, Male
S026, Liver, Female
S027, Cortex, Female
S035, Cortex, Male
S045, Liver, Male
S051, Caudate, Male
S060, Ventricular, Male
S068, Liver, Female

Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec6urlqzwqkhmhaj — human-solvable

Question

This RNA-seq sample contains multiple subtypes of influenza. Identify all of the subtypes and list them as a comma-separated list in HxNx format (e.g., H1N1, H2N2).

Data files (staged into the agent's work_dir from rec6urlqzwqkhmhaj.zip · download zip):

read1.fastq.gz — 386.6 MB

Total: 1 file(s), 386.6 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer includes all six influenza subtypes: H1N1, H2N2, H3N2, H5N1, H7N9, and H9N2. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec6xeqyddiz6desi — human-solvable

Question

Identify the genomic coordinate (hg38 assembly) of the primary on-target Cas9 cleavage site from paired-end whole-genome sequencing data of a CRISPR-edited human cell line. Report the coordinate as the position of the blunt end of the Cas9 cut (3 bp upstream of the PAM, 1-based hg38 coordinates). Report the coordinate in the format chr#:position (e.g., chr1:12345678).

Data files (staged into the agent's work_dir from rec6xeqyddiz6desi.zip · download zip):

annonymize1.fastq — 349.5 MB
annonymize2.fastq — 349.5 MB

Total: 2 file(s), 699.0 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is the Cas9 cut site, which is located at chr3:46373175. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rec9ogrlqg5u0ke09 — human-solvable

Question

You are provided with 10 anonymized human RNA-seq BigWig files (sample01.bigWig to sample10.bigWig). Four samples were prepared using Ribo-Zero rRNA depletion, and six samples were prepared using oligo(dT) polyA selection. Using GENCODE v46 gene annotations (hg38), identify the four Ribo-Zero samples based on the distribution of RNA-seq signal across genomic features (exons, introns, and intergenic regions). List the four Ribo-Zero sample filenames exactly as provided (e.g., sample01.bigWig), separated by commas.

Data files (staged into the agent's work_dir from rec9ogrlqg5u0ke09.zip · download zip):

Sample1.bw — 39.2 MB
Sample10.bw — 22.1 MB
Sample2.bw — 35.5 MB
Sample3.bw — 41.9 MB
Sample4.bw — 43.0 MB
Sample5.bw — 25.4 MB
Sample6.bw — 27.4 MB
Sample7.bw — 27.2 MB
Sample8.bw — 26.0 MB
Sample9.bw — 20.1 MB

Total: 10 file(s), 307.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is Sample1.bigwig, Sample2.bigwig, Sample3.bigwig, and Sample4.bigwig, listing all four files. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recaikavdwoimjy3b — human-difficult

Question

Which genes were knocked out in samples 1, 2, 3 versus samples 4, 5, 6? Provide the knocked-out gene(s) as official gene symbols, separated by commas if multiple.

Data files (staged into the agent's work_dir from recaikavdwoimjy3b.zip · download zip):

anonyomized_rnaseq_count.tsv.gz — 475.9 KB

Total: 1 file(s), 475.9 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The expected answer is the knocked-out genes ALKBH5 and FTO. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recartntbrx7kwzkv — human-solvable

Question

Which 2 samples in this set are close relatives? Provide the two sample identifiers (e.g., sample1, sample2) as they appear in the dataset.

Data files (staged into the agent's work_dir from recartntbrx7kwzkv.zip · download zip):

sample1.vcf.gz — 276.7 MB
sample10.vcf.gz — 276.6 MB
sample11.vcf.gz — 276.7 MB
sample12.vcf.gz — 276.6 MB
sample13.vcf.gz — 276.6 MB
sample14.vcf.gz — 276.7 MB
sample15.vcf.gz — 276.7 MB
sample16.vcf.gz — 276.7 MB
sample17.vcf.gz — 276.6 MB
sample18.vcf.gz — 276.7 MB
sample19.vcf.gz — 276.6 MB
sample2.vcf.gz — 276.6 MB
sample20.vcf.gz — 276.6 MB
sample21.vcf.gz — 276.7 MB
sample22.vcf.gz — 276.6 MB
sample23.vcf.gz — 276.6 MB
sample24.vcf.gz — 276.7 MB
sample25.vcf.gz — 275.8 MB
sample26.vcf.gz — 275.9 MB
sample27.vcf.gz — 275.9 MB
sample28.vcf.gz — 281.9 MB
sample29.vcf.gz — 281.9 MB
sample3.vcf.gz — 276.6 MB
sample4.vcf.gz — 276.6 MB
sample5.vcf.gz — 276.6 MB
sample6.vcf.gz — 276.6 MB
sample7.vcf.gz — 276.7 MB
sample8.vcf.gz — 276.7 MB
sample9.vcf.gz — 276.7 MB

Total: 29 file(s), 7.84 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: sample28 and sample29. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recav6jt6q0aa9sjs — human-solvable

Question

Which cell line is this RNA-seq gene expression data from?

Candidate cell lines (24 total): K562, HL-60, GM12878, DND-41, THP-1, HepG2, A549, IMR-90, MCF-7, MDA-MB-231, HCT116, Caco-2, SW620, SK-N-SH, SH-SY5Y, BE2C, PC-3, DU145, Panc1, HEK293, H1, H9, BJ, HFFc6

Data files (staged into the agent's work_dir from recav6jt6q0aa9sjs.zip · download zip):

sample_A_gene_quantifications.tsv — 10.4 MB

Total: 1 file(s), 10.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is K562. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recbkqinqhpfdn9bq — human-solvable

Question

What pathogen was sequenced to produce these RNA-seq reads? Please report the standard scientific name of genus and species.

Data files (staged into the agent's work_dir from recbkqinqhpfdn9bq.zip · download zip):

read1.fastq.gz — 2.9 MB

Total: 1 file(s), 2.9 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Trypanosoma brucei. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recc3vmqjrsefqw57 — human-solvable

Question

Which exon of the BRCA1 gene contains a homozygous deletion in this sample? Provide the Ensembl exon ID (e.g., ENSE00004011560).

Data files (staged into the agent's work_dir from recc3vmqjrsefqw57.zip · download zip):

BRCA1_ANON_R1.fastq.gz — 544.6 MB
BRCA1_ANON_R2.fastq.gz — 547.1 MB

Total: 2 file(s), 1.07 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is ENSE00004011566, formatted as a BRCA1 Ensembl exon ID (e.g., ENSE00004011560, ENSE00004011565, ENSE00004011568). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

reccipvrmk1k0gqkr — human-solvable

Question

Which 2 protein-coding genes are knocked out in this yeast strain? Provide the standard gene symbols for the knocked out genes.

Data files (staged into the agent's work_dir from reccipvrmk1k0gqkr.zip · download zip):

read1.fastq.gz — 330.7 MB
read2.fastq.gz — 345.5 MB

Total: 2 file(s), 676.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The primary answer is LEU2 and URA3 (the unique homozygous auxotrophic knockouts). Also accept if the answer additionally mentions HIS3, LYS2, or MET15/MET17, as these are also knocked out in the BY4743 background strain. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
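This rubric combines a required set of genes with an optional set that may also be mentioned. A minimal sketch of that logic follows, assuming the answer has already been reduced to a bare list of gene symbols; the names `score_knockouts`, `REQUIRED` and `OPTIONAL` are illustrative, and rejecting any other gene is an assumption, since the rubric does not say how unrelated extra genes should be treated:

```python
import re

REQUIRED = {"LEU2", "URA3"}
OPTIONAL = {"HIS3", "LYS2", "MET15", "MET17"}


def score_knockouts(answer, required=REQUIRED, optional=OPTIONAL):
    """Score a bare list of gene symbols such as 'LEU2, URA3' or 'leu2 and ura3'."""
    tokens = {token.upper() for token in re.findall(r"[A-Za-z0-9]+", answer)}
    tokens.discard("AND")  # tolerate 'X and Y' phrasing
    if not required <= tokens:
        return 0.0  # a required knockout is missing
    if tokens - required - optional:
        return 0.0  # assumption: any gene outside both sets fails the answer
    return 1.0
```

Free-text answers would need the gene symbols extracted first, because every alphanumeric word is treated as a candidate symbol here.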

reccniibn7ary80hj — human-solvable

Question

What chromosome is triplicated in this sample? Provide the chromosome number (e.g., 'chromosome 21' or just '21').

Data files (staged into the agent's work_dir from reccniibn7ary80hj.zip · download zip):

uploadData/read1.part_001.fastq.gz — 942.2 MB
uploadData/read1.part_002.fastq.gz — 942.3 MB
uploadData/read1.part_003.fastq.gz — 942.3 MB
uploadData/read1.part_004.fastq.gz — 942.3 MB
uploadData/read1.part_005.fastq.gz — 942.2 MB
uploadData/read1.part_006.fastq.gz — 942.5 MB
uploadData/read1.part_007.fastq.gz — 942.2 MB
uploadData/read1.part_008.fastq.gz — 942.3 MB
uploadData/read1.part_009.fastq.gz — 942.2 MB
uploadData/read1.part_010.fastq.gz — 942.4 MB
uploadData/read1.part_011.fastq.gz — 942.3 MB
uploadData/read1.part_012.fastq.gz — 942.3 MB
uploadData/read2.part_001.fastq.gz — 945.3 MB
uploadData/read2.part_002.fastq.gz — 945.7 MB
uploadData/read2.part_003.fastq.gz — 945.2 MB
uploadData/read2.part_004.fastq.gz — 945.2 MB
uploadData/read2.part_005.fastq.gz — 945.2 MB
uploadData/read2.part_006.fastq.gz — 945.7 MB
uploadData/read2.part_007.fastq.gz — 945.2 MB
uploadData/read2.part_008.fastq.gz — 945.2 MB
uploadData/read2.part_009.fastq.gz — 945.3 MB
uploadData/read2.part_010.fastq.gz — 945.7 MB
uploadData/read2.part_011.fastq.gz — 945.2 MB
uploadData/read2.part_012.fastq.gz — 945.2 MB

Total: 24 file(s), 22.12 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: chromosome 21. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

reccslfjnjcfdpgak — human-difficult

Question

Given mtDNA whole genome sequencing data from a mother-son pair, identify the number of homoplasmic variants (≥95% heteroplasmy) that demonstrate maternal inheritance patterns. Report your answer as a single integer representing the count of variants.
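The ≥95% threshold in the question can be illustrated with a short counting helper. This is a sketch of one possible reading of "maternal inheritance pattern" (a variant that is homoplasmic in both mother and son); the input dictionaries, which map a variant key to its alternate-allele fraction, and the function name are hypothetical and would have to come from the solver's own variant calling:

```python
def count_shared_homoplasmic(mother_af, son_af, threshold=0.95):
    """Count variants whose allele fraction is >= threshold in both individuals."""
    shared = 0
    for variant, son_fraction in son_af.items():
        # A variant absent from the mother's calls is treated as fraction 0.
        mother_fraction = mother_af.get(variant, 0.0)
        if son_fraction >= threshold and mother_fraction >= threshold:
            shared += 1
    return shared
```

Variant keys can be any hashable identifier, for example a `(position, ref, alt)` tuple.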

Data files (staged into the agent's work_dir from reccslfjnjcfdpgak.zip · download zip):

MOTHER_ANON_R1.fastq.gz — 679.8 MB
MOTHER_ANON_R2.fastq.gz — 768.3 MB
SON_ANON_R1.fastq.gz — 692.6 MB
SON_ANON_R2.fastq.gz — 780.8 MB

Total: 4 file(s), 2.85 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: 10. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

reccwgc4buredxvyz — human-solvable

Question

Which two samples were swapped in the sequencing run? Provide the sample names or identifiers for the two swapped samples.

Data files (staged into the agent's work_dir from reccwgc4buredxvyz.zip · download zip):

data_files/sample10_R1.fastq.gz — 756.1 MB
data_files/sample10_R2.fastq.gz — 707.7 MB
data_files/sample11_R1.fastq.gz — 751.3 MB
data_files/sample11_R2.fastq.gz — 717.5 MB
data_files/sample12_R1.fastq.gz — 666.0 MB
data_files/sample12_R2.fastq.gz — 803.5 MB
data_files/sample13_R1.fastq.gz — 664.4 MB
data_files/sample13_R2.fastq.gz — 794.2 MB
data_files/sample14_R1.fastq.gz — 753.1 MB
data_files/sample14_R2.fastq.gz — 702.2 MB
data_files/sample15_R1.fastq.gz — 666.0 MB
data_files/sample15_R2.fastq.gz — 801.6 MB
data_files/sample16_R1.fastq.gz — 750.6 MB
data_files/sample16_R2.fastq.gz — 696.8 MB
data_files/sample17_R1.fastq.gz — 663.1 MB
data_files/sample17_R2.fastq.gz — 794.6 MB
data_files/sample18_R1.fastq.gz — 753.6 MB
data_files/sample18_R2.fastq.gz — 706.9 MB
data_files/sample1_R1.fastq.gz — 661.5 MB
data_files/sample1_R2.fastq.gz — 797.1 MB
data_files/sample2_R1.fastq.gz — 665.3 MB
data_files/sample2_R2.fastq.gz — 799.2 MB
data_files/sample3_R1.fastq.gz — 660.9 MB
data_files/sample3_R2.fastq.gz — 788.3 MB
data_files/sample4_R1.fastq.gz — 754.8 MB
data_files/sample4_R2.fastq.gz — 696.4 MB
data_files/sample5_R1.fastq.gz — 662.4 MB
data_files/sample5_R2.fastq.gz — 800.9 MB
data_files/sample6_R1.fastq.gz — 659.6 MB
data_files/sample6_R2.fastq.gz — 788.7 MB
data_files/sample7_R1.fastq.gz — 750.9 MB
data_files/sample7_R2.fastq.gz — 700.2 MB
data_files/sample8_R1.fastq.gz — 749.2 MB
data_files/sample8_R2.fastq.gz — 701.3 MB
data_files/sample9_R1.fastq.gz — 747.3 MB
data_files/sample9_R2.fastq.gz — 692.1 MB
data_files/sample_info.csv — 383 B

Total: 37 file(s), 25.61 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is that sample3 and sample10 have been switched: sample3 belongs to cell_type_2 and sample10 belongs to cell_type_1. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rece8yuamgclcpj9i — human-solvable

Question

Given gene expression profiles from 4 drug treatments in OCI-LY3 cells, match each treatment group to the correct compound from the following candidates: Geldanamycin, Trichostatin A, Rapamycin, Doxorubicin. Provide your answer as a list matching each treatment group (A, B, C, D) to its compound, e.g., 'Group A: [Compound]'.

Data files (staged into the agent's work_dir from rece8yuamgclcpj9i.zip · download zip):

anonymized_expression.txt — 4.4 MB

Total: 1 file(s), 4.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Group A: Geldanamycin; Group B: Trichostatin A; Group C: Rapamycin; Group D: Doxorubicin. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recea4hqimc4sypon — human-solvable

Question

Whole-genome CpG methylation data were generated for different tissue types from different donors. Each data file represents the genome-wide CpG methylation profile of one tissue from one donor. During sample transfer and processing, data from a donor's different tissues may have been swapped or mislabelled. From the list of 16 CpG methylation BED files (labeled as tissue-donor.bed.gz, e.g., heart-donor1.bed.gz) from 7 donors across 4 different tissue types, identify which donor identity or tissue type information has been mislabelled. Report your findings by specifying which donor(s) and tissue type(s) are mislabelled, and describe what the correct labels should be (e.g., 'Donor X's tissue A is mislabelled as tissue B').

Data files (staged into the agent's work_dir from recea4hqimc4sypon.zip · download zip):

task2_data/adrenal_gland-donor2.bed.gz — 669.5 MB
task2_data/adrenal_gland-donor3.bed.gz — 715.1 MB
task2_data/adrenal_gland-donor6.bed.gz — 607.6 MB
task2_data/adrenal_gland-donor7.bed.gz — 638.3 MB
task2_data/heart-donor1.bed.gz — 720.1 MB
task2_data/heart-donor3.bed.gz — 736.8 MB
task2_data/heart-donor5.bed.gz — 669.1 MB
task2_data/heart-donor6.bed.gz — 662.5 MB
task2_data/spleen-donor1.bed.gz — 661.3 MB
task2_data/spleen-donor2.bed.gz — 672.7 MB
task2_data/spleen-donor3.bed.gz — 716.1 MB
task2_data/spleen-donor4.bed.gz — 657.3 MB
task2_data/spleen-donor5.bed.gz — 668.9 MB
task2_data/stomach-donor1.bed.gz — 670.0 MB
task2_data/stomach-donor2.bed.gz — 673.5 MB
task2_data/stomach-donor3.bed.gz — 738.3 MB
task2_data/stomach-donor5.bed.gz — 647.3 MB
task2_data/stomach-donor7.bed.gz — 641.9 MB

Total: 18 file(s), 11.88 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: No donor identity is mislabelled. Donor 3's spleen tissue is mislabelled as stomach. Donor 3's stomach tissue is mislabelled as spleen. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recffr4vmqdynph2n — human-difficult

Question

Which protein-coding C. elegans gene has a heterozygous knockout in this sample? Provide the gene symbol.

Data files (staged into the agent's work_dir from recffr4vmqdynph2n.zip · download zip):

read1.fastq.gz — 444.2 MB
read2.fastq.gz — 439.4 MB

Total: 2 file(s), 883.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: alh-2. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recgrmy32hxalop9m — human-solvable

Question

You are provided with anonymized ChIP-seq peak files generated from multiple histone modification experiments (e.g., H3K4me1/2/3, H3K27ac, H3K27me3, H3K36me3, H3K79me1/2, H3K9me3, H2A.Z, and others; up to 30 distinct histone marks). The histone modification names and filenames are anonymized. Identify which sample corresponds to the H3K36me3 histone modification experiment.

Data characteristics: Genome assembly: hg19; Cell line: Human H1 embryonic stem cells (H1 ESC); File format: BED (chr, start, end, name, score, strand); Number of anonymized peak files: 30; Design: One sample per histone modification. Provide your answer as the filename (e.g., SampleX.bed).

Data files (staged into the agent's work_dir from recgrmy32hxalop9m.zip · download zip):

Sample1.bed — 6.7 MB
Sample10.bed — 11.7 MB
Sample11.bed — 9.6 MB
Sample12.bed — 5.3 MB
Sample13.bed — 8.0 MB
Sample14.bed — 14.3 MB
Sample15.bed — 2.5 MB
Sample16.bed — 16.0 MB
Sample17.bed — 4.6 MB
Sample18.bed — 8.8 MB
Sample19.bed — 7.3 MB
Sample2.bed — 18.7 MB
Sample20.bed — 4.0 MB
Sample21.bed — 10.6 MB
Sample22.bed — 7.8 MB
Sample23.bed — 5.2 MB
Sample24.bed — 9.5 MB
Sample25.bed — 13.6 MB
Sample26.bed — 7.0 MB
Sample27.bed — 18.3 MB
Sample28.bed — 8.8 MB
Sample29.bed — 8.9 MB
Sample3.bed — 7.9 MB
Sample30.bed — 5.2 MB
Sample4.bed — 1.7 MB
Sample5.bed — 6.3 MB
Sample6.bed — 2.9 MB
Sample7.bed — 3.3 MB
Sample8.bed — 7.4 MB
Sample9.bed — 4.3 MB

Total: 30 file(s), 246.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is Sample2.bed, which corresponds to the H3K36me3 histone modification experiment. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rechlnhmn1jsmsser — human-solvable

Question

Whole-genome sequencing reads from a Klebsiella pneumoniae mutant strain generated via transposon mutagenesis are provided. Identify the genomic coordinate (chromosome, position) of the 47 bp transposon insertion site in the reference genome. Report the chromosome as a RefSeq accession ID and the position as a 1-based coordinate (e.g., Chromosome: NC_XXXXXX.X, Position: NNNNNN).

Data files (staged into the agent's work_dir from rechlnhmn1jsmsser.zip · download zip):

annonymize1.fastq — 100.9 MB
annonymize2.fastq — 101.1 MB
kpneumoniae_ref.fasta — 5.5 MB

Total: 3 file(s), 207.5 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Chromosome: NC_009648.1; Position: 4,144,431; Size: 47 bp insertion. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rechnr2wkqiqxjwlv — human-solvable

Question

Estimate the haploid genome size of this organism to the nearest megabase (Mb). Report your answer as a single integer representing megabases (e.g., '5' for 5 Mb).
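One common way to estimate genome size from raw reads is to divide the total number of k-mers by the depth at the main peak of the k-mer multiplicity histogram. The sketch below shows that arithmetic under strong simplifying assumptions (a single haploid peak, and low-multiplicity error k-mers removed by a fixed cutoff); the function name, the cutoff default, and the histogram input are illustrative, and the histogram itself would have to be produced by a separate k-mer counting step:

```python
def estimate_genome_size(histogram, min_multiplicity=5):
    """Estimate genome size in bp from a k-mer histogram {multiplicity: distinct k-mer count}."""
    # Assumption: k-mers seen fewer than min_multiplicity times are sequencing errors.
    solid = {m: count for m, count in histogram.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no k-mers at or above the multiplicity cutoff")
    # The multiplicity with the most distinct k-mers approximates the k-mer depth.
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(m * count for m, count in solid.items())
    return total_kmers / peak_depth
```

Dividing the result by 1,000,000 and rounding gives the integer number of megabases the question asks for.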

Data files (staged into the agent's work_dir from rechnr2wkqiqxjwlv.zip · download zip):

ORGANISM_ANON_R1.fastq.gz — 117.1 MB
ORGANISM_ANON_R2.fastq.gz — 135.3 MB

Total: 2 file(s), 252.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The expected answer is the estimated genome size rounded to the nearest megabase, which is 2 Mb. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

reci6iglwiertmyyk — human-solvable

Question

Identify which anonymized sample corresponds to E2F6 knockdown based on gene expression data.

The gene expression matrix file contains siRNA knockdowns of 32 different genes in a human cell line. Expression values are in tags per million (TPM). The sample headers are randomly anonymized as sampleID1, sampleID2 and so on up to sample32. Data characteristics: Organism: Human; File format: TSV file with genes as rows and sampleIDs as columns; Number of gene knockdowns: 32; Sample headers: anonymized as sample1, sample2, …, sample32.

Data files (staged into the agent's work_dir from reci6iglwiertmyyk.zip · download zip):

matrixTpm_anonymized.tsv — 9.9 MB

Total: 1 file(s), 9.9 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is Sample25. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recjgwpbyodqoihqc — human-solvable

Question

Can you calculate the observed molecular weight of my purified protein based on the elution profile of a gel filtration standard containing Thyroglobulin (670 kDa), γ-Globulin (158 kDa), Ovalbumin (44 kDa), Myoglobin (17 kDa), and Vitamin B12 (1.35 kDa)? Report the molecular weight in kDa, rounded to the nearest whole number.
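Gel filtration calibration is usually done by fitting a straight line of log10(molecular weight) against the peak elution volume of each standard and then reading the sample's molecular weight off that line. A minimal sketch of this calculation, assuming the peak elution volumes have already been extracted from the chromatograms; the function names are illustrative and the volumes in `example_standards` are placeholders, not values taken from the data files:

```python
import math


def fit_calibration(standards):
    """Least-squares fit of log10(MW) = slope * volume + intercept.

    `standards` is a list of (peak_elution_volume, molecular_weight_kda) pairs.
    """
    volumes = [volume for volume, _ in standards]
    log_mw = [math.log10(mw) for _, mw in standards]
    n = len(standards)
    mean_v = sum(volumes) / n
    mean_y = sum(log_mw) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes)
    sxy = sum((v - mean_v) * (y - mean_y) for v, y in zip(volumes, log_mw))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_v
    return slope, intercept


def estimate_mw_kda(standards, sample_volume):
    """Interpolate the sample's molecular weight (kDa) from the calibration line."""
    slope, intercept = fit_calibration(standards)
    return 10 ** (slope * sample_volume + intercept)


# Molecular weights are from the question; the volumes (mL) are placeholders only.
example_standards = [(9.0, 670.0), (12.0, 158.0), (15.0, 44.0), (17.0, 17.0), (20.0, 1.35)]
```

Standards that fall outside the column's linear separation range are sometimes excluded from the fit; whether that applies here depends on the actual chromatogram.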

Data files (staged into the agent's work_dir from recjgwpbyodqoihqc.zip · download zip):

sample-data.txt — 10.2 KB
standards-data.txt — 11.7 KB

Total: 2 file(s), 21.9 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is 47 ± 1 kDa, acceptable as between 46-48 kDa or 46000-48000 Da. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recmiryoehog9bvce — human-solvable

Question

Based on the provided RNA-seq dataset, which specific gene fusion product represents a translocation between Chromosome 11 and Chromosome 21? Provide the gene symbols of the 5' and 3' partners.

Data files (staged into the agent's work_dir from recmiryoehog9bvce.zip · download zip):

FUSION_ANON_R1.fastq.gz — 280.7 MB
FUSION_ANON_R2.fastq.gz — 287.1 MB

Total: 2 file(s), 567.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: APP-FABP5P7, formatted as the fusion gene identified with both 5' and 3' partners (e.g., IL15-IL21, NDUFA5-EDDM3A, CALM1-PSMB4). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recmp75e1chtpzx3c — human-difficult

Question

What bacteria was spiked into this sample? Provide the scientific name (genus and species) of the bacteria.

Data files (staged into the agent's work_dir from recmp75e1chtpzx3c.zip · download zip):

read1.fastq.gz — 11.7 MB
read2.fastq.gz — 12.7 MB

Total: 2 file(s), 24.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Phocaeicola vulgatus. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recn4p43tkgazjeqy — human-solvable

Question

At which codon position in the TP53 gene is there a truncating mutation?

Data files (staged into the agent's work_dir from recn4p43tkgazjeqy.zip · download zip):

TP53_ANON_R1.fastq.gz — 544.6 MB
TP53_ANON_R2.fastq.gz — 547.1 MB

Total: 2 file(s), 1.07 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is 331, formatted as the codon position in TP53 where the truncating mutation occurs (e.g. 24, 50, 293). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recn5sa8gwpqsx15g — human-solvable

Question

Given genotype data for 60 individuals with anonymized IDs, cluster these individuals into 5 distinct population groups corresponding to 1000 Genomes populations. Report the cluster assignments for each individual using 1000 Genomes population codes (CEU, CHB, YRI, GIH, PEL). Format: Sample Population (sample ID followed by its population code), one per line.

Data files (staged into the agent's work_dir from recn5sa8gwpqsx15g.zip · download zip):

chr22_5pops.vcf — 12.7 MB

Total: 1 file(s), 12.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer should output the following cluster: Sample Population 541ac5004885ba90 CEU d1656a9587b44753 CEU bac5f9583ebb8b50 CEU 291f2dce7be2f5d2 CEU b69df83b8275e4e8 CEU 125f960d977bae81 CEU 1d72c35722c8373e CEU 86536d896aa07648 CEU 8a90f9a98d83dab7 CEU 90afb12a79aeeca8 CEU 4aa3628a6fbc1e29 CEU 7a40737409123ebe CEU 7ba7f40386b27b68 CHB 510c4973b0bc9580 CHB b80cf513c2238bf8 CHB db2dd192ec27e11a CHB 77878a1f1a7af1e5 CHB 292be2e72ac509bb CHB a6bae0a9b81cb848 CHB f75af60d0acc6dfe CHB e0a320fddcdb6a03 CHB 4b2468421db7e54c CHB e25169c0812eb5e8 CHB 1bb726a4a16f0e68 CHB 893cd5bd1b554abc YRI a72d4ccaa4790b98 YRI 14822e14737a23f3 YRI 09e1e400865ba4e8 YRI 0baf2b8ea06e0e77 YRI 36873a1990eb27e8 YRI 9f59c9db85e157ed YRI 4a165ccb61de0588 YRI ae724edb65955578 YRI 819624a6be772dd6 YRI 2986ae3f56ec6da2 YRI 3ff9bb62071d938c YRI 63825db996750803 GIH c5626c2a6030f93f GIH e2b4676c9bfb0675 GIH 95ec6ce5628876e2 GIH def6f65731c93087 GIH 41ea65e7c25658c7 GIH f2e82d48b908bbae GIH 8ecd62111ac7eddc GIH f3c3345df678c84a GIH 10b1a8bba5fc2e8e GIH 446677490490f804 GIH c9b4c51ed3b81a2a GIH 649fe9ba892d5303 PEL afe2d9be38664397 PEL 1c8598c00efccfe2 PEL 0439fb4db978b4b0 PEL ceb150f38bed866c PEL 7746d67d4cbceee3 PEL c921aa240d4bd7ea PEL 34d8671d0464dbc8 PEL e6d027606328124d PEL 0c9f15ed65752960 PEL 01039f0d6a3af43f PEL 22e1bcf09880a598 PEL Must get at least 95% correct. Score 1.0 if at least 95% correct, 0.0 otherwise. No partial credit.

recnayu0v8zttjlgf — human-solvable

Question

Which of these RNA-seq samples are technical replicates? Sort these 15 samples into 3 sets of 5 technical replicates. List each group with sample numbers separated by commas (e.g., Group 1: 1,2,3,4,5).

Data files (staged into the agent's work_dir from recnayu0v8zttjlgf.zip · download zip):

BMB-NHK-Task3/sample10_R1.fastq.gz — 520.8 MB
BMB-NHK-Task3/sample10_R2.fastq.gz — 541.5 MB
BMB-NHK-Task3/sample11_R1.fastq.gz — 419.3 MB
BMB-NHK-Task3/sample11_R2.fastq.gz — 434.5 MB
BMB-NHK-Task3/sample12_R1.fastq.gz — 662.7 MB
BMB-NHK-Task3/sample12_R2.fastq.gz — 689.9 MB
BMB-NHK-Task3/sample13_R1.fastq.gz — 535.9 MB
BMB-NHK-Task3/sample13_R2.fastq.gz — 557.9 MB
BMB-NHK-Task3/sample14_R1.fastq.gz — 392.7 MB
BMB-NHK-Task3/sample14_R2.fastq.gz — 409.9 MB
BMB-NHK-Task3/sample15_R1.fastq.gz — 382.5 MB
BMB-NHK-Task3/sample15_R2.fastq.gz — 401.8 MB
BMB-NHK-Task3/sample1_R1.fastq.gz — 369.6 MB
BMB-NHK-Task3/sample1_R2.fastq.gz — 383.6 MB
BMB-NHK-Task3/sample2_R1.fastq.gz — 354.5 MB
BMB-NHK-Task3/sample2_R2.fastq.gz — 373.7 MB
BMB-NHK-Task3/sample3_R1.fastq.gz — 371.6 MB
BMB-NHK-Task3/sample3_R2.fastq.gz — 383.9 MB
BMB-NHK-Task3/sample4_R1.fastq.gz — 606.7 MB
BMB-NHK-Task3/sample4_R2.fastq.gz — 625.4 MB
BMB-NHK-Task3/sample5_R1.fastq.gz — 310.3 MB
BMB-NHK-Task3/sample5_R2.fastq.gz — 332.0 MB
BMB-NHK-Task3/sample6_R1.fastq.gz — 282.1 MB
BMB-NHK-Task3/sample6_R2.fastq.gz — 294.2 MB
BMB-NHK-Task3/sample7_R1.fastq.gz — 369.3 MB
BMB-NHK-Task3/sample7_R2.fastq.gz — 385.1 MB
BMB-NHK-Task3/sample8_R1.fastq.gz — 410.7 MB
BMB-NHK-Task3/sample8_R2.fastq.gz — 428.5 MB
BMB-NHK-Task3/sample9_R1.fastq.gz — 630.9 MB
BMB-NHK-Task3/sample9_R2.fastq.gz — 653.2 MB

Total: 30 file(s), 13.20 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: 15 samples correctly split into 3 groups of technical replicates. Group 1: 1,3,6,11,14; Group 2: 2,5,7,8,15; Group 3: 4,9,10,12,13. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
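One way to compare a submitted grouping against this rubric is to treat each grouping as a partition of the 15 samples. The sketch below assumes that group numbering and the order of samples within a group are arbitrary; the rubric does not state this explicitly, so that is an assumption, as is the helper name `same_partition`.

```python
def same_partition(expected_groups, answer_groups):
    """True if both groupings split the samples into the same groups.

    Assumption (not stated in the rubric): group labels and the order of
    samples inside a group do not matter. Duplicated samples still cause
    a mismatch because each group is compared as a sorted tuple.
    """
    def normalize(groups):
        return sorted(tuple(sorted(group)) for group in groups)

    return normalize(expected_groups) == normalize(answer_groups)


# Expected groups as listed in the rubric above.
EXPECTED_GROUPS = [
    [1, 3, 6, 11, 14],
    [2, 5, 7, 8, 15],
    [4, 9, 10, 12, 13],
]
```

With this helper, an answer that lists the same three sets under different group numbers would be treated as correct, while moving a single sample between groups would not.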

recnheebqpbdp1nj9 — human-solvable

Question

What viral subspecies/strain are these cells most likely infected with (assume that there is only one valid infection)? Provide the full virus name including the strain designation (e.g., 'Virus species strain StrainName').

Data files (staged into the agent's work_dir from recnheebqpbdp1nj9.zip · download zip):

infected-hepatocytes-1.anon.fastq.gz — 705.6 MB
infected-hepatocytes-2.anon.fastq.gz — 673.7 MB
infected-hepatocytes-3.anon.fastq.gz — 689.5 MB
viral-genomes-filtered.fna — 390.5 MB

Total: 4 file(s), 2.40 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Yellow fever virus strain Asibi. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recnquldskiadnpq8 — human-difficult

Question

Which gene has been knocked down in samples 1,3,4,11 (GENE1), and which gene in samples 7,9,10,12 (GENE2) compared to samples 2,5,6,8 which have no genes knocked down? Provide answers as Ensembl gene IDs.

Data files (staged into the agent's work_dir from recnquldskiadnpq8.zip · download zip):

sample10_R1.fastq.gz — 422.4 MB
sample10_R2.fastq.gz — 415.3 MB
sample11_R1.fastq.gz — 422.5 MB
sample11_R2.fastq.gz — 415.2 MB
sample12_R1.fastq.gz — 420.9 MB
sample12_R2.fastq.gz — 409.7 MB
sample1_R1.fastq.gz — 421.7 MB
sample1_R2.fastq.gz — 417.3 MB
sample2_R1.fastq.gz — 423.3 MB
sample2_R2.fastq.gz — 419.0 MB
sample3_R1.fastq.gz — 422.7 MB
sample3_R2.fastq.gz — 414.1 MB
sample4_R1.fastq.gz — 422.5 MB
sample4_R2.fastq.gz — 416.3 MB
sample5_R1.fastq.gz — 421.0 MB
sample5_R2.fastq.gz — 414.5 MB
sample6_R1.fastq.gz — 421.9 MB
sample6_R2.fastq.gz — 418.1 MB
sample7_R1.fastq.gz — 420.3 MB
sample7_R2.fastq.gz — 412.5 MB
sample8_R1.fastq.gz — 422.4 MB
sample8_R2.fastq.gz — 414.1 MB
sample9_R1.fastq.gz — 421.5 MB
sample9_R2.fastq.gz — 412.4 MB

Total: 24 file(s), 9.81 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: GENE1: ENSG00000173801, GENE2: ENSG00000178209, where GENE1 is knocked down in samples 1,3,4,11 and GENE2 in samples 7,9,10,12. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recoyp6qrymldcjle — human-solvable

Question

What symbiont bacterial genome is contained within this FASTQ file? Provide the bacterial genus name.

Data files (staged into the agent's work_dir from recoyp6qrymldcjle.zip · download zip):

read1.fastq.bz2 — 713.1 MB
read2.fastq.bz2 — 752.9 MB

Total: 2 file(s), 1.43 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is Wolbachia, formatted as the bacterial genus name. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recqgsfxqqodhjens — human-solvable

Question

Identify the transcription factor whose binding sites are represented by the provided peaks data. Provide the transcription factor's gene symbol.

Data files (staged into the agent's work_dir from recqgsfxqqodhjens.zip · download zip):

peaks_file.bed — 2.3 MB

Total: 1 file(s), 2.3 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is CTCF. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recqjfzttushuxz4j — human-solvable

Question

What yeast gene on chromosome 1 was knocked out in this dataset? Provide the standard gene symbol (e.g., TDA1, not the systematic ORF name).

Data files (staged into the agent's work_dir from recqjfzttushuxz4j.zip · download zip):

read1.fastq.gz — 149.8 MB
read2.fastq.gz — 149.8 MB

Total: 2 file(s), 299.5 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is TDA8. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recrgfvzr5rrjpim7 — human-solvable

Question

Identify the sample that corresponds to the H3K27me3 histone modification experiment from 30 anonymized ChIP-seq peak files. Provide your answer as the filename (e.g., SampleX.bed) corresponding to the H3K27me3 experiment.

Data files (staged into the agent's work_dir from recrgfvzr5rrjpim7.zip · download zip):

Sample1.bed — 6.7 MB
Sample10.bed — 11.7 MB
Sample11.bed — 9.6 MB
Sample12.bed — 5.3 MB
Sample13.bed — 8.0 MB
Sample14.bed — 14.3 MB
Sample15.bed — 2.5 MB
Sample16.bed — 16.0 MB
Sample17.bed — 4.6 MB
Sample18.bed — 8.8 MB
Sample19.bed — 7.3 MB
Sample2.bed — 18.7 MB
Sample20.bed — 4.0 MB
Sample21.bed — 10.6 MB
Sample22.bed — 7.8 MB
Sample23.bed — 5.2 MB
Sample24.bed — 9.5 MB
Sample25.bed — 13.6 MB
Sample26.bed — 7.0 MB
Sample27.bed — 18.3 MB
Sample28.bed — 8.8 MB
Sample29.bed — 8.9 MB
Sample3.bed — 7.9 MB
Sample30.bed — 5.2 MB
Sample4.bed — 1.7 MB
Sample5.bed — 6.3 MB
Sample6.bed — 2.9 MB
Sample7.bed — 3.3 MB
Sample8.bed — 7.4 MB
Sample9.bed — 4.3 MB

Total: 30 file(s), 246.1 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is Sample15.bed, corresponding to the H3K27me3 histone modification experiment. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recro5s1o0odyssqs — human-solvable

Question

What protein-coding transcript is spiked into this FASTQ file and appears at a level much higher than any other transcript? Provide the answer as the Ensembl transcript name (e.g., GENE-201).

Data files (staged into the agent's work_dir from recro5s1o0odyssqs.zip · download zip):

read1.fastq.gz — 844.4 MB

Total: 1 file(s), 844.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is ARID1A-215, formatted as the Ensembl transcript name. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recsvhviava5okg19 — human-difficult

Question

Which COSMIC single-base substitution (SBS) mutational signature has the highest relative contribution (%) in the breast tumor samples? Provide the answer as the COSMIC SBS signature ID (e.g., SBS1).

Data files (staged into the agent's work_dir from recsvhviava5okg19.zip · download zip):

mutation_matrix.txt — 866.0 KB

Total: 1 file(s), 866.0 KB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is the COSMIC SBS mutational signature ID: SBS3. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

rectaxd8eganpl4lw — human-solvable

Question

After Cas9-mediated genome editing, what percentage of sequencing reads show modifications (insertions, deletions, or substitutions) at the target locus? Report to the nearest 1%. Include all modification types (insertions, deletions, and substitutions) in the quantification window.

Data files (staged into the agent's work_dir from rectaxd8eganpl4lw.zip · download zip):

AMPLICON.txt — 134 B
CRISPR_ANON_R1.fastq.gz — 860.6 KB
CRISPR_ANON_R2.fastq.gz — 937.0 KB

Total: 3 file(s), 1.8 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is 77%, formatted as the percentage of sequencing reads showing modifications, to the nearest 1% (e.g., 1%, 12%, 48%, 99%). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.
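The "nearest 1%" formatting can be made explicit with a small helper. This is only a sketch of the rounding arithmetic, assuming the modified and total read counts have already been obtained from some analysis of the amplicon reads. The function name `percent_modified` is hypothetical, and rounding ties upward is an assumption, since the rubric does not say how ties are handled.

```python
def percent_modified(modified_reads, total_reads):
    """Format modified/total as a whole-number percentage string such as '38%'.

    Hypothetical helper. Uses integer arithmetic so that ties round up;
    Python's built-in round() rounds ties to the nearest even integer.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= modified_reads <= total_reads:
        raise ValueError("modified_reads must be between 0 and total_reads")
    # floor(100 * m / t + 0.5) written without floating point
    percent = (200 * modified_reads + total_reads) // (2 * total_reads)
    return f"{percent}%"
```

For example, with made-up counts of 3 modified reads out of 8 (37.5%), the helper reports 38%.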

recug1uijclv7ni4q — human-solvable

Question

Which bacterial genome was deliberately spiked into this sample? Please report the genus and species.

Data files (staged into the agent's work_dir from recug1uijclv7ni4q.zip · download zip):

goodReads.fastq — 19.8 MB

Total: 1 file(s), 19.8 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is Staphylococcus aureus. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recv1pkneurxhwpo9 — human-solvable

Question

Gene fusions drive cancer by creating abnormal, hybrid proteins or altering gene expression. In this task, peripheral blood samples from three cancer patients (with the same cancer type and stage) were collected, and genomic DNA and total RNA were extracted for whole-exome DNA sequencing (WES) and whole-transcriptome RNA sequencing (RNA-seq). Use the datasets to identify the gene fusion shared by all three cancer patients. Report the gene fusion using standard nomenclature (e.g., GENE1-GENE2 format with official gene symbols).

Data files (staged into the agent's work_dir from recv1pkneurxhwpo9.zip · download zip):

task3_data/P1RNA_L1_R1.fastq.gz — 591.1 MB
task3_data/P1RNA_L1_R2.fastq.gz — 612.2 MB
task3_data/P1RNA_L2_R1.fastq.gz — 591.1 MB
task3_data/P1RNA_L2_R2.fastq.gz — 612.2 MB
task3_data/P1WES_L1_R1.fastq.gz — 791.2 MB
task3_data/P1WES_L1_R2.fastq.gz — 831.5 MB
task3_data/P1WES_L2_R1.fastq.gz — 791.0 MB
task3_data/P1WES_L2_R2.fastq.gz — 831.3 MB
task3_data/P2RNA_L1_R1.fastq.gz — 487.6 MB
task3_data/P2RNA_L1_R2.fastq.gz — 504.5 MB
task3_data/P2RNA_L2_R1.fastq.gz — 487.7 MB
task3_data/P2RNA_L2_R2.fastq.gz — 504.6 MB
task3_data/P2WES_L1_R1.fastq.gz — 957.6 MB
task3_data/P2WES_L1_R2.fastq.gz — 1.02 GB
task3_data/P2WES_L2_R1.fastq.gz — 958.0 MB
task3_data/P2WES_L2_R2.fastq.gz — 1.02 GB
task3_data/P3RNA_L1_R1.fastq.gz — 424.3 MB
task3_data/P3RNA_L1_R2.fastq.gz — 429.5 MB
task3_data/P3RNA_L2_R1.fastq.gz — 424.3 MB
task3_data/P3RNA_L2_R2.fastq.gz — 429.5 MB
task3_data/P3WES_L1_R1.fastq.gz — 823.8 MB
task3_data/P3WES_L1_R2.fastq.gz — 876.3 MB
task3_data/P3WES_L2_R1.fastq.gz — 823.8 MB
task3_data/P3WES_L2_R2.fastq.gz — 876.3 MB

Total: 24 file(s), 16.36 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is the BCR-ABL1 gene fusion, which may be written as BCR-ABL1, BCR-ABL, Philadelphia chromosome (Ph), or t(9;22)(q34;q11). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recv7lmpypdi61mdi — human-solvable

Question

Identify which protein-coding oncogene has been amplified on an extrachromosomal DNA. Provide the answer as a HUGO gene symbol.

Data files (staged into the agent's work_dir from recv7lmpypdi61mdi.zip · download zip):

ECDNA_ANON_R1.fastq.gz — 823.0 MB
ECDNA_ANON_R2.fastq.gz — 854.6 MB

Total: 2 file(s), 1.64 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: MYC, formatted in accordance with HUGO gene nomenclature. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recvgrwtebczhn3k8 — human-solvable

Question

Identify the mother and father of subject qU3pFkp4d4ILfsLN from a cohort of 1114 individuals with whole genome SNP-microarray data. Report the mother and father using their subject IDs as they appear in the PLINK format data files.

Data files (staged into the agent's work_dir from recvgrwtebczhn3k8.zip · download zip):

recVgRWtebCzHN3k8_plink.bed — 429.4 MB
recVgRWtebCzHN3k8_plink.bim — 42.9 MB
recVgRWtebCzHN3k8_plink.fam — 46.8 KB

Total: 3 file(s), 472.4 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer: the father is gREcfFvmbcsC14s5 and the mother is HsT9LHu1x4zoT5nl, given as the subject IDs that appear in the PLINK format data files. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recvnlq3i6id6qqge — human-solvable

Question

In the given RNA-seq expression data matrix, identify which tissue the sample “sampleID_2” comes from. Provide the tissue name, including the specific anatomical region if applicable.

Data files (staged into the agent's work_dir from recvnlq3i6id6qqge.zip · download zip):

anonymizedGeneExp.tsv.gz — 105.2 MB

Total: 1 file(s), 105.2 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is sampleID_2: Brain Frontal Cortex. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recvwctg0xadnklms — human-solvable

Question

What human tissue does this RNA-seq sample represent? Choose from a list of 68 possible tissues represented by 2555 samples. Provide the tissue name exactly as it appears in the sample metadata tissue column.

Data files (staged into the agent's work_dir from recvwctg0xadnklms.zip · download zip):

gene_read_counts_2555_tissue_samples.csv — 399.4 MB
gene_read_counts_unknown_tissue_sample.csv — 1.1 MB
metadata_2555_tissue_samples.csv — 133.4 KB

Total: 3 file(s), 400.6 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: Prostate. The answer must be a tissue name chosen from the 68 tissue labels in the tissue column of the sample metadata file. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recx4bsaa5zoxy3nv — human-difficult

Question

Identify the gene that was knocked down in the K562 cell line based on the provided transcript-level quantification. The candidate list is: MBNL1, ADD3, VEGFA, RBFOX1, RBFOX2, PTBP1, QKI, SRSF1, SRSF2, HNRNPA1, NOVA1, SF3B1, HSP90AB1, EGR1, HBB, ALB, ACTB, RBM39, GADD45A.

Data files (staged into the agent's work_dir from recx4bsaa5zoxy3nv.zip · download zip):

ctrl1.tsv — 27.5 MB
ctrl2.tsv — 27.5 MB
kd1.tsv — 27.5 MB
kd2.tsv — 27.4 MB

Total: 4 file(s), 109.9 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: MBNL1. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recxdlpmtviybebk8 — human-solvable

Question

Given 15 mapped BAM files in which identifiers, sequencing lane information, and file names are anonymized, identify all six BAM files that represent an ATAC-seq assay.

Data files (staged into the agent's work_dir from recxdlpmtviybebk8.zip · download zip):

sample10_annon.bam — 460.6 MB
sample10_annon.bam.bai — 3.5 MB
sample11_annon.bam — 376.5 MB
sample11_annon.bam.bai — 3.7 MB
sample12_annon.bam — 389.4 MB
sample12_annon.bam.bai — 3.9 MB
sample13_annon.bam — 270.9 MB
sample13_annon.bam.bai — 4.0 MB
sample14_annon.bam — 300.5 MB
sample14_annon.bam.bai — 4.3 MB
sample15_annon.bam — 303.9 MB
sample15_annon.bam.bai — 4.3 MB
sample1_annon.bam — 165.5 MB
sample1_annon.bam.bai — 4.7 MB
sample2_annon.bam — 168.1 MB
sample2_annon.bam.bai — 4.7 MB
sample3_annon.bam — 167.3 MB
sample3_annon.bam.bai — 4.7 MB
sample4_annon.bam — 202.6 MB
sample4_annon.bam.bai — 4.5 MB
sample5_annon.bam — 207.6 MB
sample5_annon.bam.bai — 4.4 MB
sample6_annon.bam — 196.6 MB
sample6_annon.bam.bai — 4.5 MB
sample7_annon.bam — 435.9 MB
sample7_annon.bam.bai — 3.5 MB
sample8_annon.bam — 368.4 MB
sample8_annon.bam.bai — 3.7 MB
sample9_annon.bam — 465.7 MB
sample9_annon.bam.bai — 3.5 MB

Total: 30 file(s), 4.43 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The six BAM files that correspond to ATAC-seq are: sample1_annon.bam, sample2_annon.bam, sample3_annon.bam, sample4_annon.bam, sample5_annon.bam, sample6_annon.bam. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recy5kshz8ysawujt — human-solvable

Question

Which bacterial species is most prominent in this sample? Provide the answer as the full scientific name (genus and species).

Data files (staged into the agent's work_dir from recy5kshz8ysawujt.zip · download zip):

read1.fastq.gz — 875.8 MB
read2.fastq.gz — 885.4 MB

Total: 2 file(s), 1.72 GB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is Fibrobacter succinogenes. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recyolnajpygz2xeo — human-solvable

Question

Identify the anonymized sampleID that corresponds to the experiment that immunoprecipitated the transcription factor (TF) TP53.

You are provided with 30 anonymized ChIP-seq peak files (BED format). Each peak file represents an immunoprecipitation experiment performed on a human embryonic stem cell (hESC) line. The files are labeled only as sample1.bed through sample30.bed.

Data characteristics:

Assembly: hg38 (GRCh38)
File format: BED (chr, start, end, name, score, strand)
Number of anonymized TF peak files: 30
One sample per transcription factor
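The six-column BED layout described above can be read with a few lines of Python. The sketch below assumes tab-delimited lines, which is an assumption about these particular files, and the names `BedPeak` and `parse_bed_line` are introduced here purely for illustration. BED coordinates use a 0-based start and an exclusive end, so a peak's length is simply end minus start.

```python
from dataclasses import dataclass


@dataclass
class BedPeak:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: str   # kept as text; some tools write "." instead of a number
    strand: str  # "+", "-", or "."

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed_line(line):
    """Parse one tab-delimited BED line with at least six columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields[:6]
    return BedPeak(chrom, int(start), int(end), name, score, strand)
```

Keeping the score as text is a deliberate choice in this sketch, since the meaning and range of the score column vary between peak callers.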

Data files (staged into the agent's work_dir from recyolnajpygz2xeo.zip · download zip):

sample1.bed — 2.0 MB
sample10.bed — 1.9 MB
sample11.bed — 632.3 KB
sample12.bed — 707.6 KB
sample13.bed — 256.3 KB
sample14.bed — 1.5 MB
sample15.bed — 390.4 KB
sample16.bed — 36.6 KB
sample17.bed — 107.6 KB
sample18.bed — 418.7 KB
sample19.bed — 348.9 KB
sample2.bed — 226.2 KB
sample20.bed — 579.4 KB
sample21.bed — 356.3 KB
sample22.bed — 36.8 KB
sample23.bed — 1.4 MB
sample24.bed — 56.2 KB
sample25.bed — 49.8 KB
sample26.bed — 748.7 KB
sample27.bed — 47.5 KB
sample28.bed — 86.6 KB
sample29.bed — 42.5 KB
sample3.bed — 1.8 MB
sample30.bed — 1.5 MB
sample4.bed — 1.0 MB
sample5.bed — 374.0 KB
sample6.bed — 122.6 KB
sample7.bed — 424.9 KB
sample8.bed — 795.1 KB
sample9.bed — 476.4 KB

Total: 30 file(s), 18.2 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is sample4.bed, corresponding to the TP53 ChIP-seq peaks. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

recyomvehwpj8s6t1 — human-solvable

Question

Which major mtDNA haplogroup does this sample originate from? Provide the major haplogroup as a single letter (A-Z).

Data files (staged into the agent's work_dir from recyomvehwpj8s6t1.zip · download zip):

MTDNA_ANON_R1.fastq.gz — 120.5 MB
MTDNA_ANON_R2.fastq.gz — 128.2 MB

Total: 2 file(s), 248.7 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. The correct answer is J, referencing the major mtDNA haplogroup as a single letter (A-Z). Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.

reczkg8fvfp1fo3nn — human-solvable

Question

Given H3K27ac ChIP-seq peaks from an unknown cell type, identify the cell type from 20 candidates by analyzing super-enhancer profiles. Candidate cell types: B_cell_GM12878, Smooth_muscle, Endothelial_cell, T_cell_CD4, T_cell_CD8, NK_cell, Monocyte, Macrophage, Dendritic_cell, Neutrophil, Erythroid_K562, Megakaryocyte, ESC_H1, HSC, Hepatocyte_HepG2, Epithelial_lung, Epithelial_colon, Fibroblast, Neuron, Cardiomyocyte

Data files (staged into the agent's work_dir from reczkg8fvfp1fo3nn.zip · download zip):

h3k27ac_peaks.bed.gz — 1.2 MB

Total: 1 file(s), 1.2 MB

Answer rubric (spoiler)

Give all or nothing credit. Do not award partial credit. Expected answer is: B_cell_GM12878. Score 1.0 if the answer meets the criteria above, 0.0 otherwise. No partial credit.