Datasets
Unity hosts a variety of commonly used public datasets so that Unity workloads can access them easily. All of Unity’s hosted datasets are in the /datasets directory, which is available from any Unity node.
You can also browse /datasets in the Open OnDemand file browser by selecting the “/datasets” entry in the “Files” dropdown. For information about each dataset, see the entries below.
AI and ML
Code Llama
ImageNet
ImageNet 1K
LAION
Llama2
Mixtral
Bioinformatics
AlphaFold databases
ColabFold databases
Dfam
Open collection of transposable element DNA sequence alignments, hidden Markov models (HMMs), consensus sequences, and genome annotations.
/datasets/bio/dfam/
eggNOG
A database of orthology relationships, functional annotation, and gene evolutionary histories.
/datasets/bio/eggnog-data/
/datasets/bio/eggnog6-data/
NCBI NT, NR, Eukaryotic, and Prokaryote databases
NCBI’s databases are downloaded weekly. See the full details for more information.
/datasets/bio/ncbi-db/
Tara Oceans
/datasets/bio/tara-oceans/MGT-transcriptomes/
/datasets/bio/tara-oceans/MATOU-gene-catalog/
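The dataset directories listed above can also be inspected from a script running on a Unity node. The following is a minimal sketch, not an official Unity tool: the helper name list_subdirectories is hypothetical, and the example assumes /datasets/bio is readable from the node where the script runs. It returns an empty list when the directory does not exist, so it can be run safely off the cluster.

```python
from pathlib import Path


def list_subdirectories(root):
    """Return the sorted names of the immediate subdirectories of root.

    Hypothetical helper for illustration. Returns an empty list if root
    does not exist or is not a directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    # Keep directories only; plain files such as READMEs are skipped.
    return sorted(entry.name for entry in root_path.iterdir() if entry.is_dir())


if __name__ == "__main__":
    # Assumption: /datasets/bio is mounted, as it is on Unity nodes.
    for name in list_subdirectories("/datasets/bio"):
        print(name)
```

Pass any other path from the list, for example /datasets/bio/ncbi-db/, to see what that dataset directory contains at the next level.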