Petabyte-scale genomic data repository enabling automated submission pipelines, access control, data augmentation, data mining and query services.
Secure, reliable, high-speed transfer and remote random access optimized for very large files.
Collaboration portal integrating graphical views and advanced search across all GNOS-enabled repositories.
Local high-performance compute and storage for data analysis.
At Annai Systems we’ve been busy overcoming some persistent problems of big data that stand in the way of unleashing the full power of genomics in research and medicine. Ours are not the glamorous problems of perfecting, analyzing and interpreting genomic data. We leave that to hundreds of brilliant bioinformaticians. No, ours are the nagging operational problems of holding the data, protecting it, accessing it, searching it, transferring it, sharing it, and excising only the segments of it that really interest researchers at a moment in time. Oh, and aggregating datasets massive enough to allow algorithms to find an elusive signal emitted by clinically significant clusters of heterogeneous rare variants. We’re the network engineers in hardhats and jumpsuits who are crushing these operational problems with industrial-strength solutions built to support the era of genomic medicine.
The Annai-GNOS data management solution enables a streamlined workflow through a combination of state-of-the-art data transport, search, collaboration and analysis tools. From cataloguing to searching to transporting data, our solution provides comprehensive functionality via remote access to networked repositories worldwide. Our products and services are designed to facilitate collaboration while addressing data security considerations and regulations. Below is an overview of the Annai-GNOS environment:
Literature & Publications
Annai Systems Helps Make Sense of Genomics
Annai Systems to Host Data from International Cancer Genome Consortium
ShareSeq™ Platform Will Accelerate Use of ICGC Data in Research and Medicine
CARLSBAD, CA – November 28, 2014 – Annai Systems Inc., a pioneer in genomic data management solutions, today announced an agreement with the Ontario Institute for Cancer Research (OICR) to host data from more than 10,000 human tumors to be generated by the International Cancer Genome Consortium (ICGC). As a premier partner of OICR, Annai will make ICGC data available on ShareSeq, its secure cloud-based platform designed to enable researchers and clinicians to overcome many of the bottlenecks that commonly impede the extraction of meaning and value from genomic data.
Deployment of ICGC data sets on ShareSeq will begin immediately. Once the data sets are deployed, researchers may either download raw ICGC data from public sites at no charge or access ICGC data online in the ShareSeq environment, which provides normalized and processed data, integrated storage for private data, data search and retrieval, high-performance computing, flexible analytical workflows, data sharing, and expert bioinformatics support. ShareSeq’s value-added capabilities and services are available on either a subscription or pay-for-use basis. In the near future, researchers using ShareSeq will also be able to aggregate ICGC data with their own data or with other data sets that have been pre-analyzed or normalized and made ready for biological, pathway, functional, or clinical analysis.
“The combination of the ICGC dataset and ShareSeq is an invaluable and ground-breaking resource that directly addresses the needs of cancer researchers by eliminating many of the major challenges associated with accessing, analyzing and managing today’s large data sets,” commented Dr. Lincoln Stein, Director of OICR’s Informatics and Bio-Computing Program and Director of the ICGC’s Data Coordination Centre housed in Toronto, Canada. “We are excited to be working with Annai Systems to make ShareSeq the premier genomics platform for oncology research,” added Stein.
“The mobilization of this important ICGC data on ShareSeq represents another key milestone in the realization of our vision for an interconnected network of data repositories and computing nodes that will dramatically accelerate the use of genomic data in research and medicine,” said Dr. Thomas Schlumpberger, Vice President of Business Development and Sales of Annai Systems.
About Annai Systems
Annai Systems Inc., with offices in San Francisco and Carlsbad, California, is a technology company with core competency in big data management solutions for genomics. Annai Systems offers products and services to producers and consumers of genomic data that enable them to overcome key data-related bottlenecks and roadblocks that impede progress in research and medicine.
Annai Systems and Hitachi Data Systems Enter Into Agreement to Deliver Data Analysis and Management Solutions
Unique Combination of Data Management Software and Compute Resources Will Deliver Powerful Genomic Data Platform
CARLSBAD, CA – April 22, 2014 – Annai Systems Inc., a pioneer in genomic data management solutions, today announced that it is working with Hitachi Data Systems (HDS) to provide an integrated resource for data management and analysis that combines Hitachi Data Systems’ expertise in high-performance compute and cloud-based solutions with the proven Annai-GNOS™ data management software platform. Annai Systems is a preferred partner under the Hitachi Data Systems Technology Alliance program.
As part of this agreement, Annai Systems will offer a number of different products and services, underpinned by Hitachi Content Platform (HCP), to enable researchers to better utilize genomic data, with an initial focus on cancer-related research. This cloud-based offering will deliver unprecedented access to high-value data sets in a scalable environment designed to overcome big data challenges. The solution has been designed to give researchers a means to answer key biomedical questions that have long eluded them, in part due to the complexity of data access.
“We are excited to be working with the Hitachi Data Systems team to take advantage of the synergies that our products and technologies will bring to bear on the data-related challenges of cancer research,” said Michael Penley, Annai’s Chief Executive Officer. “We have a unique opportunity to deliver highly impactful solutions that can change the way researchers use large cancer-related data sets.”
David Wilson, Senior Director Health & Life Sciences, Hitachi Data Systems, commented, “We are proud to work with Annai Systems on this integrated system for genomic data access, analysis and management. The system puts to good use the unique metadata gathering and intelligence tools in Hitachi Content Platform that bring crucial structure to unstructured file data for intelligent automation and deeper analysis of data.”
The combined product offering will be available later in the year. Both companies will be attending the upcoming Bio-IT World conference in Boston, Massachusetts, from April 29 to May 1. Please visit each company at its booth to learn more about these products and services: Annai in booth 417, Hitachi Data Systems in booths 348 and 350.
Annai Systems Appoints Francisco M. De La Vega as Chief Scientific Officer
Distinguished industry scientist brings significant bioinformatics and genomics expertise to data management pioneer
CARLSBAD, CA – March 4, 2014 – Annai Systems Inc., a genomic data management solution provider, today announced that Francisco M. De La Vega, D.Sc., a leading expert in the analysis and applications of high-throughput sequencing, has been appointed to the newly created role of Chief Scientific Officer.
“We are pleased to have a scientist of Francisco’s caliber become part of the Annai team,” said Michael Penley, Annai Systems’ Chief Executive Officer. “His breadth of experience and commercial track record will prove invaluable to our continued growth and delivery of world class genomic data management solutions.”
Francisco was previously Vice President of Genome Science at Real Time Genomics, a data analysis solutions provider. He also spent thirteen years at Applied Biosystems (currently Thermo Fisher), where he was most recently the Distinguished Scientific Fellow and Vice President of Next-Generation Sequencing Applications. During his tenure at AB, he oversaw the research, development, and validation of numerous innovative genetic analysis technologies and bioinformatics tools enabling high-throughput biology. Francisco received the Bio-IT World Best Practices Award in Basic Research in 2008 for a collaborative project with the Christian Albrechts University of Kiel leading to the identification of a novel Crohn’s disease gene. He has been an active participant in trend-setting community projects such as the 1000 Genomes Project, where he was a member of both the steering committee and the analysis group, and the Genome-in-a-Bottle consortium, where he serves on the steering committee. Francisco earned his Doctor of Science degree in Genetics and Molecular Biology at the Center for Research and Advanced Studies of the National Polytechnic Institute of Mexico. He has also been a Visiting Instructor at the Department of Genetics of the Stanford School of Medicine, where he performed research in population and clinical genomics.
Dr. De La Vega commented, “Given the current state and future direction of genomic-based medicine and the role that sequencing is playing in this exciting evolution, data analysis and management are key enabling capabilities to unlock the full potential of this data. Annai Systems’ data management software and related services, as well as their engagement with the cancer research community, place them in a strong position to continue to address many of the challenges that must be overcome in order to advance the field.”
About Annai Systems
Annai Systems Inc., with offices in California’s Silicon Valley and Carlsbad, California, is a technology company with a core competency in big data management solutions.
We are focused on applying our technology to the field of genomics to enable easier, more efficient access to genomic data. Our technology enables the producers and consumers of genomic data to achieve better results faster and at lower operating costs by providing products that overcome key data-related bottlenecks and roadblocks that are impeding progress in the growing fields of genomic research and medicine. Annai Systems offers a variety of products and services to address the “big data” challenges associated with using genomic data in personalized medicine and healthcare improvement.
Annai Systems Software to Manage Genome Sequence Data for Pan Cancer Project at Six Centers Worldwide
Data from 2,000 cancer genomes to be jointly analyzed using uniform algorithms.
Los Gatos, CA (PRWEB) February 12, 2014 – Annai Systems Inc., a provider of genomic data management solutions, today announced that it has partnered with the Ontario Institute for Cancer Research (OICR) and the International Cancer Genome Consortium (ICGC) to provide its Annai-GNOS™ data management software in support of six data centers around the world that will house a large data set for the Pan Cancer Project.
The Pan Cancer Project is an international effort to enable the further improvement and standardization of data analysis methods, including somatic mutation calling pipelines. This project is part of ICGC’s continued focus on generating comprehensive catalogues of genomic abnormalities in tumors from 50 different cancer types and/or subtypes.
The Pan Cancer data set consists of 2,000 whole genome tumor-normal pairs, many of which have also undergone transcriptome, DNA methylation and other analyses. The intention of the project is to define and organize a core analysis package that will be run for each cancer-normal whole genome pair. The sequence and associated data will be uploaded to one of six cloud computing facilities, where alignment to the reference human genome will be performed and basic variant calling algorithms run. These analyses will generate VCF format files containing all somatic mutations identified from the major classes of variants (substitutions, small indels, genomic rearrangements, copy number changes, retrotranspositions). Many multinational bioinformatics teams will perform downstream analysis on these somatic mutations with the objective of arriving at one standard analytic pipeline.
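As a rough illustration of what downstream tooling might do with such VCF output, the Python sketch below tallies coarse variant classes from VCF text. It is not the Pan Cancer pipeline, whose details are not given here. The function names, the three class labels and the sample records are all hypothetical and synthetic, and the classification rules are deliberately simplified (for instance, copy number changes and retrotranspositions are not distinguished from other structural calls).

```python
from collections import Counter

def classify_allele(ref, alt):
    """Assign one ALT allele to a coarse variant class (simplified rules)."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural"      # symbolic allele or breakend notation
    if len(ref) == len(alt):
        return "substitution"    # single- or multi-base substitution
    return "small_indel"         # REF and ALT lengths differ

def tally_variant_classes(vcf_lines):
    """Count variant classes over the data lines of a VCF file."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue             # skip blank lines, meta lines and the header
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]   # VCF columns 4 (REF) and 5 (ALT)
        for alt in alts.split(","):        # multi-allelic sites list several ALTs
            counts[classify_allele(ref, alt)] += 1
    return counts

# Synthetic records written for this sketch; they are not real project data.
SYNTHETIC_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "1\t200\t.\tAT\tA\t.\tPASS\t.",
    "2\t300\t.\tC\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900",
    "2\t400\t.\tG\tG]17:198982]\t.\tPASS\tSVTYPE=BND",
    "3\t500\t.\tC\tT,CA\t.\tPASS\t.",
]

counts = tally_variant_classes(SYNTHETIC_VCF)
for variant_class, n in sorted(counts.items()):
    print(variant_class, n)
```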
Annai Systems’ software will be used initially by the cloud computing facilities to upload, store, provide access to and manage approximately one petabyte of data that will be replicated at six prominent data centers around the world.
“We selected Annai Systems as our data management partner based on their previous work with the Cancer Genome Hub (CGHub), which currently houses the Cancer Genome Atlas (TCGA) data,” commented Dr. Lincoln Stein, Director of OICR’s Informatics and Bio-Computing Program and Director of the ICGC’s Data Coordination Centre housed in Toronto, Canada. “It was important to be able to rapidly establish these data centers and ensure that the data could be securely uploaded and stored, and easily queried and accessed by a number of researchers. Annai’s GNOS platform was the ideal solution for this project,” added Stein.
“Our partnership with OICR and ICGC represents the next step in the evolution of Annai Systems as the premier provider of genomic data management software solutions,” said Michael Penley, Annai’s Chief Executive Officer. “We are proud to have been chosen to provide the infrastructure and support for this important project and to work with these leading organizations in advancing cancer research and genomic-based medicine.”
About Annai Systems
Annai Systems Inc., with offices in Silicon Valley and Carlsbad, California, is a technology company with core competency in big data management solutions. Annai Systems offers products and services to producers and consumers of genomic data that enable them to generate high quality results faster and at lower cost by efficiently overcoming key data-related bottlenecks and roadblocks that impede progress in research and medicine.
What our customers are saying . . .
“Using GTFuse got us a 99.9% reduction in the data size and doing it on Annai’s BioCompute Farm was ten times faster. It took us less than a day instead of eight and a half weeks of downloads.” (GTFuse)
“Human time was three weeks and, when they used ShareSeq, less than a day. Without GTFuse and ShareSeq, I would have done it but I would have had to carefully set expectations and it would have taken a huge amount of time.” (ShareSeq)
“Before, we had discussed doing analyses across the whole RNASeq data set of TCGA. Now, that’s become a lot more possible with GTFuse.” (GTFuse)
Frequently Asked Questions
What is unique about Annai Systems’ products?
What is Annai-GNOS?
Annai-GNOS is a unique integration of the data repository infrastructure and high-speed networking capabilities needed to use large genomic data sets more effectively. These data sets are characterized by diverse file formats, extensive metadata, and large files, with individual sequence datasets ranging from 10 gigabytes to more than 1 terabyte in size (depending on the depth of coverage).
Annai-GNOS allows the entire user community to see the state of data throughout the submission lifecycle, including data that has not yet been approved or submitted for download. Researchers can query the state of data as soon as it is submitted and quickly identify submissions that require intervention before they are available to users of the repository.
Flexible metadata searching greatly simplifies finding the right sequence file, and a highly fault-tolerant design ensures that services remain available. The Annai-GNOS network functionality integrates secure, high-speed network protocols to mobilize petabyte-scale genomic data analysis.
What is GeneTorrent?
GeneTorrent is encrypted, accelerated file transfer software that enables whole genome sequence files to be transferred quickly and securely to and from data repositories. GeneTorrent was modeled after BitTorrent to optimize file transfer and to enable the transfer of multiple genome sequences in a reasonable amount of time.
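To give a feel for the BitTorrent-style idea mentioned above, here is a minimal Python sketch of piece-based transfer: a file is split into fixed-size pieces, each piece is checked against a SHA-1 hash (as in the original BitTorrent protocol), and only pieces that fail verification need to be fetched again. This is not GeneTorrent's implementation and it omits encryption entirely. The function names, the tiny piece size and the sample payload are assumptions chosen for illustration.

```python
import hashlib

PIECE_SIZE = 4  # tiny, for illustration only; real transfers use far larger pieces

def split_pieces(data, piece_size=PIECE_SIZE):
    """Cut the payload into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_hashes(pieces):
    """Hash every piece so a receiver can verify each one independently."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]

def reassemble(received, expected_hashes):
    """Verify pieces that arrived in any order.

    `received` maps piece index -> bytes. Returns (data, bad_indices);
    data is None while any piece is missing or fails its hash check.
    """
    bad = [
        index for index, expected in enumerate(expected_hashes)
        if index not in received
        or hashlib.sha1(received[index]).hexdigest() != expected
    ]
    if bad:
        return None, bad
    return b"".join(received[i] for i in range(len(expected_hashes))), []

payload = b"ACGTTGCAACGTAA"            # stand-in for a large sequence file
pieces = split_pieces(payload)
expected = piece_hashes(pieces)

# Pieces arrive out of order, and piece 1 is corrupted in transit.
received = {2: pieces[2], 0: pieces[0], 3: pieces[3], 1: b"XXXX"}
first_try, bad = reassemble(received, expected)

# Only the failed piece is fetched again, not the whole file.
received[1] = pieces[1]
second_try, still_bad = reassemble(received, expected)
```

Because each piece verifies on its own, pieces can be requested in parallel and in any order, which is the property that makes this style of transfer attractive for very large files.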
What is GTFuse?
GTFuse enables researchers to directly access remote sequence data files as if they were on the local file system. GTFuse allows researchers to “mount” the desired data and immediately run existing tools such as SAMtools to inspect the header and begin accessing specific regions of the sequence data (for example, data from a particular chromosome, gene, or region of interest).
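The practical consequence of mounting a remote file is that ordinary file operations work on it, including seeking to an offset and reading only the bytes needed. The Python sketch below shows that access pattern with a temporary local file standing in for a mounted sequence file. It does not use or depict any GTFuse command or API, and the helper name, offsets and file contents are made up for illustration.

```python
import os
import tempfile

def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` without reading the rest of the file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)

# A temporary local file stands in for a file under a hypothetical mount point,
# so the sketch runs anywhere. The layout below is synthetic.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"x" * 1000 + b"REGION-OF-INTEREST" + b"y" * 1000)
    demo_path = tmp.name

header = read_range(demo_path, 0, 6)        # inspect only the start of the file
region = read_range(demo_path, 1006, 18)    # jump straight to one region
os.remove(demo_path)

print(header, region)
```

Region-aware tools follow the same pattern: an index tells them which byte offsets hold the requested region, so only that slice has to cross the network.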
What is ShareSeq?
ShareSeq is a high-performance compute environment that is co-located with CGHub. This means that researchers can analyze CGHub files without having to move them into their own compute environment. Users have full access to their secure, private virtual machines on ShareSeq and are able to load and use any tools they wish. Additionally, Annai provides a local library with common tools such as GATK, Bowtie, Picard, SAMtools and many others. Several reference genomes are installed on the golden image. You may add any reference genome you wish to your own machine image.