Powerful Data Management

Our group has a dedicated computing cluster for research on big biomedical data. The cluster comprises 24 compute nodes with a total of 674 CPU cores, 6 terabytes of RAM and 2 petabytes of storage capacity.
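
For a rough sense of scale, the totals above can be turned into per-node averages. The sketch below assumes, purely for illustration, that resources are spread evenly across the nodes and that decimal units apply (1 TB = 1000 GB); the actual configuration of individual nodes is not stated here and may differ.

```python
# Totals quoted above for the cluster.
NODES = 24
CPU_CORES = 674
MEMORY_TB = 6
STORAGE_PB = 2


def per_node_averages(nodes, cores, memory_tb, storage_pb):
    """Return average cores, memory (GB) and storage (TB) per node.

    Assumes an even split and decimal units; real nodes may differ.
    """
    return {
        "cores": cores / nodes,
        "memory_gb": memory_tb * 1000 / nodes,
        "storage_tb": storage_pb * 1000 / nodes,
    }


averages = per_node_averages(NODES, CPU_CORES, MEMORY_TB, STORAGE_PB)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions each node averages roughly 28 cores, 250 GB of memory and 83 TB of storage.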

Efficient Big Data Management Platform

The cluster runs Hops (Hadoop Open Platform-as-a-Service), a next-generation distribution of Apache Hadoop. Hopsworks, the user-interface front end to Hops, lowers the barrier to entry for users getting started with Hadoop by providing graphical access to Hops platform services such as Spark, Flink, Kafka, HDFS, YARN, Python, and Jupyter and Zeppelin notebooks.

Hopsworks includes user management, project and dataset management, analytics, metadata management and free-text search.

Users

  • Secure authentication (normal/two-factor).
  • Defined roles.

Projects and Datasets

  • Project-based multi-tenancy with dynamic roles.
  • Secure sharing of DataSets between projects (reuse of DataSets without copying).
  • DataSet browser.
  • Import and export of data using the DataSet browser.

Analytics

  • Interactive data analytics with Jupyter and Zeppelin notebooks.
  • Apache Spark, a general-purpose cluster computing system and analytics engine for big data processing.
  • High-level APIs in Java, Scala, Python and R.
  • Long-running jobs and batch-based job submission with Spark.
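
As an illustration of the Python API listed above, here is a minimal word-count sketch in PySpark. It is not taken from the cluster's own documentation: the application name, the project and file names in the HDFS path, and the helper functions are all hypothetical, and the sketch assumes PySpark is available wherever the job runs.

```python
from operator import add


def tokenize(line):
    """Split a line of text into lowercase words."""
    return line.lower().split()


def word_counts(spark, path):
    """Count word occurrences in the text file(s) at `path` using Spark.

    `spark` is an existing pyspark.sql.SparkSession.
    """
    lines = spark.sparkContext.textFile(path)
    counts = (
        lines.flatMap(tokenize)          # one record per word
        .map(lambda word: (word, 1))     # pair each word with a count of 1
        .reduceByKey(add)                # sum the counts for each word
    )
    # collect() brings all results to the driver; fine for small outputs only.
    return dict(counts.collect())


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()
    # Hypothetical dataset path; replace with a file in your own project.
    path = "hdfs:///Projects/demo_project/Resources/notes.txt"
    try:
        for word, count in sorted(word_counts(spark, path).items()):
            print(word, count)
    finally:
        spark.stop()
```

Calling main() would run the sketch as a batch job. In a notebook that already provides a Spark session, word_counts(spark, path) can be called directly instead.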

Metadata Management

  • Intuitive user interface.
  • Support for the design and entry of extended metadata for files and directories.

Free-text Search

  • Free-text search with Elasticsearch for files, directories and their extended metadata.
  • Global free-text search for projects and DataSets in the filesystem.
  • Project-based free-text search of all files and extended metadata within a project.