Cloud-native RStudio on Kubernetes for Hopsworks

07/18/2023
by Gibson Chikafa, et al.

To fully benefit from cloud computing, services are designed following the "multi-tenant" architectural model, which aims to maximize resource sharing among users. However, multi-tenancy introduces challenges of security, performance isolation, scaling, and customization. RStudio server is an open-source Integrated Development Environment (IDE) for the R programming language, accessed through a web browser. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, that follows the multi-tenant model and provides RStudio as Software as a Service (SaaS). We use the most popular cloud-native technologies, Docker and Kubernetes, to solve the performance isolation, security, and scaling problems present in a multi-tenant environment. We further enable secure data sharing between RStudio server instances, which preserves data privacy while allowing collaboration among RStudio users. We integrate our system with Apache Spark, which scales to handle Big Data processing workloads. We also provide a UI through which users can supply custom configurations and keep full control of their own RStudio server instances. We tested our system on a Google Cloud Platform cluster with four worker nodes, each allocated 30 GB of RAM. On this cluster, 44 RStudio servers, each with 2 GB of RAM, ran concurrently. By adding more resources (CPUs and RAM) to the cluster, the system can potentially scale out to support hundreds of concurrently running RStudio servers.
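The abstract credits Docker and Kubernetes with performance isolation but does not show any configuration. The Python below is a minimal sketch under several assumptions. It assumes each user's RStudio server runs as its own Kubernetes Pod in a per-project namespace. It builds that Pod manifest as a plain dictionary, setting memory and CPU requests equal to the limits. It then computes a naive, memory-only upper bound on how many such servers fit on the cluster described above. The naming scheme, the per-project namespace, the `rocker/rstudio` image, and both helper functions are hypothetical and are not taken from the paper. The sketch also treats the abstract's "GB" as GiB for simplicity. Port 8787 is RStudio Server's default.

```python
import json


def rstudio_pod_manifest(user: str, project: str, memory_gib: int = 2,
                         cpu: str = "1",
                         image: str = "rocker/rstudio:latest") -> dict:
    """Build a hypothetical Pod manifest for one user's RStudio server.

    Assumes `user` and `project` are simple alphanumeric strings, so the
    generated name and labels are valid Kubernetes names and label values.
    """
    mem = f"{memory_gib}Gi"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # Hypothetical naming: one Pod per (project, user) pair.
            "name": f"rstudio-{project}-{user}".lower(),
            # Hypothetical isolation boundary: one namespace per project.
            "namespace": project.lower(),
            "labels": {"app": "rstudio", "owner": user.lower()},
        },
        "spec": {
            "containers": [{
                "name": "rstudio",
                "image": image,
                # 8787 is RStudio Server's default listening port.
                "ports": [{"containerPort": 8787}],
                "resources": {
                    # Equal requests and limits put the Pod in the
                    # "Guaranteed" QoS class: the scheduler reserves the
                    # full amount, and the container is OOM-killed if it
                    # exceeds the memory limit (CPU is throttled instead).
                    "requests": {"memory": mem, "cpu": cpu},
                    "limits": {"memory": mem, "cpu": cpu},
                },
            }],
        },
    }


def max_concurrent_servers(num_nodes: int, node_mem_gib: int,
                           server_mem_gib: int,
                           reserved_gib_per_node: int = 0) -> int:
    """Naive memory-only upper bound. Ignores CPU, and ignores per-node
    overhead unless it is passed explicitly as `reserved_gib_per_node`."""
    usable = node_mem_gib - reserved_gib_per_node
    if usable <= 0 or server_mem_gib <= 0:
        return 0
    # A Pod cannot span nodes, so count whole servers per node.
    return num_nodes * (usable // server_mem_gib)


if __name__ == "__main__":
    print(json.dumps(rstudio_pod_manifest("alice", "demo"), indent=2))
    # Cluster from the abstract: 4 worker nodes x 30 GB, 2 GB per server.
    print(max_concurrent_servers(4, 30, 2))  # 60
```

With no reservation the bound is 60 servers, which is above the 44 the authors report. In practice, then, something besides the RStudio memory requests limits the count. Possible causes include system pods, other Hopsworks services, or CPU, but the abstract does not say which one applies.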

Related research

Multitenant Containers as a Service (CaaS) for Clouds and Edge Clouds (04/18/2023)
Cloud computing, offering on-demand access to computing resources throug...

Distributed Log Analysis on the Cloud Using MapReduce (02/10/2018)
In this paper we describe our work on designing a web based, distributed...

The Necessity for Hardware QoS Support for Server Consolidation and Cloud Computing (06/27/2012)
Chip multiprocessors (CMPs) are ubiquitous in most of today's computing ...

A Multi-Tenant Framework for Cloud Container Services (03/24/2021)
Container technologies have been evolving rapidly in the cloud-native er...

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster (04/03/2022)
Most AI projects start with a Python notebook running on a single laptop...

Cloud computing as a platform for monetizing data services: A two-sided game business model (04/26/2021)
With the unprecedented reliance on cloud computing as the backbone for s...

SRv6-PM: Performance Monitoring of SRv6 Networks with a Cloud-Native Architecture (07/15/2020)
Segment Routing over IPv6 (SRv6 in short) is a networking solution for I...