Sherlock in OSS: A Novel Approach of Content-Based Searching in Object Storage System
Object Storage Systems (OSS) in the cloud promise scalability, durability, availability, and concurrency. However, open-source OSS offers no dedicated way for users and administrators to search stored data by its content without involving the entire cloud infrastructure. In this paper, we therefore propose Sherlock, a novel Content-Based Searching (CoBS) architecture that extracts additional information from images and documents and stores it in an Elasticsearch-enabled database, so that the desired data can be found by its contents. The approach works in two sequential stages. First, uploaded data is passed to a classifier that determines its type and forwards it to the model for that type: images are sent to our trained object-detection model, and documents are sent for keyword extraction. Next, the extracted information is sent to Elasticsearch, which enables content-based search. Because the precision of the models is fundamental to the correctness of the search, we train them on comprehensive datasets: the Microsoft COCO dataset for multimedia data and the SemEval2017 dataset for document data. Furthermore, we test the architecture with a real-world implementation on OpenStack Swift, an open-source OSS. In addition, we upload the images in various segments to evaluate the efficacy of our proposed model in a real-life Swift object storage deployment.
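The two-stage flow described above (type classification and per-type extraction, then indexing into Elasticsearch) can be sketched as follows. This is a minimal illustration under stated assumptions, not Sherlock's actual implementation: the classifier here simply guesses the MIME type from the object name with Python's standard `mimetypes` module, the object-detection and keyword-extraction models are replaced by caller-supplied stub functions, and the index-document field names (`container`, `object`, `type`, `labels`) and helper names are hypothetical. The sketch returns the JSON-ready document that would be indexed, plus an Elasticsearch `match` query body for searching by a content label.

```python
import mimetypes
from typing import Callable, Dict, List, Optional

# Hypothetical extractor type: takes raw object bytes and returns text labels
# (detected object classes for images, keywords for documents).
Extractor = Callable[[bytes], List[str]]

# Non-text MIME types treated as documents in this sketch (an assumption).
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def classify(object_name: str) -> str:
    """Stage 1a: guess the data type of an object from its name."""
    mime, _ = mimetypes.guess_type(object_name)
    if mime is None:
        return "other"
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("text/") or mime in DOCUMENT_MIME_TYPES:
        return "document"
    return "other"


def process_object(container: str, object_name: str, data: bytes,
                   extractors: Dict[str, Extractor]) -> dict:
    """Stages 1b and 2: run the matching extractor and build the index document."""
    kind = classify(object_name)
    extractor: Optional[Extractor] = extractors.get(kind)
    labels = extractor(data) if extractor else []  # unknown types get no labels
    return {
        "container": container,          # Swift container holding the object
        "object": object_name,           # Swift object name
        "type": kind,
        "labels": sorted(set(labels)),   # deduplicated content labels
    }


def content_query(term: str) -> dict:
    """Elasticsearch Query DSL body matching objects by a content label."""
    return {"query": {"match": {"labels": term}}}


if __name__ == "__main__":
    # Stub extractors standing in for the trained models (hypothetical).
    stubs = {
        "image": lambda data: ["dog", "person", "dog"],
        "document": lambda data: ["object storage", "search"],
    }
    print(process_object("photos", "park.jpg", b"\xff\xd8", stubs))
    print(content_query("dog"))
```

In a real deployment, the returned dictionary would be sent to Elasticsearch through its index API and the query body to its search API; keeping extraction separate from indexing means a better detector or keyword model can be swapped in without changing the indexing step.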