  1. Setting up the vitrivr stack
    1. Overall system requirements
    2. Setting up ADAMpro
      1. Simple Deployment using Docker Container
      2. Native Deployment
      3. Alternative Deployments
    3. Setting up Cineast
    4. Setting up the vitrivr UI
      1. Using the pre-built version
      2. Building the UI
      3. Configuring the UI
  2. Running the Extraction
  3. Retrieval
    1. Configuring features
    2. Setting up the paths
    3. Generating index structures
    4. The first query
  4. Extending and customizing vitrivr
    1. Adding features to Cineast
    2. Customizing the UI

Setting up the vitrivr stack

The following illustrates how to set up the individual components of the vitrivr stack and how to connect them in order to obtain one integrated retrieval system.

Overall system requirements

The vitrivr stack does not require any special hardware and can be deployed on any reasonably modern machine (or on multiple machines for a distributed deployment). Since both ADAMpro and Cineast are capable of operating on multiple CPU cores in parallel, it is recommended to use a multi-core machine. The memory consumption of ADAMpro and Cineast (during extraction) can also be substantial. While Cineast has an integrated swapping mechanism, it is highly recommended to run the vitrivr stack in an environment with 8 GB or more of main memory.

The requirements in terms of software are dependent on the flavor of the deployment. The following list provides an overview of the required software packages:

Setting up ADAMpro

ADAMpro can be easily set up using the provided Docker images.

Simple Deployment using Docker Container

A Docker image of ADAMpro has been released on Docker Hub.

Run the Docker container using (with the recommended ports being opened):

    docker run --name adampro -p 4040:4040 -p 5890:5890 -p 9099:9099 -d vitrivr/adampro:2.1-selfcontained

where port 5890 denotes the port to connect with an application to ADAMpro, port 9099 serves the ADAMpro web UI and port 4040 is for the Spark web UI.
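Once the container is up, a quick way to confirm that the three published ports accept connections is a plain TCP check. The following is a minimal sketch, not part of vitrivr; the function names and the service labels are made up for illustration, and a successful TCP connect only shows that something is listening, not that ADAMpro is fully initialized.

```python
import socket

# Ports published by the `docker run` command above (labels are illustrative).
ADAMPRO_PORTS = {"application": 5890, "adampro_web_ui": 9099, "spark_web_ui": 4040}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_adampro(host: str = "127.0.0.1") -> dict:
    """Map each service label to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ADAMPRO_PORTS.items()}


if __name__ == "__main__":
    for name, reachable in check_adampro().items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```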

The Docker container reads a number of environment variables which can be adjusted, e.g., for better performance. The most important one is ADAMPRO_MEMORY, which is set to 2g by default. A few other environment variables, e.g., ADAMPRO_START_WEBUI, control whether certain parts of ADAMpro are started.
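Environment variables are passed to a container with Docker's standard -e flag. The sketch below assembles the command from above with overridden variables; the helper name is hypothetical, the value 4g is just an example, and the assumption that ADAMPRO_START_WEBUI takes true/false should be checked against the image's documentation.

```python
import shlex


def adampro_docker_command(memory="4g", start_webui=True,
                           image="vitrivr/adampro:2.1-selfcontained"):
    """Build the argument list for `docker run` with overridden variables."""
    return [
        "docker", "run", "--name", "adampro",
        "-p", "4040:4040", "-p", "5890:5890", "-p", "9099:9099",
        # -e sets an environment variable inside the container.
        "-e", f"ADAMPRO_MEMORY={memory}",
        # Assumption: the variable accepts the strings "true" and "false".
        "-e", f"ADAMPRO_START_WEBUI={'true' if start_webui else 'false'}",
        "-d", image,
    ]


# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(adampro_docker_command(memory="4g")))
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting issues.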

Native Deployment

ADAMpro can be built using sbt. The ADAMpro source code can be obtained from the GitHub repository using

    git clone https://github.com/vitrivr/ADAMpro.git 

In order to start the build process, all the above mentioned dependencies need to be installed. We provide various sbt tasks to simplify the deployment and development.

Because of its project structure, for building ADAMpro, you have to first run

    sbt proto

in the main folder. This generates the proto files and places a jar file containing the proto sources in the ./lib/ folder.

Running

    sbt assembly

(and sbt web/assembly for the ADAMpro UI) creates a jar file, which can then be submitted to Apache Spark using

    ./spark-submit --master "local[4]" --driver-memory 2g --executor-memory 2g --class org.vitrivr.adampro.main.Startup $ADAM_HOME/ADAMpro-assembly-0.1.0.jar

which will start four local workers and assign 2 GB of memory each to the driver and the executor. Refer to the Apache Spark documentation for details on these options.

ADAMpro can also be started locally, e.g., from an IDE. For this, remove the % "provided" statements from build.sbt and the marked line ExclusionRule("io.netty"), and run the main class org.vitrivr.adampro.main.Startup. You can use

    sbt run

for running ADAMpro, as well. Note that the storage engines specified in the configuration have to be already running or you have to adjust the config file accordingly.

Configuration

ADAMpro can be configured using a configuration file. This repository contains a ./conf/ folder with configuration files.

When starting ADAMpro, you can provide an adampro.conf file in the same path as the jar, which is then used instead of the default configuration. (Note the file adampro.conf.template, which is used as a template for the Docker container.)

The configuration file can be used to specify configurations for running ADAMpro. The file ADAMConfig.scala reads the configuration file and provides the configurations to the application.

Among other things, the file contains the configuration of the storage engines, for example

      parquet {
        engine = "ParquetEngine"
        hadoop = true
        basepath = "hdfs://spark:9000/"
        datapath = "/adampro/data/"
      }

or

      parquet {
        engine = "ParquetEngine"
        hadoop = false
        path = "~/adampro-tmp/data/"
      }
     

The parameters specified here are passed directly to the storage engines. To find out which parameters a particular storage engine requires, consult its source code or the example configuration files in the configuration folder. The name of the storage engine class is specified in the field engine.

Alternative Deployments

More information on further deployment strategies is available in the development wiki, in particular on the deployment page, which describes a large variety of possibilities for deploying ADAMpro, including native deployments as well as self-contained and distributed deployments using Docker.

Setting up Cineast

Cineast is built using Gradle which comes with a wrapper so it does not have to be installed beforehand. To download and build Cineast, the following commands can be used:

    git clone https://github.com/vitrivr/cineast.git cineast
    cd cineast
    ./gradlew deploy

The last command will download all of Cineast’s dependencies and build an executable jar file, which will be placed in the build/libs directory. After a successful build, this directory contains the cineast.jar file, a cineast.json configuration file, as well as all other files which are used by Cineast.

In order to tell Cineast how to connect to the previously installed ADAMpro instance, the cineast.json file needs to contain the correct values. By default, it contains the following entry:

    ...
    "database": {
            "host" : "127.0.0.1",
            "port": 5890,
            "plaintext": true
    },
    ...

If ADAMpro is running on the same machine on its default port, these values are already correct. In case ADAMpro is installed on a different host or is listening to a different port, the values need to be adjusted accordingly.
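For a remote ADAMpro instance, the adjustment can also be scripted. This is a minimal sketch that assumes cineast.json is strict JSON without comments; the helper names are hypothetical, and all keys other than host and port are left untouched.

```python
import json
from pathlib import Path


def set_database_endpoint(config: dict, host: str, port: int) -> dict:
    """Return a copy of the configuration with a new database host and port."""
    updated = dict(config)
    database = dict(updated.get("database", {}))
    database["host"] = host
    database["port"] = port
    updated["database"] = database
    return updated


def rewrite_config(path: str, host: str, port: int) -> None:
    """Load a cineast.json file, update the database entry and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = set_database_endpoint(config, host, port)
    config_path.write_text(json.dumps(updated, indent=4), encoding="utf-8")
```

A call such as rewrite_config("cineast.json", "192.0.2.10", 5890) would point Cineast at an ADAMpro instance on another host; the address shown is a placeholder.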

Once this configuration is done, Cineast should be able to communicate with ADAMpro. Next, Cineast can be instructed to create all necessary entities in ADAMpro which it needs for retrieval. To do this, the following command can be run from within the directory containing both cineast.jar and cineast.json:

    java -jar cineast.jar --setup

In case the Cineast CLI is enabled in the configuration – which it is by default – Cineast will not terminate after the setup but rather wait for further instructions. As there is nothing to do at this point, it can be terminated using either exit or quit.

Setting up the vitrivr UI

Using the pre-built version

We provide a pre-built version of the UI, consisting only of HTML, CSS and JavaScript files, on the releases tab of our GitHub repository. The content of the provided archive only needs to be extracted into the top-level directory of a web host. Afterwards, the UI needs to be configured in order to be able to communicate with Cineast and to find the static multimedia content.

Building the UI

To build the UI from source, the following commands can be used:

    git clone https://github.com/vitrivr/vitrivr-ng.git vitrivr-ng
    cd vitrivr-ng
    npm install
    npm install -g @angular/cli
    ng build -prod

Afterwards, the content of the dist directory needs to be copied to the top-level directory of the web host which is to be used. From there, the configuration needs to be adjusted in order for the UI to know how to communicate with Cineast.

Configuring the UI

All the configuration in the UI is stored in the config.json file. The configuration contains several parts, only one of which is relevant at this point, the part prefixed with api. It tells the UI how to reach Cineast. In case Cineast is deployed on the same machine as the UI and vitrivr is only supposed to be operated locally, the default configuration does not need to be changed. In case vitrivr is also supposed to work over a network, the host field needs to contain an externally resolvable address of the machine running Cineast.

    ...
    "api": {
        "host" : "127.0.0.1", //replace with external address
        "port" : 4567,
        "protocol_http": "http",
        "protocol_ws": "ws",
        "ping_interval": 10000
    },
    ...
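To make the meaning of the individual fields concrete, the sketch below derives the HTTP and WebSocket addresses that the api block describes. How Vitrivr NG assembles its URLs internally is not covered here, so treat this only as an illustration for checking that the configured address is reachable from a client machine; the function name is hypothetical.

```python
def api_endpoints(api: dict) -> dict:
    """Derive the base addresses described by the "api" configuration block."""
    authority = f"{api['host']}:{api['port']}"
    return {
        "http": f"{api['protocol_http']}://{authority}",
        "ws": f"{api['protocol_ws']}://{authority}",
    }


# Values taken from the default configuration shown above.
default_api = {
    "host": "127.0.0.1",
    "port": 4567,
    "protocol_http": "http",
    "protocol_ws": "ws",
    "ping_interval": 10000,
}
print(api_endpoints(default_api))
```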

Running the Extraction

After the setup is complete, vitrivr is ready to receive some multimedia documents. To add documents to the collection known to vitrivr, Cineast has to perform an extraction process on them. The properties of such an extraction job are specified in a job file. Several examples of such job files can be found on GitHub.

Shown below is a basic example of an extraction job file for videos, as specified by the type property. For every extraction, only one type of multimedia document is processed.

    {
        "type": "VIDEO",
        "input": {
            "path": "/path/to/data/videos/",
            "depth": 2,
            "skip": 0,
            "id": {
                "name": "UniqueObjectIdGenerator",
                "properties": {}
            }
        },
        "extractors": [
            { "name": "AverageColor" },
            { "name": "AverageColorRaster" },
            { "name": "CLD" },
            { "name": "EdgeGrid16" },
            { "name": "EHD" },
            { "name": "DominantEdgeGrid16" },
            { "name": "SubDivMotionHistogram5" },
            { "name": "SubDivMotionHistogramBackground5" },
            { "name": "HOGMirflickr25K512" },
            { "name": "SURFMirflickr25K512" }
        ],
        "exporters": [{
            "name": "ShotThumbNails",
            "properties": {
                "destination": "/path/to/thumbnails/"
            }
        }]
    }

The first block after the type specification is the input block, which contains information on where the files are located and how they should be traversed in the file system. In case a relative file location is provided in the path property, it is considered to be relative to the location of the job file. Cineast will however store the file paths relative to itself, so the stored paths for the documents may differ from what is specified in the job file.

The next block lists the extractors which are to be used for this job. Each entry in this list refers to an extraction module and names the Java class to be loaded. These modules produce the actual feature vectors during the extraction.

Similar to the extractors, the exporters also process the decoded information from the documents. They do however not produce any feature vectors but are instead used to export information which can be used elsewhere. In this example, only a single exporter is used which generates thumbnail images to be used as static preview during retrieval.
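When several collections need to be extracted, job files with the structure shown above can be generated rather than written by hand. The following is a minimal sketch; the function name and its parameters are hypothetical, and leaving out the exporters block when no thumbnails are wanted is an assumption that should be verified against the example job files.

```python
import json


def build_extraction_job(media_type, input_path, extractors,
                         thumbnail_dir=None, depth=2):
    """Assemble a job description mirroring the example job file above."""
    job = {
        "type": media_type,
        "input": {
            "path": input_path,
            "depth": depth,
            "skip": 0,
            "id": {"name": "UniqueObjectIdGenerator", "properties": {}},
        },
        "extractors": [{"name": name} for name in extractors],
    }
    if thumbnail_dir is not None:
        # Exporter producing the static preview images used during retrieval.
        job["exporters"] = [{
            "name": "ShotThumbNails",
            "properties": {"destination": thumbnail_dir},
        }]
    return job


job = build_extraction_job(
    "VIDEO", "/path/to/data/videos/",
    ["AverageColor", "CLD", "EHD"],
    thumbnail_dir="/path/to/thumbnails/",
)
print(json.dumps(job, indent=4))
```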

To run an extraction job, Cineast has to be called with the job file as a parameter. Since the extraction can have a large memory footprint, especially for video, it is recommended to pre-allocate sufficient memory to avoid triggering the internal swapping mechanisms – which would increase extraction time – or running out of memory, which would cause the extraction to abort. The following example starts an extraction job with 8 GB of pre-allocated memory:

    java -Xmx8g -Xms8g -jar cineast.jar --job /path/to/job.json

Retrieval

After the extraction of the first multimedia objects is complete, vitrivr will be able to use them for retrieval. For retrieval to work properly however, some settings might need to be adjusted.

Configuring features

Depending on the types of documents within a collection and the features used during extraction, the configuration which tells Cineast which features to use during retrieval needs to be changed. This is done in the cineast.json configuration file using the retriever.features property. It specifies which feature categories exist and which retrieval modules each of them comprises. The categories themselves need to match those defined in the UI, as they will be used as query parameters. For retrieval to work efficiently, only those features which were used during extraction should be present in the categories used during retrieval. Every feature should only be present in one category. Within a category, each feature can be assigned a weight which is used for result fusion within the category. These weights determine the influence of an individual feature on the result of a category and can be tuned depending on the use case at hand.
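The rule that every feature should appear in only one category is easy to violate when editing the configuration by hand. The check below is a minimal sketch that assumes the categories are represented as a mapping from category name to a list of entries with feature and weight fields; this layout, the category names and the function name are assumptions for illustration and should be compared with the actual cineast.json.

```python
from collections import defaultdict


def features_in_multiple_categories(categories: dict) -> dict:
    """Return every feature listed in more than one category.

    The result maps the feature name to the sorted list of categories
    in which it occurs.
    """
    seen = defaultdict(set)
    for category, entries in categories.items():
        for entry in entries:
            seen[entry["feature"]].add(category)
    return {feature: sorted(cats) for feature, cats in seen.items() if len(cats) > 1}


# Hypothetical configuration fragment with one misplaced feature.
example = {
    "globalcolor": [
        {"feature": "AverageColor", "weight": 2.0},
        {"feature": "CLD", "weight": 1.0},
    ],
    "edge": [
        {"feature": "EHD", "weight": 1.0},
        {"feature": "AverageColor", "weight": 0.5},
    ],
}
print(features_in_multiple_categories(example))
```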

After the features are configured correctly, start Cineast using

    java -jar cineast.jar

Setting up the paths

In order for the UI to be able to display the retrieved content correctly, it needs to know how the actual documents (original files) and optional preview images (thumbnails) can be accessed via HTTP(S). The relevant base paths must be specified in its config.json configuration file in the resources block using the host_thumbnails and host_object properties.

The final paths are constructed by the ResolverService class relative to the base paths set up in the configuration. For thumbnails, Vitrivr NG expects a folder structure under the base path based on the media type (‘audio’, ‘video’, ‘image’, ‘model3d’) followed by the media object’s ID. For the original files, Vitrivr NG expects folders separating the media types under the base path. The path known to Cineast is resolved directly against this media-type-specific folder. In case a different folder structure is required for a particular installation, the ResolverService can be adjusted to reflect these differences.
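The folder layout described above can be written down as two small functions. This is a sketch of the described convention only, not the ResolverService implementation; the function names are hypothetical, and the naming of the individual thumbnail files inside the object's folder is not specified here and therefore left out.

```python
from urllib.parse import quote

MEDIA_TYPES = ("audio", "video", "image", "model3d")


def _check_media_type(media_type: str) -> None:
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"unknown media type: {media_type!r}")


def thumbnail_folder_url(host_thumbnails: str, media_type: str, object_id: str) -> str:
    """Base path, then media type, then the media object's ID."""
    _check_media_type(media_type)
    return f"{host_thumbnails.rstrip('/')}/{media_type}/{quote(object_id)}/"


def object_url(host_object: str, media_type: str, path: str) -> str:
    """Resolve the path known to Cineast against the media type folder."""
    _check_media_type(media_type)
    # quote() keeps "/" intact and escapes characters such as spaces.
    return f"{host_object.rstrip('/')}/{media_type}/{quote(path.lstrip('/'))}"
```

With a base path of http://localhost/objects, a video stored by Cineast as holiday/clip.mp4 would therefore be expected at http://localhost/objects/video/holiday/clip.mp4.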

Generating index structures

Especially for large collections, it is beneficial to use the index support provided by ADAMpro. These indexes can be generated via the ADAMpro web interface, which is available on the machine running ADAMpro on port 9099. Since Cineast currently does not use approximate indexes, the Vector Approximation index should be created.

The first query

After the setup, vitrivr should be ready for search. If you look at the UI, the side bar on the left hand side of the screen contains all the relevant elements for query formulation. Query formulation is centered around the concept of query containers and query terms. Upon execution, the query terms within a query container are connected through a logical AND relationship whereas different query containers are connected by a logical OR. Query containers can be added directly by clicking the green (+) button. Per container, the UI presents the end-user with a choice of up to five query terms — image, audio, 3D model, motion and text (the selection can depend on the configuration). Each term can be toggled and only one instance of a query term of each type can be active per container. Generally, a query term allows the end-user to either select or create a reference document for similarity. For example, the UI includes a sketchpad that can be used to draw sketches for Query-by-Sketch. It is also possible to upload files like images, audio snippets, or 3D models for Query-by-Example. Moreover, execution of some query terms can be refined through additional settings, which influence the feature modules that will be used.
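The structure of a query can be pictured as a small data model. The sketch below only illustrates the rules stated above (terms within a container are combined with AND, containers with OR, at most one term per type and container); the class, the type keys and the helper are hypothetical and do not correspond to the Vitrivr NG model classes.

```python
TERM_TYPES = ("image", "audio", "model3d", "motion", "text")


class QueryContainer:
    """Holds at most one query term per term type."""

    def __init__(self):
        self.terms = {}  # term type -> reference document or sketch

    def toggle(self, term_type, reference=None):
        """Activate the term if it is inactive, otherwise deactivate it."""
        if term_type not in TERM_TYPES:
            raise ValueError(f"unknown term type: {term_type!r}")
        if term_type in self.terms:
            del self.terms[term_type]
        else:
            self.terms[term_type] = reference


def describe(containers):
    """Render the logical structure of a query as a readable expression."""
    groups = [
        "(" + " AND ".join(sorted(container.terms)) + ")"
        for container in containers
        if container.terms
    ]
    return " OR ".join(groups)


first, second = QueryContainer(), QueryContainer()
first.toggle("image", "sketch.png")
first.toggle("text", "sunset")
second.toggle("audio", "snippet.wav")
print(describe([first, second]))
```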

Once a query has been formulated, a click on the search button at the top of the left side panel starts the retrieval process. On the Cineast side, partial results are aggregated and transmitted per feature category. Vitrivr NG displays these partial results as they become available and usually updates the view several times in the process. Currently, there are three different views for presenting results: two types of gallery views and a simple list view. You can navigate between those views using the toolbar (top). Switching between views does not influence the result set, and you can navigate even while the query is being executed.

As results become available, the panel on the right side of the screen is updated with additional options (if it is not visible, you can activate it using the appropriate button in the toolbar). Using this panel, the weights for the different categories can be adjusted which will update the ranking and thus the order of the results. Furthermore, one can toggle media type filters. Since these operations are executed entirely by the UI, no further communication with the back-end is necessary. However, they only operate on the result set that is available locally.
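Because re-ranking and filtering only touch data that is already in the browser, they amount to a local computation over the result set. The sketch below illustrates the idea with a weighted sum of per-category scores; the fusion actually used by Vitrivr NG is not described here, so the weighted sum, the data layout and the function name are assumptions.

```python
def rerank(results, category_weights, allowed_media_types=None):
    """Re-rank locally available results without contacting the back-end.

    results: list of dicts with the keys "id", "media_type" and "scores",
             where "scores" maps a category name to a similarity score.
    Returns (id, fused score) pairs, best match first.
    """
    ranked = []
    for result in results:
        # Media type filter: skip results of types that are toggled off.
        if allowed_media_types is not None and result["media_type"] not in allowed_media_types:
            continue
        fused = sum(
            category_weights.get(category, 0.0) * score
            for category, score in result["scores"].items()
        )
        ranked.append((result["id"], fused))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical result set with two categories.
local_results = [
    {"id": "a", "media_type": "video", "scores": {"color": 0.9, "edge": 0.1}},
    {"id": "b", "media_type": "image", "scores": {"color": 0.2, "edge": 0.8}},
]
print(rerank(local_results, {"color": 1.0, "edge": 0.0}))
```

Changing the weights or the filter simply means calling the function again on the same local data, which is why results not yet received from Cineast cannot be affected.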

Extending and customizing vitrivr

The vitrivr stack is designed in such a way that it can easily be extended and adjusted to different use cases.

Adding features to Cineast

Due to its modular architecture, adding further feature modules to Cineast is simple and does not even require a rebuild of Cineast itself. Feature modules are loaded via Java reflection whenever they are needed during extraction or retrieval. Therefore, they just need to be present in the classpath of Cineast. We provide an example repository which shows how such feature modules are constructed. Once the newly added modules are present in the classpath, they can be used in the same way as those provided with Cineast.

Customizing the UI

All the different UI components in Vitrivr NG have been realized as dedicated Angular components. Adjusting the UI therefore boils down to either adjusting existing Angular components or adding new ones. All you need is knowledge of Angular and TypeScript.

Results presentation

Adding new forms of results presentation is straightforward and only requires two steps. First, one must create a new component that deals with the presentation logic. Inheriting from the existing AbstractResultsViewComponent class ensures that all the wiring with the QueryService is already in place. See the implementation of the GalleryComponent or ListComponent for an example. Second, one must provide the user with a means to navigate to the new view. This can be achieved by adding a navigation rule to the AppRoutingModule and adding a link to the toolbar. It is also possible to create a component from scratch, in which case the interface with the QueryService must be established manually.

Communication endpoints

All communication facilities for the Cineast API are implemented as services. Angular services are reusable classes that can be injected into components and other services. Currently, there is a QueryService singleton, which provides the similarity query functionality, and a MetadataLookupService, which enables lookup of metadata entries. It is easy to add new services that connect to other WebSocket or RESTful endpoints exposed by Cineast. The CineastAPI service class provides the basic communication primitives required for WebSocket communication and can be reused in other classes. The entire communication layer uses an Observer pattern powered by RxJS, and we encourage users to adopt the same pattern when creating extensions.

Additional query components

The query container system described previously is modular by design and new types of query terms can be added by creating the respective model entities and UI components. If you want to support new modalities, however, then adding query terms to Vitrivr NG is only the first step in the process. Obviously, Cineast must be extended as well in order to support these modalities.