Requirements

Some or all of these may already be installed on your development machine.

Building

Clone the repository in your desired local directory and build it:

git clone https://github.com/OryxProject/oryx.git oryx
cd oryx
mvn -DskipTests package

This will build the following binaries:

  • Batch Layer: deploy/oryx-batch/target/oryx-batch-2.3.0.jar
  • Speed Layer: deploy/oryx-speed/target/oryx-speed-2.3.0.jar
  • Serving Layer: deploy/oryx-serving/target/oryx-serving-2.3.0.jar

Note that if you are interested in developing on Oryx, you should probably fork this repository and then work on your own fork, so that you can submit pull requests with changes.

Platform Only

The default build includes end-to-end ML applications based on Spark MLlib and other libraries. To build only the lambda tier and ML tier, for use with your own app, disable the app-tier profile: -P!app-tier. Note that in bash, ! is reserved, so you may need to add -P\!app-tier.

Testing

mvn test runs all unit tests. mvn verify will run all integration tests too, which takes significantly longer.

Module Mapping

Major modules and their relation to tiers and layers:

Serving Speed Batch
Binary oryx-serving oryx-speed oryx-batch
App oryx-app-serving oryx-app-mllib oryx-app oryx-app-mllib oryx-app
ML oryx-ml oryx-ml
Lambda oryx-lambda-serving oryx-lambda oryx-lambda

Supporting modules like oryx-common, oryx-app-common, oryx-api, and oryx-app-api are not shown.

Making an Oryx App

Oryx comes with an “app tier”, implementations of actual Batch, Speed and Serving Layer logic for recommendation, clustering and classification. However, any implementation may be used with Oryx. They can be mixed and matched too. For example, you could reimplement the Batch Layer for ALS-related recommendation and instead supply this alternative implementation while still using the provided ALS Serving and Speed Layers.

Creating an App

In each case, creating a custom Batch, Speed or Serving Layer app amounts to implementing a few key Java interfaces or Scala traits in com.cloudera.oryx.api. These interfaces/traits are found in the oryx-api module within the project.

Java Scala
Batch batch.BatchLayerUpdate batch.ScalaBatchLayerUpdate
Speed speed.SpeedModelManager speed.ScalaSpeedModelManager
Serving serving.ServingModelManager serving.ScalaServingModelManager

com.cloudera.oryx.api also contains key support classes and interfaces used by these interfaces. For example, serving.OryxResource is a starting point for building custom JAX-RS endpoints, but need not be used.

Building an App

To access these interfaces/traits in your application, add a dependency on com.cloudera.oryx:oryx-api. The scope should be provided.

In Maven, this would mean adding a dependency like:

<dependencies>
  <dependency>
    <groupId>com.cloudera.oryx</groupId>
    <artifactId>oryx-api</artifactId>
    <scope>provided</scope>
    <version>2.3.0</version>
  </dependency>
</dependencies>

A minimal skeleton project can be found at example/. In the spirit of “word count”, this application consumes lines of space-separated words, and counts the number of distinct words that each word occurs with in a line. For example, in “the quicker the better”, each word occurs with 2 other distinct words.

Compile your code and create a JAR file containing only your implementation, and any supporting third-party code. With Maven, this happens with mvn package.

Note: to enable native BLAS acceleration in your app, see additional notes about BLAS in the performance documentation.

Compiling the Word Count Example

Building the entire project with mvn ... package per above will actually build the example. The application JAR is produced at target/example-2.3.0.jar for example.

To rebuild or repackage just the word count example:

cd app/example
mvn package

Note that this won’t work when building from the head of a branch (a SNAPSHOT version) unless you first mvn ... install the project artifacts locally.

Customizing an Oryx App

When deploying the prepackaged applications that come with Oryx, in some cases, it’s possible to supply additional implementations to customize their behavior. For example, the ALS recommender application exposes a com.cloudera.oryx.app.als.RescorerProvider interface. These app-specific API classes are found in module oryx-app-api. Implementations of interfaces like these can be compiled, packaged and deployed in the same way described here for stand-alone applications.

<dependencies>
  <dependency>
    <groupId>com.cloudera.oryx</groupId>
    <artifactId>oryx-app-api</artifactId>
    <scope>provided</scope>
    <version>2.3.0</version>
  </dependency>
</dependencies>

Deploying an App

Copy the resulting JAR file – call it myapp.jar – to the directory containing the Oryx binary JAR file it will be run with.

Change your Oryx .conf file to refer to your custom Batch, Speed or Serving implementation class, as appropriate.

When running the Batch / Speed / Serving Layers, add --app-jar myapp.jar to the oryx-run.sh command line.

 Deploying the Word Count Example

For example, if you’ve built the packaged example “word count” app above, you can run it by copying and adapting the provided wordcount-example.conf configuration file:

./oryx-run.sh batch --conf wordcount-example.conf --app-jar example-2.3.0.jar

… and similarly for the speed and serving layers. Feed lines of input and then observe counts:

curl -X POST http://.../add/foo%20bar%20baz
...
curl http://.../distinct
{"foo":2,"bar":2,"baz":2}

Back to top

Reflow Maven skin by Andrius Velykis.