
mlserver
========

This is a simple application that provides an HTTP/JSON API for machine learning. Currently, only classification is implemented. The server is written in Go; the machine learning algorithms are implemented in Python, using the [Scikit-Learn](http://scikit-learn.org/stable/) library. Each model runs inside a separate child process. The models fitted in `fit.py` are pickled using [joblib](http://scikit-learn.org/stable/modules/model_persistence.html) and saved to a folder named `models` in the working directory of mlserver.

### Building/Installing

Make sure you have Mercurial and ZeroMQ installed:

**Mac/Homebrew**
```bash
brew install zeromq
brew install mercurial
```
**Ubuntu**
```bash
sudo apt-get install mercurial
sudo apt-get install libtool autoconf automake uuid-dev build-essential
```
**Fetch and build ZeroMQ 4**
```bash
curl -O http://download.zeromq.org/zeromq-4.0.5.tar.gz
tar zxvf zeromq-4.0.5.tar.gz && cd zeromq-4.0.5
./configure
make
sudo make install
```
If you get `error while loading shared libraries: libzmq.so.4` when trying to run mlserver on Ubuntu, try updating the library cache:
```bash
sudo ldconfig
```

Building the app is fairly simple (assuming Go is installed and `$GOPATH` is set):

```bash
go get github.com/wlattner/mlserver
```
This will clone the repo to `$GOPATH/src/github.com/wlattner/mlserver` and copy the `mlserver` binary to `$GOPATH/bin`.

The code in `fit.py` and `predict.py` requires Python 3, NumPy, SciPy, and Scikit-Learn; these are sometimes tricky to install, so look elsewhere for help.

**Ubuntu**
```bash
sudo apt-get install build-essential python3-dev python3-setuptools python3-numpy python3-scipy libatlas-dev libatlas3gf-base
sudo apt-get install python3-pip
pip3 install scikit-learn
pip3 install pyzmq
```

If you modify `fit.py` or `predict.py`, run `make`. These two files must be embedded in the Go source as raw string values; `make` rewrites `fit_py.go` and `predict_py.go` using the current versions of `fit.py` and `predict.py`.

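A typical workflow after editing the Python scripts might look like this (a sketch; it assumes the repository is checked out under `$GOPATH` and that the Makefile only regenerates the embedded files):

```bash
cd $GOPATH/src/github.com/wlattner/mlserver
# regenerate fit_py.go and predict_py.go from the current fit.py and predict.py
make
# rebuild the mlserver binary
go install github.com/wlattner/mlserver
```
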
### Running

Start the server:
```bash
mlserver
```

By default, the server will listen on port 5000.

TODO
====
- [ ] error handling, especially with fit/predict input
- [ ] automatically stop unused models
- [ ] store models in S3
- [ ] add regression, detect which based on input data
- [ ] better model selection in fit.py
- [ ] better project name
- [ ] config options
- [X] csv file upload for fit/predict input
- [ ] docker container for fit.py and predict.py
- [ ] use kubernetes for fit/predict workers
- [ ] tests

API
===

Get Models
----------

* `GET /models` will return all models on the server

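For example, assuming the server is running locally on the default port:

```bash
curl http://localhost:5000/models
```

The response is a JSON array describing each model:
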
```json
[
  {
    "model_id": "0e12bb73-e49a-4dcd-87aa-cb0338b1c758",
    "metadata": {
      "name": "iris model 1",
      "created_at": "2014-11-06T21:52:16.143688Z"
    },
    "performance": {
      "algorithm": "GradientBoostingClassifier",
      "confusion_matrix": {
        "setosa": {
          "setosa": 50,
          "versicolor": 0,
          "virginica": 0
        },
        "versicolor": {
          "setosa": 0,
          "versicolor": 50,
          "virginica": 0
        },
        "virginica": {
          "setosa": 0,
          "versicolor": 0,
          "virginica": 50
        }
      },
      "score": 0.9673202614379085
    },
    "running": false,
    "trained": true
  },
  {
    "model_id": "26f786c1-5e59-432f-a3b0-8b87025043f8",
    "metadata": {
      "name": "ESL 10.2 Generated Data",
      "created_at": "2014-11-07T00:47:14.602932Z"
    },
    "performance": {
      "algorithm": "GradientBoostingClassifier",
      "confusion_matrix": {
        "-1.0": {
          "-1.0": 5931,
          "1.0": 111
        },
        "1.0": {
          "-1.0": 307,
          "1.0": 5651
        }
      },
      "score": 0.9285000000000001
    },
    "running": false,
    "trained": true
  }
]
```

Get Model
---------
* `GET /models/:model_id` will return the specified model.

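For example, using the model id from the list above:

```bash
curl http://localhost:5000/models/0e12bb73-e49a-4dcd-87aa-cb0338b1c758
```

The response looks like:
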
```json
{
  "model_id": "0e12bb73-e49a-4dcd-87aa-cb0338b1c758",
  "metadata": {
    "name": "iris model 1",
    "created_at": "2014-11-06T21:52:16.143688Z"
  },
  "performance": {
    "algorithm": "GradientBoostingClassifier",
    "confusion_matrix": {
      "setosa": {
        "setosa": 50,
        "versicolor": 0,
        "virginica": 0
      },
      "versicolor": {
        "setosa": 0,
        "versicolor": 50,
        "virginica": 0
      },
      "virginica": {
        "setosa": 0,
        "versicolor": 0,
        "virginica": 50
      }
    },
    "score": 0.9673202614379085
  },
  "running": false,
  "trained": true
}
```

Fit
---

* `POST /models` will create and fit a new model with the supplied training data.

The request body should be JSON with the following fields:

* `name` the name of the model
* `data` an array of objects, each element represents a single row/observation
* `labels` an array of strings representing the target value/label of each training example

To fit a model for predicting the species variable from the [Iris data](http://en.wikipedia.org/wiki/Iris_flower_data_set):

sepal_length | sepal_width | petal_length | petal_width | species
------------ | ----------- | ------------ | ----------- | -------
5.1 | 3.5 | 1.4 | 0.2 | setosa
4.9 | 3.0 | 1.4 | 0.2 | setosa
4.7 | 3.2 | 1.3 | 0.2 | setosa
4.6 | 3.1 | 1.5 | 0.2 | setosa
5.0 | 3.6 | 1.4 | 0.2 | setosa
... | ... | ... | ... | ...

```json
{
  "name": "iris model",
  "data": [
    {
      "sepal_length": 5.1,
      "petal_length": 1.4,
      "sepal_width": 3.5,
      "petal_width": 0.2
    },
    {
      "sepal_length": 4.9,
      "petal_length": 1.4,
      "sepal_width": 3.0,
      "petal_width": 0.2
    },
    {
      "sepal_length": 4.7,
      "petal_length": 1.3,
      "sepal_width": 3.2,
      "petal_width": 0.2
    },
    {
      "sepal_length": 4.6,
      "petal_length": 1.5,
      "sepal_width": 3.1,
      "petal_width": 0.2
    },
    {
      "sepal_length": 5.0,
      "petal_length": 1.4,
      "sepal_width": 3.6,
      "petal_width": 0.2
    }
  ],
  "labels": [
    "setosa",
    "setosa",
    "setosa",
    "setosa",
    "setosa"
  ]
}
```
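
For example, if the JSON body above is saved as `fit.json` (a hypothetical file name), the request could be made with curl:

```bash
curl -X POST -H "Content-Type: application/json" \
  -d @fit.json http://localhost:5000/models
```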

This will return `202 Accepted` along with the id of the newly created model. The model will be fitted in the background.

```json
{
  "model_id": "07421303-62f9-40f3-bf14-23cf44af05e2"
}
```

Alternatively, the data for fitting a model can be uploaded as a CSV file. The file must have a header row, and the target variable must be the first column. The table above would be encoded as:

    "species","sepal_length","sepal_width","petal_length","petal_width"
    "setosa",5.1,3.5,1.4,0.2
    "setosa",4.9,3,1.4,0.2
    "setosa",4.7,3.2,1.3,0.2
    "setosa",4.6,3.1,1.5,0.2
    "setosa",5,3.6,1.4,0.2
    "setosa",5.4,3.9,1.7,0.4
    "setosa",4.6,3.4,1.4,0.3
    "setosa",5,3.4,1.5,0.2
    "setosa",4.4,2.9,1.4,0.2

The request should be encoded as multipart/form-data with the following fields:

* `name` the name to use for the model
* `file` the CSV file

```bash
curl --form name="iris model csv" --form file=@iris.csv http://localhost:5000/models
```

Predict
-------

* `POST /models/:model_id` will return predictions using the model for the supplied data

The request body should have the following fields:

* `data` an array of objects, each element represents a single row/observation

To predict labels (species) for the following data:

sepal_length | sepal_width | petal_length | petal_width | species
------------ | ----------- | ------------ | ----------- | -------
6.7 | 3.0 | 5.2 | 2.3 | ?
6.3 | 2.5 | 5.0 | 1.9 | ?
6.5 | 3.0 | 5.2 | 2.0 | ?
6.2 | 3.4 | 5.4 | 2.3 | ?
5.9 | 3.0 | 5.1 | 1.8 | ?

```json
{
  "data": [
    {
      "sepal_length": 6.7,
      "petal_length": 5.2,
      "sepal_width": 3.0,
      "petal_width": 2.3
    },
    {
      "sepal_length": 6.3,
      "petal_length": 5.0,
      "sepal_width": 2.5,
      "petal_width": 1.9
    },
    {
      "sepal_length": 6.5,
      "petal_length": 5.2,
      "sepal_width": 3.0,
      "petal_width": 2.0
    },
    {
      "sepal_length": 6.2,
      "petal_length": 5.4,
      "sepal_width": 3.4,
      "petal_width": 2.3
    },
    {
      "sepal_length": 5.9,
      "petal_length": 5.1,
      "sepal_width": 3.0,
      "petal_width": 1.8
    }
  ]
}
```

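For example, if the request body above is saved as `predict.json` (a hypothetical file name), the request could be made against the model from the earlier examples with curl:

```bash
curl -X POST -H "Content-Type: application/json" \
  -d @predict.json \
  http://localhost:5000/models/0e12bb73-e49a-4dcd-87aa-cb0338b1c758
```
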
The response will contain class probabilities for each example submitted:

```json
{
  "labels": [
    {
      "versicolor": 0.000005590474449815602,
      "virginica": 0.9999925658927716,
      "setosa": 0.000001843632778976535
    },
    {
      "versicolor": 0.00003448150394080962,
      "virginica": 0.9999626991986605,
      "setosa": 0.0000028192973987744193
    },
    {
      "versicolor": 0.00000583767259563357,
      "virginica": 0.9999923186950813,
      "setosa": 0.0000018436323232313824
    },
    {
      "versicolor": 0.000025292685027774954,
      "virginica": 0.9999702563844668,
      "setosa": 0.000004450930505165
    },
    {
      "versicolor": 0.00006891207512697766,
      "virginica": 0.9999281614880159,
      "setosa": 0.000002926436856866432
    }
  ],
  "model_id": "0e12bb73-e49a-4dcd-87aa-cb0338b1c758"
}
```

Alternatively, the data can be uploaded as a CSV file; see the description above for fitting a model with a CSV file. When making predictions, the CSV file should not have the label/target data in the first column.
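
For example, a CSV prediction request might look like this (a sketch, assuming the same multipart `file` field is used as for fitting; `new_iris.csv` is a hypothetical file containing only the feature columns, and the model id is taken from the examples above):

```bash
# new_iris.csv: feature columns only, with a header row
curl --form file=@new_iris.csv \
  http://localhost:5000/models/0e12bb73-e49a-4dcd-87aa-cb0338b1c758
```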

Start Model
----------
The prediction worker is started with the first prediction request for a model. However, a model can be started manually.

* `POST /models/running` will start a model

```json
{
  "model_id": "0e12bb73-e49a-4dcd-87aa-cb0338b1c758"
}
```
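
For example, using curl (a sketch, reusing the model id from the examples above):

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"model_id": "0e12bb73-e49a-4dcd-87aa-cb0338b1c758"}' \
  http://localhost:5000/models/running
```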

This will return `201 Created` with an empty body. The model will be started in the background.

Stop Model
----------
Once started, models will run until the server process exits. Models can be stopped manually.

* `DELETE /models/running/:model_id` will stop a model

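For example, using the model id from the examples above:

```bash
curl -X DELETE http://localhost:5000/models/running/0e12bb73-e49a-4dcd-87aa-cb0338b1c758
```
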
This will return `202 Accepted` with an empty body. The model will be stopped in the background.