r-to-ruby-posting-workshop.md

In this gig, you'll rewrite two short-ish R programs into Ruby. You should have at least entry-level knowledge of R, and a strong familiarity with Ruby and RSpec. Knowledge of basic statistics is strongly recommended but not required.

I've tried to be as complete as I can here, but feel free to ask any additional questions by e-mailing me here.

Deliverable

You'll provide implementations of three Ruby classes:

TimeSeries
ExponentialMovingAverageStrategy
HoltWintersStrategy

These classes are based on the reference R code here, and they should implement the API described below.

Briefly,

TimeSeries is a simple array wrapper that exposes some properties as described below.
The two strategies should implement the same or similar prediction behavior as laid out in the reference code.

API summary

The API should wind up looking like the following.

First, you should be able to load an array of data into a TimeSeries instance.

# values from CSV
full_data = [
  {:date => {date in UTC}, :value => {value}},
  {:date => {date in UTC}, :value => {value}},
  {:date => {date in UTC}, :value => {value}},
  ...
]

# Load full_data, assuming data points are separated by 1 day each.
ts = TimeSeries.new(full_data, :days => 1)

# Implements Enumerable by returning the elements of the array.
ts.each { ... }

# Exposes an #interval reader.
ts.interval
# => {:days => 1}
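
As a rough illustration of the wrapper described above, here is a minimal sketch. Only the constructor arguments, the Enumerable behavior, and the #interval reader come from this posting; the internals (copying and freezing the array, returning an enumerator when no block is given) and the sample data are assumptions.

```ruby
# A minimal sketch of the TimeSeries wrapper; internals are assumptions.
class TimeSeries
  include Enumerable

  # Exposes the interval hash passed at construction, e.g. {:days => 1}.
  attr_reader :interval

  # data:     array of {:date => Time, :value => Numeric} hashes
  # interval: spacing between consecutive points, e.g. {:days => 1}
  def initialize(data, interval)
    @data = data.dup.freeze
    @interval = interval
  end

  # Enumerable only requires #each; map, select, to_a, etc. build on it.
  def each(&block)
    return enum_for(:each) unless block_given?
    @data.each(&block)
    self
  end
end

# Usage with two made-up data points one day apart.
sample = [
  {:date => Time.utc(2014, 1, 1), :value => 10.0},
  {:date => Time.utc(2014, 1, 2), :value => 12.5}
]
ts = TimeSeries.new(sample, :days => 1)
ts.interval               # => {:days => 1}
ts.map { |p| p[:value] }  # => [10.0, 12.5]
```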

Second, the two Strategy classes should each implement a forecast method which implements the corresponding R code for making a forecast.

# predict 3 data points in the future
# use only first two years of data to make the prediction (365 * 2 = 730)
# use confidence interval 0.99
# pass parameters in :model to the strategy's model
HoltWintersStrategy.new(ts).forecast(3,
  :range => 0..729,
  :confidence => 0.99,
  :model => { :alpha => ..., :beta => ..., :gamma => ... })
# => [
  {:date => {date in UTC}, :forecast => {value}, :low => {low}, :high => {high}},
  {...},
  {...}
]

The above invocation would:

use indices 0..729 from the data in ts
make a prediction for index 730, 731, and 732 (that's the 3 in the first parameter)
use a confidence value of 0.99 to generate the low and high range for the prediction interval
pass the values of the :model hash for use in computing predictions with the strategy (in this case, the parameters of Holt-Winters)

When generating forecasts, each data point's UTC timestamp should be advanced from the previous one by the interval specified when you initialized the TimeSeries. For example, if ts.interval is {:minutes => 15}, then the first forecasted data point's :date should be 15 minutes after the last data point in the :range. In the above example, the first forecasted data point would be timestamped 1 day after index 729's timestamp.
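
The timestamp rule can be sketched as follows. The helper names (SECONDS_PER_UNIT, interval_seconds, forecast_dates) and the set of supported units are hypothetical; the posting only shows :days and :minutes as interval keys.

```ruby
# Hypothetical lookup table: seconds per interval unit (names are assumptions).
SECONDS_PER_UNIT = {
  :seconds => 1,
  :minutes => 60,
  :hours   => 3_600,
  :days    => 86_400
}.freeze

# Convert an interval hash such as {:minutes => 15} into seconds.
def interval_seconds(interval)
  interval.sum { |unit, count| SECONDS_PER_UNIT.fetch(unit) * count }
end

# Timestamps for the next `n` points after `last_date`, one interval apart.
def forecast_dates(last_date, interval, n)
  step = interval_seconds(interval)
  (1..n).map { |i| last_date + i * step }
end

last = Time.utc(2014, 12, 31, 23, 45)
forecast_dates(last, {:minutes => 15}, 3)
# => 2015-01-01 00:00:00 UTC, 2015-01-01 00:15:00 UTC, 2015-01-01 00:30:00 UTC
```

Because the dates are in UTC, a fixed 86,400 seconds per day avoids daylight-saving shifts.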

Finally, the strategies will also implement a comparison_forecast method that works a little differently. It does the following steps:

1. Start with an initial forecast dataset indicated by :range.
2. Make a forecast for N data points in the future.
3. Is there data in the full dataset that's not in the forecast dataset?
   * No: You're all done. Return the forecasted data.
   * Yes: Add the next data point to the forecast dataset. Repeat from step 2.

It will look at the :range you specify, and then make a forecast about the next data point.

# repeatedly predict one data point in the future until end of the dataset
# start at index 729
# use confidence interval 0.99
# pass parameters in :model to the strategy's model
HoltWintersStrategy.new(ts).comparison_forecast(3,
  :range => 0..729,
  :confidence => 0.99,
  :model => { :alpha => ..., :beta => ..., :gamma => ... })
# => [
  {:date => {date in UTC}, :forecast => {value}, :low => {low}, :high => {high}},
  {...},
  {...}
]
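
The comparison_forecast steps listed earlier can be sketched as a simple loop. This is only an illustration under stated assumptions: naive_forecast is a hypothetical stand-in for a real strategy, the method works on bare values rather than date/value hashes, and the sketch assumes the forecasts from every step are collected in order, which the posting does not spell out.

```ruby
# Hypothetical stand-in for a strategy's forecast: repeats the last
# observed value n times. A real strategy would apply its model here.
naive_forecast = lambda do |training_values, n|
  Array.new(n) { training_values.last }
end

# Rolling ("comparison") forecast over an array of values.
# values:      the full dataset
# initial_end: last index of the initial :range (e.g. 729)
# n:           points to forecast at each step
# Assumption: the forecasts from every step are collected in order.
def comparison_forecast(values, initial_end, n, forecaster)
  results = []
  last = initial_end
  loop do
    # Step 2: forecast n points from the current forecast dataset.
    results << forecaster.call(values[0..last], n)
    # Step 3: stop once the forecast dataset covers the full dataset.
    break if last >= values.length - 1
    # Otherwise add the next data point and repeat.
    last += 1
  end
  results
end

values = [10.0, 12.0, 11.0, 13.0, 14.0]
comparison_forecast(values, 2, 1, naive_forecast)
# => [[11.0], [13.0], [14.0]]
```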

Verifying

To prove it works, your deliverable should include RSpec tests that demonstrate your code exhibits the correct behavior and runs in a reasonable time (for instance, it should take less than a second to produce predictions for the sample dataset once you have the TimeSeries instance ready).
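
One way to check the timing requirement is with Ruby's standard Benchmark module. In this sketch, placeholder_forecast and the generated values are hypothetical stand-ins for a real strategy call on the sample dataset.

```ruby
require 'benchmark'

# Hypothetical stand-in for a forecasting call on the sample dataset.
def placeholder_forecast(values)
  values.each_cons(2).map { |a, b| (a + b) / 2.0 }
end

values = Array.new(1_000) { |i| i.to_f }

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
elapsed = Benchmark.realtime { placeholder_forecast(values) }

raise "too slow: #{elapsed}s" unless elapsed < 1.0
```

Inside an RSpec example, the final check could instead be written as `expect(elapsed).to be < 1.0`.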

We'll verify your code by running each Strategy's #comparison_forecast against the sample visitors.csv data and checking that every value, low, and high is within 0.1% of the corresponding forecast in results.csv.
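
The 0.1% tolerance check can be sketched in plain Ruby. The helper names within_percent? and row_matches? are hypothetical, the numbers are made up, and the :forecast, :low, and :high keys follow the output format shown earlier; how results.csv names its columns is an assumption.

```ruby
# True when `actual` is within `percent` percent of `expected`.
def within_percent?(actual, expected, percent = 0.1)
  (actual - expected).abs <= expected.abs * percent / 100.0
end

# Compare one forecast row against one reference row on all three fields.
def row_matches?(actual_row, expected_row)
  [:forecast, :low, :high].all? do |key|
    within_percent?(actual_row[key], expected_row[key])
  end
end

expected = {:forecast => 1000.0, :low => 900.0, :high => 1100.0}
close    = {:forecast => 1000.5, :low => 900.2, :high => 1100.9}
off      = {:forecast => 1002.0, :low => 900.0, :high => 1100.0}

row_matches?(close, expected) # => true
row_matches?(off, expected)   # => false
```

In an RSpec test, the same per-field comparison can be expressed with the built-in matcher, for example `expect(actual).to be_within(0.1).percent_of(expected)`.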

We'll also use a different sample dataset and verify the predictions match.

Timeline

We're flexible, but the sooner the better, and we'd like to see final results in about a week. We'll set up a git repository where you can commit your code; push at the end of every day that you work.

Applying

Compensation is $1,250 -- $250 payable immediately when you make your first substantive commit to the git repository, and the rest payable when it's done. We're open to any payment option that works for you (escrow service, PayPal, direct deposit, check, etc.).

To apply, just drop an e-mail to tech-team@uphex.com. Include your full name and at least two of the bullets below:

the URL of your Twitter account or blog
the URL of your LinkedIn account
the URL of your GitHub account, bitbucket.org account, or any other public source code repository

If you have questions but aren't ready to apply yet, e-mail me!
