Marketplace Scripts
The market place is a collection of scripts in various languages.
Data Cleaning  replace nulls with the mean
Replace the nulls of a column with the mean of that column
 Input a numeric column with nulls
 Output a numeric column with means instead of nulls
 The script can be used for cleaning nulls (by replacing them with the mean)
Data Cleaning  replace nulls with the median
Replace the nulls of a column with the median of that column
 Input a numeric column with nulls
 Output a numeric column with median instead of nulls
 The script can be used for cleaning nulls (by replacing them with the median)
Data Cleaning  replace nulls with zero
Replace the nulls of a column with zeros
 Input a numeric column with nulls
 Output a numeric column with zeros instead of nulls
 The script can be used for cleaning nulls (by replacing them with zeros)
Replace Empty String
Replace empty strings with 'Not Found'
 Input a nominal column
 Output a column with empty strings replaced with 'Not Found' (or any other string specified in the script)
Round Numbers
Round numbers in the given column to the nearest integer
 Input a numeric column

Output a numeric column with the numbers rounded to a specified precision (2.12345 with precision 2 is 2.12)
Upper case
Modify a nominal column to be in its upper case form
 Input a nominal column
 Output the column in upper case
Lower case
Modify a nominal column to be in its lower case form
 Input a nominal column
 Output the column in lower case
Date differences
Get the differences between dates in successive rows
 Input a date column
 Output column of differences between dates
Round numbers in strings
Finds and rounds all the numbers in a nominal column
 Input nominal column

Output a nominal column with the numbers rounded to a specified precision (2.12345 with precision 2 is 2.12)
Remove rows with missing data
Remove rows with missing data
 Input all columns
 Output all columns, excluding rows with missing data
Sentiment analysis
Estimates for each line of text if it’s a positive or negative phrase by counting the positive and negative words (dictionaries are download from pyramid website)
 Input a text column where each line has more than one word
 Output a categorical column with positive / negative categories.
 This script can be used for analyzing restaurant reviews/ book review etc.
Create outlier annotation
Analyzing numeric data in determine if a value is an outlier
Note the sensitivity of the outlier can be adjusted by changing outlier Upper Threshold, increase its value will produce more outlier and visaversa.
 Input a numeric column
 Output a categorical column with outlier / notoutlier categories.
 The script can be used for coloring outliers
Create outlier annotation by standard deviation
If an outlier is detected and confirmed by standard deviation, this function creates a column with an isoutlier annotation.
Note that the sensitivity of the outlier can be adjusted by changing standard deviation Number. Increasing its value will produce more outliers, while reducing it will produce fewer outliers.
 Input a numeric column
 Output a categorical column with outlier/notoutlier categories.
 The script can be used for coloring outliers
Remove the values most different from the mean
Analyzing numeric data and determine if a value is an outlier, if so replace it with the mean
 Input a numeric column
 Output numeric column with clean values
 The script can be used for using a cleaner data.
Clean text
Remove all nonAlphanumeric characters
 Input the text column to be cleaned
 Output the column without the nonAlphanumeric characters
 The script can only be used for health data for infants
Matrix transpose
Transpose matrix rows to columns columns to rows
 Input a matrix nXm the following implementation is 3X3 but can be adjusted
 Output the transpose of the matrix
Difference matrix
Return a numeric matrix with the difference between rows (instead of the actual values).
 Input numeric columns
 Output a numeric matrix with the differences between rows
Mack ChainLadder
The chainladder method is a prominent actuarial loss reserving technique.
The chainladder method is used in both the property and casualty and health insurance fields.
 Input numeric accident time (period), claim time (period), and losses.
 Output is a Chain ladder table
 The script can be used for an insurance data set for estimating future claims.
Clark LDF method
Analyze loss triangle using Clark's LDF (loss development factor) method.
 Input three numeric accident time (period), claim time (period) and losses.
 Output the Clark LDF estimation
 The script can be used for insurance dataset for estimating future claims.
Geo Distance
Calculates the distance between two geographical locations of each data row.
 Input two locations represented by longitude and latitude columns
 Output the distance between the two locations
Geo Distance from Central Location
Calculates the distance between a geographical location of each data row and a central location.
 Input a location represented by longitude and latitude
 Output the distance between the location and the central location that is hard coded in the script
Moving Average
Calculates the moving average of the last N rows.
 Input a numeric column and a column to use for sorting (i.e. date)
 Output a column with the moving average
Get stock daily data
Download data from Yahoo website, change stock name, start date and end date as needed
 Input a column with the names of the stocks
 Output the data from Yahoo for each of the stocks
Country from Address
Gets the country to which an address (full or partial) belongs.
 Input a column of addresses
 Output a column of the countries of the addresses
 The script can be used when the address is available but not the country of the address
Address from Coordinates
Gets the address that matches the latitude and longitude (coordinates) of the data entries.
 Input a column of coordinates (or two columns, one of longitudes and one of latitudes)
 Output a column of the matching addresses
Coordinates from Address
Gets coordinates the addresses
 Input a column of addresses
 Output the coordinates of each address
Get stocks data
Get stock exchange data for multiple stocks over a period of time using Google Finance API

Input a hardcoded array of the stock ids, or a nominal column with the stock IDs
 Output a table with stock exchange data
 Downloads googlefinance.client, pandas
Sharp Ratio
Calculates the sharp ratio for different stocks from a specified date
 Input a column of stock names
 Output the sharp ratio of each stock
Basket analysis
Basket analysis for estimating a group of products that will indicate the next group of products to be purchased.
Note it is required to change the 1st code line according to the input, adding each product to the data frame.
 Input 2 columns: 1. Transaction number (ID), 2. Product Name
 Output rules, confidence, lift and support table, with the name of the products in each cell.

Input N(unknown) Boolean matrix , rows should indicate a transaction, with each column representing a different product. The matrix is filled with false/true values indicating if a product was purchased at the transaction

Output rules table with the name of the products in each cell.

The script can be used for analyzing a group of products.
Risk return ratio
Calculates the return / risk ratio. Basically the Sharpe ratio without factoring in the riskfree rate.
 Input a vector with stock name
 Output a new table with a stock name and its risk return ratio
 The script is importing data from yahoo and the start date & end date can be adjusted.
Birch clustering
Implements the Birch clustering algorithm.
 Inputs are: numeric columns (the default are 3 but can be modified)
 Output a new vector with cluster number
 Hierarchical clustering for large dataset
Record expander by month
Date transformation from range to column: start dateend date columns to a monthly column.
 Inputs are: Start Date date, end date date, Param1 the parameter to expend
 Output a new table with a line for each month between the date pairs
 General expander by month (for future analysis)
Record expander by days
Date transformation from range to column: start dateend date columns to a daily column.
 Inputs are: Start Date date, end date date, Param1 the parameter to expend
 Output a new table with a line for each day between the date pairs
 General expander by day (for future analysis)
App is oncounter
Counts the number of time an application was opened.
 Input a Boolean array
 Output a new column with a switch counter
 Can be used for an application provider that is required to count the number of times it turned on
Servicecounter
Counts the number of times the service was requested.
 Input a Boolean array
 Output a new column with a switch counter
 Can be used for a service provider that is required to count the number of time it switched to true
Month counter
Counts the number of months and produces a count table.
 Input datetime column
 Output a new table with a max of 12 rows and for each month its counter
 The script can be used for any datetime input where a group by month is needed
Weekday counter
Counts the number of weekday and produces a count table.
 Input datetime column
 Output a new table with a max of 7 rows and for each weekday its counter
 The script can be used for any datetime input where a group by weekday is needed
Breakeven quantity
Unit Contribution of the new product/Unit Contribution of the old product.
 Input three numeric arrays: fixed costs, average Price Per Unit and average Cost Per Unit
 Output numeric column of the ratio
 The script can be used for analyzing the BECR
Breakeven cannibalization rate
Unit Contribution of the new product/Unit Contribution of the old product.
 Input two numeric arrays: unit contribution of new product and unit contribution of old product
 Output numeric column of the ratio
 The script can be used for analyzing the BECR
Estimate blood pressure in infants at birth based on body weight
A simple estimation of blood pressure by body weight.
 Input a numeric column of body weight
 Output numeric column of blood pressure
 The script can be only for health data for infants.
Estimate red cell volume in infants based on body weight
A simple estimation of red cell volume by body weight.
 Input a numeric column of body weight
 Output numeric column of red cell volume
 The script can be only for health data for infants.