Marketplace Scripts

The marketplace is a collection of scripts in various languages.

Data Cleaning - replace nulls with the mean

Replace the nulls of a column with the mean of that column

  • Input a numeric column with nulls
  • Output a numeric column with means instead of nulls
  • The script can be used for cleaning nulls (by replacing them with the mean)
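
The replacement described above can be sketched in a few lines of Python. This is only an illustration, not the marketplace script itself: the function name `fill_nulls_with_mean` is hypothetical, and nulls are assumed to be represented as `None`.

```python
from statistics import mean

def fill_nulls_with_mean(values):
    """Return a copy of values with None replaced by the mean of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average, so leave the column unchanged
    column_mean = mean(present)
    return [column_mean if v is None else v for v in values]

filled = fill_nulls_with_mean([1.0, None, 3.0, None])
```

Swapping `mean` for `statistics.median`, or for a constant zero, would give the median and zero variants listed next.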

Data Cleaning - replace nulls with the median

Replace the nulls of a column with the median of that column

  • Input a numeric column with nulls
  • Output a numeric column with the median instead of nulls
  • The script can be used for cleaning nulls (by replacing them with the median)

Data Cleaning - replace nulls with zero

Replace the nulls of a column with zeros

  • Input a numeric column with nulls
  • Output a numeric column with zeros instead of nulls
  • The script can be used for cleaning nulls (by replacing them with zeros)

Replace Empty String

Replace empty strings with 'Not Found'

  • Input a nominal column
  • Output a column with empty strings replaced with 'Not Found' (or any other string specified in the script)

Round Numbers

Round numbers in the given column to a specified precision

  • Input a numeric column
  • Output a numeric column with the numbers rounded to a specified precision (2.12345 with precision 2 is 2.12)

Upper case

Modify a nominal column to be in its upper case form

  • Input a nominal column
  • Output the column in upper case

Lower case

Modify a nominal column to be in its lower case form

  • Input a nominal column
  • Output the column in lower case

Date differences

Get the differences between dates in successive rows

  • Input a date column
  • Output a column of differences between dates
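
As a rough sketch of the idea, the difference between successive rows can be computed as below. The function name is hypothetical, and two choices are assumptions: the result is in whole days, and the first row (which has no predecessor) gets `None`.

```python
from datetime import date

def date_differences(dates):
    """Return the gap in days between each date and the date in the previous row."""
    if not dates:
        return []
    diffs = [None]  # the first row has no previous row to compare with
    for previous, current in zip(dates, dates[1:]):
        diffs.append((current - previous).days)
    return diffs

gaps = date_differences([date(2024, 1, 1), date(2024, 1, 3), date(2024, 2, 1)])
```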

Round numbers in strings

Finds and rounds all the numbers in a nominal column

  • Input nominal column
  • Output a nominal column with the numbers rounded to a specified precision (2.12345 with precision 2 is 2.12)
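
One possible way to do this is a regular-expression substitution. The sketch below is an assumption about the approach, not the script's actual code: it only touches numbers that contain a decimal point, and the pattern and function name are illustrative.

```python
import re

# Matches decimal numbers such as 2.12345 or -0.5 (integers are left untouched).
NUMBER = re.compile(r"-?\d+\.\d+")

def round_numbers_in_text(text, precision=2):
    """Round every decimal number found in text to the given precision."""
    def _round(match):
        return str(round(float(match.group()), precision))
    return NUMBER.sub(_round, text)

rounded = round_numbers_in_text("width 2.12345 cm, height 10.5 cm")
```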

Remove rows with missing data

Remove rows with missing data

  • Input all columns
  • Output all columns, excluding rows with missing data

Sentiment analysis

Estimates whether each line of text is a positive or negative phrase by counting its positive and negative words (the dictionaries are downloaded from the Pyramid website)

  • Input a text column where each line has more than one word
  • Output a categorical column with positive / negative categories.
  • This script can be used for analyzing restaurant reviews, book reviews, etc.
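
The word-counting approach can be illustrated with a minimal sketch. The two word lists here are tiny hypothetical stand-ins for the downloaded dictionaries, and treating a tie as positive is an assumption.

```python
import re

# Tiny hypothetical word lists; the real script downloads its dictionaries.
POSITIVE = {"good", "great", "tasty", "friendly", "excellent"}
NEGATIVE = {"bad", "slow", "cold", "rude", "terrible"}

def classify(line):
    """Label a line by comparing its counts of positive and negative words."""
    words = re.findall(r"[a-z']+", line.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"  # ties count as positive (assumption)

labels = [classify(t) for t in ["Great food and friendly staff",
                                "Cold soup and rude, slow service"]]
```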

Create outlier annotation

Analyzes numeric data to determine whether a value is an outlier

Note that the sensitivity of the outlier detection can be adjusted by changing outlier Upper Threshold. Increasing its value will produce more outliers, and vice versa.

  • Input a numeric column
  • Output a categorical column with outlier / not-outlier categories.
  • The script can be used for coloring outliers

Create outlier annotation by standard deviation

If an outlier is detected and confirmed by standard deviation, this function creates a column with an is-outlier annotation.

Note that the sensitivity of the outlier can be adjusted by changing standard deviation Number. Increasing its value will produce more outliers, while reducing it will produce fewer outliers.

  • Input a numeric column
  • Output a categorical column with outlier/not-outlier categories.
  • The script can be used for coloring outliers

Remove the values most different from the mean

Analyzes numeric data to determine whether a value is an outlier and, if so, replaces it with the mean

  • Input a numeric column
  • Output numeric column with clean values
  • The script can be used to obtain cleaner data.

Clean text

Remove all non-alphanumeric characters

  • Input the text column to be cleaned
  • Output the column without the non-alphanumeric characters
  • The script can be used for cleaning text data

Matrix transpose

Transposes a matrix: rows become columns and columns become rows

  • Input an n×m matrix (the provided implementation is 3×3 but can be adjusted)
  • Output the transpose of the matrix
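
For illustration, a transpose of a matrix of any size can be written as follows. The matrix is assumed to be a list of equal-length rows, and the function name is hypothetical.

```python
def transpose(matrix):
    """Turn the rows of matrix into columns and its columns into rows."""
    return [list(column) for column in zip(*matrix)]

flipped = transpose([[1, 2, 3],
                     [4, 5, 6]])
```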

Difference matrix

Return a numeric matrix with the difference between rows (instead of the actual values).

  • Input numeric columns
  • Output a numeric matrix with the differences between rows
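
A minimal sketch of row differencing is shown below. It assumes the result has one row fewer than the input, because the first row has nothing to be subtracted from; the actual script may pad that row instead.

```python
def difference_matrix(rows):
    """Return, for each pair of successive rows, the element-wise difference."""
    return [[curr - prev for prev, curr in zip(previous_row, current_row)]
            for previous_row, current_row in zip(rows, rows[1:])]

diffs = difference_matrix([[1, 10],
                           [4, 12],
                           [9, 20]])
```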

Mack Chain-Ladder

The chain-ladder method is a prominent actuarial loss reserving technique.

The chain-ladder method is used in both the property and casualty and health insurance fields.

  • Input numeric accident time (period), claim time (period), and losses.
  • Output a chain-ladder table
  • The script can be used on an insurance data set to estimate future claims.
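
To show what a chain-ladder table contains, here is a sketch of the basic deterministic chain-ladder projection. It does not compute the standard errors that Mack's method adds, and it assumes the losses have already been arranged into a cumulative triangle (one row per accident period, one column per development period). The function name and sample figures are made up for illustration.

```python
def chain_ladder(triangle):
    """Complete a cumulative loss triangle with volume-weighted development factors.

    triangle: list of rows; row i holds the known cumulative losses of
    accident period i, so later rows are shorter.
    """
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Use only the accident periods where both development periods are known.
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    completed = []
    for row in triangle:
        full = list(row)
        for j in range(len(row) - 1, n - 1):
            full.append(full[j] * factors[j])  # project the next development period
        completed.append(full)
    return factors, completed

factors, completed = chain_ladder([[100.0, 150.0, 165.0],
                                   [110.0, 165.0],
                                   [120.0]])
```

The last value of each completed row is the projected ultimate loss for that accident period.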

Clark LDF method

Analyze loss triangle using Clark's LDF (loss development factor) method.

  • Input three numeric columns: accident time (period), claim time (period), and losses
  • Output the Clark LDF estimation
  • The script can be used on an insurance data set to estimate future claims.

Geo Distance

Calculates the distance between two geographical locations of each data row.

  • Input two locations represented by longitude and latitude columns
  • Output the distance between the two locations
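
The description does not say which distance formula the script uses. A common choice for latitude/longitude pairs is the haversine great-circle distance, sketched here as an assumption, with the result in kilometres and a mean Earth radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed unit: kilometres)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Passing a fixed pair of coordinates as the second point gives the distance from a central location.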

Geo Distance from Central Location

Calculates the distance between a geographical location of each data row and a central location.

  • Input a location represented by longitude and latitude
  • Output the distance between the location and the central location that is hard-coded in the script

Moving Average

Calculates the moving average of the last N rows.

  • Input a numeric column and a column to use for sorting (e.g. a date)
  • Output a column with the moving average
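
A minimal sketch of a trailing moving average follows. It assumes that the first rows, which have fewer than N predecessors, are averaged over the rows available so far; the script may handle them differently. The function name is hypothetical.

```python
def moving_average(rows, n):
    """rows: list of (sort_key, value) pairs; returns the averages in sorted order."""
    ordered = sorted(rows, key=lambda row: row[0])
    values = [value for _, value in ordered]
    averages = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]  # the last n values up to row i
        averages.append(sum(window) / len(window))
    return averages

smoothed = moving_average([(3, 30.0), (1, 10.0), (2, 20.0), (4, 40.0)], n=2)
```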

Get stock daily data

Downloads data from the Yahoo website; change the stock name, start date, and end date as needed

  • Input a column with the names of the stocks
  • Output the data from Yahoo for each of the stocks

Country from Address

Gets the country to which an address (full or partial) belongs.

  • Input a column of addresses
  • Output a column of the countries of the addresses
  • The script can be used when the address is available but not the country of the address

Address from Coordinates

Gets the address that matches the latitude and longitude (coordinates) of the data entries.

  • Input a column of coordinates (or two columns, one of longitudes and one of latitudes)
  • Output a column of the matching addresses

Coordinates from Address

Gets the coordinates of the addresses

  • Input a column of addresses
  • Output the coordinates of each address

Get stocks data

Get stock exchange data for multiple stocks over a period of time using the Google Finance API

  • Input a hardcoded array of the stock IDs, or a nominal column with the stock IDs
  • Output a table with stock exchange data
  • Downloads the googlefinance.client and pandas packages

Sharpe Ratio

Calculates the Sharpe ratio for different stocks from a specified date

  • Input a column of stock names
  • Output the Sharpe ratio of each stock
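
For reference, the Sharpe ratio is the mean excess return divided by the standard deviation of the excess returns. The sketch below assumes a list of periodic returns is already available (the script fetches prices itself) and does not annualize the result; the function name and sample figures are illustrative.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free_rate for r in returns]
    return mean(excess) / stdev(excess)

ratio = sharpe_ratio([0.01, 0.03, 0.02], risk_free_rate=0.01)
```

With `risk_free_rate` left at zero, the same function gives the plain return/risk ratio.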

Basket analysis

Basket analysis estimates which group of products indicates the next group of products to be purchased.

Note that the first line of code must be changed according to the input, adding each product to the data frame.

  • Input 2 columns: 1. Transaction number (ID), 2. Product name
  • Output a table of rules, confidence, lift, and support, with the names of the products in each cell.
  • Input a Boolean matrix with an unknown number (N) of columns: each row represents a transaction and each column a different product. The matrix is filled with true/false values indicating whether the product was purchased in the transaction
  • Output a rules table with the names of the products in each cell.
  • The script can be used for analyzing a group of products.
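
The three measures in the output table have standard definitions, sketched below for a single rule. The script mines the rules itself; here the rule is supplied by hand, and both function names are hypothetical.

```python
from collections import defaultdict

def group_transactions(pairs):
    """Turn (transaction ID, product name) rows into one set of products per transaction."""
    baskets = defaultdict(set)
    for transaction_id, product in pairs:
        baskets[transaction_id].add(product)
    return list(baskets.values())

def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= basket for basket in baskets)
    count_c = sum(c <= basket for basket in baskets)
    count_both = sum((a | c) <= basket for basket in baskets)
    support = count_both / n               # share of baskets containing both sides
    confidence = count_both / count_a      # share of antecedent baskets with the consequent
    lift = confidence / (count_c / n)      # confidence relative to the consequent's base rate
    return support, confidence, lift

rows = [(1, "bread"), (1, "butter"), (2, "bread"), (2, "butter"),
        (2, "milk"), (3, "bread"), (4, "milk")]
support, confidence, lift = rule_metrics(group_transactions(rows), {"bread"}, {"butter"})
```

A lift above 1 suggests the two sides are bought together more often than chance would predict.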

Risk return ratio

Calculates the return/risk ratio, which is essentially the Sharpe ratio without the risk-free rate.

  • Input a vector of stock names
  • Output a new table with each stock name and its risk return ratio
  • The script imports data from Yahoo; the start date and end date can be adjusted.

Birch clustering

Implements the Birch clustering algorithm.

  • Inputs are: numeric columns (the default is 3 but can be modified)
  • Output a new vector with the cluster number
  • Hierarchical clustering for large data sets

Record expander by month

Date transformation from range to column: start date-end date columns to a monthly column.

  • Inputs are: start date (date), end date (date), and Param1 (the parameter to expand)
  • Output a new table with a line for each month between the date pairs
  • General expander by month (for future analysis)
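
The expansion can be sketched as follows. Two details are assumptions: both the start and end months are included, and each output month is represented by its first day. The function name is hypothetical.

```python
from datetime import date

def expand_by_month(start, end, param):
    """Return one (month, param) record for every calendar month from start to end."""
    records = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        records.append((date(year, month, 1), param))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return records

expanded = expand_by_month(date(2023, 11, 15), date(2024, 2, 3), "A")
```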

Record expander by days

Date transformation from range to column: start date-end date columns to a daily column.

  • Inputs are: start date (date), end date (date), and Param1 (the parameter to expand)
  • Output a new table with a line for each day between the date pairs
  • General expander by day (for future analysis)

App is on-counter

Counts the number of times an application was opened.

  • Input a Boolean array
  • Output a new column with a switch counter
  • Can be used by an application provider that is required to count the number of times the application was turned on
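
One way to read "switch counter" is a running count of false-to-true transitions. The sketch below follows that reading, which is an assumption, and also assumes the state before the first row is off.

```python
def switch_counter(flags):
    """Return, for each row, how many times the flag has switched from False to True so far."""
    counts, total, previous = [], 0, False
    for flag in flags:
        if flag and not previous:
            total += 1  # a new "on" event starts here
        counts.append(total)
        previous = flag
    return counts

switches = switch_counter([False, True, True, False, True])
```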

Service-counter

Counts the number of times the service was requested.

  • Input a Boolean array
  • Output a new column with a switch counter
  • Can be used by a service provider that is required to count the number of times the service switched to true

Month counter

Counts the occurrences of each month and produces a count table.

  • Input date-time column
  • Output a new table with at most 12 rows, one per month with its count
  • The script can be used for any date-time input where a group by month is needed
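
A grouping like this can be sketched with a counter keyed on the month number. The function name is hypothetical, and months that never occur are simply omitted from the table.

```python
from collections import Counter
from datetime import datetime

def month_counts(timestamps):
    """Return a sorted list of (month number, count) pairs."""
    counts = Counter(ts.month for ts in timestamps)
    return sorted(counts.items())

table = month_counts([datetime(2024, 1, 5), datetime(2024, 3, 9), datetime(2023, 1, 20)])
```

Keying on `ts.weekday()` instead of `ts.month` would give the weekday table with at most 7 rows.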

Weekday counter

Counts the occurrences of each weekday and produces a count table.

  • Input date-time column
  • Output a new table with at most 7 rows, one per weekday with its count
  • The script can be used for any date-time input where a group by weekday is needed

Break-even quantity

Fixed costs / (average price per unit - average cost per unit).

  • Input three numeric arrays: fixed costs, average price per unit, and average cost per unit
  • Output numeric column of the break-even quantity
  • The script can be used for break-even analysis
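
The standard break-even quantity is the fixed costs divided by the unit contribution (price per unit minus cost per unit). The sketch below assumes the script applies that formula row by row to its three inputs. Returning `None` when the unit contribution is not positive is an illustrative choice, since no quantity breaks even in that case.

```python
def break_even_quantity(fixed_costs, price_per_unit, cost_per_unit):
    """Return fixed / (price - cost) for each row, or None when price does not exceed cost."""
    quantities = []
    for fixed, price, cost in zip(fixed_costs, price_per_unit, cost_per_unit):
        contribution = price - cost
        quantities.append(fixed / contribution if contribution > 0 else None)
    return quantities

quantities = break_even_quantity([1000.0, 500.0], [10.0, 5.0], [6.0, 5.0])
```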

Break-even cannibalization rate

Unit Contribution of the new product/Unit Contribution of the old product.

  • Input two numeric arrays: unit contribution of new product and unit contribution of old product
  • Output numeric column of the ratio
  • The script can be used for analyzing the BECR

Estimate blood pressure in infants at birth based on body weight

A simple estimation of blood pressure by body weight.

  • Input a numeric column of body weight
  • Output numeric column of blood pressure
  • The script can be used only for health data for infants.

Estimate red cell volume in infants based on body weight

A simple estimation of red cell volume by body weight.

  • Input a numeric column of body weight
  • Output numeric column of red cell volume
  • The script can be used only for health data for infants.