Marketplace Scripts
The market place is a collection of scripts in various languages.
Data Cleaning - replace nulls with the mean
Replace the nulls of a column with the mean of that column
- Input a numeric column with nulls
- Output a numeric column with means instead of nulls
- The script can be used for cleaning nulls (by replacing them with the mean)
Data Cleaning - replace nulls with the median
Replace the nulls of a column with the median of that column
- Input a numeric column with nulls
- Output a numeric column with median instead of nulls
- The script can be used for cleaning nulls (by replacing them with the median)
Data Cleaning - replace nulls with zero
Replace the nulls of a column with zeros
- Input a numeric column with nulls
- Output a numeric column with zeros instead of nulls
- The script can be used for cleaning nulls (by replacing them with zeros)
Replace Empty String
Replace empty strings with 'Not Found'
- Input a nominal column
- Output a column with empty strings replaced with 'Not Found' (or any other string specified in the script)
Round Numbers
Round numbers in the given column to the nearest integer
- Input a numeric column
-
Output a numeric column with the numbers rounded to a specified precision (2.12345 with precision 2 is 2.12)
Upper case
Modify a nominal column to be in its upper case form
- Input a nominal column
- Output the column in upper case
Lower case
Modify a nominal column to be in its lower case form
- Input a nominal column
- Output the column in lower case
Date differences
Get the differences between dates in successive rows
- Input a date column
- Output column of differences between dates
Round numbers in strings
Finds and rounds all the numbers in a nominal column
- Input nominal column
-
Output a nominal column with the numbers rounded to a specified precision (2.12345 with precision 2 is 2.12)
Remove rows with missing data
Remove rows with missing data
- Input all columns
- Output all columns, excluding rows with missing data
Sentiment analysis
Estimates for each line of text if it's a positive or negative phrase by counting the positive and negative words (dictionaries are download from pyramid website)
- Input a text column where each line has more than one word
- Output a categorical column with positive / negative categories.
- This script can be used for analyzing restaurant reviews/ book review etc.
Create outlier annotation
Analyzing numeric data in determine if a value is an outlier
Note- the sensitivity of the outlier can be adjusted by changing outlier Upper Threshold, increase its value will produce more outlier and visa-versa.
- Input a numeric column
- Output a categorical column with outlier / not-outlier categories.
- The script can be used for coloring outliers
Create outlier annotation by standard deviation
If an outlier is detected and confirmed by standard deviation, this function creates a column with an is-outlier annotation.
Note that the sensitivity of the outlier can be adjusted by changing standard deviation Number. Increasing its value will produce more outliers, while reducing it will produce fewer outliers.
- Input a numeric column
- Output a categorical column with outlier/not-outlier categories.
- The script can be used for coloring outliers
Remove the values most different from the mean
Analyzing numeric data and determine if a value is an outlier, if so replace it with the mean
- Input a numeric column
- Output numeric column with clean values
- The script can be used for using a cleaner data.
Clean text
Remove all non-Alphanumeric characters
- Input the text column to be cleaned
- Output the column without the non-Alphanumeric characters
- The script can only be used for health data for infants
Matrix transpose
Transpose matrix- rows to columns- columns to rows
- Input a matrix nXm the following implementation is 3X3 but can be adjusted
- Output the transpose of the matrix
Difference matrix
Return a numeric matrix with the difference between rows (instead of the actual values).
- Input numeric columns
- Output a numeric matrix with the differences between rows
Mack Chain-Ladder
The chain-ladder method is a prominent actuarial loss reserving technique.
The chain-ladder method is used in both the property and casualty and health insurance fields.
- Input numeric accident time (period), claim time (period), and losses.
- Output is a Chain ladder table
- The script can be used for an insurance data set for estimating future claims.
Clark LDF method
Analyze loss triangle using Clark's LDF (loss development factor) method.
- Input three numeric accident time (period), claim time (period) and losses.
- Output the Clark LDF estimation
- The script can be used for insurance dataset for estimating future claims.
Geo Distance
Calculates the distance between two geographical locations of each data row.
- Input two locations represented by longitude and latitude columns
- Output the distance between the two locations
Geo Distance from Central Location
Calculates the distance between a geographical location of each data row and a central location.
- Input a location represented by longitude and latitude
- Output the distance between the location and the central location that is hard coded in the script
Moving Average
Calculates the moving average of the last N rows.
- Input a numeric column and a column to use for sorting (i.e. date)
- Output a column with the moving average
Get stock daily data
Download data from Yahoo website, change stock name, start date and end date as needed
- Input a column with the names of the stocks
- Output the data from Yahoo for each of the stocks
Country from Address
Gets the country to which an address (full or partial) belongs.
- Input a column of addresses
- Output a column of the countries of the addresses
- The script can be used when the address is available but not the country of the address
Address from Coordinates
Gets the address that matches the latitude and longitude (coordinates) of the data entries.
- Input a column of coordinates (or two columns, one of longitudes and one of latitudes)
- Output a column of the matching addresses
Coordinates from Address
Gets coordinates the addresses
- Input a column of addresses
- Output the coordinates of each address
Get stocks data
Get stock exchange data for multiple stocks over a period of time using Google Finance API
-
Input a hardcoded array of the stock ids, or a nominal column with the stock IDs
- Output a table with stock exchange data
- Downloads googlefinance.client, pandas
Sharp Ratio
Calculates the sharp ratio for different stocks from a specified date
- Input a column of stock names
- Output the sharp ratio of each stock
Basket analysis
Basket analysis for estimating a group of products that will indicate the next group of products to be purchased.
Note- it is required to change the 1st code line according to the input, adding each product to the data frame.
- Input 2 columns: 1. Transaction number (ID), 2. Product Name
- Output- rules, confidence, lift and support table, with the name of the products in each cell.
-
Input N(unknown) Boolean matrix , rows should indicate a transaction, with each column representing a different product. The matrix is filled with false/true values indicating if a product was purchased at the transaction
-
Output- rules table with the name of the products in each cell.
-
The script can be used for analyzing a group of products.
Risk return ratio
Calculates the return / risk ratio. Basically the Sharpe ratio without factoring in the risk-free rate.
- Input a vector with stock name
- Output a new table with a stock name and its risk return ratio
- The script is importing data from yahoo and the start date & end date can be adjusted.
Birch clustering
Implements the Birch clustering algorithm.
- Inputs are: numeric columns (the default are 3 but can be modified)
- Output a new vector with cluster number
- Hierarchical clustering for large dataset
Record expander by month
Date transformation from range to column: start date-end date columns to a monthly column.
- Inputs are: Start Date- date, end date- date, Param1- the parameter to expend
- Output a new table with a line for each month between the date pairs
- General expander by month (for future analysis)
Record expander by days
Date transformation from range to column: start date-end date columns to a daily column.
- Inputs are: Start Date- date, end date- date, Param1- the parameter to expend
- Output a new table with a line for each day between the date pairs
- General expander by day (for future analysis)
App is on-counter
Counts the number of time an application was opened.
- Input a Boolean array
- Output a new column with a switch counter
- Can be used for an application provider that is required to count the number of times it turned on
Service-counter
Counts the number of times the service was requested.
- Input a Boolean array
- Output a new column with a switch counter
- Can be used for a service provider that is required to count the number of time it switched to true
Month counter
Counts the number of months and produces a count table.
- Input date-time column
- Output a new table with a max of 12 rows and for each month its counter
- The script can be used for any date-time input where a group by month is needed
Weekday counter
Counts the number of weekday and produces a count table.
- Input date-time column
- Output a new table with a max of 7 rows and for each weekday its counter
- The script can be used for any date-time input where a group by weekday is needed
Break-even quantity
Unit Contribution of the new product/Unit Contribution of the old product.
- Input three numeric arrays: fixed costs, average Price Per Unit and average Cost Per Unit
- Output numeric column of the ratio
- The script can be used for analyzing the BECR
Break-even cannibalization rate
Unit Contribution of the new product/Unit Contribution of the old product.
- Input two numeric arrays: unit contribution of new product and unit contribution of old product
- Output numeric column of the ratio
- The script can be used for analyzing the BECR
Estimate blood pressure in infants at birth based on body weight
A simple estimation of blood pressure by body weight.
- Input a numeric column of body weight
- Output numeric column of blood pressure
- The script can be only for health data for infants.
Estimate red cell volume in infants based on body weight
A simple estimation of red cell volume by body weight.
- Input a numeric column of body weight
- Output numeric column of red cell volume
- The script can be only for health data for infants.