Improving lowest common ancestor algorithms with off-line disjoint sets.

Tarjan’s Off-line LCA Algorithm (Image by Author)

Tarjan’s Off-line Lowest Common Ancestor Algorithm is an interesting application of the disjoint set structure for optimizing the performance of determining the lowest common ancestor (LCA) of two nodes within a tree, and it also involves concepts such as caching and recursion. In preparation for this story, please refer to my previous story on Lowest Common Ancestor for context.

Ok, let’s get started!

Disjoint Set

In order to understand this algorithm, we first have to have a good grasp of the concept of disjoint sets…

Disjoint sets are collections of nodes, where there is no overlap between sets.

Sounds simple, right? Actually, it is; it’s just that enforcing this rule as you build up your disjoint sets requires a bit of artistry. …
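To make the idea concrete, here is a minimal union-find sketch (my own illustration with my own names, not the full structure developed in the story):

class DisjointSet:
    def __init__(self, n):
        # Each node starts out as the representative of its own set.
        self.parent = list(range(n))

    def find(self, x):
        # Walk up to the set representative, compressing the path as we go.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        # Merge two sets by pointing one representative at the other.
        self.parent[self.find(a)] = self.find(b)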


Lowest common ancestor of two nodes within a binary tree and the algorithms available to find it.


What is “Lowest Common Ancestor (LCA)”?

It is the lowest-level ancestor shared by two nodes within a tree.

Let’s take a look at an example:

(Figure: an example binary tree with nodes 8 and 10 highlighted. Image by Author)

Within the above binary tree, nodes 8 and 10 are highlighted. What are their shared ancestors?

(Figure: the shared ancestors of nodes 8 and 10 highlighted. Image by Author)

It’s quite obvious that the shared ancestors are 5, 7, and 9.

But the shared ancestor at the lowest level is 9, and it is referred to as the lowest common ancestor (LCA).

Lowest Common Ancestor (LCA) in a Binary Search Tree (BST)

Let’s warm up with a binary search tree.

A binary search tree is a special case of a binary tree, where the left subtree only contains smaller nodes and the right subtree only contains bigger nodes. …
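Before the full discussion, here is a minimal sketch of how the BST property makes the LCA easy to find (assuming a node class with value, left, and right attributes; names are my own):

def lca_bst(root, a, b):
    # Walk down from the root: the first node whose value falls
    # between a and b (inclusive) is where their paths diverge.
    node = root
    while node:
        if a < node.value and b < node.value:
            node = node.left      # both targets lie in the left subtree
        elif a > node.value and b > node.value:
            node = node.right     # both targets lie in the right subtree
        else:
            return node           # paths split here: this is the LCA
    return None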


What about multivariate normal distributions? Are they basically a few normally distributed variables bundled together? Let’s take a look.


Normal distribution, also called gaussian distribution, is one of the most widely encountered distributions. One of the main reasons is that the normalized sum of independent random variables tends toward a normal distribution, regardless of the distribution of the individual variables (for example you can add a bunch of random samples that only takes on values -1 and 1, yet the sum itself actually becomes normally distributed as the number of sample you have becomes larger). This is known as the central limit theorem. …
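As a quick sketch of that ±1 example (my own illustration, using numpy):

import numpy as np

rng = np.random.default_rng(0)

# 10,000 trials, each the sum of 1,000 samples drawn from {-1, +1},
# normalized by sqrt(1,000).
samples = rng.choice([-1, 1], size=(10_000, 1_000))
sums = samples.sum(axis=1) / np.sqrt(1_000)

# The normalized sums are approximately standard normal.
print(sums.mean(), sums.std())  # close to 0 and 1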


The most classic graph algorithm on planet Earth, explained.


A classic formulation of the problem would be:

If I am given a graph of nodes, where each node is connected to several others and the connections are of varying distances, what is the shortest path to every node in the graph if I start from one of them?

Depth First

In order to implement Dijkstra’s algorithm we first need to define a node and an edge:

class Node:
    def __init__(self, value):
        self.value = value
        self.edges = set()    # set of (target_node, distance) tuples
        self.distance = -1    # -1 stands in for infinity

    @staticmethod
    def add_edge(a, b, dist):
        # Undirected edge: register it on both endpoints.
        a.edges.add((b, dist))
        b.edges.add((a, dist))

In this case, I have defined a node that contains a value (basically an id) and a distance initialized to -1 (representing infinity). Additionally, the Node contains a set of edges, each of which is a tuple of the target node the edge connects to (from self) and the distance to that target node. …
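With these definitions in place, a minimal sketch of the algorithm itself might look like the following (my own version, using a heap; the story’s full implementation may differ):

import heapq

def dijkstra(start):
    # Distances live on the nodes; -1 means "not yet reached".
    start.distance = 0
    heap = [(0, id(start), start)]  # id() breaks ties so Nodes are never compared
    while heap:
        dist, _, node = heapq.heappop(heap)
        if dist > node.distance:
            continue  # stale entry: a shorter path was already found
        for target, edge_dist in node.edges:
            new_dist = dist + edge_dist
            if target.distance < 0 or new_dist < target.distance:
                target.distance = new_dist
                heapq.heappush(heap, (new_dist, id(target), target))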


In this short story, we are going to examine the deficiencies of the Sharpe ratio, and how we can complement it with the Sortino and Calmar ratios to gain a clearer picture of the performance of a portfolio.


In portfolio performance analysis, the Sharpe ratio is usually the first number that people look at. However, it does not tell us the whole story (nothing does…). So, let’s spend some time looking at a few more metrics that can be very helpful at times.

Sharpe Ratio Revisited

The Sharpe ratio is the average return divided by the standard deviation of returns, annualized (for daily returns, multiply by √252). We had an introduction to it in a previous story.

Let’s take a look at it again with a test price time series.

import pandas as pd
import numpy as np

from pandas.tseries.offsets import BDay


def daily_returns(prices):
    # Simple percentage change from one day to the next.
    res = (prices/prices.shift(1) - 1.0)[1:]
    res.columns = ['return']
    return res


def sharpe(returns, risk_free=0):
    # Annualized mean excess return over the standard deviation of returns.
    adj_returns = returns - risk_free
    return (np.nanmean(adj_returns) * np.sqrt(252)) \
        / np.nanstd(adj_returns, ddof=1)


def test_price1():
    # A steadily rising price series.
    start_date = pd.Timestamp(2020, 1, 1) + BDay()
    n = 100
    bdates = [start_date + BDay(i) for i in range(n)]
    price = [10.0 + i/10.0 for i in range(n)]
    return pd.DataFrame(data={'date': bdates,
                              'price1': price}).set_index('date')


def test_price2():
    # Rising, but flat between days 40 and 60.
    start_date = pd.Timestamp(2020, 1, 1) + BDay()
    n = 100
    bdates = [start_date + BDay(i) for i in range(n)]
    price = [10.0 + i/10.0 for i in range(n)]
    price[40:60] = [price[40] for i in range(20)]
    return pd.DataFrame(data={'date': bdates,
                              'price2': price}).set_index('date')


def test_price3():
    # Rising, with a drawdown between days 40 and 60.
    start_date = pd.Timestamp(2020, 1, 1) + BDay()
    n = 100
    bdates = [start_date + BDay(i) for i in range(n)]
    price = [10.0 + i/10.0 for i in range(n)]
    price[40:60] = [price[40] - i/10.0 for i in range(20)]
    return pd.DataFrame(data={'date': bdates,
                              'price3': price}).set_index('date')


def test_price4():
    # Rising, with a steeper drawdown between days 40 and 60.
    start_date = pd.Timestamp(2020, 1, 1) + BDay()
    n = 100
    bdates = [start_date + BDay(i) for i in range(n)]
    price = [10.0 + i/10.0 for i in range(n)]
    price[40:60] = [price[40] - i/8.0 for i in range(20)]
    return pd.DataFrame(data={'date': bdates,
                              'price4': price}).set_index('date')


price1 = test_price1()
return1 = daily_returns(price1)

price2 = test_price2()
return2 = daily_returns(price2)

price3 = test_price3()
return3 = daily_returns(price3)

price4 = test_price4()
return4 = daily_returns(price4)

print('price1')
print(f'sharpe: {sharpe(return1)}')

print('price2')
print(f'sharpe: {sharpe(return2)}')

print('price3')
print(f'sharpe: {sharpe(return3)}')

print('price4')
print(f'sharpe…
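The excerpt cuts off here, but for reference, here is a sketch of the Sortino and Calmar ratios using one common set of definitions (function names and details are my own; the story’s exact code may differ):

def sortino(returns, risk_free=0):
    # Like Sharpe, but the denominator penalizes only downside volatility.
    adj_returns = returns - risk_free
    downside = np.nanstd(np.minimum(adj_returns, 0), ddof=1)
    return np.nanmean(adj_returns) * np.sqrt(252) / downside


def calmar(returns):
    # Annualized return divided by the maximum drawdown of the equity curve.
    equity = (1 + returns).cumprod()
    max_drawdown = (1 - equity / equity.cummax()).max()
    return np.nanmean(returns) * 252 / max_drawdown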


I live in Manhattan and I love New York, so when the superbug invaded the island, I was concerned. Luckily, there is a wealth of data collected on NYC every day, so let’s dig in!


The New York City government provides a large variety of datasets related to the city, making New York one of the best cities to do data analysis on (I love NYC!), for example through NYC OpenData. From driver application data to rodent inspections, you can pretty much find any data you can think of…

The coronavirus data we are going to look at today comes from the NYC Health Department: a GitHub repository, updated daily, that contains NYC coronavirus case data. All data shown below are as of 2020-09-29.

Rates by Racial Group

(Figure: rates by racial group, numbers per 100,000 people)

Rates by Poverty Level

(Figure: rates by poverty level, numbers per 100,000 people)
  • Higher poverty level corresponds to higher rates, as expected.
  • Low poverty shows a lower ratio of hospitalizations to case rate, presumably because financially better-off people tend to stay at home rather than go to the hospital. …

Graph algorithms are some of the most fascinating algorithms you’ll encounter. In this story we will implement such an algorithm to solve a problem involving combination locks.

start=01, end=78, blocks=88,77

Graphs are everywhere. From databases to parallel processing, graphs underlie architectures of all kinds in real life. The concept behind graphs is so ubiquitous, and people develop such an intuitive understanding of it, that sometimes they don’t even realize they are dealing with graphs! So now, we are going to play with a couple of graph algorithms, which involves a discussion of depth-first vs. breadth-first traversal.

Locks

The problem is simple. Suppose I have a combination lock of 4 digits, such as 0000, 0001, etc. I start out at a particular combination of digits, say 1010, and I want to reach another combination, say 9876. At each step, I can increase or decrease one digit by one, like a real lock you use at the gym. …
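As a minimal sketch of the breadth-first idea (my own names; I assume digits wrap around from 9 to 0, like a real lock; the start/end/blocks values below come from the figure caption):

from collections import deque

def neighbors(state):
    # All combinations reachable by turning one digit up or down by one.
    for i, ch in enumerate(state):
        d = int(ch)
        for nd in ((d + 1) % 10, (d - 1) % 10):
            yield state[:i] + str(nd) + state[i + 1:]

def shortest_path(start, end, blocked):
    # Breadth-first search: the first time we reach `end`,
    # we are guaranteed to have taken a fewest-steps path.
    visited = set(blocked) | {start}
    queue = deque([(start, 0)])
    while queue:
        state, steps = queue.popleft()
        if state == end:
            return steps
        for nxt in neighbors(state):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + 1))
    return -1  # end is unreachable

print(shortest_path('01', '78', {'88', '77'}))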


Explores the relationship between beta and covariance in the context of stock performance.

Linear Regression

The fundamental idea behind beta and linear correlation, of course, goes back to the least-squares approximation that we all know and love.

Briefly reviewing the idea behind linear regression:

Suppose I have a dependent variable y, for example the number of views I get for a story, and an independent variable x, the amount of time I spent on the story. I make a guess that the relationship between the two variables is linear:

y = \beta x + \alpha

The beta, then, is the slope of the line that best fits x to y; there is a constant alpha (the intercept) as well.

I am sure everyone has seen this countless times. If you have only two data points (x_1, y_1) and (x_2, y_2), there is a unique solution for beta: essentially you are drawing a line segment between the two points. If you have more than two points, there is generally no line that fits them all exactly. There is, however, a solution (beta, alpha) such that the line they define minimizes the sum of the squared differences between the line and the y values; that is the least-squares solution.
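This is where covariance comes in. The standard least-squares solution, stated here for reference, is

\beta = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}, \qquad \alpha = \bar{y} - \beta \bar{x}

which is exactly why beta and covariance are so closely related.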


In programming, the series that corresponds to the triangular numbers seems to occur quite often. In this short story, I would like to discuss it a little, with a couple of fun algorithms to give some intuition behind it.

Equilateral triangles corresponding to each triangular number

I think most people have seen the triangular numbers at some point; they are the results of the following series:

T_n = \sum_{k=1}^{n} k

As you can see, it is quite simple. I remember that the first time I learned it was in middle school; it is usually written more casually like this:

T_n = 1 + 2 + 3 + \cdots + n

Closed Form

As you may have guessed, this series has a closed form that’s quite easy to remember:

T_n = \frac{n(n+1)}{2}

Expanding it:

T_n = \frac{n^2 + n}{2}

As you can see, it is of order O(n²). Many algorithms reduce down to the triangular numbers in their operation counts, bubble sort for example, and thus are of order n².
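A quick sketch of why bubble sort lands on a triangular number (my own illustration):

def bubble_sort_comparisons(n):
    # Pass i of bubble sort makes n - 1 - i comparisons, so the total is
    # (n - 1) + (n - 2) + ... + 1: the triangular number T_(n-1).
    return sum(n - 1 - i for i in range(n - 1))

print(bubble_sort_comparisons(10))  # 45
print(9 * 10 // 2)                  # 45, the closed form for T_9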

How do you arrive at this closed form? …


Recursion is beautiful, and when applied in combination it is truly a form of art. In this story, we are going to implement an interesting algorithm to create pyramids with recursion.

Pyramids

Ancient civilizations around the world built giant pyramids for worship and other holy purposes. The most famous, perhaps, are the pyramids built by the Egyptians. These huge structures were built with white limestone surfaces so that they appeared to be shining when seen from a distance. They are so magnificent, in fact, that many people today still believe they were built by aliens. So, today we are going to build them in code with recursion. It is very fitting.

Problem:

Suppose I am given a base like this:

[a, b, c]

and that the following triangles are…

About

Shuo Wang

Interesting pieces on various topics in finance and technology.
