Programming Challenge: Secret Santa
Imagine that every year your extended family does a “Secret Santa” gift exchange.
For this gift exchange, each person draws another person at random and then gets a gift for them.
Write a program that will choose a Secret Santa for everyone given a list of all the members of your extended family.
Obviously, a person cannot be their own Secret Santa.
After the third year of having the Secret Santa gift exchange,
you’ve heard complaints of having the same Secret Santa year after year.
Modify your program so that a family member can only have the same Secret Santa once every 3 years.
As your extended family has grown, members have gotten married and/or had children.
Families usually get gifts for members of their immediate family,
so it doesn’t make a lot of sense for anyone to be
a Secret Santa for a member of their immediate family (spouse, parents, or children).
Modify your program to take this constraint into consideration when choosing Secret Santas.
I have introduced random shift index for pair family member candidate.
That algorithm allows building a Secret Santa pair in O(n) in runtime and O(n) for memory complexity
I added a sliding window algorithm that store history for the last three years in a pair year/family member.
To store family relations, I used a weighted bi-directed graph.
SecretSanta interface provides two methods with/without filter interface.
public interface SecretSanta<T extends RelationEntity> {
Map<T, T> getPairs();
Map<T, T> getPairs(Predicate<T> filter);
}
A client can define a custom filter by specifying filter predicate.
For example, filter all family members older 18 years
secretSanta.getPairs(p -> p.getAge() > 18);
We should implement FamilyRepository interface for persistent database and wire it for wire calculation algorithm.
public interface FamilyRepository<T extends RelationEntity> {
void addMemberPair(T source, T destination, RelationType type);
List<T> getAllMembers();
List<T> getMembers(Predicate<Member> filter);
boolean isFamilyMembers(T source, T destination);
}
Let’s assume we have 350M total users
Storage Estimates
Total size: (16 + 256 + 1 + 68 + 160) * 350M ~ 195 GB
Before we choose storage, let’s have a look at available options, and picked the best suitable.
MySQL is an open source relation database
Summary
We will have a hard time to tune performance for a large dataset, where
user table: 350 M records
relation table: 3,500 M records
We can store relations between in the same table in JSON format, but still, 350 M records is a huge amount for MySQL
MongodB is a document-oriented database
Pros
Cons
Summary
At first glance, we might think this suitable for us, and we can store family in one document.
But when two families establish a new connection (e.g., get married), we should merge two documents in a single document, that might cause the problem when our document will grow significantly (MongoDB has a restriction for 16MB per document)
Cassandra is a wide column database
Summary
The main focus of cassandra database in fast writes and horizontal scalability, unfortunately, it comes with a price,
cassandra reads is considerably slow and minimum system requirements are quite high compared to other databases.
For the system, we build where we should keep the system on low specs
for 11 months where no activity happened on our application might be a problem for the price perspective.
To build an HA cluster, we should have at least three nodes.
Neo4j is a graph database
Summary
Secret Santa application looks like a perfect use case for graph databases where you have a lot of relations
between entities and powerful query language (cypher) will cover most of your filter needs.
The main problem with the graph database is scalability and performance, but according to Neo4J documentation,
community edition limit for relations/node is 34 Bn. A slow reads performance we can fix by introducing cache mechanism.
The detailed workflow would look like this:
Note: we might consider implementing a user notification system by email/SMS in a future
We can add the Load balancing layer between Clients and Application servers.
Initially, we could use a simple Round Robin approach that distributes incoming requests equally among backend servers.
We should cache users that are frequently accessed.
We can use some off-the-shelf solution like Redis, which can store full user information.
The application servers, before hitting the database, should check if the cache has the desired user.
Least Recently Used (LRU) can be a reasonable eviction policy for our system.
To further increase the efficiency, we can replicate our caching servers to distribute load between them.
Logging and monitoring is a significant part of the system.
While monitoring helps to gauge the health/state of the system at any instance or interval of time,
logging give you details about the event such as a resource that was accessed, who accessed it, and the time.
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.