Wednesday, March 16, 2011

DataBase Caching

Recently I went to interview for a Software Developer position. I was asked this interesting question:

Given a database and an API to fetch a record from a database, design a datastructure to implement the cache. Also, design and implement the API used to fetch record from the cache.

My Solution:
Access source code here
You need Visual Studio 2010 to open the solution
The solution is coded in C
Data Structures
All records are stored in a Hash Table for fast access
At any point in time if the cache is full, Least Recently Used record is kicked out to give space to the new record. To keep track of the records in the order of their last access time, the records are also maintained as double linked list.
// represents cache record
struct cacheRec
{
    struct rec * databaseRec;
    struct cacheRec * prevRec;
    struct cacheRec * nextRec;
    bool valid;
};

struct cacheRec HashTable[CACHESIZE];

// pointer to the oldest record in the linked list
struct cacheRec * oldestCacheRec;

// pointer to the newest record in the linked list
struct cacheRec * newestCacheRec;

Time Complexity
Hash table provides lookup in O(1) time. If record is not present in the cache, reading it from database and inserting it in the end of the linked list (thereby making it the newest record) can be done in constant time. If cache is full, deleting record from cache, before inserting the new record, is also achieved in constant time. If record is present in the cache, deleting it from its current location in the linked list and inserting it in the end of the list, can also be done in O(1) time. Thus total time taken by FetchRecord() API is O(1).

Space complexity
If size of cache is n, algorithm takes O(n) space for hash tables and O(n) space for linked list pointers. Thus total space complexity is O(n)

Test Cases
The console app invokes following test cases. Each test case displays PASS/FAIL based on the number of expected Cache and Database hits

1.       Call FetchRecord() with invalid key
2.       Fetch records while re-fetching oldest record in the cache
3.       Fetch records while re-fetching newest record in the cache
4.       Fetch records while re-fetching a record in the middle of the cache linked list
5.       Fetch higher than cache-size number of unique records
6.       Fetch higher than cache-size number of non-unique records