This is amazing! Will share it with others.

#1
by yzyw0702 - opened

πŸ˜ƒ: Could you use Python to implement the LRU algorithm and give me an example to test?
πŸ€–: Sure! Below is a simple implementation of the Least Recently Used (LRU) cache using Python. This implementation uses a combination of a dictionary and a doubly linked list to achieve efficient get and put operations.

Here's the code:

class Node:
    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # Dictionary to store key-node pairs
        self.head = Node()  # Dummy head node
        self.tail = Node()  # Dummy tail node
        self.head.next = self.tail
        self.tail.prev = self.head

    def _add_node(self, node: Node):
        """Add a node right after the head."""
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def _remove_node(self, node: Node):
        """Unlink a node from its current position."""
        prev_node = node.prev
        next_node = node.next
        prev_node.next = next_node
        next_node.prev = prev_node

    def _move_to_head(self, node: Node):
        """Move a node to the head."""
        self._remove_node(node)
        self._add_node(node)

    def _pop_tail(self) -> Node:
        """Pop the last node (least recently used)."""
        res = self.tail.prev
        self._remove_node(res)
        return res

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._move_to_head(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            node = self.cache[key]
            node.value = value
            self._move_to_head(node)
        else:
            if len(self.cache) >= self.capacity:
                tail = self._pop_tail()
                del self.cache[tail.key]
            new_node = Node(key, value)
            self.cache[key] = new_node
            self._add_node(new_node)

# Example usage
if __name__ == "__main__":
    lru_cache = LRUCache(2)

    lru_cache.put(1, 1)
    print(lru_cache.get(1))  # returns 1

    lru_cache.put(2, 2)
    print(lru_cache.get(2))  # returns 2

    lru_cache.put(3, 3)
    print(lru_cache.get(1))  # returns -1 (not found)

    lru_cache.put(4, 4)
    print(lru_cache.get(2))  # returns -1 (not found)

Explanation:

  1. Node Class: Represents each element in the cache with a key, value, and pointers to the previous and next nodes.
  2. LRUCache Class:
    • capacity: The maximum number of items the cache can hold.
    • cache: A dictionary that maps keys to nodes.
    • head and tail: Dummy nodes to simplify edge cases when adding or removing nodes.
    • _add_node, _remove_node, _move_to_head, _pop_tail: Helper methods to manage the linked list.
    • get: Retrieves a value by key. If the key exists, it moves the corresponding node to the head and returns the value. Otherwise, it returns -1.
    • put: Inserts a new key-value pair. If the key already exists, it updates the value and moves the node to the head. If the cache exceeds its capacity, it removes the least recently used item (the node just before the tail).
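
To make the eviction order concrete, here is a small illustrative helper (not part of the implementation above; the name _debug_order is hypothetical) that walks the list from head to tail and collects the keys, most recently used first:

def _debug_order(cache: "LRUCache") -> list:
    """Hypothetical debugging helper: keys from most to least recently used."""
    keys = []
    node = cache.head.next
    while node is not cache.tail:
        keys.append(node.key)
        node = node.next
    return keys

lru = LRUCache(2)
lru.put(1, 1)
lru.put(2, 2)
print(_debug_order(lru))  # [2, 1] -- key 2 is the most recently used
lru.get(1)
print(_debug_order(lru))  # [1, 2] -- get(1) moved key 1 next to the head
lru.put(3, 3)
print(_debug_order(lru))  # [3, 1] -- key 2, the least recently used, was evicted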

Example Usage:

  • The example demonstrates basic operations like put and get. It also shows how the LRU policy works when the cache reaches its capacity and new items are added.
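
If you would rather have an automated check than read printed output, here is a minimal assert-based test sketch (the helper name test_lru_cache is mine; the values simply mirror the example above):

def test_lru_cache():
    cache = LRUCache(2)
    cache.put(1, 1)
    cache.put(2, 2)
    assert cache.get(1) == 1      # key 1 is now the most recently used
    cache.put(3, 3)               # evicts key 2 (least recently used)
    assert cache.get(2) == -1
    assert cache.get(3) == 3
    cache.put(4, 4)               # evicts key 1
    assert cache.get(1) == -1
    assert cache.get(3) == 3
    assert cache.get(4) == 4
    print("All LRU cache assertions passed.")

test_lru_cache()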

πŸ˜ƒ: Imagine we have 1 million inputs to add to the LRU cache. Can we modify the implementation to further increase its performance?
πŸ€–: When dealing with large datasets such as 1 million inputs, performance optimization becomes crucial. Here are some strategies to improve the performance of the LRU Cache implementation:

  1. Hash Table for O(1) Access: Keep key lookups on the hash table at constant time; both a plain dictionary and OrderedDict already guarantee this on average.
  2. Efficient Linked List Operations: Minimize Python-level pointer bookkeeping when reordering nodes. In CPython, OrderedDict performs its reordering in C, which is noticeably faster than a hand-written linked list.
  3. Memory Management: Reduce per-entry memory overhead so that millions of entries fit comfortably in RAM.
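
On the memory side, one small tweak worth considering (an addition of mine, not part of the original code) is declaring __slots__ on the Node class from the first implementation. It removes the per-instance __dict__, which can save tens of bytes per node and adds up quickly at millions of cached entries:

class Node:
    # __slots__ avoids allocating a per-instance __dict__, shrinking each
    # node -- significant when the cache holds millions of entries.
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None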

Below is an optimized version of the LRU Cache implementation in Python:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # Move the accessed key to the end
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)  # Move the updated key to the end
        elif len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # Remove the first item (least recently used)
        self.cache[key] = value

# Example usage
if __name__ == "__main__":
    lru_cache = LRUCache(2)

    lru_cache.put(1, 1)
    print(lru_cache.get(1))  # returns 1

    lru_cache.put(2, 2)
    print(lru_cache.get(2))  # returns 2

    lru_cache.put(3, 3)
    print(lru_cache.get(1))  # returns -1 (not found)

    lru_cache.put(4, 4)
    print(lru_cache.get(2))  # returns -1 (not found)

Explanation:

  1. OrderedDict: Using OrderedDict from the collections module provides both fast lookups and efficient reordering of elements based on their access order. This simplifies our implementation significantly.
  2. get Method: When retrieving a value, we move the accessed key to the end of the ordered dictionary to mark it as recently used.
  3. put Method:
    • If the key already exists, we update its value and move it to the end.
    • If the cache is at full capacity, we remove the first item (least recently used) before inserting the new key-value pair.
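
As a quick illustration of the two OrderedDict operations the cache relies on (this is standard collections behavior, shown purely as a sanity check):

from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
od.move_to_end("a")      # "a" becomes the most recently used entry
print(list(od))          # ['b', 'c', 'a']
od.popitem(last=False)   # removes the oldest entry, "b"
print(list(od))          # ['c', 'a']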

Performance Considerations:

  • Time Complexity: Both get and put now run in O(1) average time thanks to the properties of OrderedDict (see the rough benchmark sketch below).
  • Space Complexity: The space complexity remains O(capacity), which is efficient for handling large datasets.
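
To check the 1-million-input claim on your own machine, here is a rough benchmark sketch. The workload mix and capacity are assumptions of mine, it expects the LRUCache class above to be in scope, and absolute timings will vary by machine:

import random
import time

def benchmark(cache_cls, n_ops=1_000_000, capacity=10_000):
    """Illustrative micro-benchmark: alternating puts and gets with random keys."""
    cache = cache_cls(capacity)
    keys = [random.randrange(capacity * 10) for _ in range(n_ops)]
    start = time.perf_counter()
    for i, key in enumerate(keys):
        if i % 2:
            cache.put(key, i)
        else:
            cache.get(key)
    return time.perf_counter() - start

print(f"1M mixed operations took {benchmark(LRUCache):.2f}s")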

This implementation should perform well even when processing 1 million inputs.
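
As a closing aside: when what you are caching is the result of a pure function rather than arbitrary key-value pairs, the standard library's functools.lru_cache (implemented in C in CPython) is usually faster still:

from functools import lru_cache

@lru_cache(maxsize=10_000)
def expensive(key: int) -> int:
    # Stand-in for a costly computation or lookup.
    return key * key

print(expensive(3))            # computed on the first call, then cached
print(expensive(3))            # served from the cache
print(expensive.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=10000, currsize=1)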
