Sorting ========================= Comparison-based sorting algorithm implementations. All functions sort **in-place** and return the modified array (except :func:`~core_algorithms.sorting.merge_sort.merge_sort`, which returns a new array). +------------------+-------------+-------------+-------------+-----------+ | Algorithm | Best | Average | Worst | Space | +==================+=============+=============+=============+===========+ | Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | +------------------+-------------+-------------+-------------+-----------+ | Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | +------------------+-------------+-------------+-------------+-----------+ | Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | +------------------+-------------+-------------+-------------+-----------+ | Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | +------------------+-------------+-------------+-------------+-----------+ | Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | +------------------+-------------+-------------+-------------+-----------+ Performance Notes ------------------------------------------------------------------------------- All implementations are **pure Python** and serve educational and algorithmic understanding purposes, they are not intended for production sorting. Python's built-in :func:`sorted` and :meth:`list.sort` both use **Timsort**, a hybrid algorithm implemented in C that is highly optimised and adaptive. Benchmarks confirm the gap is substantial: * **Random data**: built-ins are ~7–10x faster than ``quick_sort`` and ~500–1,500x faster than ``bubble_sort`` at 5,000 elements. * **Sorted data**: the gap widens dramatically — ``selection_sort`` is ~12,000x slower than ``list.sort()`` at 5,000 elements because it has no adaptive behaviour. * **``quick_sort``** is the closest custom implementation on random data, but degrades catastrophically on sorted or reversed input (``RecursionError``) due to the first-element pivot choice. * **``bubble_sort``** and **``insertion_sort``** are the only custom implementations that benefit from nearly-sorted data via their early-exit and shift-optimisation respectively. The interpreter overhead alone makes pure-Python sorts uncompetitive with the built-ins. Use the built-ins in production; use these implementations to study algorithm behaviour and complexity trade-offs. Run the benchmark yourself:: python tests/functional/benchmark_sorting.py ------------------------------------------------------------------------------- Bubble Sort **************************************************** Repeatedly compares adjacent elements and swaps them if they are out of order, "bubbling" the largest unsorted element to its correct position on each pass. An early-exit flag skips remaining passes once the array is already sorted. **Pros** * Extremely simple to understand and implement. * Adaptive: best-case O(n) when the array is already sorted (early-exit flag). * Stable: equal elements preserve their original relative order. * In-place: requires O(1) extra memory. **Cons** * O(n²) average and worst-case time — impractical for large datasets. * High number of swaps compared to Selection Sort on the same input. **When to use** * Tiny arrays (< ~20 elements) where simplicity matters more than speed. * Nearly-sorted data where the early-exit optimisation kicks in quickly. * Educational contexts or situations where a stable, zero-dependency sort is needed. .. automodule:: core_algorithms.sorting.bubble_sort :members: :undoc-members: :show-inheritance: ------------------------------------------------------------------------------- Insertion Sort **************************************************** Builds the sorted array one element at a time by picking each new element and shifting larger elements one position right until the correct insertion point is found. Conceptually similar to sorting a hand of playing cards. **Pros** * Simple and intuitive implementation. * Adaptive: O(n) on nearly-sorted data. * Stable: preserves the relative order of equal elements. * In-place: O(1) extra memory. * Low overhead — outperforms merge sort and quick sort on very small arrays. **Cons** * O(n²) average and worst-case time. * Many element shifts on reverse-sorted input. **When to use** * Small arrays (typically < 20–50 elements). * Streaming / online scenarios where elements arrive one at a time. * As the base case inside hybrid algorithms (e.g. Timsort uses it for small runs). * When a stable, in-place sort with minimal overhead is required. .. automodule:: core_algorithms.sorting.insertion_sort :members: :undoc-members: :show-inheritance: ------------------------------------------------------------------------------- Merge Sort **************************************************** Divide-and-conquer algorithm that recursively splits the array in half, sorts each half, then merges them back together in sorted order. The merge step compares elements from both halves and places them in the correct position in the result array. **Pros** * Guaranteed O(n log n) in all cases — no worst-case degradation. * Stable: equal elements preserve their original relative order. * Predictable and consistent performance regardless of input distribution. * Well-suited for linked lists and external sorting (e.g. sorting files larger than RAM). **Cons** * Requires O(n) extra memory for the merge step. * Higher constant factor than quick sort in practice due to memory allocation. * Not in-place (standard implementation). **When to use** * Large datasets where a guaranteed O(n log n) worst case is required. * When stability is important (e.g. sorting records by a secondary key). * External sorting (data that does not fit in memory). * Sorting linked lists, where random access is not available. .. automodule:: core_algorithms.sorting.merge_sort :members: :undoc-members: :show-inheritance: ------------------------------------------------------------------------------- Quick Sort **************************************************** Divide-and-conquer algorithm that selects a pivot, partitions the array into elements ≤ pivot and elements > pivot, then recursively sorts each partition in place. This implementation uses the first element as pivot and supports optional ``begin``/``end`` parameters for sorting subarrays. **Pros** * O(n log n) average-case time with small constants — fastest in practice for most random inputs. * In-place: O(log n) stack space (recursion depth). * Cache-friendly due to sequential memory access during partitioning. * Highly optimisable (pivot strategies, three-way partitioning, etc.). **Cons** * O(n²) worst case on already-sorted or reverse-sorted input with a naive pivot choice. * Not stable: equal elements may change relative order. * Recursive implementation can hit stack limits for very large arrays. **When to use** * General-purpose sorting of large, randomly-distributed datasets. * When average-case performance and cache efficiency are the priority. * Situations where in-place sorting is required and stability is not needed. * As the default choice when no strong reason exists to prefer another algorithm. .. automodule:: core_algorithms.sorting.quick_sort :members: :undoc-members: :show-inheritance: ------------------------------------------------------------------------------- Selection Sort **************************************************** Divides the array into a sorted prefix and an unsorted suffix. On each pass, it scans the unsorted suffix to find the minimum element and swaps it into the next position of the sorted prefix, extending it by one. **Pros** * Simple to implement and easy to reason about. * Minimises the number of swaps: exactly O(n) swaps regardless of input — useful when write cost is high. * In-place: O(1) extra memory. **Cons** * O(n²) in all cases — no early-exit, no adaptive behaviour. * Not stable: swapping non-adjacent elements can disturb equal elements' order. * Generally slower than Insertion Sort on nearly-sorted data. **When to use** * When the cost of writing/swapping is significantly higher than the cost of reading (e.g. flash memory with limited write cycles). * Very small arrays where simplicity outweighs performance. * Situations where a fixed, predictable number of swaps is required. .. automodule:: core_algorithms.sorting.selection_sort :members: :undoc-members: :show-inheritance: