# C brush refers to the minimum K number of offer |

2020-11-10 14:23:18 【C# Brush problem author  / Edison Zhou

This is a EdisonTalk Of the 299 Learn to share

Let's use the data structure knowledge we learned before to brush 《 The finger of the sword Offer》 Some of the core topics of （ Selected among them 30+ questions ）, I hope it helps you ！ The title of this paper is ： The smallest k Number .

1 Topic introduction

subject ： Input n It's an integer , Find out the smallest k Number . For example, the input 4、5、1、6、2、7、3、8 this 8 A digital , The smallest 4 The number is 1、2、3、4.

This problem is typical TopK problem , The simplest way of thinking is to put the input n An integer sort , At the top of the list k The number is the smallest k Number . The time complexity of this idea is O(nlogn), But the interviewer will ask that the time complexity be kept at O(n).

2 How to solve problems and realize

Ideas 1： Need to modify the data source O(n) solution

Based on Partition Function to solve this problem . If based on the first k A number to adjust , Make Biddy k All numbers smaller are on the left side of the array , Biti k All numbers larger than are on the right side of the array . After this adjustment , On the left of the array k One number is the smallest k A digital （ this k Numbers don't have to be sorted ）.

But, There are limits to adopting this approach . We need to modify the input array , Because of the function Partition Will adjust the order of the numbers in the array .

Ideas 2： Suitable for processing massive data O(nlogk) solution

You can first create a size of k To store the smallest data container k A digital , Next, every time we input n Read in a number of integers .

If the number in the container is less than k individual , Put the integer read in this time directly into the container ;

If the container already has k A number , That is, the container is full , At this point, we can't insert new numbers, but we can only replace the existing ones .

Find out what you already have k Maximum number , Then compare the integer to be inserted with the maximum value . If the value to be inserted is smaller than the current maximum , Replace the current maximum value with this number ; If the value to be inserted is larger than the current maximum , So this number can't be the smallest k One of the integers , So we can get rid of this integer .

So when the container is full , We have to do it. 3 thing ： One is in k Find the largest number of integers ; Second, it is possible to delete the maximum number in this container ; Third, it is possible to insert a new number . If you use a binary tree to implement this data container , So we can be in O(logk) Time to achieve these three steps of operation . So for n In terms of input numbers , The total time efficiency is O(nlogk).

Specific code implementation ：

Follow the above steps , Here the C# The implementation code is as follows ：（ The red black tree structure is used as the container , Of course, you can also use the heap to achieve , You can read more about the red and black trees yangecnu Of 《 On the algorithm and data structure of the red black tree 》） ``````public static void GetLeastNumbersByRedBlackTree(List<int> data, SortedDictionary<int, int> leastNumbers, int k)
{
leastNumbers.Clear();

if (k < 1 || data.Count < k)
{
return;
}

for (int i = 0; i < data.Count; i++)
{
int num = data[i];
if (leastNumbers.Count < k)
{
}
else
{
int greastNum = leastNumbers.ElementAt(leastNumbers.Count - 1).Value;
if (num < greastNum)
{
leastNumbers.Remove(greastNum);
}
}
}
}
``````

This solution is a little slower , But it has two obvious advantages ：

The input data is not modified （ Variables in the code data）. We just start from data Read in the numbers , All writes are in containers leastNumbers In the .

Second, the algorithm is suitable for massive data input （ Many companies, including Baidu, like the problems related to massive input data ）.

Suppose the topic is to find the smallest from the vast amount of data k A digital , Because the size of memory is limited , It may not be possible to load all these massive data into memory at one time . This is the time , We can get from the secondary storage space （ For example, hard disk ） One number at a time , according to GetLeastNumbers To determine whether it needs to be put into the container leastNumbers that will do .

This idea only requires memory to hold leastNumbers that will do , So it's best suited for situations where n Big and k Smaller problems .

3 unit testing

Test aided method encapsulation ：

``````public static void TestPortal(string testName, int[] data, int[] expected, int k)
{
if (!string.IsNullOrEmpty(testName))
{
Console.WriteLine("{0} begins:", testName);
}

Console.WriteLine("Result for solution:");
if (expected != null)
{
Console.WriteLine("Expected result:");
for (int i = 0; i < expected.Length; i++)
{
Console.Write("{0}\t", expected[i]);
}
Console.WriteLine();
}

if(data == null)
{
return;
}

List<int> dataList = new List<int>();
for (int i = 0; i < data.Length; i++)
{
}
SortedDictionary<int, int> leastNumbers = new SortedDictionary<int, int>();
GetLeastNumbersByRedBlackTree(dataList, leastNumbers, k);
Console.WriteLine("Actual result:");
foreach (var item in leastNumbers)
{
Console.Write("{0}\t", item.Value);
}
Console.WriteLine("\n");
}

``````

Unit Test Case ：

``````// k Less than the length of the array
public static void Test1()
{
int[] data = { 4, 5, 1, 6, 2, 7, 3, 8 };
int[] expected = { 1, 2, 3, 4 };
TestPortal("Test1", data, expected, expected.Length);
}

// k Equal to the length of the array
public static void Test2()
{
int[] data = { 4, 5, 1, 6, 2, 7, 3, 8 };
int[] expected = { 1, 2, 3, 4, 5, 6, 7, 8 };
TestPortal("Test2", data, expected, expected.Length);
}

// k Greater than the length of the array
public static void Test3()
{
int[] data = { 4, 5, 1, 6, 2, 7, 3, 8 };
int[] expected = null;
TestPortal("Test3", data, expected, 10);
}

// k be equal to 1
public static void Test4()
{
int[] data = { 4, 5, 1, 6, 2, 7, 3, 8 };
int[] expected = { 1 };
TestPortal("Test4", data, expected, expected.Length);
}

// k be equal to 0
public static void Test5()
{
int[] data = { 4, 5, 1, 6, 2, 7, 3, 8 };
int[] expected = null;
TestPortal("Test5", data, expected, 0);
}

//  There are the same numbers in the array
public static void Test6()
{
int[] data = { 4, 5, 1, 6, 2, 7, 2, 8 };
int[] expected = { 1, 2 };
TestPortal("Test6", data, expected, expected.Length);
}

//  Enter a null pointer
public static void Test7()
{
TestPortal("Test7", null, null, 0);
}
``````

test result ： 4 Distributed computing

Hadoop MapReduce It's a software framework , Based on this framework, it is easy to write applications , These applications can run on large clusters of thousands of commercial machines , And with a reliable , To handle in parallel in a fault-tolerant manner TB Level of massive data sets . therefore , about MapReduce, It can be simply said that , It's a software framework , Massive data is its “ food ”, It runs in parallel on large-scale clusters in a reliable and fault-tolerant manner “ Cook this dish ”.

Use MapReduce solve TopK problem

Here we use a randomly generated 100 Ten thousand numbers of files , That is to say, all we have to do is 100 Find the largest number in ten thousand 100 A digital .

（1）map Method

``````public static class MyMapper extends
Mapper<LongWritable, Text, NullWritable, LongWritable> {
public static final int K = 100;
private TreeMap<Long, Long> tm = new TreeMap<Long, Long>();

protected void map(
LongWritable key,
Text value,
Mapper<LongWritable, Text, NullWritable, LongWritable>.Context context)
throws java.io.IOException, InterruptedException {
try {
long temp = Long.parseLong(value.toString().trim());
tm.put(temp, temp);
if (tm.size() > K) {
tm.remove(tm.firstKey());
//  If it is to ask for topk The smallest, then use the following statement
//tm.remove(tm.lastKey());
}
} catch (Exception e) {
context.getCounter("TopK", "errorLog").increment(1L);
}
};

protected void cleanup(
throws java.io.IOException, InterruptedException {
for (Long num : tm.values()) {
context.write(NullWritable.get(), new LongWritable(num));
}
};
}
``````

among , It's used here java The data structure corresponding to the red black tree in TreeMap class ,cleanup() The method is in map Methods that are not executed until the end of the method , Here we are going to map Before the task 100 Data in reduce Tasks ;

（2）reduce Method

``````public static class MyReducer extends
Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
public static final int K = 100;
private TreeMap<Long, Long> tm = new TreeMap<Long, Long>();

protected void reduce(
NullWritable key,
java.lang.Iterable<LongWritable> values,
Reducer<NullWritable, LongWritable, NullWritable, LongWritable>.Context context)
throws java.io.IOException, InterruptedException {
for (LongWritable num : values) {
tm.put(num.get(), num.get());
if (tm.size() > K) {
tm.remove(tm.firstKey());
//  If it is to ask for topk The smallest, then use the following statement
//tm.remove(tm.lastKey());
}
}
//  In descending order, i.e. from large to small Key aggregate
for (Long value : tm.descendingKeySet()) {
context.write(NullWritable.get(), new LongWritable(value));
}
};
}
``````

stay reduce In the method , In turn map The data passed in the method is put into TreeMap in , And rely on the balance of red and black to maintain the order of the data .

（3） Realization effect ： Limited image size , Only the front... Is shown here 12 individual ; Although the example is very simple , The business is also very simple , But we introduced the idea of Distributed Computing , take MapReduce It is applied to the maximum value problem , It's an improvement .

Ref Reference material

He Haitao ,《 The finger of the sword Offer》

???? Sweep yards attention EdisonTalk Set to star , No more lost contact ！

A collection of past tweets ：2020 A collection of tweets in the first half of