
There is a large block of cheese with a height of h. We can assume its length and width are infinitely large. Inside this cheese, there are many spherical holes of the same radius. We can establish a spatial coordinate system within this cheese, where the bottom surface of the cheese is at z = 0 and the top surface of the cheese is at z = h.

Currently, there is a little mouse named Jerry on the bottom surface of the cheese. Jerry knows the coordinates of the centers of all the holes in the cheese. If two holes are tangent or intersect, Jerry can move from one hole to the other. In particular, if a hole is tangent to or intersects the bottom surface, Jerry can enter the hole from the bottom surface of the cheese; if a hole is tangent to or intersects the top surface, Jerry can exit from the hole to the top surface of the cheese.

Jerry, located on the bottom surface of the cheese, wants to know if he can utilize the existing holes to reach the top surface of the cheese without damaging the cheese.

The input contains multiple test cases.

The first line contains an integer T, the number of test cases. Each test case begins with a line containing three integers n, h, and r, separated by spaces: the number of holes, the height of the cheese, and the radius of the holes. The following n lines each contain three integers x, y, z, separated by spaces: the coordinates of a hole's center.

Solution

// Long64 is assumed to be a 64-bit integer type, e.g. typedef long long Long64;
class Cheese {
private:
    vector<int> parent;
    vector<int> rank;
    vector<Long64> X;
    vector<Long64> Y;
    vector<Long64> Z;
    Long64 radius;
    Long64 height;

    // Two spheres of radius r are tangent or intersect iff the squared distance
    // between their centers is at most (2r)^2. Integer arithmetic avoids the
    // precision loss of pow() on large coordinates.
    bool isConnect(Long64 x1, Long64 y1, Long64 z1, Long64 x2, Long64 y2, Long64 z2, Long64 r) {
        Long64 dx = x1 - x2, dy = y1 - y2, dz = z1 - z2;
        return dx * dx + dy * dy + dz * dz <= 4 * r * r;
    }

    int find(int x) {
        return x == parent[x] ? x : (parent[x] = find(parent[x]));
    }

    void merge(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX != rootY) {
            if (rank[rootX] < rank[rootY]) {
                parent[rootX] = rootY;
            } else if (rank[rootX] > rank[rootY]) {
                parent[rootY] = rootX;
            } else {
                parent[rootX] = rootY;
                rank[rootY]++;
            }
        }
    }

public:
    Cheese(vector<Long64> X, vector<Long64> Y, vector<Long64> Z, Long64 radius, Long64 height)
        : X(X), Y(Y), Z(Z), radius(radius), height(height) {
        int n = this->X.size();

        // holes are 0 .. n - 1; use n and n + 1 as virtual
        // nodes for the bottom and top surfaces
        parent.resize(n + 2);
        rank.resize(n + 2, 1);

        for (int i = 0; i < n + 2; i++) {
            parent[i] = i;
        }
    }

    bool canReach() {
        int n = X.size();
        for (int i = 0; i < n; i++) {
            if (Z[i] <= radius) {          // tangent to or intersecting the bottom
                merge(i, n);
            }
            if (Z[i] + radius >= height) { // tangent to or intersecting the top
                merge(i, n + 1);
            }
        }

        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (isConnect(X[i], Y[i], Z[i], X[j], Y[j], Z[j], radius)) {
                    merge(i, j);
                }
            }
        }

        return find(n) == find(n + 1);
    }
};

Explanation:

This is a typical Merge-find problem. We add two virtual nodes representing the bottom and top surfaces. If a hole is tangent to or intersects the bottom surface, we merge it with the bottom node; similarly, we merge every hole that is tangent to or intersects the top surface with the top node. Any two holes that are tangent or intersecting are merged as well. To determine whether the holes form a pathway, we only need to check whether the top surface and the bottom surface end up in the same set. All of the above can be achieved with the merge-find algorithm.

Merge-find (并查集), also known as Disjoint-set, is a data structure that efficiently handles the dynamic connectivity problem. It supports two primary operations: Union (to merge two sets) and Find (to determine the representative, or root, of a set). This structure is especially useful in algorithms that deal with partitioning elements into disjoint subsets, such as Kruskal's algorithm for finding the Minimum Spanning Tree.

Key Concepts

Data Structure

  • Parent Array: Each element has a pointer to its parent. For a representative element (root), the parent points to itself.
  • Rank Array (or Size Array): Keeps track of the rank (or size) of the tree rooted at a particular element. It helps in optimizing the union operation by always attaching the smaller tree under the larger one.

Operations

  1. Find:
    • Determines which subset a particular element is in.
    • Can be used to check if two elements are in the same subset.
    • Uses path compression to flatten the structure, making future operations faster.
  2. Union:
    • Merges two subsets into a single subset.
    • Uses union by rank or union by size to keep the tree shallow, improving efficiency.

Pseudocode and Explanation

Initialization

  • Parent Array: Each element is its own parent initially.
  • Rank Array: Initialize rank or size of each element as 0 or 1.

Find with Path Compression

Pseudocode:

int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]);
    }
    return parent[x];
}

Explanation: - Recursively traverses the parent pointers until the root is found. - Path compression flattens the tree by making every node point directly to the root, reducing the time complexity for future operations.

Union by Rank

Pseudocode:

void union(int x, int y) {
    int rootX = find(x);
    int rootY = find(y);

    if (rootX != rootY) {
        // union by rank
        if (rank[rootX] > rank[rootY]) {
            parent[rootY] = rootX;
        } else if (rank[rootX] < rank[rootY]) {
            parent[rootX] = rootY;
        } else {
            parent[rootY] = rootX;
            rank[rootX]++;
        }
    }
}
Explanation: - Find the roots of the elements x and y. - Attach the smaller tree under the larger tree to keep the overall structure balanced. - If both trees are of the same rank, make one the parent of the other and increase its rank.

Complete C++ Implementation

class UnionFind {
private:
    vector<int> parent;
    vector<int> rank;

public:
    UnionFind(int n) {
        parent.resize(n);
        rank.resize(n, 0);
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
    }

    int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]);
        }
        return parent[x];
    }

    void merge(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);

        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY]) {
                parent[rootY] = rootX;
            } else if (rank[rootX] < rank[rootY]) {
                parent[rootX] = rootY;
            } else {
                parent[rootY] = rootX;
                rank[rootX]++;
            }
        }
    }

    bool connected(int x, int y) {
        return find(x) == find(y);
    }
};

The Knapsack Problem (背包问题) is a classic optimization problem in computer science and mathematics. It involves a set of items, each with a weight and a value, and a knapsack with a maximum weight capacity. The goal is to determine the most valuable combination of items to include in the knapsack without exceeding its weight capacity.

Variants of the Knapsack Problem

0/1 Knapsack Problem

In this version, each item can either be included in the knapsack or excluded; you can't take a fraction of an item. Formally, given n items with weights w1, w2, ..., wn and values v1, v2, ..., vn, and a knapsack with capacity W, the problem is to maximize the total value V while ensuring that the total weight does not exceed W.

Mathematical formulation: Maximize: $$ V=\sum_{i=1}^n v_i x_i $$ Subject to: $$ \sum_{i=1}^n w_i x_i \le W $$ where $x_i \in \{0, 1\}$ indicates whether item i is included.

Fractional Knapsack Problem

In this version, you can take fractions of items. This allows a greedy algorithm to achieve the optimal solution by taking items in decreasing order of their value-to-weight ratio until the knapsack is full. Mathematical formulation: Maximize: $$ V=\sum_{i=1}^{n}v_i x_i $$ Subject to: $$ \sum_{i=1}^n w_i x_i \le W $$ where $0 \le x_i \le 1$ indicates the fraction of item i included.

Solution Approaches

Dynamic Programming

Used primarily for the 0/1 knapsack problem. It builds up a table of solutions to subproblems to find the optimal solution.

In the context of the 0/1 knapsack problem, the transition equation helps build the dynamic programming solution. This equation is used to determine the maximum value achievable by either including or excluding an item, based on previous solutions. Let's delve into the specifics.

Dynamic Programming Transition Equation for 0/1 Knapsack Problem

For the 0/1 knapsack problem, the transition equation determines the maximum value $dp_{i,w}$ achievable using the first i items with a maximum weight capacity w. The equation can be expressed as: $$ dp_{i,w} = \max(dp_{i-1,w},\ dp_{i-1,w-w_i} + v_i) $$ Where: - $dp_{i,w}$ is the maximum value achievable with the first i items and capacity w. - $v_i$ and $w_i$ are the value and weight of the i-th item. - $dp_{i-1,w}$ represents the maximum value without including the i-th item. - $dp_{i-1,w-w_i} + v_i$ represents the maximum value including the i-th item, given that $w - w_i$ must be non-negative.

Explanation:

  1. Initialization: $dp_{0,w} = 0$ for all w, meaning that if no items are considered, the maximum value is zero regardless of the weight capacity.
  2. Recurrence: for each item i and each weight w:
    • If the item's weight $w_i$ is greater than w, the item cannot be included, so the maximum value remains $dp_{i-1,w}$.
    • If the item's weight $w_i$ is less than or equal to w, consider two cases: exclude the item, giving $dp_{i-1,w}$, or include it, giving $dp_{i-1,w-w_i} + v_i$. Take the maximum of these two cases.

Implementation

int knapsack_DP(const vector<int>& values, const vector<int>& weights, int W) {
    int n = values.size();

    // create a 2-D DP table to store the maximum value for each subproblem
    vector<vector<int>> dp(n + 1, vector<int>(W + 1, 0));

    // iterate through each item
    for (int i = 1; i <= n; i++) {
        for (int w = 0; w <= W; w++) {
            if (weights[i - 1] <= w) {
                dp[i][w] = max(dp[i - 1][w], dp[i - 1][w - weights[i - 1]] + values[i - 1]);
            } else {
                dp[i][w] = dp[i - 1][w];
            }
        }
    }

    return dp[n][W];
}

Greedy Algorithm

Effective for the fractional knapsack problem. It involves selecting items based on their value-to-weight ratio until the knapsack is full.

Implementation

struct Item {
    int value;
    int weight;
    double value_per_weight;
};

bool compare(const Item& a, const Item& b) {
    return a.value_per_weight > b.value_per_weight;
}

double fractionalKnapsack(const vector<int>& values, const vector<int>& weights, int W) {
    int n = values.size();
    vector<Item> items(n);

    for (int i = 0; i < n; i++) {
        items[i] = {values[i], weights[i], (double)values[i] / weights[i]};
    }

    sort(items.begin(), items.end(), compare);

    double total_value = 0.0;
    int current_weight = 0;

    for (const auto& item : items) {
        if (current_weight + item.weight <= W) {
            current_weight += item.weight;
            total_value += item.value;
        } else {
            int remain = W - current_weight;
            total_value += item.value_per_weight * remain;  // take only a fraction of this item
            break;
        }
    }

    return total_value;
}

What is P-Problem and NP-Problem

Before introducing the concept of P-problem and NP-problem, we need to know something about time complexity.

Some kinds of time complexity are acceptable to a computer, meaning the running time is affordable, or worthwhile; other kinds are nowhere near as time-saving. To categorize algorithms, computer scientists observed that certain patterns of time complexity are time-efficient and others are not. The time-efficient complexities are the polynomial ones, which is what the letter P in P-problem stands for. For the others, it is not known whether the problem can be solved in polynomial time; such problems are nondeterministic polynomial, the so-called NP. Problems in P are considered efficiently solvable.

  • Time complexities like $O(1), O(n), O(\log n), O(n^c)$ are called polynomial.
  • Time complexities like $O(c^n), O(n!)$ are called non-polynomial.

So, after understanding what the term polynomial means for algorithm efficiency, we can move on to the definitions of P-Problem and NP-Problem.

These concepts actually discuss the solvability of a problem.

P Problem

If a problem can be solved in polynomial time, this problem can be deemed a P Problem.

NP Problem

Given a problem, if any of its proposed solutions can be verified within polynomial time, this problem can be defined as a nondeterministic polynomial problem.

And it is important that many problems in NP have no known polynomial-time solutions.

Relationship between P-Problem and NP-Problem

First, it is obvious that all P-Problems are NP-Problems. However, is the statement still true when reversed? This is hard to say, and this question has become the famous P = NP? problem.

NPC Problem

The NP-Complete problems are a subset of NP problems that are as hard as any problem in NP. If any NP-Complete problem can be solved in polynomial time, then every problem in NP can be solved in polynomial time.

In short, the NP-Complete problems are the hardest NP problems!

Suppose there is an NP problem to which every problem in NP can be reduced. Then, if a polynomial-time solution is found for just this problem, every NP problem gains a polynomial-time solution by reducing to it.

And how to prove a problem is NP-Complete?

Steps to Prove a Problem is NP-Complete

  1. Show the Problem is in NP. Verification in polynomial time: demonstrate that, given a candidate solution, you can verify its correctness in polynomial time.
  2. Select a Known NP-Complete Problem. Source problem: choose a problem already known to be NP-Complete for the reduction.
  3. Construct a Polynomial Time Reduction. Transform: develop a polynomial time algorithm that converts instances of the chosen NP-Complete problem into instances of the problem you are proving NP-Complete.
    1. Input transformation: map any instance of the known NP-Complete problem to an instance of your problem.
    2. Solution transformation: ensure that a solution to the transformed instance of your problem corresponds to a solution to the original NP-Complete problem.

In summary, only two steps need to be done:

  • first show you can verify solutions in polynomial time.
  • then transform a known NP-Complete problem to your problem in polynomial time.

Common NP-Complete Problems for Reductions:

  • Satisfiability (SAT)
  • 3-SAT
  • Clique
  • Vertex Cover
  • Hamiltonian Path (Cycle)
  • Traveling Salesman Problem (TSP)

Given a lexicographically sorted array words of size n, where the strings in words are all written in lowercase letters, and a specific string prefix of length m, count the number of strings in words that have prefix as a prefix.

Example:

Input: words = ["aaa", "ba", "bac", "bat", "batbat", "batq", "bta", "c"], prefix = "bat";
Output: 3;

Constraints:

  • The time complexity should be O(m log n)

Solution

class CountPrefix {
private:
    string prefix;
    vector<string> words;

    bool hasPrefix(string& word, string& pref) {
        if (word.size() < pref.size()) return false;
        for (int i = 0; i < (int)pref.size(); i++) {
            if (word[i] != pref[i])
                return false;
        }
        return true;
    }

    int findFirst(int left, int right) {
        int low = left, high = right, label = -1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (hasPrefix(words[mid], prefix)) {
                label = mid, high = mid - 1;   // keep searching the section before
            } else if (words[mid] < prefix) {  // full comparison, not just the first character
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return label;
    }

    int findLast(int left, int right) {
        int low = left, high = right, label = -1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (hasPrefix(words[mid], prefix)) {
                label = mid, low = mid + 1;    // keep searching the section after
            } else if (words[mid] < prefix) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return label;
    }

public:
    CountPrefix(vector<string> words, string prefix) : words(words), prefix(prefix) {}

    int count() {
        int from = findFirst(0, words.size() - 1);
        int to = findLast(0, words.size() - 1);

        if (from < 0) return 0;
        else return to - from + 1;
    }
};

Explanation:

The main idea of this algorithm is to use binary search to find the boundary of the words that contain the specific prefix. The function findFirst returns the position of the first word that has the prefix, and findLast returns the position of the last one. The only difference between findFirst and findLast is that, upon hitting a match, findFirst continues to search the section before it while findLast searches the section after it. Repeating this narrows the search down to the first and the last match.

About the time complexity: the time complexity of the function hasPrefix is O(m). Suppose the time complexity of findFirst and findLast is T(n); then $$ T(n)=T(\frac{n}{2}) + O(m), \quad T(1) \in O(m), $$ which means T(n) ∈ O(m log n). Thus the time complexity of CountPrefix::count is O(m log n).

Given an integer array coins representing coins of different denominations and integer amount representing a total amount of money.

Return the fewest number of coins that you need to make up that amount. If that amount of money cannot be made up by any combination of coins, return −1.

Example:

Input: coins = [1, 2, 5], amount = 11;
Output: 3;
Input: coins = [2], amount = 3;
Output: -1;

Constraints:

  • 1 ≤ coins.length ≤ 12
  • 1 ≤ coins[i] ≤ 2^31 − 1
  • 0 ≤ amount ≤ 10^4

Solution

class CoinChange {
public:
    int coinChange(vector<int>& coins, int amount) {
        // dp[i] is the fewest coins needed for amount i; amount + 1 acts as infinity
        vector<int> dp(amount + 1, amount + 1);
        dp[0] = 0;
        for (int i = 1; i <= amount; i++) {
            for (int j = 0; j < (int)coins.size(); j++) {
                if (coins[j] <= i)
                    dp[i] = dp[i - coins[j]] + 1 < dp[i] ? dp[i - coins[j]] + 1 : dp[i];
            }
        }
        return dp[amount] > amount ? -1 : dp[amount];
    }
};

Explanation:

$dp_i$ represents the minimum number of coins that make up the amount i. Assume that the values from $dp_0$ to $dp_{i-1}$ are already known; the transition equation for $dp_i$ can then be deduced as $$ dp_i = \min_{j=0}^{coins.length - 1} dp_{i - coins_j} + 1 $$ which means the value of $dp_i$ is determined by $dp_{i - coins_j}$. If $dp_i$ can be reached from $dp_{i - coins_j}$ via the coin $coins_j$, the number of coins used increases by 1.

What Is a Personal Image Host

A simple example: when writing with text frameworks such as Markdown or HTML, images (or other media resources) are usually inserted in one of two ways:

  • Via a local path: the image is fetched from a path on the local machine or server (usually an absolute or relative path), and the image data is stored on your own media.
  • Via a network URL: the image is loaded on the fly from a URL on some cloud service provider. A cloud service used this way to store images is commonly called an image host (图床).

When writing a personal blog, it is convenient to insert images into posts. Besides embedding local images directly in Markdown, a personal image host is a good alternative, especially for bloggers like me with low traffic who will not upload large numbers of images.

Telegraph-image

Telegraph-image is roughly a free alternative to Flickr/imgur. By deploying this Function on Cloudflare Pages, you get hosting services such as image upload and referencing.

The GitHub repository of Telegraph-image: Telegraph-Image

The repository's README also explains how to deploy the service; the following is just a brief recap.

Register a Cloudflare account

Go to Cloudflare (I used the China-region portal) and sign up. Registering directly with a personal email address is recommended; after signing up you will receive a verification email, and clicking the link in it completes the verification.

Fork the repository

Fork the Telegraph-Image repository into your own account. The default fork settings are fine; forking only the main branch is enough. Alternatively, you can clone the repository locally and deploy from there, but that is more cumbersome and not recommended.

Deploy Telegraph-image

Click the "Workers & Pages" entry in the Cloudflare sidebar.

Cloudflare deployment: go to Pages

On the Pages tab, click "Connect to Git":

Connect to Git

Then click "Add GitHub account" and link your own GitHub account. Once linked, select the forked Telegraph-Image repository to deploy:

Select the GitHub repository to deploy

Then click the setup button at the lower right. On the next page you can change the project name; keep main as the project branch and leave the remaining settings unchanged (empty is fine). When this page is done, click "Save and Deploy" to start the deployment.

A terminal-like window on the page will report the deployment progress until it finishes.

Visit the site

After the deployment is generated, a link to the site appears; click it to visit your personal image-host site. There you can upload files via the file picker, paste them with Ctrl-V, or drag and drop. After a successful upload, an image link is generated below; copy that link wherever it is needed (for example in Markdown; all images on this blog come from my personal image host).

The personal image-host interface

And that completes a simple personal image host. I will keep updating this post with features such as upload verification tokens, and try to explore more extensions.

Introduction

The SHA-256 algorithm is a member of the SHA-2 family, a.k.a. Secure Hash Algorithm 2. As the name suggests, SHA is a type of hash function, not an encryption scheme: the algorithm digests and compresses the input message by hashing. The result of SHA hashing is usually rendered as a string of seemingly random alphanumeric characters.

Explanations of SHA-256 Algorithm

Before explaining the detailed SHA-256 algorithm explicitly, it is essential to figure out what needs to be done in advance. Three topics can be extracted: the constants used in the algorithm, how the data is preprocessed, and the logical operations involved. After understanding these, the algorithm itself is easy to handle.

Before Diving into SHA-256

Constants

In the SHA-256 algorithm, 8 initial hash values and 64 hash constants are used. First, the eight initial values are as follows:

uint32_t H[8] = {
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

These numbers are taken from the first 32 bits of the fractional parts of the square roots of the first eight prime numbers.

And the 64 hash constants are:

static const uint32_t K[64] = {
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

Similar to before, these 64 constants are taken from the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers.

Data Preprocessing

The preprocessing stage of the SHA-256 algorithm has two steps:

  • Append a '1' bit followed by enough '0' bits at the end of the data so that the message length modulo 512 equals 448.
  • Then append 64 bits of message-length information at the end, in big-endian order.

Notice that even if the current length modulo 512 already equals 448, the append operation is still necessary!

For example, suppose we have the following data:

01100001 01100010 01100011

Then we need to append a ‘1’ and 423 ’0’s:

01100001 01100010 01100011 10000000
00000000 ...... 00000000

Finally, the length of the original message (24 bits here) is appended to the end of the current data as a 64-bit big-endian integer.

Logical Operations

All of the logical operations involved in the SHA-256 algorithm are as follows:

Ch(x, y, z) = (x ∧ y) ⊕ (¬x ∧ z)

Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)

Σ0(x) = SR2(x) ⊕ SR13(x) ⊕ SR22(x)

Σ1(x) = SR6(x) ⊕ SR11(x) ⊕ SR25(x)

σ0(x) = SR7(x) ⊕ SR18(x) ⊕ R3(x)

σ1(x) = SR17(x) ⊕ SR19(x) ⊕ R10(x)

Explanations:

  • ∧: and.
  • ¬: not.
  • ⊕: xor.
  • SRn: circular right shift (rotate) by n bits.
  • Rn: right shift by n bits.

Abstraction

Now we come to the main part of the SHA-256 algorithm! All the details of the algorithm are elaborated here.

The first step is to break the message into 512-bit chunks.

Suppose the message m is split into n chunks; the whole algorithm then iterates n times in order to produce a 256-bit hash result. Initially, a 256-bit initial value H0 is given; it is combined with data chunk 1 to produce H1, then with data chunk 2 to produce H2. This step repeats until all data chunks have been processed.

The pseudo-code is:

data_chunks = divide_by_512_bit(message)
handle = H0
for (chunk in data_chunks):
    handle = operate(handle, chunk)
result = handle

In each operation, the 512-bit data chunk is divided into 16 words; each word contains 32 bits and is encoded in big-endian order, as W0, ..., W15. This means the first sixteen words can be derived directly from the chunk data. The remaining words are derived through the following equation:

$$ W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16} $$

Then, iterate over the 8 working words for 64 rounds following these rules:

T1 = H + Σ1(E) + Ch(E, F, G) + Kt + Wt
T2 = Σ0(A) + Maj(A, B, C)
H = G
G = F
F = E
E = D + T1
D = C
C = B
B = A
A = T1 + T2

Notice that the initial values of the working words A through H are H0 through H7, and Kt is the t-th of the 64 hash constants K.

Code Implementation

These notes follow MIT's course missing-semester: Editors (Vim).

Vim

Vim is a command-line code editor, designed for programmers and by programmers. Visual editors like VS Code can also integrate a vim mode for an advanced coding experience.

The core philosophy of vim is the command: all operations are done by typing keys, without using the mouse. Once familiar with these key bindings, programming speed can be dramatically improved.

Modes of vim

Vim contains several operating modes for different uses:

  • Normal: navigating, moving, and simple editing. Press ESC to return to normal mode.
  • Insert: inserting text. Press i to switch to insert mode.
  • Replace: replacing text. Press shift-r to switch to replace mode.
  • Visual:
    • Visual-plain: selecting text by cursor movement. Press v.
    • Visual-line: selecting text by line. Press shift-v.
    • Visual-block: selecting text by block. Press ctrl-v.
  • Command-line: running commands. Press shift-; (that is, :).

Command-line

Pressing : enters command-line mode, from which commands can be sent to operate vim as a whole.

  • :q | quit (or close current window).
  • :w | write and save.
  • :wq | save and quit.
  • :e {filename} | open file and edit.
  • :ls | show open buffers.
  • :help {topic} | open help
    • :help :w | help info for command :w.
    • :help w | help info for command w.

Cursor movement

  • Basic movement: hjkl | left, down, up, right.
  • Move by word: w, b, e | next word, beginning of word, end of word.
  • Move by line: 0, ^, $ | beginning of line, first non-blank character, end of line.
  • Move by screen: H, M, L | top of screen, middle of screen, bottom of screen.
  • Scroll: ctrl-u, ctrl-d | scroll up, scroll down.
  • Move by file: gg, G | beginning of file, end of file.
  • Misc: % | corresponding item.
  • Find: f{char}, t{char}, F{char}, T{char} | find forward / move till forward / find backward / move till backward to a character on the current line.
  • Search: /{regex}, n, N | search by pattern; n / N jump to the next / previous match.

Edits

  • i | enter insert mode.
  • o, O | insert line below, insert line above.
  • d{motion} | delete by motion command.
    • dw | delete word.
    • d$ | delete to end of line.
    • d0 | delete to beginning of line.
  • c{motion} | change by motion command.
  • x | delete character.
  • s | substitute character.
  • u | undo.
  • ctrl-r | redo.
  • y | copy (yank).
  • p | paste.

Counts

By using the command pattern {number}{command}, the command is repeated that many times.

  • 3w move 3 words forward.
  • 5j move 5 lines down.
  • 7dw delete 7 words.

Modifiers

When using a delete or change command, adding i or a specifies whether the command applies inside or around the text object.

  • ci( | change content inside current pair of parentheses.
  • ci[ | change content inside current pair of square brackets.
  • da' | delete a single-quoted string, including the surrounding single quotes.

Attention is All You Need is a paper published in 2017 by the Google Brain team. In this paper, the prominent self-attention mechanism and multi-headed self-attention mechanism are proposed to enhance the general capacity of neural networks. Meanwhile, a basic neural network architecture, the Transformer, is proposed to adopt these mechanisms, and it is tested on a translation task.

What is Attention?

The attention mentioned for neural networks is similar to the one we speak of in human intelligence: an ability to capture information that seems (subjectively) more important. The "attention mechanism" in human cognition is prominently manifested across various sensory perceptions. For instance, when using our eyes, certain features such as bright colors, rapid movements, or familiar faces tend to capture our gaze. Similarly, during reading, specific names, events, or elegant phrases can leave a lasting impression. This attention mechanism also extends to other senses like smell, touch, and hearing. Furthermore, the stream of consciousness during thought processes is often influenced by particularly striking information. All of this illustrates the attention mechanism in human cognition.

In the context of a neural network, what does the attention mechanism refer to? For example, when we want a neural network to process an image, how does it determine which elements of the image are more important? Consider objects like people, pets, or cars in an image: how does the network decide which one is more noteworthy in the image's "context"? The attention mechanism addresses this problem. When processing an image, the neural network assigns higher weights, or "attention scores", to what it deems more important information. During the subsequent forward passes, the high-weight information is emphasized, enabling the network to capture more detailed and deeper features. This, in turn, helps handle the semantic importance of the image. Besides image processing, the mechanism is applicable to text, audio, graphs, and other data types. Since data in neural networks exists as discrete tensors, the fundamental principles of calculating attention scores remain consistent across different data types.

Self-Attention Mechanism

The so-called self-attention mechanism refers to a process where the attention scores are computed by comparing the elements within the sequence itself. In other words, the reference for calculating attention scores is the sequence as a whole. For a tensor sequence input into a neural network, the self-attention mechanism requires computing the attention scores between each element and every other element in the sequence. The basic process can be represented in pseudocode as follows:

for (element_x : input_sequence) {
    for (element_y : input_sequence) {
        attention_score[x][y] = compute_score(element_x, element_y);
    }
}

However, for deep learning tasks, using two nested loops to compute these values is highly time-consuming, especially when the input sequence length can be in the tens of thousands. Therefore, researchers use vectorized operations to calculate all pairwise attention scores efficiently. With vectorized multiplication, we can obtain all attention scores in a single dot-product operation between the input tensor sequence and its transposed version.

$$ X= \left[ \begin{matrix} x_1, ..., x_n \end{matrix} \right] $$

$$ X^T= \left[ \begin{matrix} x_1 \\ \vdots \\ x_n \\ \end{matrix} \right] $$

$$ AttentionScore = X \cdot X^T = \left[ \begin{matrix} x_1 \cdot x_1 & ... & x_1 \cdot x_n\\ \vdots & & \vdots \\ x_n \cdot x_1 & ... & x_n \cdot x_n\\ \end{matrix} \right] $$

Through this matrix multiplication, combined with parallel computing technology and the parallel processing capabilities of GPU devices, the efficiency of this computation can be greatly enhanced. To ensure that the attention scores are not fixed but gradually optimized during the training process, parameter matrices are used to adjust the calculation of attention scores. This enables the model to truly capture the information that deserves attention. Thus, the equation for computing attention scores can be expressed as follows: $$ Attention(Q,K,V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V $$ Here, $ Q, K, V $ refer to each element of the input sequence multiplied by three shared weight matrices $ W_q, W_k $ and $ W_v $. The data multiplied by $ W_q $ represents the element that will compute attention scores against the other elements in the sequence. The data multiplied by $ W_k $ represents the data being compared against. After comparing each element with every other element, the resulting sequence is the attention-score sequence for that element. This sequence is processed through the softmax function and then multiplied by the values produced with $ W_v $ to perform a weighted sum. The entire process essentially calculates the correlations between all elements and then performs a weighted sum, with the result being the output of the attention mechanism. Through this process, the information considered more important is multiplied by greater weights, so its "information" is amplified in the subsequent processes.

Transformer Model

The Transformer model adopts an encoder-decoder architecture, a popular architecture in the current deep learning field. Two independent models are used separately as an encoder and a decoder. In most cases, they process different types of data. For example, in image captioning, the encoder processes the image signal and the decoder generates the captions. In this paper, the Transformer is tested on an English-German translation task, so, loosely speaking, you can imagine that the encoder tries to "understand" a sentence in English and then generates some kind of metaphysical or abstract information; this information contains the concepts, subjects, and objects that are "mentioned" in the original language. Then this information is passed to the decoder, whose task is to decode it into German.

Importantly, the comparison above is extremely imprecise; this model cannot translate as humans do. In fact, it is still prediction, or simulation, not understanding.