There is a large block of cheese with a height of h. We can assume its length and
width are infinitely large. Inside this cheese, there are many spherical
holes of the same radius. We can establish a spatial coordinate system
within this cheese, where the bottom surface of the cheese is at z = 0 and the top surface of the
cheese is at z = h.
Currently, there is a little mouse named Jerry on the bottom
surface of the cheese. Jerry knows the coordinates of the centers of all
the holes in the cheese. If two holes are tangent or intersect, Jerry
can move from one hole to another. Specially, if a hole is tangent to or
intersects the bottom surface, Jerry can enter the hole from the bottom
surface of the cheese; if a hole is tangent to or intersects the top
surface, Jerry can exit from the hole to the top surface of the
cheese.
Jerry, located on the bottom surface of the cheese, wants to know if
he can utilize the existing holes to reach the top surface of the cheese
without damaging the cheese.
Each input stream contains varies of data.
In the first line, it includes an integer T which indicates the total number
of input data lines. Then it followed by T lines of data, each line is
formatted as n, h, r, separated
by space, representing number of holes, height of cheese,
and the radius of holes. The following n lines contains three integers
x, y, z,
separated by space, representing the coordinate of each
hole.
intfind(int x){ return x == parent[x] ? x : (parent[x] = find(parent[x])); }
voidmerge(int x, int y){ int rootX = find(x); int rootY = find(y); if (rootX != rootY) { if (rank[rootX] < rank[rootY]) { parent[rootX] = rootY; } elseif (rank[rootX] > rank[rootY]) { parent[rootY] = rootX; } else { parent[rootX] = rootY; rank[rootY]++; } } }
public: Cheese(vector<Long64> X, vector<Long64> Y, vector<Long64> Z, Long64 radius, Long64 height) { X = X; Y = Y; Z = Z; radius = radius; height = height;
int n = X.size();
// use n + 1 and n + 2 to represent top and bottom parent.resize(n + 2); rank.resize(n + 2, 1);
for (int i = 0; i < n + 2; i++) { parent[i] = i; } }
boolcanReach(){ int n = X.size(); for (int i = 1; i <= n; i++) { if (Z[i] <= radius) { merge(i, n + 1); } if (Z[i] + radius >= height) { merge(i, n + 2); } }
for (int i = 1; i <= n; i++) { for (int j = i + 1; j <= n; j++) { if (isConnect(X[i], Y[i], Z[i], X[j], Y[j], Z[j], radius)) { merge(i, j); } } }
returnfind(n + 1) == find(n + 2); } };
Explanation:
This is a typical Merge-find problem. We can split
the holes into two sets: those whom connected to the bottom surface and
those whom connected to the top surface. If a hole tangent to or
intersects the bottom surface, then merge it into the bottom
set, similarly, we can merge those whom are connected to the top
surface to the top set. To determine whether there are holes
can form a pathway, we just need to know whether the top surface and the
bottom surface are in the same set. All above can be achieved by using
merge-find algorithm.
Merge-find(并查集), also known as
Disjoint-set, is a data structure that efficiently
handles the dynamic connectivity problem. It supports two primary
operations: Union (to merge two sets) and
find (to determine the representative or root of a
set). This structure is especially useful in algorithms that deal with
partitioning elements into disjoint subsets, such as Kruskal’s
algorithm for finding the Minimum Spanning Tree.
Key Concepts
Data Structure
Parent Array: Each element has a pointer to its parent. For a
representative element (root), the parent points to itself.
Rank Array (or Size Array): Keeps track of the rank (or size) of the
tree rooted at a particular element. It helps in optimizing the union
operation by always attaching the smaller tree under the larger
one.
Operations
Find:
Determines which subset a particular element is in.
Can be used to check if two elements are in the same subset.
Uses path compression to flatten the structure,
making future operations faster.
Union:
Merges two subsets into a single subset.
Uses union by rank or union by
size to keep the tree shallow, improving efficiency.
Pseudocode and Explanation
Initialization
Parent Array: Each element is its own parent initially.
Rank Array: Initialize rank or size of each element as 0 or 1.
Explanation: - Recursively traverses the parent
pointers until the root is found. - Path compression flattens the tree
by making every node point directly to the root, reducing the time
complexity for future operations.
Union by Rank
Pseudocode:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
voidunion(int x, int y){ int rootX = find(x); int rootY = find(y);
if (rootX != rootY) { // union by rank if (rank[rootX] > rank[rootY]) { parent[rootY] = rootX; } elseif (rank[rootX] < rank[rootY]) { parent[rootX] = rootY; } else { parent[rootY] = rootX; rank[rootX]++; } } }
Explanation: - Find the roots of the elements x and y. - Attach the smaller tree under
the larger tree to keep the overall structure balanced. - If both trees
are of the same rank, make one the parent of the other and increase its
rank.
The Knapsack Problem (背包问题) is a classic
optimization problem in computer science and mathematics. It involves a
set of items, each with a weight and a value, and a knapsack with a
maximum weight capacity. The goal is to determine the most valuable
combination of items to include in the knapsack without exceeding its
weight capacity.
Variants of the Knapsack
Problem
0/1 Knapsack Problem
In this version, each item can either be included in the knapsack or
excluded; you can’t take a fraction of an item. Formally, given n items with weights w1, w2, ..., wn
and values v1, v2, ..., vn,
and a knapsack with capacity W, the problem is to maximize the
total value V while ensuring
that the total weight W′ dose not exceed W.
Mathematical formulation: Maximize: $$
V=\sum_{i=1}^n v_i x_i
$$ Subject to: $$
\sum_{i=1}^n w_i x_i \le W
$$ Where xi is 0 or 1,
indicating whether item i is
included.
Fractional Knapsack Problem
In this version, you can take fractions of items. This allow for a
greedy algorithm to achieve the optimal solution by taking items in
decreasing order of their value-to-weight ratio until the knapsack is
full. Mathematical formulation: Maximize: $$
V=\sum_{i=1}^{n}v_i x_i
$$ Subject to: $$
\sum_{i=1}^n w_i x_i \le W
$$ Where $0 x_i $, indicating the fraction of item i included.
Solution Approaches
Dynamic Programming
Used primarily for the 0/1 knapsack problem. It builds up a table of
solutions to subproblems to find the optimal solution.
In the context of the 0/1 kanpsack problem, the transition
equation helps build the dynamic programming solution. This
equation is used to determine the maximum value achieveable by either
including or excluding an item based on previous solutions. Let’s delve
into the specifics.
Dynamic
Programming Transition Equation for 0/1 Knapsack Problem
For the 0/1 knapsack problem, the transition equation determines the
maximum value dpi, w
achievable using the first i
items with a maximum weight capacity w. The equation can be expressed as:
dpi, w = max(dpi − 1, w, dpi − 1, w − wi + vi)
Where: - dpi, w
is the maximum value achievable with the first i items and capacity w. - vi and wi are the value
and weight of the i-th item. -
dpi − 1, w
represents the maximum value without including the i-th item. - dpi − 1, w − wi + vi
represents the maximum value including the i-th item, given that w − wi
must be non-negative.
Explanation: 1. Initialzation: - dp0, w = 0
for all w, meaning if no items
are considered, the maximum value is zero regradless of the weight
capacity. 2. Recurrence: - For each item i and each weight w - If the item’s weight wi is greater
than w, the item cannot be
included, so the maximum value remains dpi − 1, w.
- If the item’s weight wi is less than
or equal to w, consider two
cases: - Exclude the item: The maximum value is dpi − 1, w.
- Include the item: The maximum value is dpi − 1, w − wi + vi.
- Take the maximum of these two cases.
Implementation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
intknapsack_DP(const vector<int>& values, const vector<int>& weights, int W){ int n = values.size();
// create a 2-D DP table to store the maximum value for each subproblem vector<vector<int>> dp(n + 1, vector<int>(W + 1, 0));
// iterate through each item for (int i = 1; i <= n; i++) { for (int w = 0; w <= W; w++) { if (weights[i - 1] <= w) { dp[i][w] = max(dp[i - 1][w], dp[i - 1][w - weights[i - 1]] + values[i - 1]); } else { dp[i][w] = dp[i - 1][w]; } } }
return dp[n][W]; }
Greedy Algorithm
Effective for the fractional knapsack problem. It involves selecting
items based on their value-to-weight ratio until knapsack is full.
Before introducing the concept of P-problem and NP-problem, we need
to know something about time complexity.
Some types of time complexity of a program are proper to computer,
which means the time usage is affordable, or worthy. And some others
types are not that time-saving! So, to categorize those algorithms,
computer scientists discover that some specific pattern of time
complexity are time-saving, and others are not. Those time-efficient
time complexity are polynomial, which is letter
P in P-problem stands for, and others that can not
determine whether this problem can be solved in polynomial time, then
they are nondeterministic polynomial, which is the
so-called NP. Problems in P are considered
efficiently solvable.
Time complexity like $O(1), O(n), O(\\log
n), O(n^c)$ are called polynomial.
Time complexity like O(cn), O(n!)
are called non-polynomial.
So after understanding the term polynomial means in computer
algorithm efficiency, we can go on to known the defination of
P-Problem and NP-Problem.
These conceptions actually are discussing the sovlability
of a problem.
P Problem
If a problem can be solved in polynomial time, this problem
can be deemed as a P Problem,
NP Problem
Given a problem, if any of its proposed solution can be
varified within polynomial time, this problem can be defined as a
nondeterministic polynomial problem.
And it is important that many problems in NP have no
known polynomial-time solutions.
Relationship
between P-Problem and NP-Problem
First, it is obvious that all of P-Problem are
NP-Problem. However, does it still is a true statement
when reversed? This is hard to say, so this question becomes the famous
P = NP?
problem.
NPC Problem
The NP-Complete problem is a subset of of NP problems that are as
hard as any problem in NP. If any NP-complete problem can be solved in
polynomial time, then every problem in NP can be solved in polynomial
time.
In short, the NP-Complete problems are the hardest NP
problems!
Suppose here is an NP problem, all of NP problem can be reduced to
it. Then if only find the polynomial time solution of this problem,
every NP problem can find a polynomial time solution by reducing to that
NP problem.
And how to prove a problem is NP-Complete?
Steps to Prove a
Problem is NP-Complete
Show the Problem is in NP Verification in polynomial time:
demonstrate that given a candidate solution, you can verify its
correctness in polynomial time.
Select a Known NP-Complete Problem Source problem: choose a problem
already known to be NP-Complete for the reduction.
Construct a Polynomial Time Reduction Transform: develop a
polynomial time algorithm that converts instances of the chosen
NP-Complete problem to instances of the problem you are proving
NP-Complete
Input transformation: map any instance of the known NP-Complete
problem to an instance of your problem.
Solution transformation: ensure that a solution to the transformed
instance of your problem corresponds to a solution to the original
NP-Complete problem.
In summary, only two steps needs to be done:
first show you can verify solutions in polynomial time.
then transform a known NP-Complete problem to your problem in
polynomial time.
Given an lexicographical order array words
with size of n, the strings
among words
are all writen in lower case letters. And given a specific string prefix
with a length of m, count the
number of strings in words
that have a prefix of prefix.
boolhasPrefix(string& word, string& pref){ if (word.size() < pref.size()) returnfalse; for (int i = 0; i < pref.size(); i++) { if (word[i] != pref[i]) returnfalse; } returntrue; }
intfindFirst(int left, int right){ int low = left, high = right, label = -1; while (low <= high) { int mid = (low + high) / 2; if (hasPrefix(words[mid], prefix)) { label = mid, high = mid - 1; } elseif (prefix[0] < words[mid][0]) { high = mid - 1; } else { low = mid + 1; } } return label; }
intfindLast(int left, int right){ int low = left, high = right, label = -1; while (low <= high) { int mid = (low + high) / 2; if (hasPrefix(words[mid], prefix)) { label = mid, low = mid + 1; } elseif (prefix[0] < words[mid][0]) { high = mid - 1; } else { low = mid + 1; }
intcount(){ int from = findFirst(0, words.size() - 1); int to = findLast(0, words.size() - 1);
if (from < 0) return0; elsereturn to - from + 1; } };
Explanation:
The main idea of this algorithm is using binary search to
find the boundary of words which contain the specific prefix. The
function findFirst
denotes the position where first word has the prefix and function findLast
denotes the last one. The only difference between findFirst
and findLast
is that when hit a proper case, findFirst
continues to search the section before but the findLast
search the section behind. By doing this in repetition, the first one
and the last one will be found.
About the time complexity, the time complexity of function
hasPrefix
is O(m). Suppose the
time complexity of function findFirst
and findLast
is T(n), then $$
T(n)=T(\frac{n}{2}) + O(m), T(1) \in O(m),
$$ which means T(n) ∈ O(mlog n).
Thus the time complexity of countPrefix
is O(mlog n).
Given an integer array coins representing coins of different
denominations and integer amount representing a total amount of
money.
Return the fewest number of coins that you need to make up that
amount. If that amount of money cannot be made up by any combination of
coins, return −1.
dpi
represents the minimum number of coins that makes up the amount i. Assume that the values from dp0 to dpi − 1
are already known, thus the transition equation of dpi can
be deduced as dpi = minj = 0lcoins − 1dpi − coinsj + 1
which means the value of dpi is
determined by dpi − coinsj.
If dpi can
transfer from dpi − coinsj
via coinsj,
the number of coins that used should add 1.
SHA-256 encryption algorithm is a division from the
SHA-2 group, a.k.a. Secure Hash Algorithm 2. It can be
known from the name that, the SHA is a type of hash
function. This encryption algorithm encode the input message by hashing
to encrypt and compress. The result of SHA encryption
usually contains random alphanumeric characters.
Explanations of SHA-256
Algorithm
Before explicitly explain the detailed algorithm of
SHA-256, it is essential to figure out what should be
done in advance. Three sections can be extracted: the constants that are
used in the algorithm, how to preprocess the data, and the logical
operations that are used. After understanding these, the algorithm
itself is easy to handle.
Before Dive into SHA-256
Constants
In the SHA-256 algorithm, 8 hash initial values and
64 hash constants are used. First, the eight initial values are as
follow:
Finally, the length of the original message should be appended to the
end of current data by big endian.
Logical Operations
All of logical operations that involved with SHA-256
algorithm are as follow: Ch(x, y, z) = (x ∧ y) ⊕ (⌝x ∧ z)Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)Σ0(x) = SR2(x) ⊕ SR13(x) ⊕ SR22(x)Σ1(x) = SR6(x) ⊕ SR11(x) ⊕ SR25(x)σ0(x) = SR7(x) ⊕ SR18 ⊕ R3(x)σ1(x) = SR17(x) ⊕ SR19(x) ⊕ R10(x)
Explanations:
∧: and.
⌝x: not.
⊕: xor.
SRn:
Cycled right shift.
Rn:
Right shift.
Abstraction
How here comes to ultimately main part of the
SHA-256 algorithm! All the details related to the
algorithm will be elaborate here.
The first things is: break message into 512-bit
chunks.
Suppose that the message m
can be splited into n chunks,
so the whole algorithm needs to iterate for n times in order to get a 256-bit
hash result. Initially, a 256-bit initial value H0 is given, then operate
with data chunk 1 and get H1. Then operate with
data chunk 2 and get H2. Repeat above step
until all data chunks are operated.
The pseudo-code are:
1 2 3 4 5
data_chunks = divide_by_512_bit(message) handle = H0 for (chunk in data_chunks): handle = operate(handle, chunk) result = handle
Among each operation, each data chunk with 512-bit chunk are
divided into 16 words, each word contains 32 bits and
encoded with big endian, as W0, ..., W15.
Which means the first sixteen words can be directly derived from the
ith
chunk data. The rest words are derived through following equation:
Wt = σ1(Wt − 2) + Wt − 7 + σ0(Wt − 15) + Wt − 16
Then, iterate by 8 words for 64 times follow these rules:
A = H + Σ1(E) + Ch(E, F, G) + Ki + Wi + Maj(A, B, C) + Σ0(A)B = AC = BD = CE = D + H + Σ1(E) + Ch(E, F, G) + Wi + KiF = EG = FH = GNotice that: the
initial value of word A-H is H0-H7. And Ki is the ith
value of the initial hash value.
This notes is according to MIT’s course: missing-semester:
Editors(Vim)
Vim
Vim is a command line based code editor, designed for programmers and
designed by programmers. And other visual editors like VS Code can also
integrate vim mode for advanced coding experience.
The core philosophy of vim editor is command, all
operations are done by typing keys, using without mouse. After being
familiar with these key settings, the speed of programming can be
extremely improved.
Modes of vim
Vim contains several operating modes for different
using:
Normal: navigating, moving, and simple editing. Press
ESC for back to normal mode.
Insert: inserting text. Press i for switching to
insert mode.
Replace: replacing text. Press shift-r for switching to
replace mode.
Visual:
Visual-plain: selecting text by cursor movement. Press
v.
Visual-line: selecting text by line. Press
shift-v.
Visual-block: selecting text by block. Press
ctrl-v.
Command-line: running command. Press shift-;.
Command-line
By pressing : to get into command-line mode, where can
send command to operating thw vim in a whole.
:q | quit (or close current window).
:w | write and save.
:wq | save and quit.
:e {filename} | open file and edit.
:ls | show open buffers.
:help {topic} | open help
:help :w | help info for command :w.
:help w | help info for command w.
Cursor movement
Basic movement: hjkl | left, down, up, right.
Move by word: w, b, e | next word, beginning of word,
end of word.
Move by line: 0, ^, $ | beginning of line, first
non-blank character, end of line.
Move by screen: H, M, L | top of screen, middle of
screen, bottom of screen.
Scroll: ctrl-u, ctrl-d | scroll up, scroll down.
Move by file: gg, G | beginning of file, end of
file.
Misc: % | corresponding item.
Find: f{char}, t{char}, F{char}, T{char} |
findforwardcharacter on the current line.
Search: /{regex}, n, N | navigating match items.
Edits
i | enter insert mode.
o, O | insert line below, insert line above.
d{motion} | delete by motion command.
dw | delete word.
d$ | delete to end of line.
d0 | delete to beginning of line.
c{motion} | change by motion command.
x | delete character.
s | substitute character.
u | undo.
ctrl-r | redo.
y | copy
p | paste.
Counts
By using command pattern {number}{command} can do this
command by times.
3w move 3 words forward.
5j move 5 lines down.
7dw delete 7 words.
Modifiers
While using delete or change command, add
i or a can specify the command is done inside
or around.
ci( | change content inside current pair of
parentheses.
ci[ | change content inside current pair of square
brackets.
da' | delete a single-quoted string, including the
surrounding single quotes.
Attention is All You Need is a paper published in
2017 by Google Brain team. In this paper, the prominent
self-attention mechanism and multi-headed
self-attention mechanism are proposed to enhance the general
capacity of neural networks. Meanwhile, a basic neural network
architecture Transformer is proposed to adopt such
mechanism and its been tested on translation task.
What is Attention?
The attention mentioned in neural network is similar
with the one we mentioned about human intelligence: an ability to
capture information which seems to be more important (subjectively). The
“attention mechanism” in human congnition is prominently manifested
across various sensory perceptions. For instance, when using our eyes,
certain features such as bright colors, rapid movements, or familiar
faces tend to capture our gaze. Similarly, during reading, specific
names, events, or elegant phrases can leave a lasting impression. This
attention mechanism also extends to other senses like smell, touch, and
hearing. Furthermore, the stream of consciousness during thought
processes is often influenced by particularly striking information, all
of which illustrate the attention mechanism in human cognition.
In the context of a nerual network, what does the attention mechanism
refer to? For example, when we want a neural network to process an
image, how does it determine which elements of the image are more
important. Consider objects like people, pets, or cars in an image–how
does the network decide which one is more noteworthy in the image’s
“context”? The attention mechanism addresses this problem. When
processing an image, the neural network assigns higher weights, or
“attention scores”, to what it deems more important information. During
subsequent forwarding process, the high-weight information is
emphasized, enabling the network to capture more detailed and deeper
features. This, in turn, aids in handling the semantic importance of the
image. Besides image processing, the mechanism is applicable to text,
audio, graph, and other data types. Since data in neural networks exists
as discrete tensors, the fundamental principles of calculating attention
scores remain consistent across different data types.
Self-Attention Mechanism
The so-called self-attention mechanism refers to a process where the
attention scores are computed by comparing the elements within the
sequence itself. In other words, the reference for calculating attention
scores is the sequence as a whole. For a tensor sequence input into a
neural network, the self-attention mechanism requires computing the
attention scores between each element and every other element in the
sequence. The basic process can be represented in pseudocode as
follows:
1 2 3 4 5
for (element_x : input_sequence) { for (element_y : input_sequence) { attention_score[x][y] = compute_score(element_x, element_y); } }
However, for deep learning tasks, using to nested loops to compute
values is highly time-consuming, especially when the input sequence
length can be in the tens of thousands. Therefore, researchers use
vectorized operations to calculate all pairwise attention scores
efficiently. By using vectorized multiplication, we can obtain all
attention scores with a single dot product operation between the input
tensor sequence and its transposed version.
Through this matrix multiplication operation, combined with parallel
computing technology and the parallel processing capabilities of GPU
devices, the efficiency of this computation can be greatly enhanced. To
ensure that the attention scores are not fixed and be gradually
optimized during the training process, we use parameter matrices to
adjust the calculation of attention scores. This enable the model to
truely capture the information that deserves attention. Thus, the
equation for computing attention score can be expressed as follows:
$$
Attention(Q,K,V) = softmax(\frac{QK^T}{\sqrt{d_k}})V
$$ Here, $ Q, K, V $ refer to each element in the input sequence
being multiplied by three shared weighted matrices $ W_q, W_k $ and $
W_v $. The data multiplied by $ W_q $ represents the element that will
compute attention scores with other elements in the sequence. The data
multiplied by $ W_k $ represents the data being compared. After
comparing each element with every other element, the resulting sequence
is the attention score sequence for that element. This sequence is
processed through the softmax function and then multiplied by $ W_v $ to
perform a weighted sum. The entire process essentially calculates the
correlations between all elements and the performs a weighted sum, with
the result being the output of the attention mechanism. Through this
process, the information that is considered more important is multiplied
by greater weights, as a result their “information” can be magnified in
the subsequent processes.
Transformer Model
The Transformer model adopts an encoder-decoder
architecture which is a popular architecture in current deep learning
field. Two independent models are separatedly used as an encoder and a
decoder. For most cases, they are used to process different types of
data. For example, in image captioning, encoder is used for
process image signal and decoder is for generating captions. In this
paper, the Transformer is tested on English-German translation task, so,
unprecisely, you can image the encoder is trying to “understand” a
sentence in English and then generate some kind of metaphysical
or abstract information, these information contains the
concepts, subjects and objectives that are “mentioned” in the original
language. Then, these information is passed to decoder, and the task of
decode is to decrypt these information into German.
Importantly, the comparison I mentioned above is extremely unprecise,
this model cannot translate as human do, in fact it’s still a
prediction or simulation, but not
understanding.