A lot of fixes and updates on all languages

This commit is contained in:
Sergei Asanov
2023-04-17 10:23:27 +04:00
parent e8fbc92245
commit 33dac5ecbf
219 changed files with 5718 additions and 4719 deletions

@ -0,0 +1,39 @@
The Null symbol was developed for use in computer terminals, printers, text processing systems, and telecommunications equipment to indicate an empty or invalid position in a data stream.
It is the first character in both ASCII and Unicode, occupying position zero. Null is used in various ways: to mark the end of lines or data blocks, to fill space between data elements, to stop data from being processed past a certain point, and so on. In programming and text processing, Null can mark the end of a string or character array, especially in languages such as C and C++. Such strings are called C strings; other names include null-terminated strings and ASCIZ strings. With this approach, code that works with a string does not know its length in advance and processes the characters one by one until it encounters the null character.
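[[[code:c
/* Print a null-terminated string character by character */
#include <stdio.h>

void print_string(const char *s) {
    size_t i = 0;              // start from the beginning of the string
    while (s[i] != '\0') {     // work until the current character is \0
        putchar(s[i]);         // print the current character
        i++;                   // move to the next one
    }
}
]]]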
The disadvantages of this approach are the following:
[*] The length of the string is not known in advance: finding it requires scanning every character;
[*] The string cannot contain the null character \0 itself;
[*] If you forget to write \0 at the end or accidentally delete it, the code keeps reading past the end of the string, and the consequences are unpredictable.
With fixed-width multibyte encodings, the null character must occupy a full code unit as well: in UCS-2, for example, it is two zero bytes. An alternative way to organize strings is to store the length of the string in a separate variable.
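The difference between the two approaches can be illustrated with a minimal C sketch. The type `counted_str` and both functions are hypothetical names used only for this example: the first function finds the length of a C string by scanning for \0, while the counted string simply stores its length and may therefore contain null bytes.

[[[code:c
#include <stddef.h>

/* Hypothetical length-prefixed ("counted") string: the length is stored
   next to the data, so no terminator is needed and \0 bytes are allowed. */
typedef struct {
    size_t len;
    const char *data;
} counted_str;

/* C string: the length is unknown, so every character is scanned until \0. */
size_t c_string_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Counted string: the length is read directly, without scanning. */
size_t counted_length(counted_str s) {
    return s.len;
}
]]]

The standard `strlen` function from <string.h> performs the same scan as `c_string_length`, so for the data "ab\0cd" it reports 2, while a counted string holding the same bytes reports 5.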
However, ignoring this character entirely can sometimes have unwanted consequences. For example, some old browsers interpreted a string like [code <\0script>] as [code <script>], which let attackers inject XSS into other sites whose authors had not anticipated this nuance when processing data.
Many programming languages provide the escape sequence [code \0] for inserting this character.
Like other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character, [U:2400], which depicts the null character as the abbreviation NUL.
Escape sequence: [code \0].
This symbol is one of the eight control characters, the presence of which is required by the POSIX standard:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,5 @@
The Start of Heading symbol was used in teletype and other communication systems to indicate the beginning of a message title. A heading usually contains metadata such as sender and recipient addresses and is used to organize the transfer of information between devices.
In modern computer systems and applications, the use of the U+0001 symbol has become rare; its functions are often replaced by other methods of encoding metadata or structuring data.
Like other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Start of Heading as the abbreviation SOH: [U:2401].

@ -0,0 +1,5 @@
The Start of Text (STX) symbol was developed for telegraph and other communication systems to mark the beginning of the text part of a message, which follows a header that begins with [U:0001] [U:0001 *#].
In modern computer systems and applications, the use of the U+0002 character has become less common as its functions are often replaced by other methods of encoding and structuring data.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Start of Text as the abbreviation STX: [U:2402].

@ -0,0 +1,5 @@
The End of Text symbol was originally designed for teletype and other communication systems. It marked the end of the text part of a message: the text began with the Start of Text symbol [U:0002] [U:0002 *#] and followed a header that began with the Start of Heading symbol [U:0001] [U:0001 *#].
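A message framed with these characters might be assembled as in the following C sketch. The layout SOH, header, STX, text, ETX follows the description above; the function name `frame_message` and the caller-supplied buffer are assumptions for illustration, and real protocols added further details such as checksums.

[[[code:c
#include <stddef.h>
#include <string.h>

enum { SOH = 0x01, STX = 0x02, ETX = 0x03 };

/* Hypothetical helper: writes SOH header STX text ETX into out.
   Returns the frame length, or 0 if out (cap bytes) is too small. */
size_t frame_message(const char *header, const char *text,
                     unsigned char *out, size_t cap) {
    size_t hlen = strlen(header), tlen = strlen(text);
    size_t need = hlen + tlen + 3;          /* plus three control characters */
    if (need > cap)
        return 0;
    size_t i = 0;
    out[i++] = SOH;                         /* start of heading */
    memcpy(out + i, header, hlen);
    i += hlen;
    out[i++] = STX;                         /* start of text */
    memcpy(out + i, text, tlen);
    i += tlen;
    out[i++] = ETX;                         /* end of text */
    return i;
}
]]]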
The use of the U+0003 symbol has become less popular in modern computer systems and applications. Its functions are often replaced by other methods of encoding and structuring data.
Like other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts End of Text as the abbreviation ETX: [U:2403].

@ -0,0 +1,5 @@
The End of Transmission character was designed for use in telegraph and other communication systems. It indicated the end of data or message delivery between devices.
The use of the U+0004 character has become less popular in modern computer systems and applications. Its functions are often replaced by other methods of encoding and structuring data, such as data transmission protocols and checksums.
Like other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts End of Transmission as the abbreviation EOT: [U:2404].

@ -0,0 +1,5 @@
The Enquiry symbol was designed for teletype and other communication systems. It was used to request a response from a remote device. In communication protocols, the Enquiry symbol is used to initiate a data exchange by requesting confirmation that the other side is ready to receive or send data.
The U+0005 symbol has become less common in modern computer systems and applications, since its functions are usually handled by other communication methods and protocols such as TCP/IP, HTTP, and others.
Like other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Enquiry as the abbreviation ENQ: [U:2405].

@ -0,0 +1,5 @@
The Acknowledge symbol was designed for teletype and other communication systems. It was used to confirm the successful reception of data or messages from a remote device. As for communication protocols, Acknowledge signals successful data reception, allowing the sender to ensure the integrity and accuracy of the information delivery.
Nowadays the U+0006 symbol is used less often, since its functions are usually handled by other communication methods and protocols such as TCP/IP, which include built-in mechanisms for acknowledging data reception.
Like other control characters, this symbol has no visual representation and does not take up space on the screen or in typed text. There is a separate symbol in [BLOCK:control-pictures] representing a graphic image of the Acknowledge symbol. It shows as the ACK (Acknowledge) abbreviation - [U:2406].

@ -0,0 +1,27 @@
The Bell symbol was developed for telegraph and other communication systems. It was used to trigger an audio or visual signal on the receiving device. In computer terminals and text editors, Bell was often used to notify the user about some event, such as the completion of a task or an error. This was commonly done via the system speaker.
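In C and a number of other languages, the Bell symbol is written with the escape sequence [code \a]:
[[[code:c
#include <stdio.h>
int main(void) {
    printf("Hey you: \a!!\n"); // \a is BEL (U+0007); the terminal may beep
}
]]]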
In a terminal, this symbol can also be entered with the Ctrl+G key combination. Fortunately, most modern systems don't burst into squeaking when a simple text command triggers it.
In modern computer systems and applications, the U+0007 Bell symbol is used less often, since other notification methods, such as pop-up messages or sound effects, usually take over its functions.
Like other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Bell as the abbreviation BEL: [U:2407].
Escape sequence: [code \a].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,24 @@
The Backspace symbol, also known as "delete", was originally designed for use in teletype and other communication systems to erase the previous character and move the cursor one step back.
When it comes to text editors and computer terminals, the Backspace symbol is typically used to delete the character before the current cursor position. In modern computer systems and applications, the Backspace symbol is a standard control element and is often associated with the ← Backspace button on the keyboard.
On some devices, Backspace could be used to print one character on top of another. For example, [code c\b^] (where \b is the escape sequence for U+0008) would print [code ĉ].
[[[code:c
#include <stdio.h>
int main(void) {
    printf("ab\bc\n"); // on a terminal c overwrites b, so "ac" is shown
}
]]]
Like all other control characters, this symbol has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Backspace as the abbreviation BS: [U:2408].
Escape sequence: [code \b].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,57 @@
The Horizontal Tab symbol was developed to simplify text formatting. It provided a mechanism for automatic alignment in vertical columns on output devices, such as printers and computer terminals.
In text editors and computer terminals, the Horizontal Tab symbol is typically used to move the cursor to the next tab stop. Tab stops are usually placed at equal intervals, such as every 8 characters or another width set by the user. Horizontal tabs make it easier to align text and structure information in tables.
The symbol dates back to the era of typewriters. More expensive typewriters had a special key that moved the carriage forward until it reached a tab stop, where the carriage halted. This mechanism sped up typing and helped eliminate errors.
This mechanism proved useful in computers as well: when printing tabular data, a program did not need to keep track of column widths. On receiving the Tab character, the terminal or printer moved the carriage to the next tab stop by itself. Unless specified otherwise, the tab width was 8, so the stops fell at columns 9, 17, 25, 33, 41, and so on.
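The stop positions above follow a simple rule. The C sketch below assumes that columns are counted from 1 and that stops are evenly spaced; the function names are hypothetical. `next_tab_stop` computes where a tab moves the cursor, and `expand_tabs` uses it to replace the tabs on a single line with spaces, much as a terminal would display them.

[[[code:c
#include <stddef.h>

/* Column of the next tab stop after `col` (columns counted from 1),
   with stops every `width` columns: 1 + width, 1 + 2*width, ... */
int next_tab_stop(int col, int width) {
    return ((col - 1) / width + 1) * width + 1;
}

/* Replace each tab in `in` with spaces up to the next stop.
   Handles a single line only; writes at most cap-1 characters
   (cap must be at least 1) and always terminates `out`. */
void expand_tabs(const char *in, char *out, size_t cap, int width) {
    size_t n = 0;
    int col = 1;                     /* column of the next output character */
    for (; *in != '\0' && n + 1 < cap; in++) {
        if (*in == '\t') {
            int stop = next_tab_stop(col, width);
            while (col < stop && n + 1 < cap) {
                out[n++] = ' ';
                col++;
            }
        } else {
            out[n++] = *in;
            col++;
        }
    }
    out[n] = '\0';
}
]]]

For example, with a width of 8, a tab at column 5 moves the cursor to column 9, and a tab at column 9 moves it to column 17.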
On the keyboard, the character is typed with the Tab key or, historically, Ctrl+I. In IT jargon it is simply called a tab: «[i]Put a couple of tabs here[/i]».
In programming, this symbol is used for indentation. A tab is most often displayed as 4 spaces wide, but you may come across other widths too.
[[[code:html
<div class="first">
	<div class="second">
		It's an example of source code formatting using tabulation.
	</div>
</div>
]]]
Depending on the device or application, tabulation may not have a fixed length. For example, it may indicate the shift from one column to another in a table:
[[[code
One	Two	Three
1	2	3
111	222	333	(here the gaps are narrower)
]]]
In source code, the character can be written with the escape sequence [code \t]:
[[[php
echo "one\ttwo"; // prints "one", a tab, then "two"
]]]
Many word processors, such as Microsoft Word, still let you format text with tab stops (tabs) rather than tables. Sometimes this is even more convenient, for instance when creating a table of contents.
Many text editors can be configured to automatically replace tab characters with a sequence of spaces (usually four).
Some formats, such as TSV, use the Tab symbol to separate fields. This can be more convenient than a space or comma, which are common in ordinary text and would need special escaping.
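Because a tab rarely appears inside ordinary field values, splitting a TSV line can be as simple as cutting it at every tab. The C sketch below is a simplified illustration: it assumes the line has no trailing newline and that the file uses no escaping, and `split_tsv` is a hypothetical name. It keeps empty fields, which the standard `strtok` function would skip when two tabs are adjacent.

[[[code:c
#include <stddef.h>

/* Split one TSV line in place: every tab is replaced by \0 and a pointer
   to the start of each field is stored in `fields` (at most `max` entries).
   Returns the number of fields. Fields beyond `max` are dropped. */
size_t split_tsv(char *line, char **fields, size_t max) {
    size_t count = 0;
    if (max == 0)
        return 0;
    fields[count++] = line;              /* the first field starts the line */
    for (char *p = line; *p != '\0'; p++) {
        if (*p == '\t') {
            *p = '\0';                   /* terminate the current field */
            if (count == max)
                break;
            fields[count++] = p + 1;     /* the next field starts after the tab */
        }
    }
    return count;
}
]]]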
Like other control characters, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Horizontal Tab as the abbreviation HT (Horizontal Tabulation): [U:2409].
See also the vertical tab, [U:000B] [U:000B *#].
Escape sequence: [code \t].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,24 @@
The symbol U+000A is known as the Line Feed (LF) or New Line (NL) character. It was developed to mark the end of a text line and the beginning of a new one in text documents, computer terminals, and text processing systems. On a printer, a line feed turns the platen (drum) by one line; on a video terminal, it moves the cursor down and, if necessary, scrolls the image.
In computer terminals and text editors, Line Feed is used to move the cursor to the beginning of the next line. It is the standard end-of-line marker in UNIX-like operating systems, including Linux and macOS.
Windows, by contrast, marks the end of a line with the pair Carriage Return (CR) [U:000D] [U:000D *#] followed by Line Feed (LF) [U:000A] [U:000A *#], written as "\r\n".
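Because of this difference, programs that read text from both kinds of systems often normalize line endings first. A minimal C sketch, assuming the text is in a writable, null-terminated buffer (`crlf_to_lf` is a hypothetical name), converts each "\r\n" pair into "\n" in place:

[[[code:c
#include <stddef.h>

/* Convert Windows line endings (\r\n) to UNIX ones (\n) in place.
   A lone \r that is not followed by \n is left unchanged.
   Returns the new length of the string. */
size_t crlf_to_lf(char *s) {
    size_t r = 0, w = 0;                  /* read and write positions */
    while (s[r] != '\0') {
        if (s[r] == '\r' && s[r + 1] == '\n')
            r++;                          /* drop the \r; the \n is copied below */
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
]]]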
In early teleprinters running at low baud rates, splitting the line break into two control characters, CR and LF, was no accident. It masked the fact that a carriage returning from the far right might not reach the start of the line in time for the next character. What's more, the concepts of "driver" and "buffering" did not exist yet.
Morse code used the separator −•••− for this purpose, with the mnemonic BT (Break Text).
Just like other control characters, this character has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains two separate characters that depict U+000A: the Line Feed symbol as the abbreviation LF (Line Feed), [U:240A], and the New Line symbol as the abbreviation NL (New Line), [U:2424].
Escape sequence: [code \n].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,18 @@
Vertical tabulation was originally used in computer terminals and text processing systems to move the cursor down a fixed number of lines, usually one line. It is also known as vertical tab or VT.
In modern computer systems and applications, the vertical tabulation symbol is far less common than [U:0009] [U:0009 *#] (Horizontal Tab), but it can still be found in text files or code, usually as a delimiter between data elements or lines of text.
Like other control characters, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Vertical Tabulation as the abbreviation VT: [U:240B].
Escape sequence: [code \v].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,20 @@
The Form Feed symbol, also known as clear screen, was used in computer terminals, printers, and text processing systems to indicate the end of a page and move to the beginning of the next page.
As for text editors and computer terminals, the Form Feed symbol is typically used to separate pages of text within a single file. Speaking of printers, the U+000C symbol signals that the printer should finish the current page and start printing the next page.
In modern computer systems and applications, this symbol is rarely used, because other mechanisms usually handle page division and formatting. However, Form Feed can still be found in text files and code, where it keeps its original role of separating pages.
Like other control characters, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Form Feed as the abbreviation FF: [U:240C].
Escape sequence: [code \f].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,18 @@
The Carriage Return symbol was used in computer terminals, printers, and text processing systems to move the cursor to the beginning of the current line.
In computer systems and text files, the Carriage Return symbol is often part of the end-of-line marker, and different operating systems use different combinations of characters for it. UNIX and Linux-based systems use [U:000A] [U:000A *#] alone, while Windows uses the pair [U:000D] [U:000D *#] (Carriage Return) and [U:000A] [U:000A *#] (Line Feed), written as "\r\n".
Like other control characters, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Carriage Return as the abbreviation CR: [U:240D].
Escape sequence: [code \r].
It's one of the eight control symbols, the presence of which is required by POSIX:
[*] [code \0] [U:0000] [U:0000 *#];
[*] [code \a] [U:0007] [U:0007 *#];
[*] [code \b] [U:0008] [U:0008 *#];
[*] [code \t] [U:0009] [U:0009 *#];
[*] [code \n] [U:000A] [U:000A *#];
[*] [code \v] [U:000B] [U:000B *#];
[*] [code \f] [U:000C] [U:000C *#];
[*] [code \r] [U:000D] [U:000D *#].

@ -0,0 +1,7 @@
The Shift Out symbol is also known as the lowercase mode. It was used in computer terminals, printers, and text processing systems to switch between different character sets or modes of device operation.
This symbol was often utilized to select an alternative set of characters defined for a particular device or encoding. In that situation the device would switch to the alternative character set until it received the Shift In symbol [U:000F] [U:000F *#], which would return the device to its original character set.
In modern computer systems and applications, the U+000E symbol is rarely used, because other mechanisms and encodings, such as Unicode, handle switching between character sets and languages.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Shift Out as the abbreviation SO: [U:240E].

@ -0,0 +1,5 @@
The ASCII Normal Mode (Shift In) symbol, also known as the uppercase mode, was used in computer terminals, printers, and text processing systems. Its purpose was to restore the default character set after a switch to an alternative set with the Shift Out symbol [U:000E] [U:000E *#]. Together, Shift Out and Shift In let devices switch back and forth between two character sets.
In modern computer systems and applications, the U+000F symbol is rarely used, because other mechanisms and encodings, such as Unicode, handle switching between character sets and languages.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Shift In as the abbreviation SI: [U:240F].

@ -0,0 +1,7 @@
The Data Link Escape symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment to control the data flow in the communication channel.
It was common in communication protocols, where it had two functions: changing the meaning of the character that followed it, and marking the start and end of special character sequences to be interpreted as commands or control instructions. This ensured that control characters and data in the information stream were transmitted and interpreted correctly.
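One well-known use of DLE is so-called byte stuffing, as in, for example, the transparent mode of IBM's Binary Synchronous Communications: a frame is delimited by the pairs DLE STX and DLE ETX, and every DLE byte inside the data is doubled so the receiver does not mistake it for the start of a control sequence. The C sketch below is a simplified illustration under that assumption; real protocols added checksums and other details, and the name `dle_frame` is hypothetical.

[[[code:c
#include <stddef.h>

enum { STX = 0x02, ETX = 0x03, DLE = 0x10 };

/* Frame `len` data bytes as DLE STX <data> DLE ETX, doubling every DLE
   found in the data. `out` must hold at least 2*len + 4 bytes.
   Returns the number of bytes written. */
size_t dle_frame(const unsigned char *data, size_t len, unsigned char *out) {
    size_t n = 0;
    out[n++] = DLE;
    out[n++] = STX;                      /* start of the framed data */
    for (size_t i = 0; i < len; i++) {
        if (data[i] == DLE)
            out[n++] = DLE;              /* DLE DLE stands for one literal DLE */
        out[n++] = data[i];
    }
    out[n++] = DLE;
    out[n++] = ETX;                      /* end of the framed data */
    return n;
}
]]]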
As for modern computer systems and applications, the U+0010 symbol is rarely used there, since other mechanisms and protocols are employed to control data flow and encode commands.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Data Link Escape as the abbreviation DLE: [U:2410].

@ -0,0 +1,7 @@
The Device Control One symbol was used mostly in computer terminals, printers, word processing systems, and telecommunication equipment in order to control functions belonging to various devices.
It is one of the four device control characters: [U:0011] DC1, [U:0012] DC2, [U:0013] DC3, [U:0014] DC4. They all provided the ability to transmit special instructions for controlling device functions and modes of operation. The assignment and interpretation of device control symbols depended on the specific device and communication protocol.
As for modern computer systems and applications, the U+0011 symbol is rarely used there, since other mechanisms and protocols are employed to control device settings and functions.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Device Control One as the abbreviation DC1: [U:2411].

@ -0,0 +1,7 @@
The Device Control Two symbol was used mostly in computer terminals, printers, word processing systems, and telecommunication equipment in order to control functions of various devices.
It is one of the four device control characters: [U:0011] DC1, [U:0012] DC2, [U:0013] DC3, [U:0014] DC4. They all provided the ability to transmit special instructions for controlling device functions and modes of operation. The assignment and interpretation of device control symbols depended on the specific device and communication protocol.
As for modern computer systems and applications, the U+0012 symbol is rarely used there, since other mechanisms and protocols are employed to control device settings and functions.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Device Control Two as the abbreviation DC2: [U:2412].

@ -0,0 +1,7 @@
The Device Control Three symbol was used mostly in computer terminals, printers, word processing systems, and telecommunication equipment in order to control functions of various devices.
It is one of the four device control characters: [U:0011] DC1, [U:0012] DC2, [U:0013] DC3, [U:0014] DC4. They all provided the ability to transmit special instructions for controlling device functions and modes of operation. The assignment and interpretation of device control symbols depended on the specific device and communication protocol.
As for modern computer systems and applications, the U+0013 symbol is rarely used there, since other mechanisms and protocols are employed to control functions and settings.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Device Control Three as the abbreviation DC3: [U:2413].

@ -0,0 +1,7 @@
The Device Control Four symbol was used mostly in computer terminals, printers, word processing systems, and telecommunication equipment in order to control functions of various devices.
It is one of the four device control characters: [U:0011] DC1, [U:0012] DC2, [U:0013] DC3, [U:0014] DC4. They all provided the ability to transmit special instructions for controlling device functions and modes of operation. The assignment and interpretation of device control symbols depended on the specific device and communication protocol.
As for modern computer systems and applications, the U+0014 symbol is rarely used there, since other mechanisms and protocols are employed to control device functions and settings.
Like other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Device Control Four as the abbreviation DC4: [U:2414].

@ -0,0 +1,7 @@
The Negative Acknowledgement symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment to control data flow and ensure the reliable transmission of information.
It indicated an error in data transmission or a problem with receiving information. In communication protocols, a sender that received NAK had to repeat the previous transmission to recover from errors or data loss. If the data arrived correctly, the opposite symbol was sent instead ([U:0006] [U:0006 *#]).
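The receiving side of such a scheme can be sketched in C as below. The 8-bit additive checksum is an assumption made for this example (it resembles the one used by the original XMODEM protocol; other protocols used LRC or CRC), and the function names are hypothetical. The receiver recomputes the checksum and answers ACK if it matches or NAK if it does not, after which the sender is expected to retransmit.

[[[code:c
#include <stddef.h>

enum { ACK = 0x06, NAK = 0x15 };

/* Simple 8-bit additive checksum: the sum of all bytes modulo 256. */
unsigned char checksum8(const unsigned char *data, size_t len) {
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + data[i]);
    return sum;
}

/* Receiver: reply ACK if the block arrived intact, NAK otherwise. */
unsigned char reply_to_block(const unsigned char *data, size_t len,
                             unsigned char received_sum) {
    return checksum8(data, len) == received_sum ? ACK : NAK;
}
]]]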
As for modern computer systems and applications, the U+0015 symbol is rarely used there, since other mechanisms and protocols are employed to ensure the reliability of data transmission and error handling.
Like many other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Negative Acknowledge as the abbreviation NAK: [U:2415].

@ -0,0 +1,5 @@
The Synchronous Idle symbol is also known as the null symbol. It was used for synchronous data transmission in computer terminals, printers, text processing systems, and telecommunication equipment to keep two devices synchronized. In synchronous communication protocols, the SYN symbol was used to mark the beginning and end of a block of data, allowing devices to interpret and process the received information correctly. Some communication lines required continuous data transmission, so Synchronous Idle was sent whenever there was nothing else to transmit.
As for modern computer systems and applications, the U+0016 symbol is rarely used there, since other mechanisms and protocols are employed to synchronize and process data.
Like many other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Synchronous Idle as the abbreviation SYN: [U:2416].

@ -0,0 +1,7 @@
The End of Transmission Block symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment to indicate the end of a block of data being transmitted.
Its function was to divide the transmitted data into blocks, so that receiving devices could determine block boundaries and process the received information properly. This was especially useful in synchronous transmission, where devices needed to know where the current block of data ended and the next one began.
As for modern computer systems and applications, the U+0017 symbol is rarely used there, since other mechanisms and protocols are employed to separate and process data blocks.
Like many other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts End of Transmission Block as the abbreviation ETB: [U:2417].

@ -0,0 +1,7 @@
The Cancel symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment to cancel the current operation or task.
Its main purpose was to interrupt processes and operations that were initiated earlier. In case of an error, an incorrect command, or when the user wanted to cancel the current operation, this symbol was sent. The reaction to the CAN symbol depended on the specific device or program that processed it.
As for modern computer systems and applications, the U+0018 symbol is rarely used there, since other mechanisms and protocols are employed to manage and interrupt tasks.
Like many other control symbols, this one has no visual representation and takes up no space on screen or in print. The [BLOCK:control-pictures] block contains a separate character that depicts Cancel as the abbreviation CAN: [U:2418].

@ -0,0 +1,5 @@
The End of Medium symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment. Its main purpose was to mark the end of a physical storage medium, such as a tape or disk. It signalled to receiving devices that they should process the data received so far and prepare for the end of the transmission.
As for modern computer systems and applications, the U+0019 symbol is rarely used there, since other mechanisms and protocols are employed to determine the size and boundaries of data.
Like many other control symbols, this one has no visible representation and doesn't occupy a lot of space on screen or in typed text. However, there is a separate symbol in [BLOCK:control-pictures] representing the graphical image of the End of Medium symbol. It's the abbreviation EM — [U:2419].

@ -0,0 +1,7 @@
The Substitute symbol was used in computer terminals, printers, text processing systems, and telecommunications equipment. Its main purpose was to mark the position in the data stream where another character or sequence of characters should be placed.
It took the place of invalid, damaged, or missing characters. For example, when text is transmitted between different systems, some characters may not be interpretable by the receiving side. They can be replaced with SUB, which marks the position where an appropriate character belongs.
In modern computer systems and applications, the U+001A symbol is rarely used, mainly because other mechanisms and protocols now handle and replace incorrect characters.
Like many other control symbols, it has no visible representation and takes up no space on screen or in printed text. However, the [BLOCK:control-pictures] block contains a separate symbol that depicts the Substitute symbol graphically as the abbreviation SUB: [U:241A].
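A minimal sketch of the substitution idea: bytes that a hypothetical 7-bit ASCII receiver cannot interpret (here, anything above 0x7F) are overwritten with SUB (0x1A), so the position of each unreadable character is preserved. Treating every non-ASCII byte as invalid is an assumption made for this example; a real converter would decide validity according to the target character set.

```c
#include <stddef.h>

#define SUB 0x1A  /* ASCII Substitute control character */

/* Replace every byte outside 7-bit ASCII (0x80-0xFF) with SUB, in place.
   Returns the number of bytes that were replaced.
   Assumption: "invalid" here simply means "not 7-bit ASCII". */
static size_t substitute_non_ascii(unsigned char *buf, size_t len)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F) {
            buf[i] = SUB;   /* keep the position, mark the character as lost */
            replaced++;
        }
    }
    return replaced;
}
```

Since this works byte by byte, a character such as "é" encoded in UTF-8 (two bytes) becomes two SUB characters rather than one.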

@ -0,0 +1,7 @@
The Escape symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment. Its main role was to input control sequences and change the operating modes of devices.
It also served to provide an additional level of control over devices and programs. Instead of using separate control characters for each command, ESC could be combined with other characters to create sequences that represent more complex commands or functions. This allowed for a wider range of control commands to be processed, especially in terminals and text editors.
In modern computer systems and applications, the U+001B symbol is still actively used, especially for terminal control, in terminal emulators, and in some text editors. For example, in the Vim editor ESC returns from insert or visual mode to normal mode.
Like many other control symbols, this one has no visible representation and takes up no space on screen or in printed text. However, the [BLOCK:control-pictures] block contains a separate symbol that depicts the Escape symbol graphically as the abbreviation ESC: [U:241B].
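As an illustration of ESC-based control sequences, terminals that follow ECMA-48 (ANSI) treat ESC followed by "[" as a Control Sequence Introducer (CSI). The sketch below wraps a string in the SGR sequences ESC [31m (red foreground) and ESC [0m (reset all attributes); the function name `red_text` is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

#define ESC "\x1B"      /* U+001B ESCAPE as a string literal */
#define CSI ESC "["     /* Control Sequence Introducer: ESC followed by '[' */

/* Write text wrapped in SGR sequences into out:
   CSI 31 m selects a red foreground, CSI 0 m resets all attributes.
   Returns snprintf's result (the length the full string would have). */
static int red_text(char *out, size_t size, const char *text)
{
    return snprintf(out, size, CSI "31m%s" CSI "0m", text);
}
```

Printing the resulting buffer with `fputs(buf, stdout)` shows the text in red on a terminal that supports these sequences; programs that do not interpret them may display the raw ESC bytes instead.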

@ -0,0 +1,5 @@
The File Separator symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment. Its main purpose was to indicate the boundary between files or parts of data within one stream, especially when they were transmitted or stored on a single physical device. It also allowed programs to determine the boundaries between different files or data blocks and process them accordingly.
In modern computer systems and applications, the U+001C symbol is rarely used, since other mechanisms, such as file systems and specialized data formats, now process files and determine data boundaries.
Like many other control symbols, this one has no visible representation and takes up no space on screen or in printed text. However, the [BLOCK:control-pictures] block contains a separate symbol that depicts the File Separator graphically as the abbreviation FS: [U:241C].

@ -0,0 +1,7 @@
The Group Separator symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment. Its main purpose was to indicate the boundary between groups of data within a single stream.
This allowed devices and programs to determine the boundaries between different groups of data and process them accordingly, which was particularly useful for structured data consisting of multiple groups or blocks of information.
In modern computer systems and applications, the U+001D symbol is rarely used, because other mechanisms and formats now serve the same purpose: markup languages such as XML, data formats such as JSON, and other specialized formats.
Like many other control symbols, this one has no visible representation and takes up no space on screen or in printed text. However, the [BLOCK:control-pictures] block contains a separate symbol that depicts the Group Separator graphically as the abbreviation GS: [U:241D].

@ -0,0 +1,7 @@
The Record Separator symbol was used in computer terminals, printers, text processing systems, and telecommunications equipment. Its purpose was to indicate the boundaries between data records within a single stream.
This allowed devices and programs to identify the boundaries between different data records and process them accordingly, which was particularly useful for structured data consisting of multiple information blocks, such as tables or lists.
In modern computer systems and applications, the U+001E symbol is rarely used, because other mechanisms and formats, such as markup languages like XML, data formats like JSON, and other specialized formats, serve these purposes.
Like many other control symbols, this one has no visible representation and takes up no space on screen or in printed text. However, the [BLOCK:control-pictures] block contains a separate symbol that depicts the Record Separator graphically as the abbreviation RS: [U:241E].

@ -0,0 +1,5 @@
The Unit Separator symbol was used in computer terminals, printers, text processing systems, and telecommunication equipment. Its main purpose was to indicate the boundaries between units of data within a single stream. This allowed devices and programs to determine the boundaries between different units of data and process them accordingly. It was particularly useful for structured data consisting of multiple units or blocks of information, such as rows, columns, or fields.
In modern computer systems and applications, the U+001F symbol is rarely used, because other mechanisms and formats now serve the same purpose: markup languages such as XML, data formats such as JSON, and other specialized formats.
Like many other control symbols, this one has no visible representation and takes up no space on screen or in printed text. However, the [BLOCK:control-pictures] block contains a separate symbol that depicts the Unit Separator graphically as the abbreviation US: [U:241F].
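The four information separators form a hierarchy: FS above GS, GS above RS, and RS above US. The sketch below handles the two lowest levels, splitting a buffer into records with RS and each record into units (fields) with US. The data layout and the function `next_unit` are assumptions made for illustration; FS and GS could be handled the same way one and two levels higher.

```c
#include <stddef.h>

#define RS 0x1E  /* Record Separator */
#define US 0x1F  /* Unit Separator */

/* Extract the next unit starting at *pos.
   Stores a pointer to the unit and its length, and sets *end_of_record
   when the unit was terminated by RS or by the end of the buffer.
   Returns 0 once the whole buffer has been consumed. */
static int next_unit(const char *buf, size_t len, size_t *pos,
                     const char **start, size_t *ulen, int *end_of_record)
{
    if (*pos > len)
        return 0;                          /* nothing left to read */
    size_t i = *pos;
    while (i < len && buf[i] != US && buf[i] != RS)
        i++;                               /* scan to the next separator */
    *start = buf + *pos;
    *ulen = i - *pos;
    *end_of_record = (i == len || buf[i] == RS);
    *pos = i + 1;                          /* skip the separator; len + 1 marks the end */
    return 1;
}
```

Calling `next_unit` in a loop over the bytes "Alice", US, "30", RS, "Bob", US, "25" visits four units and reports an end of record after "30" and after "25".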

@ -0,0 +1,25 @@
[b]Space[/b] is the blank gap between letters that separates words in a text.
The first scripts were pictographic or ideographic. Each symbol stood for a word, so there was no need to separate them. Once alphabets were introduced, reading continuous text became inconvenient, and special symbols appeared to divide words. The space, now considered standard, was not invented overnight. In Latin and Greek scripts, it has been used for around one thousand years. Cyrillic script adopted it later, only in the 17th century, and in Arabic, spaces appeared later still, in the 20th century.
Word boundaries can also be marked in other ways, for example with special letter forms for the beginning or end of a word. In the Arabic alphabet, most letters have four forms (initial, medial, final, and isolated), and they keep these forms even though Arabic now uses spaces. Another alternative is a line drawn above the letters: the words themselves are written without spaces, and the line is broken between them. In some writing systems, what gets separated is not words but phrases, sentences, or syllables. The true space is used in almost all modern writing systems; in Thai, however, only sentences are separated by spaces.
Unicode has several types of spaces, such as the [U:00A0 non-breaking space]. Several more space characters are located in the [block:general-punctuation General Punctuation] block.
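Code that splits text into words therefore cannot rely on U+0020 alone. In the default "C" locale, `isspace()` recognizes only ASCII whitespace, so the sketch below checks an already decoded code point against the characters Unicode assigns to the Space Separator category (Zs), including U+00A0 and the spaces of the General Punctuation block. The list reflects recent Unicode versions and may need updating; the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if cp belongs to the Unicode Space Separator category (Zs).
   Assumes cp is an already decoded code point, not a UTF-8 byte. */
static bool is_space_separator(uint32_t cp)
{
    switch (cp) {
    case 0x0020: /* SPACE */
    case 0x00A0: /* NO-BREAK SPACE */
    case 0x1680: /* OGHAM SPACE MARK */
    case 0x202F: /* NARROW NO-BREAK SPACE */
    case 0x205F: /* MEDIUM MATHEMATICAL SPACE */
    case 0x3000: /* IDEOGRAPHIC SPACE */
        return true;
    default:
        /* EN QUAD (U+2000) through HAIR SPACE (U+200A) in General Punctuation */
        return cp >= 0x2000 && cp <= 0x200A;
    }
}
```

Note that U+200B ZERO WIDTH SPACE is not in this category, despite its name, so the function returns false for it.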
More symbols for word separation:
[U:00B7] Interpunct. Latin, used until roughly the 600s to 800s.
[U:1039F] Ugaritic cuneiform.
[U:103D0] Old Persian cuneiform.
[U:12470] Assyrian cuneiform.
[U:1361] Ethiopic.
[U:1680] Ogham.
[U:1091F] Phoenician.
[U:0830] Samaritan.

@ -0,0 +1,15 @@
[b]Exclamation mark[/b] marks emphasis on important or emotionally charged information. According to one theory, the current form of this symbol comes from the Latin exclamation of joy "io" ("hooray"). The "I" came to be written above the "o", and over time the shape was simplified.
This mark is usually put at the end of a sentence. In Russian, however, it can also appear in the middle of a sentence, enclosed in brackets (meaning something like "attention!" or "I'll be damned"). In this case, the emotional emphasis applies only to part of the sentence: a specific word or phrase.
More exclamation marks: [U:203C] [U:203D] [U:2755] [U:2757] [U:2E18] [U:FE57] [U:2763]
Other similar characters:
[U:00A1] Spanish, known as the "Inverted Exclamation Mark". Used at the beginning of a sentence.
[U:055C] Armenian "Yerkaratsman nshan".
[U:07F9] N'Ko script.
[U:1944] Limbu script.

@ -0,0 +1,5 @@
The Plus Sign + denotes the operation of addition. It looks like a small upright cross centered on the line. Similar Unicode characters include [U:00B1], [U:2213], [U:2795], and [U:2A72].
"Plus" is Latin for "more." The earliest known use of this symbol dates to 1489, when Johannes Widmann used the plus sign in his commercial arithmetic treatise to denote an increase.
This symbol is located in the [BLOCK:basic-latin] block. Other mathematical operators can be found in the [BLOCK:mathematical-operators] and [block:supplemental-mathematical-operators] blocks.

@ -0,0 +1,25 @@
[b]Comma[/b] separates parts of a sentence. The European form of the comma developed from the symbol [U:002F], which was previously used for a similar purpose. The English name "comma" comes from the Greek word κόμμα, meaning "a piece cut off" or "a short clause." Commas first appeared in Russian texts in the 1520s.
Other comma-like symbols include [U:0315 combining comma above right], [U:2E32], [U:2E34], and [U:2E41].
There are more symbols that serve as commas in other writing systems:
[U:060C] Arabic.
[U:3001] Chinese and Japanese.
[U:055D] Armenian.
[U:07F8] N'Ko.
[U:1363] Ethiopic.
[U:1808] Manchu Mongolian (Old Mongolian).
[U:A4FE] Lisu.
[U:A60D] Vai.
[U:A6F5] Bamum.
[U:1B5E] Balinese. It is called "carik siki" and is placed before and after a number to separate it from the text.

@ -0,0 +1,35 @@
[b]Full stop[/b] marks the end of a sentence. It is probably the oldest punctuation mark, used as far back as the 3rd century BC. Its position varied over time: different authors placed it at the bottom of the line, at the top, or even in the middle. Interestingly, in Russian this symbol had a predecessor: a cross, which the writer usually put where they stopped writing (to take a break). Full stops were also used to mark abbreviations.
You may find other full stop symbols in the following scripts:
[U:0964] Danda. Indian writing systems, Devanagari script.
[U:0589] Verjaket. Armenian.
[U:0020] Space. Thai, where words are written without spaces.
[U:3002] Chinese, Japanese, and Korean.
[U:06D4] Old Arabic.
[U:2CF9] Old Nubian.
[U:0701] Old Syriac.
[U:1362] Ethiopian.
[U:166E] Canadian Aboriginal syllabics.
[U:1803] Mongolian.
[U:2CFE] Coptic.
[U:A4FF] Lisu.
[U:A60E] Vai.
[U:A6F3] Bamum.
[U:083D] Sof Mashfaat. Samaritan.
[U:1B5F] Carik pareren. Balinese.

@ -0,0 +1,4 @@
Arabic numerals, including the digit zero, are used all over the world. This positional number system originated in India in the 5th century or earlier, and it was around this time that the concept of zero was adopted and the digit 0 was created. The Arabs borrowed the system from the Indians, and Al-Khwarizmi's book "On the Indian Calculation" helped to spread Arabic numerals. Later, the system reached Europe through Spain. In the 10th century, Pope Sylvester II advocated replacing Roman numerals with Arabic ones. In the 12th century, Al-Khwarizmi's book "On the Indian Calculation" was translated into Latin, which played an important role in the adoption of Arabic numerals.

@ -0,0 +1,2 @@
The Arabic digit one is one of the numerals used all around the [URL /en/1F30D/ planet]. This positional number system appeared in India in the 5th century or even earlier. The Arabs borrowed it from the Indians, and Al-Khwarizmi's book "On the Indian Calculation" helped to spread [URL /en/collections/arabic-numerals/ the Arabic digits]. Later, the system reached Europe through Spain. In the 10th century, Pope Sylvester II advocated replacing [URL /ru/collections/roman-numerals/ Roman numerals] with Arabic ones. In the 12th century, Al-Khwarizmi's book "On the Indian Calculation" was translated into Latin, which played an important role in the adoption of Arabic numerals.

@ -0,0 +1,11 @@
[b]Colon[/b] is a punctuation mark that logically connects parts of a text. It usually introduces a list or direct speech. In Old Slavic texts, the colon served as the equivalent of the semicolon [U:003B]. In some languages (such as Swedish and Finnish), it is also used to shorten words.
Symbols in other writing systems:
[U:1365] Ethiopic.
[U:A6F4] Bamum.
[U:1B5D] Carik Pamungkah. Balinese.
[U:2024] Armenian. Also used as a semicolon.