In Section 2.2, I wrote: “Information is not data, but the source from which data can be obtained. We produce data by determining that something is the case or is not the case, that is, by making decisions.” But are all decisions that something is so (and not otherwise) already data? Obviously, decisions must in any case be represented by symbols, e.g., by digits or characters, as is the case with birthdates, addresses, or measurements.
Such ‘classical’ data are typically gathered in tables, lists, or registers, and for that purpose they need to have a certain (definite) form or structure. If data are to be handled by machines, such a definite form and structure is even more important because the machine does not understand the meaning of symbols, but handles them only on the basis of their form and structure; that is, the machine works purely syntactically (not semantically). Therefore, the form and structure of data for machine processing are strictly regulated by a set of rules called syntax, analogous to the syntax of a natural language. The syntax defines how data must be constituted in terms of their form and structure, i.e., as information, so that they are ‘well-formed’ and a machine can handle them without problems.
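As a minimal sketch of what purely syntactic handling means, the following Python snippet checks whether a birthdate is well-formed. The rule used here (four digits, hyphen, two digits, hyphen, two digits) and the names `BIRTHDATE_SYNTAX` and `is_well_formed` are assumptions chosen for illustration only. The check looks at the shape of the symbols and nothing else.

```python
import re

# Hypothetical syntax rule, assumed for illustration:
# four digits, a hyphen, two digits, a hyphen, two digits.
BIRTHDATE_SYNTAX = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_well_formed(text: str) -> bool:
    """Return True if the text matches the rule in form only."""
    return BIRTHDATE_SYNTAX.fullmatch(text) is not None


print(is_well_formed("1984-03-17"))     # True
print(is_well_formed("17 March 1984"))  # False
print(is_well_formed("1984-99-99"))     # True
```

The last call shows the limit of a syntactic check: the string is accepted as well-formed although no such date exists, because the meaning of the symbols plays no part in the test.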
For processing in computers, data need to be digitized, that is, completely transformed into sequences of digits. As is well known, it is not the decimal system but the binary system, comprising only the digits 0 and 1, that is used for this purpose. Each 0 or 1, that is, each bit, is equivalent to a Yes/No decision. Thus, classical data are transformed into digital data by many Yes/No decisions on how to represent them by a sequence of zeros and ones.
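The transformation into zeros and ones can be sketched in a few lines of Python. The helper names `to_bits` and `from_bits` are hypothetical, and the choice of UTF-8 as the character encoding is an assumption; it is itself one of the decisions on how to represent the symbols.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and write each byte as eight binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Read groups of eight binary digits back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


bits = to_bits("Paul")
print(bits)             # 01010000 01100001 01110101 01101100
print(from_bits(bits))  # Paul
```

The four characters become 32 bits, that is, 32 Yes/No decisions, and the same rule read backwards recovers the original characters.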
We can generally say: Data are decisions represented by symbols of a natural or formal language. These symbolic representations need to be well-formed in a defined manner such that humans and/or machines can handle them efficiently and without problem.
If the term ‘data’ is defined as above, one consequence is that the term ‘sensory data’ is wrong. We don’t perceive data with our sense organs, but only information. Data can arise from this information only by an act of recognition or understanding, and this act always implies a decision: That’s Paul (and nobody else); that’s blue (and not green); that sounds like an ouzel (but not like a nightingale).
In Section 2.3, I said that transmission and storage of information frequently take place in non-living nature as well. Transmission and storage of data, by contrast, require the processing of symbols. Speaking, for instance, is not only to send out information (a certain sequence of sound waves), but also to send out symbols (phonemes and words). Speaking is: sending out data.
However, these data are not received and processed as data in a human brain: Sense organs and then the brain only receive and process information. Thus the sound waves must be understood as phonemes, and the phoneme sequences as words, which requires knowledge of language. Only these acts of understanding turn the information back into signs, and with that into data. Computers, by contrast, process symbols immediately without understanding – the algorithms, that is, the rules for how to do this, were implemented by humans who understand the symbols.
However, are humans not also able to process signs without understanding their meanings? Yes, they are – it’s just what John Searle described in his famous Chinese Room story [1]. But even there, Searle in the room needs to decide, on the basis of English instructions and by comparing only the shapes of Chinese symbols, which is the correct Chinese ‘answer’ to an incoming Chinese ‘question’. And since Searle is a human, he can make mistakes – someone who doesn’t understand Chinese may sometimes confuse Chinese symbols. Thus, even if Searle attempts to strictly follow the rules and instructions, he nevertheless has to make decisions, which can also be wrong. So Searle does the work of a machine, but he isn’t a machine. A machine doing this work wouldn’t decide by itself, and if errors occurred, the producer or the user of the machine, or a defect caused by wear, would be responsible.
A historical example of purely syntactic data transmission by humans (who transmitted signs without knowing their meanings) is the semaphore telegraph, a system of conveying messages by means of visual symbols, using towers on which stood pylons with pivoting arms (‘blades’ or ‘paddles’). The operator had to watch the neighboring tower through a spyglass, to recognize the changing positions of the semaphore arms there, and to reproduce them exactly at his own semaphore. To do so, he didn’t need to know the meaning of the positions, and in fact he often didn’t know it when secret messages were transmitted.
The operator on the tower reproduces only the information (formed by the position of the ‘blades’) without necessarily understanding the symbols. He does, in a sense, the work of a machine, but he isn’t a machine; he is a human, and thus he has to decide each time: yes, it is this (and no other) symbol that I have to reproduce. In these decisions, errors can occur, e.g., because of poor visibility or distraction. Correct decisions are facilitated by the limited number of valid positions of the semaphore arms, which makes them easy to distinguish.
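How a limited set of valid positions facilitates the decision can be illustrated with a small Python sketch. It assumes, purely for illustration, that an arm may only stand at multiples of 45 degrees; the names `VALID_ANGLES`, `angular_distance`, and `decide_position` are hypothetical. Note that the nearest-angle rule is not a decision made by the program itself: it was fixed in advance by whoever wrote the function.

```python
# Assumption for illustration: arm angles are valid only in 45-degree steps.
VALID_ANGLES = tuple(range(0, 360, 45))


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def decide_position(observed_angle: float) -> int:
    """Pick the valid angle closest to the imprecisely observed one."""
    return min(VALID_ANGLES, key=lambda valid: angular_distance(observed_angle, valid))


print(decide_position(47.0))   # 45
print(decide_position(352.0))  # 0
print(decide_position(110.0))  # 90
```

An observation that is off by a few degrees still leads to the intended symbol, because the valid positions lie far apart. Only an observation close to the midpoint between two valid positions, for instance under poor visibility, would lead to the wrong one.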
In summary, we can say: Data processing is a special kind of information processing, namely the transmission and storage of Yes/No decisions by means of symbols, or in short, the processing of symbols. Unlike machines, humans are able to make decisions by themselves; therefore humans, but not machines, produce data. On the other hand, humans (unlike machines) cannot transmit data immediately: A human receiver must always decide on the basis of the incoming information (about what the information means for him) – even if this is sometimes facilitated by the formal uniqueness of symbols as with the semaphore and in other cases [2].
Machines, unlike humans, transmit data, i.e., Yes/No or 1/0 decisions, immediately. We can even say that machines are generally, by their nature, ‘decision transmitters’ and ‘decision stores’: Machines are artifacts with movable elements, in which the movement (e.g., rotation) of one element results in the movement of one or more other elements, whereby not only energy is transmitted, but also decisions on time, direction, speed, and duration of movement. All these decisions were made by humans, either directly, e.g., by cranking, pedaling, or pushing a button or key, or indirectly through the design and/or program of the machine.
The design and the program of a machine are decision stores. This statement is rather trivial regarding a program, but a design, e.g., of a gear train, also comprises many decisions on how movements between elements (shafts, wheels, levers, etc.) shall be transmitted, and on whether and how speed, direction of rotation, or other parameters shall change. The decisions once made by the designer or programmer are executed again and again by the machine.
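The idea of a design as a decision store can be sketched with a simple gear train in Python. The tooth counts in `DESIGN` and the function name `output_rpm` are hypothetical, and the sketch assumes ordinary external gear pairs, in which every mesh reverses the direction of rotation.

```python
from fractions import Fraction

# Hypothetical design: each pair is (driver teeth, driven teeth).
# These numbers stand for decisions made once by the designer.
DESIGN = [(20, 60), (15, 45)]


def output_rpm(input_rpm: Fraction, design: list[tuple[int, int]]) -> Fraction:
    """Speed of the last shaft; a negative sign means reversed rotation."""
    rpm = input_rpm
    for driver_teeth, driven_teeth in design:
        # An external mesh scales the speed and reverses the direction.
        rpm = -rpm * Fraction(driver_teeth, driven_teeth)
    return rpm


print(output_rpm(Fraction(900), DESIGN))  # 100
```

Whatever speed is fed in, the train applies the same stored ratios and the same two reversals; the machine executes the designer's decisions without making any of its own.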
Gauges play a particular role in human data production. All tools and machines are extensions of human abilities, and so are tools and devices for measurement: They extend our ability to decide. We already make a decision by defining or choosing a scale; for example, we decide on temperature data by comparing the expansion of the measuring fluid, e.g., mercury, with the Kelvin or Celsius scale. The information about the temperature, that is, the expansion of the measuring fluid, is as it is, but the numeric values, the data, differ depending on our decision.
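A short Python sketch can make the dependence on the chosen scale concrete. The variable `state` and the function `read_as_celsius` are hypothetical names; the only fact used is that 0 °C corresponds to 273.15 K. `Decimal` is used instead of floating-point numbers so that the printed values are exact.

```python
from decimal import Decimal

KELVIN_OFFSET = Decimal("273.15")  # 0 °C corresponds to 273.15 K


def read_as_celsius(kelvin: Decimal) -> Decimal:
    """Express a temperature given in kelvin on the Celsius scale."""
    return kelvin - KELVIN_OFFSET


state = Decimal("293.15")  # one and the same physical state, read in kelvin
print(state, "K")                    # 293.15 K
print(read_as_celsius(state), "°C")  # 20.00 °C
```

The physical state is the same in both lines; only the numeric value changes with the scale that was chosen.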