

Are we the Message?





cpatnaik
This could have gone into the Science section, but Philosophy might be better. This is from a post I made some time ago on my blog.

Originally posted here: http://www.marlinspike.in/2012/11/the-medium-is-message-are-we-same.html

Do share your thoughts

Quote:
The Medium is the message - Are we the same?
Recently Hitachi unveiled what it claimed was million-year storage media: essentially binary encoding micro-etched into a quartz crystal, readable by a microscope.

There are a couple of problems: the crystal itself will need to be protected for a million years, and a microscope will need to be protected too (or invented again).

A truly million-year (or even billion-year) storage medium would be a living organism in itself: it propagates itself. DNA is the carrier as well as the information being transmitted.

The more you think about it, the more logical it seems that this is a good solution: the sum of knowledge is encoded in our genes, and developments in the environment trigger further revelation (insight, genius or whatever).

Which makes you wonder whether (paraphrasing Marshall McLuhan) we ourselves are the message.




MOD - Quotes added; material you did not write, or previously published in another location, should be quoted and sourced. Please refer to the forum rules here: http://www.frihost.com/forums/vt-13011.html

- Ankhanu
Bikerman
Nice distraction for a few minutes but not a serious hypothesis.
Firstly the notion that information is encoded in DNA is problematic. To explain that fully would require more time than I currently have so I'll just flag it and move on.
Secondly, the whole point of information storage is to be able to retrieve the original information at a future point. DNA mutates and, over time, encodes a 'different message'.
Thirdly, the phrase 'we are the message' is trite in a way best illustrated by referencing the late great Douglas Adams.
The Answer is 42 - now go work out what the bloody question is.
cpatnaik
Bikerman wrote:
Nice distraction for a few minutes but not a serious hypothesis.
Firstly the notion that information is encoded in DNA is problematic. To explain that fully would require more time than I currently have so I'll just flag it and move on.
Secondly, the whole point of information storage is to be able to retrieve the original information at a future point. DNA mutates and, over time, encodes a 'different message'.
Thirdly, the phrase 'we are the message' is trite in a way best illustrated by referencing the late great Douglas Adams.
The Answer is 42 - now go work out what the bloody question is.


Interested to hear your thoughts on the first part. Do take the time out to comment.

As to your stated objections: mutation may be allowed by the designer so that the life form can survive changing/evolving ecosystems. The message could be hidden deeper in the DNA. In terms of retrieval, I mentioned in the piece that it could come as insight (Newton's realization about gravity, Einstein with GTR etc.). The Assassin's Creed series put tech behind it (though the transfer mechanism is flawed, in the sense that memory transfer stops at the point of conception).
Bikerman
'Designer'? That would require DNA to be designed. That is a religious argument that I have no desire to waste time on, since it is generally dishonest (all the Intelligent Design proponents that I have debated proved themselves so dishonest that debate was neither possible nor desirable).
The notion that 'insight' is built-in in order to retrieve some original information seems to be fanciful to the point of silliness and is, in any event, the sort of pseudo-scientific notion that I find irritating since it cannot be tested in any way (the proponent will simply say 'ahh...the fact that x happened was part of the original design').
Finally, the point that there may be some 'deeper layer' is a standard appeal to ignorance fallacy.
cpatnaik
Bikerman wrote:
'Designer'? That would require DNA to be designed. That is a religious argument that I have no desire to waste time on, since it is generally dishonest (all the Intelligent Design proponents that I have debated proved themselves so dishonest that debate was neither possible nor desirable).
The notion that 'insight' is built-in in order to retrieve some original information seems to be fanciful to the point of silliness and is, in any event, the sort of pseudo-scientific notion that I find irritating since it cannot be tested in any way (the proponent will simply say 'ahh...the fact that x happened was part of the original design').
Finally, the point that there may be some 'deeper layer' is a standard appeal to ignorance fallacy.


First a disclaimer: I am not the least bit religious.

I understand that this reeks of Intelligent Design etc. But the point remains. Without the divinity aspect, one can argue that science can advance sufficiently to create life and hide a message.
Indi
It might be theoretically possible to use a life form - a species even, or possibly a whole ecosystem - as a storage device. However, if you're going to do that, you'd need a lifeform very different from anything we know. You would need it to have built-in mechanisms to detect and correct mistranscriptions of the data, which effectively entirely rules out evolution.
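The kind of built-in detect-and-correct mechanism described above can be illustrated with the simplest error-correcting code there is: keep three copies and take a majority vote at each position. This is a minimal sketch, not a model of any real biological mechanism; the function names `encode` and `decode` are hypothetical.

```python
from collections import Counter

def encode(bases: str, copies: int = 3) -> list[str]:
    """Store `copies` identical replicas of the sequence (a repetition code)."""
    return [bases] * copies

def decode(replicas: list[str]) -> str:
    """Recover each position by majority vote across the replicas."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*replicas))

stored = encode("GATTACA")
# Simulate one mistranscription: position 2 of the second replica becomes C.
stored[1] = stored[1][:2] + "C" + stored[1][3:]
print(decode(stored))
```

The single error is voted away, but so is any change confined to one copy, which is why a store protected like this is frozen rather than free to evolve.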

A life form that exists as a data store and can actually evolve would effectively need two sets of "DNA" - one would describe the life form itself, that can adapt and change as necessary according to evolutionary laws... but then the actual stored data would have to be stored separately from that, because you don't want that data to be part of the grand and undirected experiment that is evolution. That's nothing like life as we know it, and i'm not even sure how that would be possible in practice.

Basically, you're talking about two contradictory goals. On the one hand you want a species that will survive basically anything, basically forever, which requires the species to be utterly adaptable... on the other you want to saddle the species with a crapton of dead-weight baggage that it has to truck around but cannot change, discard, or use. It's like saying you want to build the ultimate adaptable and reconfigurable vehicle that can survive any terrain and even easily adapt to entirely new challenges... then saying "oh, but i want to stick a big-ass billboard on the side to advertise the sponsors". One of those two constraints has to be sacrificed.

The bottom line, though, is that all the life we know of - including ourselves - cannot be an information storage medium. There is no life form that we know of that trucks around out-of-band data in its genetic makeup - all of their genetic information is (theoretically) available for encoding, mutation, or selection. (While it is true that something like 98% of the human genome is non-coding, it is not actually "junk DNA" as it is popularly called - a lot of it does have functions, and its mere existence to provide structure may be important; around 80% of the DNA preserved unchanged since we diverged from mice ~70 million years ago is "junk DNA", so clearly there's a reason for keeping it around.) All of it is fair game for evolutionary processes. There's nothing in our makeup where information is stored that is inviolable or maintainable (i.e., where it can be garbled but there is an error-correcting mechanism).
Bikerman
cpatnaik wrote:
First a disclaimer: I am the least bit religious.
I understand that this reeks of Intelligent Design etc. But the point remains. Without the divinity aspect, one can argue that science can advance sufficiently to create life and hide a message.

I'm choosing to believe you missed the word NOT from your disclaimer by accident rather than design :)
Basically I have little to add to Indi's post, so I'll elaborate on why DNA cannot be treated as a simple information storage system (this is a mistake common to many creationist apologists, who then go on to give ridiculously large figures in improbability sums which are entirely misleading, as, of course, they are designed to be).
Information storage is measured in simple binary digits (bits). The temptation is, then, to look at DNA as a simple sequence of triplets (codons, where each codon is a sequence of 3 bases from a possible 4). That gives 4^3 = 64 possible values per codon, or 6 bits. There are around a billion codons in the human genome, so it would be tempting (and wrong) to say the genome holds 6 billion bits of meaningful information. Whilst this is correct in strict informational terms, it is wrong in the context used. The reason it is wrong is that DNA encodes amino acids and there are only 20 of those. The amino acids themselves are combined into proteins, and the laws of chemistry define how that can and cannot be done. The simple information storage paradigm is, therefore, not useful.

Creationists will often go on to make ridiculous statements such as 'the DNA has more information than all the libraries in the world', or even 'there is more information in the DNA than atoms in the solar system'*. Both are laughable nonsense. The information can be calculated as above quite validly (since now we are talking about how many bits would be required to uniquely encode the DNA, rather than extrapolating to say how that in turn encodes humans). The answer is simple - 20 amino acids = 4.3 bits (log base 2). 4.3 x 1 billion codons = 4.3 billion bits = 537.5 million bytes, or about 513 MB in binary megabytes. So you could easily store the human genome on a PC hard disk.
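The arithmetic in this post can be checked directly. The sketch below uses the post's round figure of one billion codons (carried over as an assumption, not a measured count), ignores stop codons, and reports binary megabytes (MiB). With the exact log2(20) of about 4.32 bits it comes out near 515 MiB; with the post's rounded 4.3 bits it gives the quoted 513.

```python
import math

CODONS = 1_000_000_000   # round figure from the post, not a measured count
BASES_PER_CODON = 3
BASE_ALPHABET = 4        # A, C, G, T
AMINO_ACIDS = 20

def bits_to_mib(bits: float) -> float:
    """Convert a number of bits to binary megabytes (1 MiB = 2**20 bytes)."""
    return bits / 8 / 2**20

# Naive view: every codon is one of 4**3 = 64 triplets, i.e. 6 bits.
raw_bits_per_codon = math.log2(BASE_ALPHABET ** BASES_PER_CODON)
# The post's view: what matters is which of 20 amino acids is coded, about 4.32 bits.
aa_bits_per_codon = math.log2(AMINO_ACIDS)

print(f"naive: {raw_bits_per_codon:.0f} bits/codon -> {bits_to_mib(CODONS * raw_bits_per_codon):.0f} MiB")
print(f"amino acids: {aa_bits_per_codon:.2f} bits/codon -> {bits_to_mib(CODONS * aa_bits_per_codon):.0f} MiB")
print(f"with the rounded 4.3 bits: {bits_to_mib(CODONS * 4.3):.0f} MiB")
```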
cpatnaik
Thanks, this is proving interesting.

Good catch bikerman. The missing NOT was indeed inadvertent.

Indi, my point is that the message need not be very long. It could be a page's worth, or a book. Which of course raises the question: how important can a message be, that someone would go through so much trouble to encode it and encase it in a lifeform?

The secret of life (duh). An Easter Egg (so to speak). A warning. etc.
Indi
It's not a matter of how long the message is, it's a matter of preservation and recovery.

It is possible to use DNA as a data storage method, and you can store megashitloads of data in DNA. And DNA is pretty freaking resilient, too - we've recovered long DNA strands from species that died out hundreds of thousands of years ago. Scientists have stored huge amounts of data in DNA - like whole books and images and audio - and recovered it.

However, they were using plain old raw DNA. They were literally just packing the data onto a DNA strand, then reading it off later. That DNA did not contain the "blueprint" for an organism, just the data. And that DNA wasn't copied billions of times or transmitted across generations.

Now it might be theoretically possible to squeeze a message onto a DNA strand that is meant to code the "blueprint" of an actual living organism, somewhere in between the areas that actually code the organism itself. The problem is you can only get away with that for a single generation, or a very small number of generations, and maybe not even that. Because once the DNA starts replicating, replication errors start cropping up. The human genome, for example, has about 3 billion base pairs, but each human cell has - on average - 150,000 errors in the DNA in that cell... and we have a lot of cells. DNA is very robust about correcting errors, but obviously - as cancer teaches us - not perfect. In fact, it's perfectly imperfect - we all will get cancer, if we live long enough, because there are so many errors in DNA replication that it is inevitable. There is a second level of error correction at the species level - if the DNA gets totally messed up, the organism can't survive to pass it on (which is why cancer doesn't usually get passed on genetically)... but that doesn't help at all for protecting the sanctity of any data stored in the DNA that doesn't code the organism's development.

In other words, if you try to sneak a message into the part of an organism's DNA that doesn't code for the organism's development, you can't be sure the message will survive a single generation of that organism (assuming an organism of average complexity). If it gets garbled, there's nothing stopping it from being passed on anyway. There's no error correction mechanism (other than the most basic DNA error-correction, which is very flaky).
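To get a feel for the non-coding case, here is a toy simulation: a random "message" is copied generation after generation with a hypothetical error rate and no selection at all. The rate is deliberately exaggerated so it runs quickly (real per-base rates are far lower, but a real message would also have to survive far more generations), so only the trend means anything. Agreement bottoms out around 25% rather than zero, because a random base matches the original by chance one time in four.

```python
import random

BASES = "ACGT"
ERROR_RATE = 1e-3   # hypothetical per-base, per-generation error rate, exaggerated for speed
LENGTH = 2_000      # length of the hidden "message" in bases

def replicate(seq: str, rng: random.Random) -> str:
    """Copy seq once; each base is replaced by a different random base with probability ERROR_RATE."""
    out = []
    for base in seq:
        if rng.random() < ERROR_RATE:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)

def fraction_intact(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)  # fixed seed so the run is repeatable
message = "".join(rng.choice(BASES) for _ in range(LENGTH))
seq = message
intact = {}
for generation in range(1, 1001):
    seq = replicate(seq, rng)  # no selection: nothing removes garbled copies
    if generation in (1, 10, 100, 1000):
        intact[generation] = fraction_intact(message, seq)
        print(f"generation {generation:>4}: {intact[generation]:.1%} of positions still match")
```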

Okay, you say, so then you'll try to sneak the message into one of the coding sequences of the organism. That way natural selection will be on your side, and if the message gets garbled, the organism won't live... only the organisms that carry a proper copy of the message will survive and reproduce, so the message stays intact.

Clever, but now you're into an entirely different kind of problem: your message has become indistinguishable from the medium. In fact, there is no more message. To understand why, we have to take a detour into information theory.

In information theory there is a concept called "Shannon entropy" that measures how much information a data stream carries, by measuring how much can be extracted from it. Without going too deep into the maths, the theory is basically this: for there to be information in a data stream, there has to be some element of "surprise"; and for a message to be recognizable as a message, there has to be some element of regularity too... but not too much of either. Imagine a stream of completely random bits - in Shannon's terms every bit is as surprising as it can be, but unless you already know the code, you can't tell a message apart from pure static. Now imagine a stream of bits that alternate between 1 and 0 - 1010101010101.... - you can't extract any information out of that, because once you've spotted the pattern every next bit is certain. Now imagine a stream that is supposed to be all 0, but then you see this: 0000100100110001...; the 1s are unexpected (they are "surprising"), but there is still an element of regularity (it should be all 0s), so you could encode a message in that data stream. In technical terms, the information you can extract from a bit depends on its probability: a bit that is completely certain (100-0 for 1 or 100-0 for 0) carries none, and the less predictable it is, the more it can carry, up to one full bit when it is 50-50 (though at that point you need to know the code to tell the message from noise).
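The standard formulas behind this fit in a few lines. The sketch below computes the surprisal log2(1/p) of a single outcome and the binary entropy of a source that emits 1 with probability p: a certain bit carries nothing, a rare 1 is individually very surprising but carries little on average, and a 50-50 bit carries the maximum of one bit. It looks at bits one at a time, so it cannot see the pattern in 101010...; capturing that needs the entropy of the next bit given the previous ones, which is zero for a strictly alternating stream.

```python
import math

def surprisal(p: float) -> float:
    """Information, in bits, gained by seeing an outcome that had probability p (0 < p <= 1)."""
    return math.log2(1 / p)

def binary_entropy(p: float) -> float:
    """Average information per bit, in bits, for a source that emits 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome tells you nothing
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.01, 0.1, 0.5, 1.0):
    print(f"P(1) = {p:<4}  a 1 carries {surprisal(p):.2f} bits; "
          f"average {binary_entropy(p):.3f} bits per bit")
```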

Now suppose you have a message that you want to store. Suppose you find a place in a DNA sequence where that precise word encodes for some kind of advantageous characteristic. So you stick it in there, and now you can be assured that that message will be safely replicated and transmitted across generations (more or less - even then you can't be sure, but i'll be generous and say you can be "sure enough"). But... hang on... did you really put your message in the DNA... or did you just encode a good trait?

Here's another way to look at it. Suppose you wanted to encode a secret message in the timing of the traffic lights in a city. However, to ensure that no one messes with the message, you make sure it's encoded in such a way that it creates a perfect sequence for the lights to maximize the flow of traffic - now no-one will change it, so you can trust it will be preserved. But here's the thing: if you ask someone to look for the hidden message in the traffic lights, they won't see it, because the traffic lights are doing precisely what you would expect them to do. It's like that sequence of 101010101010... the sequence is perfectly predictable, because it is the perfect timing, so there can't be a message in there (or rather, if there is a message in there, it can't be extracted).

So basically, you're damned any way you try to go about it:
  • If you try to store your message in a non-important sequence, it is unlikely to be preserved across more than a handful of generations.
  • If you try to store your message in a sequence that has some importance but make it "stand out", you fugger up whatever the sequence was supposed to be doing, and it is unlikely that your message will survive very long before it gets weeded out by natural selection.
  • If you try to store your message in a sequence that has some importance but make it "blend in" and function well, you lose the "surprise factor" of the message, so it cannot be retrieved.

The bottom line is that the DNA of organisms (at least, all the ones that we know of) already serves a purpose that does not allow for extraneous payloads of out-of-band data. You can't secrete more stuff in an organism's DNA and expect it will be reliably reproduced for more than a handful of generations, if that... unless you have added an advantageous trait, but in that case, you haven't added "data", you've added a trait. You'd need some other storage location - you can't use the DNA - but the DNA is the only thing that gets transmitted across generations. So there's simply no way that a message could have been stored in a distant ancestor and recovered in a modern organism - not for any kind of organism we know.

══════════════════════════════════════════════════

Now as for the speculative question....

If a hyper-advanced species wanted to store a message for a future species (or even their own ancestors) to recover, putting it in their DNA would be a stupid idea. However... putting it in DNA in general is not a stupid idea. DNA is an excellent storage mechanism - excellent data density, excellent resilience, and any curious, scientifically literate species would be on the lookout for it, and quite interested to study it. Look at us, for example: scrounging for bits of DNA from dinosaurs or ancient viruses (and damn near desperate to find extraterrestrial DNA, such as on Mars). Frankly, any intelligent species would be much more likely to study a bit of random DNA than they would to place a random crystal under the microscope.

But as i said, trying to encode your message in active DNA - DNA intended to be used to create an organism - would be stupid, and wasteful. All you'd need to do is just make a DNA strand that encodes the data you want to encode - to hell with making it viable, and to hell with even worrying about encoding valid proteins: just use the bases as a base-4 encoding system, and write your data.

Then, just dump it somewhere where the DNA will survive for a million years or so - DNA doesn't normally survive in the wild for more than a million years, but if you choose and set up the conditions you might be able to have it survive for a hundred million years... you can imagine how eager the scientists of a future species would be to study hundred-million-year-old DNA. If you want to set up a beacon so that the DNA will be found, you can use a radioactive signature.
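The "just use the bases as a base-4 encoding system" idea can be sketched in a few lines: two bits per base, four bases per byte. The A/C/G/T to 00/01/10/11 mapping is an arbitrary assumption for illustration; published DNA-storage schemes add further constraints (for example avoiding long runs of one base) and error correction, none of which is shown here.

```python
BASES = "ACGT"  # hypothetical mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna: read four bases at a time back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on anything not A/C/G/T
        out.append(byte)
    return bytes(out)

strand = bytes_to_dna(b"Hello!")
print(strand)
assert dna_to_bytes(strand) == b"Hello!"  # round trip is lossless while nothing mutates
```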

But you know, if you really want to leave a message for a future species, there are way better ways to do it. For example, just pick the most tectonically inactive planet in the solar system - like, say, Mars - and then just blast U-238 into the soil to write out letters a hundred kilometres high that say "Hello!". The radioactivity from the uranium will make sure it gets spotted, and once noted, reading the message is trivial - and with the letters that big, it can't be easily damaged. Or just make a huge-ass billboard - a megametre square - with the message on it and have it orbit the sun; hell, make a hundred of them so you know at least a few will survive a billion or two years.

I don't grok the logic of a species wanting to "hide" a message; for example, in our DNA. That's just toying with us, for no apparent purpose. If you have something you really want to tell us - especially if it's a warning - just make a big-ass space billboard. (Which isn't even that technologically hard - we could technically do it today, even.)
Bikerman
I was just settling down to do my duty here - explaining Shannon entropy, redundancy, error control/correction and self-correction, signal-noise-control ratios, signalling overheads, routing overheads - and the rest of my 12 week Elective Unit (2 credits) on Principles of information theory for the Non Computing/Mathematics Undergraduate....but no need. You have summarized the important bits here.

I felt duty bound - in as much as I have a specialism I guess this is part of it, and I even managed to dig out my lecture notes which have been mothballed for a decade or more.
I wasn't looking forward to it much though, so ta vee much :)

This was never my favourite unit for the undergrads; I much preferred really torturing them on the Boolean logic, Truth Tables and Elementary Circuit design unit :)
LxGoodies
If you consider the living entity itself as a persistent container for information, the whole idea is creationism in disguise. For a message to be enclosed in a lifeform (DNA, whatever), the entity containing it would need to be designed as is in the first place! All other problems with the idea are described accurately by Bikerman and Indi. Your information would be at risk anyway.

I think culture is more effective as an information container. There are some books around that have been preserved for centuries without change. If millennia are required though, your data will get messed up by revision and translation. I wouldn't trust human beings to preserve anything.

I wonder how long a classic ROM (non-programmable read-only memory) would last? Provided I still have electricity in the year 3784, would e.g. a Tandy TRS-80 computer start and function normally?
Bikerman
This is a concern with many archives. The promises made for DVDs and, before that, CDs were wildly extravagant, and those of us in the know never believed them for a minute. I remember the BBC's premier technology show - Tomorrow's World - showing a chap eating off a CD platter to show how resilient it was. I was watching it with a bunch of techies and we were pissing ourselves laughing at the claims.

Many non-specialists, however, made bad decisions based on the hype. Entire archives were transferred onto digital media which within months began to give problems and within only a few years were effectively junk. And that is before we even think about whether the technology required will still be around - this is just the physical degradation of the discs themselves.

I know a couple of unlucky buggers who have effectively lost fortunes by believing the hypers 10-20 years back, poor sods.
LxGoodies
Analog is better than digital for permanent storage. Maybe the most effective is microfiche, translated to digital form (OCR) only to make it more searchable.

http://www.bfi.org.uk/news-opinion/bfi-news/bfi-digitises-4m-newspaper-cuttings
Bikerman
No it isn't - that's silly.
Digital stores two values (in most systems) 1 and 0. Analogue stores a huge range of values which make it far more vulnerable to data error.
A proper digital system will be MILES better than any analogue system you can think of, in terms of information integrity, speed, accessibility and plain practicality.
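The "two values" point can be shown with a toy copy-of-a-copy experiment: the same random noise is added on every copy, but the digital copy is snapped back to 0 or 1 each time, while the analogue copy keeps whatever it received. The noise level and number of copies are hypothetical; this illustrates why regeneration works, not the behaviour of any particular medium.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
NOISE = 0.05             # hypothetical noise (standard deviation) added per copy
GENERATIONS = 20         # number of copy-of-a-copy generations

def noisy_copy(values: list[float]) -> list[float]:
    """One generation of copying: every stored value picks up Gaussian noise."""
    return [v + rng.gauss(0.0, NOISE) for v in values]

def regenerate(values: list[float]) -> list[float]:
    """Digital regeneration: snap each value back to the nearer of the two levels, 0 or 1."""
    return [1.0 if v >= 0.5 else 0.0 for v in values]

original = [float(rng.random() < 0.5) for _ in range(1000)]
analog, digital = original, original
for _ in range(GENERATIONS):
    analog = noisy_copy(analog)                  # analogue: noise accumulates
    digital = regenerate(noisy_copy(digital))    # digital: noise is thresholded away each copy

analog_error = max(abs(a - o) for a, o in zip(analog, original))
digital_errors = sum(d != o for d, o in zip(digital, original))
print(f"worst analogue drift after {GENERATIONS} copies: {analog_error:.3f}")
print(f"digital bits flipped after {GENERATIONS} copies: {digital_errors}")
```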
A system of RAIDed (nowadays SSD) disc storage for transaction and short term data, coupled to a proper backup regime with off-site storage and redundant processing capability, is the State of the Art - I know, I ran such a system. It isn't even POSSIBLE using analogue kit, much less better.
Don't forget, we have generated more digital information in the last 2 decades than in the previous 2 million years of Homo sapiens. A single telescope generates more information in one week than astronomers gathered in the entire history of observation. Information generation and storage currently increases an order of magnitude each 5 years - and the period is shortening. The LHC generates 1 Petabyte (two hundred thousand DVDs' worth) EVERY SECOND. Obviously they have to junk 99.9999% of that, but even then it is staggering.
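As a quick check of the "two hundred thousand DVDs" figure, assuming a decimal petabyte (10^15 bytes) and a single-layer DVD of 4.7 GB, and of what is left after discarding 99.9999%:

```python
PETABYTE = 10**15     # bytes, decimal SI prefix (assumption)
DVD_BYTES = 4.7e9     # nominal single-layer DVD capacity (assumption)

dvds_per_second = PETABYTE / DVD_BYTES
kept_per_second = PETABYTE * (1 - 0.999999)   # what survives if 99.9999% is discarded

print(f"{dvds_per_second:,.0f} DVDs' worth per second")
print(f"about {kept_per_second / 1e9:.1f} GB per second kept")
```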
Indi
LxGoodies wrote:
If you consider the living entity itself as a persistent container for information, the whole idea is creationism in disguise. For a message to be enclosed in a lifeform (DNA, whatever), the entity containing it would need to be designed as is in the first place!

That's not true. There's no reason i couldn't take my own genes, replace the non-coding and non-functional parts with the message i want to store, and then make a clone with that DNA. (In fact, i think that's literally what scientists did - obviously not using human DNA, of course, and they didn't actually grow an organism out of the altered DNA; i mean they took an existing DNA strand and "tweaked" it rather than making one from scratch.)

I wouldn't expect the information to be preserved for more than a generation or two, for the reasons i mentioned, but i *could* put it in without designing a whole organism.

LxGoodies wrote:
Analog is better than digital for permanent storage. Maybe the most effective is microfiche, translated to digital form (OCR) only to make it more searchable.

I'm afraid you really don't understand the meaning of digital or analog in this context. The whole point of digital - the reason why every single information storage and transmission system humanity uses, from radio to TV to records, is going digital - is that it is much more resistant to noise and signal degradation. Like... infinitely more resistant to noise and signal degradation (because analog is entirely unresistant).

In theory, with analog you could store a signal precisely... perfectly (assuming the signal's amplitude and frequency are within the ranges capable of being handled by the system). But in reality that's absolutely impossible, because there is no way to prevent noise or signal degradation - it's literally impossible. It's also unnecessary, because everything in the real universe is digital anyway. (Of course, we don't yet have anything that stores/transmits with a resolution on the order of individual photons and Planck units, but that's really not necessary for most purposes.) Humans can't distinguish audio frequencies above 25,000 times a second, and i don't know exactly how many gradations of volume we can pick out, but i'm going to guess it's less than 4 billion, so 32-bit 44 kHz audio is way more than we'll ever need to store (a single channel of) sound. (CD quality is 2-channel 16-bit 44.1 kHz; 5.1 surround sound is 6-channel 14-bit 44 kHz; the brand-new HFPA initiative is for 2- or 6-channel 24-bit 192 kHz, and people are calling bullshit on that for being too much.)
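The audio formats mentioned translate into data rates with one multiplication: samples per second times bits per sample times channels. The sketch below does that for CD audio and for 24-bit/192 kHz stereo, and shows that 32 bits per sample gives the roughly 4 billion amplitude levels mentioned. It covers uncompressed PCM only.

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd = pcm_bitrate(44_100, 16, 2)          # CD audio
hi_res = pcm_bitrate(192_000, 24, 2)     # 24-bit/192 kHz stereo
levels_32bit = 2 ** 32                   # distinct amplitude levels in a 32-bit sample

print(f"CD: {cd:,} bit/s, about {cd * 60 / 8 / 1e6:.1f} MB per minute")
print(f"24-bit/192 kHz stereo: {hi_res:,} bit/s, {hi_res / cd:.1f} times the CD rate")
print(f"32-bit samples: {levels_32bit:,} levels")
```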

If it weren't for the biological problems that come with replicating DNA, any digital data could be stored in it and retrieved perfectly. And this has already been done for audio files, video files, and more... but only with DNA that wasn't allowed to replicate. In fact, DNA is seriously being studied as a data storage medium. But just DNA... not organisms.

(Incidentally, just because data has been digitized doesn't mean it's searchable. For example, merely scanning the pages of books would not make the text searchable - one would then need to use OCR (optical character recognition) to "scan" the images and extract the text. Same with audio files: presumably you have a ton of digitally encoded audio files... try finding all the music files that are in the key of A. Not so easy.

What the article you linked to says is that the old microfiche collection has been digitized - for storage and preservation - AND OCR-scanned for easy searching.)
LxGoodies
Of course, it is a combination. AND. The OCR makes the analog info more accessible.

Indi wrote:
replace the non-coding and non-functional parts with the message i want to store

I'm not an expert on DNA; I always assumed all genes matter together in some way (known or unknown), so thx for the info. Nevertheless, although it may be possible, I won't volunteer to store the Complete Shakespeare in my genes, if you don't mind.

Bikerman wrote:
A proper digital system will be MILES better than any analogue system you can think of, in terms of information integrity, speed, accessibility and plain practicality.

I would be careful claiming, with so much certainty, that digital storage is always more rugged/durable than analog. Of course you can build all kinds of redundancy and error checking into a digital reader. But "durable" is not only about the integrity of a medium; it is also about the type of format and the type of reader required. Most standard digital formats, like JPG, GIF, DOC, XML etc., are not suitable for long-term storage anyway, because a single glitch will invalidate the complete file.

Of course you can design something especially for durability, e.g. by adding checksum and error-correction features to an archive file format to make it more rugged. But when you do that, your future retrieval procedure will depend on very specific devices and specific software. That poses a risk. Trivial aspects like file system formatting and OS-specific partitioning may become a hurdle for accessibility.
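A minimal sketch of the "add checksums to an archive format" idea, using CRC-32 from Python's standard `zlib` module over fixed-size blocks. The 4 KiB block size and the helper names are assumptions for illustration. CRCs only detect and locate damage, so a single glitch costs one block rather than the whole file; actually repairing it would need redundant data (parity or a Reed-Solomon code, for instance), which is not shown.

```python
import zlib

BLOCK = 4096  # hypothetical block size in bytes

def checksum_blocks(data: bytes) -> list[int]:
    """CRC-32 of each fixed-size block, to be stored alongside the archive."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def damaged_blocks(data: bytes, stored: list[int]) -> list[int]:
    """Indices of blocks whose CRC no longer matches (assumes the data length is unchanged)."""
    return [i for i, crc in enumerate(checksum_blocks(data)) if crc != stored[i]]

archive = bytes(range(256)) * 64          # 16 KiB of sample data = 4 blocks
sums = checksum_blocks(archive)
corrupted = bytearray(archive)
corrupted[5000] ^= 0x01                   # flip one bit inside the second block
print(damaged_blocks(bytes(corrupted), sums))
```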

When you consider e.g. a simple analogue microphoto, any flatbed scanner with 3300 dpi capability will enable you to digitize and OCR the text on it. For simple reading, a microscope will do, if a page-view magnifier is unavailable.

Digital information can only be retrieved with suitable (digital media) reader equipment, connected with a computer system. Readers may become very costly in the future, when the medium becomes obsolete. As a result, your digital archive needs to be copied to new media, every 2-6 years.
Bikerman
LxGoodies wrote:

I would be careful claiming, with so much certainty, that digital storage is always more rugged/durable than analogue. Of course you can build all kinds of redundancy and error checking into a digital reader. But "durable" is not only about the integrity of a medium; it is also about the type of format and the type of reader required. Most standard digital formats, like JPG, GIF, DOC, XML etc., are not suitable for long-term storage anyway, because a single glitch will invalidate the complete file.
How is it that I make 4 specific claims and you say I made 2 which are not in those 4? I never mentioned durability or ruggedness - mainly because they are fairly meaningless in this context.

The stuff I am talking about is what I did at Corning and I'm not saying it was better than any analogue system, I'm saying no analogue system could do it or even get close.
Corning make Optical Fibre. Because of the high cost of laying it vs the comparatively cheap cost of buying it, Corning's main legal worry was being sued by a customer for the cost of having to replace faulty fibre. If it was terrestrial then it could be tens or hundreds of thousands of dollars, but if it was submarine fibre - well, think of a number and multiply by 10. For that reason, every detail of every process for every inch of fibre produced had to be stored for 10 years. I co-managed the computer systems which, amongst other things, did that. The amount of data we are talking about was nothing extreme - typically a couple of gig per day - generated by the shop floor computers controlling processing - which were, believe it or not, PDP-11s - 20 of them (anyone who knows their history of computing will know that this tech is antediluvian. Corning bought the best for the job in general and price was not the deciding factor in anything I worked on. It is also no oversight that the main minicomputer (other than the thousands of PCs, Sun workstations and the like) was a 2-node DEC VAX cluster running VMS over token ring, basically 20-year-old tech even at the time).
We had a complete disaster recovery protocol - and I mean COMPLETE. It was designed to allow the computing facility to be operational within 4 hours of theoretical complete destruction of the entire site, with all computers included. We had a 'Vax in a Van' which had one of three transactional datasets (a copy of the data from the previous 3 working days) and a duplicate DEC-VAX mini, with connectivity to link, via wall connectors, into the site Vax in the event that it wasn't totalled.
We had a complete replica computer facility at a secret location, with a number of terminals, PCs and other critical systems duplicating the live kit. We had 4-hour cyclic backup (standard GFS). 2 to tape, 1 to HDD producing 3 copies - 1 into the onsite fireproof safe array, 1 to the 'Vax in a Van', 1 to another location which I could tell you about but I would then have to kill you.
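For anyone who hasn't met GFS (grandfather-father-son) rotation: each backup is tagged by when it was taken, and each tier is kept for a different length of time. Below is a minimal Python sketch of the idea. The schedule and retention counts are assumptions made up for illustration, not the rules used at Corning: a backup every 4 hours, the Sunday 20:00 backup as the weekly "father", the 20:00 backup on the last day of a month as the monthly "grandfather", keeping 6 sons, 4 fathers and 12 grandfathers.

```python
from datetime import datetime, timedelta

# Hypothetical GFS schedule (illustrative only):
# one backup every 4 hours; the 20:00 backup on the last day of a month is a
# "grandfather", the 20:00 backup on a Sunday is a "father", the rest are "sons".
RETAIN = {"son": 6, "father": 4, "grandfather": 12}

def gfs_tier(ts: datetime) -> str:
    """Classify one backup by the time it was taken."""
    if ts.hour == 20:
        if (ts + timedelta(days=1)).month != ts.month:
            return "grandfather"   # last backup of the month
        if ts.weekday() == 6:
            return "father"        # last backup of the week (Sunday)
    return "son"

def backups_to_keep(timestamps):
    """Keep the newest N backups of each tier; everything else can be recycled."""
    newest_first = sorted(timestamps, reverse=True)
    keep = set()
    for tier, limit in RETAIN.items():
        in_tier = [ts for ts in newest_first if gfs_tier(ts) == tier]
        keep.update(in_tier[:limit])
    return keep

# Three months of 4-hourly backups: 1 Jan 2024 00:00 to 31 Mar 2024 20:00.
history = [datetime(2024, 1, 1) + timedelta(hours=4 * i) for i in range(546)]
kept = backups_to_keep(history)
print(len(kept), "of", len(history), "backups retained")
```

With three months of history this keeps the last day's six sons, the four most recent Sunday fathers and the three month-end grandfathers, 13 backups in all; the rest of the tapes go back into the rotation.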

There were another couple of wrinkles which I won't describe, but it was a classic example of function over flash, task over tech, plan over panic. It made use of no flashy kit, no unnecessary flim-flam, no high-tech fixes and no short-term patches. It was solidly engineered to do a job and it did that job.

We did full simulation runs every 6 months and we never dropped a beat (or a byte).

THAT's what I'm talking about. Analogue smamalogue - this was real-world data storage where the stakes were high and analogue could never even be an option. Most serious real-world scenarios are more like this and, therefore, less capable or incapable of being addressed using ANY non-digital solution - it isn't an option.

If your transaction data is already analogue then fine - you can look at analogue archiving. Most of the data we now use and generate is NOT analogue. Making an analogue image of digital source data for archive is not, in most cases, a sensible or even a serious option.
Indi
LxGoodies wrote:
Of course, it is a combination. AND. The OCR makes the analog info more accessible.

No, actually, it's more than that. All the OCR does is make the data easily searchable. What makes it perfectly archived is the digitization.

If you scanned it then OCR-traced it then ditched the digital scans, your data would still be searchable, but now only archived in analog form... so it won't last.

If you scanned it but didn't OCR-trace it, your data would not be searchable, but it would be perfectly preserved... and since it's perfectly preserved, you can always OCR-trace it later. The point is: digitization is what preserves it.

LxGoodies wrote:
Claiming digital storage is always more rugged/durable than analog I would be careful stating that with so much certainty..

I state it with absolute and unflinching certainty. Digital archiving and transmission are infinitely superior to analog archiving and transmission.

In fact, we are now in the "digital age", largely brought about because of the development of digital storage, processing and transmission. I don't think most people realize what a game changer the move to digitization has been. You've probably seen really old movies, or heard recordings that were made back in the early part of the 20th century - fuzzy, crackly, muffled, distorted. What most people don't realize is that that's not actually how those things looked and sounded when they were new.

If you listen to the original recording of The Beatles' "I Saw Her Standing There", you're not hearing what McCartney and Lennon made in the studio in 1963, or even what fans listened to in the 1960s. That song wasn't digitized until the mid- to late-1980s. The original song in its original quality is lost forever... but the version that was finally digitized in the 1980s will never degrade. A thousand years from now, people will hear what "I Saw Her Standing There" sounded like in 198X, after 20 years of degradation... but unless someone builds a time machine, no one will ever again hear it in its original quality.

By contrast, Justin Bieber's first single "One Time" was created digitally. A thousand years from now, people will be able to hear exactly what Bieber heard the day it was recorded... exactly what fans were listening to in 2009. For better or worse.

That is pretty profound, actually. It means that all of history before the 1980s is effectively... lost. No one now or in the future will ever be able to hear or see anything from before then in its original fidelity. Our entire historical record before the digital age is effectively a snapshot of what it looked like in 1981 (or whenever it was digitized). However... from this point forward, all history will be recorded with perfect fidelity.

In other words, people in the year 802,701 will be able to view/listen to anything recorded after the 1980s as if it had been recorded the day before... as if they were right there when it was recorded... however, if they want to view/listen to anything from before that period, the best they will be able to do is experience it as it would have been seen/heard in the 1980s. Basically, our perfect record of history will only start with the 1980s. Everything before that will just be fuzzy estimation. We are the first generation that will be perfectly immortalized in record, thanks to digitization.

LxGoodies wrote:
Most standard digital formats, like JPG, GIF, DOC, XML etc are not suitable anyway for long term storage, because a single glitch will invalidate the complete file.

Of course you can design something especially for durability. e.g. if you would add checksum and error correction features in an archive file format, to make it more rugged.

Ah, okay, first of all... those things you have listed are not archive formats... they're transmission formats (except for .doc, which is a format for easy editing - again, not for archiving). They have checksums and error-correction mechanisms built in... and that is what causes them to go belly up when there's an error - they're designed that way. The idea is that if you detect a bad checksum you can request retransmission. A proper archive format would be designed so that corruption of a small part of it won't cause corruption of the whole thing - and with redundancy and error-correction built in. (More realistically, though, data to be archived would just be stored in a natural, obvious format, and the storage will be designed to be redundant and error-correcting.)
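To illustrate what "corruption of a small part won't corrupt the whole" can look like, here is a minimal Python sketch of one such layout: the data is split into small independent blocks, each stored twice with its own CRC-32, and on reading, whichever copy passes its checksum is used. The block size and the two-copy layout are assumptions for illustration, and the sketch assumes the stored checksums themselves survive intact.

```python
import zlib

BLOCK_SIZE = 64  # hypothetical block size, in bytes

def archive(data: bytes) -> list:
    """Split data into independent blocks; store each block twice with its CRC-32."""
    stored = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        stored.append((zlib.crc32(block), block, block))
    return stored

def restore(stored: list) -> tuple:
    """Rebuild the data from whichever copy of each block passes its checksum.

    Returns (data, lost_blocks). A block whose two copies are both damaged is
    replaced with '?' bytes, so the damage stays confined to that block.
    (Assumes the stored CRC values are themselves undamaged.)
    """
    out = bytearray()
    lost_blocks = 0
    for crc, first, second in stored:
        if zlib.crc32(first) == crc:
            out += first
        elif zlib.crc32(second) == crc:
            out += second
        else:
            lost_blocks += 1
            out += b"?" * len(first)
    return bytes(out), lost_blocks

original = bytes(range(256)) * 4        # 1024 bytes -> 16 blocks
stored = archive(original)

# Damage one copy of block 2: the other copy repairs it.
crc, first, second = stored[2]
stored[2] = (crc, b"\x00" + first[1:], second)
repaired, lost = restore(stored)
print("one copy damaged:", "identical" if repaired == original else "different", "| lost blocks:", lost)

# Damage both copies of block 5: only that block's 64 bytes are lost.
crc, first, second = stored[5]
stored[5] = (crc, b"\x00" * len(first), b"\x00" * len(second))
partial, lost = restore(stored)
print("both copies damaged: lost blocks:", lost)
```

Real archive systems use stronger error-correcting codes (Reed-Solomon and the like) rather than plain duplication, but the principle is the same: damage is detected per block and never spreads past it.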

But more importantly, I think you have a deep misconception of what "digital" means. It has nothing to do with computers. The only reason digital is usually mentioned alongside computers is that computers are the best way to process digital data. But computers are not necessary at all.

"Digital" just means quantized. People are digital. You can have a group of 7 people, a group of 8 people, a group of 9 people... but you can't have a group of 8½ people, or a group of 7.6352 people. Money is also quantized in practice - you can have 24 cents (two dimes and four pennies), 25 cents (a quarter), 26 cents (a quarter and a penny, or two dimes, a nickel, and a penny, etc.)... but you can't have 25.3 cents in your pocket. (And, in fact, in Canada they've abolished the penny, so you can't have 24 cents or 26 cents either - you can only have 20 cents, 25 cents, 30 cents, and so on.)

The reason why this is superior to analog is that if the data gets "fuzzy" for some reason, so long as the noise doesn't cross a certain threshold the data can still be perfectly recovered. Imagine you had a list of prices (digital, quantized to one cent) and weights (analog, recorded as precisely as possible):

Code:
$1.47    7.9882
$1.43    6.29051346
$0.76    0.5589223
$1.30    4.2176
$1.18    9.5285146


Now we add some noise - just a tiny bit, like a few tenths of a percent:

Code:
$1.47075045678749    7.99040733338128
$1.43032107344977    6.29294350177107
$0.76107597482284    0.5602684280423
$1.30010586564292    4.21886684887051
$1.18151336093668    9.5297974961316


Now how do you recover the original values? Well, for the digital values that's easy - they're quantized at 0.01, so just round off to that quantization. So long as the noise is less than half the quantization step (here, half a cent), the digital data can be recovered perfectly (all I did to get the data below was round off the money values to 2 decimal places):

Code:
$1.47    7.99040733338128
$1.43    6.29294350177107
$0.76    0.5602684280423
$1.30    4.21886684887051
$1.18    9.5297974961316


But how do you recover the analog data - ie, the original weights? You can't. It's literally impossible. Without actually knowing the original values or the precise values of the noise, there is no algorithm you can run to recover the original data. The original analog data is lost forever.

That is why digital is better than analog. Noise, wear, fading, damage... all of these things will happen to your data, but if the data is digital and so long as the effect is less than (one half of) the digital quantization level, the original data can always be recovered perfectly. With analog data, any noise, wear, fading, or damage will become part of the data forever - it can never be removed - which means the data is irrevocably changed, the original lost forever.
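The rounding experiment above is easy to reproduce. A minimal Python sketch, keeping the prices as whole cents and assuming a simple noise model (uniform, at most 0.4 cent on each price, so below half a step); the seed and noise sizes are arbitrary choices:

```python
import random

# Prices stored in whole cents (quantized); weights stored as-is (analog).
prices_cents = [147, 143, 76, 130, 118]
weights = [7.9882, 6.29051346, 0.5589223, 4.2176, 9.5285146]

rng = random.Random(42)  # fixed seed so runs are repeatable

def add_noise(value: float, max_noise: float) -> float:
    """Simulated wear or noise: a random offset of at most max_noise either way."""
    return value + rng.uniform(-max_noise, max_noise)

# Assumed noise: at most 0.4 cent on the prices (below half a step), 0.01 on the weights.
noisy_prices = [add_noise(c, 0.4) for c in prices_cents]
noisy_weights = [add_noise(w, 0.01) for w in weights]

# Digital recovery: snap each value to the nearest whole cent.
recovered_prices = [round(p) for p in noisy_prices]

print("prices recovered exactly:", recovered_prices == prices_cents)
# There is no grid to snap the weights back onto, so the noise is now part of them.
print("weight errors:", [round(n - w, 6) for n, w in zip(noisy_weights, weights)])
```

Raise the price noise above 0.5 cent and some values start rounding to the wrong cent, which is exactly the "(one half of) the quantization level" limit.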

So all you need to do is store your data digitally in a way that whatever noise, wear, fading, or damage it picks up stays below the quantization limit, and it will always be perfectly retrievable - fresh as the day it was stored. With analog data, that's impossible - every time analog data is read or copied, and even if it's just sitting around, it will degrade.
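The same rule explains why digital copies don't degrade from generation to generation. In this Python sketch (the noise size and the 1000 generations are arbitrary assumptions), every copy adds fresh noise of up to 0.4 of a quantization step; the digital reader snaps each value back to the nearest level before the next copy is made, while the analog copy just carries its noise forward:

```python
import random

def copy_analog(values, rng, max_noise=0.4):
    """Copy analog values: each generation adds new noise on top of the old."""
    return [v + rng.uniform(-max_noise, max_noise) for v in values]

def copy_digital(levels, rng, max_noise=0.4):
    """Copy quantized values (integer levels): the physical copy is just as noisy,
    but the reader snaps each value back to the nearest integer level."""
    return [round(v + rng.uniform(-max_noise, max_noise)) for v in levels]

rng = random.Random(1)
original = [147, 143, 76, 130, 118]   # e.g. the prices above, in cents
analog = [float(v) for v in original]
digital = list(original)

for generation in range(1000):
    analog = copy_analog(analog, rng)
    digital = copy_digital(digital, rng)

print("digital copy identical after 1000 generations:", digital == original)
print("worst analog drift:", max(abs(a - o) for a, o in zip(analog, original)))
```

The digital values come out identical to the originals; the analog values have wandered off in a random walk, and nothing in them tells you by how much.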

LxGoodies wrote:
When you consider e.g. a simple analogue microphoto, any flat bed scanner with 3300 dPI capability will enable you to digitize and OCR the text on it. For simple reading, a microscope will do, if a page-view magnifier is unavailable.

And a single grain of pollen or a scratch the thickness of a human hair will wipe out entire paragraphs of data. Or even just the simple fading of time will render the film harder and harder to read, until finally the information is completely irretrievable.

LxGoodies wrote:
Digital information can only be retrieved with suitable (digital media) reader equipment, connected with a computer system. Readers may become very costly in the future, when the medium becomes obsolete. As a result, your digital archive needs to be copied to new media, every 2-6 years.

Incorrect. Again, you seem to think that digital data means computer data. It doesn't. You know what punch cards are? They are digital storage devices. In fact, they are binary digital storage devices. On the card is a grid, and some of the grid locations have holes punched out - the pattern of holes (0) and non-holes (1) is digital data. You don't need "reader equipment" to read that. You can use your eyes.
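Reading a row by eye amounts to nothing more than this. A minimal Python sketch, assuming rows are transcribed as "o" for a hole and "." for no hole, with the leftmost position as the most significant bit; which of hole/no-hole means 1 is just a convention, so it is a parameter (the default follows the convention stated above):

```python
def read_row(row: str, hole_is_one: bool = False) -> int:
    """Read one row of a punched card or tape, written as 'o' (hole) and '.' (no hole).

    The default follows the convention in the post (hole = 0, no hole = 1);
    pass hole_is_one=True for the opposite convention.
    Leftmost mark = most significant bit.
    """
    value = 0
    for mark in row:
        if mark not in "o.":
            raise ValueError(f"unexpected mark {mark!r}")
        bit = int((mark == "o") == hole_is_one)
        value = (value << 1) | bit
    return value

print(read_row(".o..o..o", hole_is_one=True))   # 73, the ASCII code for 'I'
```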

In fact, to illustrate your confusion, let me also point out that you can read a CD with a microscope, too. Yes, literally. A conventional CD is just a polycarbonate disc that has been pressed with tiny dents and coated with a thin layer of aluminium. The dents are 500-600 nanometres wide, 800-3000 nanometres long, and 150 nanometres deep, spaced 1600 nanometres apart (the image also shows the dimensions of the dents on other types of optical discs). With a powerful enough microscope, you can literally see the pattern and read it. They're called "optical" discs for a reason. A future historian could easily just take a CD, scan it with a powerful microscope, note the pattern of dents, and reproduce the data perfectly. (Or, more plausibly, trivially build a device that would do that automatically, and convert it to whatever technology they use.)
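As a rough sketch of the first step such a device would perform (a simplification, in Python): on a CD the data is carried by the edges of the pits rather than the pits themselves. Each pit/land boundary reads as a 1, each channel-bit period without one reads as a 0, and every complete pit or land is 3 to 11 channel bits long. The sketch assumes the microscope image has already been turned into one pit/land reading per channel-bit period; the later stages (EFM decoding of 14-bit groups back into bytes via a lookup table, and CIRC error correction) are not shown.

```python
def channel_bits(samples: list) -> list:
    """Turn pit/land readings along a track into CD channel bits.

    samples holds one reading per channel-bit period (True = pit, False = land).
    A pit edge (land-to-pit or pit-to-land) means 1; no edge means 0.
    """
    return [int(cur != prev) for prev, cur in zip(samples, samples[1:])]

def run_lengths(samples: list) -> list:
    """Lengths of the consecutive pit and land runs, in channel-bit periods."""
    if not samples:
        return []
    runs, count = [], 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

def plausible_track(samples: list) -> bool:
    """Every complete pit or land on a CD is 3 to 11 channel bits long.
    The first and last runs may be cut off where the reading starts and stops,
    so they are skipped."""
    return all(3 <= r <= 11 for r in run_lengths(samples)[1:-1])

track = [False] * 4 + [True] * 3 + [False] * 5 + [True] * 4
print(channel_bits(track))
print(plausible_track(track))
```

The run-length check is a useful sanity test for a home-made reader: if it fails, the sampling is off, not the disc.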

The only reason CDs (and other conventional optical media like Blu-Ray) are not good for long-term storage is that the dents are stress points, and they're small enough - and the material is stiff enough - that over time the stresses will relax and the bumps will smooth out. Natural degradation of the material (like chemical reactions) will also eventually make the bumps unreadable. Theoretically, if you used a really good material - like gold, which is mostly non-reactive - and pressed it in such a way that the stresses are relaxed, and didn't mess around with the disc too much, a CD could last hundreds of years. (And yes, you could read it long after CD players have gone extinct, using a microscope.)

If you're truly paranoid about data loss, you could store digital data on punched mylar tape. The old standard gives you one 8-bit byte per 0.1 inch of tape, so you'd need about 2.5 km of tape to store a megabyte... but that's not really impossible. It's sold in rolls of 1000 ft (300 m), 28 per case - that's about 3.4 megs per box. If you printed your data on punched tape and stored the tape well, it could be read by eye - or a machine to read it could be trivially rigged. Even if there's some kind of apocalypse that sends us back to the Dark Ages technologically, the data can still be easily recovered.
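The arithmetic, as a quick Python check (using the one byte per 0.1 inch figure above; a megabyte taken as 10^6 bytes, with 2^20 shown for comparison):

```python
BYTES_PER_INCH = 10          # one 8-bit row every 0.1 inch, as stated above
METRES_PER_INCH = 0.0254
INCHES_PER_FOOT = 12
ROLL_FEET = 1000
ROLLS_PER_CASE = 28

def tape_metres(n_bytes: int) -> float:
    """Length of tape needed to punch n_bytes, one byte per row."""
    return n_bytes / BYTES_PER_INCH * METRES_PER_INCH

bytes_per_roll = ROLL_FEET * INCHES_PER_FOOT * BYTES_PER_INCH
bytes_per_case = bytes_per_roll * ROLLS_PER_CASE

print(f"10^6 bytes: {tape_metres(10**6):.0f} m of tape")
print(f"2^20 bytes: {tape_metres(2**20):.0f} m of tape")
print(f"one roll: {bytes_per_roll:,} bytes, one case: {bytes_per_case:,} bytes")
```

This prints 2540 m, 2663 m, 120,000 bytes per roll and 3,360,000 bytes per case.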

Another option is to etch a really hard material with tiny marks, and preserve that carefully. There are already prototypes that would store hundreds of terabytes on crystals the size of your fist, with claimed lifetimes of millions of years. And the data will be recoverable as long as we have eyes (and microscopes).

Bikerman wrote:
If your transaction data is already analogue then fine - you can look at analogue archiving.

No, I disagree. If you have analog data, you should digitize it immediately, with the best resolution technologically possible. Analog archiving is a waste of time. It won't hurt to hold on to the analog originals, but you must digitize it ASAP.

If better digitization technology comes along years later, and the analog original is still in good enough condition that digitizing it produces a replica at least as good as the older digital scans, then immediately redigitize. If the analog original has degraded so much that even if it were redigitized using the older tech it would produce a replica inferior to the digitally archived one... well, too bad - you're stuck with the older digital scan as the best record you'll ever have.
LxGoodies
I know what a checksum does... I am aware of error correction in floating point tables; I've been a programmer for 28 years. What I meant by "a single glitch" is the effect of a single byte lost in compressed files like GIF or JPG, or in rigid formats like XML and ZIP, or FAT. It will be disastrous and very difficult to restore. Of course, there are correction schemes etc., and storage can be redundant inside the data file or outside (e.g. by using RAID sets), but you'll still need a reader, disk drive, whatever equipment.

And read a JPEG file from a CD with a microscope? Dream on. A microscope is an analog device. Indeed, you could be right about the far future, where advanced image recognition will be available for that, but for practical purposes, you'll need a CD reader for CDs, a disk drive for disks, etc., which are devices that will be obsolete 5-10 years from now. As you pointed out already, CDs themselves are not durable... they were sold with that story, but they've never been durable.

You could put your digital stuff on tape as you explain, but I don't believe it can be read by eye in any way. Paper tape bytes are mostly Gray-coded, to prevent easy breaks. The medium itself... it had better be good! As science fiction fans, we all know what happened to our precious books that are less than 100 years old. They just crumble and vanish.
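The single-glitch effect is easy to demonstrate. A minimal Python sketch, using zlib's DEFLATE compression (the algorithm inside ZIP and PNG) as a stand-in for the compressed formats mentioned; GIF and JPEG use different coders, so this shows the general behaviour rather than their exact failure modes. The sample text and the damaged positions are arbitrary:

```python
import zlib

def damage(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of one byte flipped (simulated media damage)."""
    buf = bytearray(data)
    buf[index] ^= 0xFF
    return bytes(buf)

def try_decompress(blob: bytes):
    """Decompress a zlib stream, or return None if the stream is unreadable."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None

text = ("The quick brown fox jumps over the lazy dog. " * 40).encode("ascii")

# Plain, uncompressed storage: one bad byte spoils exactly one character.
plain = damage(text, 100)
bad_chars = sum(a != b for a, b in zip(text, plain))
print("uncompressed: characters changed =", bad_chars)

# Compressed storage: the same one-byte damage in the middle of the stream.
packed = zlib.compress(text)
unpacked = try_decompress(damage(packed, len(packed) // 2))
if unpacked is None:
    print("compressed: stream unreadable")
else:
    print("compressed: characters changed =", sum(a != b for a, b in zip(text, unpacked)))
```

The uncompressed copy loses one character out of 1800. With the compressed copy, the damaged byte typically makes the whole stream unreadable: either the decoder chokes, or zlib's built-in Adler-32 check rejects the result.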

I'm not a dinosaur. Bottom line, for me too: whatever you store, create digital backups. We agree on the usefulness (and urgency!!) of digital backups of analog data. But I do not believe that digital storage - on media we use now - will last very long. I regard it as a cache. It will have to be maintained: copied and copied and copied... to make sure it can be read and that its (mostly very high-density) media stay intact for future use. Meanwhile, the analog originals are kept safe from scratches using glass plates.
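Each of those copies should be checked, not just made. A minimal Python sketch of a verified migration step, assuming a simple file-per-item archive: compute a SHA-256 digest of the original, copy it to the new media, and accept the copy only if the digests match. The directory and file names are hypothetical:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def migrate(src: Path, dst_dir: Path) -> Path:
    """Copy one archived file onto new media and verify the copy against the original.

    Raises OSError (and removes the bad copy) if the digests differ.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    expected = sha256_of(src)
    shutil.copy2(src, dst)
    if sha256_of(dst) != expected:
        dst.unlink()
        raise OSError(f"verification failed for {src}")
    return dst

# Usage: migrate one file from an "old media" directory to a "new media" one.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old_media" / "scan_0001.bin"   # hypothetical file name
    old_file.parent.mkdir()
    old_file.write_bytes(b"microfiche scan " * 1000)
    new_file = migrate(old_file, Path(tmp) / "new_media")
    migrated_ok = new_file.read_bytes() == old_file.read_bytes()

print("migration verified:", migrated_ok)
```

Keeping the digests alongside the archive also lets you re-check the data periodically between migrations, not only at copy time.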

Last month, I visited the university where I studied about 30 years ago. The library provided electronics schematics on microfiche, periodicals, curriculum accounts etc. Much of it is pictorial information that is being digitized right now, because only recently have we had cheap TB-storage devices to hold high-resolution pictures of schematics.

In the meantime, what they did (ca 1985-1990) was NOT to trust the digital storage media of that time, and NOT to invest in huge manual digitizing projects requiring zillions of CDs. They just installed a pick-and-place unit for the microfiches, enabling library visitors to reference these analogue media digitally, on demand. First, a cache (disk) is consulted to see if there is already a digital version in store. If not, a robot selects a fiche for you; it is picked out, scanned on the spot, cached, and sent to your PC over a LAN connection. Then you can store it locally. Thirty years ago we only dreamt about such efficiency... and soon, that 1990 system will be obsolete too, because the server will have a 30TB RAID with everything digitized on it. The microfiche originals can be safely put away... and the original originals on paper will vanish quicker than the microfiches.
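The library's setup is the classic cache-aside pattern. A minimal Python sketch, where scan_fiche is a hypothetical stand-in for the pick-and-place robot and scanner, a dict stands in for the disk cache, and the fiche IDs are made up (sending the result over the LAN is not modelled):

```python
from typing import Callable, Dict

class FicheService:
    """On-demand digitization with a cache.

    scan_fiche stands in for the robot that picks a fiche and scans it;
    the dict stands in for the disk cache.
    """

    def __init__(self, scan_fiche: Callable[[str], bytes]):
        self._scan_fiche = scan_fiche
        self._cache: Dict[str, bytes] = {}
        self.scans = 0  # how many times the robot had to fetch a fiche

    def request(self, fiche_id: str) -> bytes:
        """Return the scan for fiche_id, scanning it only on the first request."""
        if fiche_id not in self._cache:
            self._cache[fiche_id] = self._scan_fiche(fiche_id)
            self.scans += 1
        return self._cache[fiche_id]

# Usage with a fake scanner and hypothetical fiche IDs.
service = FicheService(lambda fiche_id: f"image of {fiche_id}".encode())
service.request("EE-1986-0412")
service.request("EE-1986-0412")   # served from the cache
service.request("PH-1989-0077")
print("robot scans:", service.scans)   # 2
```

Over time the cache fills up with everything anyone has asked for, which is how such a system drifts towards the "everything digitized" server.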
© 2005-2011 Frihost, forums powered by phpBB.