Data vs. Information

analogA1

Updated: Jun 30, 2024, 6:02 AM

When a game of chess is played on here, data is generated from all the moves that were made on the board. In my games, the data captured from them will tell a funny story.

My games are short, obvious, and don't have a very captivating plot to them. I'd even go as far as saying they're primitive sometimes. We're talking 5-minute blitz games with an average of about 25 moves each. The story from these games might be better classified as a sitcom, because they are guaranteed to have one or two sudden surprises, like a crazy bishop suicide across the board in exchange for a pawn, that will make your eyes roll if you're any kind of serious chess critic. That comes from the unconventional and dramatic moves I like to make to surprise my opponents, and from the lack of excitement I might otherwise feel. But combine all the games in my chess game history and what you'll find in the data is not just a pattern of kamikaze techniques and train wrecks, but rather the making of a brand new season of Seinfeld.

This site uses an algorithm that collects data from each game played and runs an analysis on that data to return an accuracy rating. That accuracy rating (from what I understand) comes from comparing the known textbook moves that should be made when pieces are in certain positions to the moves that were actually made. Those known moves are probably stored somewhere in a massive chess database that is probably accessed quite frequently, judging by all the active users and games that occur on here. In my case, if all the combined data from 5,000+ games were processed strictly for the output of an overall accuracy rating, the result would show how someone should *not* play a respectable game of chess. This is just what some of us thrill-seekers do whenever we have an urge to receive satisfaction from cognitive visual learning, instead of things like bungee jumping or getting chased by a cop, as colorful examples.
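To make that idea a little more concrete, here is a minimal Python sketch of an accuracy rating as I described it: the share of played moves that match a list of reference moves. This is only an illustration and not the site's actual algorithm, which isn't documented here. The function name `accuracy_rating` and the sample move lists are made up for the example.

```python
def accuracy_rating(played, reference):
    """Return the percentage of played moves that match the reference moves.

    Hypothetical illustration only: a real engine-based rating is far more
    involved than a move-by-move equality check.
    """
    if not played:
        return 0.0
    # zip() stops at the shorter list, so any played move without a
    # reference counterpart simply counts as a non-match.
    matches = sum(1 for mine, best in zip(played, reference) if mine == best)
    return 100.0 * matches / len(played)


# Made-up sample data: the moves I played vs. the "textbook" moves.
played = ["e4", "Bc4", "Bxf7+", "Qh5"]
reference = ["e4", "Nf3", "Bc4", "Qh5"]

print(accuracy_rating(played, reference))
```

The move lists are the data; the single percentage that comes out the other end is closer to what I'd call information.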

This revelation isn't something data reveals to you; information does. Data was captured first, but information was created from it later by pulling it all together. Information, therefore, is collected data that has been processed, transformed, and presented back for a useful purpose.

There is a difference between the two terms and I'm not splitting hairs about it either. I know this because it was actually taught to me.

Years back during graduate studies, when I was first introduced to the discipline of computer science, I completed an assignment that compared these two concepts and how they were different. I honestly never knew there was a real difference between them until this assignment was given to me. The assignment compared their relevance at different points in their lifecycles and how they originated in systems that were either closed in isolation (bound locally to hardware) or open over a network (interoperating with other systems). We had to write our individual responses in a class discussion forum in the university portal so that others could express their viewpoints on the topic. We collaborated on it and only had about a week to complete it once it was assigned. The grading for it was largely based on how many meaningful responses you received that originated with your starting thread.

You know how students are. You had some that did the absolute bare minimum and made random stuff up, while others actually put in the time and effort taking the assignment seriously. I was somewhere in the middle and I tried to post relevant feedback for others based on what I understood or found interesting whenever someone made a good point.

It wasn't really an inspiring assignment to be honest. Back then, I remember being more interested in the telecommunications aspects of things since that was my major. I also remember being distracted from it because I was having a mild panic attack wrestling with the core concepts of object-oriented programming and trying my hardest to understand Java nested for loops in a programming course I was taking that same semester (I don't possess a natural-born computer programming personality so it was all new ground for me). But the highlights of the assignment were to point out how the two terms were misused in the field and to separate their purposes for a better understanding of what they each accomplished in their rightful states.

To get a little technical, data was described as the fundamental (binary) ones and zeros that represented logical values for events that took place in a system. Data was generated from conditions that algorithms waited for whenever those specific events took place. And for me, as a graduate student being introduced to this academically for the first time, learning about data seemed so mysterious because of its intangible properties, like an apparition living inside your physical hardware. It was there, but you couldn't see it. You could hear it being written to disk from the sounds of the needle scraping the spinning disk, but if you uninstalled the drive and looked at the disk with your naked eye you'd see nothing. More interesting was that data carried a cost, because it occupied space, but its existence carried no weight. Additionally, the cost of owning data itself was insignificant compared to the potential value it could bring when it accumulated in a system over a period of time.

On the other hand, there was information. Information was different and was not to be used synonymously with the term data. More things had to happen first before data could be recognized as information. Information always served a human purpose, and it often needed a data collection process that involved translations, conversions, formatting, one or more secondary processing steps, or some combination of these before it could be presented and categorized as information.
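As a rough sketch of that distinction, the Python below takes a handful of raw game records (the data) and processes them into a short summary a person can actually use (the information). The record layout, the sample values, and the `summarize` function are all made up for illustration; nothing here reflects how any real chess site stores or processes games.

```python
from collections import Counter

# Made-up raw records: this is the "data" captured from each game.
raw_games = [
    {"moves": 22, "result": "win"},
    {"moves": 28, "result": "loss"},
    {"moves": 25, "result": "win"},
    {"moves": 25, "result": "draw"},
]


def summarize(games):
    """Process raw game records into a summary meant for a human reader."""
    total = len(games)
    if total == 0:
        return {"games": 0, "average_moves": 0.0, "win_rate": 0.0}
    results = Counter(game["result"] for game in games)
    return {
        "games": total,
        "average_moves": sum(game["moves"] for game in games) / total,
        "win_rate": results["win"] / total,
    }


# The summary is the "information": collected, processed, and presented.
info = summarize(raw_games)
print(info)
```

Each record on its own says very little. It's only after the counting and averaging that the numbers start to tell the story.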

In terms of the time it took to be made, data was instantaneous and very close to system events, occurring a short hop off the physical circuit board, whereas information took longer to generate and occurred elsewhere in the application stack. In terms of relevance, the creation of data was almost always relevant because that's what the algorithm was designed to do, but there were times when data was created that didn't serve a purpose. Information was always made to serve a purpose and was made with an intent to give greater understanding and context behind the events that took place.

In essence, data can live in the present more than information can, but only information can make use of the past to make predictions about the future. And because of this philosophy alone, information was placed higher up in the value chain for human consumption.

I sometimes wonder what the real motivation was behind the assignment. I mean, was it really necessary to have all new students in the graduate program I took learn to recognize these subtle differences immediately up front before diving into real technical substance? Or was it just that unapproachable odd-ball professor, who wanted to prove a point, being overly pedantic about the subject? I don't remember any other assignments that this professor gave out in that class either, and I'd have to dig in and find my transcripts to see who actually taught it. But I am glad I did it.

Other than a thesis paper and a few other 10-page writing assignments I had to complete later for the curriculum, there were no comparable assignments that left a mark the way this one did. Lately I've found myself reflecting on this now 22-year-old assignment a little more to balance some of the more sophisticated concepts being introduced with distributed computing systems in AWS cloud, like containerized applications launching from an S3 bucket using Lambda, or when scratching the surface of GenAI when it comes to workloads using Bedrock and SageMaker for specific use cases in machine learning.

If more awareness is needed about this topic, then at least I can say I made a small contribution towards it on here. It's okay to laugh at my chess ranking, but that's not a guarantee that I'll be an easy victory. I'm full of surprises.

A1 Aaron's Blog