IT:AD:Data/Regarding Data, Clay Tablets and Camels
- Records are not the Object in question.
What you persist is a record of the transaction. The hieroglyphics are …just marks on a clay tablet. But they ain't the real deal:
Ceci n'est pas un chameau 1).
But no matter how much importance one attaches to the marks, the fact remains that the mark is not a camel. The mark represents the camel. Conceptually.
So, no matter how badly the DB marketers of the '80s messed with developers' minds about what they were selling (“Ceci EST le chameau”), and got developers to write apps around the concept of trading clay rather than camels (much like the real fake market of “Ce papier EST du pain” 2)), you have to know the difference. You are working with Camels, as real Entities. Worry about the persistence issues second.
* You retrieve Storage, but work with Models. Programs work with representation. We get the Tablet from the TabletStore, but we don't want to use that. We need a Camel. So we go to a Wizard who Factories up a Camel. And uses the facts on the Tablet to create the Camel. So if the Tablet Says [Sale#:123, Brown Camel, Left Foot Lame], the wizard creates a new Camel(), and waves his wand till the Camel is ugly and lame. He then gives you the Camel, and you both throw away the Tablet, just in case you forget and he creates the same camel twice and the market gets confused as to which damn camel we're talking about.
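The Tablet-to-Camel trip through the wizard could be sketched roughly like this. It is only an illustration: the `Tablet`, `Camel` and `CamelWizard` names, the fields, and the idea of remembering camels by sale number (so the same camel is never conjured twice) are all assumptions made for the example, not something the text prescribes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Tablet:
    """The stored record: just marks, no behaviour (hypothetical shape)."""
    sale_no: int
    colour: str
    notes: str


@dataclass
class Camel:
    """The model the program actually works with (hypothetical shape)."""
    colour: str
    lame_foot: Optional[str] = None

    def can_race(self) -> bool:
        return self.lame_foot is None


class CamelWizard:
    """Factory that turns a Tablet into a Camel, once per sale number."""

    def __init__(self) -> None:
        self._known: Dict[int, Camel] = {}

    def conjure(self, tablet: Tablet) -> Camel:
        # Assumption: hand back the same Camel for the same sale, so the
        # market never sees two camels for one record.
        if tablet.sale_no in self._known:
            return self._known[tablet.sale_no]
        camel = Camel(colour=tablet.colour)
        if "lame" in tablet.notes.lower():
            # "Left Foot Lame" becomes "left foot"
            camel.lame_foot = tablet.notes.lower().replace("lame", "").strip()
        self._known[tablet.sale_no] = camel
        return camel


wizard = CamelWizard()
camel = wizard.conjure(Tablet(123, "Brown", "Left Foot Lame"))
```

After `conjure`, the rest of the program only ever touches `camel`; the `Tablet` can be dropped.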
The difficult part here for newbies is that they don't realise that the world is self-interested, and that fat little polyester-suited computer salespeople have been messing with their minds for a while. Those salespeople will do a lot to convince developers that, because they have made their Tablets as easy to use as a Camel (even easier! No poop!), developers should forgo going to the wizard ('cause wizards are fast, but still take time, you know) and just use the Tablets directly. Yeah…well…not only do you get vendor lock-in, you also run into other problems, because often the Tablet has a fixed shape, and sometimes you need a little bit more.
DataBases are not InfoBases
As an aside, but an important one when talking about data, lots of data, as in databases, and what we could do with it: Data ain't Information.
A spreadsheet is data. The number at the bottom, your negative balance, is Information.
The Information you want is usually a Projection (a subset, or a composite, or a subset of a composite) of the original Data/Tablet.
For example, when reading a contract, you can skip right past the boilerplate intro and the legalese footer, and home in on the basics of the contract (“One camel to Ahmed in return for some time-sharing with his hot slave for six months”). But you probably also need to refer back to some other documents of state (“Some acts, even in the privacy of the harem, are not permitted by law/religion/etc…”). So to act on the contract, you need a composite of Tablet A and Tablet B, unless you want to live ignorant and dangerously, and probably don't understand the local meaning of getting stoned, yahman'.
### The ViewModel is not the Model

Camel case aside, consider the following 3 Business Models:

* Person = [ID, First, Last, Sex, Age, Girth]
* Address = [ID, PersonFK]
* AltNames = [ID, PersonFK]

One generally needs to refer to a person's information presented as a composite ViewModel (Person + Address + etc., all in one View).
But the ViewModel that backs this View is not a good way to look at developing the Model either…
The confusion arises because some developers think they want ViewModels, and want all those attributes in one model that describes where you are right now. 'Cause “All those Attributes! The power! The power!”
Life is not like that. Let's put it into a language you feel more comfortable with:
A single person might believe that the use case requires a ViewModel of [Id, SS, First, Last, NickName, BankAccount].
Later, he gets a girlfriend, and amends his attributes to [Id, SS, First, Last, NickName, BankAccount, SecondaryForHerBankAccount].
Then he marries, is very confused for a while, and ends up with [Id, SS, First, Last, NickName, BankAccount, OurBankAccount].
Then, after some bitching and a divorce, he amends it to [Id, SS, First, Last, NickName, WhatsLeftBankAccount], and her attributes shift to [Id, SS, First, Last, NickName, NotAsMuchAsIThoughAccount].
The point is…this is the result of pretty poor planning. It's pretty obvious that with a mature normalized model of the real world, it really could be abstracted as:
Person [ID, PrincipleFirst, PrincipleLast]

* 1001, John, Smith
* 1002, Betty, Boop

Bank [ID, Name]

* 2001, Chase
* 2002, BNP

BankAccounts [ID, BankFK, PersonFK, Balance, Name, Notes]

* 3001, 2001, 1001, -123.45, “Income”

OrgIDs [ID, PersonFK, Name, ID]

* 4001, 1001, “SS”, “123-45-6789”
The Projection of this Normalized Data is the Information ViewModel…get it?
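To make that projection concrete, here is a minimal sketch using the sample rows from the normalized tables. The in-memory dicts and lists stand in for real tables, the `person_view_model` function is hypothetical, and the last OrgIDs column is called `value` here purely as an assumption (the table lists it as a second `ID`).

```python
# Normalized "tables", holding the sample rows from the text.
persons = {1001: ("John", "Smith"), 1002: ("Betty", "Boop")}
banks = {2001: "Chase", 2002: "BNP"}
# (id, bank_fk, person_fk, balance, name)
bank_accounts = [(3001, 2001, 1001, -123.45, "Income")]
# (id, person_fk, name, value); "value" is an assumed column name
org_ids = [(4001, 1001, "SS", "123-45-6789")]


def person_view_model(person_id: int) -> dict:
    """Project the normalized tables into one flat ViewModel."""
    first, last = persons[person_id]
    ss = next(
        (value for _, p, name, value in org_ids if p == person_id and name == "SS"),
        None,
    )
    accounts = [
        {"Bank": banks[bank_fk], "Name": name, "Balance": balance}
        for _, bank_fk, p, balance, name in bank_accounts
        if p == person_id
    ]
    return {"Id": person_id, "SS": ss, "First": first, "Last": last,
            "Accounts": accounts}


john = person_view_model(1001)
```

The ViewModel is built on demand and thrown away afterwards; a girlfriend, a marriage or a divorce only adds or removes rows in `bank_accounts`, and the shape of the model stays put.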
Regarding Structured Data
But it's so structured!
Yes it is. Life is like that. Try flying an unstructured plane, or driving an unstructured car, or getting money out of an unstructured bank account.
The great feat of humankind is to correctly wrest out of chaos the names of things and their set of attributes. Once named, they can be equated. Thank you Aristotle.
Now, we haven't been able to organise everything. Two millennia on from there, and there is still further to go. That's probably 'cause emotions are unstructured. Women defy containment. And new things come up all the time.
But heck if I'm willing to send everything back to chaos just because I don't know how to define some of it.
It's like the universe. I have the known, the discovered, the stars of the galaxy close enough to know their composition for sure, etc. And then I have…the uncategorised. The too-far-out-there-for-our-current-telescopes what-nots. Yet. The blurry dots. The faint pulses of radio signals….
And that boundary is where I switch database systems: from a structured, Set-Theory-based RDBMS to an unstructured database…
You can't have everything in one DB package. Doesn't exist. Or at least, I have not found the mythical hybrid 3).
### Searching, Deep Learning, etc.

But what about those “Instant Search algorithms” that will search through everything and find what I need! If I could just use any old attribute, and have instant searching…we'd be done, right?!?

Not exactly. They are brilliant. Much like humans, they see patterns in what has not yet been categorised. And that's fantastic. But frankly, if they are only as good as humans at pattern matching, rather than at formal laws, that might not be so good. We might end up with the Crusades of the Database, because one database sees a pattern that doesn't coexist with another database's pattern, and they both have to logically prove to themselves that they are right by deleting the other db's erroneous data, to ensure they are logically consistent…
Seriously, faster and smarter searching is not a replacement/solution for disorder. Retrieving Index values is fast. Searching is slower. Index what you can, search through what you can't figure out how to categorise.
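A toy sketch of that difference, with made-up camel records and hypothetical helper names: the categorised attribute (`colour`) gets an index that is built once and answered with a single dictionary lookup, while the free-text `notes` have to be scanned record by record.

```python
camels = [
    {"id": 1, "colour": "brown", "notes": "left foot lame"},
    {"id": 2, "colour": "white", "notes": "fast, bites"},
    {"id": 3, "colour": "brown", "notes": "good teeth"},
]

# Index what you can: build once, then each lookup is one dict access.
by_colour = {}
for camel in camels:
    by_colour.setdefault(camel["colour"], []).append(camel["id"])


def find_by_colour(colour):
    """Indexed retrieval: no scan of the records at query time."""
    return by_colour.get(colour, [])


def search_notes(term):
    """Search what you can't categorise: every record is inspected."""
    return [c["id"] for c in camels if term in c["notes"]]
```

With three camels nobody cares; with three million, `search_notes` walks all of them on every query while `find_by_colour` does not.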
Forgot to add that Data = Data + Schema.
Your idea that Schemas start small/unknown and “Grow” until stable is nice, correct and lifelike. Again, RavenDB to your heart's content. I just see that it defers the codifying of any logic till later…and then, because all the first records had no values, you have to do a LOT of checking.
VALUE OF CAMEL = IIF(CAMEL.TEETHQUALITY, 0) + IIF(CAMEL.HAIRQUALITY, 0) + IIF(CAMEL.HUMPSIZE, 0)
And you keep on having to add workarounds for unexpected edge cases. I.e., it's not Logical, it's RuleBased. Sounds like most community bylaws and their exceptions…
That's fine, it actually makes sense, it's just more complex to maintain and slower to run. Everything is a tradeoff.
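The kind of defensive checking a grown-as-you-go schema forces on you might look like the sketch below. The field names and the `camel_value` function are invented for illustration; the one assumption that matters is that a missing or empty attribute counts as 0.

```python
def camel_value(camel: dict) -> int:
    """Sum the quality scores, treating a missing or None attribute as 0."""
    fields = ("teeth_quality", "hair_quality", "hump_size")
    return sum(camel.get(field) or 0 for field in fields)


# An early record, written before anyone thought of scoring camels.
early_record = {"name": "Old Brown"}
# A later record, written after some of the attributes had appeared.
later_record = {"name": "Speedy", "teeth_quality": 3, "hump_size": 5}

values = [camel_value(early_record), camel_value(later_record)]
```

Every new attribute means another entry in `fields` and another decision about what "not there" should mean, which is exactly the rule-upon-rule upkeep described above.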
Being life-like is interesting. It governed our thinking through much of the past, and we tried really hard to make planes with flapping wings.
Turns out that being non-lifelike is more practical in some cases. A Fixed Wing doesn't have the infinite flexibility of nature, and it's not a copy of the shape of the world, but it's good at modeling the required functionality of the world.