Writing a data conversion

Why is it important?

Dataconversion is an important subject for data integrity. When you create a new field, remove a field, rename a field or let it have a different meaning, the database will not know this.

Adding a field will result in a generated updatescript where fields are added, removing a field will drop the field. This sounds easy, but if the application was relying on information stored in this field, the information needs to be moved. This is called a data conversion.

Novulo generator will recognize some situations. If you rename a field and keep the other parameters the same (field_type, id) it will recognize that it is the same field and just rename it. If you remove a field and the re-add it, the id will be changed. If it has the same name and the same datatype, the generator will assume that it is the same field. For tables, it will look at all the fields. If the fieldnames and types are consistent, it will assume you renamed the table or cut it out of component A and put in in component B and just rename the table (if applicable). If you cut the table and then rename fields and add new fields and the fieldtypes and names no longer match, the generator will drop the old table and create a new one. In doing this, all data from the old table will be lost if we do not add dataconversion! And you couldn’t quick fix it during the System health check, as it is possible for some expression errors.

How and when to write dataconversion?

  • When there occurs a drop (table/field);
  • When the meaning of a field changes (it is now an Enum instead of an id-reference or it was a number field and it is now a text field);
  • When the default value is not always the wanted value after adding a field;
  • To initialize a cached field. Doing this in an update script saves loads of time!

What do we want the SQL to look like?

We like it to be readable and debuggable. Please start with a comment about what you are trying to achieve:
– initialization of cached field timeslots.has_open_spaces()
– dataconversion for the dropped field journalentries.entry_date → journalentrylines.entry_date
– only set editable when not finished for new field invoices.is_edable

Then follow up with something like:

UPDATE jel
SET jel.[entry_datetmpfld] = je.[entry_date]
FROM [journalentrylines] jel
LEFT JOIN [journalentries] je ON jel.[journal_entry] = je.[id] -- we assume all jel have a je

See how we added a comment to represent our assumptions? This makes the dataconversion readale and debuggable by future you or someone else. It is good practice to capitalize SQL words that are not data, also to create a more readable expression. Same goes for newlines and alignments.
The we add around table and column names to make sure we don’t trigger SQL keywords when we mean a fieldname like je.[id] and j.[type]. If you type this in an SQL editor with syntax highlighting on you will see the wordt “type” light up when you type it. Also we prefix our columns with the table. It is not technically necessary, but I assure you that if someone else needs to debuug the query they will not know which of your 5 JOINed tables would be the onle one to contain a field named “assessment_score”

Usage of JOIN vs APPLY

SQL has 4 types of JOIN: LEFT JOIN, INNER JOIN, RIGHT JOIN and OUTER JOIN. Leaving the prefix away, just JOIN will result in an INNER JOIN. Please always remember to add the specific type.

SELECT 1
FROM [servicerequests] sr
LEFT JOIN [productions] prod ON sr.[production] = prod.[id]

Will result in the same as

SELECT 1
FROM [productions] prod
RIGHT JOIN [servicerequests] sr ON sr.[production] = prod.[id]

In our dataconversions we only use LEFT JOIN and INNER JOIN.

When we have a many-to-many relation, we use APPLY. CROSS APPLY is related to INNER JOIN and OUTER APPLY is related to LEFT JOIN. CROSS APPLY will result in a subtable, which can be queried later. For our purposes: if you would like to JOIN a subtable or a TOP 1 of someting, please use CROSS APPLY or OUTER APPLY.