

SQL / SQLite for Beginners - Tools, Tips, Tricks, and Traps


ital2:
EDIT June 5, 2017:
Title change and intro
(Previous/original titles: "SQLite, SQL, SQLite Expert", then "SQLite, SQL, SQLite "Expert" (ha, ha)" - the latter for the lack of warnings when stacking up database updates which, in combination, can get dangerous and which you would do one by one if you did them by hand)

This thread does NOT intend to replicate the usual sql intros, but wishes to give additional practical advice for beginners that those intros' authors did not think of.


Original post:

First, a quite obscure trap for beginners; stackoverflow helped me find it out.

For many a command, not using quotes (single or double) will work fine, so you will not discover that the syntax is faulty.

for example
select * from tablename where somecolumnname = somestring
will work fine as long as there is no other column whose name happens to be somestring

example:
select * from A where T = V
will work fine and display all records where there is a "V" in column "T"
IF there is not another column which by any chance is called V

IF there is, SQLite will simply not show the expected records - no warning, no error message, nothing - so when you've got some other columns and your requests work fine in general, it'll probably be a long time before you realize SQLite is not displaying many wanted query results.

You now ask why anybody would call many columns by single characters, and also many string values by single characters, to begin with? That's right, but I had just a few 1-character columns, incl. "C" for Code, with 1-character codes in it, for several different todo-categories, just 6 or 8 characters in all, all single ones; I wanted to have it neat and without sacrificing space, and so I called that "Code" column "C".

This works fine, except for my code "h" in column "c", since I also had another single-character column called "h", and without the quotes, SQLite is unable to differentiate column names from values if they are identical - even though you would have thought that the "where c = h", by its syntax alone, must have told SQLite that the first is a column and the second is a value. But no.

Thus, it's absolutely necessary to write in the official syntax

where "columnname" = 'value'
which is double quotes for identifiers and single quotes for strings/literals
(if at least it was the other way round, I could memorize that better...)

in order to avoid lots of problems which, as the above use case has shown, are far from obvious and can give you incomplete results without your even discovering the problem.
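To make the trap concrete - a minimal sketch, using the table A and the single-character columns from above:

select * from A where c = h      -- ambiguous: SQLite resolves the bare h as column "h" and compares the two columns row by row, silently returning the "wrong" records
select * from A where "c" = 'h'  -- unambiguous: column "c" compared to the string literal 'h'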


EDIT June 1, 2017: Title change; see next post


EDIT / Add-on June 5, 2017: Null
The web abounds with discussions of whether you should use null values or empty strings; I won't add to that discussion but rather want to point out some more beginners' traps around Null/empty values; btw it's of interest that Oracle's and FileMaker's sql flavors, which both try to do away with the Null problem, both don't succeed at it, so switching over to them will not allow you to ignore what it's all about.

When I imported my csv data, I did that, as I said further down in this thread, in SQLiteBrowser 3/DB Browser for SQLite (they don't know how to name their tool; it's not 2 different tools here), then opened my databases in SQLite "Expert" (see next post), since the latter's "Personal" edition doesn't allow for import or export (but you can do some select, then ^a, ^c, then treat the resulting csv clipboard by macros, for example for printing or for exporting the data into anything else).

So the first problem described here may result from this 2-step setup. Anyway, blank fields from the imported data were zero-length/empty strings, while the same fields in records I then created in SQLite "Expert" (fields I simply left blank, in newly created records) were/are Nulls. If you don't know databases at all, you will think that's the same thing; database experts will laugh about what I describe, but for database beginners, it's important information.

This also implies two things: You should start your database experiences with data you intimately know, and/or you should always set the option "display Nulls" to "on"; both measures will at least help you become aware that there are problems. Because - and that's the problem - your selects will not display the expected results if you don't always cope with the "where x = (or <>) ''" [these are two single quotes, not one double one!] vs. "where x is (not) null" distinction you will have got now.
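A minimal sketch of the difference, with a hypothetical table t and a column x containing both kinds of "blank":

select count(*) from t where x = '';              -- counts only the empty strings (e.g. the imported rows)
select count(*) from t where x is null;           -- counts only the Nulls (e.g. rows created in the frontend)
select count(*) from t where x is null or x = ''; -- what a human usually means by "blank"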

Also, traditional logic is different from SQL logic, and you must become aware of that phenomenon, too. To start with, we use the (possible) column setting "No Nulls allowed here" to mean "in no record must this field be left empty", but in the SQL 3-tier boolean world, this just means "absence of Nulls", not of empty strings; it's similar for boolean fields (yes, no, null (allowed or not)) and for numeric fields (some number, null, or - very bad in most cases - some "special" value meaning "no number here", from developers wanting to avoid Nulls at all (and in this case very high) cost).

Also, you should google for "not in vs not exists", and even when you think you are aware of the problems and do it all correctly, you'll probably get wrong query results again, because you left out of account the fact that Null is NOT the opposite of "True"/"Yes", and that this logical sql truth extends to any field type in sql.

Many authors insist on the fact that Null stands for "unknown"; since we also, and even mainly, use it for "no value"/"field empty", and the meaning "unknown" is rare, we may overlook the fact that, as far as sql logic is concerned, these authors are perfectly right. And we fall into the trap of writing "where x <> y" instead of writing "where x <> y or x is null": In our logic, Nulls are excluded there anyway; in sql's logic - which we must observe if we want to get correct query results - it's "who knows about these Nulls?" instead, so they have to be mentioned separately in your query in order for it to deliver correct results.
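The trap in code - a sketch with the hypothetical tables t and t2: in sql's 3-valued logic, null <> 'y' evaluates to null, not to true, so those records silently drop out, and "not in" misbehaves as soon as the subquery returns a single Null:

select * from t where x <> 'y';                                        -- misses every record where x is null
select * from t where x <> 'y' or x is null;                           -- what we usually mean
select * from t where x not in (select x from t2);                     -- returns NO records at all if t2.x contains even one null
select * from t where not exists (select 1 from t2 where t2.x = t.x); -- the null-safe way to write it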

Somebody on the web wrote something along the lines of, "Which consumers/non-professionals still access data directly, nowadays?" - well, it's the traps in sql which discourage people from delving into sql, since they prevent them from getting the expected results. It's correct that you would have to delve deep into sql in order to gain some real expertise in it, but some advanced facts must be known to the beginner, too, from day one; it's not like playing the piano, where you can safely attack some piece of Kinderszenen without needing to know anything about the intricacies of Kreisleriana.

Tuxman:
Was that a question?  :huh:

widgewunner:
Thanks for the public service announcement! - I am new to SQL and am using sqlite2 in a couple small projects and this info is indeed useful/helpful to me.

Thanks again!

ital2:
Then, a word on SQLite Expert Personal

As I said in the "Navicat Warning" thread, I recommend this free software; it's really pretty. It has got some bugs, but not of the data-destroying kind, just minor window problems and such.

I said that I use it on my pc, but I left out an important piece of info: Since it's the free version of a paid program, it will not import data, so as it is, it would have been unusable for what I'm trying to do: translating some organized text files (and also csv files from ListPro) into databases / flat database tables (flat to begin with; I'm looking out for perhaps doing it more by the book later on, if I find the time).

So I use SQLite Browser / DB Browser for SQLite (free; they don't know how to name it; anyway, on XP I use version 3 of it since 4 will not work there): Import from csv file into table, then save the new database. All other things I then do in SQLite Expert Personal. csv means comma-separated, but obviously a comma as field separator isn't the best alternative, so I use tab-separated files, but with the .csv suffix anyway.

Since it's in SQLite Browser / DB Browser for SQLite that I create the database, for example cds.db (from cds.csv), I can then open my CDs database within SQLite Expert, where I format the columns, and so on.

Now my explanation why I just don't buy SQLite Expert even though I really like it a lot. Well, my psychological loss is too big. Last year in summer, I had already looked around, superficially, for SQLite frontends, since my problem of having data in text files, as explained in another thread, is a persisting one, and back then I already had the idea to do something about it.

It was 60$ lifetime, which was an absolute treat, and I promised myself to buy that fine program as soon as I would need it. Now it's 100$ with 1 year of updates, which means that instead of costing me 60$ plus VAT, this same program (btw, the price rise came for the "old" version, without even a new major version in-between) would cost me hundreds of $$ over the years. Of course it's all my fault, and I don't blame the developer, but since I lost the lifetime option, paying now almost double that price, for just the current version, plus update costs again and again - it's unacceptable for me, at least for the time being. SQLite Expert is a fine SQLite data browser (I like it a lot more than SQLite Maestro, same price, so you see where the hike got its idea from), but there is, for example, no graphic designer, let alone that imaginary piece of joy I've just described in the "Navicat Warning" thread, and there are no stored queries yet - which are available in SQLite Maestro (which also has a graphic designer, but not a good one), but as said, stored queries in SQLite Maestro cannot be triggered from the tree entries and thus require an unnecessary intermediate step.

SQLite Expert is in continuous development (it just got another minor upgrade), so it's very evident it will get at least some of its missing functionality later on, and no doubt people who didn't come too late for the lifetime updates, as I did, will have got an absolute treat. (It's been very unfortunate that the price hike and the dismissal of the lifetime license came at the same time - together with the almost 50 p.c. rise of the dollar vs. the euro, all that combined is not too bad for a developer who's in the euro zone...)


EDIT May 25, 2017
You will find more info on SQLite Expert, also in comparison with some competitors, in my May 25, 2017 add-on in my thread about trial conception ( https://www.donationcoder.com/forum/index.php?topic=43835.0 ).


EDIT June 1, 2017 - SECOND Edit for the day
Destroy Your Data with SQLite "Expert" (ha, ha)
You will be interested in naming your columns in some standardized way, across your different databases, in order to greatly facilitate your management of pre-defined sql queries; so for example you will use C1...C4 (see my edit of the post below) for a (possible) hierarchy, and even when you've only got one code for the time being, that column should be called C1, not just C, in order to standardize your sql selects as far as possible. Also with "Titles", "Items" and such: just name it "T".

The same with "Authors" and "Artists" and similar bad column names: just call that N for Name, and the Christian/first name should be FN; if you call these columns N1 and N2, you will have the logical problem that when you sort, you'll either sort by N2 first, then by N1, or you will have those columns in the wrong order on your screen (if you want the traditional setup with first name first, then family name).
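As a sketch of this naming scheme (the column set is made up, just for illustration):

create table main (
  id integer primary key,
  C1 text, C2 text, C3 text, C4 text, -- category hierarchy; deeper levels may stay empty
  N text,                             -- (family) name: author, artist etc.
  FN text,                            -- first name(s)
  T text                              -- title/item
);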

So I lately did a lot of column renames, in order to standardize my tables / databases, and my selects - they are all somewhat different, but they should be identical in all parts that can be the same.

Then, in one of my tables, I made a big mistake, with SQLite "Expert" (ha, ha), by switching the names of two columns, then triggering "Apply": This, as you will already have guessed, immediately deleted thousands of names "N", putting the FN into N and leaving the FN column blank; of course, this happened without the slightest warning.

My solution: I had the Ron's Editor csv file from yesterday or the day before, so I dumped a copy of that (as text) into Beyond Compare, with the later-on-deleted column moved to the end of the bunch in order to be able to compare visually in BC; I also dumped the remaining database into a csv and put that into BC, again as text (BC's data compare may work with additional settings; out of the box it did NOT align the datasets); then from the differ I updated the original csv from the partly destroyed newer one (not in BC but in Ron's Editor), and with the updated original csv I built up the new database: Two hours for nothing, but imagine the loss if I hadn't had a recent csv of my data available.

What do we learn from this? Do daily database backups. Do an additional backup before doing any database re-"design". Don't assume software which calls itself an "expert" is an expert; rather assume it probably lacks relevant safety code analyzing the possible interaction of the several steps which will be done from what it lets you "combine" in some task list, and by which it will possibly destroy your data without even knowing.

You know the principle: switching names a and b means renaming a to c, then b to a, then c to b; SQLite "Expert" (ha, ha) obviously thought it could make do without the additional step (see the sketch after this list). If I had done the renames "by hand", by sql renames,
- I would have made a backup first, since I would have trusted myself less than I obviously trusted this "Expert" program, and
- I would have made the right steps in the right order since I'm aware of the problem.
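For completeness, a sketch of those three steps as sql; note that the rename-column command only exists in newer SQLite versions (3.25 and later) - before that, you had to rebuild the table instead:

alter table main rename column a to tmp; -- park the first name
alter table main rename column b to a;
alter table main rename column tmp to b; -- swap complete; no data was ever moved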

In SQLite "Expert" (ha, ha) I relied on the software to make those steps, in the right order, and which failed, but I made my choices within some tasklist where you can ask for all sorts of changes, and then the program, upon "Apply", will execute these tasks one after another, it seems, and where the SQLite engine itself will not stop the execution, by its own internal safety routines, it's obvious this program proceeds, without taking account possible unwanted interactions between the various commands in the line: There is no safety procedure implemented BEFORE sending the commands to the engine, like you would do in your head, by doing it manually.

It would be of interest to send such requests from within other SQL frontends, in order to check whether they come with such checking routines or don't either. This comparison would only be valid between different frontends on the same database format (here SQLite), since, as said, it's perfectly possible that some SQL engines have some such routines built in, so that any frontend applied to them would work fine, either by the engine applying the necessary additional steps or by it rejecting the commands it cannot faultlessly process in that context.

And what can developers learn from this? FIRST exterminate such fatal bugs (here: not checking whether a set of renames is in fact a switch - if the commands are sent one by one to the database engine, how could that engine discover such a problem? So it must be done by the frontend, which then also triggers the necessary intermediate steps/commands), THEN triple/quadruple your price if you really think you must do so (here: from 60$ with lifetime updates to 100$ with 1 year's updates).

ital2:
Here: The "like" query for strings as a real alternative for not-too-extensive databases

classic sql selection:
select * from tablename where "x" = 'y' -- display all records where column x has got the value y

"like" syntax:
select * from tablename where "x" like '%y%' -- ditto for the string y anywhere within the value in column x

It's known that simple databases - we're not speaking of full-text search here, which is available in SQLite - are not perfectly suited for text search; the "like" clause is said to be very slow. This is certainly true for big databases, but for thousands of records on an old xp pc, even "like" will bring the results in less than a second, so it's perfectly adequate.

To give an example, my list of CDs:

ID (SQLite ID number)

Code/Category

Range (bear in mind CDs are physical stuff, so they only have one place if not copied - no physical CD in more than one physical range - so the "where to find" info is not redundant even though most of the time, Category and Range are identical; in pop, categories are by country and then, where applicable, by decade; in jazz, Category is the instrument of the main artist),

Artist (can be several, for jazz in particular; it's just my classical music CDs that I have got in a different list, the main difference being the fact that the composer column is the main field over there, where pop, jazz, and so on are sorted by artist)

Year (not of a single CD but year of birth of the artist or year of creation of the group)

Titles (which means different CDs, not titles on a given CD, that info being on the sleeve; it's just in certain cases that I put exceptional titles in parentheses here, in order to know where to look for them; also the year of the CD is in parentheses if I took the time to look it up from the sleeve or from elsewhere)

From the above it becomes evident that my data isn't in perfect database format since as said, some records can have several artists, and it's similar for the titles: I put some solo artists together with their respective (main) groups, and then, in the titles, I put some additional info; it's evident I need (and have) links.

Instead of lengthy explanations, I'll just put an example here, with the query and its immediate result; from this it becomes clear that a simple (and not too big) SQLite database, with systematic "like" instead of "=" queries, is also suitable for text data which is in a less-than-perfect form, and where bringing it into such perfect form would not make much sense, because of too much work for not really much more effect in practice (I left out fields without interest here):

select * from cds where "artists" like '%stefani%' or "titles" like '%stefani%'

1)
[Artist:] No Doubt
[Category:] A8 [this means Pop, USA, Eighties]
[Year:] 87
[Titles:] (Gwen Stefani 1968, Eric Stefani) a) Rock Steady, b) Gwen Stefani: The Sweet Escape (2006), c) Gwen Stefani: Love. Angel. Music. Baby. (2004)      

2)
[Artist:] (Gwen Stefani see No Doubt )
0 [default when not applicable]
0 [default when not applicable]
[Titles: empty]

From the above, it becomes evident that it would be very welcome to have some input device (macro tool) which would allow for typing the search string just once and then put it into both query statements. The same macro would also apply to queries for titles; or, for speed reasons, you would use a second macro with just one query statement (the string then being searched in titles only).
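If your frontend or programming interface supports SQLite's named parameters, you need to bind the search string only once - a sketch (":q" is the parameter name; "||" is sql string concatenation):

select * from cds
where "artists" like '%' || :q || '%'
   or "titles" like '%' || :q || '%'
-- bind :q once, for example to 'stefani', and both clauses use it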

A macro may be easy on the pc; I hope it would also be available for Android and/or iOS/iPad. Here again, if you make the slightest typo, most of the time you will just not see the wanted results, without any error dialog - for example if you write "artist" instead of "artists" - so doing this by some macro tool is not only a question of convenience, but of avoiding typos which may leave out wanted records.

Also, if I put my links to artists from the "artists" column into the "titles" column (which should be quite easy), a simple query would be sufficient in these searches for an artist also belonging to some group, but it would not cover cases where, as said, mainly in jazz, there are main and "secondary" artists, the "secondary" ones being just those by which the CD in question is not filed. As you can imagine, in jazz it's not so neat at all, but in my text files, every important artist is named either in the artist or in the titles field, so that with a query over both columns, they will be retrieved reliably.

In other words, in many practical, private or office, use cases of not-too-extensive data collections, "like" is the solution for using a light database instead of a text file: on a modern pc, some 20- or 30,000 records should deliver almost instantaneous results, and that without the enormous adaptation work beforehand which often could not be justified in terms of time investment.

P.S.

Tuxman: Any statement is an implicit question, too, for people who'd be able to add significant info/advice.
widgewunner: Thank you!

EDIT:
Just imagine music group names with or without the leading "The": Who; The Who; Who, The? Unless you made it all neat from the start, that and similar cases will be additional problems; ditto for names in an office environment (where, though, the databases are most of the time very big, and thus name search should be made possible by "="):
- prefix before first name
- first name
- prefix before last name
- last name
- suffix after last name
Not to speak of the multiple variants - according to country, too, and then even according to the existence/absence/combination of these elements - of how these elements are then to be combined differently (!) in the address field and in the salutation.


Edit June 1, 2017
Writing from some more experience with transposition from text data to database data:

As said, you'll need a RegEx editor, and you'll need to know how to do RegEx replaces with it; then you'll be able to achieve about 80-90 p.c. of the necessary translation work by automated means. It's also necessary to have a RegEx editor which allows for fine control, for example for only replacing the very first ":" in each line, instead of further ones, too; this avoids a lot of manual work after visual checks.

As for the result in the database, you will want the family name of the very FIRST author/artist/etc. of probably several to be the START of your author(s)/artist(s) field, since you will want, for example for print-outs, to be able to SORT by them ("order by..."); for just FINDING authors/artists/etc., you'll do as explained above ("like" instead of "="). Thus, you will need to do some macroing in order to separate the first from the last names, which in practice means that you clip by the last space (before the first comma if there is a comma) in the field, except when there is some "di/von/de/of" - thereby allowing for more than one first (or middle) name.

For example, when there is an institution as the author, this will not work, so you will have to do some visual checking afterwards, for this as for everything else; but here again, it's a very good idea to combine several tools. For example, visual checking is much easier within a csv editor (I use Ron's Editor for that, but do NOT try to SORT with it: it's bug-ridden in this respect, and the developer is not interested in de-bugging it; at least he does not react to mentions of those bugs) or even the target database, but the editing should be done by RegEx, within your editor or such (2-screen or big-screen setup): At this stage, you'll use the database just as a visual checking tool, but you work on your editor's csv file, which means you'll create the same database several times, up to it being almost perfect, and only then will you stop going back to the other tools for editing.

Before the final version, all your fields will be in varchar format and without size limitations, since all too often, depending on your original text file and/or your editing ("|" or something else as field separators) done in order to distribute your data into the respective target fields, SOME contents of, for example, fields 6 and 7 will be within fields 7 and 8 instead; and since these are things you can quickly rectify in a csv editor (at least in Ron's Editor), you'll do your editing in some systematic way, sometimes even doing changes in the csv editor, then saving for further editing in the text editor for RegEx, and so on - mixing things up will end with inadvertently discarding corrections already done.

Instead of buying a csv editor and/or learning RegEx, you can probably do a lot of things within Microsoft's Excel, but you'll probably get a double-quotes problem which can be avoided otherwise; anyway, you should replace all double quotes with single quotes in your original file(s) to begin with, before getting to any editing in the copied version of the original file. Btw, since your editing will probably have unwanted effects sometimes, you should do some sort of versioning (between your visual checks in the csv editor or in Excel or in the database), and of course you should store your original source file in a safe place, as they say.

As a general rule: If your editing had unwanted effects on SOME records / datasets / text lines, rectify them manually at once, before doing other automated edits; if it had such unwanted effects on a lot of records, though, go back and devise some other strategy (/ some other RegEx).

And: Many users don't use text files for data storage but Excel files to begin with; it's evident that they will then get all the advantages of a database (SQL queries, incl. standardized ones and ones for filtering), even out of their flat files (before, or even without ever, normalizing them), without all the semi-manual fuss described above. So storing half-way standardized files in Excel instead of in text files seems to be a very good idea: The field format is common to spreadsheets and databases and thus forces you into a much better standardization from the start than the text format does, so you'll get far fewer "micro-architectural" variants than if you use text files, even if you had thought you had entered your data in a standardized format in there.

As for "sorting fields", I mean fields FOR sorting here, it's evident you can easily establish a flat tree structure, by (creating, and then sorting by) fields like C1/2/3/4, for Code1..., for categories and possible sub-categories, while the latter can remain empty when not needed.

But as implied above, databases, even "primitive" flat ones (and Excel, for such things), only get "interesting" when you're able to trigger technically complicated sql (or other) selects very easily; in other words, you need 1-key(-combination) "stored searches", but with variables for the key terms; it's surprising that current database frontends' developers didn't grasp that, so you have to write your own macros for it.


Edit June 2, 2017
Printing ("Reports")
I didn't look into the various "Reports" facilities of paid/trial frontends; from previous experience, I would assume most of them are not very flexible, so you must accept what you get; for example, from Navicat, you can buy some extra "Report Generator", so it's safe to assume the report functionality of the Navicat frontends is very basic; also, there is some famous generic report generator whose name I don't actually remember and which isn't cheap either.

Thus, printing from some database frontend does not seem to be easy - but it is, with some macroing. First, you must identify how you want to get the records; for example, you want the first record separator ("|" or other) of each record to be translated to ": ", the second one to " - ", the third one to a ":" but with a code for printing bold, the fourth one to be translated to a comma and a code for end-of-bold, the fifth one should become a code for newline, etc. - it goes without saying that we're speaking of database data, and thus of perfect standardization of the records.

I've spoken of this problem before: you need a RegEx implementation which allows for treating the first, second, third... occurrence of your field separator in each line differently. In practice, this means you need an implementation which allows for treating each line individually; and then, instead of trying bombastic macro code, you just set up specific RegEx commands for the specific field separators 1, 2, 3..., and you run the first RegEx on the first occurrence of the "|" (or whatever) in each line, then the second RegEx on the first (!) occurrence... and so on: Since each RegEx replaces the first "|" in each line, the next run will automatically replace the next occurrence.

So you must find a RegEx implementation which allows for treating each line individually, but in bulk of course, and which allows for treating just ONE occurrence in each hit region (here, the lines); this will unfortunately exclude some of the available editors and their RegEx replace implementations, but then, you will want to have the whole macro available for any printing anyway, instead of manually entering the different RegEx commands.

After this, you will have your csv file, exported from the database file, and you will have a text file which originally was a copy of said csv file and which is now a plain text file, possibly with markup codes for formatting.


Oh, I forgot: You will need titles and subtitles for prettier print-outs, and you will also want automated page breaks. You will run an sql select with "group by" (without the further data of course - in our example above, without "N", "FN", "T" and so on), which will give you a list of the groups, and then you put that list, after the necessary formatting, as bulk into your database (insert into ... (field, field...) values (v, v, v), (v, v, v)...; - you know how it's done): This will give you sort of "empty" records which are, however, perfect titles and subtitles even for browsing query results on screen (there without leading blank lines, while for printing, you can easily insert blank lines or page breaks before those titles/subtitles). In the future, you will not create new categories or sub-categories without first creating the respective "blank" record, the record with just the categories, and that's not too difficult to remember, since:

You shouldn't create new records entirely by hand anyway, but create a macro which allows for creating a new record as a copy of the current record; if you do this systematically, it means you will create new records by first going onto the "title" / "sub-title" = the "dummy" category record, triggering the macro, and then just filling out the real data; this way, it's technically impossible to ever forget to create the dummy, "title", record for some new category.
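A sketch of the title-record creation described above, with the hypothetical C1...C4 columns (the values are made up):

select C1, C2, C3, C4 from main group by C1, C2, C3, C4; -- the list of all (sub-)categories in use
-- then re-insert that list, formatted, as "empty" title/sub-title records:
insert into main (C1, C2, C3, C4) values
  ('A', '8', null, null),
  ('A', '9', null, null);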

Oh, and I forgot: how to then automatically insert the codes for leading blank lines or leading page breaks? You will insert an additional column, name it "pc" for "print codes" or whatever, 1-char numeric, default 0, blank line before = 1, page break before = 2, or whatever. OR you could write a macro which afterwards checks the blank content fields, but that would be less flexible, since it's not a given that you want a blank line before ANY record where C3 and C4 are blank (lesser group change), and a page break before ANY record where C2, C3 and C4 are blank (and all the rest, of course, since we only consider titles of any level); so an individual setting is probably best (and will "cost" 1 digit per record). (Of course, your macro mentioned above will then also translate the "1" or "2" into the needed formatting code.)


So, you will have got some select result which is then perfectly pre-formatted by your macro but which looks horrible to the eye. What to do with such a thing? You can put it into InDesign or some other dtp tool, but for most use cases, that would be way too big an effort. It's probably possible to import it into Microsoft Word or such, but the question remains: how much formatting will be preserved, how much of it will be lost? It would be ideal if there was some pdf printer (tool) which could read at least some formatting, but in fact, I didn't even find any which preserves page breaks (ascii 12 for example), let alone one for bolding the content of some field...

Thus, you may think of html, which should be perfectly possible, but then, printing from html is not something I have ever considered doing. On the other hand, it's very easy to enlarge your macro just that little bit needed to add a little rtf header and the closing brace, and running a macro translating all possible special characters (à, ù and so on) into the respective 4-character rtf codes is easy, since the web abounds with the respective tables for that task.

Then, you open your newly-created rtf file from within an rtf editor (for example WordTabs, free) and trigger "Print" - that's it; or perhaps it's even possible to send an rtf file directly to a printer, since the necessary commands (paper size, margins, and so on) should all be available within the rtf language.

I'm just assuming here, but let me guess that the above workflow (which is entirely free) is much more flexible than some expensive database "report" programs are, at least as long as no graphics are involved; and I admit that I didn't bother to look up Tibetan-to-rtf, but for the rest (and by printing from WordTabs or Works), it works fine already... to be refined, of course, and when I have time, I'd like to get rid of the additional rtf-editor-for-printing step - all of rtf is available on the web, but it's quite some stuff to wade thru, for direct printing.


Oh, and for standardization, all the tables in my one-table databases are called "main"; that helps a lot with writing heaps of select macros.


Edit June 3, 2017
To clarify why RegEx here.
I'm not speaking of the classic RegEx replace of "somethinga fieldseparator somethingb fieldseparator somethingc" by "somethingb fieldseparator somethingc fieldseparator somethinga", for which RegEx is needed in order to preserve the original somethings to be relocated (you'll work with parentheses for groups and numbered placeholders for them); it's just needed here in order to treat the lines (records) one by one and then apply the "replace once only" (in the specific way of the respective replace in the replace stack) to every LINE. You can probably do this by other means (for example, have a look at KEdit - its price has recently come down from 129 to 99$ - which could possibly do it; I did not check), but for me, doing it with RegEx replaces has proven simplest.

Instead, you could write a one-line classic RegEx replace "doing" each line/record in one go for all that is needed, then (with the same command line) the next, and so on; but I suppose that will take a little more execution time, because of the multiple 1-by-1 replaces of the field contents (which are left alone in the strategy described above, and which I prefer for that reason of "minimal invasiveness"). It's correct that it may be simpler to maintain specific 1-line replaces, one for each database select-result file (independently, of course, of whether this result comes from flat tables or from a more or less normalized database), than multi-line replace commands, one line for each field separator in the record. But then, I prefer the latter also for the fact that in a multi-line command, you get neat comments at each line, whereas in a 1-line RegEx, you would need to write down the respective comments not near the respective command but, for example, as a multi-line block below, each line referencing one group in the 1-line replace command; so in the end, the 1-line flavor is not as elegant anymore once you put down comments. And even when you do, changes aren't as quick to make as with multi-line, since you'll always need to check whether your change concerns the right element (and not possibly its left or right neighbor, inadvertently), while in the multi-line flavor, with the (right) comment in the same line, it's always evident where you are and where to do your changes. But that's a discussion of RegEx styling; both ways are perfectly possible.


Oh, and I would like to modify my claim above that the described strategy may be "more" (?) flexible than out-of-the-box "report designers". For example, it's easy with the former to automatically insert a leading blank line whenever the first, or even the second, character of some field changes; this would be helpful, for example, for the field "A" when you set changes of the fields "C1" to "C4" to "leading page break", and would not even take much white space in case you print in more than 1 column - but good specimens of the latter variety should do that too, of course, so let's say the way described here is AS flexible as good "report designers" (very probably) are.

You just run another macro element which retrieves the content of that field in every line/record and compares it with the stored content of the previous one; if character 1 or 2 isn't the same anymore, it changes the content of field "pc" ("print code") of the line from 0 to 1 (if the one of the previous line was 0 before, not 1, 2 or other - so this will not affect records right after titles/sub-titles; or you do it by "changes, but no changes from nothing", titles'/sub-titles' "N" fields being empty in our example above). Or you simply bolden the content of the "N" field in the output upon such character changes only, instead of boldening it for every record - as you see, the possibilities of revamping core database outputs are seemingly endless, just by running text macros on the csv, in the right order.


As said above, you will want titles/sub-titles - on paper and on screen - on top of the respective real data records, not below them, so with the above, you will have an order problem:
select * from main order by C1, C2, C3, C4, B, N, FN, T -- I renamed "PC" to "B" for "breaks"
does it wrong,
select * from main order by C1 asc, C2 asc, C3 asc, C4 asc, B desc, N asc, FN asc, T asc
is ugly, so, in order for the real select
select C2, C3, C4, FN, N, T from main order by C1, C2, C3, C4, B, N, FN, T
to work correctly,
you either code line breaks as -1 and page breaks as -2 - but I don't like negative values when not needed - or you set the "B" column's default value to 3 instead of 0 and set it to 2 for line breaks and to 1 for page breaks; this will even leave you the 0 for a possible other code before you'll have to do an "update main set... where...". I concede a default of 3 isn't pretty, but then you don't even have to display it (see the last select); for new records, your macro will put it in automatically, and sql lets you sort ("order") by invisible columns.
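A sketch of that setup; the conditions identifying the title records are made up, adapt them to how your own dummy records look:

alter table main add column B integer not null default 3; -- 3 = regular record
update main set B = 1 where N is null and C2 is null;     -- page break before top-level titles
update main set B = 2 where N is null and C2 is not null; -- blank line before sub-titles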


We're continuing the "refine it" part here: When a field is empty, you will probably want another replace string than the one you want for a non-empty field; for example, if you have 4 hierarchical categories C1...C4, you want them as "C1 - C2 - C3 - C4: ..." but when sub-category C4 is empty, you want "C1 - C2 - C3: ..." instead. How to do that, since RegEx does not offer some "if it's the full string (with a variable part here, that's a given), replace it by (the variable part and) 'a', and if the variable part is missing, replace it by 'b'"? Look-arounds do not help either if you always replace the very first string of the kind in each line, in every iteration, and a non-replace would create chaos, any un-replaced field separator wrongly becoming the first one then.

So in most cases the replace would be done, which would result, in our example, in a "C1 - C2 - C3 - : ...", and later, you simply would replace any " - :" or even " -  - :" by a ":". If that's not possible because of possible occurrences of these strings in the regular text, you simply replace all field separators with the wanted strings, but maintain the original field separator within them, replacing it with a special character which does not occur anywhere. For example, if the field separator is a tab, do the replaces as wanted, but replace the tabs with an additional "|", too, which gives, in our example, "C1| - C2| - C3| - [C4 being empty]|: ..." = "C1| - C2| - C3| - |: ..."; then, you'll simply delete (always by macro, since you'll know beforehand which strings can occur by this) unwanted "|" combinations (here: "| - |"), and in the end, all remaining (single) "|". Perl is said to be best, so perhaps it can be done in simpler ways in that language. Anyway, you clearly see that you don't need to just use spaces and then delete the multiple ones; you can do anything you want. Let's hope the commercial offerings go into these depths, but then, 2/3 of their development efforts inevitably go into the GUIs.


EDIT June 12, 2017: Some (useful and/or very important) Details

Output: RegEx / Replace

To my surprise, I didn't find any RegEx editor which allows for only replacing the very first occurrence of some given (sub-)string in each line, so I had to do it the opposite way to that described above, by changing every line in one go - the whole line, line after line - and this gave that horrible 1-line, inscrutable RegEx replace code I referred to above. As you know, you must group the sub-strings by (), and in order not to get totally lost, I put a comment line with the numbers 1, 2, 3... 25... immediately above the code line, one number for every ()-group, with the necessary numbers of spaces before and between the numbers, in order to identify every group by its number - since in the replace part, you must refer to those numbers, but they are not visible in the what's-to-be-replaced part of the RegEx code; a RegEx flavor where you write the group numbers directly into that "original's" code would be very welcome.

The above applies to the regular, the "B=3", lines (thus, the "B=3" is checked within this code line, which will leave alone the lines where this condition is not met); you'll write your special RegEx replace code for other lines, for example for B=1 or B=2, for other conditions. Also, you won't necessarily replace the original line by the final rtf code right away: for some formatting codes, you'll use placeholders (for example little/suspended/superscript 1/2/3/a and such), and you'll do their final replacement later on (incl. extermination of unwanted code combinations, see above): This will greatly enhance the clarity of your code (so you'll not get lost in it) AND will greatly help you with formatting changes you will want to implement afterwards, be it that you want some other formatting, or be it as alternatives for several different output situations.

Also, you must pay attention to a correct, AND to a smart, order in your replace code (ie which replaces will be made before others): if you do this smartly, you can greatly reduce complications in your code (differentiation of which lines / sub-strings are affected further on: "discard" anything which is easy to "discard" at some stage, then do the "complicated" things, then again take the "discarded" lines and do further replacements there, where needed). Also, only the core replacements where RegEx is really needed (for groups) are done by RegEx; most other replacements are done with regular replace commands.

In order not to get lost in this bulk of code, read your text body into the clipboard, then have your code work on the clipboard, and put a stop code after the first line, then the second one, and so on, and have your code display some of your (changed) text body, to check whether the replacements made up to the stop code are what you will have expected.

If you do the replaces according to these lines of advice, it will all work out to your entire satisfaction, as it did for me, and your replace/rtf-formatting code will differ only in some core parts from one database to another; for almost everything, it'll be perfectly "reusable" - in practice, of course, you'll use the same code block for everything, doing just some "if database = a/b/c..." branchings where the code differs.

(You'll do a lot of replacements in order to refine your code; for example, you will print/export only the very first words of the content of the field "N" in bold, up to the first comma if there is any, which means the family name of the FIRST author will be in bold, not further authors (in that book example)*, but if the author is some institution with several words in its name, all of these will be boldened; also, if there is no first name (field "FN" is null), there will be no space (which would otherwise be inserted between a first name and a family name): All these details (fine-tuning) you'll realize by refining your code after it works well in its core functionality, and placeholder characters greatly help with all this.)

*: Since I now have just one author in the "N" (for "name") field, OR the first author is before the very first comma in there (if there is any comma in that field), and the order within each sub-category (or overall, in order to get an ABC authors' list) is by these "first authors", it would be easy to put further authors (ie anything after a comma there) into additional, referring records, but since I do "like" searches anyway, I don't feel the need to do so yet.

Output: RTF

To my surprise, printing an rtf file isn't done easily, the way you would send a pdf file to a printer, and I cannot do it either; there's too much special knowledge involved. So you will need some intermediate program in order to open your rtf file, from which you'll then print it. Forget the usual rtf editors; I tried some of them, like "Jarte" and others, and they will indeed open your (correct) rtf file, but then you'll lose much of your formatting; in other words, they aren't able to correctly read the rtf code of the file they have opened.

Don't laugh and say, "then your code will be wrong", since in MS Word, it all works out perfectly, incl. for example printing out in several columns, with title lines of several indentation levels starting a new section; so that's exactly the application I advise you to use for checking and printing your programmatically-created rtf files: First, Word is of great help for correcting and refining your code, by checking what it gives in Word, and then, it's the perfect intermediate tool for printing, as long as there is no basic tool available for sending rtf files to the printer.

It goes without saying that such "raw-data transformation" of database data into "pre-formatted target data" not only works with rtf code for printing, but also, with the respective html/css/js or mark-up codes, for web publishing or for further treatment in InDesign or any other (ex-"desktop") publishing tool; even links to graphics or other elements, present in the database, could be integrated as needed into the target text body InDesign or similar will then work on. (Btw, the size of your database files will be extremely low, comparatively, since there is no formatting or other "noise" info within the database body; it will just be added programmatically after export into csv/text format. This doesn't apply to BLOBs, of course, but then, you will probably prefer links to files in the file system over BLOBs within the database anyway.)

To begin with, you will send the correct sql select statement to your database, in order to get the wanted data (only), in the wanted order; then, your text-processing code will do everything SQL isn't, or is less, able to do.

SQLite: rowid / ID / autoincrement / primary key

As for SQL specifics you must know beforehand, there's the SQLite particularity that by default, it re-uses (!!!!!) row IDs that have become "free" by deletion of the row/record in question; it's obvious this can create total chaos in your database (and while for flat book lists and such, that is not really such a problem in reality, you'll want to make even such a basic thing "future-proof"). On the web, there is also WRONG information on this phenomenon available; for example http://www.sqlitetutorial.net/sqlite-autoincrement/ pretends/implies, by omission of clarification, that such rowid re-uses will only occur when your SQLite rowids have reached the stunning number of 9,223,372,036,854,775,807: this is definitely wrong. Fact is, most of the time SQLite uses new rowid numbers, not numbers of deleted rows/records, but there is no guarantee whatsoever for this: in my tries, when I added (single or multiple) new records, then deleted them, then created new ones, rowid numbers of these recently created-and-deleted ones were re-used; HOW SQLite decides between re-use and new number is beyond my knowledge.

Thus you will need some ID column replicating the internal rowid but with the "autoincrement" setting (which then, but then only (?), also works for the internal rowid); in other words, I don't see a way to apply "autoincrement" directly to rowid - you seem to have to replicate the rowid in some ID column which then, while remaining a replication of the rowid, forces that rowid to autoincrement; there does not seem to be a direct setting.

They all say that for performance reasons, you should avoid autoincrement at all cost, but imagine you do some zettelkasten database, with cross-references to other records, and then SQLite re-uses rowids/IDs of deleted records for new ones?!

Ok, that's not a perfect example, since when doing cross-referencing, you should prevent deletion of referenced records to begin with; but this implies technical cross-referencing (cross-references either only in special, referenced fields (foreign keys), or cross-references-in-text with additional such referencing and the overhead that comes with it, which is not feasible within a home-made, hobby application), while in real life, you would probably use non-technical cross-references, ie not-monitored ones, just by copying the target's ID number into the text within the content fields (hopefully with follow-the-link formatting). So these "inline" references to other records MUST be absolutely stable, which, if you use SQLite along the lines of the common advice (ie avoiding autoincrement IDs), is NOT the case.

I did not yet try how it works if you create new records programmatically only, ie by executing your own sql, but in practice, you will probably create many or even almost all of your new records from within your frontend, and doing so will create plenty of re-used rowid numbers, at least in SQLite Expert - but I think it's not the fault of this frontend, this time. Of course, there is always the solution of avoiding record deletions, discarding records that aren't wanted anymore into some sort of "archives" by switching a boolean switch in each record (for example a column "a" for "active: yes/no", with or without some "cleaning" of the "active" table and a shifting of them into some "archives" table(s) every now and then; see the sketch below). It's probably a good idea to have such "archives" indeed, AND to provide for the possibility of definitively eliminating records, even from these archives - but this would make the problem reappear; and overall, how would you really prevent ANY possible (and potentially problem-creating) (manual) record deletion from a hobby application? In other words, you couldn't use any out-of-the-box sql frontend anymore but would have to write your own, deletion-proof frontend. It's evident that corporate applications would not come with such problems, since there, for compliance reasons, it's even necessary that no record is ever deleted instead of being moved into some archives - but then, SQLite isn't the right back-end for any group application to begin with.
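The "archives" switch as a sketch (column name "a" as suggested; the rest is made up):

alter table main add column a integer not null default 1; -- 1 = active, 0 = archived
update main set a = 0 where id = 123;                     -- "delete" without ever freeing the ID
select * from main where a = 1;                           -- everyday selects see only active records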

So we're back to "autoincrement". But: In the different flavors of SQL, it's more or less difficult to do database design changes after creating a database, or after creating a column, and in some cases, your only means is to create a new database, with the characteristics you want it to have from the start, and then fill it with the data from your original database (fortunately, this can be done by sql code). In the case of unique IDs in SQLite, that's exactly your problem if you have created your database without creating the autoincremental ID column at creation time, ie the command alter table main add column id integer not null primary key autoincrement will NOT work, because of the primary key constraint ( see, among others, http://www.sqlitetutorial.net/sqlite-primary-key/ ). (So when creating your database and your very first (empty) table in SQLite browser (see above), first create the necessary ID column, and THEN only do the "File - Import - From CSV" command in order to create the (basic) database from your csv data, before "formatting" the (other) fields in your frontend.)
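So the autoincrement ID column has to be there from the very start; a minimal sketch of such a table creation (the other columns are just examples):

create table main (
  id integer primary key autoincrement, -- IDs of deleted records are never re-used
  N text,
  FN text,
  T text
);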

SQL: Mix-up of nulls and of empty strings

The problem described above did not come from my using two different frontends, alternately, for the same database(s), but from my using Ron's Editor in order to quickly and "manually" shift data from unwanted columns into the correct ones, after sloppy RegExes of my original text data (see above); in Ron's Editor you can manually shift the content of many fields into another column very quickly, so this tool comes in very handy for putting "far-from-being-normalized" text data into a table (csv) format for spreadsheet/database use. But whenever I imported or re-imported my data from that csv editor into my database (ie whenever I rebuilt the database from the updated csv file), I didn't get nulls but empty strings in empty fields, while, as said above, for newly created records from within the database (frontend) itself, fields left blank on creation correctly got null values.

Thus, csv editing of database stuff should come to an end at some point; then you'll replace any empty strings with nulls, and from then on you will have only non-empty strings, and nulls, to cope with in your database and your sql code. (I largely prefer nulls over empty strings, if only for the reason that I can simply leave fields blank when creating new records from within the frontend, without thereby creating a mix-up of nulls and empty strings*; you'll do it however you like; it's just the mix-up that's really unbearable - or you'll get very complicated code if you want to get correct results.)

*: If you programmatically create new records, you could write your code so as to create empty strings for fields "left blank", but if you create them by typing within the frontend's table view, those fields get set to null unless you manually set them to '' one by one, and voila, you'll need the code for the empty-string/null mix-up further on; so clearly, avoidance of empty strings in favor of nulls is preferable; your mileage may vary.
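The one-time cleanup after the last csv re-import can then be done per column - a sketch with the hypothetical columns from above:

update main set N = null where N = '';
update main set FN = null where FN = '';
-- or both columns in one pass, using nullif():
update main set N = nullif(N, ''), FN = nullif(FN, '');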

SQL: Field names

The ID field can get any other name (but no other numeric format than INTEGER), but as a general rule, my uppercase field names above are a big nuisance when entering them into the respective sql commands; I'll change them to (equally short) lowercase names, and you should use those from the start (SQLite doesn't make any difference between lower- and uppercase in identifiers anyway).
