Convekta Logo
Chess Software Sourcebook
Clearing Up Confusion with Databases, Datasets, Classifiers and Classes
keywords: Databases, Datasets, Classifiers, Classes, sorting
Robert Pawlak
Friday, June 23, 2006
Datasets contain only pointers, or shortcuts, to the games in a database. Many operations can be performed on them, and they are convenient for saving small subsets of games, or for specialized filtering when viewing game lists.
Most people are aware that chess databases contain collections of games. Chess Assistant (CA) can work with these databases directly - there are operations for joining databases, compacting them, e-mailing them, etc.
However, CA also supports something called datasets. A dataset is simply a collection of games in the database. The dataset only contains pointers to the games, and does not contain the games themselves (it is like a shortcut on your hard drive). You can save datasets to disk, and there are operations for combining them, and performing various boolean operations (and/or, etc) on them as well.
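As a rough mental model (this is not CA's file format or API), a dataset can be pictured as a set of game indices into a database. The Python sketch below uses hypothetical names (`database`, `search`, `kasparov`, `kid`) and made-up placeholder games to show why boolean operations on datasets are cheap: only the pointers are combined, never the games themselves.

```python
# Conceptual model only: all names and sample games are hypothetical,
# and this is not how Chess Assistant stores anything internally.

# The "database" holds the actual games.
database = [
    {"players": ("Kasparov", "Opponent A"), "opening": "KID"},
    {"players": ("Opponent B", "Kasparov"), "opening": "KID"},
    {"players": ("Kasparov", "Opponent C"), "opening": "Sicilian"},
    {"players": ("Opponent C", "Opponent B"), "opening": "KID"},
]

def search(db, predicate):
    """Return a dataset: the set of indices of the games matching predicate."""
    return {i for i, game in enumerate(db) if predicate(game)}

kasparov = search(database, lambda g: "Kasparov" in g["players"])
kid = search(database, lambda g: g["opening"] == "KID")

# Boolean operations work on the pointers alone.
kasparov_and_kid = kasparov & kid   # AND
kasparov_or_kid = kasparov | kid    # OR
kasparov_not_kid = kasparov - kid   # AND NOT
```

Because each dataset is just a handful of integers, saving one to disk costs almost nothing compared with copying the games.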
The dataset is a holdover from the early days of the program, when storage space was limited. Indeed, this is the main strength of the dataset concept - it allows for compact storage of games that are important for one reason or another. The problem with datasets is that they are static, and are not easily updated. For example, let's say that you create a dataset of Kasparov's King's Indian Defense (KID) games in Hugebase, and save it to disk. Then let's say you update Hugebase by adding the latest tournament games, and find that Kasparov has played some more KID games. When you go back and look at your dataset, it only contains the games that were originally placed in it; none of the new KID games appear. To update the dataset, you have to perform the search again, and save the result to disk.
To address this shortcoming, CA 6 introduced something called a classifier. A classifier also contains pointers to games in a database. However, while the contents of a classifier are static, like a dataset, they can be updated easily as games are added to or removed from the database. You can think of classifiers as giving you the ability to save complex searches for later use, and to classify games automatically.
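The difference between the two can be sketched in a few lines of Python. This is only an illustration of the idea under assumed names (`Classifier`, `is_kasparov_kid`, and the placeholder games are all hypothetical), not a description of how CA implements classifiers: a dataset keeps the result of a search, while a classifier keeps the search criterion as well, so the result can be refreshed.

```python
# Conceptual model only; hypothetical names, not Chess Assistant's API.

def is_kasparov_kid(game):
    """The saved search criterion: a KID game with Kasparov as a player."""
    return "Kasparov" in game["players"] and game["opening"] == "KID"

class Classifier:
    """Remembers the criterion, so its game list can be refreshed on demand."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.games = set()

    def update(self, db):
        self.games = {i for i, g in enumerate(db) if self.predicate(g)}

database = [
    {"players": ("Kasparov", "Opponent A"), "opening": "KID"},
    {"players": ("Opponent B", "Opponent C"), "opening": "KID"},
]

# A dataset is just the saved result of one search.
dataset = {i for i, g in enumerate(database) if is_kasparov_kid(g)}

classifier = Classifier(is_kasparov_kid)
classifier.update(database)

# New tournament games are added to the database.
database.append({"players": ("Opponent D", "Kasparov"), "opening": "KID"})
classifier.update(database)

print(sorted(dataset))           # [0]     the dataset is stale
print(sorted(classifier.games))  # [0, 2]  the classifier picked up the new game
```

To bring the dataset up to date in this model, you would have to run the search again by hand, which mirrors the repeat-and-resave step that datasets require.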
To further muddy the waters, CA also has a feature called 'classes', which are not to be confused with 'classifiers'. Classes give you the ability to bookmark a game for future reference. Classes can be assigned both to individual moves and to the game itself. For instance, if you thought a particular game was noteworthy for a theoretical novelty that was played, then you would assign the move where the novelty occurred to the "Opening theory/plans" class. At the risk of further confusion, note that you can also use a classifier like a class, if you want (i.e. you can assign games manually to be members of a specific classifier).
My recommendation is that you use classes for indicating important games that you might want to come back to later. Classifiers should be used in instances where you are frequently updating a database with new reference games, or where you perform the same search repeatedly. Classifiers can be used to quickly classify games in an automated fashion, whereas classes are used to manually call out a game for its importance.
Note that you can search for games in a particular class, and with a little work, you can also do the same for games that are in a particular classifier.
An Example - Datasets vs. Databases
There are a couple of ways to perform a sort in CA. One involves sorting a dataset, the other a database. As I mentioned earlier in the article, a dataset is a subset of games from a database, and it only contains pointers to games. When you open a database, CA automatically opens a dataset with all the games that are in the database.
If you want to sort the database (which is what most people want), select the database you are interested in sorting in the browser pane, then go to base->operations->reorganize. There you will see a dialog box that will automatically sort an entire database for you, using whatever criteria you want (e.g. by white, black, date, whatever). Once done, this is permanent, and every time you open the database from that point forward, the default dataset will contain the games in your sort order.
Dataset screenshot
You can also sort datasets, and save the results to disk. To do so, right click on the dataset, and select tools->sort. Then, after sorting, use dataset->disk operations->save to disk (this operation is accessible from the datasets menu, see picture). You will notice that if you now close all the windows for the current database (right click over the database in the browser pane, and select "Close all windows"), and then reopen it, the default dataset does not have the sort order that you saved to disk (unless it was identical to the database sort order). If you want it back, you need to load it in, using the disk operations command from the dataset menu.
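The behavior described above follows naturally if you picture the default dataset as a list of pointers in the database's stored order. The Python sketch below is a conceptual model under that assumption, with hypothetical names and placeholder games; it does not reflect CA's internals.

```python
# Conceptual model only; hypothetical names and data, not Chess Assistant's API.

database = [
    {"white": "Smith", "year": 2004},
    {"white": "Adams", "year": 2006},
    {"white": "Jones", "year": 2005},
]

def default_dataset(db):
    """Opening a database lists every game, in the database's stored order."""
    return list(range(len(db)))

# Sorting a dataset reorders the pointers only; the database is untouched.
sorted_dataset = sorted(default_dataset(database),
                        key=lambda i: database[i]["white"])

# Closing and reopening the database gives the stored order again.
reopened = default_dataset(database)

# Reorganizing the database rewrites the stored order itself,
# so every later default dataset comes up sorted.
database.sort(key=lambda g: g["white"])
after_reorganize = [database[i]["white"] for i in default_dataset(database)]

print(sorted_dataset)    # [1, 2, 0]
print(reopened)          # [0, 1, 2]
print(after_reorganize)  # ['Adams', 'Jones', 'Smith']
```

In this picture, saving a sorted dataset to disk preserves `sorted_dataset`, but it has to be loaded explicitly, because reopening the database always rebuilds the default list from the stored order.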



Datasets are used to keep track of game lists. Boolean operations can be performed on datasets, and they are efficient in terms of storage. However, they are time-consuming to update. Classifiers do not share this weakness, and are to be preferred over datasets in most applications. Classes are not to be confused with classifiers, and are best used to quickly mark a game for later use or reference.


ChessAssistant is a trademark of ChessOK