Hey everyone,
I've ported the datastore to SBCL and OpenMCL properly. I'm sorry to everyone else who has been kept waiting.
The format of the transaction log is UNCHANGED. I've tested loading, creating, and snapshotting across CMUCL, SBCL, and OpenMCL with all primitive datatypes (array, integer, symbol, string, list, hash-table, single and double floats), as well as other compound objects.
The patch is against the latest svn repository. Apologies again for this not being up sooner.
Hans, if I were to change one thing about the current transaction log format, it would be the encoding of floats and double-floats. I think it would be better to use `integer-decode-float' (since it is in the HyperSpec) rather than all these read-time conditionals. I couldn't find a `make-single-float' or `make-double-float' for OpenMCL.
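Roughly what I have in mind (an untested sketch; the function names are mine, not from the store): write out the three integers returned by `integer-decode-float' and rebuild the float with `scale-float', so no implementation-specific constructor is needed.

  ;; Untested sketch -- portable float serialization using only
  ;; standard CL; the function names are placeholders.
  (defun float-to-triple (f)
    ;; significand, exponent and sign, all integers
    (multiple-value-list (integer-decode-float f)))

  (defun triple-to-float (significand exponent sign prototype)
    ;; PROTOTYPE (e.g. 1.0s0 or 1.0d0) must match the original float type
    (* sign (scale-float (float significand prototype) exponent)))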
I can easily do this for you and I'll even write a converter if you wish.
Hoan
Hi Hoan,
I have read your patch and it looks fine so far. I will test it on cmucl later today, and if things go well, I'll commit it to the trunk.
I have not made much use of floats, so I cannot comment much on the current implementation. I would also prefer not to have this workaround for OpenMCL, so if you could come up with a patch, I would probably add that as well. The new implementation of %encode-string looks fine, and as far as I can see it is more consistent with the %decode-string implementation.
Thanks a lot! -Hans
I did a quick check of the patches and they seem to work, at least at first sight. I have committed them to the trunk; please test and report!
Thanks a lot, Hoan!
-Hans
Hey Hans,
I'm glad it works, unlike last time!
There are two things I learnt from this:
1. Test.
2. Make small, independent patches.
Hoan
Hi Hoan,
I will test your patches more intensively tonight, but I have read the patch; it was short enough to understand completely, and it did not do anything that was hard to follow. Great. I hope to be able to generate a few test cases that make it possible to do some unit testing on the store in the future. This supports what you wrote, and I'll freely admit that our source base should be more helpful.
Do you have any plans for other changes? Last time, you had more changes (like content-addressed blobs) that I did not understand. Maybe it is a good idea to quickly discuss proposed changes before they get checked in.
I do not have any current plans, but if I happen to have enough time, I will probably work on database maintenance and operations so that BKNR-based systems can be run for a long time without much nursery. The snapshotting mechanism currently eats disk space without bounds, and the blob garbage collection is not automated.
Cheers, Hans
Hans,
Looking back, I'm not very happy with the code I wrote for the content-addressed blobs, so for now it is going to remain uncommitted. In the near future I won't be working on the datastore; however, there are some features I'd like to add later.
It would be nice if the datastore were disk-based instead of memory-based. I think that is the most productive step forward.
Hoan
Hi Hoan,
What do you mean by "disk based"? The most fundamental idea of the datastore is that it runs from main memory, not from disk. This is more than just an accident or a deficiency; it is a feature. If what you want is an O/R-mapping database layer, BKNR may serve as a starting point on how to interface to CLOS, not much more.
But maybe I am missing what you mean by "disk based". Can you explain?
-Hans
Hey Hans,
Ok, what I meant was something like AllegroStore. In AllegroStore, all objects are stored on disk and there is a memory cache.
The idea is to maintain the illusion that objects are in memory while still providing high performance.
Hoan
PS. It's getting late over here in Australia...
Hi Hoan,
I don't really plan to make the BKNR Datastore compete with AllegroStore, AllegroCache, cl-versant, or any of the O/R-mapping database products. There may be some overlap between such a project and the BKNR Datastore, but I don't expect it to be large. The requirements for a disk-based database are very different from those of an in-memory, transaction-based system. For example, with the BKNR Datastore you can make arbitrary in-memory data structures persistent by making all destructive operations be transactions. In a disk-based database system, the changes to the persistent data itself are written to disk, which requires different mechanisms; in practice, you would probably restrict such a system to making CLOS objects persistent and hook into the slot access functions.
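To illustrate the transaction idea (a simplified sketch, not code from the store; the hash table and the transaction name are made up):

  ;; A destructive operation on an ordinary in-memory structure is
  ;; wrapped in a transaction, so it is logged and replayed on restore.
  ;; Calling it requires an open store.
  (defvar *scores* (make-hash-table :test #'equal))

  (bknr.datastore:deftransaction set-score (name score)
    (setf (gethash name *scores*) score))

Every call to set-score is appended to the transaction log, so *scores* can be rebuilt by replaying the log at startup.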
I am not opposed to object database systems, but I want to see the BKNR Datastore developed towards better maintainability and reliability. Also, I fail to see what advantages a traditional, disk-based approach would have, quite apart from its inherently lower performance.
What makes you think that the BKNR Datastore should move towards a disk-based system? Maybe I am overlooking something, or fail to see where the current approach does not work.
I see the most severe limitation of the current approach in its size limit. A store, in practice, is limited to a few hundred megabytes of data. Beyond that, the restore times become too high, and global garbage collections can become a problem, at least if the GC is very naive. In practice, a few hundred megabytes is a lot of data and will be enough for even very large applications. There are limits, but to me they are more theoretical.
Check out some design scribbles I wrote a while ago: http://common-lisp.net/project/bknr/templates/development-style.xml and http://common-lisp.net/project/bknr/templates/why-no-db.xml
Cheers, Hans
Hello Hans,
Could you elaborate a bit more on these points:
On Tue, Feb 07 2006, Hans Hübner wrote:
> .. I will probably work on database maintenance and operations so that BKNR-based systems can be run for a long time without much nursery.
What kind of nursery do you mean?
> The snapshotting mechanism currently eats disk space without bounds,
It seems to me it eats as much as necessary - no more, no less. Or am I missing something?
> and the blob garbage collection is not automated.
What exactly do you mean? If we delete a blob, wouldn't the memory be cleaned?
Thanks.
2006/2/7, Kamen TOMOV ktomov@web.de:
> On Tue, Feb 07 2006, Hans Hübner wrote:
>> .. I will probably work on database maintenance and operations so that BKNR-based systems can be run for a long time without much nursery.
> What kind of nursery do you mean?
>> The snapshotting mechanism currently eats disk space without bounds,
> It seems to me it eats as much as necessary - no more, no less. Or am I missing something?
If you regularly snapshot from a cron job, all snapshots will be kept on disk and need to be manually moved to offline media. Depending on the transaction rate, it may be necessary to snapshot often, which will use a lot of disk space. It would be good if the Datastore provided an automatic mechanism that purges older snapshots. This could be done on a simple basis (only keep N snapshots) or a more complex one (keep the last three snapshots, one for each week, one for each month, or some such).
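The simple policy would amount to something like this (a sketch only; the snapshot file naming is an assumption, adjust to the actual store layout):

  ;; Hypothetical helper, not part of bknr.datastore: keep the KEEP
  ;; newest snapshot files in DIRECTORY and delete the rest.
  (defun purge-old-snapshots (directory &key (keep 3))
    (let ((snapshots (sort (directory (merge-pathnames "*-snapshot" directory))
                           #'> :key #'file-write-date)))
      (dolist (old (nthcdr keep snapshots))
        (delete-file old))))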
Also, a restore tool is needed that helps with selecting a snapshot to restore. I have been bitten many times by picking an older snapshot to restore and choosing one that was too early. Instead of having to select a certain snapshot manually, one should be able to specify a date and time (or a manually created markpoint) to restore to, and have the environment select the right snapshot file and rollforward location.
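Selecting the right snapshot for a given point in time would be simple (sketch; assumes the file write date is the snapshot time):

  ;; Return the newest snapshot file written at or before TARGET-TIME
  ;; (a universal time).  Hypothetical helper.
  (defun select-snapshot (snapshot-files target-time)
    (let (best)
      (dolist (file snapshot-files best)
        (let ((time (file-write-date file)))
          (when (and (<= time target-time)
                     (or (null best) (> time (file-write-date best))))
            (setf best file))))))

The rollforward would then replay the transaction log from that snapshot up to the requested date and time.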
It would also be good if a snapshot could be created automatically as soon as a certain restore time limit has been reached. This would require adding up the execution times of restored and newly executed transactions and scheduling a snapshot only when it is required. Together with the automatic selection of a snapshot to restore, this would make the database system much easier to use, and it would also help with the disk growth problem.
>> and the blob garbage collection is not automated.
> What exactly do you mean? If we delete a blob, wouldn't the memory be cleaned?
Yes. Blob files cannot be deleted when the blob object is deleted, because they may be needed again should the user decide to restore an earlier snapshot. An integrated snapshot archival function would be able to move deleted blobs offline together with the corresponding snapshot file and transaction log. At the moment, there is the (bknr.datastore:delete-orphaned-blob-files) function which can be called to get rid of stray cats.
Cheers, Hans
On Wed, Feb 08 2006, Hans Hübner wrote:
> 2006/2/7, Kamen TOMOV ktomov@web.de:
>> On Tue, Feb 07 2006, Hans Hübner wrote:
>>> .. I will probably work on database maintenance and operations so that BKNR-based systems can be run for a long time without much nursery.
>> What kind of nursery do you mean?
>>> The snapshotting mechanism currently eats disk space without bounds,
>> It seems to me it eats as much as necessary - no more, no less. Or am I missing something?
> If you regularly snapshot from a cron job, all snapshots will be kept on disk and need to be manually moved to offline media. Depending on the transaction rate, it may be necessary to snapshot often, which will use a lot of disk space. It would be good if the Datastore provided an automatic mechanism that purges older snapshots. This could be done on a simple basis (only keep N snapshots) or a more complex one (keep the last three snapshots, one for each week, one for each month, or some such).
> Also, a restore tool is needed that helps with selecting a snapshot to restore. I have been bitten many times by picking an older snapshot to restore and choosing one that was too early. Instead of having to select a certain snapshot manually, one should be able to specify a date and time (or a manually created markpoint) to restore to, and have the environment select the right snapshot file and rollforward location.
> It would also be good if a snapshot could be created automatically as soon as a certain restore time limit has been reached. This would require adding up the execution times of restored and newly executed transactions and scheduling a snapshot only when it is required. Together with the automatic selection of a snapshot to restore, this would make the database system much easier to use, and it would also help with the disk growth problem.
I see. These would probably be useful tools.
I was thinking about the general purpose of the snapshotting mechanism. It seems to me that it exists solely because the hardware cannot meet the applications' demands in terms of store-object restore times. For less demanding applications I would rather do without snapshotting. The prevalence model allows tracking the history of changes to the data.
As a framework user, I would rather not rely on snapshotting to allow changes to my store-object-based objects. This is something that I would fix in the framework.
In my opinion the main functionality must not depend on the snapshotting mechanism, only the other way around.
>>> and the blob garbage collection is not automated.
>> What exactly do you mean? If we delete a blob, wouldn't the memory be cleaned?
> Yes. Blob files cannot be deleted when the blob object is deleted, because they may be needed again should the user decide to restore an earlier snapshot. An integrated snapshot archival function would be able to move deleted blobs offline together with the corresponding snapshot file and transaction log. At the moment, there is the (bknr.datastore:delete-orphaned-blob-files) function which can be called to get rid of stray cats.
I meant the memory of the blob object, not the file. I suppose this memory is garbage collected, right?
As to the disk space, I was surprised to find out (some time ago) that the files are not deleted when the blob object is deleted. Now I understand your reasons. The integrated snapshot archival function that you suggest sounds reasonable. Perhaps once we have it, we could consider deleting the associated file when a blob is deleted.
Regards,
Hi Kamen,
2006/2/8, Kamen TOMOV ktomov@web.de:
> As a framework user, I would rather not rely on snapshotting to allow changes to my store-object-based objects. This is something that I would fix in the framework.
> In my opinion the main functionality must not depend on the snapshotting mechanism, only the other way around.
This is how it works. If you never snapshot, the store will only use the space occupied by the transaction log on disk. If you don't need persistent CLOS objects, you can also create stores without the store-object-subsystem; then you will not be able to snapshot at all, because it is the subsystems that do the snapshotting.
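For example (from memory and with an example directory, so check the sources for the exact class name and initargs):

  ;; A store with persistent CLOS objects:
  (make-instance 'bknr.datastore:mp-store
                 :directory #p"/var/bknr/store/"
                 :subsystems (list (make-instance
                                    'bknr.datastore:store-object-subsystem)))

  ;; A store without the store-object-subsystem - no snapshotting:
  (make-instance 'bknr.datastore:mp-store
                 :directory #p"/var/bknr/store/"
                 :subsystems nil)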
> I meant the memory of the blob object, not the file. I suppose this memory is garbage collected, right?
Blob objects are not special in terms of normal memory management. Their binary content is not read into main memory by the Datastore, so there are no special requirements or demands. Or am I missing something?
Cheers, Hans
Hi Hans,
On Thu, Feb 09 2006, Hans Hübner wrote:
> This is how it works. If you never snapshot, the store will only use the space occupied by the transaction log on disk. If you don't need persistent CLOS objects, you can also create stores without the store-object-subsystem; then you will not be able to snapshot at all, because it is the subsystems that do the snapshotting.
Yeah, I know. This flexibility is great. But it would be even better if changes to the schema (store-objects) went into the transaction log as well. That way the transaction log mechanism would continue to work. What do you think about that?
> Blob objects are not special in terms of normal memory management. Their binary content is not read into main memory by the Datastore, so there are no special requirements or demands. Or am I missing something?
No, you aren't. Initially I did not understand and that's why I asked.
Regards,
Hi Kamen,
2006/2/9, Kamen TOMOV ktomov@web.de:
> Yeah, I know. This flexibility is great. But it would be even better if changes to the schema (store-objects) went into the transaction log as well. That way the transaction log mechanism would continue to work. What do you think about that?
True - but the first thing that would need to go into the transaction log would be the code changes. Especially when a transaction function is changed, the semantics of the operation change, and - if no snapshotting is used - this change needs to be present in the transaction log file in order to be able to restore successfully. We have been discussing this for quite a long time, but as our production systems use snapshots, we presently snapshot before we do code updates, so we don't see the lack of code change logging as a problem.
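In its crudest form, code change logging could itself be a transaction (purely a sketch of the idea under discussion; nothing like this exists in the store today):

  ;; Replaying the log re-evaluates the definitions in order, so a
  ;; changed transaction function is restored along with the data.
  (bknr.datastore:deftransaction update-code (form)
    (eval form))

  ;; e.g. (update-code '(defun net-price (x) (* x 1.19)))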
Logging schema changes is in fact a higher-order function of this. The store-object-subsystem would have to have a set of transactions that perform the schema changes. The general case, at least in my opinion, is code change logging.
In the end, it would make sense to put all code into the repository and edit directly in memory instead of using external files. This is kind of the long-term goal, but sadly we have not made much progress in that direction.
Cheers, Hans
On Thu, Feb 09 2006, Hans Hübner wrote:
> In the end, it would make sense to put all code into the repository and edit directly in memory instead of using external files. This is kind of the long-term goal, but sadly we have not made much progress in that direction.
That's a great idea. Perhaps it is part of the more general concept that application code should not reside in files but in a repository. The part of the task that tracks the changes (versions) of the code could probably be tied to the transaction log mechanism.
Also, if we look at it as a more general issue (important not only for bknr-datastore users/developers), there is a chance that more people would get interested. Have you heard of any project that deals with that?
Regards,
Hans,
Here is a minor patch to make object-tests.lisp work with sbcl-0.9.9. It also includes SBCL changes to init.lisp for the logical pathname translations (assuming installation in the user's home directory).
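The translation amounts to something like this (illustration only; the logical host name and directory are assumed, not necessarily what init.lisp uses):

  ;; Map the logical host "datastore" onto ~/datastore/.
  (setf (logical-pathname-translations "datastore")
        `(("**;*.*.*" ,(merge-pathnames "datastore/**/*.*"
                                        (user-homedir-pathname)))))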
Note that in order for the datastore to compile, I had to replace cxml and portableaserve with their latest CVS versions.
Tchavdar