Ticket #770 (closed enhancement: fixed)

Opened 4 months ago

Last modified 4 months ago

Integration of memcached support into QCubed

Reported by: vaibhav Owned by: somebody
Priority: must Milestone: 2.1
Component: ORM Version: 2.0.2 Stable
Keywords: memcached, Cc:

Description

Related thread:  http://qcu.be/content/qcubed-memcache-integration-complete

QCubed has always been rated as a slow framework and the addition to the framework tries to solve that by adding memcached support.

Please note that the API used is memcache (not memcached).

The following changes have been made:

1. configuration.inc.php has two new definitions. One is boolean value (true/false) used to switch off the memcached functionality altogether. Other is the array which will be used for connecting to various memcached servers (it has only two parameters: the server IP address and the port to connect to).

2. QApplicationBase.class.php has a new public static member which is initialized as a memcache object by prepend.inc.php as and when it is included. The variable remains null otherwise.

3. prepend.inc.php is modified to check whether memcached support is enabled and initializes the variable in QApplicationBase.class.php and adds the server to the pool.

4. Query methods Load, Delete, DeleteAll, Save, Truncate and Reload have been modified in the templates to consider memcached support. The behavior is as follows:

*
Load method sees if the objOptionalClauses is null or not. If it's not null, the query is actually performed on the database.

If the objOptionalClauses is null and memcache support is enabled, the the object is looked up in the cache. If found, it is returned otherwise the query is run on the database.

If the query is run (because of any reason) and memcached support is enabled, the object is added to the cache for later use.

* Save function simply deleted the object from the cache. It is because it will automatically be cached in the Load method the next time the object is requested.

* Delete function will first remove the object row from the database and then remove it from the cache.

* DeleteAll and Truncate will empty all the cached items from the server. This is because these two functions are normally not carried out regularly and there is no way to search for all the keys in the cache and delete them.

* Reload will remove the object from memcache before calling the Load method. this automatically reloads the object in the cache as well!

Attachments

QApplicationBase.class.php Download (29.7 KB) - added by vaibhav 4 months ago.
/includes/qcubed/_core/framework/QApplicationBase.class.php
configuration.inc.php Download (14.0 KB) - added by vaibhav 4 months ago.
includes/configuration/configuration.inc.php
prepend.inc.php Download (4.5 KB) - added by vaibhav 4 months ago.
includes/configuration/prepend.inc.php
object_save.tpl Download (7.0 KB) - added by vaibhav 4 months ago.
includes/qcubed/_core/codegen/templates/db_orm/class_gen/object_save.tpl
object_reload.tpl Download (1.3 KB) - added by vaibhav 4 months ago.
includes/qcubed/_core/codegen/templates/db_orm/class_gen/object_reload.tpl
object_delete.tpl Download (4.7 KB) - added by vaibhav 4 months ago.
includes/qcubed/_core/codegen/templates/db_orm/class_gen/object_delete.tpl
class_load_and_count_methods.tpl Download (2.9 KB) - added by vaibhav 4 months ago.
includes/qcubed/_core/codegen/templates/db_orm/class_gen/class_load_and_count_methods.tpl
test.patch Download (21.6 KB) - added by vaibhav 4 months ago.
test patch.
Predis.tar.bz2 Download (99.1 KB) - added by vaibhav 4 months ago.
The Predis library for supporting Redis into QCubed. Expand this into includes/external_libraries directory (will have to create that directory).
ticket770.patch Download (27.0 KB) - added by vaibhav 4 months ago.
All changes made. It just works :)
ticket770-va.patch Download (17.8 KB) - added by vakopian 4 months ago.
Added a new feature to QCacheProviderLocalMemory - an option to keep the cache in session. This extends the lifespan of the cache from request to session.

Change History

Changed 4 months ago by vaibhav

/includes/qcubed/_core/framework/QApplicationBase.class.php

Changed 4 months ago by vaibhav

includes/configuration/configuration.inc.php

Changed 4 months ago by vaibhav

includes/configuration/prepend.inc.php

Changed 4 months ago by vaibhav

includes/qcubed/_core/codegen/templates/db_orm/class_gen/object_save.tpl

Changed 4 months ago by vaibhav

includes/qcubed/_core/codegen/templates/db_orm/class_gen/object_reload.tpl

Changed 4 months ago by vaibhav

includes/qcubed/_core/codegen/templates/db_orm/class_gen/object_delete.tpl

Changed 4 months ago by vaibhav

includes/qcubed/_core/codegen/templates/db_orm/class_gen/class_load_and_count_methods.tpl

  Changed 4 months ago by vakopian

vaibhav, thanks for the fantastic work.
A couple of remarks:

  • In Load functions, you do $objMemcache->get() twice (one in the if condition and one to actually return the object). This is not needed and could be expensive. You can easily change it to:
    $objCached = QApplication::$objMemcache->get(...);
    if ($objCached)
    	return $objCached;
    
  • I suggest we use a different delimiter between the class name and the id. You currently use underscore, which is a valid char in a class name. Using a colon (":") is better, since you can't have that in a class name.
  • Since the API we use is pretty simple (i.e. get(), set(), delete()) we should probably abstract this out into a QCubed abstract class. This will allow for different back-ends, not just memcache. Something like this:
    class QAbstractObjectCache {
    	abstract public function get($strClassName, $strId);
    	abstract public function set($strClassName, $strId, $mixObject);
    	abstract public function delete($strClassName, $strId);
    	abstract public function clear();
    }
    
    class QObjectCacheMemcache extends QAbstractObjectCache {
    	protected $objMemcache;
    	public function get($strClassName, $strId) {
    		return $objMemcache->get($strClassName.":".$strId);
    	}
    	// ... other impls
    }
    

Of course I understand you're still in development/testing stage, so we can always refactor the design once the main parts work ok.

Also, it would really help the review process if your changes come in a patch form. IDEs typically have tools to create patch files. And if you're comfortable with command line SVN then "svn diff" would also produce a path file. Even tortoiseSVN probably has a way to save the diff as a file.

follow-up: ↓ 5   Changed 4 months ago by vaibhav

Vakopian,

I did change the Load function to run the memcache get only once, even before you had made this comment and there are a few other changes. just read:

 http://qcu.be/content/qcubed-memcache-integration-complete#comment-6734

And please, do comment on that thread as well. It will help catch someone else's attention as well.

Point taken for the 'Colon' (:) delimiter. Will adjust that.

As for the abstract class: I am totally against that , 100%. The reason is : memcached is a cache which improves the performance. One of the other parts of QCubed which performs slow is when you are chaining functions in a way which calls up too many getters. Yes, getters are one of those things which make QCubed easy but I think that they should not touch anything in the caching system. When you use caching, the performance goes uphill. You wouldn't want to add extra load while going up.

As for allowing different backends, your idea is humble and a nice but, vakopian, memcached is the best out there and probably the easiest. If people want a caching solution, they go for memcache. If, someday a new caching solution comes out and turns to be better than memcached, well, we can adjust the code then. Thinking too much about the future can spoil the present :P

And for the patching. Well, I am a command line guy but diff is one thing which has never worked for me. I once applied a patch to my Linux kernel which set some variables right. It screwed up my installation! Moreover, yes, I think I do need to learn more about diff. If you have any link / resource which helps me learn diff (I prefer command lines), I would want it :( I really need to know this one thing badly (sucks up more time in my work too) :(

  Changed 4 months ago by vaibhav

Adjusted the delimiter to colon. Will upload the changes later. Going to try out patching. Wish me luck :P

  Changed 4 months ago by vaibhav

Patching did not work :( :'(

vakopian, tell me the tools n commands u use... I will install any tools of needed.

in reply to: ↑ 2   Changed 4 months ago by vakopian

Replying to vaibhav:

Vakopian,

As for the abstract class: I am totally against that , 100%. The reason is : memcached is a cache which improves the performance. One of the other parts of QCubed which performs slow is when you are chaining functions in a way which calls up too many getters. Yes, getters are one of those things which make QCubed easy but I think that they should not touch anything in the caching system. When you use caching, the performance goes uphill. You wouldn't want to add extra load while going up.

I'm not sure where this misconception comes from? First, PHP getters (i.e. __get()) are not slow as to impact this type of functionality. But my proposed code is not even using getters, it's using a simple function call, that redirects to another call. This is the so-called proxy pattern. The caching solution discussed here is to optimize the DB IO which is orders of magnitude slower than memory access. One additional function call does not impact that difference at all. If you think function calls are slow in PHP you may as well ditch PHP as a programming language altogether.

As for allowing different backends, your idea is humble and a nice but, vakopian, memcached is the best out there and probably the easiest. If people want a caching solution, they go for memcache. If, someday a new caching solution comes out and turns to be better than memcached, well, we can adjust the code then. Thinking too much about the future can spoil the present :P

Even if memcache is the best (which I'm sure the other alternatives below would contest), it's only the best _currently_. Tomorrow a better alternative could come, or Memcache could stop being maintained or whatever. But better is not the only issue. Not everybody is in control of their hosting. Hosting providers may offer other caching solutions or no solution at all. In that case, QCubed could either provide a built-in support for the provided caching solution, or at least allow the developer to easily plug in their own.
Finally, the abstraction here is so trivial that it costs us nothing to support other solutions.

Here is an incomplete list of caching solutions currently available:
 http://en.wikipedia.org/wiki/Redis
 http://meta.wikimedia.org/wiki/Tugela_Cache
 http://en.wikipedia.org/wiki/Memcachedb
 http://en.wikipedia.org/wiki/Membase
 http://en.wikipedia.org/wiki/Ehcache
 http://www.jboss.org/jbosscache

And for the patching. Well, I am a command line guy but diff is one thing which has never worked for me. I once applied a patch to my Linux kernel which set some variables right. It screwed up my installation! Moreover, yes, I think I do need to learn more about diff. If you have any link / resource which helps me learn diff (I prefer command lines), I would want it :( I really need to know this one thing badly (sucks up more time in my work too) :(

You don't need to use diff directly. If you checked out QCubed (e.g. from SVN from the 2.0.2 tag), you can just use svn diff > my_changes.patch to create a patch file. This is really the easiest way. If you just downloaded QCubed as a zip archive, then you should re-consider: you are now changing the core, so you would definitely need a versioning system to do comparisons, rollbacks, updates, etc. If you really have to use diff, then IIRC, diff -U will produce a patch file too.

  Changed 4 months ago by vaibhav

So, what changes would be required in the code? If you detail me, I will try to make them. As for the part of me being extremely foolish is : I thought Redis and Membase were NoSQL database solutions because I saw them mentioned on various blogs as such :( "Hail Wikipedia" - should have been my slogan but then, I do mistakes. Now, after going through the list, I am convinced and my 100% being against is now at 90% being for it.

Again, when I made the 'getters' being slower comment, I was actually sleepy and pushing myself to be up. I think I just mistook that function call as a 'getter'. I think the solution is good and we can extend a lot lot more with that approach. So do tell me about what changes should be made in the code and I will try to give it my best.

About patching, I am gonna try svn. It seems promising. BTW, patches created using the " svn diff " command... would they be 'visible' on the trac's patch viewing system just as other patches are?

  Changed 4 months ago by vaibhav

Yippie! Patching worked :)

Changed 4 months ago by vaibhav

test patch.

Changed 4 months ago by vaibhav

The Predis library for supporting Redis into QCubed. Expand this into includes/external_libraries directory (will have to create that directory).

  Changed 4 months ago by vaibhav

Added patch for allowing the support of Redis as well as memcached into QCubed.
API changed. Now, DeleteAll for a table will not delete all the other cached objects (works on redis only).

Also, added the Predis library used for supporting Redis.

Kindly test caching against complex queries with QCUBED as it might be breaking the result. I am 95% sure that it won't but there is that 5% which can bite back.

I hope nothing more is needed now? :)

  Changed 4 months ago by vakopian

There is always "more needed" :-)

  • I'm not a fun of the "QMCache" name. If you don't like my QObjectCache, maybe you'd like QORMCache? Or perhaps QCacheProvier, which is my preference?
  • The names of the configuration options are not very good either. "USE_MEMCACHE" implies Memcache. "CACHE_SERVERS" implies a separate server. See below for my suggestions.
  • I don't think we should have a "QMCache" class at all. Your current implementation uses it as a cache provider factory, which would be fine, except now if I wanna implement a different cache provider, I would have to go and modify that core file. This is really unnecessary. Moreover, the current configuration options are specific for the cache provider (i.e. Memcache). In fact in your patch the radis options are just hard-coded in the class.

So here is what I think would be better in terms of naming, options and implementations.
I think we need two options in configuration.inc.php, "CACHE_PROVIDER_CLASS" and "CACHE_PROVIDER_OPTIONS". The first one is the class name of the provider class, e.g. "QCacheProviderMemcache" or "QCacheProviderRedis". The second one is a simple array where you pass any options you like to the cache provider constructor. So for example for memcache people can set it to:

define(CACHE_PROVIDER_OPTIONS, serialize(array("server" => "127.0.0.1", "port" => 11211)));

and for radis, people could simply replace it with:

define(CACHE_PROVIDER_OPTIONS, serialize(array("host" => "127.0.0.1", "port" => 6379, 'scheme' => 'tcp')));

Then in prepend.inc.php (or somewhere else), you construct the global cache provider object by passing these options to the constructor:

$strCacheProviderClass = CACHE_PROVIDER_CLASS;
if ($strCacheProviderClass) {
	$objCacheProvider = new $strCacheProviderClass(deserialize(CACHE_PROVIDER_OPTIONS));
}

With this implementation, if someone wants to implement their own provider they don't need to touch anything in the core. All they have to do is implement the QAbstractCacheProvider interface.
And here is how I think that class should look like:

class QAbstractCacheProvider
{
	abstract public function Get($strKey);
	abstract public function Set($strKey, $objValue);
	abstract public function Delete($strKey);
	abstract public function DeleteAll();
	abstract public function FlushCache();

	public function GetObject($strClassName, $strId) {
		return $this->Get($strClassName + ":" + $strId);
	}
	public function SetObject($strClassName, $strId, $objValue) {
		return $this->Set($strClassName + ":" + $strId, $objValue);
	}
	public function DeleteObject($strClassName, $strId) {
		return $this->Delete($strClassName + ":" + $strId);
	}
}

First, please notice the use of "$objValue" instead of your current "$strValue": these caches do support objects, so our naming should not imply strings.
Second, GetObject/SetObject/DeleteObject allows a specific provider to use their own way of composing the key (e.g. for some reason ":" is not good for that provider), and it also makes the usage cleaner, as you don't have to do string concatenation in the templates.

And last, I'm not sure which is better, to always do if ($objCacheProvider) {...} or just have a simple QCacheProviderNoCache as the default and skip the checking altogether. The latter makes for a cleaner code, while the former saves a couple CPU instructions.

  Changed 4 months ago by vaibhav

Vakopian,

Apart from the naming conventions, I think I am not satisfied with anything else. Obviously the question is : WHY?

1. Your naming conventions are really good.

2. Did you have a look at the templates and what results they bring in to the generated code? If we do not provide that 'factory' thing, then you have to change the name of the class in the templates everywhere and then re-codegen. That is a really complicated way.

3. With cache providing mechanisms differing, a lot starts differing. I would rather suggest that we only keep the CACHE_PROVIDER in the configuration.inc.php and move the CACHE_PROVIDER_OPTIONS into individual classes. This would be the most pathetic idea in your view, I know. But I thought all that and did not do it. The reasons are:

a)
You are not going to use caching on a shared server. Anyway you would be using a dedicated machine or have a generous hosting provider. Moreover, most hosting providers do not feature a caching daemon. For that you have to have a dedicated hosting. Such a person would be either knowledgeable enough, or rich enough and whoever would be doing the activation of caching task, should not have much problem reading simple instructions like "To configure the options, go to XX file in the XXX directory". The biggest point is - caching as such is not normal for most people. If they need it, they should take a little bit of more pain to get it done their way. A framework should ease.

b)
Moving the Configuration option of CACHE_PROVIDER_OPTIONS into individual class file has another advantage: You can define your parameters the way you want for each of your caches - its very very unlikely that someone would have Redis and memcached BOTH running as CACHING servers on the same machine. Even when he has both of them, do you think it is going to be easy for the guy if he starts wanting to test which one performs better according to his needs? No way! Moving the CACHE_PROVIDER_OPTIONS into individual classes makes sure that all you need to switch on / off or change the cache provider is to change its name in the configuration file. The rest happens automatically on the next page request.

c) Configuration file should not become a DUMP of various options, each of options having the possibility of having sub-options and so on.

d) About $objValue instead of $strValue - it makes sense but the caching mechanisms just treat it as strings, hence the name.

e) You said that someone implementing his own caching system would have to change the core file. Well, there are quite a few things I have changed in my copy of QCubed core for fitting my needs as of now and that will happen when one goes on to customize things. I really do not think that someone who can implement a new caching system would be so immature (or, let's call that 'dumb') that he cannot add a line into a class file.

Except the variable and classnames, I did think a lot about everything including everything you have said and the only imperfection left from my side is to leave the CACHE_PROVIDER_OPTIONS (or 'CACHE_SERVERS' as of now) in configuration.inc.php.

Caching is mostly useful in a networked environment and that is where it really shines. I believe I am going to change the names and things would be perfect because to be able to cache, you must understand the ORMing of QCubed, your database structure, the limitations of caches as well as be able to make wise decisions on configuring the caching servers. Anyone doing that can take the pain of opening a couple of files and making the right modifications. I think that except the naming, everything is just perfect.

Do think well about this. In the end, I would just say that the framework should be there with 'features' and 'ease of use'. What you suggest (specially the 'user should not bother about the core file') makes me feel, we are trying to make QCubed as a spoon feeding system. Looking into other software framework installation systems, I still think we are just so easy!

A HUMBLE NOTE:
I worked for more than 18 hours to rethink things, analyze code and make best of the decisions balancing the ease for the user as well as sophistication of the whole system. You trying to make things too easy for the user actually made me like this -> :( :'( and I did feel bad about it sir. It took me a full 4 hours to get Redis to work (because if a framework has support for 'more than one' caching systems, it should feature at least two) and about 10 hours to understand where to place the caching code (and it may still be imperfect) . With a disgust, a little anger and irritation and with a feel that my work was not appreciated (although I think it was), I have written all the above. I do respect you and this community and I hope I do not lose mine just by honestly speaking how I felt.

Regards,
Vaibhav

  Changed 4 months ago by vaibhav

I must say one thing above all - whoever designed the code generation was a genius...its so well interwoven!

  Changed 4 months ago by vaibhav

For the ':' thing - well, I read on all those cache providers and I think all is well with that. If someone wants to replace it, it can still be done in the set and get by using the logic. You just have to explode and implode the name.

For the if condition thing - at places in the code, the checking is required, hence it is there.

Now, the ways you suggest are usually performance killers for the simple reason of lot of function calls. I have been a guy who has done kernel debugging, program disassembly and debugging and a good enough C/C++ programmer. If I really had the time, I would have written my whole application in C/C++.
In C/C++ or even down to the processor's techniques (assembly level) for 'function calls', each function call slows down the performance. In C, it becomes Bout 2-5% slower and PHP is about 40% as fast as C. Deeper the function call, slower it gets. More functions calls means more parsing and more memory operations - the operations where CPU could have been utilized. This is why I dislike function chaining, specially something structured as InstantiateDbRow? which is called too recursively. Self recursion gets slower as you go down the level and its a lot more 'exponetial' degradation in PHP than it is in C. This is why I try to avoid it as much as possible.

Regards,
Vaibhav

  Changed 4 months ago by vakopian

vaibhav, take it easy :-)
Your work is always appreciated. If it helps, from now on I will start all my comments with that phrase.
The only reason I'm asking you to make these changes, is so that you benefit from the whole experience - design, collaboration, review process, etc. It would have been much easier for me to provide my own patches with the way I think it should be implemented. But I don't wanna take away anything from your efforts and contributions. Thus I'm trying to gently push the implementation in the right direction.

Clearly, you're too frustrated and/or tired to take any more professional criticism or review. So I will proceed with providing my own patches. To still be civic, I will refrain from committing these patches myself, letting another core contributer review and commit.

  Changed 4 months ago by vaibhav

Vakopian,

I think the major reason I wrote things which make you feel (perhaps) a bit dishonored is that I was straight up for 40 hours and when I am reading this after a good 10 hours sleep, I think I did not write it correctly. It's not that I am not ready for criticism, but I was tired and in my mind I had set a goal that I will get this done before going to sleep (I normally work with goals in one wake-up session). May be that is good for my own process of development but not for community contributions (which I am realizing now).

Coming to the design process. I believe your decisions are really fine. But I would still say this - I do not intend to make QCubed in a way that makes it rather complicated in the working and future developments, even if it takes a little more pain for the end user (the developer using this framework) to use it. My 2 bits:

1. If you make it too easy to start off with QCubed, the changes you suggested would probably kill some performance.

2. Most new guys in search for a good PHP framework like to go with the 'Popular opinion' and unfortunately, QCubed is not the most popular one and no where even close to popular. When I chose QCubed for my work, I was actually very impressed with Symfony and Yii. Both of them have a very very large feature set compared to QCubed because of their 'external library' support. When we are adding new libraries which enable new features into the framework, it is out duty to make sure that someone wanting to build upon our flexibility should have a nice and easier way to do it. By all means a user who wants to get those good features will have to do a lot of work later if we do not split modules in very senseful way. Changing a couple of files should not make it all that bad.

I plan to integrate CKEditor in some time. Plus a PostgreSQL extensions (as a plugins) are on their way, also a new Paginator (which I have already discussed in the forum) to ease the SEO thing. In addition, I want to create a full set of files which add in the NoSQL support for MongoDB (it is so fast with the disk and so so well featured that you can use it as a cache, a full blown DB, a junk data store of some type (if you need) or as something which you can use to collect hell lot of data to go data mining and it would get your work done faster; it is hardly 20% slower than memcached even when it is going for disk read-writes and that makes it a great thing to be added into QCubed). When I design new things for the community, I keep all that future contributions in mind and then I do that.

For the caching support I would say this: we can create a new file in the configuration directory with all the caching setting storage. Anyone interested would be able to modify that instead of going for the core change (as you do not want it, and I too think it would be nicer, after waking from the sleep). It would server the purpose and serve it well.

As for the patches - I am sorry to have dishonored you. It was not intentional but an immediate reply in frustation. I Apologize if I made you feel bad. Please let me make the patches. I would feel better. And yes, I would also request that YOU should commit the changes for you have been involved in all the process. I will make the new changes with a new configuration file named as 'QCacheProviderConfig.inc.php' in the configuration directory - it satisfies both of us.

Regards
Vaibhav

follow-up: ↓ 16   Changed 4 months ago by vakopian

This is not about feeling good or bad, I'm not that thin-skinned ;-)
I'm attaching my patch here. Please take look. Perhaps my code would convince you better than my words.
Other than what I have already suggested before, the differences between your version are

  • implemented full support for composite primary keys
  • The Reload() method doesn't need to do anything with cache, since it calls Load which will reset the cache.
  • I don't see much need for the Cached flag on the database object. If you provide the scenario, for which you can see its benefits, I might be convinced otherwise.
  • I moved the instantiation of the provider into the QApplicationBase::Initialize(): no need to modify prepend.inc.php.
  • I did not implement Redis provider because Predis requires PHP 5.3 (it uses namespaces), which QCubed is supposed to work with PHP 5.2 (at least for now). In addition Predis is onlye one of 5 PHP client implementations for Redis. I wasn't sure which one is the better, more maintained one that uses PHP 5.2.
  • Even though I have implemented QCacheProviderNoCache, I decided not to use that as the default. Not because of the extra function calls, but because of the extra things that need to be done in InstantiateDbRow() before the call to Cache Provider is done.
  • my patch is against trunk, not 2.0.2 (that's why it uses php templates).

A side note: you seem to worry too much about extra function calls. Note, that compared to what I proposed, your factory class is also acting as a proxy class which introduces one more redirection. I would think you would be happy to eliminate these extra calls.
But at the end, these extra function calls almost surely don't matter in this context. Worrying about them at this level amounts to premature optimization. And that's always a bad idea.

in reply to: ↑ 15   Changed 4 months ago by vaibhav

Replying to vakopian:

This is not about feeling good or bad, I'm not that thin-skinned ;-)

Thanks a lot :) I am relaxed about that now.

I'm attaching my patch here. Please take look. Perhaps my code would convince you better than my words.
Other than what I have already suggested before, the differences between your version are
* implemented full support for composite primary keys

Ummm.... the code is fantastic. The problem again here is that you are setting a whole lot of getters. With a query which runs for well some time and involves tables with more than one columns, you will end up with a lot of getter calls. Too many function calls internally is what makes Windows slow, it is the same that makes KDE slow and Java slower than .NET framework. But, I must say I am really happy with that thing. It is really impressive vako. :)

Just a little problem: For every request for a cached object, we are being choked at by at least 1 for-each iteration, two assignments (one with a ternary operator and an array_exists function call) and a GetObject? call which would again probably make more than one check and call an implode. Now all that does take time. This much work for 'just returning the object from cache' :o ? For long queries, it defeats the purpose. In fact, if you look closely, it is almost like getting a new objDbRow from the resultset! Also, we are trying to make the ORM creating faster. So your implementation is effectively putting load on the processor (although it is really nice one feature wise). Once again the GetObject? and SetObject? functions do not need to do any work. Look at my implementation, there is just one function call. You are making it go deeper with function calls. All these calls will be made every time a new DbRow? is requested by Instantiate DbRow?.

We are using caching to reduce the CPU load and save time. I am not in favor of so much calculations. QCubed example site says : you do not get an ORM class on a table with multiple column primary key. Caching is going to be a part of such I believe. Now, in case of 'multiple' primary keys on a table: One can make an 'unique index' on two (or more) columns with NOT NULL on the DB for his 'wanted' primary keys and declare a 'serial' column for being lablled as 'primary' keys. Afterall, a primary key is just an index with NOT NULL UNIQUE INDEX constraint inside the database's internal structure. This will make the querying go a LOT faster when more traffic comes in .

* The Reload() method doesn't need to do anything with cache, since it calls Load which will reset the cache.

Load does not reset the cache (in my copy it does not and it does not seem to do that in your copy either) and it should not. If it does, we have already failed at implementing caching. Someone like me does a lot of calls to 'Load'. If it resets the cache, where am I getting the benefit of caching? I say, Bad idea there!

* I don't see much need for the Cached flag on the database object. If you provide the scenario, for which you can see its benefits, I might be convinced otherwise.

Caching brings objects to memory. It makes them stay there. If someone happens to be able to connect to your memcached server, you can have the data going out of your machine. Yes, you can apply the authentication on memcache serevr but that adds overhead and most people do not do it. Plus, it needs version 1.4.7 and someone might just be wanting an older version! That was security reason.

Someone might be using his QCubed application with more than just one or two databases and one of those databases might be containing HUGE objects which either memcached (or whatever solution one implements) does not support or, he does not want to use caching altogether on that database for some reason. I am one of those who is planning to make use of a database for serving images through Blobs. On a day I get into a dedicated server, the database would be on a different machine serving only multimedia parts of the system. Anyone can be asking for any image anytime and RAM would not be sufficient in that case. I would rather increase the working memory of the DB than caching the things! Also, what if I have only my web server with memcached running? Requests for images through my application (that is how it would be) will unneccessarily complicate everything! Tell me one reason I would want caching the multimedia DB for such an implementation!

* I moved the instantiation of the provider into the QApplicationBase::Initialize(): no need to modify prepend.inc.php.

Great work. I thought about doing that in the end.

* I did not implement Redis provider because Predis requires PHP 5.3 (it uses namespaces), which QCubed is supposed to work with PHP 5.2 (at least for now). In addition Predis is onlye one of 5 PHP client implementations for Redis. I wasn't sure which one is the better, more maintained one that uses PHP 5.2.

I agree. It is great. But I would say - leave it there and in configuration.inc.php make a mention that redis will not work for PHP versions below 5.3. That should do it. Bundling has not disadvantage if we warn the user.

* Even though I have implemented QCacheProviderNoCache, I decided not to use that as the default. Not because of the extra function calls, but because of the extra things that need to be done in InstantiateDbRow() before the call to Cache Provider is done.

yeah, that is good.

* my patch is against trunk, not 2.0.2 (that's why it uses php templates).

:( Made things difficult for me. Also, I am working hard for getting a 2.1 release soon. So may be I will have all the bytes soon :)


A side note: you seem to worry too much about extra function calls. Note, that compared to what I proposed, your factory class is also acting as a proxy class which introduces one more redirection. I would think you would be happy to eliminate these extra calls.

Have a look at your code. You have many more function calls there vako ! :) Now, as for worrying, I have already mentioned things above and there is an additional reason to worry when you are on an interpreted or semi-interpreted language. If you go see the processor and what it does internally and how it does (intel's developer manuals for Assembly level programmers is a good place to start with), you will come to know that only C can do a few things or anything else which produces binary code, not PHP or JAVA. Facebook claims that after implementing hip-hop, their performance went double. That means PHP is two times slower than C. And again, its 'interpreted' so function call cost will shoot up when you go to chaining just for the reason its 'interpreted' language. There is a reason to worry. I think in the 'poor-performance' thread, olegabr made the same point, with proofs!

yeah, having seen the kernel and assembly code and stuff like that and seeing excessive function chaining kill things, that has gone into my veins. I am not going to stop worrying on that.

But at the end, these extra function calls almost surely don't matter in this context. Worrying about them at this level amounts to premature optimization. And that's always a bad idea.

No, not agreed. They matter and they matter too much. Why would InstantiateDbResult? be the slowest thing in whole ORM process? It calls too many functions inside it! Do one thing: if you can make it all come to work without making unnecessary function calls, do it and compare the performance. I am not talking about calling functions like 'mysql_fetch_row' but ones like instantiate functions. I bet the performance will improve by at least 30% straight away! This is one of the biggest reason Yii is faster with all the shit load on it. QCubed's feet are heavy to pull and I am just trying to remove the extra fat from there :)

  Changed 4 months ago by vaibhav

And yes -> your ticket name is -> ticket-770-va.patch. Hey!!! you are using first two letters of my name :P ;)

  Changed 4 months ago by vaibhav

Ok. Let us agree to disagree. But what to do with QCubed's core code?

And yes, the caching algo has got some problem. i got differing results on some queries.

  Changed 4 months ago by alex94040

  • status changed from new to in_QA

Gentlemen,

First of all, thanks so much for investing so much of your heart and soul into this subject. Both of you are clearly very passionate about the future of QCubed and want to make the project succeed.

Secondly, I want to encourage you to switch to a collaborative approach. Instead of arguing about things you don't agree with, concentrate on the pieces that are great. MOST of the pieces you're actually on the same page about.

Lastly, to bring some order to the conversation, I encourage you to use data (ex. performance testing results, ideally from real-world scenarios) to convince the other side. Vaibhav, this is an excellent way to build credibility and convince the other side. Vakopian and I, of course - can be convinced with data. Without data, the conversation becomes completely subjective and everyone wastes time.

One suggestion: let's check in a piece of this work that both of you mostly agree with; and delay the conversation about the piece that's controversial to phase 2 of the implementation. Perhaps that extra bit of the optimization would be beneficial, I'm ready to be convinced. In the meantime, however, let's get 2.1 shipped - with as much benefit from this ticket as possible.

Changed 4 months ago by vaibhav

All changes made. It just works :)

  Changed 4 months ago by vaibhav

@alex: I would say: no need. I found multiple problems with caching and it turned out that the only way to do it was by merging my and vako's methods. I did it and have moved some of the load from the runtime to code generation time.

I removed the GetObject?, SetObject? and DeleteObject? and their need. Instead I moved the whole code of what it does to codegen and the table class generated would automatically do all that those classes did - No loss of functionality, performance improvement!

Added a new variable for the 'delimiter' in the configuration.inc.php. During codegen, it takes effect and get replaced everywhere (this is one of the main reasons I could remove the need of those GetObject? / SetObject? and DeleteObject? functions).

Dropped support for redis. Till now, I was testing with plain and simple scripts with QCubed and it was going fine. When I tried using the code in the site I am developing, things broke down. There is no php library good enough to allow Redis to store PHP object. All it could store was the Object name like "User Object 3" or "Resource Object 7". I tried serializing and jsonencoding and nothing works there. Redis support has to be postponed untill there is a good enough library. The error I get is: "Trying to access property of a non-object". May be I was dumb to not try till now during all the testing!

The way I was trying to modify InstantiateDbRow? had two problems: my method was not allowing more than one primary key, while vako's did. I utilized his method and things work pretty fine. Also, for some reason (that I could not figure out), I was getting wrong results for Query Expansion when trying to set the object cache at the bottom of the function. I moved it to the place where the new object is created. I tried placing the code at different positions and either it would produce errors with PHP or the results that should be there (with errors). So I moved it. QCubed cannot make use of Query Expansion (read object explansion) with live caching, at least as of now. For storing 'BIGGER' results, someone might want to use QCache object. It works well. I did not implement something like that to avoid overloading memcache with huge amount of data coz it does not do much good in that case.

That is all for now.

Hope it helps.

[Note: I will be copy pasting this in the forum to let others know]

Regards,
Vaibhav

  Changed 4 months ago by vaibhav

for getting the 2.1 shipped out, I would say : WAIT. I am working on two other things as well and they will benefit in long terms (also, they are my needs).

1. A FormState handler which utilizes database to store the formstate - helps when you have more than one server (distributed system). Not only it helps with load balancers, but also if you have allocated different public IP address to two machines which make use of the same code. So when a user begins a session with one machine / Load Balancer and comes back an hour later and DNS servers gives him the IP address of another machine, the formstate will remain in effect.

2. A standard session handling mechanism for QCubed. It will behave according to the setting in configuration file. Overloading PHP's own session handler, it will allow you to either make use of Databases (again) for doing the session data storage and retrieval OR (if you choose to) it will make use of the standard PHP sessions. Doing this will make sure that on a day when you have to move from a single server to multiple servers, all you have to do is to change a line in the configuratuon.inc.php file. Makes things easy to a large extent and removes the worries for the end user developing on QCubed.

Along with this, I have made a small change in the Query classes somewhere (I do not remember, but it is on Forum) which makes use of the 'ILIKE' keyword specific to PostgreSQL.

Again, i did talk about the SEO paginator. I have renamed it to QLinkPaginator and it is just one file! You might want to add that too to the release.

What do you guys say?

  Changed 4 months ago by vaibhav

@vakopian: you did not provide the defaults in the QCacheProviderMemcache class for connecting to the server. Gave me a hard time for I believed you had taken care of them! Please do check defaults the next time.

*Peace* :)

  Changed 4 months ago by vakopian

After looking a bit more closely into the caching code in InstantiateDbRow(), I now think that just checking that $arrPreviousItems and $strExpandAsArrayNodes are empty is not enough to allow the object to be cached. If any non-plain field property of the object is actually set, then I don't think the object can be cached. Checking $arrPreviousItems and $strExpandAsArrayNodes ensures that some properties used for handling direct and reverse references will not be set. But there are still virtual and early binding properties (for one-to-one relationships) to worry about.

So the code would need to also check that no virtual and no early binding will be used for the object being loaded before it can allow caching and doing cache hit check and the early return . This means more checking for the presence of more columns in the $strColumnAliasArray and more $objDbRow->ColumnExists() calls. Which will make the InstantiateDbRow more and more expensive when caching is enabled.

In a word, I don't think this ticket is ready for 2.1. I think it will take many more iterations to iron out all the above and rethink how to handle relationships.
I propose to move this to milestone 2.2.

What do you guys think?

  Changed 4 months ago by vaibhav

I think that we should drop the caching within 'instantiateDbRow' method and leave it in the 'Load' call (with objOptionalClauses Set to null condition) which loads objects with primary key values. It will solve the problems for many people and will not break anything. We can later improve it in the 2.2 release. Adding caching support even for preliminary causes can improve the reputation as well as usability to a certain extent.

  Changed 4 months ago by alex94040

As a by-stander, my framing for making the "include in 2.1 or wait until 2.2" decision is as following: does the current approach make a performance improvement to QCubed? Does it ALWAYS make that improvement? Are there any cases that are broken?

The answers to these questions are what would drive my recommendation for 2.1 vs 2.2. We can always improve any piece of code - if what we have now is "good enough" and takes us half-way there without breaking anything, I'd advocate for inclusion.

  Changed 4 months ago by vaibhav

Single Loads without $objOptionalClauses will not break anything at all (because they can't) - all they do is to load a single object into memory. No more no less. All other properties of it, such as reverse relationships are handled by other single 'Load' calls which would do the same job. Until we figure out a more advanced solution, this provides a performance enhancement in many cases e.g. my case, where pages have to be displayed according to single objects and things (properties and reverse relationships) related to them.

I had tried that (the first version of the caching that I made) with all types of queries and nothing was breaking. If you want, you may do an extensive amount of testing. I am sure that such a caching would not break anything!

  Changed 4 months ago by vakopian

vaibhav, that was one of my thoughts for fixing this too. However, that wouldn't work either because of the early binding for one-to-one relationships. It's true that if we only do the caching in Load() when $objOptionalClauses is empty, we would avoid the problems with Expand and virtual fields. But early binding fields are still a problem. The example for this is the Person <-> Login relationship. So for this example, if this is done in Load() we would also have to make sure that $objPerson->objLogin is null (and vice versa for Login).

Also note that doing the caching in Load() is much less beneficiary than in InstantiateDbRow because it doesn't help with the recursive calls any more.

  Changed 4 months ago by vakopian

Actually, it looks like early binding is also triggered by Expand only. So simple Load will not trigger it. So maybe we will be fine with caching in Load().

  Changed 4 months ago by vaibhav

This is what I said. Load will only cache single objects which is fine for a significant number of uses!

You can push it into 2.1 release. Also, do not forget HTMLPurifier.

I will make a new ticket for QLinkPaginator (the SEO paginator I talked about), which too can be integrated into the next release. I am usin it and obviously, it is tested well enough.

  Changed 4 months ago by vaibhav

I am against multilevelcaching. At least for 2.1 The reason is simple. I installed Redis and memcache both and used both of them at once for simple queries. Together, they give a performance worse than any of them deployed single!

Also, there are very few cache providers which can play well with flexibility as well as speed. Second, a cache would essentially be in-memory. Most solutions out there (including Redis) are 'persistent' full blown databases. If one really needs to start writing things to disk, QCache as such is not bad. Again, if someone wants to use 'memcacheDB', they can use it with the same API and memcacheDB is also persistent (and would give better performance than QCache). It is simply pointless to provide multiple caching support when we do not have good performing cache servers with the support which QCubed requires. Redis was the best second option and it cannot cache 'Objects'. I read the performance benchmarks of a lot of others and none of them are viable. I am against it (becuase at this moment it is pointless).

I am also against CreateKey?. While it is great to have it, it can confuse a bit. Most caching systems automatically create a new key when you call 'Set'. So it can be confusing for the end developer. Plus, it does the same job which I did with my last patch! It is useless in those regards. Otherwise, it seems great. I would say, those functions can be brought back in the next release because we are not in a positioon to support multiple caching solutions at this time.

On the other hand, I see that you eliminated the 'CACHE_PROVIDER_DELIMITER'. This can be problematic later when support for more cache servers comes in and those servers have a problem with that character. It would be better to move it to the config file! I also had created a function DeleteCache?() which would just remove the object from the in-memory cache. It can be immensely useful at times when you need to clear the cache after you have run a custom query. I do that a lot of times when I save things to DB and need a database response with the ID of the object that got saved (using RETURNING keyword with my PostgreSQL DB). And no - the 'save' function does not serve you well if there are multiple requests at a time. It can fail if race condition surfaces.

I am in support for the QCacheProviderLocalMemory. It is great for sites which will depend a lot of Ajax (mine does not). It is a useful one.

  Changed 4 months ago by vakopian

The multi-level cache provider is primarily to support the local memory cache together with Memcache. So I don't know what's there to be against about.

There is no need for CACHE_PROVIDER_DELIMITER because that's part of CreateKey now. Each cache provider is free do decide how to compose the key out of all the key components. That's also the reason why it's not part of the codegen-ed code.

  Changed 4 months ago by vaibhav

Looks like its good for the community and there is no loss either. You got the point. I did not notice that multi-level cache provider was what was getting the work done :) Anyways, seems nice :) I do not have any problems. I will replace my code with yours only after 2.1 is released (me lazy boy) :P

Changed 4 months ago by vakopian

Added a new feature to QCacheProviderLocalMemory - an option to keep the cache in session. This extends the lifespan of the cache from request to session.

  Changed 4 months ago by alex94040

So glad to see that you found a solution, guys. You both rock!!

  Changed 4 months ago by vaibhav

@alex: Thanks. As for me, I am going to take a leave from this topic becuase I have achieved an effective performance gain of nealry 200% (and I am really happy). Any further developments done by community would be acceptable to be (when I upgrade to 2.1). So I am eagerly waiting for the release (unless there is more of the testing left). Also please post in the forums. Helps me keep track (trac replies do not fire an email to my account).

Also, I am planning on something marvellous. It is HUGE and needs to be discussed in detail with all the core developers. Here:  http://qcu.be/content/integrating-mongodb-qcubed

  Changed 4 months ago by kmeirlaen

All,

This is some great work being done!
Using memcache myself for my applications, this will be a great help.

Just a few questions / remarks
a) am I correct that this will only apply to ::Load() functions, and not for functions that have different indexes and foreign key relations?
In my applications, memcache is mainly used to store the results of LoadArrayBy?() queries. As mentionned by Vakopain (in this and other threads), those perform really slow, so once I have them, I store that result in memcache.
At least, I would look if it would be possible to have the memcache also apply to ::LoadBy?() functions (1 row for another index then the primary key).

b) it seems using DeleteAll() on one table will flush the entire memcache. That could cause a major hit on memcache and DB !

I have a few applications in which I can test the results of this patch, I'm already eager to try it out!!

  Changed 4 months ago by kmeirlaen

On the LoadArrayBy?() functions: I don't believe it is desired to have these stored in memcache by default since we don't know the amount of objects/rows that will be returned. In my applications, this data is rather limited and static (not frequently changed), so it is a perfect candidate to put into memory. Just explaining my use case here...

  Changed 4 months ago by vaibhav

@kmeirlaen: First, sorry if I made a mistake in spelling your username / name, it's hard for a non-english guy.

But as of now, that is all we have done. I would love to find out a way which makes that done. But it was difficult for me to do so. Olegabr says he has done the work on 'LoadBy?*' functions. If that works, I would be happy but Load is easiest to get done and the array functions have a lot of problems with implementation of the 'right' caching scheme. We have tried and at this point of time, a feasible solution for such caching has not been found. I would love to find a way out. If you have it, please share.

Regards

  Changed 4 months ago by vaibhav

For the 'DeleteAll' -> there is no way to 'find' key names in memcache. This is why it is being done. Since you cannot find the right keys which were stored or send out a regular expression to match keys, this has to be done.

  Changed 4 months ago by vakopian

  • status changed from in_QA to closed
  • resolution set to fixed

(In [1391]) code by vaibhav, modified and reviewed by vakopian. Fixes #770

  Changed 4 months ago by vaibhav

It seems you have removed the 'DeleteCache?' function from the ORM Object generation code. Please bring it back. As of now, a lot of code is dependent on that function. If I upgrade to 2.1 later, my application will break :(

Also, it is useful when deleting single objects from cache manually (I use it after some manual queries). It might be helpful to others as well.

  Changed 4 months ago by vakopian

(In [1393]) add DeleteCache? method, and use it in Save(), Reload() and Delete(). Refs #770

  Changed 4 months ago by vakopian

Good catch. Committed the fix. Thanks

Note: See TracTickets for help on using tickets.