Ticket #151 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Multiple ExpandAsArray causes duplicates with QuerySingle AND QueryArray

Reported by: Basilieus and VexedPanda Owned by:
Priority: important Milestone: 1.0 RC2
Component: QQuery Version:
Keywords: Cc:

Description (last modified by alex94040) (diff)

When you use multiple ExpandAsArray clauses in the same QuerySingle/QueryArray query, you end up getting duplicate items in the resulting arrays. The QuerySingle part was introduced after we fixed ExpandAsArray being completely ignored (fix in ticket #96) - but it is NOT a regression.

Note that to reproduce this issue, you must have the patch from ticket #96 installed. You also need to have the patch from #154 with the test data.

This is not a duplicate of bug #153.

Attachments

bug151.zip Download (1.1 KB) - added by alex94040 6 years ago.
Better repro (both QuerySingle? and QueryArray?)
bug151_qcubed.2.patch Download (5.2 KB) - added by alex94040 6 years ago.
Patch to fix the issue. Make sure to re-run codegen after applying.
bug151_megatest.zip Download (1.0 KB) - added by alex94040 6 years ago.
Mega test - double-chaining ExpandAsArray?
bug_151.patch Download (5.1 KB) - added by VexedPanda 6 years ago.
This seems to pass every test, though I want us to be very rigorous before checking it in.
patch151_newfix.patch Download (9.2 KB) - added by alex94040 6 years ago.
New fix that does not fork between $objPreviousItems - so order of SQL results doesnt matter
instantiation_methods.patch Download (11.2 KB) - added by basilieus 6 years ago.
Fixes QuerySingle? and also fixes tabbing issue
patch151_perf_curly.patch Download (11.0 KB) - added by alex94040 6 years ago.
Small perf improvement + tabs + curly braces
qcodo_query_methods.patch Download (1.9 KB) - added by basilieus 6 years ago.
IMPORTANT: THIS IS AN ADDITION TO THE PATCH ABOVE! PLEASE INSTALL BOTH.

Change History

Changed 6 years ago by alex94040

Vexed, could you please post a full repro? Preferrably, using the tables from the Test database?

Changed 6 years ago by VexedPanda

I think this should do it. If the user has 1 login and 2 Projects they manage, the $people[0]->_LoginArray will have 2 elements, both the same.

$people = Person::LoadAll(
QQ::ExpandAsArray(QQN::Person()->ProjectAsManagerPerson),
QQ::ExpandAsArray(QQN::Person()->Login)
);

The node names are probably wrong, but I don't have the example data objects to look up.

Changed 6 years ago by alex94040

Vexed, the code you gave above doesn't quite work - there's no such thing as $people[0]->_LoginArray; there is, however, a $people[0]->Login object.

Also potentially relevant: found bug #153 while trying to repro this one, most likely it's relevant.

Changed 6 years ago by VexedPanda

  • status changed from new to closed
  • resolution set to duplicate

Sounds like the same bug, but with actual reproduce code. :P I'll set this one as the dupe.

Changed 6 years ago by alex94040

  • status changed from closed to new
  • resolution duplicate deleted
  • description modified (diff)

Changed 6 years ago by alex94040

  • summary changed from Multiple ExpandAsArray causes duplicates to Multiple ExpandAsArray causes duplicates with QuerySingle
  • reporter changed from VexedPanda to Basilieus and VexedPanda

Changed 6 years ago by alex94040

  • description modified (diff)
  • summary changed from Multiple ExpandAsArray causes duplicates with QuerySingle to Multiple ExpandAsArray causes duplicates with QuerySingle AND QueryArray

Changed 6 years ago by alex94040

Better repro (both QuerySingle? and QueryArray?)

Changed 6 years ago by VexedPanda

The issue is that InstantiateDbRow? only looks at a row at a time, and doesn't know that Project Object 2 has already been added to this Person.

The only viable solution is to track (a hash of) (all, in case there aren't pre-defined unique keys) the columns of the expanded object and the containing objects. So we would know that person1 already had project2 expanded, while not accidentally failing to expand person2's project2 node (pretending we're talking membership here where projects belong to multiple users).

I'm unsure looking at the code how to get a list of all required column names, though it may come to me with further examination.

Changed 6 years ago by alex94040

Vexed, would it be easier if we introduced the following condition:

"multiple ExpandAsArray clauses are only allowed if the table that we're expanding has an auto-increment column?"

And if that condition is not satisfied, throw an exception?

This seems like a fair requirement (and something pretty easy for a developer to fix if they have control over the schema) - this will also save the performance of trying to take hashes of rows that were already added, comparing every field, etc.

Changed 6 years ago by VexedPanda

That sounds reasonable, though any primary key should be sufficient to indicate which columns to hash. Actually, if it's just a small set of primary key columns, I think we could do without the hash, just doing an implode of the values instead.

I'm currently unclear on how exactly to find the containing object's primary key columns in order to get their values, but I'm sure it's possible.

Changed 6 years ago by alex94040

Patch to fix the issue. Make sure to re-run codegen after applying.

Changed 6 years ago by alex94040

  • status changed from new to closed
  • resolution set to fixed

My god, this bug is COMPLICATED AS HELL.

Anyway. I think i have it figured out. Take a look at the attached patch.

Changed 6 years ago by alex94040

  • status changed from closed to new
  • resolution fixed deleted

Changed 6 years ago by alex94040

  • status changed from new to in_QA

Changed 6 years ago by VexedPanda

  • status changed from in_QA to assigned

I think the for loop at the end may be faulty. I think it breaks when the primary key's column names differ between parent and child tables since the codegen is only producing checks for the child table's primary key column names.

Unfortunately the test DB doesn't have any non-id primary keys, so I can't give reproduce code based on it.

To fix that, let's add a relation_name column to the related_project_assn table, and lose the _assn.

So then say the expand is on Project->RelatedProjects?
Project will loop through just fine comparing ids, then RelatedProject? will loop through the Related Project comparing project_id and child_project_id, but then, it'll loop through the Project object, trying to compare project_id and child_project_id, which will crash.

Changed 6 years ago by VexedPanda

The $objToReturn-><%= $col->PropertyName? %> comparison is also faulty, since that'll always be the child object in $objToReturn, and not a reference to the needed key columns for the parent tables that are in the actual DB result.

Changed 6 years ago by VexedPanda

Ok, alex corrected me in IRC.
The correct problem with this patch is the following:
In an ExpandAsArray?(Person->Managed Project) and ExpandAsArray?(Person->Managed Project->Related Project) situation assume:
The previous row has Person 1, Managed Project 1, Related Project 2.
The current row has Person 1, Managed Project 2, Related Project 2.
So we want $person[0]->ManagedProject?[0]->RelatedProject?[0] == $person[0]->ManagedProject?[1]->RelatedProject?[0]

But with this patch, $person[0]->ManagedProject?[1]->RelatedProject?[0] would have been skipped, and not be set.

So we need to validate the entire parent object chain is a duplicate, and not limit the check to just the child object itself.

Changed 6 years ago by alex94040

Vexed, we need a database schema to make what you're describing testable. It seems that in your example, RelatedProject? is ASSOCIATED many-to-many with ManagedProject?.

Could you put together an example SQL that does this?

Ideas on how to fix it: inside the loop defined in line 102, iterate on all child objects, and check if the ID's for each one match the previous items. If they do, we have a complete match.

Changed 6 years ago by VexedPanda

Itterating on child/parent objects is what I thought your original patch does, and introduces the problem determining the primary key columns for that parent object. :P

I'm using the example database for these examples.
I don't have time to check it myself at the moment, but try:

insert into person (first_name, last_name) values ('Joe', 'Smith');
-- This should be person 13
insert into project (project_status_type_id, manager_person_id, name) values (1, 13, 'test1');
-- this should be project 5
insert into project (project_status_type_id, manager_person_id, name) values (1, 13, 'test2');
-- this should be project 6
insert into related_project_assn (project_id, child_project_id) values (5, 1);
insert into related_project_assn (project_id, child_project_id) values (6, 1);

Then do:

$person13 = Person::QuerySingle(
  QQ::Equal(QQN::Person()->Id, 13),
  QQ::Clause(
    QQ::ExpandAsArray(QQN::Person()->ProjectAsManager),
    QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->ProjectAsRelated->Project)
  )
);
echo $person13->_ProjectAsManagerArray[0]->Name, $person13->_ProjectAsManagerArray[0]->_ProjectAsRelatedArray->Name, '<BR>';
echo $person13->_ProjectAsManagerArray[1]->Name, $person13->_ProjectAsManagerArray[0]->_ProjectAsRelatedArray->Name, '<BR>';

This should output:

test1, ACME Website Redesign
test2, ACME Website Redesign

Changed 6 years ago by alex94040

Vexed - the example you posted above doesn't work, there's something wrong with the syntax.

The exception is: "ExpandAsArray? clause parameter must be an Association Table-based QQNode"

On line that has "QQ::ExpandAsArray?(QQN::Person()->ProjectAsManager?->ProjectAsRelated?->Project)"

Changed 6 years ago by VexedPanda

Hmm, perhaps it should have been

QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->ProjectAsRelated)

Changed 6 years ago by alex94040

  • status changed from assigned to in_QA

Vexed - the example you provided works just fine after the patch. The result is what you expected. I'm setting the status back to in_qa.

Here's the full code for your case:

        $person13 = Person::QuerySingle(
          QQ::Equal(QQN::Person()->Id, 13),
          QQ::Clause(
            QQ::ExpandAsArray(QQN::Person()->ProjectAsManager),
            QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->ProjectAsRelated)
          )
        );

        foreach($person13->_ProjectAsManagerArray as $item) {
            echo "<b>" . $item->Name . "</b>: ";
            foreach ($item->_ProjectAsRelatedArray as $related) {
                echo $related->Name . "<br>";
            }
        }

Changed 6 years ago by VexedPanda

Ok, I guess I can see how that works.

Sorry I'm being such a pain about this, I'm just coming up with examples based on my reading of the code. I just seem to be reading it wrong. :P

Here's my current concern:
$mixPreviousItem is set only when $objPreviousItem used to be set, correct?
So why is $objPreviousItem not set _every_ time $mixPreviousItem is set?

You seem to be bypassing the $objPreviousItem variable (setting it to null) in situations it would normally have been set when $mixPreviousItem is an array.

So in the case where it is an array being passed in, I'm thinking we may see an issue where sub-ExpandAsArrays? fail?

Changed 6 years ago by alex94040

Vexed - it's getting very difficult to discuss this in abstract. This is an insanely complex, recursive function. I intentionally introduced a new variable, $arrPreviousItem, to track the "previous items" case for duplication, as opposed to the expansion of an ORM object that has to be augmented ($objPreviousItem case).

So, to be most productive, I'd like to ask you to experiment with the patch and various queries and try to break it - and if you're successful, I'd be happy to look into how we can make it better.

Changed 6 years ago by VexedPanda

Agreed, I'm just unsure when I'll have the time. :(

In the mean time, could you explain why $objPreviousItem is set to null when $mixPreviousItem is an array? (line 25)

Changed 6 years ago by alex94040

Vexed - i intentionally split out the case when $objPreviousItem is null. That case is there to separate the "duplication check" case versus the "expand the previous item" case.

Changed 6 years ago by VexedPanda

Sorry, but line 25 doesn't get executed when $mixPreviousItem is null, it's executed when it's set to something (is_array). That's what's got me worried. When line 25 is executed, the old code would have had $oldPreviousItem set to an object, but this new code does not.

Changed 6 years ago by alex94040

Vexed - with the old code, $objPreviousItem would have been null in that case, too. It would have been set to null in the call to InstantiateDbRow()

I really think we should stop talking about hypotheticals here - we're wasting time. Please do come up with an example if something seems wrong.

Changed 6 years ago by VexedPanda

No offense, but discussing why you made the changes you did isn't hypothetical, and while any actual break may be, I would expect us to know why changes to the core are made.

Sorry, but the code is complicated enough that coming up with actual breaking examples is difficult, while I think I see a clear difference in code execution between the old code and the new code. I think it's always worth examining these differences until we all understand the reason for the change. That way it can be properly documented, or if it's unintentional, fixed.

I'm unsure why you claim that the call to InstantiateDbRow? would have set $objPreviousItem to null. The only times you pass in an array (new lines 53 and 71), the old code used to pass in the last instantiated object in the child item array (old lines 45 and 62). These clearly could not have been null, or the code would have broken there with an invalid index error.

Changed 6 years ago by alex94040

Vexed - here is an enumeration of how InstantiateDbRow is called with the values of the $[mix/obj]previousItems parameter

- QueryArray/QuerySingle code passes in the previous object that represents the previous object that may need to be expanded. This is the only case where $objPreviousItem is NOT an array, but is an object.

- Recursive calls to InstantiateDbArray where we pass in the array of previous items for anti-duplication. This is new.

- In all other cases, it is null (look through instantiation_methods_tpl.php).

In the case where the array is passed in, we DO NOT want the logic for expansion of the old object to be invoked; it doesn't make sense in that case.

I hope that helps. I still think that it's really hard to discuss potential negative impact of something, considering that there are so many entry points into this code. Really, we need a sample.

Changed 6 years ago by VexedPanda

"In the case where the array is passed in, we DO NOT want the logic for expansion of the old object to be invoked; it doesn't make sense in that case. "

Maybe I'm being dense here, but I don't see that passing in an array signifies that we don't want to do any further expands. Isn't the only purpose of passing an array instead of just one object so that we can use that array in the later chunk of code that compares key columns?

Regardless, I'll see if I can come up with some additional test cases tomorrow.

Changed 6 years ago by alex94040

My brain is decisively fried today - I can't think about this anymore :-). Looking forward to talking about this tomorrow. Looking forward to talking to you about this tomorrow... We definitely need more good test cases. I've done some testing here, but more independent testing would never hurt.

Changed 6 years ago by alex94040

  • description modified (diff)

Another suggestion: try walking through the code in the debugger; this really helped me.

Changed 6 years ago by alex94040

Mega test - double-chaining ExpandAsArray?

Changed 6 years ago by alex94040

  • status changed from in_QA to assigned

Vexed - I apologize for being an ass. Your gut feeling was correct. Something is fishy here; I created more test data (as a part of bug #161), and a test set based on that data. Lo and behold, as you predicted, double-chained ExpandAsArray did NOT work. Full mega-testbed for this complicated case is attached. Here's the query:

        $people = Person::!LoadAll([[BR]]            QQ::Clause([[BR]]                QQ::!ExpandAsArray(QQN::Person()->Address),[[BR]]                QQ::!ExpandAsArray(QQN::Person()->!ProjectAsManager),[[BR]]                QQ::!ExpandAsArray(QQN::Person()->!ProjectAsManager->Milestone)[[BR]]            )[[BR]]        );[[BR]]}}}

Changed 6 years ago by alex94040

  • owner somebody deleted
  • status changed from assigned to new

Argh the last thing didn't work, pasting the code again:

        $people = Person::LoadAll(
            QQ::Clause(
                QQ::ExpandAsArray(QQN::Person()->Address),
                QQ::ExpandAsArray(QQN::Person()->ProjectAsManager),
                QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->Milestone)
            )
        );

Changed 6 years ago by VexedPanda

Thank you for putting together the test bed! I made a slight tweak to your patch to set $objPreviousItem to $arrPreviousItems[count($arrPreviousItems)-1] instead of null, and it passes that now.

I'm doing some extra tests on RelatedProject?, to ensure duplicate objects in different paths isn't broken, but I think that was my only remaining concern. :)

Changed 6 years ago by VexedPanda

This seems to pass every test, though I want us to be very rigorous before checking it in.

Changed 6 years ago by VexedPanda

  • status changed from new to in_QA

I further tested with my additional data from above and:

$people = Person::LoadAll(
	QQ::Clause(
		QQ::ExpandAsArray(QQN::Person()->Address),
    QQ::ExpandAsArray(QQN::Person()->ProjectAsManager),
		QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->ProjectAsRelated),
		QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->ProjectAsRelated->Project->Milestone),
		QQ::ExpandAsArray(QQN::Person()->ProjectAsManager->Milestone)
	)
);

This seemed to pass when I compared it to the raw SQL output, so I'm reasonably confident it works.

Using objPreviousItem was a stroke of genius, since it does seem to be parent-specific, and knowing the primary keys becomes so easy. :) Good work!

Changed 6 years ago by alex94040

Vex - I didn't see your comments above, but I did come up with a patch that I like better than the one I did before. Take a look at the new patch - maybe you'll like the approach better. There's no more forking between $objPreviousItem and $arrPreviousItem. All is on the same codepath. The order in which items show up in SQL doesn't matter (and it did matter before...).

Do test with the new Address dataset (i.e. 3 addresses, 2 projects, 4 milestones each) - look for duplicates. Test, test, test. This is very scary code.

Changed 6 years ago by alex94040

New fix that does not fork between $objPreviousItems - so order of SQL results doesnt matter

Changed 6 years ago by VexedPanda

Sorry, mind fixing the tabbing? The patch is overly-big as a result.

Changed 6 years ago by alex94040

Vexed - I got surprised by such a vast difference myself, but sadly, the tabbing is correct. Basically, a huge section has been moved inside an IF statement and a FOR loop. This is where the HUGE yellow/red change is coming from. Nothing was changed content-wise in that big section.

Changed 6 years ago by VexedPanda

Ok, sorry that makes sense. I'll try to read through it this afternoon. :)

Changed 6 years ago by alex94040

I created an example to increase everyone's confidence in tihs code - look at the patch for bug #164 (it requires the latest patch from this bug to be applied). I'm looking forward to everyone's comments.

Changed 6 years ago by basilieus

Fixes QuerySingle? and also fixes tabbing issue

Changed 6 years ago by alex94040

Small perf improvement + tabs + curly braces

Changed 6 years ago by basilieus

Ok all here is the deal:

After making modifications to InstantiateDbRow?, the fixes we made to QuerySingle? on expanding arrays no longer works as intended.

I came up with a solution and I attached the patch to fix it, it basically uses the samething as QueryArray? but returns the object itself instead of the Array with the single object.

patch attached. Please review or ask questions..

Changed 6 years ago by basilieus

IMPORTANT: THIS IS AN ADDITION TO THE PATCH ABOVE! PLEASE INSTALL BOTH.

Changed 6 years ago by alex94040

  • status changed from in_QA to closed
  • resolution set to fixed

(In [64]) Fixes #151. Code and testing by alex94040 and Basilieus. Early code review by VexedPanda?.

Changed 6 years ago by basilieus

Bam!

Note: See TracTickets for help on using tickets.