I need some help building quite a complicated query. I'm not even sure it's possible or not:
I have two tables, users and sentences. In users we have a unique ID per user, a name... and in sentences we have things they've said, along with another unique ID, and the user ID:
users: ID | name
sentences: ID | usid | sentence
What I need to retrieve is the list of mentions (Like in Twitter, @+username), so if a user A mentions a user B, I'd like to get their IDs as the result. I hope I have explained this well enough,
Thanks in advance,
|SELECT u1.id id1, u2.id id2, s.sentence FROM users u1 CROSS JOIN users u2 INNER JOIN sentences s ON s.usid = u2.id WHERE s.sentence LIKE '%' + u1.name + '%'; |
That query selects three columns: id1, id2 and sentence. id1 is the ID of the user who is mentioned, id2 is the ID of the user who did the mentioning, and sentence is the sentence in which user id1 was mentioned.
If I write it like that I get an error:
|#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '+ u1.name + '%' LIMIT 0, 30' at line 1 |
Any solution? I've been working a long time with MySQL but I'm having quite some trouble with this query...
Looks like the concatenation is messed up. You can try
|$sql = "SELECT u1.id id1, u2.id id2, s.sentence FROM users u1 CROSS JOIN users u2 INNER JOIN sentences s ON s.usid = u2.id WHERE s.sentence LIKE CONCAT('%', u1.name, '%')";
$res = mysql_query($sql);
If that doesn't work you could try
|$sql = "SELECT u1.id id1, u2.id id2, s.sentence FROM users u1 CROSS JOIN users u2 INNER JOIN sentences s ON s.usid = u2.id WHERE s.sentence LIKE '%steve%'";
$res = mysql_query($sql);
to test out if it picks up sentences with the word steve.
I'm assuming the rest of the query is correct
It now looks quite fine, but I need to limit the size of the table users (There are over 1 million), with a subset of them, since I now get this:
|#1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay |
Although I don't know where exactly I should change users for a subset of them
Yeah, with that many users you'll need something different because the approach of selecting every time doesn't scale. How about having a separate table with a list of mentions, with items added to it when a new user is added or when a new sentence is added. Or, a table with a list of non-trivial words (i.e. not "and", "the", "it", etc) and the sentence the word is found in, adding one row for every sentence. Both approaches would result in huge tables, but the query would be much more efficient.
You could also try this alternative, which performs the filters in a slightly different order. It may perform substantially better memory-wise, but the processing required will be the same.
|SELECT u1.id id1, u2.id id2, s.sentence FROM users u1 INNER JOIN sentences s ON s.sentence LIKE CONCAT('%', u1.name, '%') INNER JOIN users u2 ON s.usid = u2.id; |
Yes that is something I also thought of, I think I will create a new mentions table, and store there the mentions. Although I will need to setup a script to load current data, since I can't just throw away those records
Although since I only need to load the mentions for a subset of the latest 40 active users, I might be able to use
|(SELECT id FROM users WHERE ... LIMIT 40) AS subset |
instead of the table users right? But it is not working...
I've been able to generate some results, although they are not right, with the following query:
|SELECT u1.id AS id1, u2.id AS id2, s.sentence FROM (SELECT id, name FROM users ORDER BY jtime ASC LIMIT 40) AS u1 INNER JOIN sentences s ON s.sentence LIKE CONCAT('%@', u1.name, '%') INNER JOIN users u2 ON s.usid = u2.id WHERE u2.id != 0 AND u1.id != 0 GROUP BY u1.id, u2.id LIMIT 100 |
Apparently in the results the id2 and the sentence are right, it seems like it's id1 the one that's off
Although I haven't worked enough with JOINs to detect the problem with this,