FRIHOST FORUMS SEARCH FAQ TOS BLOGS COMPETITIONS
You are invited to Log in or Register a free Frihost Account!


Help me writing this query. Urgent





imagefree
I am working on a script where each post (say article) has Tags (upto 5). I have stored tags in a different table against the id of the article taken from articles table. Each tag takes one record (so, each article may take upto 5 records in tags table). Now i want to display all articles that come under a tag (say 'mysql').

The query is simple:

Code:
SELECT * FROM tags_table WHERE tag = 'mysql' LIMIT 5;


Now what i want to do is check whether the writer of the article grants permission to display article to others! The permission is in the articles_table and field name is article_permission. If the value is 1, it is public, otherwise it is private.

My Question is how to write a query to check and get the data related to tag 'mysql' and at the same time checking the permission in the articles table, without loosing performance. The query should only fetch those articles that are public and come under 'mysql'.
Fire Boar
Code:
SELECT a.field1, a.field2 FROM articles_table AS a INNER JOIN tags_table AS t ON a.id = t.article_id WHERE t.tag = 'mysql' AND a.public = '1' LIMIT 5;


This query selects two fields called 'field1' and 'field2' in the table 'articles_table'. Of course you can add more fields in a similar way. You can also add fields from the tags table using t.whatever instead of a.whatever, though if your tags table only contains a field for article id and a field for tags, you won't need to.

The 'INNER JOIN ... ON ...' clause is used to connect two tables together and give a rule for connecting. In this case, it joins 'articles_table' with 'tags_table', associating tags_table's 'article_id' field with articles_table's 'id' field (i.e. you only get rows in which the two match). You can have as many 'INNER JOIN ... ON ...' clauses as you like, ON is optional but almost always should be included. ON is a bit like WHERE, so you can have more than one condition if you like, but it's optimized for joining two tables together. You can also have LEFT JOIN and RIGHT JOIN, which guarantee that at least one entry for each row from the left/right table is included, with NULLs populating any missing records.

Finally, the WHERE clause should be obvious. It checks that the tag is 'mysql' and that the 'public' field in 'articles_table' is set to 1.
imagefree
Thanks Fire Boar. I have now a problem to run a query on 3 tables. Tables are (suppose):

Code:
table1      Fields: id1, record1
table2      Fields: id2, record2
table3      Fields: id1, id2


Now i want to run a search for record2 above but i want to get the record1 in result and all other records in table1 (LIMIT 10). How to do that?
Fire Boar
I'm assuming table 3 is a link table, and if there is a link between the ID of the record you are searching for in table 2, you want to select the corresponding row in table 1.

Code:
SELECT t1.* FROM table1 AS t1 INNER JOIN table3 AS t3 ON t1.id1 = t3.id1 INNER JOIN table2 AS t2 ON t2.id2 = t3.id2 WHERE t2.record2 = 'sample' LIMIT 10;


I'm going to change the spacing a bit to make it clearer what's going on.

Code:
SELECT
  t1.*
FROM
  table1 AS t1
  INNER JOIN table3 AS t3 ON t1.id1 = t3.id1
  INNER JOIN table2 AS t2 ON t2.id2 = t3.id2
WHERE
  t2.record2 = 'sample'
LIMIT 10
;


Basically, we're joining table1 and table3 together using the fields table1.id1 and table3.id1, then joining table2 to the result using the fields table2.id2 and table3.id2.

NOTE: Be very wary of using the * clause in select. If you ever change your database structure, the records will all come out in a different order, whereas if you give the names of the fields directly you won't have this problem. Also it's good practice to only select the fields you need: it saves memory.
imagefree
Quote:
I'm assuming table 3 is a link table, and if there is a link between the ID of the record you are searching for in table 2, you want to select the corresponding row in table 1.


Link table mean? The table 3 has no Primary Key. Each field (id1 id2) may repeat. So, table1's id1 may have multiple occurances in table 3.

Record2 and id2 (autoincrement) in table 2 are both unique and do not occure more than once in table2, but in table3 the id2 get atleast 1 occurance.

Ok i will check it and give you the feedback. One more thing to ask, we are searching table 2, and returning MAX 10 records from table 1. What if we want to search MAX 10 records from table 2 and then return

1. all the records from table 1 that meet criteria provided.
2. only first X records from table 1 that meet criteria provided.
Fire Boar
Link table

Okay, for your problem, you need to use a nested query.

Code:
SELECT t1.* FROM table1 AS t1 INNER JOIN table3 AS t3 ON t1.id1 = t3.id1 WHERE t3.id2 IN (SELECT id2 FROM table2 WHERE record2 = 'sample' LIMIT 10) LIMIT 100;


This returns the first 100 records that match, with only 10 records from table 2 searched. The breakdown looks like this:

Code:
SELECT
  t1.*
FROM
  table1 AS t1
  INNER JOIN table3 AS t3 ON t1.id1 = t3.id1
WHERE
  t3.id2 IN (
    SELECT id2 FROM table2 WHERE record2 = 'sample' LIMIT 10
  )
LIMIT 100
;


The subquery is fairly easy: it selects the id2 fields from table2 where the criteria matches, returning no more than 10 results. The main query then uses the subquery to only select records where table3 has one of the chosen id2 fields. LIMIT 100 is of course optional, and omitting it selects all matching records (with no more than 10 distinct id2s).

If you need to add any additional info from table2, simply add the second INNER JOIN ... ON ... clause in my previous post, and add t2.whatever to the SELECT fields.
imagefree
Thanks. Its great. I will test it today and will give you feedback or ask for more help.
Thanks again.
imagefree
Thanks FireBoar. The query is now working fine. Thanks again.
imagefree
I have one more thing to ask.

The tags table is getting bigger, so scanning the whole table each time an article is viewed would be very expensive.

When article is submitted, user submits few tags and those tags once submitted cannot be changed. Each tag is submitted in one record and max 5 tags are submitted. So we can say that all tags of an article can be found in the next 4 rows of the first occurance of the article's ID in the tags table. Is there any way to limit mySQL to scan just those 5 records (or scan from start, till those 5 records)? If we cannot do exactly this, then any workaround to achieve same results (preventing mySQL to scan full table)?

Note 1 thing that user may submit less than 5 tags with an article, and in that case if we use LIMIT 5 then mySQL will continue to search the whole table even if 5th tag never exists, so LIMIT 5 is not the answer.
Fire Boar
Wow. You've just asked for what is possibly the most complicated query that one could possibly imagine. Really, I wouldn't worry about efficiency: it only becomes an issue when there are several hundred thousand records or more, and even then the query given above is easily up to the task. The important thing is to reduce the number of queries, and I don't think we can go any lower than 1.

My advice is to look into indices, if you haven't already. Basically, for a field you are going to search regularly (that is, what comes after the WHERE), it should ideally be indexed. This makes searches much faster and scale much better. There are three types of index: primary key, unique and standard index. Primary keys, obviously, are for the ID field. Unique indices are for fields that must never have more than one field with the same value, and standard indices simply speed up searching. The disadvantage of using indices is they make insert and delete take slightly longer, and a table with an index uses up more space than one without.
imagefree
Thanks.

On one of my project i need to frequantly match long texts (only full match, no searching like google search) and i expect only 1 result in the whole table for that text. Text searches are slow, so to solve this problem i am using an alternative to make text searches fast (i havent tested at large scale, so i cant tell that it is making searches efficient or not).

What i am doing is that i have 3 columns in the table, one is the ID, second is text itself and the third is the length (of the text) field.

I use php to claculate the length and then use inside the query

Code:
WHERE length = '2345' AND text = 'THE ACTUAL TEXT' LIMIT 1


I want to ask few questions. 1st comment about this type of searching (ie searching some hash - like in this case the length - and then searching inside the results for actual result, for performance).

2nd, do the sequance of the WHERE conditions matter? or mysql automatically matches the faster possible condition first , or something like that.

3rd, mysql is really really fast (this is what i hered, i havent tested it to limits), and so i expect mysql to do some workaround to make slow searches faster (for example the workaround i do to make my text searches faster using the hash or string length). Do mySQL really?

4th, is my length condition redundant?

The most important one, if the text is a URL that mostly (99.9% of times) start with http(s)://(www.)
if i do not use the length search, mysql will waste huge amount of time on each record, matching first 7 - 8 digits and then moving to next record in case of failure (remember LIMIT 1, if success then search ends). Although we are using exact match (text='SOME TEXT") but i think the matching is only possible byte by byte. Please correct me if both of my assumptions are wrong.
Fire Boar
You mention "huge amount of time" for something which is probably one of the fastest operations there is. Your length field is almost certainly redundant, but to check you should test and profile. Use the PHP microtime() function before and after doing several thousand/million typical queries with one approach and compare the difference. Then do the same for the other approach, and see which is faster.

In my opinion though... while it's good to think about efficiency, I think at this point you're obsessing over it a bit too much. The difference is marginal at best, and only worth looking at if it starts becoming a real problem.
imagefree
Fire Boar wrote:
You mention "huge amount of time" for something which is probably one of the fastest operations there is. Your length field is almost certainly redundant, but to check you should test and profile. Use the PHP microtime() function before and after doing several thousand/million typical queries with one approach and compare the difference. Then do the same for the other approach, and see which is faster.

In my opinion though... while it's good to think about efficiency, I think at this point you're obsessing over it a bit too much. The difference is marginal at best, and only worth looking at if it starts becoming a real problem.


Well i was right in saying it Huge Amount of Time.

Today i performed performance test on the table and it showed about 1.4 Second time saving per query. I will paste details after few hours.
imagefree
This was my first ever performance test. I always read Authors posting such benchmarking, but today i did it and i was quite successful.

URLs below start with http(s):// and then there is random hexadecimal. It doesnt matter in this test whether the part after http(s):// is a valid URL or not.

The following Queries were executed:

Code:
Query 1: SELECT * FROM links WHERE li_length = '93' and li_url = 'http://3b08aa22e11bdadb6a0cf1a42048eb523b08aa22e11bdadb6a0cf1a42048eb5252f48953cc1bf54687c0d4' LIMIT 1

Query 2: SELECT * FROM links WHERE li_url = 'http://3b08aa22e11bdadb6a0cf1a42048eb523b08aa22e11bdadb6a0cf1a42048eb5252f48953cc1bf54687c0d4' LIMIT 1

Query 3: SELECT * FROM links WHERE li_id = 4754091

//All above 3 Queries return the same row.

Query 4: SELECT COUNT(*) FROM `links` WHERE li_length = 93


Most of the real life URLs are between 50 - 90 characters, so i took an average URL between 36-105 (appx).
Every 10th URL starts with https randomly.
4,840,029 records in the table and i searched for the 4,754,091st

Code:
The Query 1 results Appx 0.44869875907898 Second per Query (50 Queries test).
The Query 2 results Appx 1.84329597949980 Second per Query (50 Queries test).
The Query 3 results Appx 0.00074028015136 Second per Query (50 Queries test).

Comparison of 1st Query with 2nd Query proved my assumption that a hash search will result in faster search.

The 4th Query calculates # of occurances of the value 93 in the table. COUNT(*) resulted 77,716

One interesting thing with my solution
The number of occurances of li_length = '93' in records (77716 times) means this much time the li_length matches with the value provided and then mysql looks up the text field, so i changed the li_length field to something unusual to check the performance. Results were shocking:

Code:
Query 5: SELECT * FROM links WHERE li_length = '500' and li_url = 'http://3b08aa22e11bdadb6a0cf1a42048eb523b08aa22e11bdadb6a0cf1a42048eb5252f48953cc1bf54687c0d4' LIMIT 1


Code:
The Query 5 results Appx 0.00035324096679 Second per Query (50 Queries test). (this is less than the avg time of Query 3 that is based on Primary Key AUTO INCREMENT lookup. See above).

I think this query was faster than the Query 3 because the li_length is smallint, while the li_id is mediumint. Am i rite?

This is where my solution could cut the corners.

This again proves my point that if too many text records start with similar characters, it could waste huge amount of time in large tables.

The more there are unusual values in the table, the more it would become efficient in searching both usual and unusual values. The size of text becomes irrelevent factor and this is the most important thing in my view.


Quote:
In my opinion though... while it's good to think about efficiency, I think at this point you're obsessing over it a bit too much. The difference is marginal at best, and only worth looking at if it starts becoming a real problem.


Yes i know that i am wasting "Huge amount of time" Smile on thinking about performance, but i think in the end it will prove to be no wastage of time because i am constantly getting more and more knowledge by thinking deeper and deeper (without reading books). My whole source of knowledge is Nice People like you who help beginners on forums, and my imagination.

Thanks for the help you provided me in the last few days. I was really needy at that time for the solution of multiple table query problems.

Waiting for your comments about the benchmarking/profiling.
Fire Boar
I've got a comment: add an index on the URL field. Then benchmark again. Smile

Code:
CREATE INDEX url ON links(li_url)
imagefree
Fire Boar wrote:
I've got a comment: add an index on the URL field. Then benchmark again. Smile

Code:
CREATE INDEX url ON links(li_url)

This all was done just to avoid the indexing of the text field. Indexed text field takes lot of space. I just wanted to search the text fields with the same efficiency with which indexed numeric fields are searched.

Anyways i tried to index the URL field the last day after you suggested to do, but it took 7-8 hours and the query was still incomplete. Then i shut down the pc. Total directory size of the satabase is now just 468 mb, but the last day it once reached 1.28 GB, and then i force closed mySQL.
imagefree
About text field indexing: Indexed take too much space, and in case of large table, it would make the table difficult to manage (like backups, or mySQL operations like optimize etc. Also it makes the changes in table structure terribly slow).

All these are what i faced today. One more thing to say that if you use string indexing, the INSERTs become very slow - appx 9/10 (or more) performance loss in large tables. (i havent tested UPDATEs and DELETEs).

The SELECT query in both cases (indexed string searching, and indexed hash searching) was equally efficient.
Fire Boar
Well, if you've found a way that works, great. I'm a little puzzled about the efficiency of the index though, in theory it should be a lot faster than that. You weren't inserting URLs in alphabetical order or something, right?

About INSERT speed, that is only really relevant if inserting one record takes unreasonably long. The reason being, in almost every scenario you are SELECTing far more than INSERTing.

Either way, if a length pseudo-index works well, then hey, why not? Smile
imagefree
Thanks
imagefree
Got another problem. When i use i am joining two tables one is for comments and one contains users' information.

Comments table contains users' ID too, and i use that to fetch username from users' table. Till now it was working fine.

I added a feature that unregistered users can comment too. Such users have ID 0 by default and id 0 has no entry is users' table. When i use join, it results only those records that have an entry in users table.
Here is the query:

Code:
SELECT dt_comments.* , dt_users.us_username FROM dt_comments INNER JOIN dt_users ON dt_comments.cm_mid = dt_users.us_id WHERE dt_comments.cm_iid = '5'


its output is:

Code:
Array
(
    [cm_id] => 1
    [cm_iid] => 7
    [cm_mid] => 1
    [cm_comment] => WOW!!! Beautiful image. I love this model of honda car. I am definitely gonna share this image on the web. Please share more...
    [cm_ip] => 127.0.0.1
    [cm_date] => 1275295285
    [us_username] => Hamidjee
)


There are 5 more comments by Anonymous users (ie the cm_mid = 0. Otherwise cm_mid contains the unique id of users. ), but those are not in the results. How to force mysql to show those too?

Thanks.
Fire Boar
Aha, here's where we use a type of outer (rather than inner) join called a LEFT JOIN (there is also a RIGHT JOIN which does the opposite):

Code:
SELECT dt_comments.* , dt_users.us_username FROM dt_comments LEFT JOIN dt_users ON dt_comments.cm_mid = dt_users.us_id WHERE dt_comments.cm_iid = '5'


That's all you need! This kind of query will select from the comments table whether or not there is a corresponding entry in the users table. If there is no users entry, the fields for dt_users will all be NULL for those rows.
Related topics
Help with PHP
Help with mysql query
PLease Help ME its urgent!!!!
PHP Form Sending problem
VERY URGENT! Please Help Me!
AJAX and PHP contact form
Very Urgent....!!! Help...
How to recover lost data?
help me create a query
Mynes Forum. [Design, Coding, RE,..]
Looking for blog editor
PHP search function with MySQL fulltext
Help, Constant Crashing !URGENT
History Paper on Newfoundland joining the confederation
Reply to topic    Frihost Forum Index -> Scripting -> Php and MySQL

FRIHOST HOME | FAQ | TOS | ABOUT US | CONTACT US | SITE MAP
© 2005-2011 Frihost, forums powered by phpBB.