Help me in Extracting page number from document





bukaida
The document is stored as a text file in the database. The table structure is:

id (number), Title (text), Body (long text)

The body may be a 500-page book in text format.

The content of the text file is written continuously as:

-(-p- 1 )-

content of page 1

-(-p- 2 )-

content of page 2

...and so on.

Now I need to store this data automatically as individual pages in another table, which has the following structure:

id ( may be FK from previous table)
content (page wise text content, automatically extracted)
page number( automatically extracted)

I have no idea how to generate the second table from the first one. I am using PHP 5 with MySQL 5 and Apache 2.2.
Any suggestion is gratefully accepted.

P.S. The actual page numbers are Unicode non-English numerals, which are unique and never repeated inside the body of the text.
sonam
This example reads from a txt file instead of MySQL, but you can adapt it to MySQL too. You just need to read the row from the first table and then insert each page into the other table.

Code:
<?php
$file  = file_get_contents("test.txt");
$pages = substr_count($file, '-(-p-'); // count how many page markers the text contains
for ($i = 1; $i <= $pages; $i++) {     // loop once per page
    $page  = '-(-p- ' . $i . ' )-';        // marker that starts this page
    $page2 = '-(-p- ' . ($i + 1) . ' )-';  // marker that starts the next page
    $chunk = explode($page, $file, 2);     // $chunk[1] is everything after this page's marker
    if (!isset($chunk[1])) {
        continue;                          // marker not found, e.g. a non-numeric page label
    }
    $end = strpos($chunk[1], $page2);      // where the next page begins; false on the last page
    $chunk2 = ($end === false) ? $chunk[1] : substr($chunk[1], 0, $end);
    echo 'Page: ' . $i . ' <br />' . $chunk2 . '<br /><br />';
}
?>


Sonam
bukaida
@Sonam
Thanks, man.
Is it possible to use only one table, I mean only the second? The basic information is in a text file on the hard disk. So instead of inserting it into a database and retrieving it again for processing, is it possible to apply select --> process --> insert logic directly? From your suggestion it appears to me that it may be possible. The only concern is the size of the file: sometimes more than 1000 pages, i.e. 1.8 MB of plain text!
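
On the file-size worry: a file of this size can usually be loaded in one go with file_get_contents(), but if memory is a concern the text can also be read line by line and handed on one page at a time. This is only a minimal sketch. It assumes each marker sits alone on its own line in the form -(-p- LABEL )-, and the names split_pages_from_stream() and $collect are made up for illustration (the closure in the demo needs PHP 5.3 or later).

```php
<?php
// Hypothetical helper: reads an open stream line by line and calls
// $callback($label, $content) once per page, so only one page is held
// in memory at a time. Assumes each "-(-p- LABEL )-" marker is on its own line.
function split_pages_from_stream($handle, $callback)
{
    $count  = 0;
    $label  = null;   // label of the page currently being collected
    $buffer = '';
    while (($line = fgets($handle)) !== false) {
        if (preg_match('/^\s*-\(-p-\s*(.+?)\s*\)-\s*$/', $line, $m)) {
            if ($label !== null) {
                call_user_func($callback, $label, trim($buffer));
                $count++;
            }
            $label  = $m[1];   // kept as a string, copied byte for byte
            $buffer = '';
        } elseif ($label !== null) {
            $buffer .= $line;  // text before the first marker is ignored
        }
    }
    if ($label !== null) {     // flush the last page
        call_user_func($callback, $label, trim($buffer));
        $count++;
    }
    return $count;
}

// Demo on an in-memory stream instead of a real file.
$h = fopen('php://memory', 'w+');
fwrite($h, "-(-p- I )-\nfirst page\n-(-p- II )-\nsecond page\n");
rewind($h);

$seen = array();
$collect = function ($label, $content) use (&$seen) {
    $seen[] = array($label, $content);
};
$total = split_pages_from_stream($h, $collect);
fclose($h);
print_r($seen);
```

In real use the callback would insert the page into the second table instead of collecting it in an array.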
sonam
Hmm, my English is not perfect and maybe I did not understand you.
1. Are your id, title and pages in MySQL or in a plain txt file (on the hard disk)?
2. Do you want to read (load) the pages separately without copying them into a new database?

Everything is possible, but from your previous post I understood that you want to copy every page into a separate row in a new database. Now I am not sure whether I understood you correctly.

Sonam
bukaida
Previously I was first storing the text file from the hard disk in table 1. The text file contains the page numbers (in the particular format stated earlier), but all the content is in a single file. I have to store the content in the database for further queries. Now my question is:

Should I load the complete text file into table 1, then retrieve it, apply your code and store the processed data in table 2? Or is there no need for table 1, so that I can browse to the file, process it and upload it to table 2 directly?
Here is the code for the first table (which was modified according to your earlier suggestion):

Code:

<?php header("Content-Type: text/html; charset=UTF-8"); ?>
<?php
include 'connect.php';
$result = false; // stays false if the uploaded file cannot be opened
// Escape every value that goes into the SQL string, not only the body.
$title    = mysql_real_escape_string($_POST['title']);
$filename = $_FILES['file']['tmp_name'];
if (($handle = fopen($filename, "rb"))) {
    $stream = mysql_real_escape_string(fread($handle, filesize($filename)));
    fclose($handle);
    unlink($filename);

    // Escape the title for HTML output as well.
    echo 'Title: ' . htmlspecialchars($_POST['title']);

    $qstr   = "INSERT INTO articles (body,title) VALUES ('$stream','$title')";
    $result = mysql_query($qstr) or die(mysql_error());
}

if ($result) {
    echo '<font color="green" size=+2>Records inserted successfully</font>';
}
?>



What I am asking is how to combine this with the code you suggested above so that only one table is needed. My source text files are on my pen drive.
I am still not sure whether that is clear now.
bukaida
The code works perfectly as long as the page numbers are English digits. However, it does not work for non-English numbers (e.g. I, II, III, IV instead of 1, 2, 3, 4). The total count (echo $pages) still comes out correctly, as does the total number of start markers -(-p and end markers )-. Please help.
sonam
Ah, I understand your question now. You want to store the pages from the txt file directly in the second table and skip the first one. Yes, this is possible with this code.

Quote:
The code works perfectly as long as the page numbers are English digits. However, it does not work for non-English numbers (e.g. I, II, III, IV instead of 1, 2, 3, 4). The total count (echo $pages) still comes out correctly, as does the total number of start markers -(-p and end markers )-.


This is a little bit tricky. If you do not follow the same page-numbering scheme in all books, it will be hard to prepare code that works for all of them. For example, if you use I, II, III in one book and i, ii, iii in a second one, this can produce errors. Otherwise, if you follow the same scheme in all books, I can try to customize this script for you.

Sonam
bukaida
Actually, if it is possible to capture the text inside -(-pagenumtext-) and treat it as a string instead of a number, that may solve my problem. I mean, if the structure is considered as:

-(-pagenumtext1-)

Page 1 content

-(-pagenumtext2-)

Page 2 content

-(-pagenumtext3-)

...and so on,

where pagenumtext1, pagenumtext2, pagenumtext3 etc. are strings, not numbers.
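
The idea above, treating whatever sits inside the marker as a plain string, can be sketched with a regular expression. This is only one possible approach, not code from the thread. It assumes the -(-p- LABEL )- marker form from the first post and UTF-8 text, and split_pages() is a made-up name.

```php
<?php
// Hypothetical helper: splits $text on markers of the form "-(-p- LABEL )-"
// and returns a list of array('label' => ..., 'content' => ...).
// The label is never converted to a number, so 1, IV and ১ are handled alike.
function split_pages($text)
{
    // Because of the capture group, preg_split() also returns the labels:
    // [text before first marker, label1, content1, label2, content2, ...]
    $parts = preg_split(
        '/-\(-p-\s*(.+?)\s*\)-/su',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        return array();   // e.g. the input was not valid UTF-8
    }
    $pages = array();
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $pages[] = array(
            'label'   => $parts[$i],
            'content' => trim($parts[$i + 1]),
        );
    }
    return $pages;
}

// Demo with non-English (Bengali) numerals as page labels.
$text  = "-(-p- ১ )-\nalpha\n-(-p- ২ )-\nbeta";
$pages = split_pages($text);
print_r($pages);
```

Because the label is only captured and never compared with a loop counter, the numbering scheme of the book does not matter, as long as the marker brackets stay the same.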
sonam
Quote:
-(-pagenumtext1-)

Page 1 content

-(-pagenumtext2-)

Page 2 content

-(-pagenumtext3-)


Actually, this is the same as the first sample. For my code it does not matter whether the label is a number or a string. What matters is exactly what is there, because I must know what to search for and how to display it. For example, if you have:

-(-p- I )-

content of page I

-(-p- II )-

content of page II

-(-p- 1 )-

content of page 1

-(-p- 2 )-

content of page 2

then I need to know the structure of the txt file. Or... I just got an idea. I will try to implement it today.

Sonam
sonam
OK, I think this will work for you for any page label, whatever is inside the marker brackets:

Code:
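<?php
$file   = file_get_contents("test2.txt");
$chunks = explode('-(-p-', $file);   // split the text at every page marker
array_shift($chunks);                // drop whatever precedes the first marker
foreach ($chunks as $val) {
    $parts = explode(')-', $val, 2); // [0] = page label, [1] = page content
    if (count($parts) < 2) {
        continue;                    // marker without a closing ")-"
    }
    $label   = trim($parts[0]);      // trim() returns the result, so it must be assigned
    $content = trim($parts[1]);
    if ($content !== '') {
        echo 'Page: ' . $label . ' <br />' . $content . '<br /><br />';
    }
}
?>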


Sonam
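
For the original goal of one row per page in the second table, a minimal sketch with PDO prepared statements could look like the following. The table name pages, its columns article_id, page_number and content, and the helper store_pages() are assumptions based on the structure in the first post, and PDO is swapped in here for the mysql_* calls used earlier in the thread. The demo uses an in-memory SQLite database only so that it can run without a server.

```php
<?php
// Hypothetical helper: writes one row per page into a table shaped like the
// second table from the first post. Table and column names are assumptions.
function store_pages(PDO $pdo, $articleId, array $pages)
{
    $stmt = $pdo->prepare(
        'INSERT INTO pages (article_id, page_number, content) VALUES (?, ?, ?)'
    );
    // One transaction: on engines that support it, all pages are stored or none.
    $pdo->beginTransaction();
    foreach ($pages as $page) {
        // Bound parameters make manual escaping unnecessary.
        $stmt->execute(array($articleId, $page['label'], $page['content']));
    }
    $pdo->commit();
    return count($pages);
}

// Demo, run only if the SQLite PDO driver is installed.
$stored = null;
if (class_exists('PDO') && in_array('sqlite', PDO::getAvailableDrivers())) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE pages (article_id INTEGER, page_number TEXT, content TEXT)');
    store_pages($pdo, 1, array(
        array('label' => 'I',  'content' => 'content of page I'),
        array('label' => 'II', 'content' => 'content of page II'),
    ));
    $stored = $pdo->query('SELECT page_number, content FROM pages ORDER BY rowid')
                  ->fetchAll(PDO::FETCH_ASSOC);
    print_r($stored);
}
```

For MySQL the connection would be created with a mysql: DSN and the site's own credentials, and page_number would need a text column type so that non-numeric labels fit.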
bukaida
Giving it a try. Will post the result here.