Extract non-code from website





alalex
Hello guys,
I have a big problem that I've been trying to solve for days without success:

Think of Google Translate: it takes all of a website's text, translates it, and leaves the code intact. I need to do more or less the same thing. I have a website's code stored in a variable, and I need to pass all of its text through a function. The first thing that came to mind was preg_replace_callback, but I can't come up with a regexp that selects only the text :(

This is the regexp I have so far:
Code:
@<(?:title|div|p|h1|h2|h3|a)(?:\s)(?:.*)>(.*)</(?:title|div|p|h1|h2|h3|a)>@isU


Any help will be hugely appreciated, I can pay frih$ if wanted

Thanks in advance!
jmraker
If the document is a fairly simple HTML file (no JavaScript, and any literal < or > in the text escaped as entities), you can make a simple HTML parser:

Code:
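// Split on tag delimiters, recording the byte offset of each piece
$arr = preg_split('/[<>]/', $html, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_OFFSET_CAPTURE);
foreach ($arr as $tag) {
    // Character just before this piece; empty when the piece starts the document
    $let1 = $tag[1] > 0 ? substr($html, $tag[1] - 1, 1) : '';
    if ($let1 != '<') {    // Not preceded by '<', so this is text, not a tag
        $text = $tag[0];   // Process $text here
    }
}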

Or you can try a more complete HTML parser:
http://www.google.com/search?q=php+html+parser

Keep in mind that these can use a lot of memory, because they try to model the properties and methods a browser has.
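
As one example of the "better parser" route, here is a rough sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). The function name transformTextNodes and the choice to skip script/style content and whitespace-only nodes are assumptions made for illustration, not something from this thread. Note that saveHTML() re-serializes the document (it adds a doctype and html/body wrappers and may normalize the markup), and loadHTML() guesses the encoding unless the page declares one, so this does not leave the original HTML byte-for-byte intact.

```php
<?php
// Hypothetical helper: applies $callback to every text node in $html.
function transformTextNodes(string $html, callable $callback): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text nodes outside <script>/<style>, ignoring whitespace-only nodes.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)][normalize-space()]';

    foreach ($xpath->query($query) as $node) {
        // DOMText::$data holds the decoded text of the node.
        $node->data = $callback($node->data);
    }

    return $doc->saveHTML();
}

// Usage: upper-case all visible text.
echo transformTextNodes('<p>hello <b>world</b></p>', 'strtoupper');
```

The callback receives decoded text (entities already resolved), which is convenient for something like translation, at the cost of the memory overhead mentioned above.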
alalex
The problem is that I need a parser that will, ideally, work for any website. I need something that lets me process all of the text and then output the HTML without modifying it.

The best example is an online translator: I need to pass all of the text through a function while leaving the HTML as is. I have been looking through PHP classes for an HTML parser that does that, but no luck yet :(
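
For the requirement above (process the text, keep the markup untouched), one possible approach is to split the page on its tags with preg_split and PREG_SPLIT_DELIM_CAPTURE, run the function over the text pieces only, and glue everything back together. This is a minimal sketch, not a full parser: the name mapHtmlText is made up for illustration, and it assumes that attribute values never contain a literal ">" and that comments, script and style blocks should be skipped. The callback sees the raw text, so entities such as &amp; arrive still encoded.

```php
<?php
// Hypothetical helper: runs $callback over the text between tags and
// returns the page with all markup left exactly as it was.
function mapHtmlText(string $html, callable $callback): string
{
    // Comments, <script>/<style> blocks and ordinary tags are the delimiters.
    // The single capture group makes preg_split keep them in the result.
    $pattern = '~(<!--.*?-->|<script\b.*?</script\s*>|<style\b.*?</style\s*>|<[^>]*>)~is';
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    if ($parts === false) {
        return $html; // Regex engine error (e.g. backtrack limit): leave input as is.
    }

    foreach ($parts as $i => $part) {
        // Pieces alternate: even indexes are text, odd indexes are markup.
        if ($i % 2 === 0 && trim($part) !== '') {
            $parts[$i] = $callback($part);
        }
    }

    return implode('', $parts);
}

// Usage: upper-case the text, keep tags and script content unchanged.
echo mapHtmlText('<div class="a"><p>hello</p><script>var x = "<b>";</script> world</div>', 'strtoupper');
```

Because the markup pieces are never touched, whatever is not text comes back identical to the input, which is the main difference from a DOM-based parser that rebuilds the HTML on output.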