Getting Hebrew User Input...





smartbei
Hello, I am writing an application that will receive posted Hebrew user input. I then want the application to save the submitted text in a file. The problem is that the Hebrew never shows up right.

What I am looking for is a function that will convert Hebrew text to UTF-8. I have tried utf8_encode():
Code:

$test = "גדכדגכfasdf";
$test1 = utf8_encode($test);
echo $test1;

but this outputs "âãëãâëfasdf". Not exactly Hebrew. Does anyone know how I can do this?

Thanks!
hexkid
smartbei wrote:
Code:

$test = "גדכדגכfasdf";
$test1 = utf8_encode($test);
echo $test1;


Try this: save the file as data.inc.php
Code:
<?php
$test = "גדכדגכfasdf";
?>

Then create different files with different headers and interpret the results you see in the browser :)
Code:
<?php
header('Content-Type: text/html; charset=utf-8');
// header('Content-Type: text/html; charset=iso-8859-1');
// header('Content-Type: text/html; charset=us-ascii');
require_once 'data.inc.php';
echo <<<HTML
<html>
<head>
  <title>test</title>
</head>
<body>
  <p>$test</p>
</body>
</html>
HTML;
?>


Also read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky.
smartbei
Yes, I have tried different encodings by opening the web page and going through View -> Encoding and trying most of them. However, it works (not surprisingly) only with Hebrew, not with UTF-8 (which, from what I gather from the article linked above, should be able to show just about anything). Also, no matter what I put in the charset, it doesn't change the way the browser renders the page. Here is what I have (extremely simplified, because I am trying to find the problem):
Code:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
גדכדגכגדכדגכ
</body>
</html>

Even when I use Hebrew (windows-1255 or ISO-8859-8-I) as the charset, the text shows up as Latin characters with accents. Apparently, the browser is entirely ignoring my charset call. ???
hexkid
smartbei wrote:
Apparently, the browser is entirely ignoring my charset call. ???


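The charset should go in the HTTP headers, not only in a meta tag in the head of the document. If the server sends a charset in the Content-Type header, the browser uses that one and ignores the meta tag; the browser wants to know what charset is in effect before the head and meta tags reach it!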

I've made a PHP script for you to try :)
smartbei
Gotcha. Is there a possible solution if I want Hebrew to display well in a non-PHP file?
Also, why doesn't this work:
Code:

<?php
header('Content-Type: text/html; charset=utf-8');
$test = utf8_encode('גדכדגכגדכדגכ');
?>
<html>
<head>
</head>
<body>
<?php echo $test1; ?>
</body>
</html>

It does set the browser to UTF-8, but the text shows up as question marks. Is this not the correct way to encode text to UTF-8?
ganbate
smartbei wrote:
Gotcha. Is there a possible solution if I want Hebrew to display well in a non-PHP file?
Also, why doesn't this work:
Code:

<?php
header('Content-Type: text/html; charset=utf-8');
$test = utf8_encode('גדכדגכגדכדגכ');
?>
<html>
<head>
</head>
<body>
<?php echo $test1; ?>
</body>
</html>

It does set the browser to UTF-8, but the text shows up as question marks. Is this not the correct way to encode text to UTF-8?


Your variable is $test:
Code:
$test = utf8_encode('גדכדגכגדכדגכ');

But why do you print the $test1 variable?
Code:
<?php echo $test1; ?>
hexkid
smartbei wrote:
Gotcha. Is there a possible solution if I want Hebrew to display well in a non-PHP file?

Yes. You have to know the encoding of the original data.
If the original data is encoded in UTF-8, read as UTF-8 (and stored by the script/web server in whatever encoding it prefers), and output as UTF-8 with the proper headers, it will display properly in the browser.
If the original data is encoded in ISO-8859-1 but read as if it were UTF-8, it's already messed up and you can't make sense of it!
If the original data is encoded in UTF-8 but read as if it were ISO-8859-1, again it's already messed up and you can't make sense of it!

So there are three important things to get right:
1. What encoding is the original data in?
2. What encoding is used when reading that data?
3. What encoding is used to display the data?

1. and 2. must match

Let's say the original data comes from a file on disk you edited with your editor of choice. The same encoding you use to save that file must be used to read it later from PHP or wherever.
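One way to answer question 1 is to look at the raw bytes instead of at what an editor or browser shows. A minimal sketch, assuming the sample text is the first three letters of the test string from the first post; hex_bytes() is a made-up helper name, and the bytes are written as escapes so the example does not depend on how this file itself was saved:

```php
<?php
// Hypothetical helper: show a string's bytes as space-separated hex pairs.
function hex_bytes(string $s): string
{
    return $s === '' ? '' : implode(' ', str_split(bin2hex($s), 2));
}

// The same three Hebrew letters (gimel, dalet, kaf) in two encodings.
$asUtf8    = "\xD7\x92\xD7\x93\xD7\x9B"; // UTF-8: two bytes per letter
$asWin1255 = "\xE2\xE3\xEB";             // windows-1255: one byte per letter

echo hex_bytes($asUtf8), "\n";    // d7 92 d7 93 d7 9b
echo hex_bytes($asWin1255), "\n"; // e2 e3 eb
```

If a dump of your saved file shows the one-byte form, the file is not UTF-8, and reading it as UTF-8 will not work no matter which header is sent.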

smartbei wrote:
Also, why doesn't this work:
Code:

<?php
header('Content-Type: text/html; charset=utf-8');
$test = utf8_encode('גדכדגכגדכדגכ');
[...]

It does set the browser to UTF-8, but the text shows up as question marks. Is this not the correct way to encode text to UTF-8?

utf8_encode() converts ISO-8859-1 to UTF-8 and nothing else.
If you pass it something in another encoding (like UTF-8 or windows-1255), it still treats every byte as an ISO-8859-1 character, so the output is garbage (accented Latin letters, ???, or, if you're lucky, a box).
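To make that concrete, here is a rough sketch of what happens to the first three letters of the test string. It assumes the original bytes are windows-1255 (which would match the "âãë" output earlier in the thread) and that PHP's iconv extension is loaded and knows that charset name; to_utf8() is a hypothetical helper, not a built-in:

```php
<?php
// Hypothetical helper: convert bytes from a known source charset to UTF-8.
// Assumes the iconv extension is available and recognises the charset name.
function to_utf8(string $bytes, string $from): string
{
    $out = iconv($from, 'UTF-8', $bytes);
    if ($out === false) {
        throw new RuntimeException("Cannot convert from $from");
    }
    return $out;
}

$raw = "\xE2\xE3\xEB"; // gimel, dalet, kaf as windows-1255 bytes

// What utf8_encode() effectively does: treat every byte as ISO-8859-1.
$wrong = to_utf8($raw, 'ISO-8859-1');   // UTF-8 for "âãë"

// Naming the real source charset gives the Hebrew letters.
$right = to_utf8($raw, 'WINDOWS-1255'); // UTF-8 for "גדכ"

echo bin2hex($wrong), "\n"; // c3a2c3a3c3ab
echo bin2hex($right), "\n"; // d792d793d79b
```

The conversion only works because the source charset is named correctly, which is the same point as "1. and 2. must match".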
smartbei
Yes, that is what I thought.
How can I then use PHP to encode incoming Hebrew user data to UTF-8?
I have looked through the PHP functions relating to UTF-8 but I cannot seem to find one that fits. How would I go about writing my own?
hexkid
smartbei wrote:
How can I then use PHP to encode incoming Hebrew user data to UTF-8?
Why don't you accept only utf-8 for incoming data?

If the data comes from an HTML form, besides setting the charset in the 'Content-Type' header to utf-8, limit the input data encoding to utf-8:

Code:
<form action="..." method="post" accept-charset="utf-8">
<input ...>
<!-- ... -->
</form>


http://www.w3.org/TR/html4/interact/forms.html#h-17.3 wrote:
accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received.

The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.

I believe the absence of the 'accept-charset' attribute in the form is viewed by some browsers (guess which one?) as permission to send data in whatever encoding they prefer with no regard to the encoding used to transmit the document.
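On the receiving side it is probably still worth checking that what arrived really is UTF-8 before saving it, since a client can ignore accept-charset. A minimal sketch; the function names, the 'text' field name and the file path are made up for illustration, and preg_match('//u', ...) is just one common way to test for well-formed UTF-8:

```php
<?php
// True if $s is well-formed UTF-8. With the /u modifier PCRE validates the
// subject string, and preg_match() returns false on malformed input.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Append one line of text to $path, refusing anything that is not valid UTF-8.
function save_utf8_line(string $path, string $text): bool
{
    if (!is_valid_utf8($text)) {
        return false;
    }
    return file_put_contents($path, $text . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Usage inside the script that receives the form (field name is hypothetical).
if (isset($_POST['text']) && is_string($_POST['text'])) {
    $ok = save_utf8_line(__DIR__ . '/input.txt', $_POST['text']);
    header('Content-Type: text/plain; charset=utf-8');
    echo $ok ? 'Saved.' : 'Input was not valid UTF-8.';
}
```

The file then holds UTF-8 bytes, so anything that reads it later has to read it as UTF-8 too.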
smartbei
That appears to have solved the issue. Thank you very much hexkid.
hexkid
smartbei wrote:
That appears to have solved the issue. Thank you very much hexkid.


You might want to check whether the browser can deal with utf-8 first.
Maybe something like this:
Code:
<?php
if (isset($_SERVER['HTTP_ACCEPT_CHARSET'])) {
  if (strpos(strtolower($_SERVER['HTTP_ACCEPT_CHARSET']), 'utf-8') === false) {
    exit('Please, configure your browser to accept utf-8 encodings.');
  }
}
?>

Or you might check the accepted encodings, use one of them, and insert a hidden field in the form so you can properly decode the data when the form is posted.
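That second idea could look roughly like this. It is only a sketch: pick_charset(), charset_field() and the hidden field name form_charset are invented for the example, the list of supported charsets is an assumption, and q-values in the header are ignored for simplicity:

```php
<?php
// Hypothetical helper: pick a charset that the client lists in Accept-Charset.
// $header is the raw header value, or null when the client sent none.
// q-values are ignored here; this is a simplification, not full HTTP negotiation.
function pick_charset(?string $header, array $supported, string $default = 'utf-8'): string
{
    if ($header === null || trim($header) === '') {
        return $default; // nothing to go on
    }
    $accepted = [];
    foreach (explode(',', strtolower($header)) as $item) {
        // Keep the charset name, drop any ";q=..." parameter.
        $accepted[] = trim(explode(';', $item)[0]);
    }
    foreach ($supported as $charset) {
        if (in_array(strtolower($charset), $accepted, true)) {
            return strtolower($charset);
        }
    }
    return $default; // no explicit match (or only "*" was listed)
}

// Hypothetical helper: hidden field recording which charset the form was served in.
function charset_field(string $charset): string
{
    return '<input type="hidden" name="form_charset" value="'
        . htmlspecialchars($charset, ENT_QUOTES) . '">';
}

$charset = pick_charset($_SERVER['HTTP_ACCEPT_CHARSET'] ?? null, ['utf-8', 'windows-1255']);
echo charset_field($charset), "\n";
```

When the form comes back, the value of form_charset tells the script which source charset to convert from before saving.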