Non-English characters

11 posts, 3 voices

Jul 30, 2008 03:33
10 posts

Hi there, I have just installed Frog, and let me start by saying that it looks good :)

Unfortunately, I do have one major problem with it: it does not accept any of the Danish characters, and since I am going to use this to create pages in Danish, this is a showstopper.

So my question is: are you planning on adding support for encoding the special characters found in other languages?

The problem is easy to reproduce: simply create a page and enter a non-English character in the title or the page body. After pressing Save, everything from that character onward (including the character itself) is removed.
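The symptom (everything from the first non-English character onward disappears) matches what MySQL in non-strict SQL mode is commonly reported to do when it receives bytes that are not valid in the connection's character set: it truncates the value at the first invalid byte and only raises an "Incorrect string value" warning. The Python sketch below models that behaviour under the assumption that ISO-8859-1 bytes arrive over a connection declared as UTF-8; `store_like_nonstrict_mysql` is a hypothetical helper, not part of Frog or MySQL.

```python
def store_like_nonstrict_mysql(raw: bytes) -> str:
    """Hypothetical model of a non-strict MySQL utf8 column:
    keep only the valid UTF-8 prefix of `raw` and silently drop the rest."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8
        return raw[:err.start].decode("utf-8")


title = "Test æøå test"

# ISO-8859-1 uses one byte per Danish letter (e6 f8 e5), which is invalid UTF-8.
latin1_bytes = title.encode("iso-8859-1")
# UTF-8 uses two bytes per Danish letter (c3 a6, c3 b8, c3 a5).
utf8_bytes = title.encode("utf-8")

print(repr(store_like_nonstrict_mysql(latin1_bytes)))  # 'Test '
print(repr(store_like_nonstrict_mysql(utf8_bytes)))    # 'Test æøå test'
```

Only the Latin-1 input loses its tail, which is exactly the "everything from the special character onward is removed" pattern.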

 
Jul 30, 2008 04:01
818 posts

Hi tinus, this has been observed before, and I'm not sure of everything involved. I do know that French-language sites (which use characters with diacritics!) seem to behave properly. As far as I know, the problem is that some of the characters are stripped from the page "slug".

You could have a look at this thread, although there are not many answers there. You should also check that your Frog MySQL database is set up for UTF-8.
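One way to check is to run `SHOW VARIABLES LIKE 'character_set%';` in the mysql client and look at the client, connection, results, database and server settings. The sketch below assumes you have copied those rows into a Python dict; the function name and the sample values are made up for illustration.

```python
# Variables that affect how text travels between PHP and MySQL.
RELEVANT = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_database",
    "character_set_server",
)


def non_utf8_settings(variables):
    """Return the relevant character_set_* variables that are not a utf8 variant.

    `variables` maps Variable_name -> Value, as printed by
    SHOW VARIABLES LIKE 'character_set%'. Missing entries are reported too.
    """
    return [
        name
        for name in RELEVANT
        # utf8, utf8mb3 and utf8mb4 all count as UTF-8 here
        if not variables.get(name, "").lower().startswith("utf8")
    ]


# Hypothetical sample output from a server whose default charset was never changed.
sample = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_results": "utf8",
    "character_set_database": "utf8",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
}

print(non_utf8_settings(sample))  # ['character_set_server']
```

All-UTF-8 settings are necessary but not sufficient: the bytes PHP hands to MySQL still have to actually be UTF-8.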

You can also compare how things look at this Hungarian Frog site and this French-language site. Someone has this working in Bengali, too, but I can't find the URL just now!

 
Jul 30, 2008 07:16
10 posts

Hi david,

Yes, I kinda figured it was possible somehow (given that this CMS is being developed by a French person), but I have had no luck tracking the problem down. I need to look into the code more before I can give a definitive answer on that, however.

I looked at the thread, but it did not really provide any answers for me, because I have two sites running, one with a .htaccess file and one without, and both have the problem. Both databases use UTF-8 (utf8_danish_ci), so I do not think that is the problem.

Thanks for the quick answer though!

 
Aug 18, 2008 17:36
10 posts

OK, I went back and looked at the code but was unable to figure out where things went wrong, so I started looking at what MySQL actually executed and found the following:

140 Connect example-user@localhost on example-host
140 Query set names 'utf8'
140 Prepare [1] SELECT * FROM frog_setting
... bunch of select statements
140 Query UPDATE frog_page_part SET name='body', filter_id='textile', page_id='3', content='This is my site. I\'m living in ... I\'m doing some nice thing, like that and that ... æøå test', content_html=' <p>This is my site. I’m living in … I’m doing some nice thing, like that and that … æøå</p>' WHERE id = 3
140 Execute [7] SELECT tag.id AS id, tag.name AS tag FROM frog_page_tag AS page_tag, frog_tag AS tag WHERE page_tag.page_id=3 AND page_tag.tag_id = tag.id
... a few more select statements
140 Quit
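A log line only shows how the logged bytes happen to be rendered, so it cannot tell you which encoding they were in: depending on how the log file is viewed, ISO-8859-1 bytes can look exactly like correct text. Comparing raw byte sequences makes the difference visible; this short Python check (`hex_bytes` is a hypothetical helper) prints what "æøå" looks like in each encoding.

```python
def hex_bytes(text, charset):
    """Encode `text` with `charset` and return the bytes as space-separated hex."""
    return text.encode(charset).hex(" ")


for charset in ("utf-8", "iso-8859-1"):
    print(f"{charset:>10}: {hex_bytes('æøå', charset)}")
# utf-8 gives c3 a6 c3 b8 c3 a5, iso-8859-1 gives e6 f8 e5
```

If a hex viewer shows e6 f8 e5 in the logged UPDATE rather than c3 a6 c3 b8 c3 a5, the text reached MySQL as Latin-1 even though the connection declared `set names 'utf8'`.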

 
Aug 18, 2008 17:38
10 posts

Ah, the missing edit button... Anyway, the important part is that both content and content_html contain the letters æøå. But if I look in the database, nothing has been updated?!

When I execute the update manually, it works fine, and the letters then also show up in the page editor, but they disappear if I save the page again.

Any bright ideas about where I can debug this issue further?

 
Aug 19, 2008 06:27
818 posts

Hmmm...

I'm wondering if this has anything to do with a Frog "helper", kses.php. It cropped up once when I was trying to sort out tags in comments.

I could be completely wrong, so don't spend too much time on this if you get that feeling! :) But kses.php strips out a lot of stuff, and maybe non-ASCII characters (you know what I mean!) are going, too. I did some googling for kses.php and utf-8, but didn't find much useful.

If this does prove to be the culprit, then it might be worth looking into HTML Purifier, or at least suitably modifying this filter!

 
Aug 19, 2008 09:29
10 posts

Sadly, that is not the case: I tried dumping the page title before and after running kses, but the string remained unaltered.

I cannot get my head around why the query that MySQL logs does not do anything, because to me it seems like Frog passes the correct string and builds the correct query, but MySQL for some reason rejects it.

 
Aug 19, 2008 09:34
396 posts

@David & @Tinus - This probably doesn't help you out a lot, but I've defined an issue for this and any other UTF-8 / i18n related problems.

Also, I think you might be experiencing issue 24, and I also think you are correct that Kses might be the problem, David.

I've been looking at Kses and this stuff in between other things during the weekend and I've already added a small reminder to the code to possibly replace Kses with HTML Purifier.

If I don't get around to it before the weekend, I'll definitely create a htmlPurifier.php helper for Frog next weekend. Once that is available, we can do some testing.

@Tinus - just for kicks, if you want to test whether kses is the problem, look for lines like use_helper('kses') (if I remember correctly) and comment those out... That might break your installation, but it is easily undone by uncommenting them again... (of course ;-) )

 
Aug 19, 2008 10:07
10 posts

This is not related to kses. I have tried both commenting out kses (that did not change anything) and viewing the output after running kses (the output stays untouched).

I have looked at the issues you linked to but have been unable to see a connection, since what I am experiencing is somewhat different. All input from a special character onward (including the character itself) is simply removed. I suppose this is due to some UTF-8 conflict, but I have yet to identify where this error occurs.

I will look further into it today.

 
Aug 19, 2008 10:09
10 posts

Oh, and by the way, I tried having my database encoded as latin1, UTF-8 (general) and UTF-8 (Danish), and none of it made any difference.

 
Aug 22, 2008 15:05
10 posts

I finally figured out what caused the problem: PHP, MySQL and UTF-8.

It seems like PHP is trying to send a UTF-8 string to MySQL, but the special characters are simply ignored. So I tried sending it as Latin-1, which surprisingly worked. The problem now, however, is that it is impossible to fetch the characters from MySQL again, since PHP cannot figure out what it has received.

So, in order to fix this, I would need every UPDATE/INSERT to use Latin-1 and every SELECT to use UTF-8.
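A split like this (one encoding on the way in, another on the way out) tends to produce either lost characters or mojibake, because the same bytes get decoded with the wrong table. The Python sketch below shows both directions for "æøå"; it only illustrates the mismatch and is not a recommended fix.

```python
text = "æøå"

# UTF-8 bytes read back as Latin-1: every letter turns into two wrong characters.
utf8_as_latin1 = text.encode("utf-8").decode("iso-8859-1")
print(utf8_as_latin1)  # Ã¦Ã¸Ã¥

# Latin-1 bytes read back as UTF-8: not valid UTF-8, so each byte becomes U+FFFD.
latin1_as_utf8 = text.encode("iso-8859-1").decode("utf-8", errors="replace")
print(latin1_as_utf8)

# Only a matching pair of encodings round-trips cleanly.
print(text.encode("utf-8").decode("utf-8") == text)  # True
```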

Oh, the joy of character encodings...

 
Sep 23, 2008 03:53
10 posts

Hurray! It was caused by an error on my host, which meant that the backend pages were served as ISO-8859-1 even though they were supposed to be UTF-8. So I added the following to my .htaccess file:

php_value default_charset "UTF-8"
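`default_charset` controls the charset PHP puts in its default `Content-Type` header, and browsers generally submit form data in the charset of the page that contains the form. So a backend page served as ISO-8859-1 sends "æøå" as three Latin-1 bytes, which then go into a connection declared as UTF-8. The Python sketch below mimics that chain with the standard library; `form_body_for` is a hypothetical helper, and the ISO-8859-1 fallback is an assumption of the sketch. Note also that `php_value` in `.htaccess` only takes effect when PHP runs as an Apache module.

```python
from email.message import Message
from urllib.parse import urlencode


def form_body_for(content_type, fields):
    """Encode form fields the way a browser would for a page with `content_type`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, if present;
    # fall back to ISO-8859-1 when none is given (an assumption for this sketch)
    charset = msg.get_content_charset() or "iso-8859-1"
    return urlencode(fields, encoding=charset)


fields = {"title": "æøå"}
print(form_body_for("text/html; charset=iso-8859-1", fields))  # title=%E6%F8%E5
print(form_body_for("text/html; charset=UTF-8", fields))       # title=%C3%A6%C3%B8%C3%A5
```

With the header fixed to UTF-8, the submitted bytes match what `set names 'utf8'` promises MySQL, which is why the single directive was enough.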

And now everything works, and I can finally start using Frog!