Bug #6933

French accents encoding issues in exercises

Added by Alain Deschênes about 6 years ago. Updated almost 6 years ago.

Status: Bug resolved
Priority: Normal
Assignee: -
Category: Exercises
Target version:
Start date: 18/01/2014
Due date:
% Done: 100%
Estimated time: 4.00 h
Complexity: Normal
SCRUM pts - complexity: ?

Description

Since the latest 1.9.x update downloaded from GitHub, there is a problem when creating an exercise whose name contains French accented characters.

Tested on stable.chamilo.org with Firefox 26 and IE 11 on Windows 7.

The problem was also raised on the forum in the following post: [[http://www.chamilo.org/phpBB3/viewtopic.php?f=14&t=4953]]

At your service,

Alain


Files

encodage_exercices.jpg (66.1 KB), Alain Deschênes, 18/01/2014 17:24

Related issues

Related to Chamilo LMS - Bug #6985: Using < or > in a "fill blanks" answer does not work (Bug resolved, 13/02/2014 to 24/04/2014)

Associated revisions

Revision 4ada8037 (diff)
Added by Hubert Borderiou about 6 years ago

Add htmlentitydecode for test name in breadcrumb - ref #6933

Revision 520119c6 (diff)
Added by Hubert Borderiou about 6 years ago

Replace htmlentities and html_entity_decode with api_... - ref #6933

Revision 044b39d9 (diff)
Added by Hubert Borderiou about 6 years ago

Character encoding problem in test title - part 2 - ref #6933

History

#1

Updated by Yannick Warnier about 6 years ago

  • Target version set to 1.9.8

I'm adding Hubert and Yoselyn to the task, in case anyone remembers the problem precisely. On my side I only have a rather vague memory, but I think it must be related to the latest changes to api_html_entity_decode() and api_htmlentities(), and their use as the native PHP versions, which do not take UTF-8 into account by default... no?
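To illustrate the suspected mechanism: if a UTF-8 string is entity-encoded as though it were ISO-8859-1 (the default charset of PHP's native htmlentities() before PHP 5.4), each byte of a multi-byte character gets its own entity. The Python sketch below simulates that byte-by-byte conversion using the standard `html.entities.codepoint2name` table. It illustrates the failure mode only; it is not Chamilo's api_htmlentities().

```python
from html.entities import codepoint2name


def entities_as_latin1(text: str) -> str:
    """Entity-encode text as if each of its UTF-8 bytes were an ISO-8859-1 character."""
    out = []
    for byte in text.encode("utf-8"):
        # In Latin-1, byte value N is code point N, so look the byte up directly.
        # Bytes 0x80-0x9F have no entity name and are emitted as raw characters.
        name = codepoint2name.get(byte)
        out.append(f"&{name};" if name else chr(byte))
    return "".join(out)


def entities_as_utf8(text: str) -> str:
    """Entity-encode each character as a whole (the charset-aware behaviour)."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


title = "Exercice équipe"
print(entities_as_latin1(title))  # Exercice &Atilde;&copy;quipe
print(entities_as_utf8(title))    # Exercice &eacute;quipe
```

The browser renders the first result as "Exercice Ã©quipe", which is the kind of garbled accent reported in this issue.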

#2

Updated by Hubert Borderiou about 6 years ago

  • Assignee set to Hubert Borderiou
#3

Updated by Hubert Borderiou about 6 years ago

  • Status changed from New to Assigned

Done https://github.com/chamilo/chamilo-lms/commit/520119c6052e5ba991b87357cecc0c8e7838e930

It comes from the fact that < or > could not be used in a test title, because they were rendered as HTML.

The problem is the same with question titles...
A teacher cannot create a question with <...> in its title.
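The underlying symptom can be reproduced in a few lines. A title inserted into a page without escaping is parsed as markup, so anything between < and > is treated as a tag. The sketch below uses Python's `html.escape()` as a stand-in for output escaping. The helper names are hypothetical, not Chamilo functions.

```python
import html


def render_title_unsafe(title: str) -> str:
    # The raw title is pasted into the markup, so "<...>" is parsed as a tag.
    return f"<h2>{title}</h2>"


def render_title(title: str) -> str:
    # Escaping at output time turns < and > into &lt; and &gt;,
    # so the browser displays them literally.
    return f"<h2>{html.escape(title)}</h2>"


title = "Quiz <chapter 1>"
print(render_title_unsafe(title))  # <h2>Quiz <chapter 1></h2>
print(render_title(title))         # <h2>Quiz &lt;chapter 1&gt;</h2>
```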

#4

Updated by Hubert Borderiou about 6 years ago

I am continuing to work on it, e.g. in Learning Path, so that the correct title is displayed.

#6

Updated by Hubert Borderiou about 6 years ago

  • Status changed from Assigned to Needs testing
  • Assignee deleted (Hubert Borderiou)
  • Estimated time set to 4.00 h
#7

Updated by Yannick Warnier about 6 years ago

  • Category set to Exercises
  • Status changed from Needs testing to Needs more info
  • Assignee set to Hubert Borderiou
  • % Done changed from 0 to 50

I have added a few comments on GitHub. Please review.

#8

Updated by Alain Deschênes about 6 years ago

Hi team,

Tested on both ends and everything works fine.

Thanks,

Alain

#9

Updated by Yannick Warnier about 6 years ago

OK! Thanks for testing.
Hubert still needs to review the code a bit. I left comments on his changes. Hopefully they will be addressed soon.

#10

Updated by Yannick Warnier about 6 years ago

  • Status changed from Needs more info to Assigned
#11

Updated by Hubert Borderiou almost 6 years ago

  • Status changed from Assigned to Needs testing
  • Assignee deleted (Hubert Borderiou)

Modifications done.

#12

Updated by Yannick Warnier almost 6 years ago

  • Subject changed from Problème encodage des caractères francophones sur les Exercices (French character encoding problem in Exercises) to French accents encoding issues in exercises
  • Assignee set to Francis Gonzales
#13

Updated by Julio Montoya almost 6 years ago

  • Status changed from Needs testing to Needs more info
  • Assignee deleted (Francis Gonzales)

I don't think this change is right. The contents must be stored in the database "as is": no api_htmlentities() or anything similar,
just the classic Database::escape_string().

Right now (with that change) an exercise will have different encodings in the title, in the description, in the question descriptions/feedback, in the answer descriptions, etc. So no conversion should be done before saving. The real issue is how the data is shown in an HTML page.
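Julio's proposal amounts to storing the title exactly as typed and choosing the escaping at each output. A minimal Python sketch, assuming a single raw stored value and three hypothetical output contexts: HTML body text, an HTML attribute such as a form field value, and a plain-text export. The function names are illustrative only.

```python
import html

# Stored exactly as the teacher typed it; only SQL-escaped on insert
# (the discussion mentions Database::escape_string() for that step).
stored_title = 'Révision <partie 1> & "quiz"'


def for_html_body(value: str) -> str:
    # Escape &, < and > so the text is displayed literally.
    return html.escape(value, quote=False)


def for_html_attribute(value: str) -> str:
    # Inside value="..." the double quote must be escaped as well.
    return html.escape(value, quote=True)


def for_plain_text(value: str) -> str:
    # Plain-text or spreadsheet exports need no HTML escaping at all.
    return value


print(for_html_body(stored_title))       # Révision &lt;partie 1&gt; &amp; "quiz"
print(for_html_attribute(stored_title))  # Révision &lt;partie 1&gt; &amp; &quot;quiz&quot;
print(for_plain_text(stored_title))      # Révision <partie 1> & "quiz"
```

The accented characters pass through unchanged in every context. Only the characters that are special in HTML are touched, and only at the moment of output.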

I'm dealing with an exercise export to QTI, and the contents of the exercises/questions/answers have different formats.

#14

Updated by Hubert Borderiou almost 6 years ago

I understand your point of view.
Do you think we can keep it like that for 1.9.8 and improve it for 1.10?

> I'm dealing with an exercise export to qti and the contents of the exercise/questions/answer have different formats.

I see: you need to parse the data and convert it to a consistent format for your export.
It gives you more work to do, but our solution is meant to help web users.

#15

Updated by Yannick Warnier almost 6 years ago

Is there a way we can detect the encoding of all these things before we insert them in the database?

We had the same issue with learning paths (SCORM packages in different encodings) in the past. Most people then standardized on UTF-8, but we still left the "encoding" field in the learning path edition for that, just in case. The real issue was that some people packaged ISO-8859-1 content and declared in the SCORM package that it was UTF-8...

In my view, everything that is in the database should be plain UTF-8 (not HTML chars, not ISO, just UTF-8). If we can ensure that, then we know exactly what to do when printing the titles, questions, answers and comments depending on where we print them.

There are some hints (in the comments) on how to do that here: http://php.net/mb_check_encoding. Iconv doesn't seem to have anything similar, and intl is full of functions (http://www.php.net/manual/en/book.intl.php), but the only one that seems close to what we need is UConverter::getSourceEncoding().
Additional note: Ivan Tcholakov did great work unifying all these libraries into main/inc/lib/internationalization.lib.php; maybe there's something there.

If we can detect the original encoding, we can ensure it is stored as UTF-8 and thus we can convert it to... whatever we want when printing on screen or in XLS exports.
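A minimal sketch of the "normalize to UTF-8 before storing" idea, assuming the input arrives as raw bytes and that the only realistic alternatives are UTF-8 and Windows-1252/ISO-8859-1. It uses a strict UTF-8 decode as the validity check, which is essentially the test mb_check_encoding($s, 'UTF-8') performs in PHP. Real charset detection is heuristic, so the fallback encoding here is an assumption, not a detection result.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode raw bytes, accepting valid UTF-8 and otherwise assuming a legacy charset.

    The returned str can then be written to the database as UTF-8.
    """
    try:
        return raw.decode("utf-8")  # strict: raises on invalid byte sequences
    except UnicodeDecodeError:
        # Not valid UTF-8: assume (not detect) the single-byte fallback encoding.
        return raw.decode(fallback, errors="replace")


utf8_input = "Évaluation finale".encode("utf-8")
latin1_input = "Évaluation finale".encode("latin-1")

print(to_utf8(utf8_input))    # Évaluation finale
print(to_utf8(latin1_input))  # Évaluation finale
```

One caveat: some Latin-1 strings are also valid UTF-8 (the Latin-1 text "Ã©" is byte for byte the UTF-8 encoding of "é"), so this check reduces misdetection but cannot eliminate it.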

#16

Updated by Julio Montoya almost 6 years ago

Yannick Warnier wrote:

> Is there a way we can detect the encoding of all these things before we insert them in the database.

FCKeditor converts HTML content to HTML entities ("HTML chars"), so we already have that problem.

I agree with storing everything in UTF-8. This seems more like a task for 1.10.

I guess I will have to add a lot of validation when exporting an exercise to QTI.

#17

Updated by Hubert Borderiou almost 6 years ago

The origin of the problem is being able to create a test whose title contains <...> characters.
It is not a UTF-8 vs. ISO-Latin encoding issue.

The question is: "Can we convert a text with api_htmlentities before saving it in the database?"

The test title can be displayed as HTML (the most common case, with many occurrences throughout the code)
or as text, e.g. in a form input so that it can be edited (used in two places on the platform).

The easiest way to solve this problem was to save the title with api_htmlentities in the database and use api_html_entity_decode in the two places where I have to display it as text.
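The approach described above (entity-encode once before saving, decode where plain text is needed) can be sketched with Python's `html.escape()` and `html.unescape()` standing in for api_htmlentities() and api_html_entity_decode(). The stand-ins are an assumption: api_htmlentities() presumably also converts accented letters to named entities, whereas `html.escape()` only touches &, <, > and quotes. The sketch also shows the main pitfall of storing encoded text: escaping it a second time on output leaves visible entities in the page.

```python
import html

typed = "Test <niveau 2>"

# Saved form: encoded once before the INSERT.
stored = html.escape(typed)

# HTML pages can print the stored value directly...
html_output = stored
# ...while a form field needs the original text back.
form_value = html.unescape(stored)

# Pitfall: if some page also escapes on output, the text is encoded twice
# and the browser shows "&lt;niveau 2&gt;" instead of "<niveau 2>".
double_encoded = html.escape(stored)

print(html_output)     # Test &lt;niveau 2&gt;
print(form_value)      # Test <niveau 2>
print(double_encoded)  # Test &amp;lt;niveau 2&amp;gt;
```

For an export such as QTI, the same `unescape` step recovers the typed text from the stored value.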

Otherwise, we would have to find every occurrence of the exercise title displayed as HTML throughout the code and apply api_htmlentities before displaying it.

It is going to be quite a long job, I guess.

So, we can:
- [revert to the previous version]
Teachers cannot create a test with <...> in the title (they would have to type &lt; and &gt; instead)
Data are saved in the database exactly as entered in the form, with no conversion to HTML entities
Fast solution (I can do it)

- [keep this version, do nothing] save the exercise title with HTML entities in the database (as it is with the modifications)
Teachers can create a test with <...> in the title
Exercise title data are converted to HTML entities before being saved in the database
Fastest solution ;)

- try to find every occurrence of the exercise title displayed as HTML and convert it before displaying it.
Teachers can use any characters they want in the title, <...> etc.
Data are saved in the database exactly as entered in the form, with no conversion to HTML entities
Slow solution

BUT
with Doctrine, no DB calls will be made directly in functions.
Wouldn't it then be possible to add a filter AFTER the title is read from the database, and before it is passed to the function, so that the text is converted to HTML entities? (And decode it for the 2 places where we need it as text, not HTML.)
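A filter between the database and the display code could look roughly like an entity that keeps the raw title and exposes an escaped accessor for HTML. The class and property names below are hypothetical; this is neither Doctrine nor Chamilo code, only a Python sketch of the idea. In a Doctrine-based version it might become a getter on the entity.

```python
import html
from dataclasses import dataclass


@dataclass
class Exercise:
    """Hypothetical exercise entity: the title is stored exactly as typed."""

    title: str

    @property
    def html_title(self) -> str:
        # Escaped once, where the value leaves the data layer for HTML output.
        return html.escape(self.title)

    @property
    def text_title(self) -> str:
        # For the few places that need plain text (e.g. the edit form).
        return self.title


exercise = Exercise(title="Quiz <1> & révision")
print(exercise.html_title)  # Quiz &lt;1&gt; &amp; révision
print(exercise.text_title)  # Quiz <1> & révision
```

With this split, the database keeps the raw value and no decoding step is needed for the text contexts.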

What do you think?

#18

Updated by Yannick Warnier almost 6 years ago

I'm OK with keeping the current version for the time we have left in 1.9.
Julio, for the export, just reverse the process with api_html_entity_decode().

Just as a note for 1.10: the title shouldn't be HTML. It should be plain text. The question body is where HTML goes, so anything inserted in the title should be considered plain text and, as such, be escaped to HTML when printed in HTML (and not inside a form field).
Thanks for the discussion on this topic.

#19

Updated by Hubert Borderiou almost 6 years ago

You can use the function
public function get_formated_title() (for an exercise object)
or
public static function get_formated_title_variable($in_title) (for an exercise title held in a variable)
in
main/exercice/exercise.class.php

#20

Updated by Julio Montoya almost 6 years ago

Got it.

#21

Updated by Yannick Warnier almost 6 years ago

  • Status changed from Needs more info to Bug resolved
  • % Done changed from 50 to 100
