Bug #1765

Charset - encoding problems in lp items during Chamilo Update 1.8.x to 1.8.7

Added by Julio Montoya almost 9 years ago. Updated over 8 years ago.

Status: Feature implemented
Priority: Normal
Category: -
Target version:
Start date: 26/07/2010
Due date:
% Done: 100%
Estimated time:
Spent time:
Complexity: Difficult
SCRUM pts - complexity: 40

Description

Since lp items can have their own charset, it is a bad idea to convert everything to utf8.
I have already done many updates from 1.8.x to 1.8.7, and in every case those tables were messed up by that conversion.

Some cases:

Lp charset in 1.8.6.1 : ISO-8859-15
After migration everything was messed up. This was easily fixed by changing the LP charset to utf8.

Lp charset in 1.8.6.1 : UTF8
see the screenshot.
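
For readers who have not seen the symptom, here is a minimal sketch of how this kind of damage typically arises. It is written in Python rather than Chamilo's PHP, purely for illustration; the sample string and the two scenarios are assumptions and are not taken from the affected databases.

```python
# Illustration only: what happens when bytes are interpreted with the wrong charset.
original = "café déjà vu"

# Scenario A (assumed): ISO-8859-15 bytes are read as if they were UTF-8.
latin_bytes = original.encode("iso-8859-15")
read_as_utf8 = latin_bytes.decode("utf-8", errors="replace")

# Scenario B (assumed): UTF-8 bytes are treated as ISO-8859-15 and converted
# to UTF-8 a second time (double encoding).
utf8_bytes = original.encode("utf-8")
double_encoded = utf8_bytes.decode("iso-8859-15").encode("utf-8")
shown = double_encoded.decode("utf-8")

# Scenario B is reversible as long as no bytes were lost on the way.
repaired = shown.encode("iso-8859-15").decode("utf-8")

print(read_as_utf8)  # accented letters become replacement characters
print(shown)         # accented letters become two-character sequences
print(repaired)      # the original text again
```

In scenario B the text can be restored by reversing the extra conversion, which may be why some learning paths could be repaired after the migration while others could not.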


Files

charset.png (40.5 KB) Julio Montoya, 26/07/2010 14:24
set_lp_utf8.php (1.11 KB) Andre Boivin, 27/07/2010 00:53
lp_character_set_setting.png (90 KB) Ivan Tcholakov, 27/07/2010 04:57

History

#1

Updated by Andre Boivin almost 9 years ago

Here's a little script to change the charset of all learnpaths to UTF8

#2

Updated by Ivan Tcholakov almost 9 years ago

Hello Julio,

This is a very difficult topic. I hesitate to be concrete at the moment.

I've got two questions:

1. "After migration everything was messed up." - Are those UTF-8 encoded lp items shown well in phpMyAdmin?

2. "This was easily fixed by changing the LP charset to utf8" - Did you mean the "Character set" setting on the LP edit page? See the picture.

At the moment I can share the following general thoughts:

1. I accept that there may be some encoding migration problems in LP tool.

2. Using different encodings in the database is a very old, historical approach. Also, we have to keep in mind that in the past the design of the LP tool was based on frames, which may have had different encodings. When we switched to a single page, the code for encoding management may not have been updated properly; this is to be checked.

3. Using different encodings in the database is a very bad idea. It is very important for all the database tables to be UTF-8, without any exceptions. These are the benefits (not a complete list):
- we avoid yet another layer where we would have to manage encoding conversion. MySQL is not that rich in supporting encodings, and it identifies them in its own way (utf8 instead of UTF-8, etc.).
- we will have better interoperability. If you are an external developer and you want to access the lp tables directly, knowing that they are UTF-8 encoded eases your development. Also, raw data will be displayed correctly in phpMyAdmin or MySQL Administrator, which makes code support easier.
- Some day we will seriously face the challenge of migrating to Chamilo 2.x. If our tables are UTF-8 encoded only, this migration will go smoothly.

4. I think the LP code needs some historical burden removed (extra encoding conversions, html-entities) and needs to be cleaned/simplified. Why not do this for 1.8.8? I would be glad to participate in such an activity; I consider it necessary.

In conclusion: Let us try to keep UTF-8 as the database encoding ( let us hold the line :-) ). This is possible. We can solve the encoding issues on the PHP/HTML side. Let us avoid "ad-hoc" solutions and prefer "future-proof" solutions, although this is not easy.

Regards,
Ivan

#4

Updated by Julio Montoya almost 9 years ago

Andre Boivin wrote:

Here's a little script to change the charset of all learnpaths to UTF8

Thanks Andre for the Script.

#5

Updated by Julio Montoya almost 9 years ago

  • Complexity changed from Normal to Difficult
  • SCRUM pts - complexity changed from ? to 40

Hello Ivan,

Thanks for your quick answer.

Ivan Tcholakov wrote:

Hello Julio,

This is a very difficult topic. I hesitate to be concrete at the moment.

Yes it is!

I've got two questions:

1. "After migration everything was messed up." - Are those UTF-8 encoded lp items shown well in phpMyAdmin?

You are right.

2. "This was easily fixed by changing the LP charset to utf8" - Did you mean the "Character set" setting on the LP edit page? See the picture.

Yes, changing the "Character set" as in the picture. But this worked only with lp items that had an ISO-8859-15 charset before the migration, as I explained.

At the moment I can share the following general thoughts:

1. I accept that there may be some encoding migration problems in LP tool.

Yep

2. Using different encodings in the database is a very old, historical approach. Also, we have to keep in mind that in the past the design of the LP tool was based on frames, which may have had different encodings. When we switched to a single page, the code for encoding management may not have been updated properly; this is to be checked.

Yes, I already did an update from 1.8.5 to 1.8.7 and found this kind of problem, even with 1.8.6.2.

3. Using different encodings in the database is a very bad idea. It is very important for all the database tables to be UTF-8, without any exceptions. These are the benefits (not a complete list):
- we avoid yet another layer where we would have to manage encoding conversion. MySQL is not that rich in supporting encodings, and it identifies them in its own way (utf8 instead of UTF-8, etc.).
- we will have better interoperability. If you are an external developer and you want to access the lp tables directly, knowing that they are UTF-8 encoded eases your development. Also, raw data will be displayed correctly in phpMyAdmin or MySQL Administrator, which makes code support easier.
- Some day we will seriously face the challenge of migrating to Chamilo 2.x. If our tables are UTF-8 encoded only, this migration will go smoothly.

You are totally right. I'm changing the task name to something like: Bug with lp_item when migrating, etc.

4. I think the LP code needs some historical burden removed (extra encoding conversions, html-entities) and needs to be cleaned/simplified. Why not do this for 1.8.8? I would be glad to participate in such an activity; I consider it necessary.

Great to hear your enthusiasm! All help is welcome!

In conclusion: Let us try to keep UTF-8 as the database encoding ( let us hold the line :-) ). This is possible. We can solve the encoding issues on the PHP/HTML side. Let us avoid "ad-hoc" solutions and prefer "future-proof" solutions, although this is not easy.

I agree with you.

Regards,
Ivan

#6

Updated by Julio Montoya almost 9 years ago

  • Subject changed from Don't change charset to utf8 for lp_item + lp, course tables during Chamilo Update to Charset - encoding problems in lp items during Chamilo Update 1.8.x to 1.8.7
  • Target version set to 1.8.8 stable
#7

Updated by Ivan Tcholakov almost 9 years ago

I would like to share a premature suggestion for your consideration and comments.

We can (or we have to?) get rid of the "Character set" setting that hides in the Learning path tool.

It served the old frame-based design, and it stores what the encoding of the manifest was.
The same encoding from the manifest was to be used for storing the lp-items in the database. So, this
setting's presence leads us to the following problems:

- we have to open a new database connection, which has to be adjusted to work with an encoding that might be
different from the system encoding. This is a complication that is not acceptable, I think.
- then we have to use the functions from the PHP library to re-encode the lp-items retrieved from the database
to the system encoding - yet another complication.

The lp-item should always be encoded in the database as UTF-8. What is to be done:

1. UTF-8 conversion in the database should be done during the upgrade; this is implemented.
2. When a new item is created using the LP-editor, it should be treated using the system encoding
(which may not be UTF-8). When the lp-item is passed to be stored in the database, the MySQL connection
converts it to UTF-8 transparently.
3. When a SCORM package is imported, its manifest should also be recoded to the system encoding.
Then the corresponding items may be stored in the database as described in 2.
4. When we export an LP or SCORM package there are several options:
- to always export as UTF-8 - not a bad option;
- to always export using the system encoding - I think this is the most acceptable way according to the user's
intuition or expectation;
- or if we want the setting "Character set" to stay, we may use its value. I can accept such a choice;
- or to ask the user about the encoding for export, but I don't like this way, the user should not deal with encodings.
5. The logic of the course archiver/restorer is to be checked, but changes there would probably not be needed.

I am not precise about some less important details. For example, if the system encoding is not UTF-8,
you will be restricted to importing SCORM packages with "mappable" encodings only. But if your system is
UTF-8 (which is the good choice), you would be able to import arbitrarily encoded packages.

If we ignore the LP-setting "Character set", then our task would be relatively easy - we would have to strip
unnecessary PHP code for encoding conversions. The LP tool will work correctly with the encodings
and it will become faster, by the way.
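
Steps 2 and 3, and the "mappable" restriction, can be sketched roughly as follows. This is an illustration in Python, not Chamilo code: the function names, the `SYSTEM_ENCODING` constant and the sample titles are all hypothetical, and the second function only imitates what the MySQL connection is expected to do transparently.

```python
SYSTEM_ENCODING = "iso-8859-15"  # hypothetical platform setting


def to_system_encoding(raw: bytes, source_encoding: str,
                       system_encoding: str = SYSTEM_ENCODING) -> bytes:
    """Recode imported bytes (e.g. a manifest title) to the system encoding.

    Characters that the system encoding cannot represent become '?'.
    """
    text = raw.decode(source_encoding)
    return text.encode(system_encoding, errors="replace")


def to_database(system_bytes: bytes,
                system_encoding: str = SYSTEM_ENCODING) -> bytes:
    """Imitate the connection converting system-encoded data to UTF-8 on write."""
    return system_bytes.decode(system_encoding).encode("utf-8")


french = "Leçon".encode("utf-8")   # mappable to ISO-8859-15
russian = "Урок".encode("utf-8")   # not mappable to ISO-8859-15

print(to_database(to_system_encoding(french, "utf-8")))
print(to_system_encoding(russian, "utf-8"))
print(to_system_encoding(russian, "utf-8", system_encoding="utf-8"))
```

With an ISO-8859-15 system the French title survives the round trip, while the Cyrillic title degrades to question marks; with a UTF-8 system both are preserved.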

#8

Updated by Ivan Tcholakov almost 9 years ago

12294:a899e1cecd22 Task #1765 - A transaction for the 1.8.7.1 release. Minor optimizations, replacement of the function calls include(), require(), include_once() and require_once() with the correspondent statements.
http://code.google.com/p/chamilo/source/detail?r=a899e1cecd229cbbe771f72124040ce7030dd615&repo=classic

#9

Updated by Ivan Tcholakov almost 9 years ago

12295:db8627f0a6bb Task #1765 - A transaction for the 1.8.7.1 release. Correction about avoiding erroneous second attempt for loading the library 'text.lib.php'.
http://code.google.com/p/chamilo/source/detail?r=db8627f0a6bbe3873061833d94c3cbad01eef67d&repo=classic

#10

Updated by Julio Montoya almost 9 years ago

Ivan Tcholakov wrote:

I would like to share a premature suggestion for your consideration and comments.

We can (or we have to?) get rid of the "Character set" setting that hides in the Learning path tool.

It served the old frame-based design, and it stores what the encoding of the manifest was.
The same encoding from the manifest was to be used for storing the lp-items in the database. So, this
setting's presence leads us to the following problems:

- we have to open a new database connection, which has to be adjusted to work with an encoding that might be
different from the system encoding. This is a complication that is not acceptable, I think.
- then we have to use the functions from the PHP library to re-encode the lp-items retrieved from the database
to the system encoding - yet another complication.

The lp-item should always be encoded in the database as UTF-8. What is to be done:

good idea

1. UTF-8 conversion in the database should be done during the upgrade; this is implemented.
2. When a new item is created using the LP-editor, it should be treated using the system encoding
(which may not be UTF-8). When the lp-item is passed to be stored in the database, the MySQL connection
converts it to UTF-8 transparently.
3. When a SCORM package is imported, its manifest should also be recoded to the system encoding.
Then the corresponding items may be stored in the database as described in 2.

I think it is a good idea to override the package encoding and always use utf8. I'm tired of the problems with the lp encoding when migrating or uploading new packages.

4. When we export an LP or SCORM package there are several options:
- to always export as UTF-8 - not a bad option;
- to always export using the system encoding - I think this is the most acceptable way according to the user's
intuition or expectation;

It sounds good to always export to the system encoding.

- or if we want the setting "Character set" to stay, we may use its value. I can accept such a choice;
- or to ask the user about the encoding for export, but I don't like this way, the user should not deal with encodings.
5. The logic of the course archiver/restorer is to be checked, but changes there would probably not be needed.

I am not precise about some less important details. For example, if the system encoding is not UTF-8,
you will be restricted to importing SCORM packages with "mappable" encodings only. But if your system is
UTF-8 (which is the good choice), you would be able to import arbitrarily encoded packages.

If we ignore the LP-setting "Character set", then our task would be relatively easy - we would have to strip
unnecessary PHP code for encoding conversions. The LP tool will work correctly with the encodings
and it will become faster, by the way.

#12

Updated by Ivan Tcholakov almost 9 years ago

  • Status changed from New to Assigned
  • Assignee set to Ivan Tcholakov
#13

Updated by Ivan Tcholakov almost 9 years ago

12534:c846dfed09cb Task #1765 - Careful start of removing the useless encoding management layer within the LP tool.
http://code.google.com/p/chamilo/source/detail?r=c846dfed09cb8a0b0745eba46ca458e79f215066&repo=classic

#14

Updated by Ivan Tcholakov almost 9 years ago

12536:b3ada42a2549 Task #1765 - Code cleaning - newscorm/audiorecorder.inc.php, newscorm/display_audiorecorder.php.
http://code.google.com/p/chamilo/source/detail?r=b3ada42a25493a408ebdcd4ddfa70021a4b361cb&repo=classic

12535:faa08e46ca7f Task #1765 - newscorm/display_audiorecorder.php: Removing obsolete encoding management code.
http://code.google.com/p/chamilo/source/detail?r=faa08e46ca7f889bd5099762b9103ef4d5f5be38&repo=classic

#15

Updated by Ivan Tcholakov almost 9 years ago

12543:b4088c524b74 Task #1765 - Cleaning the files for the Tracking tool.
http://code.google.com/p/chamilo/source/detail?r=b4088c524b7464dfcfb5992d414c061cea3f90fb&repo=classic

12542:9f7c10512fce Task #1765 - Cleaning the file newscorm/lp_view.php.
http://code.google.com/p/chamilo/source/detail?r=9f7c10512fce118efc9268cfebed8b83980af92e&repo=classic

12541:07e95632ecf3 Task #1765 - Cleaning the file newscorm/lp_toc.php.
http://code.google.com/p/chamilo/source/detail?r=07e95632ecf3d4ac3318ff486aa093b8230d70af&repo=classic

12540:27300a55f039 Task #1765 - Cleaning the file newscorm/lp_author_image.php.
http://code.google.com/p/chamilo/source/detail?r=27300a55f039a3abc0c2f5b0c5e6b2525fe2182b&repo=classic

12539:836f34299793 Task #1765 - Cleaning the file learnpathList.class.php.
http://code.google.com/p/chamilo/source/detail?r=836f34299793230ee6847c51483ed9b5e1d75936&repo=classic

12538:574cfa684b79 Task #1765 - learnpathList class: A modification for ignoring the encoding value stored in the database and always using the system encoding.
http://code.google.com/p/chamilo/source/detail?r=574cfa684b796b0a260a7fd253a8e4249ef2119f&repo=classic

12537:2ef4358f7342 Task #1765 - Cleaning previously disabled code.
http://code.google.com/p/chamilo/source/detail?r=2ef4358f7342342240d72de02e5a9517b38457bf&repo=classic

#16

Updated by Ivan Tcholakov almost 9 years ago

12545:ce0baf837bfd Task #1765 - newscorm/lp_controller.php: Elimination of $stats_charset variable.
http://code.google.com/p/chamilo/source/detail?r=ce0baf837bfd87fd49c5e2c49a9011aff0ba3b90&repo=classic

12544:7c49edc28c2b Task #1765 - newscorm/lp_stats.php: The extra- encoding management layer has been removed, the system encoding is always to be used.
http://code.google.com/p/chamilo/source/detail?r=7c49edc28c2b99441c265f560942a46ac40270e4&repo=classic

#18

Updated by Ivan Tcholakov almost 9 years ago

I have new information from Yannick:

"
...
I wanted to let you know that some of the learning paths imported as
SCORM, sometimes, mention an encoding that is not correct (in their XML
definition).

This leads to the system trying to convert from the given encoding to
the system encoding, which is a flawed conversion, and then the only way
to counter that is to manually select other "original encodings".
...
"

Here is what I am going to do:

1. In order to preserve the possibility of fixing the encoding manually, I decided
to revert the changes done so far (under the task #1765).
2. All the files targeted for changes should be cleaned first.
3. Then I will make a second attempt at solving all encoding-related problems,
keeping this new information in mind.

#19

Updated by Ivan Tcholakov almost 9 years ago

12551:91ddacc8600c Task #1765 - Reverting previous changes, a new attempt for modification of encoding management will be done.
http://code.google.com/p/chamilo/source/detail?r=91ddacc8600c4db9691abfbd9db438c74012f33e&repo=classic

#28

Updated by Ivan Tcholakov almost 9 years ago

12565:d23c7ed45f84 Task #1765 - LP tool, cleaning files (10). The files are obsolete (newscorm/resourcelinker*.php).
http://code.google.com/p/chamilo/source/detail?r=d23c7ed45f8475b899c8ad653a85cfca55a5541c&repo=classic

#30

Updated by Ivan Tcholakov almost 9 years ago

12567:980713579679 Task #1765 - LP tool, cleaning files (12). Removing PHP4 specific, obsolete code.
http://code.google.com/p/chamilo/source/detail?r=9807135796790919f59ffd9ec43be0de70a0e260&repo=classic

#32

Updated by Ivan Tcholakov almost 9 years ago

12578:0ee07598cf95 Task #1765 - The remaining calls api_get_setting('platform_charset') have been replaced with api_get_system_encoding().
http://code.google.com/p/chamilo/source/detail?r=0ee07598cf95ebeb36361f8a8f5356aa02c91e16&repo=classic

#33

Updated by Ivan Tcholakov almost 9 years ago

12581:885b1a2b559f Task #1765 - LP tool, cleaning files (14). Tabs are converted automatically to 4 spaces according to the last versions 11 of our coding conventions. The tool AnyEdit ( http://andrei.gmxhome.de/anyedit/ ) has been used for this conversion.
http://code.google.com/p/chamilo/source/detail?r=885b1a2b559f66ff3fa71313f923816dd35c4231&repo=classic

#34

Updated by Ivan Tcholakov almost 9 years ago

12588:21c2faadd01f Task #1765 - Some functions related to xml-processing have been moved to the library text.lib.php.
http://code.google.com/p/chamilo/source/detail?r=21c2faadd01f8b986372a3581d0288e9c213ee38&repo=classic

12587:b1c962290f5e Task #1765 - Tabs conversion to spaces in some files.
http://code.google.com/p/chamilo/source/detail?r=b1c962290f5e8472322693d3299e30e0205949ea&repo=classic

#35

Updated by Ivan Tcholakov almost 9 years ago

12589:c6767437edef Task #1765 - Corrections about the function for changing encoding of xml-texts.
http://code.google.com/p/chamilo/source/detail?r=c6767437edefa1d858391aaa6a75c73915362aaf&repo=classic

#36

Updated by Ivan Tcholakov almost 9 years ago

12591:d34661a1ec6c Task #1765 - newscorm/scorm.class.php: A new private static method detect_manifest_encoding() has been added. The purpose of this method is to determine the encoding of the input XML text (the manifest). Detection tries to resolve cases of missing encoding declaration or wrongly declared encoding.
http://code.google.com/p/chamilo/source/detail?r=d34661a1ec6c1f55ac89a6f68b17dd8805022cc7&repo=classic
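
The commit itself is not reproduced here. As a rough idea of what such detection can look like, the sketch below shows one possible heuristic in Python. It is not the Chamilo implementation: the function name `guess_manifest_encoding`, the fallback value and the order of the checks are all assumptions made for illustration.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, if present.
_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def guess_manifest_encoding(xml_bytes: bytes, fallback: str = "iso-8859-15") -> str:
    """Guess the real encoding of a manifest (hypothetical heuristic)."""
    # A UTF-8 byte order mark is decisive.
    if xml_bytes.startswith(b"\xef\xbb\xbf"):
        return "utf-8"

    match = _DECL.match(xml_bytes)
    declared = match.group(1).decode("ascii").lower() if match else None

    try:
        xml_bytes.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    if valid_utf8:
        # Pure ASCII fits almost any declaration, so the declaration is kept.
        if declared and xml_bytes.isascii():
            return declared
        # Non-ASCII bytes that form valid UTF-8 are very unlikely to be anything else.
        return "utf-8"

    # Not valid UTF-8: trust the declaration only if it names another
    # known encoding that can actually decode the bytes.
    if declared and declared not in ("utf-8", "utf8"):
        try:
            xml_bytes.decode(declared)
            return declared
        except (LookupError, UnicodeDecodeError):
            pass
    return fallback


wrongly_declared = b'<?xml version="1.0" encoding="UTF-8"?><t>caf\xe9</t>'
print(guess_manifest_encoding(wrongly_declared))
```

The example input declares UTF-8 but contains a single-byte accented character, so the declaration is rejected and the fallback is returned. A heuristic like this can still guess wrong, which is one reason to keep a manual override available.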

#37

Updated by Ivan Tcholakov almost 9 years ago

12602:07eaa9ba17ad Task #1765 - A sophisticated method for encoding detection is applied before parsing a SCORM manifest file (XML).
http://code.google.com/p/chamilo/source/detail?r=07eaa9ba17ad05de432dbd5e25af1205f7d23051&repo=classic

12599:ec27409cd96d Task #1765 - A correction within the function _api_convert_encoding_xml(): In the xml header the encoding option should precede the standalone option, otherwise DOMDocument fails to load the xml document.
http://code.google.com/p/chamilo/source/detail?r=ec27409cd96d2ea4eb4f0cf55c7ccbc6dd8c2102&repo=classic
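
The ordering rule comes from the XML specification, where the declaration is defined as version, then optional encoding, then optional standalone. A minimal sketch of rewriting a declaration in that order is given below, in Python for illustration only; the function name `set_xml_encoding` is hypothetical and this is not the code of _api_convert_encoding_xml().

```python
import re

_XML_DECL = re.compile(r'^\s*<\?xml\b(.*?)\?>', re.DOTALL)
_ATTR = re.compile(r'(\w+)\s*=\s*(["\'])(.*?)\2')


def set_xml_encoding(xml_text: str, encoding: str) -> str:
    """Return xml_text with a declaration that names the given encoding.

    Pseudo-attributes are always written as version, encoding, standalone.
    """
    attrs = {}
    body = xml_text
    match = _XML_DECL.match(xml_text)
    if match:
        # Collect the existing pseudo-attributes and drop the old declaration.
        attrs = {name: value for name, _quote, value in _ATTR.findall(match.group(1))}
        body = xml_text[match.end():]

    parts = ['version="%s"' % attrs.get("version", "1.0"),
             'encoding="%s"' % encoding]
    if "standalone" in attrs:
        parts.append('standalone="%s"' % attrs["standalone"])
    return "<?xml " + " ".join(parts) + "?>" + body


print(set_xml_encoding('<?xml version="1.0" standalone="yes"?><manifest/>', "UTF-8"))
```

Rebuilding the declaration from its parts, instead of appending the encoding to whatever is already there, avoids producing a header with standalone before encoding.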

#38

Updated by Ivan Tcholakov almost 9 years ago

12609:ccc45b32f17c Task #1765 - Shallow cleaning for some files in the Tracking tool.
http://code.google.com/p/chamilo/source/detail?r=ccc45b32f17c8043561cbc34135a1b4a82a903e8&repo=classic

#39

Updated by Ivan Tcholakov almost 9 years ago

Now it gets interesting.
12610:6b973d0c2771 Task #1765 - Removing/changing obsolete code for encoding management (1). LP tool, Tracking tool.
http://code.google.com/p/chamilo/source/detail?r=6b973d0c27712a34dbac28eda6b721d9e6c22399&repo=classic

#40

Updated by Ivan Tcholakov almost 9 years ago

12611:24b791ef00a8 Task #1765 - Removing/changing obsolete code for encoding management (2).
http://code.google.com/p/chamilo/source/detail?r=24b791ef00a84e3b40fcc78690ec9a982f4ce8c4&repo=classic

#41

Updated by Ivan Tcholakov almost 9 years ago

12612:744540c6948a Task #1765 - Removing/changing obsolete code for encoding management (3).
http://code.google.com/p/chamilo/source/detail?r=744540c6948a879b2f613ffe477d263bc17c2841&repo=classic

#42

Updated by Ivan Tcholakov almost 9 years ago

12613:38482f42d747 Task #1765 - Removing/changing obsolete code for encoding management (4). A comment has been added.
http://code.google.com/p/chamilo/source/detail?r=38482f42d747480353c03855db0615fbc674f04b&repo=classic

#43

Updated by Ivan Tcholakov almost 9 years ago

12616:bbcf740e38dc Task #1765 - Removing/changing obsolete code for encoding management (6). A change in the Surveys tool.
http://code.google.com/p/chamilo/source/detail?r=bbcf740e38dc37f707e1548a2c3b4905b34d72a1&repo=classic

12615:91512458ec2d Task #1765 - Removing/changing obsolete code for encoding management (5).
http://code.google.com/p/chamilo/source/detail?r=91512458ec2d1b0b8aeb198d883ffb1546f19154&repo=classic

#44

Updated by Ivan Tcholakov almost 9 years ago

12617:ba6ac8c42b5b Task #1765 - Removing/changing obsolete code for encoding management (7). The changes are mostly about the SCORM export functionality.
http://code.google.com/p/chamilo/source/detail?r=ba6ac8c42b5be808db7c70404ce273d63e0356a3&repo=classic

#45

Updated by Ivan Tcholakov almost 9 years ago

12618:95bc2c0c8f56 Task #1765 - Removing/changing obsolete code for encoding management (8).
http://code.google.com/p/chamilo/source/detail?r=95bc2c0c8f561b8b568f1c22b904ea4de4f9cb06&repo=classic

#46

Updated by Ivan Tcholakov almost 9 years ago

12619:29d15a93ca0f Task #1765 - Removing/changing obsolete code for encoding management (9).
http://code.google.com/p/chamilo/source/detail?r=29d15a93ca0fa56a1d96bd6072e3e63a17bb2173&repo=classic

#47

Updated by Ivan Tcholakov almost 9 years ago

  • Status changed from Assigned to Needs more info
  • % Done changed from 0 to 80

It is time for testing and complaints. :-)

#48

Updated by Ivan Tcholakov almost 9 years ago

12635:a267685a7da0 Task #1765 - newscorm/scorm.class.php: A comment has been removed, it is not actual anymore.
http://code.google.com/p/chamilo/source/detail?repo=classic&r=38482f42d747480353c03855db0615fbc674f04b

#50

Updated by Ivan Tcholakov over 8 years ago

Julio, have you spotted other problems here?

#51

Updated by Julio Montoya over 8 years ago

Ivan Tcholakov wrote:

Julio, have you spotted other problems here?

Hello Ivan, I only found the problem in the function "detect_manifest_encoding", but you already fixed it, so no problems here.

#52

Updated by Ivan Tcholakov over 8 years ago

  • Status changed from Needs more info to Feature implemented
  • % Done changed from 80 to 100
