Support #620

Reimport variables > 255 chars from 1.8.6.2 to translate.chamilo.org

Added by Yannick Warnier over 9 years ago. Updated over 9 years ago.

Status: Feature implemented
Priority: High
Category: -
Target version:
Start date: 18/02/2010
Due date: 18/02/2010
% Done: 100%
Estimated time: 2.00 h
Complexity: Normal
SCRUM pts - complexity: 5

Files

import.php (8.46 KB) Yannick Warnier, 18/02/2010 22:14
import.php (6.31 KB) Anonymous, 19/02/2010 16:30
bulgarian-language-good-sample.png (98.5 KB) Ivan Tcholakov, 22/02/2010 14:48

History

#1

Updated by Yannick Warnier over 9 years ago

The first import to translate.chamilo.org was truncated because fields were limited to 255 characters.

Download the latest language files from http://lang.chamilo.org/

Unzip the file.

Write a script that goes into each directory (one per language), includes each file (language pack), identifies the variables longer than 255 characters, and writes them to new files in a new directory with an identical structure, converting to UTF-8 on the fly (see the attached script). Warning: verify that strlen handles UTF-8 correctly; otherwise use mb_strlen or something similar.
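The length check in the last step is where the strlen warning matters: a byte count and a character count differ for non-ASCII text. The following is a minimal Python sketch of that filter, not the attached PHP script. The function names and sample variables are hypothetical, and whether the 255 limit on translate.chamilo.org was measured in characters or bytes is an assumption to verify.

```python
def long_variables(variables, limit=255):
    """Keep only the entries whose text is longer than `limit` characters."""
    # len() on a Python str counts code points, which is the counterpart
    # of PHP's mb_strlen() with a UTF-8 encoding.
    return {name: text for name, text in variables.items() if len(text) > limit}


def utf8_byte_length(text):
    """Length in bytes once encoded as UTF-8 (what PHP's strlen() reports)."""
    return len(text.encode("utf-8"))


# Hypothetical sample: a short Cyrillic string, a 200-character Cyrillic
# string (400 bytes in UTF-8) and a 300-character ASCII string.
sample = {
    "ShortVar": "Здравей",
    "LongCyrillicVar": "я" * 200,
    "VeryLongVar": "x" * 300,
}

selected = long_variables(sample)
print(sorted(selected))
print(len(sample["LongCyrillicVar"]), utf8_byte_length(sample["LongCyrillicVar"]))
```

A byte-based check would also select LongCyrillicVar, although it has only 200 characters, which is why the choice between the two counts should be made deliberately.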

#2

Updated by Yannick Warnier over 9 years ago

  • Estimated time changed from 1.00 h to 2.00 h

#3

Updated by Anonymous over 9 years ago

Attaching the right script to do the work and closing the task.

#4

Updated by Ivan Tcholakov over 9 years ago

Hello Guillaume,

Just for the record, I want a simple fact to be known. :-)

I don't like this script. It does not do the right job. I am sure you don't like it either.

More respect should be shown for the translators' work; two hours is not enough. I translated almost all of the Bulgarian language in the past and I am getting concerned about the final result.

Could you or someone else specify in plain words:
- the format of the old files, what shape they were in: UTF-8 strings, non-UTF-8 strings, htmlentities, mixed encodings;
- the final format of the newly created language files. What is the goal? What are you trying to achieve? Please open, for example, a Greek language file in your IDE, take a screenshot and show it.

Kind regards,
Ivan Tcholakov.

#5

Updated by Anonymous over 9 years ago

Hello Ivan,

I don't like this script either, but can you please show in what way "it does not do the right job"?

The goal of this script is to identify all the language variables that have more than 255 characters (because apparently, during the creation of translate.chamilo.org, it was assumed that a translated string would never be longer than 255 characters), convert those variables into UTF-8 if needed, and write them into a new directory, in a new file that will contain only variables with more than 255 characters. These new files will then be imported by Yannick in the new translate.chamilo.org.

Therefore, to answer your questions:
  • the format of the old files is the same as the format of the new files, the only differences being that in the new files, everything is in UTF-8 and the new files only contain variables with more than 255 characters
  • the new files are not a final format. As I said, they will be imported by Yannick in the new translate.chamilo.org

Also, two hours is the estimated time. I probably spent about 4 hours writing and debugging the script, running it on some test files before finally running it on the final lang package. After that, I checked a few files of the new lang package, and they were all fine, containing only variables with more than 255 characters.

Finally, yes, the script is not very clean, but to me this is because the translations are stored as PHP variables (you therefore have to go through all that business of including the file and playing with the GLOBALS array, and unfortunately that's the only way to do it). I believe translations should be stored in an easily parsable format such as YAML. Unfortunately, I'm not sure Chamilo has the time and resources to make this change.

If you want, I can send you the newly generated package by email.

Kind regards,

Guillaume.

#6

Updated by Ivan Tcholakov over 9 years ago

Thank you, Guillaume, for clarifying the picture.

I would like to help. I started to prepare a new version of the script, based on your work.

1. This is the idea, the current state:
"... convert those variables into UTF-8 if needed, and write them into a new directory, ..."

becomes:
"... convert those variables into UTF-8, and write them into a new directory, ..."

2. Just in case: in my definition, converting to UTF-8 does not mean converting into HTML entities.

When it is ready, I intend to post the script here. I would like you to spend just a few minutes (not during the holidays) looking at it.

I would be grateful if you could send me your current results and, if necessary, instructions on how to download the latest "old-fashioned" language files. My address is ivantcholakov at gmail dot com

Kind regards,
Ivan Tcholakov

#7

Updated by Anonymous over 9 years ago

Hi Ivan,

I sent you the two packages (the "old-fashioned" language files and the new ones) by email.

The "if needed" in my sentence meant that some language packs have already been converted to UTF-8 (such as arabic_unicode), and for those language packs, doing a conversion would mess up the pack, so the conversion is not being done.
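To illustrate why converting a pack that is already UTF-8 would mess it up, here is a small Python sketch (not the PHP script from this ticket). The set of UTF-8 packs, the legacy encoding ISO-8859-1 and the pack name "some_legacy_pack" are assumptions made for the example; only arabic_unicode is named in the thread.

```python
# Hypothetical list of packs known to be UTF-8 already.
ALREADY_UTF8 = {"arabic_unicode"}


def decode_pack(raw, language, legacy_encoding="iso-8859-1"):
    """Decode raw pack bytes, skipping the legacy conversion for UTF-8 packs."""
    if language in ALREADY_UTF8:
        return raw.decode("utf-8")
    # Assumption: every other pack uses a single-byte legacy encoding.
    return raw.decode(legacy_encoding)


utf8_bytes = "é".encode("utf-8")  # two bytes: 0xC3 0xA9

correct = decode_pack(utf8_bytes, "arabic_unicode")
garbled = decode_pack(utf8_bytes, "some_legacy_pack")

print(correct)  # the original character
print(garbled)  # two characters: the bytes were misread as ISO-8859-1
```

Treating UTF-8 bytes as a legacy encoding turns each multi-byte character into several wrong characters, which is the damage the "if needed" condition avoids.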

In my definition, converting into UTF-8 does not mean converting into html-entities either.

You can write a new script which will definitely be cleaner, but I would be surprised if it gives you a different result than mine. Also, just to clarify more things, the script I uploaded here is supposed to be used only once. Once the variables with more than 255 chars are reinserted properly in the new translate.chamilo.org, this script should never be used again... (in other words, it's not something that will be incorporated into translate.chamilo.org)

Best regards,

Guillaume.

#8

Updated by Ivan Tcholakov over 9 years ago

Thank you, I received the e-mail.
Yes, the script is to be used only once, I realize that. Copy/paste here helps a lot.

Thank you, again.

#9

Updated by Ivan Tcholakov over 9 years ago

I've just sent back an e-mail with a reworked script and its results. The script is meant to be run from the browser.

For testing UTF-8, I have committed the resulting language files into the repository.
http://code.google.com/p/chamilo/source/detail?r=3c82709d2d39b482eb902ac0523ceafc7221870a&repo=classic

#10

Updated by Anonymous over 9 years ago

Hello Ivan,

I took a look at the files you sent me by email. Unless I misunderstood something in what Yannick told me, this is not what Yannick wanted. What he wanted in the new files are only variables that have more than 255 characters. The new files you sent me contain ALL variables...

If you take a look at the files I generated, they contain only variables which have more than 255 characters...

#11

Updated by Ivan Tcholakov over 9 years ago

Yes, indeed.

What Yannick did is "First attempt of language import from CDA (UTF-8)".
http://code.google.com/p/chamilo/source/detail?r=7758e2cc38c9874775cfbd56f7150987a40d1633&repo=classic

There will be more attempts, I guess, and I don't consider this attempt of mine to be the last. Next week I will try to get more familiar with the CDA tool, so some more corrections might be needed.

As I said, I don't like the previous script: it did not convert all the variables correctly (no matter how long they are). I hope this fact has become obvious. All the variables have to be renewed.

It is up to Yannick to decide; I hope he will recognize the problem. I have done what I can for the moment.

#12

Updated by Yannick Warnier over 9 years ago

Hi Ivan,

Our first step is to re-import the variables longer than 255 characters correctly (the current zip still has escaping problems). After that, I think the easiest fix is to ask someone with access to the database to replace every \\" with \". As far as I have seen (so far, and in only 3 languages), that is the only remaining bug causing the parse errors.

#13

Updated by Ivan Tcholakov over 9 years ago

No, this is not the solution. I repeat, the languages were not imported correctly the first time.

The attached picture shows the result that should be achieved:
1. Pure UTF-8 strings.
2. No html-entities.

Also, I proposed patches for CDA. If they are correct, they should be applied to all the imported data.
http://support.chamilo.org/issues/627

I expect quality work, please.

#14

Updated by Yannick Warnier over 9 years ago

The problem is that several translations have already been modified and we want to avoid overwriting them.

Only a few languages have the htmlentities problem. Can we focus on those and leave the others (at least Spanish, English, French and Galician) alone?

I also expect quality work, so let's get to work:
1. some of the languages are already supposed to be UTF-8 (these are put into the $utf array in the script)
2. some translations existed both in UTF-8 and in ISO (with HTML entities), so I have picked the one I think is the most up to date (only the languages in the $langs array actually get treated)
3. the script only treats >255 chars variables, but once we've done that, we could reuse it for <255 for the languages whose first import failed
4. Spanish, English, Galician and French have already received a lot of work since the import, so I don't want to overwrite them with a new import
5. the work on the basque<>euskera and turkish<>turkce languages can be done manually before the import
6. some of the languages were historically translated using HTML entities because there was no other way to do that on an ISO-based portal. These should probably be re-imported using an HTML entities to UTF-8 conversion
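The HTML-entities-to-UTF-8 conversion mentioned in point 6 can be sketched in Python with the standard library's html.unescape. This is only an illustration with made-up sample strings, not the conversion used by either script in this ticket. Note that it also decodes entities such as &amp; and &lt;, which may be intentional markup in some translations, so a real import would need to decide how to treat those.

```python
import html

# Hypothetical translation strings stored as HTML entities
# on an ISO-based portal.
entity_strings = {
    "Greeting": "&#1047;&#1076;&#1088;&#1072;&#1074;&#1077;&#1081;",
    "Coffee": "caf&eacute;",
}

# Decode numeric and named entities into real Unicode characters.
decoded = {name: html.unescape(text) for name, text in entity_strings.items()}

# Encoding the result as UTF-8 gives the bytes to write to the new file.
utf8_bytes = {name: text.encode("utf-8") for name, text in decoded.items()}

print(decoded["Greeting"], decoded["Coffee"])
```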

Could you let me know if I'm missing something?

Also: I can see the Greek language clearly in my IDE.

Sent to Sven just now:

From: Yannick Warnier
To: Vanpoucke Sven
Cc: Ivan Tcholakov
Subject: Replace strings in CDA
Date: 22/02/2010 08:41:44

Hi Sven,

Could you please run the following query on the translate.chamilo.org
database in order to get rid of our parse errors?

UPDATE `cda_variable_translation` SET translation =
REPLACE

This will replace all occurrences of \\" by \" and will effectively
sanitize a whole bunch of things.
This is also far more efficient than having us go through all
languages and fix all occurrences of this one.

Please run the query TWICE, as maybe some \\\" might survive the first
run.

Please take a backup of that table before you do (just in case). I have
run the tests locally and they give the right results (leaving only \"
when necessary).

Thanks,

Yannick
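The reason for running the replacement twice can be shown with a small Python sketch of the same substitution (\\" replaced by \"). This is an illustration only; the function name is hypothetical, and the query itself should be taken from the original e-mail, not from this sketch.

```python
def unescape_quotes(text):
    """Replace every backslash-backslash-quote with backslash-quote."""
    return text.replace(r'\\"', r'\"')


double = r'say \\"hello\\"'   # over-escaped once
triple = r'say \\\"hello\\\"'  # over-escaped twice

first_pass = unescape_quotes(triple)
second_pass = unescape_quotes(first_pass)

print(unescape_quotes(double))
print(first_pass)
print(second_pass)
```

A single pass removes only one backslash per occurrence, so a string with three backslashes still has two after the first run and needs the second run to end up with exactly one.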

#15

Updated by Yannick Warnier over 9 years ago

What's the method you used to move Bulgarian to UTF-8?

#16

Updated by Ivan Tcholakov over 9 years ago

Yannick Warnier wrote:

What's the method you used to move Bulgarian to UTF-8?

I used the modified script that I sent to Guillaume Viguier. Just in case, I will send it to you too; the message is about 10 MB (the initial language files are inside). Look at the logic of the script, it is better.

OK, for the Latin-origin languages let us not do anything; they can be excluded. For the other languages I doubt that there has been any translation activity. I think we have to repeat the import using my script and applying the patches I proposed for CDA.

I am sending as they were sent to Guillaume Viguier, without touching them...

#17

Updated by Ivan Tcholakov over 9 years ago

Actually, the resulting language files (produced by the script) are also included in the message.

#18

Updated by Yannick Warnier over 9 years ago

The week isn't starting very well in terms of spare time, so I'll have to ignore this task for a few days. Sorry about that. I'll try to import at least the >255 strings. Would it help if I gave you admin rights on translate.chamilo.org to re-import the files (with the restrictions mentioned previously about Latin languages)?

#19

Updated by Yannick Warnier over 9 years ago

  • Assignee changed from Anonymous to Ivan Tcholakov

#20

Updated by Ivan Tcholakov over 9 years ago

OK, I will make the next attempt for the non-Latin languages.

#21

Updated by Yannick Warnier over 9 years ago

  • Status changed from Feature implemented to Assigned
  • Priority changed from Immediate to High

#22

Updated by Ivan Tcholakov over 9 years ago

10574:26cb2b597947 Tasks #620 and #627 - The second attempt to use exported Chamilo 1.8.x translations from CDA, http://translate.chamilo.org
http://code.google.com/p/chamilo/source/detail?r=26cb2b597947c809fda7ce190b72ca5b6bd4c426&repo=classic

#23

Updated by Ivan Tcholakov over 9 years ago

10575:57ed73ae60f4 Tasks #620 and #627 - Fixing manually syntax errors. Still I have a problem (overescaping) with the exported data.
http://code.google.com/p/chamilo/source/detail?r=57ed73ae60f4275a302d7d4ee595de9353a526b0&repo=classic

#24

Updated by Ivan Tcholakov over 9 years ago

10582:d825c902a982 Tasks #620 and #627 - The third attempt to use exported Chamilo 1.8.x translations from CDA, http://translate.chamilo.org. I think, this is it - the translations are UTF-8 and they pass syntax check.
http://code.google.com/p/chamilo/source/detail?r=d825c902a982ae9a679bfc99d2f541288d80b8b6&repo=classic

#25

Updated by Ivan Tcholakov over 9 years ago

  • Status changed from Assigned to Feature implemented
  • % Done changed from 0 to 100

10583:09e632452c01 Tasks #620 and #627 - A "normal" export from CDA - final selective corrections for some language files.
http://code.google.com/p/chamilo/source/detail?r=09e632452c01f5029594c253e68858a2a82c68aa&repo=classic
