CJKV Dict - a smart dictionary for polyglots learning ZH, JA, KO, VI - Emmanuel Ternon | PG 2019

thank you very much thank you everyone

for coming to this presentation my name

is Emmanuel Ternon and I'm a software

engineer by profession but I also have a

passion for languages as everyone here

and more specifically East Asian

languages and today I'm going to talk

about CJKV Dict which is basically a

smart dictionary for polyglots that I

created

if you're learning Chinese Japanese Korean

and/or Vietnamese at the same time as

you'll be able to discover throughout

the presentation okay so before we start

talking about the dictionary itself is

anyone able to tell me what this map

depicts what it represents yeah

more specifically I think yeah okay well

that's that that's the idea it's

actually a map of the Sinosphere and the

Sinosphere is the region of the world in

which China has historically had a very

important linguistic and cultural

influence so yeah what happened is that

throughout the centuries Chinese

characters and many words of Chinese

origin have been imported into Japanese

Korean and Vietnamese and as a result of

this massive import of vocabulary and

characters there is now a large

proportion of words of Chinese origin in

the vocabulary of Japanese Korean and

Vietnamese and these words make up about

60% of the Japanese vocabulary

two-thirds of the Korean vocabulary and

about one-third of the Vietnamese

vocabulary and yeah a direct result of

this massive import of vocabulary is

that there are now many common words

between

Chinese Japanese Korean and Vietnamese and

one example is this word which means all

or everything so yes you know it can be

read quanbu in Mandarin Chinese but it

also has other readings in Cantonese and

other regional Chinese languages I'm not

going to attempt to pronounce this

because I don't speak Cantonese it's

pronounced zenbu in Japanese jeonbu in

Korean and you can also pronounce it in

Vietnamese thank you

so yeah this property this property of

having Chinese characters that represent

the same word in multiple languages is

very interesting if you're learning

multiple of these languages at the same

time so and I actually took advantage of

this property myself because I've been

learning Mandarin Japanese and Korean at

the same time so to figure out what

words can be shared between these

languages what I've been doing is using

multiple dictionaries to look up the

common words between these languages so

I use MDBG for Mandarin a dictionary

called Jisho.org for Japanese and a

dictionary called Naver for Korean so if

you're learning some of these languages

you might have heard of these

dictionaries and even use them yourself

but they weren't exactly designed to be

used the way I was using them so I'll show

you what I was doing let's say you open

Naver dictionary so a Korean dictionary

and you look up the word is born so

you're able to discover that in minced

Japan and that it can be written with

these two Chinese characters so if you

copy these Chinese characters you open

another daiquiri

like this org and you paste them into

disorder org Terra it works you you

realize that they can also be used in

Japanese to also mean Japan and that

they can be read in Iran another example

if you use a Chinese dictionary so

MDBG here if you look up these two

Chinese characters you're able to learn

that they're read wenhua and that

they mean culture or civilization so

here again you can copy them and paste

them in another dictionary like

Jisho.org so if you do this you're able

to learn that they're read bunka and

that they also mean civilization or

culture in japanese so yeah as you can

see so far so good this trick of using

multiple Chinese well Chinese Japanese

Korean dictionaries seems to be pretty

nice and seems to work out pretty well

but let's try a couple of other examples

so if I look up this word siheom

in a Korean dictionary so I'm able to

learn that it means exam or test and

that it's written with these Chinese

characters all right so I copy them and

I paste them into Jisho.org and this is

what I get

sorry couldn't find any words matching

these two characters so what does that

mean does that mean that these

characters do not exist in Japanese and

that this word does not exist well not

exactly because they're basically the

Chinese characters for the word shiken

which also means exam in Japanese okay

let's try something else

if I search for these two characters on

Jisho.org I'm able to learn that they're

pronounced keizai and that they mean economy

or economics so if I copy them and I paste

them into a Chinese dictionary like

MDBG I get this message no results found

that's pretty strange because I was

pretty sure that they are the Chinese

characters for the word jingji for the

Mandarin word jingji so yeah as you

can see this trick of using multiple

dictionaries to look up common Chinese

character words does not always work do

you have any idea why not exactly

no no no no no yeah yes but no it's not

it's not the reason actually is because

Chinese characters were simplified in

Chinese and in Japanese so what happened

is that in mainland China

so the mainland Chinese authorities

created a set of Chinese characters called

jiantizi which are a simplified version

of the traditional well the historical

forms of Chinese characters and they did

something similar in Japan they created

a set of Chinese characters called

shinjitai that are also a simplified form of

the traditional Chinese characters and

one of the major issues with these

simplification schemes is that they

happened independently from one another

so as a result if you try to look up

these two characters in a Japanese

dictionary you're not going to get any

results because the Japanese dictionary

would expect this form of characters and

vice-versa so this is basically what I

was trying to do I was trying to look

for these Japanese simplified character

forms in a Chinese dictionary but it

didn't work out but actually it's only

part of the problem because if you look

at Korean and Vietnamese Chinese

characters were not simplified but they

were completely eliminated

from the writing system so in

Korea all sino-korean words were changed

from Chinese characters to Hangul the

Korean phonetic writing system

yeah but it's declining and nowadays you

basically only see Hangul in Korean yeah

yeah but very very sparsely like if you

look at a standard Korean text you will

only see Hangul basically and yeah the

same thing happened with with Vietnamese

so in Vietnamese Chinese characters were

changed to a phonetic writing system to

the Vietnamese alphabet that's based on

the the Roman alphabet okay so if you

look at the simplified Chinese form the

simplified Japanese form the phonetic

form in Korean and the phonetic form in

Vietnamese if you were to find a kind of

common denominator between all these

forms what could you do well it's very

simple you can just revert to the

traditional Chinese character form and

based on this principle you can

implement a language switching mechanism

so if you take one of the simplified or

phonetic forms you can then convert it

to traditional Chinese characters so

that you can then reconvert it to

another simplified or phonetic form so

by doing this you can switch between

Chinese and Korean or Vietnamese and

Japanese etc but yeah the thing is this

language switching mechanism

is slightly difficult to implement

because first of all chinese and

japanese dictionaries too often do not

include traditional forms of chinese

characters they only include the

simplified form yeah so yeah it's the

case for the Jisho.org dictionary

that only includes Japanese simplified

forms MDBG is an exception it includes

both simplified and traditional but there

are a lot of Chinese dictionaries for

learners that are mainly simplified so

to solve this problem you would need

some kind of simplified to traditional

converter but these converters yeah

they're a little bit difficult to use

and

yeah they're not really accessible to

learners who don't have like deep

knowledge of Chinese characters and same

goes for Korean and Vietnamese because

Korean or Vietnamese dictionaries rarely

include Chinese characters at all so

yeah you would need some kind of

phonetic to character converter which is

even more difficult to find and to

implement because sometimes you have

many Chinese well like a

phonetic transcription can correspond to

many combinations of Chinese characters

so yeah if you want to implement this

language switching mechanism you would

have to use a large number of tools and

even if you have all the tools you

require yeah it's gonna take you some

time because you need to manually

convert from simplified to traditional

then to phonetic again etc so it's very

cumbersome so I thought to myself what

if this language switching mechanism

could be implemented in a single step so

in a single click of the mouse and

this is when I got the idea for CJKV

Dict so CJKV Dict

is basically an online tool that's

available at cjkv-dict.com

it's completely free anyone can

use it
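
The language-switching idea described in the talk (take any simplified or phonetic form, convert it back to traditional characters, then look the traditional form up in every language) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how CJKV Dict is actually implemented: the tables `TO_TRADITIONAL` and `ENTRIES`, the function names and the two sample words are hypothetical toy data, and the sketch ignores the fact that a phonetic form can correspond to several character combinations.

```python
# Hypothetical toy tables; a real tool would load full dictionary databases.
TO_TRADITIONAL = {
    "经济": "經濟",      # simplified Chinese
    "経済": "經濟",      # Japanese simplified form (shinjitai)
    "경제": "經濟",      # Korean hangul
    "电话": "電話",      # simplified Chinese
    "전화": "電話",      # Korean hangul
}

# Readings keyed by the traditional form, which acts as the common denominator.
ENTRIES = {
    "經濟": {"zh": "jingji", "ja": "keizai", "ko": "gyeongje"},
    "電話": {"zh": "dianhua", "ja": "denwa", "ko": "jeonhwa"},
}


def to_traditional(query: str) -> str:
    """Map a simplified or phonetic form to its traditional form.

    Input that is not in the table is assumed to be traditional already.
    """
    return TO_TRADITIONAL.get(query, query)


def lookup(query: str) -> dict:
    """Pivot through the traditional form and return readings per language."""
    return ENTRIES.get(to_traditional(query), {})


# Searching with the Japanese form still finds the Chinese and Korean readings.
print(lookup("経済"))
```

The design choice worth noting is that every entry is stored once, under its traditional form, so adding a language only means adding one more conversion into that form rather than one converter per language pair.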

and I will now guide you through all of

its features so that you can know what

you can do right so the main feature of

the dictionary is that when you search

for a specific word in Chinese

characters it gives you results for

Chinese Japanese Korean and Vietnamese

so for example if I open the dictionary

and I search for these two characters

then I'll get information of what this

word means in Chinese Japanese Korean

Vietnamese so you get the pronunciation

in Mandarin Cantonese and all the other

languages and you also get information

about simplified Chinese character forms

which is very useful if you're learning

Chinese with simplified characters or if

you're learning Japanese so that you can

recognize this word even when it's using

a simplified Chinese character set and

speaking of simplified characters one of

the coolest features of CJKV

Dict is that it can automatically

convert simplified Chinese characters to

traditional Chinese characters so if I

look up these two characters in the

dictionary the dictionary is able to

detect that they are simplified Chinese

form and instead it will look up the

traditional form so that will allow you

to not only figure out that these

are the Chinese characters for the

Chinese word dianhua that means telephone

but you will also learn that this word

also exists in the other languages

something you would not have been able

to do if you only had the simplified

Chinese form and that they also mean

telephone in all of the other

languages okay so of course this

feature also works for simplified

Japanese characters so if I look up

these two characters CJKV Dict is

able to detect that they're in a

simplified form so instead it will look

up the traditional form and thanks to

this you will not only learn that they're

the Chinese characters well the

kanji in Japanese for the word genki

that means something like vitality it's

basically what you say in o-genki desu ka

in Japanese when you greet

someone and you will learn that this

word also means the same thing in

Chinese and in Korean so it's pronounced

yuanqi in Chinese and wongi in

Korean just for a minute sometimes

sometimes CJKV Dict well yeah

actually if it doesn't know whether or

not a word exists in one language it

will simply tell you so in this case it

was not able to find any word in the

database so it will tell you no result

found in Vietnamese dictionary it could

be that this word exists in Vietnamese

but well sometimes it doesn't sometimes

it does okay then I need to extend the

database good and yeah another

interesting feature is that CJKV

Dict is also able to automatically detect

what I call Chinese character

variants so yeah let me explain what

that means by showing you an example if

you look at the word for inspection when

you write this word in traditional

Chinese characters so even when taking

into account the Chinese character

simplification and elimination reforms

you will get this form for Chinese and

this form for Japanese so as you can see

these two characters are written in a

slightly different way but they

basically mean the same thing so and

this is what Chinese character variants

are they're basically two ways to write

the same character and so yeah the cool

thing with CJKV Dict is that if you

enter one of these two forms it will

automatically let you know that the word

has a variant and it will display the

words that match this variant as

well so you will not only get results

for the basic form here in Chinese and

Vietnamese but also for the variant form

in Japanese and Korean in this case yeah
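
The variant lookup described above can be sketched in Python like this. It is a minimal sketch, assuming a hand-written variant table and a toy headword list: the names `VARIANT_GROUPS`, `ENTRIES` and `lookup_with_variants` are hypothetical, and the character pair used here (U+67E5 and U+67FB, two ways of writing the second character of the word for inspection) is chosen for illustration, since the characters on the slide are not in the transcript.

```python
from itertools import product

# Hypothetical variant table: each set groups characters that are two ways
# of writing the same character. A real table would be far larger.
VARIANT_GROUPS = [{"\u67e5", "\u67fb"}]

# Toy headword list: which languages list which spelling of "inspection".
BASIC = "\u6aa2\u67e5"    # spelling assumed for Chinese and Vietnamese
VARIANT = "\u6aa2\u67fb"  # spelling assumed for Japanese and Korean
ENTRIES = {BASIC: ["zh", "vi"], VARIANT: ["ja", "ko"]}


def variants_of(ch: str) -> list:
    """Return every known way of writing one character, itself included."""
    for group in VARIANT_GROUPS:
        if ch in group:
            return sorted(group)
    return [ch]


def variant_spellings(word: str) -> set:
    """Expand a word into all spellings reachable by swapping variants."""
    return {"".join(chars) for chars in product(*(variants_of(c) for c in word))}


def lookup_with_variants(word: str) -> dict:
    """Return entries for the word and for any variant spelling of it."""
    return {s: ENTRIES[s] for s in variant_spellings(word) if s in ENTRIES}


# Entering either spelling returns the results for both.
print(lookup_with_variants(BASIC) == lookup_with_variants(VARIANT))
```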

of course you can also search for words

in Chinese characters using the phonetic

transcription so yeah for Chinese you

can search using zhuyin fuhao so bopomofo

the phonetic transcription used in

Taiwan for Mandarin you can use Hanyu

Pinyin without spaces either with

number tones or without tones and you

can use Jyutping which is the Cantonese

romanization system yeah without spaces

either with tones or without tones when

it comes to Japanese you're allowed to

use kana the phonetic Japanese writing

system and you can search for romanized

Japanese using the Hepburn romanization

system which is basically the most

widely used as for Korean you can search

for words using Hangul the phonetic

writing system used in Korean or you can

use the revised romanization system

which is the most commonly used

romanization system for Korean and last

but not least if you want to search for

Vietnamese words you can use the

Vietnamese alphabet

either with diacritics or even without
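
Accepting a romanization with or without tones, and Vietnamese with or without diacritics, usually comes down to normalizing both the stored readings and the query to the same key. The sketch below shows one way to do that in Python with the standard `re` and `unicodedata` modules. It is an assumption about how such a search could work, not a description of the actual tool: `normalize`, `add_reading`, `search` and `INDEX` are hypothetical names, and stripping every combining mark is a simplification (it also turns the pinyin letter u with diaeresis into a plain u).

```python
import re
import unicodedata

INDEX = {}  # normalized reading -> set of headwords (hypothetical index)


def normalize(text: str) -> str:
    """Reduce a reading to a bare key: no case, tones, diacritics or spaces."""
    text = text.lower().replace("\u0111", "d")  # Vietnamese d with stroke
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Drop tone numbers (1-5 for pinyin, 1-6 for Jyutping) and whitespace.
    return re.sub(r"[1-6\s]", "", bare)


def add_reading(reading: str, headword: str) -> None:
    """Store a headword under the normalized form of one of its readings."""
    INDEX.setdefault(normalize(reading), set()).add(headword)


def search(query: str) -> set:
    """Find headwords whose reading matches the query after normalization."""
    return INDEX.get(normalize(query), set())


add_reading("dian4 hua4", "電話")   # pinyin with number tones
add_reading("điện thoại", "電話")   # Vietnamese alphabet with diacritics

print(search("dianhua"), search("dien thoai"))
```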

okay so when you search for a word using

a phonetic transcription so here if I

search for denwa for example in kana

the dictionary is able to detect that

it's the kana transcription of these

two characters so you will then not only get

the results for Japanese but also for

Chinese Korean and Vietnamese of the

word that uses the same characters right

so there are a couple of other search

features that you can use such as yeah

you can search for English words for

example telephone if you write telephone

you will then get all words that have

the word telephone in their definition

you can use wildcards you can use them

either well after the word so if you

search for chump or star you will get

results matching Tongo Tongo around

jungle etc if you search for

*gakko you will get like words that end

with gakko so not also a cocoa rockledge

etc and if you search for

*test* you will get like results that

match anything containing test inside a

word so test testing contest protesting
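
The three wildcard positions mentioned here (after the word, before it, or on both sides) behave like shell-style patterns, which Python's standard `fnmatch` module can match. The sketch below is only an illustration: the word list `HEADWORDS` and the function `wildcard_search` are hypothetical, and a dictionary of realistic size would more likely translate the pattern into a database query than scan a list.

```python
from fnmatch import fnmatchcase

# Hypothetical sample of searchable words and romanized readings.
HEADWORDS = ["test", "testing", "contest", "protesting", "gakko",
             "shogakko", "telephone"]


def wildcard_search(pattern: str, words=HEADWORDS) -> list:
    """Return the words matching a pattern in which * stands for any text."""
    return [w for w in words if fnmatchcase(w, pattern)]


print(wildcard_search("test*"))   # words that start with test
print(wildcard_search("*gakko"))  # words that end with gakko
print(wildcard_search("*test*"))  # words that contain test anywhere
```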

etc yeah another useful feature is that

CJKV Dict can also give information

about individual characters comprised in

a word so for example if I use the word

telephone I'm able to discover that it

has that the first character means

electricity and that the second

character means talk and I can get all

the pronunciations in all the various

languages of the Sinosphere and of

course simplified forms either

simplified Chinese or simplified

Japanese if they exist all right so I

think now you've had a good overview of

the dictionary so yeah you are basically

ready to use it yourselves

congratulations but before you do this

there's something important its value

and this is what CJKV Dict is not

CJKV Dict is not a Chinese

to Japanese Japanese to Korean or

Korean to Vietnamese dictionary or any

other combination of these languages for

that matter and why is that

that's because sometimes the same

Chinese character means something

completely different in different

languages so in other words there are a

lot of false friends between all these

languages but the good news is that CJKV

Dict can also help you to detect false

friends so for example if I search for

these two characters I'm able to learn

that they're the characters for gongbu

in Korean which means to study so

as in gongbuhada but in Japanese

they're read kufu and they mean to

solve something ingeniously or scheming

whereas in Chinese they're read gongfu and

they mean something like time

yeah effort or labor yeah so as

you can see having the same Chinese

characters in a word is not a guarantee

that it's going to have the same

definition in all languages right all

right another example if I search for

these two characters they're read benkyo

in Japanese like benkyo suru to study

in Japanese but in Chinese and in

Vietnamese they mean something like to do

reluctantly unwillingly etc so yeah

unless you really hate studying we can

safely say that they refer to a

completely different concept right yeah

and sometimes they can even be funnier

than that so if I look at these two

characters for example okay some people

already know so in Japanese they're read

tegami they mean letter but in Chinese

they mean toilet paper

and the other like soldiers so unless

you want to ask your Chinese friends

whether or not they received the toilet

paper you sent them a while ago do

yourself a favor and don't use CJKV

Dict as a Chinese to Japanese

dictionary you've been warned all right
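
Because each language's definition sits side by side under the same characters, a very rough false-friend check is possible: flag any pair of languages whose definitions share no gloss at all. The Python sketch below illustrates that idea only. It is a crude heuristic of my own for illustration, not a feature the talk describes as automated, and the table `GLOSSES` (filled with the meanings given in the talk) and the function `false_friend_pairs` are hypothetical.

```python
from itertools import combinations

# Glosses per language for words written with the same characters,
# using the meanings quoted in the talk (hypothetical data layout).
GLOSSES = {
    "電話": {"zh": {"telephone"}, "ja": {"telephone"}, "ko": {"telephone"}},
    "手紙": {"zh": {"toilet paper"}, "ja": {"letter"}},
    "工夫": {
        "zh": {"time", "effort", "labor"},
        "ja": {"scheme", "solve ingeniously"},
        "ko": {"study"},
    },
}


def false_friend_pairs(word: str) -> list:
    """List language pairs whose glosses for this word do not overlap."""
    senses = GLOSSES.get(word, {})
    return [
        (a, b)
        for a, b in combinations(sorted(senses), 2)
        if not senses[a] & senses[b]
    ]


print(false_friend_pairs("手紙"))  # the letter / toilet paper pair is flagged
```

Exact overlap of gloss strings is far too strict for real data (synonyms would be flagged as false friends), so in practice the comparison would need at least some normalization of the definitions.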

so before we wrap things up I'd just

like to give you a quick summary of all

the features of CJKV Dict so as you

understood its main purpose is to look

up words in Chinese Japanese Korean and

Vietnamese written with the same Chinese

characters it's able to automatically

convert simplified characters to

traditional characters it's able to

automatically detect Chinese character

variants you can also search for Chinese

Japanese Korean and Vietnamese words

using a wide range of standard phonetic

transcriptions and it helps you to

detect false friends once again it's

available online at cjkv-dict.com you

can also like our page on

Facebook so it's

facebook.com/cjkv.dict and

you can follow CJKV Dict on Twitter

there it is so it's

twitter.com/cjkv_dict so you can see the

underscore but it's in here and I

particularly recommend you to do so so

to follow us on Twitter or like the

Facebook page because I regularly post

words that have the same or similar meaning

in all of these languages in all the

CJKV languages and that are written

with the same Chinese characters okay so

we're now reaching the end of the

presentation but before I start

answering your questions there is one

more thing I'd like to tell you so yeah

what if you want to use CJKV Dict

but you don't have access to the

Internet where you are well the good

news is that CJKV Dict is also

available as an iOS and Android app the

app contains an offline database for the

dictionary so you can use it even

when you don't have access to the

internet and you can download it on the

App Store on the Apple App Store and on

the Google Play Store and the best thing

is it's available completely free of

charge so if you want to

financially contribute to CJKV Dict

there is a possibility for you to do so

you can click on the donate button on

the webpage but no pressure it's

completely voluntary all right so before

I let you go there's actually one other

thing I'd like to tell you so yeah you

probably remember that during the

presentation I told you that traditional

Chinese characters act as a common

denominator between all the languages of

the Sinosphere right between Chinese

Japanese Korean Vietnamese and as a

matter of fact traditional Chinese

characters are fundamental they're very

important when you're learning these

languages especially when you're

learning many of them and if you want to

learn why I recommend you to read the

book I have just published yeah so the

title is Traditional Chinese Characters

A Translingual Writing System and it's

available for purchase on Amazon so on

all these websites in the US Canada UK

Germany France Spain Italy and even in

Japan yeah yeah it's Traditional Chinese

Characters A Translingual Writing

system ok so I'd like to thank you for

your attention

[Applause]

great there's some there's some very

positive comments here so your friends

should get a screenshot of this okay

well thank you how do you handle the

cases when one simplified character

okay all right so I'll repeat the

question so how do you handle situations

when you have one simplified character

that corresponds to many traditional

characters so usually the the yeah when

this happens in most cases

the simplified character

corresponds to multiple traditional ones

only on its own but when it's inside a

word you can usually match it to one of

these possibilities so in 90% of

situations it's possible but there are

some situations in which like even when

you have multiple characters you can

have multiple combinations in

traditional in this case it will simply

show you the whole list so you will have

for example two words in Chinese but

this automatic simplified to traditional

conversion feature will not necessarily

work because you need like to be able to

match one unique word for this to work I
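
The answer above (a simplified character is ambiguous on its own, but inside a word usually only one traditional combination is a real word) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's actual converter: the mapping `S2T`, the word list `TRADITIONAL_WORDS` and the function `traditional_candidates` are hypothetical, and the sample character (one simplified form standing for two traditional characters, one meaning to emit and one meaning hair) is my own example.

```python
from itertools import product

# Hypothetical one-to-many table from simplified to traditional characters.
S2T = {"发": ["發", "髮"], "头": ["頭"], "现": ["現"]}

# Hypothetical list of traditional-form headwords known to the dictionary.
TRADITIONAL_WORDS = {"發", "髮", "頭髮", "發現"}


def traditional_candidates(word: str) -> list:
    """Return every traditional spelling of the word that is a known headword.

    One result means the conversion is unambiguous; several results mean the
    whole list has to be shown to the user instead.
    """
    options = [S2T.get(ch, [ch]) for ch in word]
    combos = ("".join(chars) for chars in product(*options))
    return [w for w in combos if w in TRADITIONAL_WORDS]


print(traditional_candidates("头发"))  # inside a word: a single candidate
print(traditional_candidates("发"))    # on its own: two candidates
```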

haven't counted them but yeah I

basically I reused open-source databases

for for the dictionary so you've

probably heard of them CC-CEDICT

for Chinese JMdict for Japanese

there's a database called Kengdic for

Korean and I also use Wiktionary for

Vietnamese and for Korean

yeah then on the website yeah if you

click the about link then you will have

the whole list yeah that's an excellent

question so yeah you're asking if I

plan to include chu Nom so I would rather

say no unfortunately and the reason

behind it is because first of all

they only concern Vietnamese

and so they're not really translingual

and the main purpose of the

dictionary is to show like the the

common words between the between all

languages and the second reason is that

they are not really properly

standardized at the moment so it's

difficult to find a proper database and

and proper forms that can support these

these chu Nom characters so the

Vietnamese they're included in the

database but they're not displayed

because it's actually slightly better

when you're learning Japanese or Korean

not to get used to the romanization system

for that language especially for Korean

romanized Korean can get really messy so

but maybe in the future I could probably

add the option to add the

possibility to display romanized

readings oh yeah yeah yeah that's an

excellent question so for okay so yeah

two things there is cleaning and

processing one of the databases the

database for Korean was very messy very

old it hadn't been updated since 2011 or

something like this and there were a lot

of corrupted entries so I had a lot of

you know I needed to manually clean

many of them I'm not 100% done so if you

look up some Korean words you

know sometimes you see there's no

definition or the definition's a bit

strange but it's difficult to maintain a

database of this size on

on my own so yeah and the second thing

is processing so for example the Chinese

database contains both simplified and

traditional so I could use it just as is

but when it comes to the Japanese I had

to convert all the characters from

simplified to traditional using a tool I

made on my own yeah I even made like a

Python library just for that just for

that purpose yes yes yeah that's that's

what you can do as I said you can use

any if the character can be written

in Han tu basically not in chu Nom

like in translingual like Chinese

characters then you can usually match

the phonetic reading to the character

yet it works no no yes yeah actually

there are very few teachers that can do

that and yeah it's it's quite tricky

because sometimes you know that would

correspond that would be writing in one

language but only the other so

segmenting a phrase in multiple

languages at the same time it's quite

tricky so no no yet no that's actually a

good idea I know that there are some

databases of like I know the Tatoeba

database so for Japanese right but no

yeah but yeah maybe one day who knows

how much time did you spend on it oh

yeah give up on your social life now

that you know it's like it's like what

people say when you love something you

don't count really you don't really

count the time you're investing in so

yeah sometimes I could just spend like a

whole weekend just coding for I don't

know ten hours each day so yeah yeah

yeah it's not only feasible and if you

used tools like this to really manage to

make mnemonic associations between all

these various languages you can do it

it's like basically learning French

Spanish and Italian you know like you

need to be aware of false friends like

preservativo for example you know and

but it's yeah it's the same kind of the

same concept with East Asian languages

yeah yeah I I started with Japanese yeah

actually it's it's yes mostly because of

Japanese because you know when you learn

Japanese you need to learn kanji so

Chinese characters and when I started

learning Korean I discovered you know

the similarities between the

vocabularies and I was using this trick

of typing Japanese into a

Korean dictionary but as you see as you

saw sometimes it doesn't work because of

the simplifications etc so and same for

Chinese I learned Chinese the

same way and I felt very frustrated at

not being able to look up these common

words like very easily and that's that's

why I really felt the need for this for

this tool

yeah that's a good question

okay so in the case of Japanese if you

put the okurigana it's not going to

give you results in more than Japanese

because the okurigana I'm sorry okay yeah

yeah if you could do work yeah if you

put the word work in English it's gonna

give you all words yeah exactly yeah

yeah yeah so the more specific you

get the fewer results you'll get so if

you type just work you'll probably get like

50 results per language yeah then you

probably get a bit fewer results I mean

yeah it works then you get like

hataraku etc and come to and

yeah in my book yeah I talk about well

that would deserve a whole

presentation on its own actually but

yeah I talk about why I believe

basically that simplifying Chinese

characters in all the countries of the

Sinosphere was a big mistake and that it

should be reverted and yeah basically

but I give good reasons for it like I

analyzed the benefits of traditional

Chinese characters in all the languages

so not just Chinese or Japanese but also

Korean Vietnamese etc and I analyzed

also the yeah the disadvantage of having

a phonetic writing system for Korean

Vietnamese and of having simplified

characters for Chinese and Japanese more

or less

I knew it's very crying to be allowed me

to give a little plug here at the end

there's a couple of us who are thinking

about organizing something like polyglot

gathering for the Asia-Pacific region

so if anyone is interested in that idea

for Pacific Australasia Southeast Asia

potential for North Asia in there as

well so we're going to have a

get-together of people interested in

that idea at 2:30 in the games area so

if any of you are interested in stories

might be easily please come along area

[Applause]
