[Keyboard] Dictionary editor for 1.1.1 and later (Tools forum, Hackint0sh.org)
  1. #1

    I've developed a small proggy which makes Keyboard Dictionary files (*-unigrams.dat, *-unigrams.idx and *-stems.dat) out of a txt wordlist; I wrote it to make a Polish dictionary for the iPhone. Is anyone interested in it, to make a dictionary for their own language? Or maybe there already is such a program (besides iPhoneShop, which doesn't work with 1.1.1 dict files)?



  2. #2

    Hello M4v3R.

    I would really love to be able to do that.

    I already have a list of the most used Spanish words, with their relative frequencies, just waiting for someone to develop such a program.

    Just one question... would you also be able to create the -one-letter-words.dat and -two-letter-words.dat files?

    Thank you in advance.

  3. #3

    Yes, I am able to generate these files as well. I will post the program within a few days.

  4. #4

    iPhoneShop v0.6 is ready, and it's superior to my program (it supports all characters, while my program supports only letters), so go get it.

  5. #5

    Thank you for the tip, M4v3R, I'll check it out.

    Anyway, your work is really appreciated. Would it be possible to see what information you have about the .dat files? You know, formats and such...
    Last edited by reycat; 12-10-2007 at 09:49 AM.


  6. #6

    Sure. It's incomplete though.

    There are 5 files in /System/Library/KeyboardDictionaries for each language. These files are:

    *-unigrams.dat: Contains all the words of a given language, with some additional info for each word. It begins with a 4-byte length of the dictionary (word count). Then the word list follows. Each word record consists of a 4- or 8-byte header (depending on its type), followed by the word in plain ASCII, terminated by 0x00. The header is as follows:
    - 2 bytes: letter count. It's encoded in a weird way and can be calculated with the formula: 0x821 + (0x421 * letter count in word). Don't ask why, that's just how it is.
    - 1 byte: 'use frequency', a number from 0 to 100 decimal, with 100 being the most popular and 0 (never actually seen it) the least popular.
    - 1 byte: word type. 02 is a 'normal word'; 03 (and sometimes 01?) means it's a name. If it's a name, the header contains 4 more bytes that hold an upper/lowercase bitmask for the name. A 1 in this bitmask means an uppercase letter and a 0 means a lowercase letter, with the lowest bit standing for the first letter. E.g. for the word 'iPhone' the mask is 00 00 00 02, and for 'DNA' it is 00 00 00 07.
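
    The header layout above can be sketched in Python roughly as follows. This is only an illustration, not the actual tool: the helper names (length_code, case_mask, build_record) are made up, and it assumes big-endian fields (suggested by the 00 00 00 02 example but not confirmed for the 2-byte field), the field order as listed, and that the word itself is stored with its original case.

```python
import struct

TYPE_NORMAL = 0x02  # 'normal word'
TYPE_NAME = 0x03    # name; header carries a 4-byte case bitmask


def length_code(letter_count):
    """Encode the letter count with the formula from the post."""
    code = 0x821 + 0x421 * letter_count
    if letter_count < 1 or code > 0xFFFF:
        raise ValueError("letter count does not fit the 2-byte field")
    return code


def case_mask(word):
    """Bit i (counted from the lowest bit) is 1 if letter i is uppercase."""
    mask = 0
    for i, ch in enumerate(word):
        if ch.isupper():
            mask |= 1 << i
    return mask


def build_record(word, frequency, is_name=False):
    """Build one word record: header, ASCII word, 0x00 terminator.

    Assumption: all multi-byte fields are big-endian.
    """
    if not 0 <= frequency <= 100:
        raise ValueError("frequency must be 0..100")
    word_type = TYPE_NAME if is_name else TYPE_NORMAL
    header = struct.pack(">HBB", length_code(len(word)), frequency, word_type)
    if is_name:
        header += struct.pack(">I", case_mask(word))
    return header + word.encode("ascii") + b"\x00"
```

    With these definitions, case_mask("iPhone") gives 2 and case_mask("DNA") gives 7, matching the two masks quoted above.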

    *-unigrams.idx: A 'table of contents' for the main dictionary, for faster searching. It contains one 7-byte record for each distinct three-letter beginning of the words in the dictionary. E.g. if the dictionary contains the words art, artist, bang, banana, beer, the index will contain the following: art, ban, bee. The structure of a record:
    - 3 bytes: the three letters. Not in plain ASCII, though, but in a special encoding. The iPhoneShop guys did good work reversing it; my formula works only for letters: 0x28 + ( ( ASCII code - 0x61 ) * 2 ).
    - 4 bytes: the offset of the first occurrence of a word that begins with these three letters.
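
    A rough Python sketch of building such index records, using the letters-only formula above. The names encode_letter and build_index are hypothetical, the big-endian offset is an assumption, and so is skipping words shorter than three letters. The offsets in the usage note are invented for illustration.

```python
import struct


def encode_letter(ch):
    """Letters-only formula from the post: 0x28 + (ASCII code - 0x61) * 2."""
    if not ("a" <= ch <= "z"):
        raise ValueError("formula only covers lowercase a-z")
    return 0x28 + (ord(ch) - 0x61) * 2


def build_index(entries):
    """Build 7-byte index records from (word, offset) pairs.

    entries must be in dictionary-file order; only the first word
    of each three-letter beginning gets a record.
    Assumptions: big-endian offset, words under 3 letters are skipped.
    """
    seen = set()
    out = bytearray()
    for word, offset in entries:
        prefix = word[:3].lower()
        if len(prefix) < 3 or prefix in seen:
            continue
        seen.add(prefix)
        out += bytes(encode_letter(c) for c in prefix)
        out += struct.pack(">I", offset)
    return bytes(out)
```

    For the example word list, build_index([("art", 4), ("artist", 12), ("bang", 23), ("banana", 32), ("beer", 43)]) yields three records (21 bytes), one each for art, ban and bee.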

    *-stems.dat: This file enables the iPhone to correct your typing errors on the fly, and to do it fast. It also contains the three-letter beginnings of the words (as in the .idx file), and for each of them the possible 'mistakes' are listed (built from the letters next to the actual letter on the keyboard). E.g. for 'abo' there will be: abp, abi, aco, sbi, sbo, etc.
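
    The idea of listing neighbouring-key 'mistakes' can be illustrated with a short Python sketch. This only shows the concept, not the file format (which is not described here): it assumes a plain QWERTY layout and only the left/right neighbours in the same row, so the real neighbour set may well differ (it does not produce 'aco', for instance). The names ROWS, side_neighbors and typo_variants are made up.

```python
from itertools import product

# Assumption: plain QWERTY letter rows, same-row neighbours only.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def side_neighbors(ch):
    """Return the keys directly left and right of ch in its row."""
    for row in ROWS:
        i = row.find(ch)
        if i != -1:
            return row[max(i - 1, 0):i] + row[i + 1:i + 2]
    return ""


def typo_variants(prefix):
    """All combinations where each letter is itself or a side neighbour."""
    choices = [ch + side_neighbors(ch) for ch in prefix]
    variants = {"".join(p) for p in product(*choices)}
    variants.discard(prefix)  # the correct spelling is not a 'mistake'
    return sorted(variants)
```

    For 'abo' this gives 17 variants, among them abp, abi, sbi and sbo.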

    *-{one/two}-letter-words.dat: These two files hold the one- and two-letter words for each language. They always start with an 8-byte header: 6 bytes unknown, then 2 bytes holding the length of the file minus 8 bytes. Then the list of words follows; each entry has a 3-byte 'counter' at the beginning, followed by the word in plain ASCII, 00-padded to 3 bytes (one-letter-words) or 6 bytes (two-letter-words).
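
    Reading such a file back could look roughly like the Python sketch below. The function name parse_short_words is hypothetical, and both the big-endian byte order and treating the 3-byte 'counter' as a plain integer are assumptions; the 6 unknown header bytes are returned untouched.

```python
def parse_short_words(data, word_width):
    """Split a *-letter-words.dat blob into (counter, word) entries.

    word_width is 3 for one-letter-words and 6 for two-letter-words.
    Assumptions: big-endian length and counter fields.
    """
    if len(data) < 8:
        raise ValueError("file is shorter than the 8-byte header")
    unknown = data[:6]                            # meaning not known
    body_len = int.from_bytes(data[6:8], "big")   # file length minus 8
    body = data[8:8 + body_len]
    record_size = 3 + word_width
    entries = []
    for pos in range(0, len(body) - record_size + 1, record_size):
        counter = int.from_bytes(body[pos:pos + 3], "big")
        raw_word = body[pos + 3:pos + record_size]
        entries.append((counter, raw_word.rstrip(b"\x00").decode("ascii")))
    return unknown, entries
```

    Call it with word_width=3 for the one-letter file and word_width=6 for the two-letter file.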
    Last edited by M4v3R; 12-11-2007 at 03:25 PM.

  7. #7

    This info is awesome

    Looking forward to reading about one letter and two letter words

    By the way, I've had a look at iPhoneshop and it doesn't seem it is able to create these two files, is it?

    And one question about what you have already posted, about bitmasks for uppercase/lowercase.

    If the mask for iPhone is 00 00 00 02, should I understand that it is in reverse order?

    Thank you again.

  8. #8

    Quote:
        And one question about what you have already posted, about bitmasks for uppercase/lowercase.

        If the mask for iPhone is 00 00 00 02, should I understand that it is in reverse order?

    Yes, it is in reverse order.

    Quote:
        By the way, I've had a look at iPhoneshop and it doesn't seem it is able to create these two files, is it?

    Nope, it doesn't seem so. These files have a rather simple structure, though, so I can post a proggy that will create them from txt files.
    Last edited by M4v3R; 12-10-2007 at 01:21 PM.

  9. #9

    That would be perfect. Thank you very much

  10. #10

    In the end I used iPhoneShop to generate a Spanish dictionary. It can be found here:

    http://rapid$hare.com/files/75716066/iPhone_1.1.1-1.1.2_autocorrecci_n_espa_ol.rar.html

    Looking forward to having your program to generate one-letter and two-letter files

    By the way, if it weren't for your info I wouldn't have noticed that iPhoneShop expects its input file in Unicode format; I would have gone nuts trying to figure it out.


 

 