chipplechipple

ASCII To HTML Entities

Category: Technology

I've made a small application that converts high-ASCII characters (byte values 128 and above) in 8-bit text files into the corresponding HTML entities. This may sound silly, since most HTML editors and even some text editors do this as a standard feature, but the problem it is meant to solve goes beyond simple text replacement.

Because I use Japanese Windows, I can't edit ISO-8859-1 (Latin-1) files containing high-ASCII characters (such as accented characters) in a regular text or HTML editor. The editor reads the file as Shift-JIS, so many of these bytes are taken as the first half of a double-byte character and combined with the next character to form kanji, and the text ends up as garbage. The same problem almost certainly exists on Chinese and Korean systems, and in a similar way it may be impossible to comfortably edit ISO-8859-1 files on a Cyrillic system, for example.
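The effect is easy to reproduce. The short Python snippet below is only an illustration, not part of the tool: it takes the bytes of an ISO-8859-1 word and decodes them twice, once as Latin-1 and once with Python's `cp932` codec (Python's name for the Windows Japanese code page, a Shift-JIS variant).

```python
# Bytes of the word "résumés" as stored in an ISO-8859-1 file.
raw = "résumés".encode("latin-1")   # b'r\xe9sum\xe9s'

# Decoded with the correct charset, the accented letters survive.
as_latin1 = raw.decode("latin-1")

# Decoded as CP932 (what a Japanese Windows editor assumes), 0xE9 is a
# Shift-JIS lead byte, so it is paired with the following 's' (0x73)
# and the two bytes become a single kanji. Both accents and both 's'
# letters disappear from the text.
as_cp932 = raw.decode("cp932", errors="replace")

print(as_latin1)
print(as_cp932)
```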

This application opens HTML and other text files as if they were binary and reads them byte by byte, instead of character by character as text editors do. Any byte with a value of 128 or higher is then converted to the corresponding HTML entity.
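The byte-by-byte idea can be sketched in a few lines of Python. This is not the application's actual code (which is Visual Basic 6); it assumes the replacement set has already been loaded into a dict mapping byte values to replacement strings. The numeric-entity table `LATIN1_NUMERIC` and the function name `to_html_entities` are illustrative assumptions, based on the fact that ISO-8859-1 bytes 0xA0 to 0xFF have the same values as their Unicode code points. Bytes with no replacement are left untouched and their offsets are reported, mirroring the tool's alerts on characters missing from the replacement set.

```python
# Hypothetical ISO-8859-1 replacement set using numeric entities:
# in Latin-1, bytes 0xA0-0xFF equal their Unicode code points.
# 0x80-0x9F are control codes in ISO-8859-1, so they are not mapped.
LATIN1_NUMERIC = {b: f"&#{b};" for b in range(0xA0, 0x100)}


def to_html_entities(data: bytes, table: dict[int, str]) -> tuple[bytes, list[int]]:
    """Replace every byte >= 128 in `data` using `table`.

    Returns the converted bytes and the offsets of high bytes that had
    no entry in `table` (those bytes are copied through unchanged).
    """
    out = bytearray()
    unmapped = []
    for offset, byte in enumerate(data):
        if byte < 128:
            out.append(byte)                      # plain ASCII: copy as-is
        elif byte in table:
            out += table[byte].encode("ascii")    # e.g. 0xE9 -> b"&#233;"
        else:
            out.append(byte)                      # unknown: keep, but report it
            unmapped.append(offset)
    return bytes(out), unmapped
```

For example, `to_html_entities(b"Caf\xe9 \x80 5", LATIN1_NUMERIC)` returns `(b"Caf&#233; \x80 5", [5])`: 0xE9 becomes `&#233;` (é), while 0x80, the Euro sign in CP1252 but a control code in ISO-8859-1, is flagged.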

ASCII To HTML Entities (AsciiToHtmlEnt.exe)

Features:

  • It comes with ready-to-use support for ISO-8859-1 (latin1), ISO-8859-15 (latin9 or latin0, the revision of ISO-8859-1 that adds the Euro symbol), and CP1252 (windows-1252, Windows' superset of ISO-8859-1).
  • You can customize the existing replacement sets, or define your own for other charsets. Just edit the plain-text files in the charset folder (format: byte value<tab>replacement), or create new ones.
  • It can process a single file or a whole folder (and optionally its sub-folders).
  • When processing a folder, it only converts files whose extensions appear in a given list.
  • It can optionally create backup files.
  • It accepts drag and drop from Windows Explorer for easy file/folder input.
  • It alerts on characters not included in the replacement set, such as a Windows-only CP1252 character found when processing an ISO-8859-1 file.
  • All options can be saved to a .INI file so they are remembered next time.
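The replacement-set files and the folder options can be sketched the same way. Again this is a hypothetical Python illustration, not the tool's code: `load_replacement_set`, `find_files` and `backup` are names made up for the example, and it assumes that byte values in the charset files are written in decimal with one pair per line, that the files are plain ASCII, and that backups are saved with a `.bak` suffix.

```python
import shutil
from pathlib import Path


def load_replacement_set(path):
    """Parse 'byte value<TAB>replacement' lines into a dict.

    Assumptions: byte values are decimal (e.g. "233\t&eacute;"),
    the file is ASCII, and blank lines are ignored.
    """
    table = {}
    text = Path(path).read_text(encoding="ascii")
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        value, sep, replacement = line.partition("\t")
        if not sep:
            raise ValueError(f"{path}:{line_no}: missing tab separator")
        byte = int(value)
        if not 0 <= byte <= 255:
            raise ValueError(f"{path}:{line_no}: {byte} is not a byte value")
        table[byte] = replacement
    return table


def find_files(folder, extensions, recursive=False):
    """Return files in `folder` (and its sub-folders if `recursive`)
    whose extension is in `extensions`, compared case-insensitively."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in Path(folder).glob(pattern)
        if p.is_file() and p.suffix.lower().lstrip(".") in wanted
    )


def backup(path):
    """Copy `path` to '<name>.bak' next to it (hypothetical naming)."""
    path = Path(path)
    bak = path.with_name(path.name + ".bak")
    shutil.copy2(path, bak)  # copy2 also preserves timestamps
    return bak
```

In this sketch, a folder run would call `backup` on each file returned by `find_files` before overwriting it with the converted bytes.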

Download version 1.0 (18K)
You will also need the Visual Basic 6.0 runtime files if they are not already on your computer. They are available from Microsoft as a one-time 1MB download.

License: Freeware.
Not to be distributed without permission.

For comments, suggestions, or bug reports, please use the comments on this blog entry. You may also email me if you prefer. If you create your own replacement sets (for other charsets, for example) and think they could be useful to others, please send them to me and they may be included in this distribution.

Credit: Muhammed Abubakar's great free Drag_Component code for Visual Basic 6 (VB6) was used to implement drag and drop in this application.


Also check out Kaboom, a great free tool that does all kinds of character conversions, from/to a file or using the clipboard!

Sisulizer's Kaboom

Posted on May 1, 2005 at 14:30

