^^Characters inside an HTML document. HTML source text format. HTML character set, HTML encoding.

HTML code:  HTML text, source text, hidden text, marked-up text

Displayed text:  normal text, rendered text, visible text, browser text

ix How the HTML language describes text formatting. Introductory exercise.

Control characters have no place inside an HTML document, with three exceptions:

\t   \x09  horizontal tab

\n   \x0a  line feed (newline)

\r   \x0d  carriage return
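
As a minimal sketch of the rule above, the following Python checks a string for control characters other than the three exceptions. The helper name `find_forbidden_controls` is hypothetical, and the check is limited to the C0 range (code points below 0x20), which is an assumption about what "control characters" means here.

```python
# Hypothetical helper: report control characters that should not appear in HTML text.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}  # the three exceptions listed above


def find_forbidden_controls(text):
    """Return (index, character) pairs for C0 controls other than tab, LF and CR."""
    return [
        (i, ch)
        for i, ch in enumerate(text)
        if ord(ch) < 0x20 and ch not in ALLOWED_CONTROLS
    ]


sample = "line one\r\n\tindented\x00 and a stray NUL"
print(find_forbidden_controls(sample))
```

Tab, line feed and carriage return pass silently; only the NUL character is reported.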

wp/Unicode_and_HTML

HTML reserved characters: five characters make up the syntax of the HTML language.

They cannot be used in normal text, because the browser would try to interpret them as HTML. They are therefore represented by an entity name or an entity number.

Character   Entity Number   Entity Name   Description
<           &#60;           &lt;          less-than
>           &#62;           &gt;          greater-than
"           &#34;           &quot;        quotation mark
'           &#39;           &apos;        apostrophe
&           &#38;           &amp;         ampersand
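
The table can be tried out with Python's standard `html` module, shown here as a small sketch rather than the only way to escape text. Note that `html.escape` writes the apostrophe as the hexadecimal reference `&#x27;` instead of `&apos;`; the sample string is made up for illustration.

```python
import html

raw = '<a href="x">Tom & Jerry\'s</a>'

# quote=True is the default, so " and ' are escaped as well as <, > and &.
escaped = html.escape(raw)
print(escaped)

# Going the other way: the five entity names from the table.
print(html.unescape("&lt; &gt; &quot; &apos; &amp;"))

# Escaping and unescaping round-trips back to the original text.
assert html.unescape(escaped) == raw
```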

ref: html.am/html-special-characters

Syntax:   &entity_name;   &#entity_number;     Entity names are case sensitive.

Hexadecimal:  &#110; is equivalent to &#x6E; and &#x6e;; all three render as n.

&#8801; is ≡
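
A short sketch of the decimal/hexadecimal equivalence, using Python's `ord` and `html.unescape`. The function name `numeric_reference` is hypothetical, introduced only for this illustration.

```python
import html


def numeric_reference(ch, hexadecimal=False):
    """Build the numeric character reference for a single character."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


print(numeric_reference("n"))                    # decimal form
print(numeric_reference("n", hexadecimal=True))  # hexadecimal form
print(numeric_reference("≡"))

# Decimal, upper-case hex and lower-case hex all decode to the same letter.
print(html.unescape("&#110; &#x6E; &#x6e;"))
```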

&nbsp;   non-breaking space (no-break space), wp
&#8209;  non-breaking hyphen

Two words separated by a non-breaking space will stick together (they will not be split across a line break).
This is handy when breaking the words might be disruptive.

Examples:

 10    10 km/h    10 PM   should each be a single unit that is not broken by a line wrap

aspira‑polvere aspira‑polvere aspira‑polvere aspira‑polvere aspira‑polvere aspira‑polvere aspira‑polvere aspira‑polvere
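
The sticking-together effect can be imitated outside a browser. This is only an analogy, assuming Python's `textwrap` module as a stand-in for the browser's line breaking: `textwrap` breaks at ordinary spaces but, like a browser, does not break at U+00A0 (the character `&nbsp;` stands for). The sample sentence and the width of 15 are arbitrary.

```python
import textwrap

NBSP = "\u00a0"  # the character that &nbsp; stands for

breakable = "The limit is 10 km/h"       # ordinary space between 10 and km/h
glued = f"The limit is 10{NBSP}km/h"     # non-breaking space instead

# With an ordinary space the number and its unit may end up on different lines.
print(textwrap.wrap(breakable, width=15))

# With a non-breaking space they move to the next line together.
print(textwrap.wrap(glued, width=15))
```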

wp/Whitespace_character

&nbsp; is non-collapsing.

Browsers treat sequences of ordinary whitespace characters (space, newline, tab, form feed, etc.) as if they were a single character; a run of &nbsp; characters is not collapsed this way.

Such "collapsing" of whitespace allows the author to neatly arrange the source text using line breaks, indentation and other forms of spacing without affecting the final typeset result.
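
A rough sketch of whitespace collapsing, assuming a simple regular expression is a fair approximation of what the browser does for normal text. The function name `collapse_whitespace` is hypothetical; the character class deliberately leaves out U+00A0.

```python
import re

# Space, tab, newline, carriage return and form feed collapse; U+00A0 does not.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text):
    """Replace each run of collapsible whitespace with a single space."""
    return _COLLAPSIBLE.sub(" ", text)


source = "neatly\n    indented   source\ttext and two\u00a0\u00a0hard spaces"
print(repr(collapse_whitespace(source)))
```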

soup.tag.encode('Windows-1252') turns &nbsp; into the byte \xa0
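
The same byte can be seen without BeautifulSoup. The sketch below uses only the standard library (an assumption made to keep it self-contained) and also shows that UTF-8 needs two bytes for the same character.

```python
import html

nbsp = html.unescape("&nbsp;")   # the entity decodes to U+00A0
print(repr(nbsp))

print(nbsp.encode("cp1252"))     # one byte in Windows-1252
print(nbsp.encode("utf-8"))      # two bytes in UTF-8
```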

HTML character set, HTML encoding.

w3schools/html_charset ≡ HTML Encoding

To display an HTML page correctly, a web browser must know which character set to use.

The default character set for HTML5 is UTF-8.
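
What goes wrong when the character set is guessed incorrectly can be sketched in a few lines of Python. The word "perché" is just a sample; decoding its UTF-8 bytes as Windows-1252 stands in for a browser that assumes the wrong character set.

```python
text = "perché"
data = text.encode("utf-8")      # the bytes as stored in a UTF-8 page

right = data.decode("utf-8")     # browser uses the correct character set
wrong = data.decode("cp1252")    # browser wrongly assumes Windows-1252

print(right)
print(wrong)                     # the accented letter turns into two characters
```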

Differences between Character Sets 

ASCII    Windows-1252    ISO-8859-1    UTF-8

0-7F: identical in all of them, and this is where ASCII stops

80-9F: defined only in Windows-1252; this is a block of 32 positions

A0-FF: identical

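Number  ASCII  cp1252  8859-1  UTF-8  Description
32      SP     SP      SP      SP     space
33      !      !       !       !      exclamation mark
126     ~      ~       ~       ~      tilde
127     DEL    DEL     DEL     DEL    delete (control character)
128     -      €       -       -      euro sign
137     -      ‰       -       -      per mille sign
147     -      “       -       -      left double quotation mark
148     -      ”       -       -      right double quotation mark
149     -      •       -       -      bullet
159     -      Ÿ       -       -      Latin capital letter Y with diaeresis
160     -      NBSP    NBSP    NBSP   no-break space
161     -      ¡       ¡       ¡      inverted exclamation mark
162     -      ¢       ¢       ¢      cent sign
170     -      ª       ª       ª      feminine ordinal indicator
176     -      °       °       °      degree sign
255     -      ÿ       ÿ       ÿ      Latin small letter y with diaeresis
(- = no printable character at this position)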
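
The three ranges can be checked with Python's built-in codecs, as a sketch. One caveat: Python's `latin-1` codec maps 80-9F to invisible C1 control characters rather than leaving them empty, and its `cp1252` codec rejects the few positions that Windows-1252 leaves unassigned. The helper name `decode_or_none` is hypothetical.

```python
def decode_or_none(byte_value, encoding):
    """Decode a single byte, or return None if the encoding does not define it."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None


# 00-7F: ASCII, Windows-1252 and ISO-8859-1 agree on every position.
same_low = all(
    decode_or_none(b, "ascii") == decode_or_none(b, "cp1252") == decode_or_none(b, "latin-1")
    for b in range(0x00, 0x80)
)

# A0-FF: Windows-1252 and ISO-8859-1 agree on every position.
same_high = all(
    decode_or_none(b, "cp1252") == decode_or_none(b, "latin-1")
    for b in range(0xA0, 0x100)
)

print(same_low, same_high)
print(decode_or_none(128, "ascii"))          # ASCII stops at 7F
print(decode_or_none(128, "cp1252"))         # euro sign
print(repr(decode_or_none(128, "latin-1")))  # a control character, nothing visible
```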
           

Unicode UTF-8

windows-1252: the non-ASCII characters of the character set.

Internet links

wp/Content_sniffing

w3schools/charsets

 

<!DOCTYPE html>

wp/Document_type_declaration

DOCTYPE: document type declaration, an instruction that associates the document with a document type definition (DTD).

The HTML layout engines in modern web browsers perform DOCTYPE "sniffing" or "switching"; the DOCTYPE is retained in HTML5 as a "mostly useless, but required" header, only to trigger standards mode.

DTD Document_type_definition

the syntax in which the document is written.

MIME type sniffing standard

defines how MIME types are supposed to be sniffed in web browsers.
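
To give a feel for what "sniffing" means, here is a heavily simplified sketch in Python. It is not the algorithm of the standard: it only compares the first bytes of the content against a handful of well-known file signatures, and the function name `sniff_mime_type`, the signature list and the fallback value are assumptions made for this illustration.

```python
# A few well-known leading byte patterns and the MIME type they suggest.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime_type(data, declared=None):
    """Guess a MIME type from leading bytes, else fall back to the declared type."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    return declared or "application/octet-stream"


print(sniff_mime_type(b"GIF89a\x01\x00\x01\x00"))
print(sniff_mime_type(b"hello", declared="text/plain"))
```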

ref: Data URL.  MIME type.   base64 code.

 

Text display, format:  word wrap = word wrapping = line breaking