English 360 (from angles)

  1. Sanmayce said:

    The thread in which I used to post is closed for some reason.

    It would be nice if everyone who has a view or question had a free way to express his/her mind.

    As a continuation of my last post #1894 (http://www.allthelyrics.com/forum/le...l#post876796):

    What a stupid blunder on my part: 'I have a friend of mine'.
    Where is the (real-time) detector for wrong n-grams to warn me that the proper phrase is: 'A friend of mine'?

    The point of that was to show the quatrain unfolded/n-grammed, and that was only a small part of the full n-gramming.
    You understand that one unique 27-gram (here the unfolded quatrain) is built from many more n-grams, as follows:

    1 2
    sub-grams: 1,2,1 2
    All: 2+(1)=3

    1 2 3
    sub-grams: 1,2,1 2,3,2 3,1 2 3
    All: 3+(2+1)=6

    1 2 3 4
    sub-grams: 1,2,1 2,3,2 3,1 2 3,4,3 4,2 3 4,1 2 3 4
    All: 4+(3+2+1)=10

    1 2 3 4 5
    sub-grams: 1,2,1 2,3,2 3,1 2 3,4,3 4,2 3 4,1 2 3 4,5,4 5,3 4 5,2 3 4 5,1 2 3 4 5
    All: 5+(4+3+2+1)=15

    ...

    That is, there are 27+...+1 = 27*(27+1)/2 = 378 of them in total. Only after realizing that each of them must be ranked individually can you see a GLIMPSE of the evaluation process.
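
    For concreteness, here is a minimal sketch (Python, my own illustrative names) that enumerates every contiguous sub-gram of a word sequence and confirms the n*(n+1)/2 count; any word list works the same way as the unfolded quatrain:

    def sub_grams(words):
        # Yield every contiguous sub-gram: all 1-grams, 2-grams, ... up to
        # the single full n-gram (enumeration order does not affect the count).
        n = len(words)
        for length in range(1, n + 1):
            for start in range(n - length + 1):
                yield " ".join(words[start:start + length])

    words = "I look at the river".split()           # stand-in for the 27-word quatrain
    grams = list(sub_grams(words))
    n = len(words)
    assert len(grams) == n * (n + 1) // 2            # e.g. 27 words -> 378 sub-grams
    print(len(grams), "sub-grams for", n, "words")   # 15 sub-grams for 5 words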

    Had I posted all of them, I guess the moderator would have gone bananas, ha-ha.
    My idea was actually to provide visual food. Written down they are more vivid; I saw that only talking about them was a burden.

    And I sense a somewhat static view on your part regarding whether the quatrain is good English or even commonly used English, while mine is neither static nor dynamic; that is, the lyrics' author can coin his/her own definitions (I mean free-for-all, i.e. slang/vulgar/archaic/obsolete usage ranked, not "judged").

    Your attention is on things that hardly have anything in common with my passion: ripping and indexing the whole of written English in order to create a brute-force phrase-checker (or n-gram-checker, to be more precise).
    John, I want to salute you with a lyric (I intentionally "forgot" to mention the song info):

    [
    I look at the river
    But I'm thinking of the sea
    The past overwritten
    By what I hope to be

    I search for the essence
    It's my command
    That is the centre
    It's what I demand

    In my mind
    I'm standing in a glass house
    Looking below
    Don't wanna look without seeing
    Don't wanna touch without feeling

    I dream in colour,
    yeah I dream in colour
    When the world's in black and sepia tone
    Or in sleepy monochrome
    I dream in colour,
    yeah I dream in colour
    I see much further than this
    I see much further than this, yeah

    In a room with no windows
    And painted doors
    What once was my ceiling
    Now is the floor
    My landscape is changing
    From out of the dust
    It's some kind of healing
    Like the sun's coming up

    In my mind
    I'm standing in a glass house
    Looking below
    Don't wanna hear without listening
    Don't wanna talk without speaking

    I can see through the clouds of grey
    Got a window on the world
    I can sweep them all away
    Got a window on the world
    Don't wanna look without seeing
    Don't wanna touch without feeling

    I dream in colour,
    yeah I dream in colour
    When the world's in black and sepia tone
    Or in sleepy monochrome
    I dream in colour,
    yeah I dream in colour
    I see much further than this
    There is gotta be more than this, yeah ...
    ]

    Just wanted you to feel what my place in this world is (similar to the classic 'Man on the Silver Mountain'; here, a man from the glass house).

    P.S.
    My hurry mode is always buggy: the name of the thread should be 'from all angles'.
    Last edited by Sanmayce; 06-19-2011 at 11:24 AM.
     
  2. LycaNightmareLuc said:

    This interests me from the angle of (a) free and unbridled expression of thought, but also, of course, because (b) it's been said countless times that mathematics can express every known concept in the universe... like the idea that even the seeming randomness of weather can be expressed in a Markov chain and thus predicted, although almost no one seems able to adequately interpret the computational results. This is to say, more or less, that you really shouldn't feel bad about that "blunder".

    May I ask, what is your particular core of study? And is it your goal to develop an institutional teaching aid?
    It's never as [.....] as it seems.
     
  3. Sanmayce said:

    Hi LycaNightmareLuc,
    glad I am that at last someone dared to enter this 2,200,000,000 (yes, 2+billion people using English) deep water as you did.
    I see you feel the theme, and believe me it take NOT a scientist to get how important no-no better said fundamental are these n-grams for all kind of analyses.

    My English is broken and simple but this (amazingly it is my advantage) helps me a lot when I must figure out some approaches as how to do transitions from 1-gram all the way to the full-meaning n-gram let say 8-gram.
    And I follow the Diamond (not golden) rule: 'keep it simple'. I am saying this in order to emphasize that Markov's and other similar (AI) conceptions are too complex (at least for me) I stick to the basics, yet. As you know there is a "classic song" named 'Walk this way' meaning that transition to more complex things must be paved by traversing the basics stuff first, don't you think? I wonder how it is possible big/rich organizations powered by some data-center cluster having not implemented yet a free on-line PHRASE-CHECKER (one of my dreams).

    Answering your questions:
    My core is x-grams, as to your next (wow, you are playing on my finest strings) I would like a lot - but the problems are too many though - after all I am only an amateur with no solid background neither in mathematics/programming nor linguistics, grumble.
    You are (along with other people interested in exploring the basics) welcome to my free-and-open-sub-project: Leprechaun at http://www.sanmayce.com/Downloads/index.html#Leprechaun
    Feel free to ask whatever interests you.
    Get down get down get down get it on show love and give it up
    What are you waiting on?
     
  4. Sanmayce said:

    Giving definitions and basic goals is essential for any further explanations but for now I will skip them.
    Only to state the difference between n-grams and (my) x-grams: since there are no definitive definitions, here come mine:
    1) An n-gram is a sequence of words. For example, a given sequence of files (i.e. texts/contexts) with a length of 30 trillion words can be called a single 30,000,000,000,000-gram. That is not as scary as it seems, because the chunks which constitute this big sequence (i.e. the files) can be mixed randomly with no significant impact on the current approach (which targets smaller orders, i.e. books/chapters/paragraphs/sentences).
    Put more plainly: let some 30 billion files form this 30-trillion-word n-gram; whether they come in one order or another doesn't matter, because we assume that each of these files is a context by itself.
    In my current view the usable electronic English can be roughly 4-grammed down to 10 x 800,000,000, or 8 billion, phrases 4 words in length.
    2) An x-gram is an n-gram derived from another n-gram by applying some rules to it.

    I have so much to say... But here comes only my previous post, 2-grammed (my wish is to show how insufficient it is to stop somewhere in between, i.e. not to make the full mix by going up and down through the rest of the x-grams).

    The following dump (2-grams_checked_with_Gamera_corpus.txt) was generated by 2-gramming the post (resulting in 245 distinct 2-grams) and checking the result against the 124,669,942 Gamera corpus 2-grams (231 were found, i.e. are familiar):
    The first 14 2-grams are unfamiliar to this corpus - meaning either that the corpus is not rich enough or that the phrases are wrong or suspicious.
    The numbers at the right show the occurrences:

    ai_conceptions
    background_neither
    basics_stuff
    hi_lycanightmareluc
    html_leprechaun
    leprechaun_at
    leprechaun_feel
    named_walk
    next_wow
    nor_linguistics
    organizations_powered
    phrase_checker
    say_gram
    similar_ai
    a_classic .................. 0,012,381
    a_free ..................... 0,069,761
    a_lot ...................... 0,258,571
    a_scientist ................ 0,007,458
    advantage_helps ............ 0,000,006
    after_all .................. 0,223,436
    all_i ...................... 0,099,114
    all_kind ................... 0,002,310
    all_the .................... 2,562,629
    along_with ................. 0,211,834
    am_only .................... 0,013,582
    am_saying .................. 0,012,284
    am_that .................... 0,007,216
    amateur_with ............... 0,000,054
    amazingly_it ............... 0,000,021
    an_amateur ................. 0,004,950
    and_believe ................ 0,015,673
    and_i ...................... 2,039,312
    and_open ................... 0,040,190
    and_other .................. 0,624,922
    and_simple ................. 0,028,709
    answering_your ............. 0,000,707
    approaches_as .............. 0,000,672
    are_along .................. 0,000,433
    are_playing ................ 0,005,494
    are_these .................. 0,033,237
    are_too .................... 0,059,746
    as_how ..................... 0,010,652
    as_to ...................... 0,905,845
    as_you ..................... 0,573,837
    ask_whatever ............... 0,000,207
    at_http .................... 0,056,920
    at_last .................... 0,474,990
    at_least ................... 0,813,060
    basics_welcome ............. 0,000,005
    be_paved ................... 0,000,286
    believe_me ................. 0,045,296
    better_said ................ 0,000,328
    big_rich ................... 0,000,091
    billion_people ............. 0,001,440
    broken_and ................. 0,012,883
    but_the .................... 1,651,981
    but_this ................... 0,300,958
    by_some .................... 0,145,557
    by_traversing .............. 0,000,459
    center_cluster ............. 0,000,030
    checker_one ................ 0,000,003
    classic_song ............... 0,000,027
    cluster_having ............. 0,000,016
    com_downloads .............. 0,001,878
    complex_at ................. 0,000,808
    complex_things ............. 0,000,292
    conceptions_are ............ 0,000,946
    core_is .................... 0,002,888
    dared_to ................... 0,036,908
    data_center ................ 0,010,922
    deep_water ................. 0,007,289
    diamond_not ................ 0,000,017
    do_transitions ............. 0,000,004
    don_t ...................... 2,521,353
    downloads_index ............ 0,000,057
    emphasize_that ............. 0,004,050
    english_deep ............... 0,000,001
    english_is ................. 0,005,444
    enter_this ................. 0,005,529
    exploring_the .............. 0,011,097
    feel_free .................. 0,015,551
    feel_the ................... 0,069,388
    figure_out ................. 0,028,661
    finest_strings ............. 0,000,004
    follow_the ................. 0,109,596
    for_all .................... 0,435,338
    for_me ..................... 0,409,489
    free_and ................... 0,039,828
    free_on .................... 0,006,985
    free_to .................... 0,065,917
    from_gram .................. 0,000,219
    full_meaning ............... 0,002,540
    fundamental_are ............ 0,000,032
    get_how .................... 0,000,096
    glad_i ..................... 0,012,858
    golden_rule ................ 0,003,688
    gram_all ................... 0,000,004
    gram_let ................... 0,000,001
    grams_for .................. 0,000,167
    having_not ................. 0,002,642
    helps_me ................... 0,001,961
    how_important .............. 0,006,141
    how_it ..................... 0,135,087
    how_to ..................... 0,675,794
    i_follow ................... 0,006,966
    i_must ..................... 0,349,202
    i_see ...................... 0,245,025
    i_stick .................... 0,001,332
    i_wonder ................... 0,076,102
    i_would .................... 0,524,124
    implemented_yet ............ 0,000,219
    important_no ............... 0,000,085
    in_exploring ............... 0,002,022
    in_mathematics ............. 0,007,647
    in_order ................... 0,667,182
    interested_in .............. 0,148,777
    interests_you .............. 0,001,517
    is_broken .................. 0,020,062
    is_my ...................... 0,173,353
    is_possible ................ 0,160,823
    it_is ...................... 6,850,417
    it_simple .................. 0,002,551
    it_take .................... 0,009,024
    keep_it .................... 0,064,214
    kind_of .................... 0,563,256
    know_there ................. 0,013,646
    last_someone ............... 0,000,081
    least_for .................. 0,014,429
    let_say .................... 0,000,029
    like_a ..................... 0,981,190
    line_phrase ................ 0,000,006
    lot_but .................... 0,000,356
    lot_when ................... 0,000,644
    many_though ................ 0,000,178
    markov_s ................... 0,000,104
    mathematics_programming .... 0,000,002
    me_it ...................... 0,022,484
    meaning_n .................. 0,000,127
    meaning_that ............... 0,023,299
    more_complex ............... 0,043,985
    must_be .................... 1,258,275
    must_figure ................ 0,000,384
    my_advantage ............... 0,001,364
    my_core .................... 0,000,144
    my_dreams .................. 0,007,886
    my_english ................. 0,001,838
    my_finest .................. 0,000,240
    my_free .................... 0,001,459
    n_gram ..................... 0,000,477
    n_grams .................... 0,000,337
    neither_in ................. 0,009,841
    no_better .................. 0,042,076
    no_no ...................... 0,023,610
    no_solid ................... 0,001,315
    not_a ...................... 0,771,246
    not_golden ................. 0,000,150
    not_implemented ............ 0,003,183
    of_analyses ................ 0,000,738
    of_my ...................... 0,965,421
    on_line .................... 0,025,976
    on_my ...................... 0,256,709
    one_of ..................... 2,356,901
    only_an .................... 0,039,438
    open_sub ................... 0,000,018
    order_to ................... 0,527,624
    other_people ............... 0,086,478
    other_similar .............. 0,007,512
    out_some ................... 0,018,895
    paved_by ................... 0,000,122
    people_interested .......... 0,000,823
    people_using ............... 0,002,410
    playing_on ................. 0,007,589
    possible_big ............... 0,000,010
    powered_by ................. 0,006,378
    problems_are ............... 0,014,630
    programming_nor ............ 0,000,011
    rich_organizations ......... 0,000,002
    s_and ...................... 0,131,390
    said_fundamental ........... 0,000,004
    saying_this ................ 0,010,057
    scientist_to ............... 0,000,777
    see_you .................... 0,144,477
    simple_but ................. 0,004,667
    solid_background ........... 0,000,316
    some_approaches ............ 0,000,384
    some_data .................. 0,006,241
    someone_dared .............. 0,000,012
    song_named ................. 0,000,007
    stick_to ................... 0,023,675
    strings_i .................. 0,000,201
    stuff_first ................ 0,000,081
    sub_project ................ 0,000,053
    t_you ...................... 0,340,406
    take_not ................... 0,002,317
    that_at .................... 0,103,975
    that_markov ................ 0,000,011
    that_transition ............ 0,000,514
    the_basics ................. 0,016,494
    the_diamond ................ 0,014,836
    the_full ................... 0,160,394
    the_problems ............... 0,040,573
    the_theme .................. 0,018,891
    the_way .................... 0,859,514
    there_is ................... 2,465,476
    these_n .................... 0,000,352
    things_must ................ 0,006,472
    this_amazingly ............. 0,000,066
    this_in .................... 0,076,097
    this_way ................... 0,218,235
    though_after ............... 0,001,198
    to_ask ..................... 0,217,968
    to_do ...................... 1,507,886
    to_emphasize ............... 0,013,045
    to_enter ................... 0,155,262
    to_get ..................... 0,856,721
    to_more .................... 0,046,096
    to_my ...................... 0,511,518
    to_the ..................... 9,999,999
    to_your .................... 0,449,579
    too_complex ................ 0,002,959
    too_many ................... 0,061,664
    transition_to .............. 0,009,243
    transitions_from ........... 0,001,349
    traversing_the ............. 0,004,304
    using_english .............. 0,000,306
    walk_this .................. 0,000,930
    water_as ................... 0,009,663
    way_meaning ................ 0,000,051
    way_to ..................... 0,551,881
    welcome_to ................. 0,042,132
    whatever_interests ......... 0,000,083
    when_i ..................... 0,725,337
    with_no .................... 0,179,336
    with_other ................. 0,141,155
    wonder_how ................. 0,013,046
    would_like ................. 0,122,628
    x_grams .................... 0,000,032
    yet_a ...................... 0,033,118
    you_are .................... 1,943,806
    you_did .................... 0,100,879
    you_feel ................... 0,090,641
    you_know ................... 0,742,812
    you_think .................. 0,334,433
    your_next .................. 0,007,257
    your_questions ............. 0,006,720

    Note: to see the output aligned, use a non-proportional (monospaced) font such as Courier.
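
    For anyone curious how such a dump could be produced, here is a minimal sketch, assuming the corpus is loaded as a plain dict from 2-gram to count; the normalization (lowercasing and splitting on anything that is not a letter, so "don't" becomes don_t and "phrase-checker" becomes phrase_checker) is only my guess inferred from the entries above, not necessarily how Leprechaun does it:

    import re
    from collections import Counter

    def tokenize(text):
        # Keep only runs of letters: "don't" -> ["don", "t"], and pure digits
        # like "2000" vanish on their own since they contain no letters.
        return re.findall(r"[a-z]+", text.lower())

    def two_grams(text):
        words = tokenize(text)
        return Counter("_".join(pair) for pair in zip(words, words[1:]))

    def check_against_corpus(grams, corpus_counts):
        # Split the post's 2-grams into unfamiliar (never seen) and familiar ones.
        unfamiliar = sorted(g for g in grams if g not in corpus_counts)
        familiar = sorted((g, corpus_counts[g]) for g in grams if g in corpus_counts)
        return unfamiliar, familiar

    # Toy table standing in for the 124,669,942-entry Gamera 2-gram corpus.
    corpus = {"believe_me": 45296, "it_take": 9024, "not_a": 771246}
    unknown, known = check_against_corpus(two_grams("believe me it take NOT a scientist"), corpus)
    print(unknown)   # 2-grams the corpus has never seen
    print(known)     # 2-grams with their occurrence counts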

    http://www.youtube.com/watch?v=XGq_uAJ3VSs&feature=related

    Back then in 2000 I had had a favorite song, 'Crush' by Jennifer Paige; from its lyrics I borrowed the phrase 'it doesn't take a scientist' and used an amplified variant, 'it takes not a scientist', stupidly omitting the 's'.
    As you can see other dumb-dumbs like me have been polluted the corpus by using the wrong phrase 'it_take' 9,024 times; it is tricky to make an effective screening of such bad entries.
    As far as I can see, such bad "collocations" result either from pure errors or from not using commas as delimiters in sentences such as: 'If you are really into it[,] take off.' Or, mostly, from phrases like: 'why_did_it_take_us_so_long_to', 'how_long_would_it_take_you_to_isolate', 'it_and_make_it_take_care_of_scrolling', 'let_it_take_her_wherever_she_wished_to' ...
    Here, however, comes the reinforcement of 4-gram checking (skipping order 3), where the phrases 'it_take_not_a' and 'me_it_take_not', along with 'believe_me_it_take', are already marked as unfamiliar, i.e. with zero occurrences. That is how I see the screening: no algorithms, no heuristics whatsoever - just a smooth transition between orders (1 to 8 is generally enough).
    Again, the first 156 4-grams (out of 216 in total) are unfamiliar, whereas the bottom ones (60) are familiar (presumably correct).
    Apparently the 879,557,846 distinct 4-grams in use are far fewer than what is needed to cover (i.e. to fully evaluate) a general text like my post (do the 8 billion x-grams mentioned before look like an exaggerated number now?). A minimal sketch of this between-orders screening follows the listing below:

    a_classic_song_named
    a_scientist_to_get
    advantage_helps_me_a
    ai_conceptions_are_too
    all_kind_of_analyses
    am_that_at_last
    amateur_with_no_solid
    amazingly_it_is_my
    and_open_sub_project
    and_other_similar_ai
    and_simple_but_this
    approaches_as_how_to
    are_playing_on_my
    are_these_n_grams
    are_too_complex_at
    are_too_many_though
    ask_whatever_interests_you
    at_last_someone_dared
    background_neither_in_mathematics
    basics_welcome_to_my
    be_paved_by_traversing
    believe_me_it_take
    better_said_fundamental_are
    big_rich_organizations_powered
    billion_people_using_english
    broken_and_simple_but
    but_this_amazingly_it
    by_some_data_center
    by_traversing_the_basics
    center_cluster_having_not
    checker_one_of_my
    classic_song_named_walk
    cluster_having_not_implemented
    complex_things_must_be
    conceptions_are_too_complex
    core_is_x_grams
    data_center_cluster_having
    deep_water_as_you
    diamond_not_golden_rule
    do_transitions_from_gram
    emphasize_that_markov_s
    english_deep_water_as
    english_is_broken_and
    exploring_the_basics_welcome
    figure_out_some_approaches
    finest_strings_i_would
    follow_the_diamond_not
    for_me_i_stick
    free_and_open_sub
    free_on_line_phrase
    from_gram_all_the
    full_meaning_n_gram
    fundamental_are_these_n
    get_how_important_no
    gram_all_the_way
    gram_let_say_gram
    grams_for_all_kind
    having_not_implemented_yet
    how_important_no_no
    how_to_do_transitions
    html_leprechaun_feel_free
    i_follow_the_diamond
    implemented_yet_a_free
    important_no_no_better
    in_exploring_the_basics
    in_mathematics_programming_nor
    is_a_classic_song
    is_broken_and_simple
    is_my_advantage_helps
    is_possible_big_rich
    it_is_my_advantage
    it_is_possible_big
    it_take_not_a
    last_someone_dared_to
    least_for_me_i
    leprechaun_feel_free_to
    line_phrase_checker_one
    lot_but_the_problems
    lot_when_i_must
    many_though_after_all
    markov_s_and_other
    mathematics_programming_nor_linguistics
    me_i_stick_to
    me_it_take_not
    meaning_n_gram_let
    meaning_that_transition_to
    more_complex_things_must
    must_be_paved_by
    my_advantage_helps_me
    my_english_is_broken
    my_finest_strings_i
    my_free_and_open
    n_gram_let_say
    n_grams_for_all
    named_walk_this_way
    neither_in_mathematics_programming
    no_better_said_fundamental
    no_no_better_said
    no_solid_background_neither
    not_a_scientist_to
    not_implemented_yet_a
    on_line_phrase_checker
    on_my_finest_strings
    only_an_amateur_with
    organizations_powered_by_some
    other_similar_ai_conceptions
    out_some_approaches_as
    paved_by_traversing_the
    people_interested_in_exploring
    people_using_english_deep
    phrase_checker_one_of
    playing_on_my_finest
    possible_big_rich_organizations
    powered_by_some_data
    problems_are_too_many
    rich_organizations_powered_by
    said_fundamental_are_these
    scientist_to_get_how
    see_you_feel_the
    similar_ai_conceptions_are
    simple_but_this_amazingly
    solid_background_neither_in
    some_approaches_as_how
    some_data_center_cluster
    someone_dared_to_enter
    song_named_walk_this
    strings_i_would_like
    take_not_a_scientist
    that_markov_s_and
    that_transition_to_more
    the_basics_stuff_first
    the_diamond_not_golden
    the_full_meaning_n
    these_n_grams_for
    things_must_be_paved
    this_amazingly_it_is
    this_way_meaning_that
    to_ask_whatever_interests
    to_do_transitions_from
    to_emphasize_that_markov
    to_get_how_important
    to_my_free_and
    to_your_next_wow
    too_complex_at_least
    too_many_though_after
    transitions_from_gram_all
    traversing_the_basics_stuff
    using_english_deep_water
    walk_this_way_meaning
    way_meaning_that_transition
    welcome_to_my_free
    when_i_must_figure
    with_no_solid_background
    would_like_a_lot
    yet_a_free_on
    you_feel_the_theme
    a_free_on_line ............... 0,000,005
    a_lot_but_the ................ 0,000,007
    after_all_i_am ............... 0,000,147
    all_i_am_only ................ 0,000,023
    all_the_way_to ............... 0,012,417
    along_with_other_people ...... 0,000,050
    am_only_an_amateur ........... 0,000,021
    am_saying_this_in ............ 0,000,001
    an_amateur_with_no ........... 0,000,002
    and_believe_me_it ............ 0,000,026
    and_i_follow_the ............. 0,000,036
    are_along_with_other ......... 0,000,002
    as_to_your_next .............. 0,000,009
    as_you_know_there ............ 0,000,028
    at_least_for_me .............. 0,000,218
    but_the_problems_are ......... 0,000,025
    complex_at_least_for ......... 0,000,002
    dared_to_enter_this .......... 0,000,007
    don_t_you_think .............. 0,024,484
    feel_free_to_ask ............. 0,008,604
    for_all_kind_of .............. 0,000,059
    free_to_ask_whatever ......... 0,000,002
    glad_i_am_that ............... 0,000,455
    helps_me_a_lot ............... 0,000,011
    how_it_is_possible ........... 0,001,278
    i_am_saying_this ............. 0,000,337
    i_must_figure_out ............ 0,000,008
    i_see_you_feel ............... 0,000,026
    i_stick_to_the ............... 0,000,100
    i_wonder_how_it .............. 0,000,467
    i_would_like_a ............... 0,000,724
    in_order_to_emphasize ........ 0,000,370
    interested_in_exploring_the .. 0,000,112
    know_there_is_a .............. 0,000,994
    like_a_lot_but ............... 0,000,010
    me_a_lot_when ................ 0,000,004
    must_figure_out_some ......... 0,000,002
    one_of_my_dreams ............. 0,000,136
    order_to_emphasize_that ...... 0,000,033
    other_people_interested_in ... 0,000,016
    s_and_other_similar .......... 0,000,010
    saying_this_in_order ......... 0,000,001
    stick_to_the_basics .......... 0,000,021
    that_at_last_someone ......... 0,000,006
    the_basics_welcome_to ........ 0,000,002
    the_problems_are_too ......... 0,000,016
    the_way_to_the ............... 0,020,412
    there_is_a_classic ........... 0,000,037
    this_in_order_to ............. 0,000,879
    though_after_all_i ........... 0,000,019
    to_more_complex_things ....... 0,000,005
    to_the_full_meaning .......... 0,000,046
    transition_to_more_complex ... 0,000,002
    water_as_you_did ............. 0,000,006
    way_to_the_full .............. 0,000,035
    with_other_people_interested . 0,000,003
    wonder_how_it_is ............. 0,000,222
    you_are_along_with ........... 0,000,003
    you_are_playing_on ........... 0,000,095
    you_know_there_is ............ 0,001,117
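
    As promised above, a minimal sketch of the between-orders screening, with toy counts copied from the 2-gram dump (against this toy table every 3- and 4-gram comes out unfamiliar; against the real Gamera tables only the grams surrounding the wrong phrase would):

    def grams_of_order(words, order):
        return ["_".join(words[i:i + order]) for i in range(len(words) - order + 1)]

    def screen(words, corpus_counts, orders=(2, 3, 4)):
        # No algorithms, no heuristics: per order, just list the grams the
        # corpus has never seen.  A phrase familiar as a 2-gram ('it_take')
        # but unfamiliar in its covering 4-grams is the likely culprit.
        return {order: [g for g in grams_of_order(words, order)
                        if corpus_counts.get(g, 0) == 0]
                for order in orders}

    corpus = {"believe_me": 45296, "me_it": 22484, "it_take": 9024,
              "take_not": 2317, "not_a": 771246, "a_scientist": 7458}
    print(screen("believe me it take not a scientist".split(), corpus))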

    I am like a talking (not smiling) cat throwing a toy from paw to paw while its thoughts run free. I mean, my mode of approaching things is not the go-for-it type but rather the play-with-it type. In other words, I do not strive after my dreams here; I do not want to learn the English language but to provide a powerful and simple brute-force sidekick tool (as a first stage of analysis) while playing with it.

    There are so many facets (blinking/screaming to be examined) remaining...
    Get down get down get down get it on show love and give it up
    What are you waiting on?
     
  5. Sanmayce said:

    Caramba, I am doomed to commit/track/find all the errors in existence, but this of course comes in handy, AGAIN, as an occasion to explain another important facet, namely pseudo-syntax-checking, which comes as a by-product of long x-grams.
    The dumb 'been' mistake follows:
    'As you can see other dumb-dumbs like me have BEEN polluted the corpus...'
    Of course 'have_been_polluted_the' yields no matches, and neither does 'been_polluted_the'.

    And to illustrate the same idea, this time by mistakenly replacing the correct 'had' with 'have':
    Correct one: 'Back then in 2000 I had had a favorite...'
    Incorrect one: 'Back then in 2000 I have had a favorite...'
    One of the useful properties of x-grams is the auto-omission of '2000', it being a digit sequence rather than a literal one.
    The order-6 x-gram 'back_then_in_i_have_had', by not yielding a match, suggests the incorrectness of the 'have had' construction (due to the use of 'back then').
    And being able to handle other scenarios, like "60's" or "19th century" used in place of "2000", dictates the need for WILDCARDS - but this hurts performance.
    So let us expand the MISTAKEN sub-sentences with some in-between x-grams in addition to 's' and 'th_century' (they are: 'the_s', 'late_s', 'the_late_s'):
    'Back then in 2000 I HAVE had a favorite...'
    'Back then in 60's I HAVE had a favorite...'
    'Back then in the 60's I HAVE had a favorite...'
    'Back then in late 60's I HAVE had a favorite...'
    'Back then in the late 60's I HAVE had a favorite...'

    'Back then in 19th century I HAVE had a favorite...'
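
    A small sketch of that auto-omission, assuming (my guess again) that the normalization simply keeps only runs of letters; the listed variants then collapse onto a handful of in-between x-grams such as 'the_s', 'the_late_s' and 'th_century':

    import re

    def xgram(text):
        # Digits vanish ("2000" -> nothing), ordinals shed their digits
        # ("19th" -> "th", "60's" -> "s"), everything is lowercased and joined.
        return "_".join(re.findall(r"[a-z]+", text.lower()))

    print(xgram("Back then in 2000 I have had"))            # back_then_in_i_have_had
    print(xgram("Back then in the late 60's I have had"))   # back_then_in_the_late_s_i_have_had
    print(xgram("Back then in 19th century I have had"))    # back_then_in_th_century_i_have_had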

    Here arises a very powerful feature/facet not yet examined: a branch of phrase-checking - phrase-suggesting.
    The point is: to write the sure part(s) and to receive some suggestions as feedback for the in-between x-grams, not only for the preceding and following x-grams (the latter is already partially implemented in the big search sites).
    Let the sure part(s) be:
    1] Back then in
    2] I HAVE had a favorite
    No suggestions due to the wrong tense.
    Let the sure part(s) be:
    1] Back then in
    2] I had had a favorite
    Some possible suggestions (i.e. in-between x-grams):
    s
    the_s
    late_s
    the_late_s
    th_century


    How does a non-native English user know which one is possible/plausible? That is, is the definite article necessary?
    Of course there are ways to achieve such functionality, but I need the fastest and simplest one - any ideas?
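
    One brute-force idea, as a sketch only (a linear scan over a toy phrase table with made-up counts, so the simplest rather than the fastest): keep the corpus phrases underscore-joined with their occurrences and rank whatever sits between the two sure parts by frequency - the definite article then wins or loses on the numbers alone.

    from collections import Counter

    def suggest_in_between(part1, part2, corpus_phrases):
        # Scan known phrases for ones that start with part1 and end with part2,
        # and rank the in-between x-grams by total occurrences.
        p1, p2 = part1.split("_"), part2.split("_")
        middles = Counter()
        for phrase, count in corpus_phrases.items():
            words = phrase.split("_")
            if (len(words) > len(p1) + len(p2)
                    and words[:len(p1)] == p1
                    and words[-len(p2):] == p2):
                middles["_".join(words[len(p1):len(words) - len(p2)])] += count
        return middles.most_common()

    corpus_phrases = {"back_then_in_the_s_i_had_had": 3,       # hypothetical counts
                      "back_then_in_the_late_s_i_had_had": 2,
                      "back_then_in_th_century_i_had_had": 1}
    print(suggest_in_between("back_then_in", "i_had_had", corpus_phrases))
    # [('the_s', 3), ('the_late_s', 2), ('th_century', 1)]
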
    Get down get down get down get it on show love and give it up
    What are you waiting on?